Network Reconnaissance

Level: Advanced

Duration: 75 mins

Topic: Common Attacks & Defenses

Information Gathering

Intelligence Beyond Technical Scanning

Port scanning reveals services. Vulnerability scanning identifies weaknesses. But comprehensive reconnaissance extends far beyond technical probing—it encompasses all methods of gathering intelligence about a target.

Information gathering draws heavily on open source intelligence (OSINT), meaning data collected from publicly available sources, and is supplemented by social engineering and passive observation. A skilled attacker knows that the most valuable information often comes not from port scans but from job postings that mention technology stacks, employee LinkedIn profiles that reveal internal project names, GitHub repositories with accidentally committed credentials, or forgotten subdomains hosting development systems.

This page explores the full spectrum of information gathering: passive vs. active techniques, data sources, social engineering components, and how defenders can reduce their information exposure.

Learning Objectives

By mastering this page, you will: (1) Understand passive vs. active information gathering, (2) Leverage OSINT sources for technical and organizational intelligence, (3) Recognize information leakage in common business operations, (4) Apply social engineering concepts ethically, and (5) Implement controls to reduce organizational information exposure.

Passive vs. Active Information Gathering

Information gathering techniques fall into two fundamental categories: passive (no direct target interaction) and active (direct communication with target). Understanding this distinction is crucial—it affects detectability, legality, and the type of information obtained.

Passive gathering:

The reconnaissance subject (target) is completely unaware. Information comes from:

Public websites and databases
Search engines and archives
Third-party sources that aggregated the data
Previously leaked or published information

Active gathering:

Direct interaction creates evidence of reconnaissance:

Browsing target's website (they see your IP in logs)
Sending emails (they receive your messages)
Social engineering calls (they hear your voice)
Port scanning (they see your probes)

Passive vs. Active Reconnaissance Comparison
Aspect	Passive	Active
Detection Risk	Nearly zero	Moderate to high
Legal Concerns	Generally safe	May require authorization
Information Depth	Surface level	Deeper, verified
Data Freshness	May be outdated	Current
Examples	Shodan, Google, Archive.org	Nmap, direct contact, social engineering
Time Required	Extensive (many sources)	Faster (direct answers)

The reconnaissance spectrum:

[Fully Passive] ◄─────────────────────────────────► [Fully Active]
      │                                                    │
   Shodan    Google    Website    Email     Social    Port
   lookups   dorking   browsing   contact   eng.      scan

Practical implications:

Attackers start with passive techniques to avoid detection and switch to active ones only when necessary
Penetration testers document which activities each phase of the engagement permits
Defenders can monitor for active reconnaissance but cannot prevent passive collection
Threat intelligence teams treat observed active reconnaissance as a sign that the organization is being targeted

The Gray Zone

Some activities are ambiguous. Viewing the target's website feels passive, but your IP address appears in its logs, so it technically leaves a trace. Using third-party services that probe on your behalf (e.g., subdomain enumeration services) is often considered passive because you never interact with the target directly, although the target may still see the third party's probes.

Open Source Intelligence (OSINT) Sources

OSINT encompasses all intelligence gathered from publicly available information. The sheer volume of useful data available without any hacking is remarkable—and often underestimated.

Categories of OSINT sources:

•WHOIS records: Domain ownership, admin contacts, registration dates
•DNS records: Subdomains, mail servers, nameservers, TXT records
•Certificate Transparency logs: Publicly trusted TLS/SSL certificates issued for a domain, including the hostnames they cover
•Shodan/Censys: Internet-wide scans showing exposed services
•BGP routing data: IP ranges, ASN ownership, peering relationships
•Web Archives (Wayback Machine): Historical website versions
•Passive DNS: Historical DNS resolutions
•Security research: Published CVE reports, bug bounty findings

technical_osint_examples.sh
# WHOIS lookup
whois example.com
 
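# Certificate Transparency search (%25 is the URL-encoded % wildcard)
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u
 
# Shodan search (requires API key)
shodan search "hostname:example.com"
 
# Historical DNS (requires a SecurityTrails API key; the variable name is a placeholder)
curl -s -H "APIKEY: $SECURITYTRAILS_API_KEY" "https://api.securitytrails.com/v1/history/example.com/dns/a"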
 
# Wayback Machine API
curl "https://web.archive.org/cdx/search/cdx?url=example.com&output=json"
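
Raw Certificate Transparency results tend to contain duplicates, wildcard entries, and mixed case. The following is a minimal cleanup sketch, assuming the hostnames have already been extracted one per line (for example from the name_value field of crt.sh JSON output). The function name normalize_ct_names and the sample hostnames are hypothetical.

```shell
#!/bin/sh
# normalize_ct_names: read hostnames on stdin (one per line) and print a
# sorted, de-duplicated list with wildcard prefixes removed.
# Hypothetical helper for illustration; not part of any tool.
normalize_ct_names() {
    tr '[:upper:]' '[:lower:]' |   # hostnames are case-insensitive
        sed 's/^\*\.//' |          # "*.example.com" -> "example.com"
        grep -v '^$' |             # drop empty lines
        sort -u                    # unique, sorted
}

# Offline demo: sample data standing in for real CT log output
printf '%s\n' '*.example.com' 'WWW.example.com' 'www.example.com' 'dev.example.com' |
    normalize_ct_names
```

The cleaned list can then be compared against an asset inventory or passed to other passive lookups.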

Google Dorking

Google dorking (or Google hacking) uses advanced search operators to find information that organizations inadvertently exposed to search engines.

Core operators:

Essential Google Dork Operators
Operator	Purpose	Example
site:	Limit to specific domain	site:example.com
inurl:	Terms must appear in URL	inurl:admin
intitle:	Terms in page title	intitle:"index of"
filetype:	Specific file extensions	filetype:pdf
intext:	Terms in page body	intext:password
cache:	Google's cached version (retired by Google in 2024; use the Wayback Machine instead)	cache:example.com
"exact phrase"	Exact string match	"database error"
-term	Exclude results	site:example.com -www

google_dorks.txt
# Find subdomains indexed by Google
site:example.com -www
 
# Find exposed configuration files
site:example.com filetype:conf OR filetype:config OR filetype:cfg
 
# Find directory listings
site:example.com intitle:"Index of"
 
# Find exposed login pages
site:example.com inurl:login OR inurl:admin OR inurl:signin
 
# Find error messages revealing info
site:example.com "database error" OR "mysql error" OR "syntax error"
 
# Find exposed documents
site:example.com filetype:pdf OR filetype:doc OR filetype:xls
 
# Find exposed backup files
site:example.com filetype:bak OR filetype:sql OR filetype:old
 
# Find potentially sensitive pages
site:example.com inurl:password OR inurl:secret OR inurl:credentials
 
# Find exposed cameras/IoT
inurl:"/view/view.shtml" OR inurl:"viewerframe?mode="
 
# Find exposed S3 buckets
site:s3.amazonaws.com "example"
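
Because the queries above differ mainly in the domain and file type, they are easy to generate. This is a small sketch, assuming a hypothetical helper named build_dorks and an arbitrary choice of four file extensions. It only prints query strings and sends nothing to any search engine.

```shell
#!/bin/sh
# build_dorks: print a small set of Google dork queries scoped to one domain.
# Hypothetical helper; the queries mirror patterns from google_dorks.txt.
build_dorks() {
    domain="$1"
    printf 'site:%s -www\n' "$domain"
    printf 'site:%s intitle:"index of"\n' "$domain"
    # One query per file extension (the extension list is an assumption)
    for ext in conf cfg bak sql; do
        printf 'site:%s filetype:%s\n' "$domain" "$ext"
    done
}

build_dorks example.com
```

Keeping the generator separate from whatever submits the queries makes it easy to review the list against the engagement scope first.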

The Google Hacking Database (GHDB):

Exploit-DB maintains a database of effective dorks categorized by purpose:

Sensitive directories
Error messages
Sensitive online shopping info
Vulnerable servers
Web server detection
Files containing passwords
Login portals
Network or vulnerability data

URL: https://www.exploit-db.com/google-hacking-database

Automation:

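Tools such as Googler (a command-line client) and DorkSearch can run dork queries at scale. Shodan is a separate search engine with its own query syntax rather than a Google dork automation tool.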

Rate Limiting and Detection

Google limits automated searches and may flag your IP for unusual query patterns. Use delays, rotate queries, and consider using API access where available. Dorking itself only queries the search engine, but opening the results in bulk sends traffic to the sites you're researching and can resemble attack traffic.
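
One way to apply the delay advice is a simple loop that pauses between queries. This is a sketch under stated assumptions: DELAY_SECONDS, run_throttled, and print_query are hypothetical names, and the handler here only prints each query rather than contacting a search service.

```shell
#!/bin/sh
# run_throttled: read one query per line on stdin and pass each to a handler
# command, pausing between calls. DELAY_SECONDS and the handler are
# assumptions for illustration; substitute an approved search API client.
run_throttled() {
    handler="$1"
    delay="${DELAY_SECONDS:-5}"
    first=1
    while IFS= read -r query; do
        [ -n "$query" ] || continue          # skip blank lines
        # Sleep before every request except the first one
        if [ "$first" -eq 0 ]; then sleep "$delay"; fi
        first=0
        "$handler" "$query"
    done
}

# Placeholder handler that only prints the query it would send
print_query() { printf 'would search: %s\n' "$1"; }

# Delay set to 0 so the demo finishes immediately
printf '%s\n' 'site:example.com filetype:pdf' 'site:example.com inurl:login' |
    DELAY_SECONDS=0 run_throttled print_query
```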

Domain and Infrastructure Intelligence

Deep investigation of domain and infrastructure relationships reveals organizational scope, hosting decisions, and potential attack vectors beyond the obvious main website.

Domain intelligence sources:

Domain and Infrastructure OSINT

•WHOIS historical data: Track ownership changes, identify related domains
•Reverse WHOIS: Find all domains registered by same entity/email
•Subdomain enumeration: Certificate transparency, DNS brute force, passive sources
•Reverse IP lookup: Find what else is hosted on same IP/range
•ASN enumeration: Identify all IP ranges owned by organization
•Cloud asset discovery: Find S3 buckets, Azure blobs, GCP storage by naming patterns
•Favicon hashing: Identify servers running same software by favicon fingerprint

domain_intelligence.sh
# Reverse WHOIS (find all domains by registrant)
# ViewDNS.info, DomainTools (commercial)
 
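# ASN lookup: registration details for the autonomous system
whois AS12345
# List route objects (IP prefixes) registered with this origin AS in RADb
whois -h whois.radb.net -- '-i origin AS12345'
 
# Find subdomains via multiple sources
amass enum -d example.com -passive
 
# Passive subdomain enumeration across all configured sources (including certificate transparency)
subfinder -d example.com -all -silent
 
# Reverse IP lookup (203.0.113.10 is a documentation address; substitute a public IP in scope)
curl "https://api.hackertarget.com/reverseiplookup/?q=203.0.113.10"
 
# Cloud bucket discovery
# Common patterns: {company}, {company}-dev, {company}-backup
aws s3 ls s3://example-company/ --no-sign-request
 
# Shodan host lookup (private addresses such as 192.168.x.x are never indexed)
shodan host 203.0.113.10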
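
The bucket naming patterns mentioned in the cloud discovery step can be expanded mechanically. A minimal sketch, assuming a hypothetical helper named bucket_candidates and an illustrative suffix list; real naming conventions vary by organization, and the function only prints names without contacting any cloud provider.

```shell
#!/bin/sh
# bucket_candidates: print likely bucket names for a company name by
# combining a normalized slug with common environment suffixes.
# The suffix list is an assumption for illustration.
bucket_candidates() {
    # Lowercase, replace runs of other characters with "-", trim edge dashes
    slug="$(printf '%s' "$1" |
        tr '[:upper:]' '[:lower:]' |
        tr -cs 'a-z0-9' '-' |
        sed 's/^-//; s/-$//')"
    printf '%s\n' "$slug"
    for suffix in dev staging backup logs; do
        printf '%s-%s\n' "$slug" "$suffix"
    done
}

bucket_candidates 'Example Company'
```

Defenders can run the same expansion against their own name to check whether any matching bucket exists and is publicly listable.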

Attack surface expansion:

Starting with one domain, good OSINT often reveals:

example.com
 ├── www.example.com (main website)
 ├── mail.example.com (email infrastructure)
 ├── vpn.example.com (🎯 VPN endpoint)
 ├── dev.example.com (🎯 development server)
 ├── staging.example.com (🎯 possibly weaker security)
 ├── api.example.com (🎯 API surface)
 ├── jenkins.example.com (🎯 CI/CD system)
 └── related discoveries:
     ├── exampleinc.com (alternate domain)
     ├── acq-target.com (recent acquisition)
     └── example-internal.slack.com (cloud services)

Each discovery becomes a potential attack vector. Development and staging environments often have weaker controls than production.

The Acquisition Factor

Acquired companies are often security weak points. They may have been integrated operationally but not security-wise—retaining old infrastructure, different security standards, or forgotten systems. Always research parent/subsidiary relationships.

Social Engineering Intelligence

The most sophisticated technical defenses can be bypassed by exploiting human psychology. Social engineering intelligence gathers information that enables manipulation of people rather than systems.

What social engineers look for:

Social Engineering Intelligence Categories
Category	Examples	Attack Use
Organizational Structure	Who reports to whom, department names	Impersonation, authority exploitation
Employee Details	Names, roles, email formats, phone numbers	Spear phishing, pretexting
Operational Patterns	Work hours, office locations, remote work	Timing attacks, in-person access
Technology Context	Help desk processes, vendor names	Tech support impersonation
Personal Interests	Hobbies, social groups, family info	Building rapport, pretext creation
Current Events	Recent projects, reorganizations, outages	Timely pretexts

Building employee lists:

Comprehensive employee enumeration enables:

Password spraying (try common passwords against all users)
Spear phishing campaigns
Credential stuffing (using breach passwords)
Physical access attempts

Methods:

# LinkedIn - Company page followers/employees
# Tool: linkedin2username, CrossLinked

# GitHub - Contributors to company repositories
git log --format='%aN <%aE>' | sort -u

# Email harvesting from web
theHarvester -d example.com -b all

# Hunter.io - Email format discovery (requires an API key; the variable name is a placeholder)
curl "https://api.hunter.io/v2/domain-search?domain=example.com&api_key=$HUNTER_API_KEY"

# Google dork for email addresses
site:example.com "@example.com"
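
Output from the git log command above usually mixes corporate addresses with personal and bot addresses. A minimal filtering sketch, assuming input lines in the "Name <email>" form; the function name corporate_emails and the sample identities are invented for illustration. Run against your own repositories, it shows which staff addresses are publicly exposed.

```shell
#!/bin/sh
# corporate_emails: read "Name <email>" lines on stdin and print the unique,
# lowercased addresses whose domain part is exactly the given domain.
# Hypothetical helper for illustration.
corporate_emails() {
    sed -n 's/.*<\([^>]*\)>.*/\1/p' |        # keep only the address inside <>
        tr '[:upper:]' '[:lower:]' |
        awk -v suffix="@$1" '
            # exact suffix match, so "@example.com.evil.test" is rejected
            length($0) > length(suffix) &&
            substr($0, length($0) - length(suffix) + 1) == suffix' |
        sort -u
}

# Offline demo with made-up commit authors
printf '%s\n' \
    'Jane Doe <jane.doe@example.com>' \
    'Jane Doe <Jane.Doe@example.com>' \
    'Sam Lee <sam@mail.example.net>' \
    'CI Bot <bot@example.com.evil.test>' |
    corporate_emails example.com
```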

Ethical Boundaries

Gathering information about individuals raises significant ethical and legal concerns. In legitimate assessments, scope explicitly defines what's permitted. Never stalk, harass, or gather information on individuals without clear authorization. Privacy laws (GDPR, CCPA, etc.) may restrict certain data collection.

Pretexting development:

With gathered intelligence, attackers craft believable pretexts:

Gathered Intel	Resulting Pretext
Uses Office 365 + IT director name	"Hi, John from IT, there's an Office 365 issue affecting accounts"
Recent acquisition of Company B	"I'm from Company B, trying to set up my new access"
Uses ServiceNow for tickets	"This is ServiceNow support, we detected suspicious activity"
CEO travels frequently	"The CEO needs this wire transfer approved while traveling"

Technical Metadata Extraction

Files published by organizations often contain metadata—hidden information about their creation and the systems that created them. This metadata can reveal usernames, software versions, internal paths, and more.

Common metadata sources:

File Types and Their Metadata

•Microsoft Office documents: Author, company, software version, internal paths
•PDF files: Creator software, author, embedded fonts, creation date
•Images (JPEG/PNG): Camera model, GPS coordinates, software, timestamps
•Web pages: Generator tags, framework versions, comments
•Compiled software: Build paths, compiler versions, developer names

metadata_extraction.sh
# ExifTool - comprehensive metadata extraction
exiftool document.pdf
# Output may include:
# Creator: John.Smith
# Operating System: Windows 10
# PDF Producer: Microsoft Word 2019
# Create Date: 2024:01:15 14:30:22
 
# Image metadata (may reveal location)
exiftool photo.jpg
# GPS Position: 37.7749° N, 122.4194° W
# Camera Make: Apple
# Camera Model: iPhone 15 Pro
 
# FOCA - automated document metadata analysis (Windows)
# Analyzes enterprise document collections
 
# Web page metadata
curl -s https://example.com | grep -E "generator|author|framework"
 
# Metagoofil - automated document download and analysis
metagoofil -d example.com -t pdf,doc,xls -l 100 -o output/

What metadata reveals:

Metadata Type	Example	Intelligence Value
Author/Creator	jsmith, john.smith@corp.local	Username format, domain name
Software Version	Microsoft Word 2016	Indicates patch levels, potential vulns
Internal Paths	C:\Users\jsmith\Documents\Confidential\	Directory structure, user naming
Network Paths	\\fileserver\share\HR\	Internal server names, SMB shares
GPS Coordinates	37.7749, -122.4194	Physical locations, executive travel
Printer Names	HP-LaserJet-Floor3-Legal	Office layout, department locations

Bulk Document Analysis

Download all PDFs, Word docs, and Excel files from target's website, then batch-analyze metadata. This often reveals: consistent username patterns, internal server names, software versions, and sometimes credentials embedded in file properties.
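
Once metadata has been dumped for a batch of files, recurring author values are what expose the username convention. A rough sketch of that tallying step, assuming exiftool-style "Tag : Value" text on stdin; the function name tally_authors and the sample records are hypothetical.

```shell
#!/bin/sh
# tally_authors: read exiftool-style "Tag : Value" lines on stdin and print
# each distinct Author value with its count, most frequent first.
# Hypothetical helper for illustration.
tally_authors() {
    awk -F' *: ' '$1 == "Author" { count[$2]++ }
         END { for (a in count) printf "%d %s\n", count[a], a }' |
        sort -k1,1nr -k2
}

# Offline demo: sample text shaped like multi-file exiftool output
tally_authors <<'EOF'
======== report.pdf
Author                          : jsmith
Producer                        : Microsoft Word 2019
======== budget.xlsx
Author                          : mjones
======== memo.docx
Author                          : jsmith
EOF
```

Running the same tally on your own published documents is a quick way to decide which files need metadata scrubbing.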

Code Repository Mining

Public code repositories (GitHub, GitLab, Bitbucket) are treasure troves for reconnaissance. Developers accidentally commit secrets, configuration files reveal infrastructure details, and code history shows evolution of systems.

What to search for:

Code Repository Intelligence Targets
Target	Search Terms	Value
API keys/secrets	api_key, apikey, secret, token	Direct access to services
Passwords	password=, passwd, pwd	Potential valid credentials
AWS credentials	AKIA, aws_secret_access_key	Cloud account access
Database credentials	mysql://, postgres://, mongodb://	Database access
Internal URLs	internal, staging, dev, corp	Internal infrastructure
SSH keys	BEGIN RSA PRIVATE KEY	Server access
Configuration files	.env, config.yml, settings.json	Infrastructure details

github_dorking.sh
# GitHub search operators
org:example-company password
org:example-company filename:.env
org:example-company extension:pem
org:example-company "api_key ="
org:example-company "AKIA"  # AWS access key prefix
 
# Automated secret scanning tools
# TruffleHog - scans git history for secrets
trufflehog git https://github.com/example/repo.git
 
# GitLeaks - SAST tool for secrets
gitleaks detect --source=/path/to/repo
 
# Gitrob - organizational scanning (deprecated but pattern useful)
# Modern alternatives: GitHound, shhgit
 
# Search commit history on all branches (case-insensitive)
git log --all -p | grep -iE "(password|secret|api_key|token)"
 
# Find deleted (but still in history) secrets
git log --diff-filter=D --summary | grep -E "\.(env|pem|key)"
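
The search terms in the table above translate directly into regular expressions. The sketch below scans a checked-out directory for three high-signal patterns; the function name scan_for_secrets is hypothetical, the pattern list is a small illustrative subset, and it is not a substitute for TruffleHog or Gitleaks. The key used in the demo is the placeholder from AWS documentation, not a real credential.

```shell
#!/bin/sh
# scan_for_secrets: recursively grep a directory for a few high-signal
# secret patterns and print file:line:match for each hit.
# Illustrative subset only; dedicated scanners cover far more formats.
scan_for_secrets() {
    grep -rnE \
        -e 'AKIA[0-9A-Z]{16}' \
        -e '-----BEGIN (RSA |EC |OPENSSH )?PRIVATE KEY-----' \
        -e '(mysql|postgres|mongodb)://[^[:space:]]+:[^[:space:]]+@' \
        "$1"
}

# Offline demo in a throwaway directory
demo_dir="$(mktemp -d)"
printf 'aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n' > "$demo_dir/settings.ini"
printf 'nothing sensitive here\n' > "$demo_dir/README"
scan_for_secrets "$demo_dir"
rm -rf "$demo_dir"
```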

Git history awareness:

Even if secrets are removed from current code, git history preserves them forever unless explicitly purged:

# Show file at any historical commit
git show <commit-hash>:path/to/file.env

# Find when a file was deleted
git log --all --full-history -- "**/secret.txt"

# Check if sensitive file ever existed
git log --all --oneline -- "**/.env*"
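
The persistence of deleted files can be reproduced in a throwaway repository. This sketch assumes git is installed; the function name demo_history_leak, the file contents, and the commit messages are made up for the demonstration.

```shell
#!/bin/sh
# demo_history_leak: show that a file deleted in a later commit is still
# recoverable from history. Runs entirely inside a temporary repository.
demo_history_leak() (
    set -e
    repo="$(mktemp -d)"
    trap 'rm -rf "$repo"' EXIT
    cd "$repo"
    git init -q .
    # Identity is passed per command so no global git config is required
    g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

    echo 'DB_PASSWORD=not-a-real-secret' > .env
    g add .env
    g commit -q -m 'add config'
    g rm -q .env
    g commit -q -m 'remove config'

    # The file is gone from the working tree...
    [ ! -e .env ]
    # ...but the commit that added it still holds the content
    first="$(git log --all --format=%H --diff-filter=A -- .env)"
    git show "$first:.env"
)

demo_history_leak
```

The final command prints the original file contents even though the latest commit no longer contains the file.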

Common findings:

AWS keys committed during development
Database credentials in configs
Internal IP addresses and hostnames
API tokens for third-party services
SSH private keys (massive security breach)
Development certificates (may be valid in prod)

Secrets Require Rotation, Not Just Removal

Finding a secret in git history means it must be considered compromised. Removing it from the current version doesn't help if attackers already cloned the repo. All discovered secrets must be rotated immediately.

Reducing Information Exposure

From a defensive perspective, minimizing information exposure reduces reconnaissance effectiveness. You can't prevent all OSINT collection, but you can reduce what attackers learn.

Strategic information reduction:

OSINT Defense Strategies

•Domain privacy: Use WHOIS privacy for registrations where appropriate
•Email address protection: Use role addresses (security@) vs. personal names
•Job posting review: Audit postings for excessive technology disclosure
•Social media policies: Guidelines for what employees post about work
•Document metadata scrubbing: Strip metadata before publishing
•Repository scanning: Automated pre-commit hooks for secret detection
•Certificate management: Avoid revealing internal hostnames in certs
•Error message sanitization: Generic errors, not implementation details

Proactive monitoring:

Monitor for your own exposure:

# Google alerts for company name + sensitive terms
# "Example Corp" password OR leaked OR breach

# Monitor Shodan for your public IP ranges (198.51.100.0/24 is a documentation range)
shodan alert create "My Company" 198.51.100.0/24

# GitHub code search for your domain
# "example.com" password

# Have I Been Pwned domain search
# Monitor for corporate emails in breach databases

# Certificate transparency monitoring
# Get alerted when new certs issued for your domain
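
Certificate transparency monitoring reduces to comparing today's hostname list with a known baseline. A minimal sketch of that comparison, assuming two plain-text files with one hostname per line; the function name new_hostnames and the sample hostnames are hypothetical, and fetching the current list is left to whichever CT source you use.

```shell
#!/bin/sh
# new_hostnames: print hostnames that appear in the current list ($2) but
# not in the baseline ($1). Both files hold one hostname per line.
# Hypothetical helper for illustration.
new_hostnames() (
    base_sorted="$(mktemp)"
    cur_sorted="$(mktemp)"
    trap 'rm -f "$base_sorted" "$cur_sorted"' EXIT
    sort -u "$1" > "$base_sorted"
    sort -u "$2" > "$cur_sorted"
    comm -13 "$base_sorted" "$cur_sorted"   # lines unique to the second file
)

# Offline demo with made-up lists
baseline="$(mktemp)"
current="$(mktemp)"
printf '%s\n' www.example.com mail.example.com > "$baseline"
printf '%s\n' www.example.com mail.example.com jenkins.example.com > "$current"
new_hostnames "$baseline" "$current"
rm -f "$baseline" "$current"
```

Any hostname this prints is a certificate nobody has reviewed yet, which makes it a natural trigger for an alert.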

Regular OSINT assessments:

Conduct periodic OSINT against your own organization:

What does Google know about us?
What's on Shodan for our IP ranges?
Are there secrets in our public repos?
What do job postings reveal?
What's in the metadata of our published documents?

Know What Attackers Know

The only way to understand your OSINT exposure is to conduct reconnaissance against yourself. Regular self-assessment reveals what attackers will find—and gives you the opportunity to reduce exposure before it's exploited.

Summary: Information Gathering Mastery

Information gathering extends reconnaissance far beyond technical scanning—encompassing all intelligence sources that build target understanding. Let's consolidate the key concepts:

Key Takeaways

•Passive vs. active matters — Passive collection is undetectable; active creates evidence
•OSINT sources are vast — Technical, organizational, social, and breach data all contribute
•Google dorking is powerful — Advanced operators reveal accidentally exposed information
•Infrastructure expands attack surface — One domain leads to many assets through DNS, CT, ASN analysis
•Social engineering requires intelligence — Effective pretexts are built on gathered information
•Metadata leaks secrets — Documents, images, and files reveal infrastructure and users
•Code repositories are high-value targets — Secrets in git history persist even after deletion
•Defense requires OSINT awareness — Know what you expose; monitor and reduce it

What's next:

With comprehensive reconnaissance coverage complete—port scanning, network mapping, vulnerability scanning, and information gathering—we'll conclude with Detection and Prevention. This final page covers how defenders detect reconnaissance activities and implement controls to prevent or limit intelligence gathering.

Page Complete

You now understand the full scope of information gathering—from Google dorking to social engineering intelligence. You can conduct comprehensive OSINT and implement defensive measures to reduce your organization's exposure. Next, we'll examine detection and prevention strategies for the complete reconnaissance lifecycle.
