In the famous words of security researcher Richard Bejtlich: "Prevention eventually fails." This isn't pessimism—it's operational reality. No matter how robust your defenses, sophisticated attackers, insider threats, and unforeseen vulnerabilities will eventually allow malicious activity to occur. When that happens, the difference between a minor incident and a catastrophic breach often comes down to one factor: detection time.
The 2024 IBM Cost of a Data Breach Report found that breaches identified within 200 days cost an average of $3.93 million, while those taking longer cost $4.95 million—a 26% increase. More critically, many organizations discover breaches only when external parties notify them, sometimes months or years after compromise.
Monitoring and logging are the foundation of security detection. They provide the visibility necessary to detect attacks in progress, reconstruct events during investigations, and demonstrate compliance with regulatory requirements.
This page explores logging architectures, monitoring technologies, detection methodologies, and the operational practices that transform raw data into actionable security intelligence.
By the end of this page, you will understand log architecture design, SIEM implementation and optimization, threat detection approaches (signature-based, behavioral, and machine learning), security metrics and dashboards, and the operational processes that make monitoring effective. You'll be equipped to design and evaluate monitoring programs that provide genuine security visibility.
Effective security monitoring begins with comprehensive, reliable logging. Log architecture addresses what to log, how to collect logs, where to store them, and how to ensure their integrity.
Security-relevant events span multiple domains:
Authentication Events:
Authorization Events:
System Events:
Network Events:
Application Events:
Physical Security Events:
| Field | Purpose | Example |
|---|---|---|
| Timestamp | When the event occurred (UTC) | 2025-01-17T14:23:47.123Z |
| Source | System generating the log | webserver-prod-01.example.com |
| Event Type | Category of event | AUTH_FAILURE |
| Severity | Importance level | WARNING |
| Actor/User | Who initiated the action | jsmith@example.com |
| Target/Object | What was affected | /etc/passwd |
| Action | What operation was attempted | READ |
| Outcome | Success or failure | DENIED |
| Source IP | Originating network address | 192.168.1.100 |
| Destination IP | Target network address | 10.0.0.50 |
| Session ID | Link events to sessions | a1b2c3d4-e5f6-7890 |
| Additional Context | Event-specific details | {attempts: 5, lockout: true} |
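As a sketch, the fields in the table above map naturally to one structured JSON record per event. The `make_log_record` helper below is illustrative, not a standard API; field names follow the table:

```python
import json
from datetime import datetime, timezone

def make_log_record(source, event_type, severity, actor, target,
                    action, outcome, src_ip, dst_ip, session_id, **context):
    """Build one structured log record using the common fields above."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # always UTC
        "source": source,
        "event_type": event_type,
        "severity": severity,
        "actor": actor,
        "target": target,
        "action": action,
        "outcome": outcome,
        "source_ip": src_ip,
        "destination_ip": dst_ip,
        "session_id": session_id,
        "context": context,  # event-specific details
    }

record = make_log_record(
    "webserver-prod-01.example.com", "AUTH_FAILURE", "WARNING",
    "jsmith@example.com", "/etc/passwd", "READ", "DENIED",
    "192.168.1.100", "10.0.0.50", "a1b2c3d4-e5f6-7890",
    attempts=5, lockout=True,
)
print(json.dumps(record))  # one machine-parseable line per event
```

Emitting one JSON object per line keeps records trivially parseable by downstream collectors without custom grammars.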
Push vs. Pull Collection:
Push (Agent-based):
Pull (Agentless):
Collection Technologies:
Retention Requirements:
Integrity Protection:
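One widely used integrity-protection technique is hash chaining: each entry stores a hash computed over the previous entry's hash, so altering any record breaks every link after it. A minimal sketch, not a production implementation (real systems typically add signing and trusted timestamps):

```python
import hashlib

def chain_logs(entries):
    """Append a SHA-256 digest linking each entry to its predecessor."""
    chained, prev_hash = [], "0" * 64  # genesis value
    for entry in entries:
        digest = hashlib.sha256((prev_hash + entry).encode()).hexdigest()
        chained.append((entry, digest))
        prev_hash = digest
    return chained

def verify_chain(chained):
    """Recompute every link; False means some entry was altered."""
    prev_hash = "0" * 64
    for entry, digest in chained:
        expected = hashlib.sha256((prev_hash + entry).encode()).hexdigest()
        if digest != expected:
            return False
        prev_hash = digest
    return True

log = chain_logs(["user login ok", "file read /etc/passwd", "user logout"])
assert verify_chain(log)

tampered = [(e.replace("passwd", "hosts"), h) for e, h in log]
assert not verify_chain(tampered)  # tampering breaks the chain
```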
```conf
# /etc/rsyslog.conf - Enterprise syslog forwarding configuration

# =========================================
# Module Loading
# =========================================
module(load="imuxsock")   # Local system logging
module(load="imjournal")  # systemd journal
module(load="imfile")     # File-based input for app logs

# =========================================
# Global Settings
# =========================================
global(
    # Use RFC 5424 format for consistent parsing
    parser.permitSlashDefaultsForTemplate="on"
    # Queue settings for reliability
    workDirectory="/var/lib/rsyslog"
)

# =========================================
# Log Format Template - Structured Logging
# =========================================
template(name="JSONFormat" type="list") {
    constant(value="{")
    constant(value="\"timestamp\":\"")
    property(name="timereported" dateFormat="rfc3339")
    constant(value="\",\"hostname\":\"")
    property(name="hostname")
    constant(value="\",\"severity\":\"")
    property(name="syslogseverity-text")
    constant(value="\",\"facility\":\"")
    property(name="syslogfacility-text")
    constant(value="\",\"program\":\"")
    property(name="programname")
    constant(value="\",\"message\":\"")
    property(name="msg" format="json")
    constant(value="\"}")
    constant(value="\n")
}

# =========================================
# Forwarding Configuration with TLS
# =========================================
# Primary SIEM
action(
    type="omfwd"
    target="siem.example.com"
    port="6514"
    protocol="tcp"
    # TLS encryption for log confidentiality
    StreamDriver="gtls"
    StreamDriverMode="1"
    StreamDriverAuthMode="x509/certvalid"
    StreamDriverPermittedPeers="*.example.com"
    # Reliable delivery with disk-assisted queue
    queue.type="LinkedList"
    queue.size="10000"
    queue.filename="siem_queue"
    queue.saveOnShutdown="on"
    queue.maxDiskSpace="1g"
    action.resumeRetryCount="-1"
    template="JSONFormat"
)

# =========================================
# Application-Specific Log Collection
# =========================================
# Monitor application logs in JSON format
input(type="imfile"
    File="/var/log/app/*.json"
    Tag="app-log:"
    Facility="local0"
    Severity="info"
    addMetadata="on"
)

# =========================================
# Local Retention (for forensics if SIEM fails)
# =========================================
# Keep 90 days locally with log rotation
*.* action(
    type="omfile"
    file="/var/log/security/all.log"
    template="JSONFormat"
    fileCreateMode="0600"
    fileOwner="root"
)
```

Modern environments generate enormous log volumes (terabytes per day for large enterprises). Without careful architecture (filtering, aggregation, tiered storage), costs explode while signal drowns in noise. Define which logs must be collected in full fidelity versus which can be sampled or summarized. Storage costs should drive thoughtful logging, not fear-driven collection of everything.
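The full-fidelity-versus-sampled split can be sketched as a simple keep/drop policy at the collection tier. The event categories and the 1% sample rate below are illustrative choices, not recommendations:

```python
import random

# Events in these categories are never dropped (illustrative list)
FULL_FIDELITY = {"AUTH_FAILURE", "PRIV_ESCALATION", "POLICY_VIOLATION"}
SAMPLE_RATE = 0.01  # keep 1% of routine events

def should_keep(event, rng=random.random):
    """Full fidelity for security-relevant events, sampling for routine noise."""
    if event["event_type"] in FULL_FIDELITY:
        return True
    return rng() < SAMPLE_RATE

events = [{"event_type": "AUTH_FAILURE"}, {"event_type": "HEALTH_CHECK"}]
# rng is injectable so the policy is testable; here the draw (0.5) exceeds
# the sample rate, so the routine event is dropped and the auth failure kept
kept = [e for e in events if should_keep(e, rng=lambda: 0.5)]
```

A real pipeline would also summarize dropped categories (counts per type per interval) so the sampling itself remains auditable.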
SIEM systems are the central nervous system of security monitoring, aggregating logs from across the environment, normalizing data into common formats, and enabling correlation, alerting, and analysis.
Log Aggregation:
Normalization:
Correlation:
Alerting:
Search and Investigation:
Reporting and Compliance:
Traditional SIEM Products:
Cloud-Native SIEM:
Open Source Options:
Alert Fatigue: The number one SIEM failure mode. Too many alerts, too many false positives, analysts stop investigating.
Solutions:
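One common fatigue-reduction tactic is deduplicating repeats of the same alert within a time window, so analysts see one enriched alert instead of hundreds. A toy sketch; the window size and grouping key are illustrative, and production dedup usually also resets the window on each suppressed repeat:

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # 10-minute dedup window (illustrative)

def deduplicate(alerts):
    """Collapse alerts sharing (rule, source) within the window into one."""
    last_seen = {}
    suppressed = defaultdict(int)
    output = []
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["rule"], alert["src_ip"])
        if key in last_seen and alert["ts"] - last_seen[key] < WINDOW_SECONDS:
            suppressed[key] += 1  # drop the repeat, but count it
        else:
            output.append(dict(alert))
            last_seen[key] = alert["ts"]
    # Annotate each emitted alert with how many repeats it absorbed
    for alert in output:
        alert["suppressed_duplicates"] = suppressed[(alert["rule"], alert["src_ip"])]
    return output

raw = [{"rule": "brute_force", "src_ip": "10.0.0.5", "ts": t} for t in (0, 60, 120)]
deduped = deduplicate(raw)
# three raw alerts collapse into one alert noting two suppressed duplicates
```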
Data Overload: Collecting everything is expensive and makes analysis harder.
Solutions:
Skill Requirements: SIEM effectiveness depends on skilled analysts and rule developers.
Solutions:
Detection Rule: Brute Force Attack Detection. Detects multiple failed logins followed by a success from the same source:

```spl
index=authentication sourcetype=windows:security
| bin _time span=10m
| stats count(eval(EventCode=4625)) as failed_logins
        count(eval(EventCode=4624)) as successful_logins
        earliest(_time) as first_event
        latest(_time) as last_event
        values(TargetUserName) as users
    by src_ip, _time
| where failed_logins >= 5 AND successful_logins >= 1
| eval attack_type="Brute Force - Successful",
       severity=case(
           failed_logins >= 20, "Critical",
           failed_logins >= 10, "High",
           true(), "Medium"
       ),
       description="Multiple failed logins (".failed_logins.") followed by success from ".src_ip
| table _time, src_ip, users, failed_logins, successful_logins, severity, description
```

This rule identifies brute force attacks that succeeded by:

1. Binning events into 10-minute windows
2. Counting failed logins (EventCode 4625) and successful logins (4624) per source IP
3. Alerting when the pattern shows 5+ failures followed by a success
4. Calculating severity based on attack intensity

Successful SIEM programs invest more in people and process than technology. A mediocre SIEM platform with skilled analysts and well-tuned rules outperforms an advanced platform with default configuration and overwhelmed operators. Budget for ongoing care and feeding, not just initial deployment.
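The same windowed fail-then-success logic is easy to prototype outside the SIEM. A hedged Python sketch over (timestamp, source IP, event code) tuples, reusing the Windows 4625/4624 event codes and thresholds from the rule above:

```python
from collections import defaultdict

WINDOW = 600        # 10-minute bins, matching the SPL rule
FAIL_THRESHOLD = 5  # 5+ failures plus a success triggers an alert

def detect_brute_force(events):
    """events: iterable of (epoch_seconds, src_ip, event_code) tuples."""
    bins = defaultdict(lambda: {"fail": 0, "success": 0})
    for ts, src_ip, code in events:
        bucket = (src_ip, ts // WINDOW)  # fixed 10-minute window per source
        if code == 4625:
            bins[bucket]["fail"] += 1
        elif code == 4624:
            bins[bucket]["success"] += 1

    alerts = []
    for (src_ip, _), counts in bins.items():
        if counts["fail"] >= FAIL_THRESHOLD and counts["success"] >= 1:
            severity = ("Critical" if counts["fail"] >= 20
                        else "High" if counts["fail"] >= 10 else "Medium")
            alerts.append({"src_ip": src_ip, "severity": severity, **counts})
    return alerts

events = [(i, "203.0.113.7", 4625) for i in range(6)] + [(10, "203.0.113.7", 4624)]
print(detect_brute_force(events))  # one Medium alert for 203.0.113.7
```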
Detecting malicious activity requires multiple complementary approaches. No single detection methodology catches all threats, and effective security operations combine several techniques.
How it Works: Matches observed activity against known patterns of malicious behavior—specific byte sequences, command strings, network packets, or file hashes.
Examples:
Strengths:
Weaknesses:
How it Works: Establishes baselines of "normal" behavior, then alerts on deviations from those baselines.
Examples:
Strengths:
Weaknesses:
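The anomaly-based approach rests on one idea: model "normal", then flag deviations. A minimal z-score sketch over a per-user metric; the 3-sigma threshold is an illustrative choice, and real systems must also handle baseline drift and seasonality:

```python
from statistics import mean, stdev

def is_anomalous(baseline, observed, z_threshold=3.0):
    """Flag values more than z_threshold standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return observed != mu  # flat baseline: any change is a deviation
    return abs(observed - mu) / sigma > z_threshold

# Baseline: daily MB downloaded by one user over two weeks
baseline = [120, 95, 130, 110, 105, 125, 118, 99, 122, 108, 115, 127, 101, 119]
print(is_anomalous(baseline, 118))   # typical day -> False
print(is_anomalous(baseline, 4200))  # mass download -> True
```

This also illustrates the weaknesses in the table below: the detector cannot say whether the deviation is malicious, and an attacker who increases volume gradually re-trains the baseline around the attack.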
| Aspect | Signature-Based | Anomaly-Based | Behavioral Analysis |
|---|---|---|---|
| Known Threats | Excellent | Poor (may not distinguish) | Good (attack patterns) |
| Unknown Threats | Poor | Good (deviation from normal) | Good (malicious behaviors) |
| False Positives | Low | High | Medium |
| Evasion Difficulty | Easy (modify signature) | Medium (gradual change) | Hard (change behavior) |
| Setup Complexity | Low (install and update) | High (baseline training) | Medium (define behaviors) |
| Maintenance | Signature updates | Baseline recalibration | Behavior model updates |
| Explainability | High (matched rule X) | Low (deviation from norm) | Medium (observed behavior Y) |
How it Works: Defines behaviors characteristic of attacks (TTPs - Tactics, Techniques, Procedures) and detects when those behaviors occur, regardless of specific implementation.
Examples:
MITRE ATT&CK Framework: The ATT&CK framework catalogs adversary behaviors across attack lifecycle phases:
Mapping detections to ATT&CK helps identify coverage gaps.
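Coverage mapping can be as simple as set arithmetic over technique IDs. The rule names and the tiny in-scope technique sample below are hypothetical:

```python
def attack_coverage(detections, techniques_in_scope):
    """Return the covered fraction and the uncovered technique IDs."""
    covered = {t for techniques in detections.values() for t in techniques}
    relevant = covered & techniques_in_scope
    gaps = techniques_in_scope - covered
    return len(relevant) / len(techniques_in_scope), sorted(gaps)

# Hypothetical rule -> ATT&CK technique mapping
detections = {
    "brute_force_success": {"T1110"},
    "lsass_access": {"T1003.001"},
    "dns_tunnel": {"T1071.004", "T1048"},
}
in_scope = {"T1110", "T1003.001", "T1071.004", "T1048", "T1055", "T1547"}

coverage, gaps = attack_coverage(detections, in_scope)
print(f"{coverage:.0%} covered, gaps: {gaps}")  # 67% covered, gaps: ['T1055', 'T1547']
```

Real programs weight this by technique prevalence and telemetry quality, since "one rule exists" is a weak proxy for "we would actually detect it".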
How it Works: Machine learning models profile normal behavior of users and entities, detecting anomalies that may indicate compromise or insider threats.
Capabilities:
Use Cases:
Organizations increasingly treat detection engineering as a software engineering discipline—version-controlled detection rules, testing with red team data, CI/CD pipelines for rule deployment, and metrics on detection efficacy. This approach improves detection quality and enables continuous improvement based on missed attacks and false positive feedback.
Network monitoring provides visibility into communications between systems, enabling detection of command and control, lateral movement, data exfiltration, and policy violations.
Full Packet Capture (PCAP):
Flow Data (NetFlow/IPFIX):
DNS Query Logs:
Proxy/Web Gateway Logs:
Email Gateway Logs:
```zeek
# Zeek (formerly Bro) is the gold standard for network security monitoring.
# It produces rich log data about network activity.
#
# Zeek log outputs include:
#   conn.log   - Connection summaries (flow-like data)
#   dns.log    - DNS queries and responses
#   http.log   - HTTP transactions
#   ssl.log    - TLS/SSL handshake details
#   files.log  - Files transferred over the network
#   notice.log - Zeek-generated alerts
#   weird.log  - Protocol anomalies
#
# Example conn.log fields:
#   ts,uid,id.orig_h,id.orig_p,id.resp_h,id.resp_p,proto,service,
#   duration,orig_bytes,resp_bytes,conn_state,local_orig,local_resp,
#   missed_bytes,history,orig_pkts,orig_ip_bytes,resp_pkts,resp_ip_bytes

# Custom Zeek script for DNS tunneling detection
@load base/protocols/dns

module DNS_Tunnel;

export {
    redef enum Notice::Type += { DNS_Tunnel_Detected };

    # Threshold: queries with long subdomains
    const min_suspicious_length = 50 &redef;

    # Track query counts per host
    global host_query_counts: table[addr] of count &default=0;
}

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count)
{
    # Check for long DNS queries (potential tunneling)
    if ( |query| > min_suspicious_length )
    {
        NOTICE([$note=DNS_Tunnel_Detected,
                $msg=fmt("Possible DNS tunneling: %s queried %s (len:%d)",
                         c$id$orig_h, query, |query|),
                $conn=c,
                $identifier=cat(c$id$orig_h, query)]);
    }

    # Track high query volume per host
    host_query_counts[c$id$orig_h] += 1;
    if ( host_query_counts[c$id$orig_h] > 1000 )
    {
        NOTICE([$note=DNS_Tunnel_Detected,
                $msg=fmt("High DNS query volume from %s: %d queries",
                         c$id$orig_h, host_query_counts[c$id$orig_h]),
                $conn=c]);
    }
}
```

NDR platforms combine multiple network data sources with advanced analytics:
Capabilities:
Key Detection Categories:
Command and Control (C2):
Lateral Movement:
Data Exfiltration:
Encrypted Traffic Challenges: With TLS 1.3 and encrypted DNS, network visibility decreases. Strategies include:
Strategic sensor placement is critical. Deploy network sensors at trust boundaries (perimeter, between segments, data center entry), east-west traffic paths (between servers), and egress points. Mirror ports (SPAN) or network taps provide traffic copies without inline latency. Cloud environments require virtual taps or VPC flow logs.
Endpoint Detection and Response provides deep visibility into host-level activity—processes, file operations, network connections, registry changes, and more. As network encryption increases, endpoint visibility becomes increasingly critical.
Telemetry Collection:
Real-Time Detection:
Response Capabilities:
| Capability | Traditional AV | EDR |
|---|---|---|
| Detection Method | Signature-based, static analysis | Behavioral, real-time analysis |
| Visibility | File scanning | Process, network, registry, memory |
| Response | Quarantine files | Isolate host, kill process, memory dump |
| Investigation | Limited | Full historical timeline |
| Evasion Resistance | Low (modify bytes → evade) | Higher (must evade behavior detection) |
| Retrospective Search | No | Yes (hunt across historical telemetry) |
| Centralized Management | Basic | Full visibility across fleet |
| Cloud/On-Prem | Often on-prem | Typically cloud-managed |
Process Injection Techniques:
Credential Access:
Persistence Mechanisms:
Defense Evasion:
Living-Off-the-Land Techniques:
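Many EDR behavioral rules reduce to suspicious parent-child process relationships, e.g. an Office application spawning a shell (a classic macro-malware and living-off-the-land pattern). A toy sketch; the pair list is illustrative, not a vendor ruleset:

```python
# Illustrative map: parent process -> children that are suspicious from it
SUSPICIOUS_CHILDREN = {
    "winword.exe": {"powershell.exe", "cmd.exe", "wscript.exe"},
    "excel.exe": {"powershell.exe", "cmd.exe", "mshta.exe"},
    "outlook.exe": {"powershell.exe", "cmd.exe"},
}

def flag_process_tree(events):
    """events: (parent, child) process-name pairs; return the suspicious ones."""
    return [
        (parent, child)
        for parent, child in events
        if child.lower() in SUSPICIOUS_CHILDREN.get(parent.lower(), set())
    ]

observed = [
    ("explorer.exe", "winword.exe"),    # normal: user opened a document
    ("winword.exe", "powershell.exe"),  # suspicious: macro launching a shell
]
print(flag_process_tree(observed))  # [('winword.exe', 'powershell.exe')]
```

Real EDR engines evaluate the full ancestry chain plus command lines and signer metadata, which is why behavior-based detection is harder to evade than byte-level signatures.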
Coverage Goals:
Performance Impact:
Integration Points:
Extended Detection and Response (XDR) integrates endpoint, network, email, identity, and cloud telemetry into a unified platform. Rather than correlating separate tools in a SIEM, XDR platforms provide native cross-domain visibility and correlation, reducing integration complexity and enabling faster detection and response.
Effective security monitoring requires not just data collection, but meaningful metrics that inform decision-making, demonstrate program effectiveness, and drive continuous improvement.
Mean Time to Detect (MTTD): Average time from attack occurrence to detection.
Mean Time to Respond (MTTR): Average time from detection to containment.
True Positive Rate: Percentage of alerts representing actual threats.
False Positive Rate: Percentage of alerts that are not threats.
Detection Coverage: Percentage of ATT&CK techniques with detection rules.
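The two time metrics fall straight out of incident timestamps. A sketch assuming each incident record carries occurred/detected/contained datetimes (field names are illustrative):

```python
from datetime import datetime

def mean_hours(deltas):
    """Average a list of timedeltas, expressed in hours."""
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 3600

def mttd_mttr(incidents):
    """incidents: dicts with 'occurred', 'detected', 'contained' datetimes."""
    mttd = mean_hours([i["detected"] - i["occurred"] for i in incidents])
    mttr = mean_hours([i["contained"] - i["detected"] for i in incidents])
    return mttd, mttr

incidents = [
    {"occurred": datetime(2025, 1, 10, 9), "detected": datetime(2025, 1, 10, 21),
     "contained": datetime(2025, 1, 11, 1)},   # detected in 12h, contained in 4h
    {"occurred": datetime(2025, 1, 12, 0), "detected": datetime(2025, 1, 13, 0),
     "contained": datetime(2025, 1, 13, 2)},   # detected in 24h, contained in 2h
]
mttd, mttr = mttd_mttr(incidents)
print(f"MTTD {mttd:.1f}h, MTTR {mttr:.1f}h")  # MTTD 18.0h, MTTR 3.0h
```

The hard part in practice is not the arithmetic but establishing the true "occurred" time, which usually requires forensic reconstruction after the fact.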
| Metric | Target | Measurement Method | Improvement Action |
|---|---|---|---|
| MTTD (Critical) | < 24 hours | Timestamp analysis of incidents | Add detections, improve telemetry |
| MTTR (Critical) | < 4 hours | Incident ticket timestamps | Playbook automation, training |
| Alert True Positive Rate | > 80% | Weekly alert sampling | Tune rules, add context |
| ATT&CK Coverage | > 60% | Detection-to-technique mapping | Gap analysis, detection engineering |
| Log Ingestion Latency | < 5 minutes | Timestamp comparison | Infrastructure scaling, streaming |
| System Coverage | 100% of critical assets | Asset inventory vs. log sources | Agent deployment, log forwarding |
| Alert Closure Rate | > 95% within SLA | Ticket aging report | Staffing, automation |
SOC Operations Dashboard:
Threat Detection Dashboard:
Compliance Dashboard:
Threat Intelligence Dashboard:
Purpose-Driven: Each dashboard serves a specific audience and use case. Don't combine executive reporting with analyst workload.
Actionable: Every metric should drive potential action. If you can't act on it, don't track it.
Trending: Point-in-time metrics without trend are limited. Show direction over time.
Drill-Down: Allow investigation from summary to detail without leaving the dashboard.
Red/Yellow/Green: Clear visual indicators of status against targets. Don't require interpretation.
Beware metrics that look good but don't reflect security posture. "Blocked 1 million attacks" means nothing without context (were they targeted or spray-and-pray?). "100% uptime" for security tools doesn't mean they're detecting threats. Focus on outcomes: threats detected, incidents contained, business impact prevented. Question whether each metric would change your decisions if it moved.
Threat hunting is the proactive, human-driven search for threats that evade automated detection. While SIEM alerts react to known patterns, hunting assumes sophisticated threats are already present and seeks them actively.
Automated Detection:
Threat Hunting:
Intelligence-Driven: Start from threat intelligence about active threats.
Anomaly-Driven: Start from statistical anomalies, investigate for malicious cause.
ATT&CK-Driven: Systematically hunt for specific techniques.
```sql
-- Hunt #1: Credential Dumping via LSASS Access
-- ATT&CK T1003.001 - LSASS Memory
--
-- Hypothesis: If attackers dumped credentials, they accessed lsass.exe memory.
-- Look for processes accessing lsass.exe with suspicious access rights.

SELECT
    hostname,
    process_name AS source_process,
    process_commandline,
    target_process_name,
    granted_access_numeric,
    CASE
        WHEN granted_access_numeric & 0x1F0FFF = 0x1F0FFF
            THEN 'FULL_ACCESS (suspicious)'
        WHEN granted_access_numeric & 0x1010 = 0x1010
            THEN 'PROCESS_QUERY + VM_READ (suspicious)'
        ELSE 'Other'
    END AS access_type,
    event_time,
    user_name
FROM process_access_events
WHERE target_process_name = 'lsass.exe'
  AND source_process NOT IN (
      'lsass.exe', 'csrss.exe', 'services.exe', 'winlogon.exe',
      'svchost.exe', 'wininit.exe', 'MsMpEng.exe'  -- legitimate accessors
  )
  AND event_time > NOW() - INTERVAL 7 DAY
ORDER BY event_time DESC;

-- Investigate any results: What is the source process?
-- Is it legitimate security software or a potential mimikatz/dump tool?


-- Hunt #2: Beaconing Detection
-- ATT&CK T1071 - Application Layer Protocol
--
-- Hypothesis: C2 beacons at regular intervals; find periodic connections.
-- (Inter-connection gaps are computed in a separate step because window
--  functions cannot be nested inside aggregates.)

WITH gaps AS (
    SELECT
        source_ip,
        dest_ip,
        dest_port,
        UNIX_TIMESTAMP(event_time)
            - LAG(UNIX_TIMESTAMP(event_time)) OVER (
                  PARTITION BY source_ip, dest_ip, dest_port
                  ORDER BY event_time
              ) AS gap_seconds
    FROM network_connections
    WHERE event_time > NOW() - INTERVAL 24 HOUR
      AND dest_ip NOT IN (SELECT ip FROM known_good_destinations)
),
connection_intervals AS (
    SELECT
        source_ip,
        dest_ip,
        dest_port,
        COUNT(*) AS connection_count,
        STDDEV(gap_seconds) AS interval_stddev,
        AVG(gap_seconds) AS avg_interval
    FROM gaps
    WHERE gap_seconds IS NOT NULL
    GROUP BY source_ip, dest_ip, dest_port
    HAVING COUNT(*) > 20  -- need enough samples
)
SELECT *
FROM connection_intervals
WHERE interval_stddev / avg_interval < 0.1  -- very regular intervals
  AND avg_interval BETWEEN 30 AND 3600      -- 30 sec to 1 hour
ORDER BY connection_count DESC;

-- Investigate: Why is this host connecting to this destination at regular intervals?
-- Check destination reputation; examine connection content if possible.


-- Hunt #3: Rare Process Execution
-- Look for processes that only executed on 1-2 hosts (potentially targeted)

SELECT
    process_name,
    process_hash,
    COUNT(DISTINCT hostname) AS unique_hosts,
    GROUP_CONCAT(DISTINCT hostname) AS hosts,
    MIN(event_time) AS first_seen
FROM process_creation_events
WHERE event_time > NOW() - INTERVAL 7 DAY
GROUP BY process_name, process_hash
HAVING COUNT(DISTINCT hostname) <= 2
   AND COUNT(*) > 5  -- ran multiple times on those hosts
ORDER BY unique_hosts, first_seen DESC;

-- Investigate: Why is this process only on these hosts?
-- Is it legitimate software specific to these roles, or potentially targeted malware?
```

Organizations progress through hunting maturity levels: HM0 (no hunting), HM1 (reactive, indicator-based), HM2 (procedural, following playbooks), HM3 (hypothesis-driven, developing new techniques), and HM4 (innovative, creating new hunting methodologies). Most organizations should aim for HM2-HM3; HM4 requires a dedicated research capability.
Monitoring and logging provide the visibility foundation that makes security detection and response possible. Without comprehensive, well-architected observability, organizations operate blind to threats that prevention eventually fails to stop.
What's Next:
With visibility established through monitoring and logging, we'll next explore Incident Response—the structured processes and procedures for responding when threats are detected. Detection without response capability merely documents attacks without stopping them.
You now understand how to architect logging systems, implement SIEM platforms, apply multiple detection methodologies, and conduct proactive threat hunting. This visibility foundation enables the detection of attacks that prevention fails to stop—the difference between a brief incident and a months-long compromise. Every defense strategy depends on the ability to see what's happening in your environment.