In the famous words of security researcher Richard Bejtlich: "Prevention eventually fails." This isn't pessimism—it's operational reality. No matter how robust your defenses, sophisticated attackers, insider threats, and unforeseen vulnerabilities will eventually allow malicious activity to occur. When that happens, the difference between a minor incident and a catastrophic breach often comes down to one factor: detection time.
The 2024 IBM Cost of a Data Breach Report found that breaches identified within 200 days cost an average of $3.93 million, while those taking longer cost $4.95 million—a 26% increase. More critically, many organizations discover breaches only when external parties notify them, sometimes months or years after compromise.
Monitoring and logging are the foundation of security detection. They provide the visibility necessary to detect attacks in progress, reconstruct events during investigations, and demonstrate compliance with regulatory requirements.
This page explores logging architectures, monitoring technologies, detection methodologies, and the operational practices that transform raw data into actionable security intelligence.
By the end of this page, you will understand log architecture design, SIEM implementation and optimization, threat detection approaches (signature-based, behavioral, and machine learning), security metrics and dashboards, and the operational processes that make monitoring effective. You'll be equipped to design and evaluate monitoring programs that provide genuine security visibility.
Effective security monitoring begins with comprehensive, reliable logging. Log architecture addresses what to log, how to collect logs, where to store them, and how to ensure their integrity.
Security-relevant events span multiple domains:
Authentication Events:
Authorization Events:
System Events:
Network Events:
Application Events:
Physical Security Events:
| Field | Purpose | Example |
|---|---|---|
| Timestamp | When the event occurred (UTC) | 2025-01-17T14:23:47.123Z |
| Source | System generating the log | webserver-prod-01.example.com |
| Event Type | Category of event | AUTH_FAILURE |
| Severity | Importance level | WARNING |
| Actor/User | Who initiated the action | jsmith@example.com |
| Target/Object | What was affected | /etc/passwd |
| Action | What operation was attempted | READ |
| Outcome | Success or failure | DENIED |
| Source IP | Originating network address | 192.168.1.100 |
| Destination IP | Target network address | 10.0.0.50 |
| Session ID | Link events to sessions | a1b2c3d4-e5f6-7890 |
| Additional Context | Event-specific details | {attempts: 5, lockout: true} |
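As a sketch, the fields in the table above map naturally to one structured JSON record per event. The `make_log_record` helper below is illustrative, not a standard API; field names follow the table:

```python
import json
from datetime import datetime, timezone

def make_log_record(source, event_type, severity, actor, target,
                    action, outcome, src_ip, dst_ip, session_id, **context):
    """Build one structured log record using the common fields above."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # always UTC
        "source": source,
        "event_type": event_type,
        "severity": severity,
        "actor": actor,
        "target": target,
        "action": action,
        "outcome": outcome,
        "source_ip": src_ip,
        "destination_ip": dst_ip,
        "session_id": session_id,
        "context": context,  # event-specific details
    }

record = make_log_record(
    "webserver-prod-01.example.com", "AUTH_FAILURE", "WARNING",
    "jsmith@example.com", "/etc/passwd", "READ", "DENIED",
    "192.168.1.100", "10.0.0.50", "a1b2c3d4-e5f6-7890",
    attempts=5, lockout=True,
)
print(json.dumps(record))  # one machine-parseable line per event
```

Emitting one JSON object per line keeps records trivially parseable by downstream collectors without custom grammars.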
Push vs. Pull Collection:
Push (Agent-based):
Pull (Agentless):
Collection Technologies:
Retention Requirements:
Integrity Protection:
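One widely used integrity-protection technique is hash chaining: each entry stores a hash computed over the previous entry's hash, so altering any record breaks every link after it. A minimal sketch, not a production implementation (real systems typically add signing and trusted timestamps):

```python
import hashlib

def chain_logs(entries):
    """Append a SHA-256 digest linking each entry to its predecessor."""
    chained, prev_hash = [], "0" * 64  # genesis value
    for entry in entries:
        digest = hashlib.sha256((prev_hash + entry).encode()).hexdigest()
        chained.append((entry, digest))
        prev_hash = digest
    return chained

def verify_chain(chained):
    """Recompute every link; False means some entry was altered."""
    prev_hash = "0" * 64
    for entry, digest in chained:
        expected = hashlib.sha256((prev_hash + entry).encode()).hexdigest()
        if digest != expected:
            return False
        prev_hash = digest
    return True

log = chain_logs(["user login ok", "file read /etc/passwd", "user logout"])
assert verify_chain(log)

tampered = [(e.replace("passwd", "hosts"), h) for e, h in log]
assert not verify_chain(tampered)  # tampering breaks the chain
```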
```conf
# /etc/rsyslog.conf - Enterprise syslog forwarding configuration

# =========================================
# Module Loading
# =========================================
module(load="imuxsock")   # Local system logging
module(load="imjournal")  # systemd journal
module(load="imfile")     # File-based input for app logs

# =========================================
# Global Settings
# =========================================
global(
    # Use RFC 5424 format for consistent parsing
    parser.permitSlashDefaultsForTemplate="on"
    # Queue settings for reliability
    workDirectory="/var/lib/rsyslog"
)

# =========================================
# Log Format Template - Structured Logging
# =========================================
template(name="JSONFormat" type="list") {
    constant(value="{")
    constant(value="\"timestamp\":\"")
    property(name="timereported" dateFormat="rfc3339")
    constant(value="\",\"hostname\":\"")
    property(name="hostname")
    constant(value="\",\"severity\":\"")
    property(name="syslogseverity-text")
    constant(value="\",\"facility\":\"")
    property(name="syslogfacility-text")
    constant(value="\",\"program\":\"")
    property(name="programname")
    constant(value="\",\"message\":\"")
    property(name="msg" format="json")
    constant(value="\"}")
    constant(value="\n")
}

# =========================================
# Forwarding Configuration with TLS
# =========================================
# Primary SIEM
action(
    type="omfwd"
    target="siem.example.com"
    port="6514"
    protocol="tcp"
    # TLS encryption for log confidentiality
    StreamDriver="gtls"
    StreamDriverMode="1"
    StreamDriverAuthMode="x509/certvalid"
    StreamDriverPermittedPeers="*.example.com"
    # Reliable delivery with disk-assisted queue
    queue.type="LinkedList"
    queue.size="10000"
    queue.filename="siem_queue"
    queue.saveOnShutdown="on"
    queue.maxDiskSpace="1g"
    action.resumeRetryCount="-1"
    template="JSONFormat"
)

# =========================================
# Application-Specific Log Collection
# =========================================
# Monitor application logs in JSON format
input(type="imfile"
    File="/var/log/app/*.json"
    Tag="app-log:"
    Facility="local0"
    Severity="info"
    addMetadata="on"
)

# =========================================
# Local Retention (for forensics if SIEM fails)
# =========================================
# Keep 90 days locally with log rotation
*.* action(
    type="omfile"
    file="/var/log/security/all.log"
    template="JSONFormat"
    fileCreateMode="0600"
    fileOwner="root"
)
```

Modern environments generate enormous log volumes (terabytes per day for large enterprises). Without careful architecture (filtering, aggregation, tiered storage), costs explode while signal drowns in noise. Define which logs must be collected in full fidelity versus which can be sampled or summarized. Storage costs should drive thoughtful logging, not fear-driven collection of everything.
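The full-fidelity-versus-sampled split can be sketched as a simple keep/drop policy at the collection tier. The event categories and the 1% sample rate below are illustrative choices, not recommendations:

```python
import random

# Events in these categories are never dropped (illustrative list)
FULL_FIDELITY = {"AUTH_FAILURE", "PRIV_ESCALATION", "POLICY_VIOLATION"}
SAMPLE_RATE = 0.01  # keep 1% of routine events

def should_keep(event, rng=random.random):
    """Full fidelity for security-relevant events, sampling for routine noise."""
    if event["event_type"] in FULL_FIDELITY:
        return True
    return rng() < SAMPLE_RATE

events = [{"event_type": "AUTH_FAILURE"}, {"event_type": "HEALTH_CHECK"}]
# rng is injectable so the policy is testable; here the draw (0.5) exceeds
# the sample rate, so the routine event is dropped and the auth failure kept
kept = [e for e in events if should_keep(e, rng=lambda: 0.5)]
```

A real pipeline would also summarize dropped categories (counts per type per interval) so the sampling itself remains auditable.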
SIEM systems are the central nervous system of security monitoring, aggregating logs from across the environment, normalizing data into common formats, and enabling correlation, alerting, and analysis.
Log Aggregation:
Normalization:
Correlation:
Alerting:
Search and Investigation:
Reporting and Compliance:
Traditional SIEM Products:
Cloud-Native SIEM:
Open Source Options:
Alert Fatigue: The number one SIEM failure mode. Too many alerts, too many false positives, analysts stop investigating.
Solutions:
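One common fatigue-reduction tactic is deduplicating repeats of the same alert within a time window, so analysts see one enriched alert instead of hundreds. A toy sketch; the window size and grouping key are illustrative, and production dedup usually also resets the window on each suppressed repeat:

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # 10-minute dedup window (illustrative)

def deduplicate(alerts):
    """Collapse alerts sharing (rule, source) within the window into one."""
    last_seen = {}
    suppressed = defaultdict(int)
    output = []
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["rule"], alert["src_ip"])
        if key in last_seen and alert["ts"] - last_seen[key] < WINDOW_SECONDS:
            suppressed[key] += 1  # drop the repeat, but count it
        else:
            output.append(dict(alert))
            last_seen[key] = alert["ts"]
    # Annotate each emitted alert with how many repeats it absorbed
    for alert in output:
        alert["suppressed_duplicates"] = suppressed[(alert["rule"], alert["src_ip"])]
    return output

raw = [{"rule": "brute_force", "src_ip": "10.0.0.5", "ts": t} for t in (0, 60, 120)]
deduped = deduplicate(raw)
# three raw alerts collapse into one alert noting two suppressed duplicates
```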
Data Overload: Collecting everything is expensive and makes analysis harder.
Solutions:
Skill Requirements: SIEM effectiveness depends on skilled analysts and rule developers.
Solutions:
Detection Rule: Brute Force Attack Detection. Detects multiple failed logins followed by a success from the same source:

```spl
index=authentication sourcetype=windows:security
| bin _time span=10m
| stats count(eval(EventCode=4625)) as failed_logins
        count(eval(EventCode=4624)) as successful_logins
        earliest(_time) as first_event
        latest(_time) as last_event
        values(TargetUserName) as users
    by src_ip, _time
| where failed_logins >= 5 AND successful_logins >= 1
| eval attack_type="Brute Force - Successful",
       severity=case(
           failed_logins >= 20, "Critical",
           failed_logins >= 10, "High",
           true(), "Medium"
       ),
       description="Multiple failed logins (".failed_logins.") followed by success from ".src_ip
| table _time, src_ip, users, failed_logins, successful_logins, severity, description
```

This rule identifies brute force attacks that succeeded by:

1. Binning events into 10-minute windows
2. Counting failed logins (EventCode 4625) and successful logins (4624) per source IP
3. Alerting when the pattern shows 5+ failures followed by a success
4. Calculating severity based on attack intensity

Successful SIEM programs invest more in people and process than technology. A mediocre SIEM platform with skilled analysts and well-tuned rules outperforms an advanced platform with default configuration and overwhelmed operators. Budget for ongoing care and feeding, not just initial deployment.
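The same windowed fail-then-success logic is easy to prototype outside the SIEM. A hedged Python sketch over (timestamp, source IP, event code) tuples, reusing the Windows 4625/4624 event codes and thresholds from the rule above:

```python
from collections import defaultdict

WINDOW = 600        # 10-minute bins, matching the SPL rule
FAIL_THRESHOLD = 5  # 5+ failures plus a success triggers an alert

def detect_brute_force(events):
    """events: iterable of (epoch_seconds, src_ip, event_code) tuples."""
    bins = defaultdict(lambda: {"fail": 0, "success": 0})
    for ts, src_ip, code in events:
        bucket = (src_ip, ts // WINDOW)  # fixed 10-minute window per source
        if code == 4625:
            bins[bucket]["fail"] += 1
        elif code == 4624:
            bins[bucket]["success"] += 1

    alerts = []
    for (src_ip, _), counts in bins.items():
        if counts["fail"] >= FAIL_THRESHOLD and counts["success"] >= 1:
            severity = ("Critical" if counts["fail"] >= 20
                        else "High" if counts["fail"] >= 10 else "Medium")
            alerts.append({"src_ip": src_ip, "severity": severity, **counts})
    return alerts

events = [(i, "203.0.113.7", 4625) for i in range(6)] + [(10, "203.0.113.7", 4624)]
print(detect_brute_force(events))  # one Medium alert for 203.0.113.7
```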
Detecting malicious activity requires multiple complementary approaches. No single detection methodology catches all threats, and effective security operations combine several techniques.
How it Works: Matches observed activity against known patterns of malicious behavior—specific byte sequences, command strings, network packets, or file hashes.
Examples:
Strengths:
Weaknesses:
How it Works: Establishes baselines of "normal" behavior, then alerts on deviations from those baselines.
Examples:
Strengths:
Weaknesses:
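The anomaly-based approach rests on one idea: model "normal", then flag deviations. A minimal z-score sketch over a per-user metric; the 3-sigma threshold is an illustrative choice, and real systems must also handle baseline drift and seasonality:

```python
from statistics import mean, stdev

def is_anomalous(baseline, observed, z_threshold=3.0):
    """Flag values more than z_threshold standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return observed != mu  # flat baseline: any change is a deviation
    return abs(observed - mu) / sigma > z_threshold

# Baseline: daily MB downloaded by one user over two weeks
baseline = [120, 95, 130, 110, 105, 125, 118, 99, 122, 108, 115, 127, 101, 119]
print(is_anomalous(baseline, 118))   # typical day -> False
print(is_anomalous(baseline, 4200))  # mass download -> True
```

This also illustrates the weaknesses in the table below: the detector cannot say whether the deviation is malicious, and an attacker who increases volume gradually re-trains the baseline around the attack.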
| Aspect | Signature-Based | Anomaly-Based | Behavioral Analysis |
|---|---|---|---|
| Known Threats | Excellent | Poor (may not distinguish) | Good (attack patterns) |
| Unknown Threats | Poor | Good (deviation from normal) | Good (malicious behaviors) |
| False Positives | Low | High | Medium |
| Evasion Difficulty | Easy (modify signature) | Medium (gradual change) | Hard (change behavior) |
| Setup Complexity | Low (install and update) | High (baseline training) | Medium (define behaviors) |
| Maintenance | Signature updates | Baseline recalibration | Behavior model updates |
| Explainability | High (matched rule X) | Low (deviation from norm) | Medium (observed behavior Y) |
How it Works: Defines behaviors characteristic of attacks (TTPs - Tactics, Techniques, Procedures) and detects when those behaviors occur, regardless of specific implementation.
Examples:
MITRE ATT&CK Framework: The ATT&CK framework catalogs adversary behaviors across attack lifecycle phases:
Mapping detections to ATT&CK helps identify coverage gaps.
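Coverage mapping can be as simple as set arithmetic over technique IDs. The rule names and the tiny in-scope technique sample below are hypothetical:

```python
def attack_coverage(detections, techniques_in_scope):
    """Return the covered fraction and the uncovered technique IDs."""
    covered = {t for techniques in detections.values() for t in techniques}
    relevant = covered & techniques_in_scope
    gaps = techniques_in_scope - covered
    return len(relevant) / len(techniques_in_scope), sorted(gaps)

# Hypothetical rule -> ATT&CK technique mapping
detections = {
    "brute_force_success": {"T1110"},
    "lsass_access": {"T1003.001"},
    "dns_tunnel": {"T1071.004", "T1048"},
}
in_scope = {"T1110", "T1003.001", "T1071.004", "T1048", "T1055", "T1547"}

coverage, gaps = attack_coverage(detections, in_scope)
print(f"{coverage:.0%} covered, gaps: {gaps}")  # 67% covered, gaps: ['T1055', 'T1547']
```

Real programs weight this by technique prevalence and telemetry quality, since "one rule exists" is a weak proxy for "we would actually detect it".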
How it Works: Machine learning models profile normal behavior of users and entities, detecting anomalies that may indicate compromise or insider threats.
Capabilities:
Use Cases:
Organizations increasingly treat detection engineering as a software engineering discipline—version-controlled detection rules, testing with red team data, CI/CD pipelines for rule deployment, and metrics on detection efficacy. This approach improves detection quality and enables continuous improvement based on missed attacks and false positive feedback.
Network monitoring provides visibility into communications between systems, enabling detection of command and control, lateral movement, data exfiltration, and policy violations.
Full Packet Capture (PCAP):
Flow Data (NetFlow/IPFIX):
DNS Query Logs:
Proxy/Web Gateway Logs:
Email Gateway Logs:
```zeek
# Zeek (formerly Bro) is the gold standard for network security monitoring.
# It produces rich log data about network activity.
#
# Zeek log outputs include:
#   conn.log   - Connection summaries (flow-like data)
#   dns.log    - DNS queries and responses
#   http.log   - HTTP transactions
#   ssl.log    - TLS/SSL handshake details
#   files.log  - Files transferred over the network
#   notice.log - Zeek-generated alerts
#   weird.log  - Protocol anomalies
#
# Example conn.log fields:
#   ts,uid,id.orig_h,id.orig_p,id.resp_h,id.resp_p,proto,service,
#   duration,orig_bytes,resp_bytes,conn_state,local_orig,local_resp,
#   missed_bytes,history,orig_pkts,orig_ip_bytes,resp_pkts,resp_ip_bytes

# Custom Zeek script for DNS tunneling detection
@load base/protocols/dns

module DNS_Tunnel;

export {
    redef enum Notice::Type += { DNS_Tunnel_Detected };

    # Threshold: queries with long subdomains
    const min_suspicious_length = 50 &redef;

    # Track query counts per host
    global host_query_counts: table[addr] of count &default=0;
}

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count)
{
    # Check for long DNS queries (potential tunneling)
    if ( |query| > min_suspicious_length )
    {
        NOTICE([$note=DNS_Tunnel_Detected,
                $msg=fmt("Possible DNS tunneling: %s queried %s (len:%d)",
                         c$id$orig_h, query, |query|),
                $conn=c,
                $identifier=cat(c$id$orig_h, query)]);
    }

    # Track high query volume per host
    host_query_counts[c$id$orig_h] += 1;
    if ( host_query_counts[c$id$orig_h] > 1000 )
    {
        NOTICE([$note=DNS_Tunnel_Detected,
                $msg=fmt("High DNS query volume from %s: %d queries",
                         c$id$orig_h, host_query_counts[c$id$orig_h]),
                $conn=c]);
    }
}
```

NDR platforms combine multiple network data sources with advanced analytics:
Capabilities:
Key Detection Categories:
Command and Control (C2):
Lateral Movement:
Data Exfiltration:
Encrypted Traffic Challenges: With TLS 1.3 and encrypted DNS, network visibility decreases. Strategies include:
Strategic sensor placement is critical. Deploy network sensors at trust boundaries (perimeter, between segments, data center entry), east-west traffic paths (between servers), and egress points. Mirror ports (SPAN) or network taps provide traffic copies without inline latency. Cloud environments require virtual taps or VPC flow logs.
Endpoint Detection and Response provides deep visibility into host-level activity—processes, file operations, network connections, registry changes, and more. As network encryption increases, endpoint visibility becomes increasingly critical.
Telemetry Collection:
Real-Time Detection:
Response Capabilities:
| Capability | Traditional AV | EDR |
|---|---|---|
| Detection Method | Signature-based, static analysis | Behavioral, real-time analysis |
| Visibility | File scanning | Process, network, registry, memory |
| Response | Quarantine files | Isolate host, kill process, memory dump |
| Investigation | Limited | Full historical timeline |
| Evasion Resistance | Low (modify bytes → evade) | Higher (must evade behavior detection) |
| Retrospective Search | No | Yes (hunt across historical telemetry) |
| Centralized Management | Basic | Full visibility across fleet |
| Cloud/On-Prem | Often on-prem | Typically cloud-managed |
Process Injection Techniques:
Credential Access:
Persistence Mechanisms:
Defense Evasion:
Living-Off-the-Land Techniques:
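Many EDR behavioral rules reduce to suspicious parent-child process relationships, e.g. an Office application spawning a shell (a classic macro-malware and living-off-the-land pattern). A toy sketch; the pair list is illustrative, not a vendor ruleset:

```python
# Illustrative map: parent process -> children that are suspicious from it
SUSPICIOUS_CHILDREN = {
    "winword.exe": {"powershell.exe", "cmd.exe", "wscript.exe"},
    "excel.exe": {"powershell.exe", "cmd.exe", "mshta.exe"},
    "outlook.exe": {"powershell.exe", "cmd.exe"},
}

def flag_process_tree(events):
    """events: (parent, child) process-name pairs; return the suspicious ones."""
    return [
        (parent, child)
        for parent, child in events
        if child.lower() in SUSPICIOUS_CHILDREN.get(parent.lower(), set())
    ]

observed = [
    ("explorer.exe", "winword.exe"),    # normal: user opened a document
    ("winword.exe", "powershell.exe"),  # suspicious: macro launching a shell
]
print(flag_process_tree(observed))  # [('winword.exe', 'powershell.exe')]
```

Real EDR engines evaluate the full ancestry chain plus command lines and signer metadata, which is why behavior-based detection is harder to evade than byte-level signatures.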
Coverage Goals:
Performance Impact:
Integration Points:
Extended Detection and Response (XDR) integrates endpoint, network, email, identity, and cloud telemetry into a unified platform. Rather than correlating separate tools in a SIEM, XDR platforms provide native cross-domain visibility and correlation, reducing integration complexity and enabling faster detection and response.
Effective security monitoring requires not just data collection, but meaningful metrics that inform decision-making, demonstrate program effectiveness, and drive continuous improvement.
Mean Time to Detect (MTTD): Average time from attack occurrence to detection.
Mean Time to Respond (MTTR): Average time from detection to containment.
True Positive Rate: Percentage of alerts representing actual threats.
False Positive Rate: Percentage of alerts that are not threats.
Detection Coverage: Percentage of ATT&CK techniques with detection rules.
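The two time metrics fall straight out of incident timestamps. A sketch assuming each incident record carries occurred/detected/contained datetimes (field names are illustrative):

```python
from datetime import datetime

def mean_hours(deltas):
    """Average a list of timedeltas, expressed in hours."""
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 3600

def mttd_mttr(incidents):
    """incidents: dicts with 'occurred', 'detected', 'contained' datetimes."""
    mttd = mean_hours([i["detected"] - i["occurred"] for i in incidents])
    mttr = mean_hours([i["contained"] - i["detected"] for i in incidents])
    return mttd, mttr

incidents = [
    {"occurred": datetime(2025, 1, 10, 9), "detected": datetime(2025, 1, 10, 21),
     "contained": datetime(2025, 1, 11, 1)},   # detected in 12h, contained in 4h
    {"occurred": datetime(2025, 1, 12, 0), "detected": datetime(2025, 1, 13, 0),
     "contained": datetime(2025, 1, 13, 2)},   # detected in 24h, contained in 2h
]
mttd, mttr = mttd_mttr(incidents)
print(f"MTTD {mttd:.1f}h, MTTR {mttr:.1f}h")  # MTTD 18.0h, MTTR 3.0h
```

The hard part in practice is not the arithmetic but establishing the true "occurred" time, which usually requires forensic reconstruction after the fact.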
| Metric | Target | Measurement Method | Improvement Action |
|---|---|---|---|
| MTTD (Critical) | < 24 hours | Timestamp analysis of incidents | Add detections, improve telemetry |
| MTTR (Critical) | < 4 hours | Incident ticket timestamps | Playbook automation, training |
| Alert True Positive Rate | > 80% | Weekly alert sampling | Tune rules, add context |
| ATT&CK Coverage | > 60% | Detection-to-technique mapping | Gap analysis, detection engineering |
| Log Ingestion Latency | < 5 minutes | Timestamp comparison | Infrastructure scaling, streaming |
| System Coverage | 100% of critical assets | Asset inventory vs. log sources | Agent deployment, log forwarding |
| Alert Closure Rate | > 95% within SLA | Ticket aging report | Staffing, automation |
SOC Operations Dashboard:
Threat Detection Dashboard:
Compliance Dashboard:
Threat Intelligence Dashboard:
Purpose-Driven: Each dashboard serves a specific audience and use case. Don't combine executive reporting with analyst workload.
Actionable: Every metric should drive potential action. If you can't act on it, don't track it.
Trending: Point-in-time metrics without trend are limited. Show direction over time.
Drill-Down: Allow investigation from summary to detail without leaving the dashboard.
Red/Yellow/Green: Clear visual indicators of status against targets. Don't require interpretation.
Beware metrics that look good but don't reflect security posture. "Blocked 1 million attacks" means nothing without context (were they targeted or spray-and-pray?). "100% uptime" for security tools doesn't mean they're detecting threats. Focus on outcomes: threats detected, incidents contained, business impact prevented. Question whether each metric would change your decisions if it moved.
Threat hunting is the proactive, human-driven search for threats that evade automated detection. While SIEM alerts react to known patterns, hunting assumes sophisticated threats are already present and seeks them actively.
Automated Detection:
Threat Hunting:
Intelligence-Driven: Start from threat intelligence about active threats.
Anomaly-Driven: Start from statistical anomalies, investigate for malicious cause.
ATT&CK-Driven: Systematically hunt for specific techniques.
```sql
-- Hunt #1: Credential Dumping via LSASS Access
-- ATT&CK T1003.001 - LSASS Memory
--
-- Hypothesis: If attackers dumped credentials, they accessed lsass.exe memory.
-- Look for processes accessing lsass.exe with suspicious access rights.

SELECT
    hostname,
    process_name AS source_process,
    process_commandline,
    target_process_name,
    granted_access_numeric,
    CASE
        WHEN granted_access_numeric & 0x1F0FFF = 0x1F0FFF
            THEN 'FULL_ACCESS (suspicious)'
        WHEN granted_access_numeric & 0x1010 = 0x1010
            THEN 'PROCESS_QUERY + VM_READ (suspicious)'
        ELSE 'Other'
    END AS access_type,
    event_time,
    user_name
FROM process_access_events
WHERE target_process_name = 'lsass.exe'
  AND source_process NOT IN (
      'lsass.exe', 'csrss.exe', 'services.exe', 'winlogon.exe',
      'svchost.exe', 'wininit.exe', 'MsMpEng.exe'  -- legitimate accessors
  )
  AND event_time > NOW() - INTERVAL 7 DAY
ORDER BY event_time DESC;

-- Investigate any results: What is the source process?
-- Is it legitimate security software or a potential mimikatz/dump tool?


-- Hunt #2: Beaconing Detection
-- ATT&CK T1071 - Application Layer Protocol
--
-- Hypothesis: C2 beacons at regular intervals; find periodic connections.
-- (Inter-connection gaps are computed in a separate step because window
--  functions cannot be nested inside aggregates.)

WITH gaps AS (
    SELECT
        source_ip,
        dest_ip,
        dest_port,
        UNIX_TIMESTAMP(event_time)
            - LAG(UNIX_TIMESTAMP(event_time)) OVER (
                  PARTITION BY source_ip, dest_ip, dest_port
                  ORDER BY event_time
              ) AS gap_seconds
    FROM network_connections
    WHERE event_time > NOW() - INTERVAL 24 HOUR
      AND dest_ip NOT IN (SELECT ip FROM known_good_destinations)
),
connection_intervals AS (
    SELECT
        source_ip,
        dest_ip,
        dest_port,
        COUNT(*) AS connection_count,
        STDDEV(gap_seconds) AS interval_stddev,
        AVG(gap_seconds) AS avg_interval
    FROM gaps
    WHERE gap_seconds IS NOT NULL
    GROUP BY source_ip, dest_ip, dest_port
    HAVING COUNT(*) > 20  -- need enough samples
)
SELECT *
FROM connection_intervals
WHERE interval_stddev / avg_interval < 0.1  -- very regular intervals
  AND avg_interval BETWEEN 30 AND 3600      -- 30 sec to 1 hour
ORDER BY connection_count DESC;

-- Investigate: Why is this host connecting to this destination at regular intervals?
-- Check destination reputation; examine connection content if possible.


-- Hunt #3: Rare Process Execution
-- Look for processes that only executed on 1-2 hosts (potentially targeted)

SELECT
    process_name,
    process_hash,
    COUNT(DISTINCT hostname) AS unique_hosts,
    GROUP_CONCAT(DISTINCT hostname) AS hosts,
    MIN(event_time) AS first_seen
FROM process_creation_events
WHERE event_time > NOW() - INTERVAL 7 DAY
GROUP BY process_name, process_hash
HAVING COUNT(DISTINCT hostname) <= 2
   AND COUNT(*) > 5  -- ran multiple times on those hosts
ORDER BY unique_hosts, first_seen DESC;

-- Investigate: Why is this process only on these hosts?
-- Is it legitimate software specific to these roles, or potentially targeted malware?
```

Organizations progress through hunting maturity levels: HM0 (no hunting), HM1 (reactive, indicator-based), HM2 (procedural, following playbooks), HM3 (hypothesis-driven, developing new techniques), and HM4 (innovative, creating new hunting methodologies). Most organizations should aim for HM2-HM3; HM4 requires a dedicated research capability.
Monitoring and logging provide the visibility foundation that makes security detection and response possible. Without comprehensive, well-architected observability, organizations operate blind to threats that prevention eventually fails to stop.
What's Next:
With visibility established through monitoring and logging, we'll next explore Incident Response—the structured processes and procedures for responding when threats are detected. Detection without response capability merely documents attacks without stopping them.
You now understand how to architect logging systems, implement SIEM platforms, apply multiple detection methodologies, and conduct proactive threat hunting. This visibility foundation enables the detection of attacks that prevention fails to stop—the difference between a brief incident and a months-long compromise. Every defense strategy depends on the ability to see what's happening in your environment.