At 2:47 AM, your SIEM generates an alert: a privileged administrator account just authenticated from an IP address in a country where you have no operations. What happens next—in the next minutes, hours, and days—determines whether this becomes a footnote in a weekly report or a front-page breach.
Incident Response (IR) is the disciplined process of identifying, containing, eradicating, and recovering from security incidents. It transforms detection into action, ensuring that when threats are identified, organizations respond effectively rather than chaotically.
Without structured incident response:
With mature incident response:
This page covers the complete incident response lifecycle—from preparation through lessons learned—providing the framework for effective security operations.
By the end of this page, you will understand incident classification and severity, the NIST and SANS incident response frameworks, team structures and roles, investigative techniques, containment and eradication strategies, recovery procedures, and post-incident analysis. You'll be equipped to develop, evaluate, and execute incident response programs.
Not all security events are incidents, and not all incidents require the same response. Clear classification enables appropriate resource allocation and response urgency.
Security Event: Any observable occurrence in a system or network—a login, a firewall rule trigger, an antivirus detection. Events are routine; organizations generate millions daily.
Security Incident: An event or series of events that violates security policies, acceptable use policies, or standard security practices. Incidents require investigation and response.
Examples:
| Event (Not Incident) | Incident |
|---|---|
| Failed login attempt | Successful login after 100 failed attempts |
| Antivirus blocking known malware | Malware executing before detection |
| Firewall denying external scan | Firewall rule changed without authorization |
| User visiting suspicious URL | User credentials phished and used |
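The first row of the table can be sketched as a simple detection rule. This is a minimal illustration, not a production detection: the `LoginEvent` type, the function name, and the assumption that events arrive in chronological order are all hypothetical; only the threshold of 100 comes from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEvent:
    """Hypothetical, simplified authentication event."""
    account: str
    success: bool


def find_suspicious_logins(events, threshold=100):
    """Return accounts whose successful login followed at least
    `threshold` consecutive failures (events assumed chronological)."""
    consecutive_failures = {}
    flagged = []
    for event in events:
        failures = consecutive_failures.get(event.account, 0)
        if event.success:
            if failures >= threshold:
                flagged.append(event.account)
            consecutive_failures[event.account] = 0  # success resets the streak
        else:
            consecutive_failures[event.account] = failures + 1
    return flagged


# 100 failures then a success is an incident; one failure then a success is not.
events = [LoginEvent("admin", False)] * 100 + [LoginEvent("admin", True)]
events += [LoginEvent("jdoe", False), LoginEvent("jdoe", True)]
print(find_suspicious_logins(events))  # ['admin']
```

A single failed login stays an event; the same failures followed by a success cross into incident territory and warrant investigation.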
Malware:
Unauthorized Access:
Denial of Service:
Insider Threat:
Data Breach:
| Severity | Definition | Examples | Response Time | Escalation |
|---|---|---|---|---|
| Critical (P1) | Active, widespread impact on critical systems or data | Active ransomware, confirmed data breach, widespread outage | Immediate (24/7) | CISO, Legal, Executive team |
| High (P2) | Potential significant impact, contained but active threat | Compromised privileged account, targeted attack in progress | < 1 hour (business hours), < 4 hours (off-hours) | Security management |
| Medium (P3) | Limited impact, threat contained or potential threat | Malware on isolated system, policy violation, attempted attack | < 4 hours (business hours) | Security team lead |
| Low (P4) | Minimal impact, informational | Failed attacks, minor policy violations, suspicious activity | Next business day | Analyst handling |
Impact Assessment:
Threat Assessment:
Initial severity assessment is based on available information. As investigation progresses, severity may increase (malware on one host was actually on fifty) or decrease (suspicious access was actually authorized maintenance). Build processes that allow severity changes and appropriate escalation when scope expands.
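One way to support changing severity is to record every reassessment and trigger escalation whenever severity rises. The sketch below assumes a hypothetical `Incident` record and incident ID; the escalation contacts are taken from the severity table above.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    CRITICAL = 1  # P1
    HIGH = 2      # P2
    MEDIUM = 3    # P3
    LOW = 4       # P4


# Escalation contacts copied from the severity table.
ESCALATION = {
    Severity.CRITICAL: "CISO, Legal, Executive team",
    Severity.HIGH: "Security management",
    Severity.MEDIUM: "Security team lead",
    Severity.LOW: "Analyst handling",
}


@dataclass
class Incident:
    """Hypothetical incident record that keeps a severity audit trail."""
    incident_id: str
    severity: Severity
    history: list = field(default_factory=list)

    def update_severity(self, new_severity, reason):
        """Record the change. Return who to notify if severity rose, else None."""
        old = self.severity
        self.history.append((old, new_severity, reason))
        self.severity = new_severity
        if new_severity < old:  # lower number means more severe
            return ESCALATION[new_severity]
        return None


incident = Incident("INC-EXAMPLE-1", Severity.MEDIUM)
notify = incident.update_severity(Severity.HIGH, "malware found on fifty hosts, not one")
print(notify)  # Security management
```

Downgrades are recorded too, so the history shows why an incident was de-escalated as well as why it was escalated.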
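Two frameworks dominate incident response practice: NIST SP 800-61 and the SANS incident handling process. Both describe similar processes with slightly different phase definitions. The four-phase NIST lifecycle summarized here is the one described in Revision 2 of the publication.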
1. Preparation: Everything done before an incident to enable effective response:
2. Detection and Analysis: Identifying incidents and understanding them:
3. Containment, Eradication, and Recovery: Stopping the attack and returning to normal:
4. Post-Incident Activity: Learning from the incident:
SANS separates the containment/eradication/recovery phase into distinct steps:
1. Preparation: Same as NIST—readiness activities
2. Identification: Detecting and validating incidents
3. Containment: Limiting the scope of damage:
4. Eradication: Removing the threat:
5. Recovery: Returning to normal operations:
6. Lessons Learned: Improving for next time:
| Aspect | NIST 800-61 | SANS 6-Step |
|---|---|---|
| Phases | 4 | 6 |
| Detail Level | High-level guidance | More granular steps |
| Containment/Eradication/Recovery | Combined | Separate phases |
| Audience | All organizations | Security practitioners |
| Prescriptiveness | Flexible | More specific |
| Documentation | Comprehensive publication | Training-focused |
Both frameworks work well; choose based on organizational preference and existing practices.
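The relationship between the two models can be written down as a simple mapping, which is handy when a playbook written against one framework has to be read against the other. The dictionary and helper names below are illustrative only.

```python
# NIST SP 800-61 phases mapped to the SANS steps they cover.
NIST_TO_SANS = {
    "Preparation": ["Preparation"],
    "Detection and Analysis": ["Identification"],
    "Containment, Eradication, and Recovery": ["Containment", "Eradication", "Recovery"],
    "Post-Incident Activity": ["Lessons Learned"],
}


def sans_to_nist(step):
    """Return the NIST phase that contains the given SANS step."""
    for phase, steps in NIST_TO_SANS.items():
        if step in steps:
            return phase
    raise KeyError(f"unknown SANS step: {step}")


print(sans_to_nist("Eradication"))  # Containment, Eradication, and Recovery
```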
Frameworks are starting points, not rigid prescriptions. Adapt phases to your organization's structure, risk profile, and existing processes. The value is in having a defined, practiced process—not in strict adherence to any particular model. Document your chosen approach in your incident response plan.
Effective incident response requires a defined team with clear roles. During high-stress incidents, uncertainty about who does what creates delays and dropped balls.
Incident Commander (IC):
Technical Lead:
Analysts/Investigators:
Communications Lead:
Legal Representative:
HR Representative (if insider-related):
| Stakeholder | When Involved | Role During Incident |
|---|---|---|
| Executive Leadership | Critical (P1) incidents | Strategic decisions, external communication approval |
| Legal/General Counsel | Data breaches, potential litigation | Legal guidance, notification decisions |
| Public Relations | Public-facing incidents | External communication, media handling |
| Human Resources | Insider threats, employee involvement | Employee relations, disciplinary coordination |
| Business Unit Leaders | Business process impact | Business continuity decisions |
| IT Operations | System containment/recovery | System access, restoration execution |
| Third-Party Vendors | Vendor products involved | Technical support, forensics assistance |
| Law Enforcement | Criminal activity, regulatory requirement | Investigation coordination |
| Cyber Insurance | Significant incidents | Coverage, breach response services |
Dedicated IR Team:
Virtual/Hybrid Team:
Outsourced/Retainer:
On-Call Rotation:
Escalation Matrix:
| Severity | Initial Responder | Escalation (30 min no progress) | Executive Notification |
|---|---|---|---|
| Critical | IR Lead + On-call | CISO immediately | CEO, Legal within 1 hour |
| High | On-call analyst | IR Lead | Security management |
| Medium | Scheduled analyst | On-call if complex | Weekly report |
| Low | Scheduled analyst | Lead if needed | Monthly report |
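The matrix above could be encoded so that tooling, rather than memory, decides who is engaged at any point. This is a sketch under assumptions: the function name is hypothetical, and it treats the Critical row's "CISO immediately" as an escalation that applies from minute zero.

```python
# Rows copied from the escalation matrix:
# (initial responder, escalation target, executive notification)
ESCALATION_MATRIX = {
    "Critical": ("IR Lead + On-call", "CISO immediately", "CEO, Legal within 1 hour"),
    "High": ("On-call analyst", "IR Lead", "Security management"),
    "Medium": ("Scheduled analyst", "On-call if complex", "Weekly report"),
    "Low": ("Scheduled analyst", "Lead if needed", "Monthly report"),
}
NO_PROGRESS_MINUTES = 30  # escalation threshold from the table header


def engaged_parties(severity, minutes_without_progress):
    """Return who should be engaged for an incident right now."""
    initial, escalation, _executive = ESCALATION_MATRIX[severity]
    parties = [initial]
    # Critical incidents escalate immediately; others after 30 minutes without progress.
    if severity == "Critical" or minutes_without_progress >= NO_PROGRESS_MINUTES:
        parties.append(escalation)
    return parties


print(engaged_parties("High", 10))  # ['On-call analyst']
print(engaged_parties("High", 30))  # ['On-call analyst', 'IR Lead']
```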
Internal Communication:
External Communication:
An IR team that has never practiced together will struggle during a real incident. Regular tabletop exercises, simulated incidents, and role-playing critical scenarios build muscle memory. During an actual crisis, there's no time to figure out how communication channels work or who has authority for what decision.
Investigation seeks to understand what happened, how it happened, what was impacted, and whether the threat is contained. This requires systematic evidence collection and analysis.
Order of Volatility: Collect evidence in order of how quickly it may be lost:
Chain of Custody:
Evidence Preservation:
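A common way to support both chain of custody and preservation is to hash each evidence item at collection time and re-verify the hash at every handoff. The sketch below is illustrative: the entry fields, item ID, and handler name are assumptions, and it hashes in-memory bytes where a real workflow would hash a disk image or memory dump.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of the evidence bytes."""
    return hashlib.sha256(data).hexdigest()


def custody_entry(item_id, data, handler, action):
    """Build one chain-of-custody log entry (field names are hypothetical)."""
    return {
        "item_id": item_id,
        "sha256": sha256_of(data),
        "handler": handler,
        "action": action,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


def verify(entry, data: bytes) -> bool:
    """True if the evidence still matches the hash recorded in the entry."""
    return entry["sha256"] == sha256_of(data)


evidence = b"example memory dump bytes"
entry = custody_entry("EV-001", evidence, "analyst-a", "collected")
print(verify(entry, evidence))         # True
print(verify(entry, evidence + b"x"))  # False: the content changed
```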
Memory analysis reveals running processes, network connections, and malware that may not persist on disk.
What Memory Reveals:
Memory Acquisition:
```powershell
# Windows Forensic Triage - Quick Assessment Commands
# Run these to gather initial evidence before imaging

# Make sure the output directory exists before writing to it
New-Item -ItemType Directory -Path C:\IR -Force | Out-Null

# ===== SYSTEM INFORMATION =====
# Basic system info
systeminfo | Out-File -FilePath C:\IR\systeminfo.txt

# Current user and privileges
whoami /all | Out-File -FilePath C:\IR\whoami.txt

# ===== NETWORK INFORMATION =====
# Current network connections, with the owning process name where it can be resolved
Get-NetTCPConnection |
    Where-Object {$_.State -eq "Established"} |
    Select-Object LocalAddress, LocalPort, RemoteAddress, RemotePort, OwningProcess,
        @{n='ProcessName';e={(Get-Process -Id $_.OwningProcess -ErrorAction SilentlyContinue).ProcessName}} |
    Export-Csv -NoTypeInformation -Path C:\IR\network_connections.csv

# DNS cache (recently resolved domains)
Get-DnsClientCache | Export-Csv -NoTypeInformation -Path C:\IR\dns_cache.csv

# ARP cache (local network discovery)
Get-NetNeighbor | Export-Csv -NoTypeInformation -Path C:\IR\arp_cache.csv

# ===== PROCESS INFORMATION =====
# Running processes with command lines
# (Get-CimInstance replaces the deprecated Get-WmiObject)
Get-CimInstance Win32_Process |
    Select-Object ProcessId, Name, CommandLine, ParentProcessId, CreationDate, ExecutablePath |
    Export-Csv -NoTypeInformation -Path C:\IR\processes.csv

# Services (persistence mechanism)
Get-Service |
    Select-Object Name, DisplayName, Status, StartType |
    Export-Csv -NoTypeInformation -Path C:\IR\services.csv

# ===== PERSISTENCE MECHANISMS =====
# Scheduled tasks
Get-ScheduledTask |
    Where-Object {$_.State -ne "Disabled"} |
    Export-Csv -NoTypeInformation -Path C:\IR\scheduled_tasks.csv

# Startup items (Run keys)
$runKeys = @(
    "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Run",
    "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce",
    "HKCU:\SOFTWARE\Microsoft\Windows\CurrentVersion\Run",
    "HKCU:\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce"
)
foreach ($key in $runKeys) {
    if (Test-Path $key) {
        Get-ItemProperty $key | Out-File -Append -FilePath C:\IR\run_keys.txt
    }
}

# ===== USER ACTIVITY =====
# Recent PowerShell history (if enabled)
Get-Content (Get-PSReadLineOption).HistorySavePath -ErrorAction SilentlyContinue |
    Out-File -FilePath C:\IR\powershell_history.txt

# Recently accessed files
$shell = New-Object -ComObject Shell.Application
$shell.NameSpace('shell:::{22877A6D-37A1-461A-91B0-DBDA5AAEBC99}').Items() |
    Select-Object Name, Path |
    Export-Csv -NoTypeInformation -Path C:\IR\recent_files.csv

# ===== SECURITY EVENTS =====
# Failed logins (last 24 hours)
Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4625; StartTime=(Get-Date).AddDays(-1)} -MaxEvents 100 |
    Export-Csv -NoTypeInformation -Path C:\IR\failed_logins.csv

# Successful logins (last 24 hours)
Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4624; StartTime=(Get-Date).AddDays(-1)} -MaxEvents 100 |
    Export-Csv -NoTypeInformation -Path C:\IR\successful_logins.csv

Write-Host "Triage collection complete. Review C:\IR\ for results."
```

Timeline construction sequences events from multiple sources to understand attack progression.
Timeline Data Sources:
Timeline Creation:
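At its simplest, timeline creation means normalizing timestamps to one time zone and sorting events from every source into a single sequence. The sketch below assumes each source yields dictionaries with an ISO 8601 `time` that carries a UTC offset; the source names and event descriptions are made up for illustration.

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp (with offset) and convert it to UTC."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc)


def build_timeline(*sources):
    """Merge events from all sources into one list ordered by UTC time."""
    events = [dict(event, time=to_utc(event["time"]))
              for source in sources for event in source]
    return sorted(events, key=lambda event: event["time"])


# Hypothetical log extracts; the proxy logs in local time (UTC+1).
email_log = [{"time": "2025-01-15T08:14:00+00:00", "source": "email", "event": "phishing email delivered"}]
edr_log = [{"time": "2025-01-15T08:22:00+00:00", "source": "edr", "event": "macro spawned PowerShell"}]
proxy_log = [{"time": "2025-01-15T09:23:00+01:00", "source": "proxy", "event": "outbound C2 connection"}]

timeline = build_timeline(proxy_log, edr_log, email_log)
for entry in timeline:
    print(entry["time"].isoformat(), entry["source"], entry["event"])
```

Normalizing to UTC before sorting matters: the proxy entry looks like the latest event by wall-clock time only because its log is written in a different time zone.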
Tools:
Extract indicators of compromise (IOCs) for detection and intelligence:
Network Indicators:
Host Indicators:
Behavioral Indicators:
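For network and host indicators, a first pass at extraction is often plain pattern matching over notes, logs, or reports. The following is a rough sketch: the regular expressions are deliberately simple assumptions, and the domain pattern will also match file names such as `payload.exe`, so results need analyst review. Behavioral indicators do not reduce to patterns like these.

```python
import hashlib
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def is_valid_ipv4(candidate: str) -> bool:
    """Reject strings that look like IPs but have out-of-range octets."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


def extract_iocs(text: str) -> dict:
    """Pull candidate IPv4 addresses, domains, and SHA-256 hashes from text."""
    return {
        "ipv4": sorted({ip for ip in IPV4_RE.findall(text) if is_valid_ipv4(ip)}),
        "domains": sorted({d.lower() for d in DOMAIN_RE.findall(text)}),
        "sha256": sorted({h.lower() for h in SHA256_RE.findall(text)}),
    }


# Sample text uses documentation IP ranges and a reserved test domain.
sample_hash = hashlib.sha256(b"sample").hexdigest()
notes = (f"Beacon to 203.0.113.7 and update.badexample.test; "
         f"dropped file hash {sample_hash}; malformed 999.1.1.1 ignored")
iocs = extract_iocs(notes)
print(iocs["ipv4"], iocs["domains"])
```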
Forensic analysis on live systems risks contaminating evidence and alerting attackers. After initial triage, work from forensic images and memory dumps. If extended live analysis is needed, preserve volatile evidence first and document any changes your investigation causes.
Containment stops the bleeding—preventing further damage while investigation continues. Eradication removes the threat completely. Both require careful execution to avoid alerting attackers or causing additional damage.
Stop Active Damage:
Preserve Evidence:
Maintain Operations:
Network Isolation:
Complete Isolation:
Segmentation:
VLAN Quarantine:
Credential Actions:
Endpoint Actions:
| Factor | Aggressive Containment | Measured Containment |
|---|---|---|
| Damage Active | Yes - ongoing encryption/exfiltration | No - historical compromise |
| Scope Known | Unknown - assume worst case | Well-understood, limited |
| Evidence Preserved | May sacrifice some evidence | Evidence collection priority |
| Attacker Sophistication | Unsophisticated - won't notice | Sophisticated - may react |
| Business Impact | Acceptable - safety first | Must minimize disruption |
| Recovery Ready | Have known-good backups | Need affected systems for recovery |
Eradication removes all attacker access and artifacts from the environment.
Eradication Actions:
Eradication Challenges:
Incomplete Scope:
Tipping Off Attacker:
Missed Persistence:
When to Rebuild (Scorched Earth):
When to Clean (Surgical):
Recommendation: For critical systems and advanced threats, always prefer rebuilding from a known-good state.
For sophisticated adversaries, piece-by-piece containment allows them to notice and respond. Instead, plan containment actions for all systems, then execute simultaneously: 'At 2:00 AM, we will isolate these 47 systems, reset these 200 accounts, and block these 15 external IPs—all at once.' This requires preparation but prevents adversary adaptation.
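The plan-then-execute idea can be sketched as a list of prepared actions that are dispatched together rather than one by one. Everything here is a stand-in: `isolate_host`, `reset_account`, and `block_ip` are hypothetical placeholders for calls to EDR, identity, and firewall tooling, and a thread pool only approximates simultaneous execution.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical placeholders for real EDR, identity, and firewall API calls.
def isolate_host(host):
    return f"isolated {host}"


def reset_account(account):
    return f"reset {account}"


def block_ip(ip):
    return f"blocked {ip}"


def execute_plan(plan):
    """Dispatch every (action, target) pair concurrently.
    Results are returned in the same order as the plan."""
    workers = min(32, max(1, len(plan)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(action, target) for action, target in plan]
        return [future.result() for future in futures]


# The plan is prepared and reviewed in advance, then executed in one step.
plan = (
    [(isolate_host, f"host-{n:02d}") for n in range(1, 4)]
    + [(reset_account, "svc-backup")]
    + [(block_ip, "203.0.113.7")]
)
results = execute_plan(plan)
print(results)
```

Keeping the plan as data means it can be reviewed, approved, and rehearsed before the execution window.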
Recovery returns the organization to normal operations with confidence that the threat is eliminated. Rushing recovery invites re-compromise.
Phase 1: Validation
Before returning systems to production:
Phase 2: Restoration
Restore systems to production:
From Backup:
Rebuild:
Phase 3: Monitoring
Phase 4: Return to Normal
Prioritization:
Interim Operations:
Communication:
Ransom Payment Decision:
Recovery Without Payment:
If Paying (as last resort):
Recovery is the most dangerous phase for re-compromise. Attackers often leave backdoors triggered by recovery. If you restore from a backup that contains their persistence mechanism, they're right back in. If you bring systems online before patching, they exploit the same vulnerability. Validate every recovery action against the possibility of re-entry.
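One small check that follows from this is never selecting a restore point taken after the earliest known compromise. The sketch below assumes backups are identified by their creation time and that the investigation has established an earliest-compromise timestamp; both names are illustrative. A backup that predates the compromise is a necessary condition, not proof that it is clean.

```python
from datetime import datetime


def latest_clean_backup(backup_times, earliest_compromise):
    """Return the most recent backup taken strictly before the earliest
    known compromise, or None if no such backup exists."""
    candidates = [taken for taken in backup_times if taken < earliest_compromise]
    return max(candidates) if candidates else None


backups = [
    datetime(2025, 1, 13, 2, 0),
    datetime(2025, 1, 14, 2, 0),
    datetime(2025, 1, 15, 10, 0),  # taken after the compromise, so it may contain persistence
]
earliest_compromise = datetime(2025, 1, 15, 8, 22)

chosen = latest_clean_backup(backups, earliest_compromise)
print(chosen)  # 2025-01-14 02:00:00
```

If the investigation later moves the compromise date earlier, rerun the selection; a restore point that looked safe may no longer be.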
The incident isn't truly over until lessons are extracted and improvements implemented. This phase is often skipped under pressure to move on, but it's where lasting security improvement happens.
When: Within 1-2 weeks of incident closure (memories still fresh)
Who: All incident participants, relevant stakeholders
Goals:
Agenda Structure:
Five Whys Technique:
Problem: Ransomware encrypted servers
Root causes: Detection gaps for novel threats, user awareness, network segmentation, credential hygiene
Contributing Factors:
Most incidents have multiple contributing factors:
```markdown
# INCIDENT REPORT

**Incident ID:** INC-2025-0042
**Classification:** Malware - Ransomware
**Severity:** Critical (P1)
**Status:** Closed

## Executive Summary

On January 15, 2025, ransomware was executed on 47 systems following a phishing attack. The incident was contained within 4 hours and systems were recovered from backup within 72 hours. No data exfiltration was confirmed. This report details the incident, response actions, and improvement recommendations.

## Timeline

| Date/Time (UTC) | Event |
|-----------------|-------|
| Jan 15, 08:14 | Phishing email received by user jsmith |
| Jan 15, 08:22 | User opens attachment, malware executes |
| Jan 15, 08:23 | Malware establishes C2, downloads ransomware |
| Jan 15, 08:25-09:15 | Lateral movement via compromised credentials |
| Jan 15, 09:15 | Ransomware begins encryption |
| Jan 15, 09:18 | EDR alert: Suspicious file encryption activity |
| Jan 15, 09:22 | SOC analyst validates alert, escalates to IR |
| Jan 15, 09:30 | Incident Commander activated, war room opened |
| Jan 15, 09:45 | Network isolation of affected segments |
| Jan 15, 10:30 | Scope assessment: 47 systems encrypted |
| Jan 15, 13:00 | Containment verified, eradication begins |
| Jan 16-18 | System restoration from backup |
| Jan 18, 14:00 | All systems returned to production |
| Jan 18, 16:00 | Incident closed, monitoring period begins |

## Attack Path

1. **Initial Access:** Phishing email with malicious Word document
2. **Execution:** User enabled macros, PowerShell payload executed
3. **Persistence:** Scheduled task created for backup C2
4. **Credential Access:** Mimikatz used to dump credentials
5. **Lateral Movement:** PsExec used with domain admin credentials
6. **Impact:** Files encrypted with Conti ransomware variant

## Impact Assessment

- **Systems Affected:** 47 workstations and servers
- **Data Impact:** Files encrypted; no confirmed exfiltration
- **Business Impact:** 72-hour disruption to finance department
- **Financial Impact:** Estimated $150,000 (recovery costs, lost productivity)

## Response Actions Taken

- Network isolation of affected VLANs
- All domain credentials reset
- Affected systems reimaged from golden images
- Data restored from backup (RPO: 4 hours before incident)
- Vulnerability patched across environment
- Phishing indicators blocked at email gateway

## Root Cause Analysis

**Primary Cause:** Malicious macro executed in phishing email

**Contributing Factors:**
1. Email filtering did not block novel phishing variant
2. User security training was 8 months old
3. Macro execution was not disabled by policy
4. Flat network allowed lateral movement
5. Domain admin credentials stored in LSASS

## Lessons Learned: What Worked Well

- EDR detected ransomware activity within 3 minutes
- SOC escalation was rapid and appropriate
- IR team assembled quickly, communication was clear
- Backups were intact and recovery was successful
- Stakeholder communication was timely

## Lessons Learned: Areas for Improvement

- Phishing detection should be enhanced
- User training frequency should increase
- Macros should be disabled or restricted
- Network segmentation needs improvement
- Privileged access should use PAM solution

## Action Items

| # | Action | Owner | Due Date | Status |
|---|--------|-------|----------|--------|
| 1 | Implement macro blocking via GPO | IT Ops | Feb 15 | Open |
| 2 | Deploy advanced email filtering | SecOps | Mar 1 | Open |
| 3 | Quarterly phishing simulations | Training | Ongoing | Open |
| 4 | Network segmentation project | Network | Q2 2025 | Planned |
| 5 | PAM solution deployment | IAM Team | Q2 2025 | Planned |
| 6 | Add EDR detection for Mimikatz | SecOps | Jan 31 | Complete |

## Appendices

- A: Detailed IOCs (hashes, IPs, domains)
- B: Affected system list
- C: Evidence preservation log
- D: External communication records
```

Detection Improvements:
Prevention Improvements:
Process Improvements:
Training Improvements:
Track incident metrics over time:
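As an illustration, a few timing metrics can be derived directly from an incident's timeline. The sketch below uses timestamps from the sample incident report above; the metric names and the choice of start and end points for each interval are assumptions, and organizations define these differently.

```python
from datetime import datetime


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


# Timestamps (UTC) taken from the sample report's timeline.
compromised = datetime(2025, 1, 15, 8, 22)  # malware executes
detected = datetime(2025, 1, 15, 9, 18)     # EDR alert
contained = datetime(2025, 1, 15, 13, 0)    # containment verified
recovered = datetime(2025, 1, 18, 14, 0)    # all systems back in production

metrics = {
    "time_to_detect_min": minutes_between(compromised, detected),
    "time_to_contain_min": minutes_between(detected, contained),
    "time_to_recover_min": minutes_between(detected, recovered),
}
print(metrics["time_to_detect_min"])   # 56
print(metrics["time_to_contain_min"])  # 222
```

Averaging these values across incidents and plotting them by quarter shows whether detection and response are actually improving.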
The goal of post-incident analysis is improvement, not punishment. Blaming individuals discourages reporting and honest analysis. Focus on process and controls rather than personnel failures. 'User clicked phishing email' isn't a root cause; 'controls didn't prevent or rapidly contain compromise from user action' is one you can act on. Create psychological safety for honest assessment.
Incident response transforms security detection into security action. It provides the structured processes that let organizations respond to threats effectively rather than chaotically, so that damage is minimized and recovery is swift.
What's Next:
With incident response procedures established, we'll conclude this module with Security Best Practices—the consolidated wisdom of security operations that ties together defense in depth, policy, monitoring, and incident response into a coherent operational security program.
You now understand how to structure and execute incident response from detection through lessons learned. The true measure of a security program isn't whether incidents occur—they will—but how effectively the organization responds. Preparation, clear processes, practiced teams, and commitment to improvement transform inevitable incidents into manageable events rather than organizational crises.