Design Validation

Level: Advanced

Duration: 90 mins

Topic: Design Validation

Design Summary

Telling the Story of Your Design

A system design exists in your mind and in scattered documents—component diagrams, API specifications, capacity models, failure analyses. But until you can synthesize this information into a coherent narrative, it remains fragmented knowledge rather than actionable architecture.

The design summary is the final deliverable of the design phase. It's not a formality or a documentation exercise—it's the artifact that:

Aligns stakeholders on what will be built and why
Guides implementers on how to build it correctly
Preserves decisions so future maintainers understand the rationale
Communicates trade-offs so leadership can validate alignment with business goals

Principal engineers treat the design summary as the most important document they produce. It's the contract between design and implementation—the bridge between intent and reality.

What You Will Master

By the end of this page, you will understand how to synthesize a validated system design into a compelling summary. You'll learn to structure design documents effectively, communicate trade-offs clearly, present to different audiences, and create documentation that serves as a living artifact guiding implementation and maintenance.

The Anatomy of a Design Summary

A complete design summary follows a predictable structure that answers the fundamental questions every reader asks. This structure isn't arbitrary—it reflects how engineers and stakeholders need to consume design information.

The Core Sections

Design Summary Structure
Section	Purpose	Primary Audience	Key Questions Answered
Executive Summary	High-level overview of the design	Leadership, cross-team stakeholders	What are we building? Why? What's the impact?
Problem Statement	Defines what problem is being solved	All readers	What exactly are we trying to solve?
Requirements	Functional and non-functional requirements	Product, QA, implementers	What must the system do? How well?
Architecture Overview	High-level system structure	Engineers, architects	What are the major components? How do they interact?
Detailed Design	Component specifics, data models, APIs	Implementers	How exactly should each piece be built?
Trade-off Analysis	Decisions made and alternatives considered	Senior engineers, architects	Why this approach? What did we give up?
Validation Results	How the design was verified	QA, senior engineers	How do we know this design is sound?
Risks and Mitigations	Known limitations and how to address them	Leadership, operations	What could go wrong? What are we doing about it?
Implementation Plan	How the design will be realized	Project managers, implementers	In what order? What are the dependencies?
Appendix	Supporting details, calculations, references	Deep-divers, future maintainers	Where are the details?

Layered Information Density

A good design summary is designed to be read at multiple levels:

5-minute read: Executive summary gives the essence
30-minute read: Architecture overview and trade-offs for understanding the approach
2-hour read: Complete document for implementers
Deep dive: Appendix and linked documents for specific questions

Each layer should be self-contained—a reader shouldn't need to read the detailed design to understand the architecture overview.

The One-Page Test

Can you explain your design on one page? If not, you haven't finished synthesizing it. The one-page version forces you to identify what's truly essential. Everything else is detail supporting that essence. Principal engineers often create a one-pager first, then expand—never the reverse.

Crafting the Executive Summary

The executive summary is the most read and least understood section of any design document. Most engineers write executive summaries that are too technical, too long, or that bury the key points in unnecessary context.

The Executive Summary Formula

A compelling executive summary follows this structure:

The Problem (1-2 sentences): What pain are we addressing?
The Solution (2-3 sentences): What are we building?
The Approach (2-3 sentences): How does it work at the highest level?
The Impact (1-2 sentences): What business value does this deliver?
Key Risks (1-2 sentences): What are the major unknowns or challenges?

Executive Summary Example: Payment Processing System Redesign
A well-structured executive summary

Input

Technical design for modernizing payment infrastructure

Output

**Problem**: Our current payment system cannot scale beyond 5,000 transactions per second and has 99.5% availability (43+ hours of annual downtime), causing revenue loss and customer frustration during peak sales events.

**Solution**: We propose a redesigned payment processing system built on event sourcing with multi-region active-active deployment, designed for 50,000 TPS capacity and 99.99% availability.

**Approach**: The new system replaces the monolithic payment processor with a microservices architecture using Kafka for event streaming, Cassandra for transaction storage, and Kubernetes for orchestration. Payments are processed asynchronously with real-time status updates to customers.

**Impact**: Eliminates payment failures during peak events, reduces checkout latency by 60%, and improves availability from 99.5% to 99.99%—preventing an estimated $4.2M in annual revenue loss.

**Key Risks**: Migration complexity requires a 3-month parallel operation period. Team needs training on event sourcing patterns. New external payment provider integration in progress.

Explanation

Notice: implementation detail is kept to a minimum, and technologies are named only where they give necessary context. The focus is on business impact. Executives reading this understand what problem is solved, roughly how, and what they get.
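The availability figures in the example can be checked with simple arithmetic. The following is a minimal sketch, assuming a 365-day year; the function name is illustrative and not part of any library.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def annual_downtime_hours(availability_pct: float) -> float:
    """Return the expected hours of downtime per year for an availability percentage."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


for pct in (99.5, 99.99):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% availability -> {hours:.2f} hours/year ({hours * 60:.0f} minutes)")
```

At 99.5% this gives about 43.8 hours per year, which matches the "43+ hours" quoted in the problem statement. At 99.99% the budget shrinks to roughly 53 minutes per year.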

Common Executive Summary Failures

Avoid these common failures:

•Starting with background or history (readers don't need context they already have)
•Technical jargon ("we'll use Kafka with exactly-once semantics")
•Listing features instead of outcomes ("supports webhooks" vs. "enables real-time partner integrations")
•Running past one page (if it's longer, it's not an executive summary)

Communicating Architecture

Architecture diagrams are where most design documents fail. Engineers create diagrams that are either too complex (showing every detail) or too simple (showing nothing useful). Effective architecture communication uses layered diagrams that progressively reveal detail.

The C4 Model for Architecture Diagrams

The C4 model provides a proven framework for layered architecture documentation:

C4 Diagram Levels
Level	Name	Shows	Audience	When to Include
1	Context	System and its external interactions	Everyone, including non-technical	Always
2	Container	High-level technology choices (apps, DBs, queues)	Technical stakeholders	Always
3	Component	Components within each container	Development team	For complex containers
4	Code	Classes, modules, implementation details	Individual developers	Rarely in design docs

Architecture Diagram Best Practices

Diagram Anti-patterns

•Too many boxes — More than 7±2 elements overwhelms
•Unlabeled arrows — What data? What protocol?
•Inconsistent abstraction — Mixing high and low level
•Missing context — What's external? What's owned?
•Static captures — Out of date immediately

Diagram Best Practices

•One level per diagram — Don't mix abstraction levels
•Label all connections — Protocol, data type, direction
•Show flow, not just structure — Numbered request flow
•Highlight key elements — Color-code by ownership/risk
•Generate from code — Living diagrams stay accurate

Data Flow Diagrams

Beyond structural diagrams, data flow diagrams are essential for understanding how information moves through the system. For each major capability, show:

Entry point: Where does data enter the system?
Transformations: How is data processed at each step?
Storage: Where is data persisted?
Exit point: Where does data leave the system?
Error paths: What happens when processing fails?
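The five elements above can be captured as data and rendered as diagram text, which also keeps the diagram in version control. The following is a minimal sketch that emits Mermaid flowchart syntax; the `Edge` class, the `to_mermaid` helper, and every node name in the order flow are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str             # node the data leaves
    target: str             # node the data arrives at
    label: str              # what moves over the connection (protocol, payload)
    is_error: bool = False  # error paths are drawn as dotted arrows


def to_mermaid(edges: list[Edge]) -> str:
    """Render edges as Mermaid flowchart text, one labeled arrow per edge."""
    lines = ["flowchart LR"]
    for edge in edges:
        arrow = "-.->" if edge.is_error else "-->"
        lines.append(f"    {edge.source} {arrow}|{edge.label}| {edge.target}")
    return "\n".join(lines)


# Hypothetical order flow: entry, transformation, storage, exit, and error path.
ORDER_FLOW = [
    Edge("Client", "OrderAPI", "1 HTTPS order request"),
    Edge("OrderAPI", "Validator", "2 order payload"),
    Edge("Validator", "OrderStore", "3 validated order"),
    Edge("OrderStore", "Notifier", "4 order created event"),
    Edge("Validator", "DeadLetterQueue", "rejected order", is_error=True),
]

print(to_mermaid(ORDER_FLOW))
```

Numbering the labels shows the request flow rather than only the structure, and the dotted arrow makes the error path visible at a glance. Render the output in a Mermaid viewer to confirm it displays as intended.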

The Diagram as Communication Tool

A diagram succeeds when a new team member can look at it and understand the system's structure without additional explanation. Test your diagrams by showing them to someone unfamiliar with the project and asking them to explain what they see.

Documenting Trade-offs

Every design involves trade-offs. Documenting these trade-offs serves two purposes: it demonstrates that alternatives were considered (building confidence in the design), and it preserves the rationale for future maintainers who might question past decisions.

The Architecture Decision Record (ADR)

ADRs are a standardized format for documenting significant design decisions:

adr-template.md
# ADR-007: Event Sourcing for Order Management
 
## Status
Accepted
 
## Context
The order management system requires:
- Complete audit trail of all order state changes
- Ability to reconstruct order state at any point in time
- Support for complex workflows with multiple state transitions
- Integration with analytics systems requiring event data
 
Current state: Orders stored as mutable records with limited change history.
 
## Decision
We will implement event sourcing for the Order aggregate:
- All state changes recorded as immutable events
- Current state computed by replaying events
- Event store using Apache Kafka with long-term retention
- CQRS pattern with separate read models for queries
 
## Alternatives Considered
 
### Alternative 1: Enhanced Audit Logging
- Approach: Add audit log table alongside mutable order records
- Pros: Simpler implementation, familiar pattern
- Cons: Dual-write inconsistency; the audit log can diverge from the actual order state
- Reason rejected: Cannot guarantee consistency between order state and audit log
 
### Alternative 2: Temporal Database (e.g., system-versioned tables, native in SQL Server and MariaDB, available via extension in PostgreSQL)
- Approach: Use database-native versioning for historical state
- Pros: Database handles complexity, SQL interface maintained
- Cons: Vendor lock-in, limited event-driven integration
- Reason rejected: Doesn't provide event stream for downstream systems
 
## Consequences
 
### Positive
- Complete, guaranteed audit trail
- Natural fit for event-driven integration
- Enables time-travel debugging ("show me order state at 3pm Tuesday")
- Supports event replay for analytics reprocessing
 
### Negative  
- Team requires event sourcing training
- Read model complexity (eventual consistency)
- Storage growth requires event archival strategy
- More complex debugging (must understand event replay)
 
## Implementation Notes
- See Event Modeling document for event schema
- Estimated 6-week implementation timeline
- Training scheduled for sprint 23
 
## Related Decisions
- ADR-005: Kafka as Event Backbone
- ADR-008: CQRS Read Model Strategy
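To make the ADR's "current state computed by replaying events" concrete, here is a minimal sketch of replay with a point-in-time cutoff. It is an illustration under stated assumptions, not the design from the ADR: the event kinds, the `STATUS_AFTER` table, and the in-memory event list are all hypothetical, and a real event store would normally return events already ordered.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class OrderEvent:
    order_id: str
    kind: str              # hypothetical event names, e.g. "OrderPlaced"
    occurred_at: datetime


# Hypothetical transition table: event kind -> resulting order status.
STATUS_AFTER = {
    "OrderPlaced": "PLACED",
    "OrderPaid": "PAID",
    "OrderShipped": "SHIPPED",
    "OrderCancelled": "CANCELLED",
}


def replay(events: list[OrderEvent], as_of: Optional[datetime] = None) -> Optional[str]:
    """Fold events in time order and return the status as of `as_of` (None = latest)."""
    status = None
    for event in sorted(events, key=lambda e: e.occurred_at):
        if as_of is not None and event.occurred_at > as_of:
            break  # events after the cutoff do not affect the answer
        status = STATUS_AFTER[event.kind]
    return status


events = [
    OrderEvent("o-1", "OrderPlaced", datetime(2024, 1, 2, 9, 0)),
    OrderEvent("o-1", "OrderPaid", datetime(2024, 1, 2, 9, 5)),
    OrderEvent("o-1", "OrderShipped", datetime(2024, 1, 3, 14, 0)),
]

print(replay(events))                                    # latest state
print(replay(events, as_of=datetime(2024, 1, 2, 15, 0)))  # state at 3pm Tuesday
```

Replay visits every event up to the cutoff, so the cost of rebuilding state grows with the length of the event history. That is the trade-off behind the storage-growth and archival concerns listed under the negative consequences.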

Trade-off Categories to Document

Ensure your design summary addresses trade-offs in these key areas:

Trade-off Categories

•Consistency vs. Availability — CAP theorem position, consistency model choices
•Performance vs. Simplicity — Complex optimizations vs. maintainable code
•Build vs. Buy — Custom development vs. third-party solutions
•Coupling vs. Cohesion — Microservices vs. monolith, shared libraries
•Cost vs. Capability — Infrastructure spending vs. feature richness
•Time-to-market vs. Technical debt — Shortcuts taken and their planned remediation
•Flexibility vs. Optimization — Generic solutions vs. specialized performance

The 'Why Not' Question

For every major decision, anticipate the 'Why not X?' question and document the answer. If you chose PostgreSQL, document why not MySQL, MongoDB, or Aurora. If you chose Kubernetes, document why not ECS, Nomad, or bare VMs. These 'Why not' answers are often more illuminating than the 'Why' answers.

Documenting Validation Results

The validation work from earlier in this module—requirements verification, bottleneck analysis, failure scenario testing, edge case handling—must be summarized in the design document. This builds confidence that the design is sound and identifies remaining risks.

Validation Summary Structure

Validation Results Documentation
Validation Type	What to Include	Format	Audience Interest
Requirements Verification	Coverage matrix, any gaps identified	Table showing requirement → component mapping	Product managers, QA
Capacity Analysis	Load projections, bottleneck identification	Capacity model spreadsheet, summary findings	Operations, infrastructure
Failure Analysis	FMEA summary, top risks with mitigations	Risk register with RPN scores	Operations, leadership
Edge Case Analysis	Categories explored, behaviors defined	Summary list, detailed specs in appendix	Developers, QA
Security Review	Threat model, security controls mapped	Threat matrix, control coverage	Security team, compliance
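The RPN (risk priority number) used in an FMEA risk register is conventionally the product of severity, occurrence, and detection scores, each usually rated from 1 to 10. The sketch below shows how a register might be ranked; the failure modes and their scores are invented for illustration and do not come from any real analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (always detected) .. 10 (effectively undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def top_risks(modes: list[FailureMode], limit: int = 3) -> list[FailureMode]:
    """Return the highest-RPN failure modes, highest first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)[:limit]


# Hypothetical register entries with made-up scores.
register = [
    FailureMode("Payment provider outage", severity=8, occurrence=4, detection=2),
    FailureMode("Replication lag beyond threshold", severity=5, occurrence=6, detection=3),
    FailureMode("Event store disk exhaustion", severity=9, occurrence=2, detection=6),
]

for mode in top_risks(register, limit=2):
    print(f"{mode.rpn:4d}  {mode.name}")
```

Ranking by RPN gives the summary a defensible order for the "top risks with mitigations" list, while the full register can stay in the appendix.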

The Residual Risk Statement

No design is perfect. Honest documentation includes a clear statement of remaining risks—things that could still go wrong despite the validation work performed:

Residual Risk Statement Example
What risks remain after validation

Input

Order Processing System Design

Output

**Accepted Risks:**
1. Third-party payment provider has 99.9% SLA; we cannot guarantee higher availability for payment-dependent operations
2. Event replay time for rebuilding read models is O(total events); recovery from catastrophic failure may take 2-4 hours
3. Celebrity user scenario (100K+ followers) not fully load-tested; assumed manageable via queue-based fan-out

**Mitigated but Monitorable Risks:**
1. Database replication lag monitored; alert if lag exceeds 500ms (see runbook #DR-004)
2. Memory pressure from large events monitored; circuit breaker triggers if heap exceeds 80%

**Deferred Risks:**
1. Multi-region active-active deployment deferred to Phase 2; current design is single-region with backup
2. Real-time dashboard performance with 10K concurrent users untested; Phase 1 limits to 1K concurrent

Explanation

This statement is honest about limitations. Stakeholders can make informed decisions about acceptable risk levels.
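The first accepted risk follows from how availability composes across hard dependencies. As a rough sketch, assuming failures are independent and every dependency is required to serve a request, the end-to-end availability is the product of the individual availabilities. The 99.99% figure for the internal system below is an assumed input, not a value from the example.

```python
from math import prod


def serial_availability(*availabilities_pct: float) -> float:
    """Availability (in percent) of a request path that needs every listed dependency."""
    return prod(a / 100 for a in availabilities_pct) * 100


# Assumed: our own service at 99.99%, third-party payment provider at 99.9%.
combined = serial_availability(99.99, 99.9)
print(f"Payment-dependent path: {combined:.3f}%")
```

The combined figure lands just below 99.9%, so no amount of internal redundancy lifts payment-dependent operations above the provider's SLA unless the dependency itself is made optional or duplicated.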

Never Hide Known Risks

Undocumented risks don't disappear—they become surprises. A risk discovered during implementation is frustrating. A risk discovered in production is a crisis. A risk documented in the design summary is professional engineering.

Presenting to Different Audiences

A design document serves multiple audiences with different interests, technical depths, and concerns. Effective presentation adapts to each audience while maintaining a single source of truth.

Audience-Tailored Presentations

Adapting Design Presentation by Audience
Audience	Primary Concern	Focus On	Avoid	Time Allocation
Executive Leadership	Business impact, risk, cost	Problem, solution, impact, timeline, risks	Technical details, implementation specifics	15 minutes
Product Management	User value, feature coverage, timeline	Requirements mapping, MVP scope, phasing	Infrastructure details	30 minutes
Architecture Review Board	Technical soundness, patterns, standards	Trade-offs, scalability, consistency model	Business justification (already approved)	60 minutes
Development Team	How to implement, API contracts, data models	Detailed design, interfaces, component responsibilities	Strategic justification	90+ minutes
Operations/SRE	How to run, monitor, troubleshoot	Deployment, monitoring, runbooks, failure modes	Code-level details	45 minutes
Security Team	Threat landscape, controls, compliance	Security controls, data flows, encryption	Performance optimizations	45 minutes

The Presentation Pyramid

Structure presentations from the top down:

Start with the answer: 'We're building X to solve Y'
Support with key points: 3-5 most important aspects
Offer detail on request: 'I can go deeper on any of these'

This lets audiences self-select their depth. Executives stop at step 1. Architects engage at step 2. Implementers dive into step 3.

Presentation Best Practices

•Start with context the audience doesn't have — What's new since they last engaged?
•Anticipate questions and address them proactively — 'You might wonder about X; here's the answer'
•Use visuals appropriate to the audience — Executives need different diagrams than developers
•Leave time for questions — At least 30% of the meeting
•Document decisions and action items — Send summary within 24 hours
•Be willing to say 'I don't know' — Follow up rather than guess

The Architecture Review

For architecture reviews, prepare for adversarial questioning. Senior engineers will probe your weakest points. This isn't hostility—it's quality assurance. Prepare by identifying your design's three weakest points and having thoughtful responses ready. If reviewers find weaknesses you haven't identified, your preparation was incomplete.

Living Documentation

A design document is only valuable if it's maintained. Stale documentation is worse than no documentation—it misleads. Living documentation requires intentional practices to keep it current.

The Documentation Decay Problem

Without active maintenance, design documents become outdated:

Implementation diverges from design (pragmatic adjustments not backported)
New team members don't know the document exists
The document lives in an inaccessible location
Updates require significant effort (complex formats, approvals)

Making Documentation Live

Living Documentation Practices

•Store with code — Design docs in the same repo as implementation, version-controlled together
•Generate diagrams from code — Tools like Structurizr, PlantUML, or Mermaid keep diagrams in sync
•Link to executable specifications — API docs generated from OpenAPI, tests as behavioral documentation
•Include in code review — Changes to architecture require corresponding doc updates
•Regular review cadence — Quarterly review to identify drift
•Clear ownership — Designated maintainer for each document
•Lightweight format — Markdown or similar; low friction to update
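Storing documents as Markdown next to the code makes lightweight automated checks possible during code review. The following is a minimal sketch of one such check, which reports the top-level ADR sections that are missing; the `REQUIRED_SECTIONS` list mirrors the sample ADR on this page and is an assumption, not a standard.

```python
import re

# Assumed required headings, taken from the sample ADR's second-level sections.
REQUIRED_SECTIONS = (
    "Status",
    "Context",
    "Decision",
    "Alternatives Considered",
    "Consequences",
)


def missing_sections(adr_markdown: str) -> list[str]:
    """Return required '## ' headings that do not appear in the ADR text."""
    headings = {
        match.group(1)
        for match in re.finditer(r"^##\s+(.+?)\s*$", adr_markdown, flags=re.MULTILINE)
    }
    return [section for section in REQUIRED_SECTIONS if section not in headings]


draft = """# ADR-042: Example decision

## Status
Proposed

## Context
Why the decision is needed.

## Decision
What was decided.
"""

print(missing_sections(draft))
```

A check like this could run in continuous integration against every file under /docs/adr/, failing the build when a section is absent. It verifies structure only; the quality of the rationale still needs a human reviewer.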

The Documentation Stack

Recommended Documentation Artifacts
Document Type	Purpose	Location	Update Trigger
README.md	Quick start, project overview	Repo root	Every significant change
ARCHITECTURE.md	High-level design summary	Repo root or /docs	Architecture changes
ADRs	Individual decisions and rationale	/docs/adr/	New significant decisions
API Documentation	Contract for consumers	Generated from spec	API changes (automated)
Runbooks	Operational procedures	Operations wiki + repo	Incident learnings
Incident Reports	What went wrong and why	Operations wiki	Post-mortem completion

The Two-Week Test

If a new engineer joins the team and can understand the system architecture within two weeks using only the documentation (plus asking clarifying questions), your documentation is adequate. If they need heroic tribal knowledge transfer, your documentation has failed. Test this by actually onboarding new team members and measuring what they can learn independently.

The Complete Design Summary Checklist

Before finalizing a design summary, verify completeness against this comprehensive checklist. A design summary that misses any of these elements is incomplete.

Design Summary Completeness Checklist

•☐ Executive Summary — Problem, solution, impact in one page
•☐ Problem Statement — Clear definition of what's being solved
•☐ Functional Requirements — Complete list with priorities
•☐ Non-Functional Requirements — Quantified quality attributes
•☐ Context Diagram — System and its external interactions
•☐ Container Diagram — Major components and technology choices
•☐ Data Model — Key entities and relationships
•☐ API Contracts — External interfaces specified
•☐ Data Flow Diagrams — Request flows for major use cases
•☐ Trade-off Analysis — Key decisions with alternatives considered
•☐ Capacity Model — Load projections and bottleneck analysis
•☐ Failure Analysis — FMEA summary and mitigations
•☐ Security Considerations — Threat model and controls
•☐ Operational Requirements — Monitoring, alerting, deployment
•☐ Residual Risks — Known limitations and accepted risks
•☐ Implementation Plan — Phases, milestones, dependencies
•☐ Glossary — Terminology definitions for clarity
•☐ References — Links to related documents and resources

The Final Review Questions

Before declaring a design complete, answer these questions:

Final Review Questions

•Can a new team member understand the system from this document? — Test with actual new team members
•Are all requirements clearly satisfied in the design? — Trace each requirement to components
•Are the major trade-offs documented? — Future maintainers will ask 'why'
•Are the risks and limitations honest? — No overselling, no hidden problems
•Is the implementation path clear? — Teams can start work without additional design
•Is the document maintainable? — Can it be updated without heroic effort?
•Has the design been reviewed by critical stakeholders? — Architecture, security, operations
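Tracing each requirement to components can be done mechanically once the mapping is written down. The sketch below assumes a simple dictionary from requirement ID to component names; the IDs and component names are hypothetical.

```python
def untraced_requirements(requirements: list[str], trace: dict[str, list[str]]) -> list[str]:
    """Return requirement IDs that map to no component (gaps in coverage)."""
    return [req for req in requirements if not trace.get(req)]


def orphan_components(components: list[str], trace: dict[str, list[str]]) -> list[str]:
    """Return components that no requirement traces to (possible over-design)."""
    used = {component for mapped in trace.values() for component in mapped}
    return sorted(set(components) - used)


# Hypothetical requirement IDs and component names.
requirements = ["FR-1", "FR-2", "NFR-1"]
components = ["OrderAPI", "OrderStore", "Notifier", "ReportingJob"]
trace = {
    "FR-1": ["OrderAPI", "OrderStore"],
    "FR-2": ["Notifier"],
    "NFR-1": [],  # no component yet addresses this requirement
}

print("Untraced:", untraced_requirements(requirements, trace))
print("Orphans:", orphan_components(components, trace))
```

Untraced requirements are gaps to close before sign-off. Orphan components are not necessarily wrong, but each one deserves a sentence explaining which need it serves.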

Summary: Design Summary and Module Conclusion

The design summary is where all your validation work comes together into a coherent, communicable artifact. It's not just documentation—it's the contract between design intent and implementation reality.

Design Summary Key Takeaways

Key Takeaways

•Structure for multiple audiences — Executive summary to detailed design, each self-contained
•Lead with impact — Problem, solution, value before technical details
•Layered diagrams tell the architecture story — C4 model provides progressive detail
•Document trade-offs explicitly — ADRs preserve decision rationale
•Include validation results — Builds confidence and surfaces risks
•Adapt presentation to audience — Different concerns, different emphasis
•Plan for living documentation — Stored with code, low friction to update
•Verify completeness before finalizing — Use the checklist

Module 7: Design Validation - Complete

You have now completed the Design Validation module. You've learned to:

Verify requirements — Ensuring every requirement traces to implementing components
Analyze bottlenecks — Predicting capacity constraints before they manifest
Test failure scenarios — Validating resilience through systematic analysis
Handle edge cases — Identifying and addressing unusual conditions
Synthesize the design summary — Communicating validated designs effectively

These skills transform competent system designers into principal engineers who produce designs that don't just look good—they actually work in production.

Module Complete: Design Validation

Congratulations! You've mastered the Design Validation phase of the System Design Framework. You can now systematically verify requirements, analyze bottlenecks, test failure scenarios, handle edge cases, and synthesize compelling design summaries. These skills are the hallmark of senior system designers who build systems that survive contact with production reality.

5 / 5

Loading learning content...

System Design (HLD)Design Validation

Design Validation

LevelAdvanced

Duration90 mins

TopicDesign Validation

5 / 5

Design Summary

Telling the Story of Your Design

The design summary is the final deliverable of the design phase. It's not a formality or a documentation exercise—it's the artifact that:

Aligns stakeholders on what will be built and why
Guides implementers on how to build it correctly
Preserves decisions so future maintainers understand the rationale
Communicates trade-offs so leadership can validate alignment with business goals

Principal engineers treat the design summary as the most important document they produce. It's the contract between design and implementation—the bridge between intent and reality.

What You Will Master

The Anatomy of a Design Summary

The Core Sections

Design Summary Structure
Section	Purpose	Primary Audience	Key Questions Answered
Executive Summary	High-level overview of the design	Leadership, cross-team stakeholders	What are we building? Why? What's the impact?
Problem Statement	Defines what problem is being solved	All readers	What exactly are we trying to solve?
Requirements	Functional and non-functional requirements	Product, QA, implementers	What must the system do? How well?
Architecture Overview	High-level system structure	Engineers, architects	What are the major components? How do they interact?
Detailed Design	Component specifics, data models, APIs	Implementers	How exactly should each piece be built?
Trade-off Analysis	Decisions made and alternatives considered	Senior engineers, architects	Why this approach? What did we give up?
Validation Results	How the design was verified	QA, senior engineers	How do we know this design is sound?
Risks and Mitigations	Known limitations and how to address them	Leadership, operations	What could go wrong? What are we doing about it?
Implementation Plan	How the design will be realized	Project managers, implementers	In what order? What are the dependencies?
Appendix	Supporting details, calculations, references	Deep-divers, future maintainers	Where are the details?

Layered Information Density

A good design summary is designed to be read at multiple levels:

5-minute read: Executive summary gives the essence
30-minute read: Architecture overview and trade-offs for understanding the approach
2-hour read: Complete document for implementers
Deep dive: Appendix and linked documents for specific questions

Each layer should be self-contained—a reader shouldn't need to read the detailed design to understand the architecture overview.

The One-Page Test

Crafting the Executive Summary

The Executive Summary Formula

A compelling executive summary follows this structure:

The Problem (1-2 sentences): What pain are we addressing?
The Solution (2-3 sentences): What are we building?
The Approach (2-3 sentences): How does it work at the highest level?
The Impact (1-2 sentences): What business value does this deliver?
Key Risks (1-2 sentences): What are the major unknowns or challenges?

Executive Summary Example: Payment Processing System RedesignA well-structured executive summary

Input

Technical design for modernizing payment infrastructure

Output

**Problem**: Our current payment system cannot scale beyond 5,000 transactions per second and has a 99.5% availability (43+ hours annual downtime), causing revenue loss and customer frustration during peak sales events.

**Solution**: We propose a redesigned payment processing system built on event sourcing with multi-region active-active deployment, designed for 50,000 TPS capacity and 99.99% availability.

**Approach**: The new system replaces the monolithic payment processor with a microservices architecture using Kafka for event streaming, Cassandra for transaction storage, and Kubernetes for orchestration. Payments are processed asynchronously with real-time status updates to customers.

**Impact**: Eliminates payment failures during peak events, reduces checkout latency by 60%, and improves availability from 99.5% to 99.99%—preventing an estimated $4.2M in annual revenue loss.

**Key Risks**: Migration complexity requires a 3-month parallel operation period. Team needs training on event sourcing patterns. New external payment provider integration in progress.

Explanation

Common Executive Summary Failures

Communicating Architecture

The C4 Model for Architecture Diagrams

The C4 model provides a proven framework for layered architecture documentation:

C4 Diagram Levels
Level	Name	Shows	Audience	When to Include
1	Context	System and its external interactions	Everyone, including non-technical	Always
2	Container	High-level technology choices (apps, DBs, queues)	Technical stakeholders	Always
3	Component	Components within each container	Development team	For complex containers
4	Code	Classes, modules, implementation details	Individual developers	Rarely in design docs

Architecture Diagram Best Practices

Diagram Anti-patterns

•Too many boxes — More than 7±2 elements overwhelms
•Unlabeled arrows — What data? What protocol?
•Inconsistent abstraction — Mixing high and low level
•Missing context — What's external? What's owned?
•Static captures — Out of date immediately

Diagram Best Practices

•One level per diagram — Don't mix abstraction levels
•Label all connections — Protocol, data type, direction
•Show flow, not just structure — Numbered request flow
•Highlight key elements — Color-code by ownership/risk
•Generate from code — Living diagrams stay accurate

Data Flow Diagrams

Beyond structural diagrams, data flow diagrams are essential for understanding how information moves through the system. For each major capability, show:

Entry point: Where does data enter the system?
Transformations: How is data processed at each step?
Storage: Where is data persisted?
Exit point: Where does data leave the system?
Error paths: What happens when processing fails?

Converting Mermaid diagram...

The Diagram as Communication Tool

Documenting Trade-offs

The Architecture Decision Record (ADR)

ADRs are a standardized format for documenting significant design decisions:

adr-template.md
Markdown
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
# ADR-007: Event Sourcing for Order Management
 
## Status
Accepted
 
## Context
The order management system requires:
- Complete audit trail of all order state changes
- Ability to reconstruct order state at any point in time
- Support for complex workflows with multiple state transitions
- Integration with analytics systems requiring event data
 
Current state: Orders stored as mutable records with limited change history.
 
## Decision
We will implement event sourcing for the Order aggregate:
- All state changes recorded as immutable events
- Current state computed by replaying events
- Event store using Apache Kafka with long-term retention
- CQRS pattern with separate read models for queries
 
## Alternatives Considered
 
### Alternative 1: Enhanced Audit Logging
- Approach: Add audit log table alongside mutable order records
- Pros: Simpler implementation, familiar pattern
- Cons: Dual-write instability, logs can diverge from reality
- Reason rejected: Cannot guarantee consistency between order state and audit log
 
### Alternative 2: Temporal Database (e.g., PostgreSQL with temporal tables)
- Approach: Use database-native versioning for historical state
- Pros: Database handles complexity, SQL interface maintained
- Cons: Vendor lock-in, limited event-driven integration
- Reason rejected: Doesn't provide event stream for downstream systems
 
## Consequences
 
### Positive
- Complete, guaranteed audit trail
- Natural fit for event-driven integration
- Enables time-travel debugging ("show me order state at 3pm Tuesday")
- Supports event replay for analytics reprocessing
 
### Negative  
- Team requires event sourcing training
- Read model complexity (eventual consistency)
- Storage growth requires event archival strategy
- More complex debugging (must understand event replay)
 
## Implementation Notes
- See Event Modeling document for event schema
- Estimated 6-week implementation timeline
- Training scheduled for sprint 23
 
## Related Decisions
- ADR-005: Kafka as Event Backbone
- ADR-008: CQRS Read Model Strategy

Trade-off Categories to Document

Ensure your design summary addresses trade-offs in these key areas:

Trade-off Categories

•Consistency vs. Availability — CAP theorem position, consistency model choices
•Performance vs. Simplicity — Complex optimizations vs. maintainable code
•Build vs. Buy — Custom development vs. third-party solutions
•Coupling vs. Cohesion — Microservices vs. monolith, shared libraries
•Cost vs. Capability — Infrastructure spending vs. feature richness
•Time-to-market vs. Technical debt — Shortcuts taken and their planned remediation
•Flexibility vs. Optimization — Generic solutions vs. specialized performance

The 'Why Not' Question

Documenting Validation Results

Validation Summary Structure

Validation Results Documentation
Validation Type	What to Include	Format	Audience Interest
Requirements Verification	Coverage matrix, any gaps identified	Table showing requirement → component mapping	Product managers, QA
Capacity Analysis	Load projections, bottleneck identification	Capacity model spreadsheet, summary findings	Operations, infrastructure
Failure Analysis	FMEA summary, top risks with mitigations	Risk register with RPN scores	Operations, leadership
Edge Case Analysis	Categories explored, behaviors defined	Summary list, detailed specs in appendix	Developers, QA
Security Review	Threat model, security controls mapped	Threat matrix, control coverage	Security team, compliance
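
The Risk Priority Number (RPN) used in an FMEA risk register is conventionally the product of three ratings: severity, occurrence, and detection, each typically scored from 1 to 10. The sketch below shows one way to compute and rank RPN scores for the summary. The `FailureMode` class, the `top_risks` helper, and the sample entries are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (almost certain)
    detection: int   # 1 (always detected) .. 10 (effectively undetectable)

    @property
    def rpn(self) -> int:
        # Risk Priority Number: product of the three ratings.
        return self.severity * self.occurrence * self.detection


def top_risks(modes: list[FailureMode], limit: int = 3) -> list[FailureMode]:
    """Return the highest-RPN failure modes, highest first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)[:limit]


# Hypothetical register entries; the ratings are made up for the example.
register = [
    FailureMode("Read model lag", severity=4, occurrence=6, detection=2),
    FailureMode("Payment provider timeout", severity=8, occurrence=4, detection=3),
    FailureMode("Event store disk full", severity=9, occurrence=2, detection=2),
]

for mode in top_risks(register):
    print(f"{mode.rpn:>4}  {mode.name}")
```

Listing the ratings alongside the score, rather than the score alone, lets reviewers challenge an individual rating instead of the whole ranking.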

The Residual Risk Statement

No design is perfect. Honest documentation includes a clear statement of remaining risks—things that could still go wrong despite the validation work performed:

Residual Risk Statement Example: What risks remain after validation

Input

Order Processing System Design

Output

**Accepted Risks:**
1. Third-party payment provider has 99.9% SLA; we cannot guarantee higher availability for payment-dependent operations
2. Event replay time for rebuilding read models is O(total events); recovery from catastrophic failure may take 2-4 hours
3. Celebrity user scenario (100K+ followers) not fully load-tested; assumed manageable via queue-based fan-out

**Mitigated but Monitorable Risks:**
1. Database replication lag monitored; alert if lag exceeds 500ms (see runbook #DR-004)
2. Memory pressure from large events monitored; circuit breaker triggers if heap exceeds 80%

**Deferred Risks:**
1. Multi-region active-active deployment deferred to Phase 2; current design is single-region with backup
2. Real-time dashboard performance with 10K concurrent users untested; Phase 1 limits to 1K concurrent

Explanation

This statement is honest about limitations. Stakeholders can make informed decisions about acceptable risk levels.
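
Figures such as an SLA percentage or a recovery window are easier to defend when the arithmetic behind them is visible. The following is a minimal sketch of that arithmetic, assuming a 30-day month, independent dependency failures, and a constant replay throughput. The function names, the 99.95% internal availability, the event count, and the throughput are all hypothetical inputs, not values from the design above.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability: float,
                             period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - availability) * period_minutes


def serial_availability(*availabilities: float) -> float:
    """Availability of a path that needs every dependency up.

    Assumes failures are independent, which is a simplification.
    """
    result = 1.0
    for value in availabilities:
        result *= value
    return result


def replay_hours(total_events: int, events_per_second: float) -> float:
    """Time to replay all events at a constant throughput (O(total events))."""
    return total_events / events_per_second / 3600


# A 99.9% provider SLA over a 30-day month.
print(f"{allowed_downtime_minutes(0.999):.1f} minutes of downtime per month")

# Hypothetical: our own service at 99.95% behind the 99.9% provider.
print(f"{serial_availability(0.999, 0.9995):.5f} combined availability")

# Hypothetical: 500 million events replayed at 50,000 events per second.
print(f"{replay_hours(500_000_000, 50_000):.2f} hours to rebuild read models")
```

The combined availability is always lower than the weakest dependency, which is why the payment provider's SLA caps what the design can promise for payment-dependent operations.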

Never Hide Known Risks

Presenting to Different Audiences

A design document serves multiple audiences with different interests, technical depths, and concerns. Effective presentation adapts to each audience while maintaining a single source of truth.

Audience-Tailored Presentations

Adapting Design Presentation by Audience
Audience	Primary Concern	Focus On	Avoid	Time Allocation
Executive Leadership	Business impact, risk, cost	Problem, solution, impact, timeline, risks	Technical details, implementation specifics	15 minutes
Product Management	User value, feature coverage, timeline	Requirements mapping, MVP scope, phasing	Infrastructure details	30 minutes
Architecture Review Board	Technical soundness, patterns, standards	Trade-offs, scalability, consistency model	Business justification (already approved)	60 minutes
Development Team	How to implement, API contracts, data models	Detailed design, interfaces, component responsibilities	Strategic justification	90+ minutes
Operations/SRE	How to run, monitor, troubleshoot	Deployment, monitoring, runbooks, failure modes	Code-level details	45 minutes
Security Team	Threat landscape, controls, compliance	Security controls, data flows, encryption	Performance optimizations	45 minutes

The Presentation Pyramid

Structure presentations from the top down:

1. Start with the answer: 'We're building X to solve Y'
2. Support with key points: the 3-5 most important aspects
3. Offer detail on request: 'I can go deeper on any of these'

This lets audiences self-select their depth. Executives stop at step 1. Architects engage at step 2. Implementers dive into step 3.

Presentation Best Practices

•Start with context the audience doesn't have — What's new since they last engaged?
•Anticipate questions and address them proactively — 'You might wonder about X; here's the answer'
•Use visuals appropriate to the audience — Executives need different diagrams than developers
•Leave time for questions — At least 30% of the meeting
•Document decisions and action items — Send summary within 24 hours
•Be willing to say 'I don't know' — Follow up rather than guess

The Architecture Review

Living Documentation

A design document is only valuable if it's maintained. Stale documentation is worse than no documentation—it misleads. Living documentation requires intentional practices to keep it current.

The Documentation Decay Problem

Without active maintenance, design documents drift out of date and fall out of use. Common causes:

•Implementation diverges from design (pragmatic adjustments are never backported)
•New team members don't know the document exists
•The document lives in an inaccessible location
•Updates require significant effort (complex formats, approvals)

Making Documentation Live

Living Documentation Practices

•Store with code — Design docs in the same repo as implementation, version-controlled together
•Generate diagrams from code — Tools like Structurizr, PlantUML, or Mermaid keep diagrams in sync
•Link to executable specifications — API docs generated from OpenAPI, tests as behavioral documentation
•Include in code review — Changes to architecture require corresponding doc updates
•Regular review cadence — Quarterly review to identify drift
•Clear ownership — Designated maintainer for each document
•Lightweight format — Markdown or similar; low friction to update
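
As an illustration of generating diagrams from code, the sketch below renders a component dependency map as a Mermaid flowchart definition. The `to_mermaid` function and the component names are hypothetical; in a real project the dependency map might be derived from service manifests or import graphs rather than written by hand.

```python
def to_mermaid(dependencies: dict[str, list[str]]) -> str:
    """Render a component dependency map as a Mermaid flowchart definition.

    Component names are used directly as node IDs, so they should be
    simple alphanumeric identifiers without spaces.
    """
    lines = ["flowchart LR"]
    for source in sorted(dependencies):
        for target in sorted(dependencies[source]):
            lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Hypothetical components for illustration.
components = {
    "OrderAPI": ["EventStore", "PaymentGateway"],
    "Projector": ["EventStore", "ReadModel"],
}

print(to_mermaid(components))
```

Sorting the sources and targets keeps the output stable between runs, so a regenerated diagram only produces a diff when the dependencies actually change.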

The Documentation Stack

Recommended Documentation Artifacts
Document Type	Purpose	Location	Update Trigger
README.md	Quick start, project overview	Repo root	Every significant change
ARCHITECTURE.md	High-level design summary	Repo root or /docs	Architecture changes
ADRs	Individual decisions and rationale	/docs/adr/	New significant decisions
API Documentation	Contract for consumers	Generated from spec	API changes (automated)
Runbooks	Operational procedures	Operations wiki + repo	Incident learnings
Incident Reports	What went wrong and why	Operations wiki	Post-mortem completion
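
The update triggers in this table can be partly enforced during code review. The sketch below flags a change set that touches architecture-sensitive paths without touching any design document. The path patterns and function names are assumptions chosen for the example; the list of changed files would typically come from the version control system, for instance the output of `git diff --name-only`.

```python
from fnmatch import fnmatchcase

# Hypothetical path conventions; adjust to the repository's actual layout.
ARCHITECTURE_PATTERNS = ["src/*/api/*", "infra/*", "schemas/*"]
DOC_PATTERNS = ["ARCHITECTURE.md", "docs/adr/*"]


def matches_any(path: str, patterns: list[str]) -> bool:
    """True if the path matches at least one glob pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def needs_doc_update(changed_files: list[str]) -> bool:
    """True if architecture paths changed but no design document did."""
    touches_architecture = any(
        matches_any(path, ARCHITECTURE_PATTERNS) for path in changed_files
    )
    touches_docs = any(matches_any(path, DOC_PATTERNS) for path in changed_files)
    return touches_architecture and not touches_docs


# Hypothetical change sets.
print(needs_doc_update(["infra/terraform/main.tf"]))
print(needs_doc_update(["infra/terraform/main.tf", "docs/adr/0012-new-queue.md"]))
```

A check like this cannot judge whether the document update is adequate. It only prompts the reviewer to ask the question.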

The Two-Week Test

The Complete Design Summary Checklist

Before finalizing a design summary, verify completeness against this comprehensive checklist. A design summary that misses any of these elements is incomplete.

Design Summary Completeness Checklist

•☐ Executive Summary — Problem, solution, impact in one page
•☐ Problem Statement — Clear definition of what's being solved
•☐ Functional Requirements — Complete list with priorities
•☐ Non-Functional Requirements — Quantified quality attributes
•☐ Context Diagram — System and its external interactions
•☐ Container Diagram — Major components and technology choices
•☐ Data Model — Key entities and relationships
•☐ API Contracts — External interfaces specified
•☐ Data Flow Diagrams — Request flows for major use cases
•☐ Trade-off Analysis — Key decisions with alternatives considered
•☐ Capacity Model — Load projections and bottleneck analysis
•☐ Failure Analysis — FMEA summary and mitigations
•☐ Security Considerations — Threat model and controls
•☐ Operational Requirements — Monitoring, alerting, deployment
•☐ Residual Risks — Known limitations and accepted risks
•☐ Implementation Plan — Phases, milestones, dependencies
•☐ Glossary — Terminology definitions for clarity
•☐ References — Links to related documents and resources
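
Part of this checklist can be verified mechanically. The sketch below reports which checklist sections have no matching heading in a design document, assuming the document is written in Markdown and uses the checklist names as heading text. The `missing_sections` function and the sample document are hypothetical, and a heading being present says nothing about the quality of the section beneath it.

```python
import re

REQUIRED_SECTIONS = [
    "Executive Summary", "Problem Statement", "Functional Requirements",
    "Non-Functional Requirements", "Context Diagram", "Container Diagram",
    "Data Model", "API Contracts", "Data Flow Diagrams", "Trade-off Analysis",
    "Capacity Model", "Failure Analysis", "Security Considerations",
    "Operational Requirements", "Residual Risks", "Implementation Plan",
    "Glossary", "References",
]

# Matches Markdown ATX headings such as "## Problem Statement".
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(document: str, required: list[str] = REQUIRED_SECTIONS) -> list[str]:
    """Return required section names with no matching heading (case-insensitive)."""
    headings = {match.group(1).lower() for match in HEADING.finditer(document)}
    return [name for name in required if name.lower() not in headings]


# Hypothetical, deliberately incomplete design document.
sample = """# Order Processing Redesign
## Executive Summary
One-page overview.
## Problem Statement
What is being solved.
## Residual Risks
Known limitations.
"""

for name in missing_sections(sample):
    print(f"Missing: {name}")
```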

The Final Review Questions

Before declaring a design complete, answer these questions:

•Can a new team member understand the system from this document? — Test with actual new team members
•Are all requirements clearly satisfied in the design? — Trace each requirement to components
•Are the major trade-offs documented? — Future maintainers will ask 'why'
•Are the risks and limitations honest? — No overselling, no hidden problems
•Is the implementation path clear? — Teams can start work without additional design
•Is the document maintainable? — Can it be updated without heroic effort?
•Has the design been reviewed by critical stakeholders? — Architecture, security, operations
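
The second question, tracing each requirement to components, lends itself to a simple check. The sketch below finds requirements that map to no component and components that serve no requirement. The requirement IDs, component names, and function names are hypothetical; the trace itself would normally come from the requirements coverage matrix.

```python
def untraced_requirements(requirements: list[str],
                          trace: dict[str, list[str]]) -> list[str]:
    """Requirement IDs that map to no component (missing or empty entry)."""
    return [req for req in requirements if not trace.get(req)]


def orphan_components(components: list[str],
                      trace: dict[str, list[str]]) -> list[str]:
    """Components that no requirement traces to, in sorted order."""
    used = {name for names in trace.values() for name in names}
    return sorted(set(components) - used)


# Hypothetical requirement-to-component trace.
requirements = ["FR-1", "FR-2", "NFR-1"]
components = ["OrderAPI", "EventStore", "ReportExporter"]
trace = {
    "FR-1": ["OrderAPI", "EventStore"],
    "FR-2": [],  # listed in the matrix but not yet assigned
}

print("Untraced:", untraced_requirements(requirements, trace))
print("Orphans:", orphan_components(components, trace))
```

An untraced requirement points to a gap in the design. An orphan component points to either a missing requirement or scope that should be cut.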

Summary: Design Summary and Module Conclusion

Design Summary Key Takeaways

•Structure for multiple audiences — Executive summary to detailed design, each self-contained
•Lead with impact — Problem, solution, value before technical details
•Layered diagrams tell the architecture story — C4 model provides progressive detail
•Document trade-offs explicitly — ADRs preserve decision rationale
•Include validation results — Builds confidence and surfaces risks
•Adapt presentation to audience — Different concerns, different emphasis
•Plan for living documentation — Stored with code, low friction to update
•Verify completeness before finalizing — Use the checklist

Module 7: Design Validation - Complete

You have now completed the Design Validation module. You've learned to:

Verify requirements — Ensuring every requirement traces to implementing components
Analyze bottlenecks — Predicting capacity constraints before they manifest
Test failure scenarios — Validating resilience through systematic analysis
Handle edge cases — Identifying and addressing unusual conditions
Synthesize the design summary — Communicating validated designs effectively

These skills transform competent system designers into principal engineers who produce designs that don't just look good—they actually work in production.
