Design a Multi-Region Active-Active Database

Design a multi-region active-active database that keeps a full data replica in each of 3–5 geographically distributed regions, so that any region can serve reads and writes independently at local, single-digit-millisecond latency. Regions exchange changes through asynchronous WAL-based replication (50–300ms lag), and concurrent writes are resolved by last-write-wins (LWW), CRDTs, or application-level logic. For critical operations the system can optionally provide globally strong consistency (Spanner-style Paxos with TrueTime). It must also fail over transparently when a region is lost (RTO < 60s; RPO near zero, bounded by the replication lag), enforce data sovereignty through geo-fencing, and propagate schema changes consistently across all regions.
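To make the last-write-wins option concrete, here is a minimal sketch of an LWW merge. The `Versioned` type, its field names, and the region-name tie-break are assumptions made for illustration; the source does not specify how ties are broken or which clock supplies the timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """A value tagged with the metadata LWW needs (hypothetical layout)."""
    value: str
    timestamp_ms: int  # assumed: wall-clock or hybrid-logical-clock time of the write
    region: str        # assumed tie-breaker so equal timestamps resolve the same way everywhere


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Pick the winner by comparing (timestamp, region), which is a total order.

    Because every region applies the same total order, merging in any
    sequence yields the same final value.
    """
    return a if (a.timestamp_ms, a.region) >= (b.timestamp_ms, b.region) else b


# Two regions update the same row before replication catches up.
us = Versioned("alice@old.example", 1_700_000_000_100, "us-east")
eu = Versioned("alice@new.example", 1_700_000_000_250, "eu-west")

# Merge order does not matter: both regions converge on the later write.
assert lww_merge(us, eu) == lww_merge(eu, us) == eu
```

The losing write is silently discarded, which is why LWW suits simple overwrites but not counters or multi-field business logic.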

Scale Estimates

Metric	Value
Regions	3–5 (active-active)
Data per region	Full copy (or geo-fenced subset)
Write latency (local)	1–5ms
Write latency (global strong)	200–500ms (cross-region Paxos)
Replication lag (async)	50–300ms
Conflict rate	< 0.001% (with proper key design)
Region failover RTO	< 60 seconds
Region failover RPO	≈ 0 (bounded by async replication lag)
Cross-region bandwidth	10–100 Mbps per region (compressed)
Database shards per region	10–100
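The gap between local and globally strong write latency in the table comes from waiting on a cross-region majority. The sketch below estimates the latency of a single quorum round from the leader's point of view. The round-trip times are made-up illustrative numbers, and the function name and `local_ms` parameter are assumptions, not part of any real system's API.

```python
def quorum_commit_latency_ms(rtts_to_peers_ms: list[float], local_ms: float = 1.0) -> float:
    """Estimate the time until a majority of replicas has acknowledged a write.

    The leader's own vote is local, so with n replicas in total it needs
    acknowledgements from n // 2 peers. The round completes when the
    slowest of those fastest peers replies.
    """
    total_replicas = len(rtts_to_peers_ms) + 1
    peer_acks_needed = total_replicas // 2
    if peer_acks_needed == 0:
        return local_ms
    fastest_first = sorted(rtts_to_peers_ms)
    return max(local_ms, fastest_first[peer_acks_needed - 1])


# Hypothetical round-trip times (ms) from the leader region to its peers.
three_regions = quorum_commit_latency_ms([80.0, 150.0])               # waits on 1 peer
five_regions = quorum_commit_latency_ms([70.0, 140.0, 150.0, 230.0])  # waits on 2 peers

print(three_regions, five_regions)  # prints: 80.0 140.0
```

This covers only one consensus round. Client-to-leader hops and any additional rounds for multi-shard transactions add to it, which is one plausible reason the table quotes a higher range.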

Non-Functional Requirements

Local performance: Reads and writes served from the nearest region with single-digit ms latency; no cross-region round trip for default (eventual) consistency mode; the database must 'feel local' everywhere
Eventual consistency: Async cross-region replication with 50–300ms lag; all regions converge to the same state; conflict resolution guarantees convergence (LWW / CRDTs / app-level); suitable for 99%+ of operations
Strong consistency (optional): For critical operations: synchronous Paxos across regions (Spanner model); TrueTime or HLC for globally ordered timestamps; 200–500ms write latency but linearizable guarantees
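Availability: Any region can serve traffic independently; single-region failure transparent (failover < 60s); dual-region failure tolerated by the remaining regions in eventual-consistency mode (strong-consistency writes still require a surviving Paxos majority); no single point of failure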
Data sovereignty: Per-row geo-fencing (GDPR, DPDI Act, PIPL); replication filter prevents pinned data from leaving designated region; compliance audit trail; automated compliance scanning
Conflict handling: LWW for simple overwrites; CRDTs for counters/sets; app-level for complex business logic; conflict avoidance by design (region-owned entities) is the primary strategy; conflict rate < 0.001%
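For the "CRDTs for counters" case, a grow-only counter (G-Counter) is the standard minimal example: each region increments only its own slot, and merging takes the per-region maximum, so replicas converge regardless of delivery order or duplication. The class and region names below are illustrative assumptions.

```python
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per region."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        """Record local increments; only this region's slot is ever written locally."""
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.region] = self.counts.get(self.region, 0) + amount

    def merge(self, other: "GCounter") -> None:
        """Take the element-wise maximum; commutative, associative, and idempotent."""
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# Two regions count page views concurrently, then exchange state.
us, eu = GCounter("us-east"), GCounter("eu-west")
us.increment(3)
eu.increment(2)

us.merge(eu)
eu.merge(us)
eu.merge(us)  # a duplicate delivery changes nothing

assert us.value() == eu.value() == 5
```

Unlike LWW, neither region's increments are lost. Supporting decrements would need a second counter for subtractions (a PN-Counter), which is outside this sketch.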

Approach Guide

Non-Functional Requirements (~3 min)
Core Entities (~2 min)
API Design (~3 min)
High-Level Design (~5 min)

Follow-up Deep Dives (questions an interviewer might ask)

1. How does cross-region replication work in an active-active database?
2. How do you handle write conflicts in an active-active system?
3. How does Google Spanner achieve global strong consistency?
4. How do you design the routing layer for multi-region?
5. How do you handle data sovereignty and compliance?
6. How do you handle schema changes across regions?
7. How would you architect the complete system?
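Question 5 builds on the replication filter named in the data-sovereignty requirement. As a starting point, here is a minimal sketch of such a filter applied to outgoing change records. The `ChangeRecord` type, the `pinned_regions` field, and the function names are hypothetical; the source states only that pinned data must not leave its designated region.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One replicated row change (hypothetical shape)."""
    key: str
    payload: bytes
    # Assumed convention: None means the row may replicate anywhere;
    # otherwise it may only be shipped to the listed regions.
    pinned_regions: Optional[frozenset[str]] = None


def may_replicate(record: ChangeRecord, target_region: str) -> bool:
    """Return True if shipping this record to target_region respects its pin."""
    return record.pinned_regions is None or target_region in record.pinned_regions


def filter_for_target(
    records: list[ChangeRecord], target_region: str
) -> tuple[list[ChangeRecord], list[ChangeRecord]]:
    """Split a batch into records to ship and records withheld by geo-fencing.

    The withheld list is returned rather than dropped so the caller can
    write it to a compliance audit trail.
    """
    shipped: list[ChangeRecord] = []
    withheld: list[ChangeRecord] = []
    for record in records:
        (shipped if may_replicate(record, target_region) else withheld).append(record)
    return shipped, withheld


batch = [
    ChangeRecord("user:1", b"...", pinned_regions=frozenset({"eu-west", "eu-central"})),
    ChangeRecord("product:9", b"..."),
]
shipped, withheld = filter_for_target(batch, "us-east")

assert [r.key for r in shipped] == ["product:9"]
assert [r.key for r in withheld] == ["user:1"]
```

Filtering at the sender means pinned rows never cross the region boundary, rather than being received and then discarded.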
