We've explored the foundational concepts of distributed databases: why we distribute (motivation), how we divide data (fragmentation), how we maintain copies (replication), and how we hide complexity (transparency). Now we examine how these concepts assemble into coherent system designs—the architecture of distributed databases.
Architecture defines how components are organized, how they communicate, what they share, and how they coordinate. Different architectural choices lead to radically different system characteristics in terms of scalability, availability, consistency, and operational complexity.
There is no single "best" architecture. Each makes trade-offs appropriate for different workloads, scale requirements, and operational contexts. Understanding these architectures allows you to select the right system for your needs and understand the behaviors you'll observe.
By the end of this page, you will understand the major distributed database architectures—shared-nothing, shared-disk, federated, and cloud-native—along with their characteristics, trade-offs, and representative systems. You'll be able to evaluate architectural choices for different use cases and understand why different databases make different choices.
Before examining specific architectures, let's understand the dimensions along which architectures differ:
1. Resource Sharing Model
What resources do nodes share?
2. Data Distribution Model
How is data placed across nodes?
3. Coordination Model
How do nodes coordinate operations?
4. Consistency Model
What guarantees does the system provide?
5. Query Execution Model
How are queries distributed?
| Dimension | Option A | Option B | Trade-off |
|---|---|---|---|
| Sharing | Shared-Nothing | Shared-Disk | Scale-out simplicity vs. shared storage flexibility |
| Distribution | Partitioned | Replicated | Write scale vs. read availability |
| Coordination | Centralized | Decentralized | Simplicity vs. fault tolerance |
| Consistency | Strong | Eventual | Correctness guarantees vs. availability/latency |
| Query execution | Push-down | Centralized | Data locality vs. query optimization flexibility |
Real architectures combine choices across these dimensions. For example, Google Spanner is shared-nothing for compute, uses distributed consensus for coordination, provides strong consistency, and pushes query execution to data nodes. Understanding the dimensions helps you decompose and compare any architecture you encounter.
Shared-nothing architecture assigns each node its own private CPU, memory, and storage. Nodes communicate only via network messages—there's no shared memory or shared disk. Each node is responsible for a subset of the data (its partition) and processes queries against that data locally.
Characteristics
How It Works
```
Shared-Nothing Architecture
===========================

┌─────────────────────────────────────────────────────────────┐
│                      Client Application                      │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                  Query Router / Coordinator                  │
│             (routes queries to appropriate nodes)            │
└───────┬──────────────────┬──────────────────┬───────────────┘
        │                  │                  │
        ▼                  ▼                  ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│    Node 1     │  │    Node 2     │  │    Node 3     │
├───────────────┤  ├───────────────┤  ├───────────────┤
│ CPU           │  │ CPU           │  │ CPU           │
│ Memory        │  │ Memory        │  │ Memory        │
│ Storage       │  │ Storage       │  │ Storage       │
├───────────────┤  ├───────────────┤  ├───────────────┤
│ Partition A   │  │ Partition B   │  │ Partition C   │
│ (Replica of C)│  │ (Replica of A)│  │ (Replica of B)│
└───────────────┘  └───────────────┘  └───────────────┘

Key characteristics:
- Each node has exclusive access to its storage
- Nodes communicate only via network
- Data partitioned and replicated across nodes
- No shared memory or shared disk
```
Representative Systems
| System | Domain | Notes |
|---|---|---|
| PostgreSQL Citus | OLTP/HTAP | Extends PostgreSQL with sharding |
| CockroachDB | OLTP | Distributed SQL, Raft consensus |
| Cassandra | Wide-column | Leaderless, eventual consistency |
| MongoDB | Document | Sharded clusters |
| Amazon Redshift | OLAP | Columnar, MPP analytics |
| Google Spanner | OLTP | Global distribution, TrueTime |
Shared-nothing is the dominant architecture for modern distributed databases. Its independence and commodity hardware economics make it the natural choice for cloud-native systems. When you hear "distributed database," assume shared-nothing unless otherwise specified.
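To make the routing step concrete, here is a minimal sketch of hash-based partition routing, assuming a hypothetical three-node cluster; real systems typically use consistent hashing or range partitioning with a partition map, but the coordinator's job is the same: hash the partition key and forward the query to the owning node.

```python
import hashlib

# Hypothetical three-node cluster; node names are illustrative only.
NODES = ["node-1", "node-2", "node-3"]

def partition_for(key: str, num_partitions: int = len(NODES)) -> int:
    """Map a partition key to a partition number with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

def route(key: str) -> str:
    """The coordinator forwards a query to the node that owns the key's partition."""
    return NODES[partition_for(key)]

# Each row lives on exactly one node, so a query on the key touches one node.
for customer_id in ["c-1001", "c-1002", "c-1003"]:
    print(customer_id, "->", route(customer_id))
```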
Shared-disk (or shared-storage) architecture gives each node private CPU and memory, but all nodes access a common storage layer. The storage is typically a SAN (Storage Area Network), network-attached storage, or cloud block storage.
Characteristics
How It Works
```
Shared-Disk Architecture
========================

┌─────────────────────────────────────────────────────────────┐
│                      Client Application                      │
└───────────┬─────────────────────────────────┬───────────────┘
            │                                 │
            ▼                                 ▼
┌─────────────────────────┐       ┌─────────────────────────┐
│         Node 1          │       │         Node 2          │
├─────────────────────────┤       ├─────────────────────────┤
│ CPU                     │       │ CPU                     │
│ Private Memory          │       │ Private Memory          │
│ (Buffer Pool/Cache)     │       │ (Buffer Pool/Cache)     │
└───────────┬─────────────┘       └─────────────┬───────────┘
            │     ┌───────────────────────┐     │
            │     │  Cache Coordination   │     │
            │     │   (Global Lock Mgr)   │     │
            │     └───────────────────────┘     │
            │                                   │
            └──────────────┬────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    Shared Storage Layer                      │
│               (SAN, NFS, Cloud Block Storage)                │
│                                                              │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐    │
│  │  Data A   │ │  Data B   │ │  Data C   │ │  Data D   │    │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘    │
└─────────────────────────────────────────────────────────────┘

Key characteristics:
- All nodes access the same storage
- Each node has private memory cache
- Coordination required for cache coherence
- Adding a node doesn't move data
```
Cache Coherence Challenge
When multiple nodes cache the same data, modifications must be coordinated: before a node changes a page, the copies cached on other nodes have to be invalidated or refreshed, typically through the global lock manager shown above.
Cache coherence overhead limits the scalability of shared-disk compared to shared-nothing.
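The toy model below, with purely illustrative names, sketches invalidation-based coherence under a global lock manager: a node that writes a page must first invalidate the copies cached on every other node, which is exactly the coordination traffic that grows with cluster size.

```python
class Node:
    """A compute node with a private buffer pool (toy model)."""
    def __init__(self, name):
        self.name = name
        self.cache = {}                        # private in-memory cache of pages

class GlobalLockManager:
    """Invalidation-based coherence: a writer must invalidate every other cache."""
    def __init__(self, shared_storage, nodes):
        self.storage = shared_storage          # dict standing in for the shared disk
        self.nodes = nodes

    def read(self, node, page_id):
        if page_id not in node.cache:           # cache miss: fetch from shared storage
            node.cache[page_id] = self.storage[page_id]
        return node.cache[page_id]

    def write(self, node, page_id, value):
        for other in self.nodes:                 # coherence traffic grows with node count
            if other is not node:
                other.cache.pop(page_id, None)   # invalidate any stale cached copy
        self.storage[page_id] = value            # write through to shared storage
        node.cache[page_id] = value

storage = {"page-42": "v1"}
n1, n2 = Node("node-1"), Node("node-2")
glm = GlobalLockManager(storage, [n1, n2])

glm.read(n2, "page-42")                          # node 2 caches v1
glm.write(n1, "page-42", "v2")                   # node 1 writes; node 2's copy is invalidated
print(glm.read(n2, "page-42"))                   # node 2 re-reads from storage -> "v2"
```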
Representative Systems
| System | Domain | Notes |
|---|---|---|
| Oracle RAC | OLTP | Industry standard for shared-disk |
| Amazon Aurora | OLTP | Cloud-native, shared storage layer |
| Azure Hyperscale | OLTP | Page servers with shared storage |
| Exadata | OLTP/OLAP | Oracle with smart storage |
| IBM Db2 pureScale | OLTP | Shared-disk clustering |
Cloud databases like Aurora and Azure Hyperscale are reinventing shared-disk architecture. They decouple compute and storage, using distributed cloud storage (not traditional SAN). This enables elastic compute scaling without data movement—the storage layer handles durability and replication. This is sometimes called "disaggregated storage" architecture.
Federated (or multi-database) architecture integrates multiple autonomous databases into a coherent system. Each component database retains independence but participates in a larger federation that enables unified queries and transactions.
Characteristics
Why Federation?
```
Federated Database Architecture
================================

┌────────────────────────────────────────────────────────────────┐
│                       Client Application                       │
│                  (Uses unified global schema)                  │
└───────────────────────────┬────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────────────┐
│                     Federation Middleware                      │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │             Global Schema / Data Dictionary              │  │
│  └──────────────────────────────────────────────────────────┘  │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │Query Parser  │  │Query Decomp  │  │Result Integration  │    │
│  │& Planner     │  │& Routing     │  │& Transformation    │    │
│  └──────────────┘  └──────────────┘  └────────────────────┘    │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │            Distributed Transaction Manager              │   │
│  └─────────────────────────────────────────────────────────┘   │
└───────┬────────────────────┬───────────────────────┬──────────┘
        │                    │                       │
        ▼                    ▼                       ▼
┌───────────────┐  ┌───────────────┐  ┌────────────────────┐
│  PostgreSQL   │  │    Oracle     │  │      MongoDB       │
│ (Local DB 1)  │  │ (Local DB 2)  │  │   (Local DB 3)     │
│               │  │               │  │                    │
│   HR Data     │  │  Financial    │  │  Product Catalog   │
│   Schema A    │  │   Schema B    │  │     Schema C       │
└───────────────┘  └───────────────┘  └────────────────────┘
  Autonomous:        Autonomous:        Autonomous:
  Own admin          Own admin          Own admin
  Local queries      Local queries      Local queries
  Local optimizations  Local optimizations  Local optimizations
```
Federation Components
Global Schema
A unified schema that abstracts the local database schemas, mapping each local table or collection onto one integrated view that clients query.
Query Decomposition
Global queries against the unified schema are split into local queries that each component database executes in its own dialect (a sketch appears at the end of this section).
Result Integration
The local results are then combined (joined, unioned, and transformed) into a single result set expressed in the global schema.
Distributed Transactions
Transactions that span component databases must be coordinated so that all of them commit or none do, as sketched below.
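Such atomic commitment across independent databases is commonly implemented with a two-phase commit protocol. Below is a minimal coordinator sketch with hypothetical participants; a production implementation would also persist coordinator state so it can recover if it fails between the two phases.

```python
class Participant:
    """One component database in a distributed transaction (toy model)."""
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "idle"

    def prepare(self):
        # Phase 1: the participant durably stages its changes and votes.
        self.state = "prepared" if self.will_succeed else "aborted"
        return self.will_succeed

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """The coordinator commits only if every participant votes yes."""
    votes = [p.prepare() for p in participants]      # phase 1: collect votes
    if all(votes):
        for p in participants:                        # phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                            # any 'no' vote aborts everywhere
        p.abort()
    return "aborted"

dbs = [Participant("postgres"), Participant("oracle"),
       Participant("mongodb", will_succeed=False)]
print(two_phase_commit(dbs))                          # -> aborted
```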
Modern "data virtualization" tools (Denodo, Starburst, Dremio) are essentially sophisticated federation systems. They query across warehouses, lakes, and operational databases without moving data. The key innovation is better query pushdown and caching to improve federated query performance.
Multi-primary (or multi-master) architecture allows writes at multiple nodes, unlike primary-secondary where only one node accepts writes. Each primary independently accepts writes, with changes replicated between primaries.
Why Multi-Primary?
The Conflict Challenge
Multi-primary's fundamental challenge is conflicts. When two primaries concurrently modify the same data:
- Primary 1 writes balance = 100
- Primary 2 writes balance = 150

Both writes are valid; there's no single "correct" answer. Conflict resolution is required.
Conflict Resolution Strategies
1. Last-Writer-Wins (LWW)
Highest timestamp wins. Simple, but the losing write is silently discarded, and clock skew between nodes can let an older write beat a newer one (a minimal sketch follows this list of strategies).
2. Application-Defined Merge
The application provides merge logic that combines both versions, for example merging two concurrent shopping-cart updates by taking the union of their items.
Requires careful application design.
3. CRDTs (Conflict-free Replicated Data Types)
Data structures whose merge operation is commutative, associative, and idempotent, so replicas are mathematically guaranteed to converge no matter how updates interleave; counters, sets, and flags are common examples (see the sketch after the comparison table).
4. Conflict Detection and Resolution
The system detects conflicting versions and keeps both (as CouchDB revision conflicts or Riak siblings, for instance), leaving a human or application code to choose or merge them later.
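As a concrete illustration of strategy 1 (last-writer-wins), here is a minimal register sketch; the explicit timestamps stand in for node clocks, whose skew is exactly what makes LWW risky.

```python
class LWWRegister:
    """Last-writer-wins register: the write with the highest timestamp survives;
    concurrent writes with lower timestamps are silently discarded."""
    def __init__(self):
        self.value = None
        self.timestamp = 0.0

    def write(self, value, timestamp):
        if timestamp > self.timestamp:        # later write wins, earlier one is lost
            self.value, self.timestamp = value, timestamp

    def merge(self, other):
        """Merging two replicas keeps whichever value was written later."""
        self.write(other.value, other.timestamp)

# Two primaries accept conflicting writes for the same account balance.
a, b = LWWRegister(), LWWRegister()
a.write(100, timestamp=1.0)
b.write(150, timestamp=2.0)
a.merge(b); b.merge(a)
print(a.value, b.value)    # both converge to 150; the write of 100 is gone
```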
| Strategy | Automation | Data Safety | Complexity | Use Case |
|---|---|---|---|---|
| Last-Writer-Wins | Automatic | May lose data | Low | Caching, non-critical data |
| Application Merge | Semi-auto | Custom logic | Medium | Domain-specific semantics |
| CRDTs | Automatic | Mathematically safe | Low (for supported types) | Counters, sets, flags |
| Manual Resolution | Manual | Human-verified | High | Financial, legal documents |
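And here is a minimal grow-only counter (G-Counter), one of the simplest CRDTs from strategy 3: each replica increments only its own slot, and merging takes element-wise maxima, so replicas converge regardless of how updates interleave.

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot,
    and merge takes the per-slot maximum, so every replica converges."""
    def __init__(self, replica_id, num_replicas):
        self.replica_id = replica_id
        self.counts = [0] * num_replicas

    def increment(self, amount=1):
        self.counts[self.replica_id] += amount

    def merge(self, other):
        self.counts = [max(mine, theirs)
                       for mine, theirs in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)

# Two primaries increment independently, then exchange state in either order.
a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(3)
b.increment(5)
a.merge(b); b.merge(a)
print(a.value(), b.value())   # both converge to 8
```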
Multi-Primary Topologies
All-to-All Replication
Every primary replicates to every other. Simple, but the number of replication links grows as O(n²); ten primaries already need 45 bidirectional links.
Ring Replication
Each primary replicates to the next node in the ring. Efficient (only n links), but a single failed node interrupts propagation to everything downstream.
Hub-and-Spoke
Central hub receives and redistributes. Single point of failure but simple.
Representative Systems
| System | Conflict Strategy | Notes |
|---|---|---|
| MySQL Group Replication | Certification-based (block conflicts) | Synchronous multi-primary |
| PostgreSQL BDR | Custom conflict handlers | Asynchronous, LWW default |
| CouchDB | Revision tree, manual resolution | Document-level conflicts |
| Riak | Vector clocks, siblings | Eventually consistent |
Multi-primary architecture adds significant complexity compared to primary-secondary. Conflicts are subtle, conflict resolution can have business implications, and debugging distributed state is difficult. Use multi-primary only when write availability across regions is genuinely required—not as a default scaling strategy.
Cloud-native database architecture is designed from the ground up for cloud infrastructure, leveraging cloud primitives (object storage, elastic compute, managed networking) rather than adapting traditional designs.
Key Characteristics
1. Compute-Storage Separation
Compute (query processing) and storage (data persistence) are separate, independently scalable tiers: compute clusters can be resized, paused, or multiplied without moving data, which remains on durable cloud storage (a toy sketch follows this list).
2. Serverless Operation
Resources automatically scale with demand: capacity grows under load, shrinks when idle, and can scale to zero so you pay only for the work a query actually does.
3. Multi-tenant Isolation
Cloud databases serve multiple tenants on shared infrastructure, isolating each tenant's data and performance while amortizing hardware and operational costs across all of them.
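The toy model below, with purely illustrative names, shows the compute-storage separation from characteristic 1: durable data sits in a shared store (a plain dict standing in for cloud object storage), while each compute worker keeps only an ephemeral cache, so workers can be added or removed without repartitioning any data.

```python
# Durable data lives in a shared "object store" (a dict standing in for S3/GCS);
# compute workers keep only an ephemeral cache, so adding or removing a worker
# never moves or repartitions data. All names are illustrative.
OBJECT_STORE = {
    "orders/part-000": [("o1", 120), ("o2", 80)],
    "orders/part-001": [("o3", 200)],
}

class ComputeWorker:
    def __init__(self, name):
        self.name = name
        self.cache = {}                        # lost when the worker is scaled away

    def scan(self, prefix):
        rows = []
        for key, data in OBJECT_STORE.items():
            if key.startswith(prefix):
                self.cache[key] = data         # warm the local cache for repeat scans
                rows.extend(data)
        return rows

# "Scaling compute" is just starting another stateless worker.
workers = [ComputeWorker("warehouse-1"), ComputeWorker("warehouse-2")]
total = sum(amount for _, amount in workers[0].scan("orders/"))
print(total)                                   # 400
```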
```
Cloud-Native Database Architecture (e.g., Snowflake, BigQuery)
==============================================================

┌────────────────────────────────────────────────────────────────┐
│                       Client Application                       │
└───────────────────────────┬────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────────────┐
│           Cloud Services Layer (Metadata, Coordination)        │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │Query Parser/ │  │Metadata      │  │Transaction         │    │
│  │Optimizer     │  │Service       │  │Management          │    │
│  └──────────────┘  └──────────────┘  └────────────────────┘    │
└───────────────────────────┬────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│   Compute     │  │   Compute     │  │   Compute     │
│   Cluster 1   │  │   Cluster 2   │  │   Cluster 3   │
│  (Warehouse)  │  │  (Warehouse)  │  │  (Warehouse)  │
│               │  │               │  │               │
│ ┌───────────┐ │  │ ┌───────────┐ │  │ ┌───────────┐ │
│ │ Ephemeral │ │  │ │ Ephemeral │ │  │ │ Ephemeral │ │
│ │  Cache    │ │  │ │  Cache    │ │  │ │  Cache    │ │
│ └───────────┘ │  │ └───────────┘ │  │ └───────────┘ │
└───────┬───────┘  └───────┬───────┘  └───────┬───────┘
        │                  │                  │
        └──────────────────┴──────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────────────┐
│            Cloud Object Storage (S3, GCS, Azure Blob)          │
│                                                                │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐        │
│  │ Table A  │  │ Table B  │  │ Table C  │  │ Metadata │        │
│  │ Parquet  │  │ Parquet  │  │ Parquet  │  │ (JSON)   │        │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘        │
│                                                                │
│  - Infinite scale                                              │
│  - Pay per byte stored                                         │
│  - 11 nines durability                                         │
│  - Automatic replication                                       │
└────────────────────────────────────────────────────────────────┘
```
Cloud-Native Benefits
Representative Systems
| System | Cloud | Architecture |
|---|---|---|
| Snowflake | Multi-cloud | Compute-storage separation, virtual warehouses |
| BigQuery | GCP | Serverless, Dremel engine, Capacitor storage |
| Amazon Redshift Serverless | AWS | Auto-scaling, pay-per-query |
| Azure Synapse | Azure | Unified analytics, on-demand pools |
| Databricks | Multi-cloud | Spark-based, Delta Lake storage |
Cloud-native architectures are converging toward the "lakehouse" pattern—data stored in open formats (Parquet, Delta) on cloud object storage, with multiple engines (SQL, ML, streaming) accessing the same data. This decouples storage from any single compute engine, enabling best-of-breed processing.
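As a minimal sketch of that pattern, assuming the pyarrow package is installed: one process writes a table in an open format, and any other engine that understands Parquet can later read the same file (a local path stands in for an object-store URI such as s3://...).

```python
# One "engine" writes a table in an open format to the shared store; any other
# engine that reads Parquet can query the same file later. A local path stands
# in for an object-store URI such as s3://bucket/orders.parquet.
import pyarrow as pa
import pyarrow.parquet as pq

orders = pa.table({"order_id": [1, 2, 3], "amount": [120, 80, 200]})
pq.write_table(orders, "orders.parquet")

# A different process reads only the column it needs from the same file.
amounts = pq.read_table("orders.parquet", columns=["amount"])
print(amounts.to_pydict())    # {'amount': [120, 80, 200]}
```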
Selecting the right architecture depends on workload characteristics, scale requirements, operational constraints, and organizational context.
Decision Framework
| Requirement | Recommended Architecture | Rationale |
|---|---|---|
| Scale writes horizontally | Shared-nothing | Partitioning enables write distribution |
| Elastic read scaling only | Shared-disk / Read replicas | Simpler than partitioning; add compute |
| Integrate existing databases | Federation | Avoid replacing working systems |
| Multi-region write availability | Multi-primary | Accept conflict complexity for availability |
| Variable workloads, cost sensitivity | Cloud-native / Serverless | Scale to zero, pay per use |
| Strong consistency required | Shared-nothing with consensus | Raft/Paxos for linearizable operations |
| Simple operations, single region | Primary-secondary replication | Well-understood, simpler to operate |
Questions to Guide Selection
Most successful systems start simpler than they end. Begin with primary-secondary replication in one region. Add read replicas when read scale is needed. Add sharding when write scale is needed. Add multi-region when geographic distribution is required. Each step adds complexity—take it only when necessary.
Architecture is how distributed database concepts combine into functional systems. Let's consolidate the key concepts:
Module Complete: Distributed Database Concepts
You've now completed the foundational module on distributed database concepts. You understand why organizations distribute data, how data is fragmented and replicated across nodes, how transparency hides that distribution from applications, and how the major architectures combine these choices.
With this foundation, you're prepared to explore specific distributed database techniques in subsequent modules: fragmentation strategies, distributed transactions, CAP theorem implications, and sharding patterns.
Congratulations! You've mastered the foundational concepts of distributed databases. You can now reason about why distribution is necessary, how data is partitioned and replicated, what transparency mechanisms hide from applications, and how different architectures make different trade-offs. This conceptual foundation prepares you for the practical techniques covered in subsequent modules.