We've explored the foundational concepts of distributed databases: why we distribute (motivation), how we divide data (fragmentation), how we maintain copies (replication), and how we hide complexity (transparency). Now we examine how these concepts assemble into coherent system designs—the architecture of distributed databases.
Architecture defines how components are organized, how they communicate, what they share, and how they coordinate. Different architectural choices lead to radically different system characteristics in terms of scalability, availability, consistency, and operational complexity.
There is no single "best" architecture. Each makes trade-offs appropriate for different workloads, scale requirements, and operational contexts. Understanding these architectures allows you to select the right system for your needs and understand the behaviors you'll observe.
By the end of this page, you will understand the major distributed database architectures—shared-nothing, shared-disk, federated, and cloud-native—along with their characteristics, trade-offs, and representative systems. You'll be able to evaluate architectural choices for different use cases and understand why different databases make different choices.
Before examining specific architectures, let's understand the dimensions along which architectures differ:
1. Resource Sharing Model
What resources do nodes share?
2. Data Distribution Model
How is data placed across nodes?
3. Coordination Model
How do nodes coordinate operations?
4. Consistency Model
What guarantees does the system provide?
5. Query Execution Model
How are queries distributed?
| Dimension | Option A | Option B | Trade-off |
|---|---|---|---|
| Sharing | Shared-Nothing | Shared-Disk | Scale-out simplicity vs. shared storage flexibility |
| Distribution | Partitioned | Replicated | Write scale vs. read availability |
| Coordination | Centralized | Decentralized | Simplicity vs. fault tolerance |
| Consistency | Strong | Eventual | Correctness guarantees vs. availability/latency |
| Query execution | Push-down | Centralized | Data locality vs. query optimization flexibility |
Real architectures combine choices across these dimensions. For example, Google Spanner is shared-nothing for compute, uses distributed consensus for coordination, provides strong consistency, and pushes query execution to data nodes. Understanding the dimensions helps you decompose and compare any architecture you encounter.
Shared-nothing architecture assigns each node its own private CPU, memory, and storage. Nodes communicate only via network messages—there's no shared memory or shared disk. Each node is responsible for a subset of the data (its partition) and processes queries against that data locally.
Characteristics
How It Works
```
Shared-Nothing Architecture
===========================

┌─────────────────────────────────────────────────────────────┐
│                      Client Application                      │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                  Query Router / Coordinator                  │
│             (routes queries to appropriate nodes)            │
└───────┬──────────────────┬──────────────────┬───────────────┘
        │                  │                  │
        ▼                  ▼                  ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│    Node 1     │  │    Node 2     │  │    Node 3     │
├───────────────┤  ├───────────────┤  ├───────────────┤
│ CPU           │  │ CPU           │  │ CPU           │
│ Memory        │  │ Memory        │  │ Memory        │
│ Storage       │  │ Storage       │  │ Storage       │
├───────────────┤  ├───────────────┤  ├───────────────┤
│ Partition A   │  │ Partition B   │  │ Partition C   │
│ (Replica of C)│  │ (Replica of A)│  │ (Replica of B)│
└───────────────┘  └───────────────┘  └───────────────┘

Key characteristics:
- Each node has exclusive access to its storage
- Nodes communicate only via network
- Data partitioned and replicated across nodes
- No shared memory or shared disk
```
Representative Systems
| System | Domain | Notes |
|---|---|---|
| PostgreSQL Citus | OLTP/HTAP | Extends PostgreSQL with sharding |
| CockroachDB | OLTP | Distributed SQL, Raft consensus |
| Cassandra | Wide-column | Leaderless, eventual consistency |
| MongoDB | Document | Sharded clusters |
| Amazon Redshift | OLAP | Columnar, MPP analytics |
| Google Spanner | OLTP | Global distribution, TrueTime |
Shared-nothing is the dominant architecture for modern distributed databases. Its independence and commodity hardware economics make it the natural choice for cloud-native systems. When you hear "distributed database," assume shared-nothing unless otherwise specified.
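To make the routing step concrete, here is a minimal sketch of hash-based partition routing, assuming a hypothetical three-node cluster; real systems typically use consistent hashing or range partitioning with a partition map, but the coordinator's job is the same: hash the partition key and forward the query to the owning node.

```python
import hashlib

# Hypothetical three-node cluster; node names are illustrative only.
NODES = ["node-1", "node-2", "node-3"]

def partition_for(key: str, num_partitions: int = len(NODES)) -> int:
    """Map a partition key to a partition number with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

def route(key: str) -> str:
    """The coordinator forwards a query to the node that owns the key's partition."""
    return NODES[partition_for(key)]

# Each row lives on exactly one node, so a query on the key touches one node.
for customer_id in ["c-1001", "c-1002", "c-1003"]:
    print(customer_id, "->", route(customer_id))
```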
Shared-disk (or shared-storage) architecture gives each node private CPU and memory, but all nodes access a common storage layer. The storage is typically a SAN (Storage Area Network), network-attached storage, or cloud block storage.
Characteristics
How It Works
```
Shared-Disk Architecture
========================

┌─────────────────────────────────────────────────────────────┐
│                      Client Application                      │
└───────────┬─────────────────────────────────┬───────────────┘
            │                                 │
            ▼                                 ▼
┌─────────────────────────┐       ┌─────────────────────────┐
│         Node 1          │       │         Node 2          │
├─────────────────────────┤       ├─────────────────────────┤
│ CPU                     │       │ CPU                     │
│ Private Memory          │       │ Private Memory          │
│ (Buffer Pool/Cache)     │       │ (Buffer Pool/Cache)     │
└───────────┬─────────────┘       └─────────────┬───────────┘
            │     ┌───────────────────────┐     │
            │     │  Cache Coordination   │     │
            │     │   (Global Lock Mgr)   │     │
            │     └───────────────────────┘     │
            │                                   │
            └──────────────┬────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    Shared Storage Layer                      │
│               (SAN, NFS, Cloud Block Storage)                │
│                                                              │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐    │
│  │  Data A   │ │  Data B   │ │  Data C   │ │  Data D   │    │
│  └───────────┘ └───────────┘ └───────────┘ └───────────┘    │
└─────────────────────────────────────────────────────────────┘

Key characteristics:
- All nodes access the same storage
- Each node has private memory cache
- Coordination required for cache coherence
- Adding a node doesn't move data
```
Cache Coherence Challenge
When multiple nodes cache the same data, modifications must be coordinated: before a node changes a page, the copies cached on other nodes have to be invalidated or refreshed, typically through the global lock manager shown above.
Cache coherence overhead limits the scalability of shared-disk compared to shared-nothing.
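The toy model below, with purely illustrative names, sketches invalidation-based coherence under a global lock manager: a node that writes a page must first invalidate the copies cached on every other node, which is exactly the coordination traffic that grows with cluster size.

```python
class Node:
    """A compute node with a private buffer pool (toy model)."""
    def __init__(self, name):
        self.name = name
        self.cache = {}                        # private in-memory cache of pages

class GlobalLockManager:
    """Invalidation-based coherence: a writer must invalidate every other cache."""
    def __init__(self, shared_storage, nodes):
        self.storage = shared_storage          # dict standing in for the shared disk
        self.nodes = nodes

    def read(self, node, page_id):
        if page_id not in node.cache:           # cache miss: fetch from shared storage
            node.cache[page_id] = self.storage[page_id]
        return node.cache[page_id]

    def write(self, node, page_id, value):
        for other in self.nodes:                 # coherence traffic grows with node count
            if other is not node:
                other.cache.pop(page_id, None)   # invalidate any stale cached copy
        self.storage[page_id] = value            # write through to shared storage
        node.cache[page_id] = value

storage = {"page-42": "v1"}
n1, n2 = Node("node-1"), Node("node-2")
glm = GlobalLockManager(storage, [n1, n2])

glm.read(n2, "page-42")                          # node 2 caches v1
glm.write(n1, "page-42", "v2")                   # node 1 writes; node 2's copy is invalidated
print(glm.read(n2, "page-42"))                   # node 2 re-reads from storage -> "v2"
```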
Representative Systems
| System | Domain | Notes |
|---|---|---|
| Oracle RAC | OLTP | Industry standard for shared-disk |
| Amazon Aurora | OLTP | Cloud-native, shared storage layer |
| Azure Hyperscale | OLTP | Page servers with shared storage |
| Exadata | OLTP/OLAP | Oracle with smart storage |
| IBM Db2 pureScale | OLTP | Shared-disk clustering |
Cloud databases like Aurora and Azure Hyperscale are reinventing shared-disk architecture. They decouple compute and storage, using distributed cloud storage (not traditional SAN). This enables elastic compute scaling without data movement—the storage layer handles durability and replication. This is sometimes called "disaggregated storage" architecture.
Federated (or multi-database) architecture integrates multiple autonomous databases into a coherent system. Each component database retains independence but participates in a larger federation that enables unified queries and transactions.
Characteristics
Why Federation?
```
Federated Database Architecture
================================

┌────────────────────────────────────────────────────────────────┐
│                       Client Application                       │
│                  (Uses unified global schema)                  │
└───────────────────────────┬────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────────────┐
│                     Federation Middleware                      │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │             Global Schema / Data Dictionary              │  │
│  └──────────────────────────────────────────────────────────┘  │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │Query Parser  │  │Query Decomp  │  │Result Integration  │    │
│  │& Planner     │  │& Routing     │  │& Transformation    │    │
│  └──────────────┘  └──────────────┘  └────────────────────┘    │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │            Distributed Transaction Manager              │   │
│  └─────────────────────────────────────────────────────────┘   │
└───────┬────────────────────┬───────────────────────┬──────────┘
        │                    │                       │
        ▼                    ▼                       ▼
┌───────────────┐  ┌───────────────┐  ┌────────────────────┐
│  PostgreSQL   │  │    Oracle     │  │      MongoDB       │
│ (Local DB 1)  │  │ (Local DB 2)  │  │   (Local DB 3)     │
│               │  │               │  │                    │
│   HR Data     │  │  Financial    │  │  Product Catalog   │
│   Schema A    │  │   Schema B    │  │     Schema C       │
└───────────────┘  └───────────────┘  └────────────────────┘
  Autonomous:        Autonomous:        Autonomous:
  Own admin          Own admin          Own admin
  Local queries      Local queries      Local queries
  Local optimizations  Local optimizations  Local optimizations
```
Federation Components
Global Schema
A unified schema that abstracts the local database schemas, mapping each local table or collection onto one integrated view that clients query.
Query Decomposition
Global queries against the unified schema are split into local queries that each component database executes in its own dialect (a sketch appears at the end of this section).
Result Integration
The local results are then combined (joined, unioned, and transformed) into a single result set expressed in the global schema.
Distributed Transactions
Transactions that span component databases must be coordinated so that all of them commit or none do, as sketched below.
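Such atomic commitment across independent databases is commonly implemented with a two-phase commit protocol. Below is a minimal coordinator sketch with hypothetical participants; a production implementation would also persist coordinator state so it can recover if it fails between the two phases.

```python
class Participant:
    """One component database in a distributed transaction (toy model)."""
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "idle"

    def prepare(self):
        # Phase 1: the participant durably stages its changes and votes.
        self.state = "prepared" if self.will_succeed else "aborted"
        return self.will_succeed

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """The coordinator commits only if every participant votes yes."""
    votes = [p.prepare() for p in participants]      # phase 1: collect votes
    if all(votes):
        for p in participants:                        # phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                            # any 'no' vote aborts everywhere
        p.abort()
    return "aborted"

dbs = [Participant("postgres"), Participant("oracle"),
       Participant("mongodb", will_succeed=False)]
print(two_phase_commit(dbs))                          # -> aborted
```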
Modern "data virtualization" tools (Denodo, Starburst, Dremio) are essentially sophisticated federation systems. They query across warehouses, lakes, and operational databases without moving data. The key innovation is better query pushdown and caching to improve federated query performance.
Multi-primary (or multi-master) architecture allows writes at multiple nodes, unlike primary-secondary where only one node accepts writes. Each primary independently accepts writes, with changes replicated between primaries.
Why Multi-Primary?
The Conflict Challenge
Multi-primary's fundamental challenge is conflicts. When two primaries concurrently modify the same data:
- Primary 1 writes balance = 100
- Primary 2 writes balance = 150

Both writes are valid; there's no single "correct" answer. Conflict resolution is required.
Conflict Resolution Strategies
1. Last-Writer-Wins (LWW)
Highest timestamp wins. Simple, but the losing write is silently discarded, and clock skew between nodes can let an older write beat a newer one (a minimal sketch follows this list of strategies).
2. Application-Defined Merge
The application provides merge logic that combines both versions, for example merging two concurrent shopping-cart updates by taking the union of their items.
Requires careful application design.
3. CRDTs (Conflict-free Replicated Data Types)
Data structures whose merge operation is commutative, associative, and idempotent, so replicas are mathematically guaranteed to converge no matter how updates interleave; counters, sets, and flags are common examples (see the sketch after the comparison table).
4. Conflict Detection and Resolution
The system detects conflicting versions and keeps both (as CouchDB revision conflicts or Riak siblings, for instance), leaving a human or application code to choose or merge them later.
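As a concrete illustration of strategy 1 (last-writer-wins), here is a minimal register sketch; the explicit timestamps stand in for node clocks, whose skew is exactly what makes LWW risky.

```python
class LWWRegister:
    """Last-writer-wins register: the write with the highest timestamp survives;
    concurrent writes with lower timestamps are silently discarded."""
    def __init__(self):
        self.value = None
        self.timestamp = 0.0

    def write(self, value, timestamp):
        if timestamp > self.timestamp:        # later write wins, earlier one is lost
            self.value, self.timestamp = value, timestamp

    def merge(self, other):
        """Merging two replicas keeps whichever value was written later."""
        self.write(other.value, other.timestamp)

# Two primaries accept conflicting writes for the same account balance.
a, b = LWWRegister(), LWWRegister()
a.write(100, timestamp=1.0)
b.write(150, timestamp=2.0)
a.merge(b); b.merge(a)
print(a.value, b.value)    # both converge to 150; the write of 100 is gone
```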
| Strategy | Automation | Data Safety | Complexity | Use Case |
|---|---|---|---|---|
| Last-Writer-Wins | Automatic | May lose data | Low | Caching, non-critical data |
| Application Merge | Semi-auto | Custom logic | Medium | Domain-specific semantics |
| CRDTs | Automatic | Mathematically safe | Low (for supported types) | Counters, sets, flags |
| Manual Resolution | Manual | Human-verified | High | Financial, legal documents |
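And here is a minimal grow-only counter (G-Counter), one of the simplest CRDTs from strategy 3: each replica increments only its own slot, and merging takes element-wise maxima, so replicas converge regardless of how updates interleave.

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot,
    and merge takes the per-slot maximum, so every replica converges."""
    def __init__(self, replica_id, num_replicas):
        self.replica_id = replica_id
        self.counts = [0] * num_replicas

    def increment(self, amount=1):
        self.counts[self.replica_id] += amount

    def merge(self, other):
        self.counts = [max(mine, theirs)
                       for mine, theirs in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)

# Two primaries increment independently, then exchange state in either order.
a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(3)
b.increment(5)
a.merge(b); b.merge(a)
print(a.value(), b.value())   # both converge to 8
```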
Multi-Primary Topologies
All-to-All Replication
Every primary replicates to every other. Simple, but the number of replication links grows as O(n²); ten primaries already need 45 bidirectional links.
Ring Replication
Each primary replicates to the next node in the ring. Efficient (only n links), but a single failed node interrupts propagation to everything downstream.
Hub-and-Spoke
Central hub receives and redistributes. Single point of failure but simple.
Representative Systems
| System | Conflict Strategy | Notes |
|---|---|---|
| MySQL Group Replication | Certification-based (block conflicts) | Synchronous multi-primary |
| PostgreSQL BDR | Custom conflict handlers | Asynchronous, LWW default |
| CouchDB | Revision tree, manual resolution | Document-level conflicts |
| Riak | Vector clocks, siblings | Eventually consistent |
Multi-primary architecture adds significant complexity compared to primary-secondary. Conflicts are subtle, conflict resolution can have business implications, and debugging distributed state is difficult. Use multi-primary only when write availability across regions is genuinely required—not as a default scaling strategy.
Cloud-native database architecture is designed from the ground up for cloud infrastructure, leveraging cloud primitives (object storage, elastic compute, managed networking) rather than adapting traditional designs.
Key Characteristics
1. Compute-Storage Separation
Compute (query processing) and storage (data persistence) are separate, independently scalable tiers: compute clusters can be resized, paused, or multiplied without moving data, which remains on durable cloud storage (a toy sketch follows this list).
2. Serverless Operation
Resources automatically scale with demand: capacity grows under load, shrinks when idle, and can scale to zero so you pay only for the work a query actually does.
3. Multi-tenant Isolation
Cloud databases serve multiple tenants on shared infrastructure, isolating each tenant's data and performance while amortizing hardware and operational costs across all of them.
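The toy model below, with purely illustrative names, shows the compute-storage separation from characteristic 1: durable data sits in a shared store (a plain dict standing in for cloud object storage), while each compute worker keeps only an ephemeral cache, so workers can be added or removed without repartitioning any data.

```python
# Durable data lives in a shared "object store" (a dict standing in for S3/GCS);
# compute workers keep only an ephemeral cache, so adding or removing a worker
# never moves or repartitions data. All names are illustrative.
OBJECT_STORE = {
    "orders/part-000": [("o1", 120), ("o2", 80)],
    "orders/part-001": [("o3", 200)],
}

class ComputeWorker:
    def __init__(self, name):
        self.name = name
        self.cache = {}                        # lost when the worker is scaled away

    def scan(self, prefix):
        rows = []
        for key, data in OBJECT_STORE.items():
            if key.startswith(prefix):
                self.cache[key] = data         # warm the local cache for repeat scans
                rows.extend(data)
        return rows

# "Scaling compute" is just starting another stateless worker.
workers = [ComputeWorker("warehouse-1"), ComputeWorker("warehouse-2")]
total = sum(amount for _, amount in workers[0].scan("orders/"))
print(total)                                   # 400
```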
```
Cloud-Native Database Architecture (e.g., Snowflake, BigQuery)
==============================================================

┌────────────────────────────────────────────────────────────────┐
│                       Client Application                       │
└───────────────────────────┬────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────────────┐
│           Cloud Services Layer (Metadata, Coordination)        │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐    │
│  │Query Parser/ │  │Metadata      │  │Transaction         │    │
│  │Optimizer     │  │Service       │  │Management          │    │
│  └──────────────┘  └──────────────┘  └────────────────────┘    │
└───────────────────────────┬────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│   Compute     │  │   Compute     │  │   Compute     │
│   Cluster 1   │  │   Cluster 2   │  │   Cluster 3   │
│  (Warehouse)  │  │  (Warehouse)  │  │  (Warehouse)  │
│               │  │               │  │               │
│ ┌───────────┐ │  │ ┌───────────┐ │  │ ┌───────────┐ │
│ │ Ephemeral │ │  │ │ Ephemeral │ │  │ │ Ephemeral │ │
│ │  Cache    │ │  │ │  Cache    │ │  │ │  Cache    │ │
│ └───────────┘ │  │ └───────────┘ │  │ └───────────┘ │
└───────┬───────┘  └───────┬───────┘  └───────┬───────┘
        │                  │                  │
        └──────────────────┴──────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────────────┐
│            Cloud Object Storage (S3, GCS, Azure Blob)          │
│                                                                │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐        │
│  │ Table A  │  │ Table B  │  │ Table C  │  │ Metadata │        │
│  │ Parquet  │  │ Parquet  │  │ Parquet  │  │ (JSON)   │        │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘        │
│                                                                │
│  - Infinite scale                                              │
│  - Pay per byte stored                                         │
│  - 11 nines durability                                         │
│  - Automatic replication                                       │
└────────────────────────────────────────────────────────────────┘
```
Cloud-Native Benefits
Representative Systems
| System | Cloud | Architecture |
|---|---|---|
| Snowflake | Multi-cloud | Compute-storage separation, virtual warehouses |
| BigQuery | GCP | Serverless, Dremel engine, Capacitor storage |
| Amazon Redshift Serverless | AWS | Auto-scaling, pay-per-query |
| Azure Synapse | Azure | Unified analytics, on-demand pools |
| Databricks | Multi-cloud | Spark-based, Delta Lake storage |
Cloud-native architectures are converging toward the "lakehouse" pattern—data stored in open formats (Parquet, Delta) on cloud object storage, with multiple engines (SQL, ML, streaming) accessing the same data. This decouples storage from any single compute engine, enabling best-of-breed processing.
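As a minimal sketch of that pattern, assuming the pyarrow package is installed: one process writes a table in an open format, and any other engine that understands Parquet can later read the same file (a local path stands in for an object-store URI such as s3://...).

```python
# One "engine" writes a table in an open format to the shared store; any other
# engine that reads Parquet can query the same file later. A local path stands
# in for an object-store URI such as s3://bucket/orders.parquet.
import pyarrow as pa
import pyarrow.parquet as pq

orders = pa.table({"order_id": [1, 2, 3], "amount": [120, 80, 200]})
pq.write_table(orders, "orders.parquet")

# A different process reads only the column it needs from the same file.
amounts = pq.read_table("orders.parquet", columns=["amount"])
print(amounts.to_pydict())    # {'amount': [120, 80, 200]}
```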
Selecting the right architecture depends on workload characteristics, scale requirements, operational constraints, and organizational context.
Decision Framework
| Requirement | Recommended Architecture | Rationale |
|---|---|---|
| Scale writes horizontally | Shared-nothing | Partitioning enables write distribution |
| Elastic read scaling only | Shared-disk / Read replicas | Simpler than partitioning; add compute |
| Integrate existing databases | Federation | Avoid replacing working systems |
| Multi-region write availability | Multi-primary | Accept conflict complexity for availability |
| Variable workloads, cost sensitivity | Cloud-native / Serverless | Scale to zero, pay per use |
| Strong consistency required | Shared-nothing with consensus | Raft/Paxos for linearizable operations |
| Simple operations, single region | Primary-secondary replication | Well-understood, simpler to operate |
Questions to Guide Selection
Most successful systems start simpler than they end. Begin with primary-secondary replication in one region. Add read replicas when read scale is needed. Add sharding when write scale is needed. Add multi-region when geographic distribution is required. Each step adds complexity—take it only when necessary.
Architecture is how distributed database concepts combine into functional systems. Let's consolidate the key concepts:
Module Complete: Distributed Database Concepts
You've now completed the foundational module on distributed database concepts. You understand why organizations distribute data, how data is fragmented and replicated across nodes, how transparency hides that distribution from applications, and how the major architectures combine these choices.
With this foundation, you're prepared to explore specific distributed database techniques in subsequent modules: fragmentation strategies, distributed transactions, CAP theorem implications, and sharding patterns.
Congratulations! You've mastered the foundational concepts of distributed databases. You can now reason about why distribution is necessary, how data is partitioned and replicated, what transparency mechanisms hide from applications, and how different architectures make different trade-offs. This conceptual foundation prepares you for the practical techniques covered in subsequent modules.