For decades, the centralized database model served as the backbone of enterprise computing. A single powerful server housed all data, processed all queries, and maintained all integrity constraints. This model was elegant in its simplicity—one source of truth, straightforward administration, and well-understood semantics.
Yet this model is fundamentally inadequate for the modern world.
Today's applications serve billions of users across every continent. Financial transactions never sleep. Social networks span the globe. IoT sensors generate petabytes of data continuously. A single server—no matter how powerful—cannot meet these demands. The question is no longer whether to distribute, but how.
By the end of this page, you will understand the fundamental drivers behind distributed database systems—scalability, availability, performance, geographic requirements, and organizational needs. You'll grasp why centralized databases fail at scale and how distribution addresses each limitation systematically.
Before examining why we distribute, we must understand why centralization fails. A centralized database architecture places all data and processing on a single node or tightly-coupled cluster. While this offers simplicity, it encounters fundamental barriers as systems scale:
Physical Hardware Limits
No single machine can indefinitely scale vertically. CPU cores, RAM capacity, disk I/O bandwidth, and network throughput all have upper bounds. When your application needs 1 TB of RAM but the largest available server offers 512 GB, vertical scaling has failed. When your workload requires 10 million IOPS but your storage system peaks at 1 million, no amount of money solves the problem.
The Speed of Light
Physics imposes hard constraints. Light travels approximately 299,792 km/s in vacuum—about 200,000 km/s in fiber optic cables. A round trip from New York to Tokyo (approximately 21,500 km) takes at least about 107 milliseconds for the signal to propagate through fiber alone. Add network equipment latency, and you're looking at 150-200ms minimum for any database operation. For interactive applications, this is unacceptable.
| Route | Distance | Light Speed Minimum | Typical RTT |
|---|---|---|---|
| Same Data Center | < 1 km | ~0.003 ms | 0.5 - 1 ms |
| Same Region | 100-500 km | 0.5 - 2.5 ms | 5 - 15 ms |
| Cross-Continental | 2,000-5,000 km | 10 - 25 ms | 40 - 80 ms |
| Transoceanic | 10,000-20,000 km | 50 - 100 ms | 150 - 300 ms |
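To see where the "Light Speed Minimum" column comes from, here is a small illustrative calculation, assuming the ~200,000 km/s fiber propagation speed quoted above; the distances are rough one-way figures, not precise route lengths.

```python
# Lower bound on round-trip time (RTT) from signal propagation alone.
# Assumes ~200,000 km/s in fiber, as quoted above; real RTTs are higher
# because of routing detours, switching, and queuing delays.

FIBER_SPEED_KM_PER_S = 200_000

def min_rtt_ms(one_way_km: float) -> float:
    """Propagation-only lower bound on RTT, in milliseconds."""
    return (2 * one_way_km / FIBER_SPEED_KM_PER_S) * 1000

for route, km in {
    "Same data center": 1,
    "Same region": 300,
    "Cross-continental": 3_500,
    "New York - Tokyo": 10_850,
}.items():
    print(f"{route:>18}: >= {min_rtt_ms(km):7.3f} ms")
```

The New York-to-Tokyo figure lands near the ~107 ms quoted earlier; everything a real network adds only pushes observed RTTs further above this floor.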
Single Points of Failure
A centralized database is inherently a single point of failure. Hardware fails—disks, memory modules, power supplies, network cards, entire servers. Data centers experience outages from power failures, cooling failures, network cuts, and natural disasters. If your single database node fails, your entire application fails.
This creates an uncomfortable risk equation: the probability of any component failing increases with system complexity, yet centralization demands ever-more-complex single nodes. The more critical your system, the more catastrophic its failure.
As centralized systems grow more critical, they require more complex hardware for reliability. But more complex hardware has more failure modes. Eventually, the cost of achieving five-nines (99.999%) availability on a single node exceeds the cost of distributing across multiple simpler nodes—and distribution provides better availability anyway.
Scalability is the primary driver for database distribution. When a single machine cannot handle the workload, you have two options: make the machine bigger (vertical scaling) or add more machines (horizontal scaling).
Vertical Scaling (Scale Up)
Upgrade to more powerful hardware: faster CPUs, more RAM, faster storage, bigger network pipes. This approach is conceptually simple—your application code doesn't change, your deployment model stays the same. But vertical scaling has fundamental problems: hardware has hard ceilings, each upgrade step costs disproportionately more, the upgraded machine remains a single point of failure, and upgrades themselves typically require downtime.
Horizontal Scaling (Scale Out)
Distribute data and processing across multiple machines. Each machine handles a portion of the overall workload. This approach addresses vertical scaling's limitations: capacity grows by adding nodes rather than hitting a hardware ceiling, commodity machines keep costs roughly linear, and redundancy across nodes removes the single point of failure.
However, horizontal scaling introduces complexity: data must be partitioned across nodes, transactions may span machines, replicas can lag behind one another, and the network itself becomes a source of failures.
These challenges define the distributed database design space. The motivation is clear—horizontal scaling is often the only viable path—but the implementation requires careful engineering.
Horizontal scaling is limited by the sequential fraction of your workload. If 10% of processing must happen sequentially, maximum speedup from parallelization is 10× regardless of node count. Distributed database design focuses on minimizing this sequential fraction through careful data partitioning and transaction scoping.
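The 10× figure is an instance of Amdahl's law. With a sequential fraction $s$ of the work and $N$ nodes, the achievable speedup is bounded:

$$
\text{Speedup}(N) = \frac{1}{s + \frac{1-s}{N}} \le \frac{1}{s}
$$

With $s = 0.1$, the ceiling is $1/0.1 = 10\times$, no matter how many nodes you add.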
Availability measures the proportion of time a system is operational and accessible. For mission-critical applications—banking, healthcare, e-commerce, air traffic control—availability requirements approach 100%, often expressed as "nines": 99.9% ("three nines") allows roughly 8.8 hours of downtime per year, 99.99% roughly 53 minutes, and 99.999% ("five nines") roughly 5 minutes.
Achieving high availability with a centralized database requires elaborate failure mitigation: redundant power supplies, redundant network paths, redundant disk controllers, standby nodes with synchronous replication. Despite this redundancy, centralization imposes limits.
Why Distribution Improves Availability
Distributed systems achieve availability through redundancy and independence. When data exists on multiple nodes, the failure of any single node no longer takes the system down: requests are rerouted to surviving replicas, and maintenance can proceed one node at a time without a full outage.
The Mathematics of Redundancy
Consider a single node with 99.9% availability (8.77 hours downtime/year). If you replicate data across N independent nodes, and you only need one node available:
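With per-node availability $A$ and $N$ independent replicas, the system is down only when every replica is down simultaneously, so

$$
A_{\text{system}} = 1 - (1 - A)^{N}.
$$

For $A = 0.999$, two replicas give $1 - (0.001)^2 = 99.9999\%$ and three give $1 - (0.001)^3 = 99.9999999\%$, which is where the figures in the table below come from.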
Of course, nodes aren't truly independent—correlated failures (shared network, shared power, shared software bugs) reduce these theoretical gains. But distribution still provides substantial availability improvement over centralization.
| Configuration | Individual Node Availability | System Availability | Downtime/Year |
|---|---|---|---|
| Single Node | 99.9% | 99.9% | 8.77 hours |
| 2 Replicas (independent) | 99.9% each | 99.9999% | 31.5 seconds |
| 3 Replicas (independent) | 99.9% each | 99.9999999% | 0.03 seconds |
| 3 Replicas (correlated, realistic) | 99.9% each | ~99.99% | ~52 minutes |
| 3 Replicas, 3 Regions | 99.9% each | ~99.999% | ~5 minutes |
Availability measures whether you can access data now. Durability measures whether data is preserved forever. They're related but distinct. A system might be temporarily unavailable (maintenance window) while perfectly durable, or available with durability risk (single replica, uncommitted data). Distribution typically improves both, but through different mechanisms.
Performance, particularly latency, drives distribution for globally-accessed applications. Users expect responsive applications regardless of their location. The speed of light imposes hard lower bounds on latency for distant data centers. Distribution solves this by placing data closer to users.
The Latency Problem
Consider an application serving users in both New York and Tokyo. With a centralized database in New York, a New York user's query completes a round trip in roughly 5 ms, while a Tokyo user's identical query takes roughly 150 ms.
For a typical web page load requiring 5 database round trips, that is about 25 ms of database latency for the New York user versus roughly 750 ms for the Tokyo user.
This 30× difference creates dramatically inferior user experience for Tokyo users—directly impacting engagement, conversion, and satisfaction.
Geographic Distribution as Solution
By placing database replicas (or partitions) in multiple regions, you reduce the distance data must travel: a Tokyo replica can serve Tokyo users in a few milliseconds instead of forcing every query across the Pacific.
This approach trades consistency complexity for latency improvement. Data replicated across regions may be temporarily inconsistent (replication lag), requiring application-level design decisions about what consistency level each operation needs.
While reads can be served from local replicas, writes often require coordination. Synchronous replication across regions adds latency to every write. Asynchronous replication reduces write latency but creates consistency windows. This fundamental tension—between low latency and strong consistency—drives much of distributed database design.
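As a concrete illustration of that tension, here is a minimal, hypothetical routing sketch (the `Router` class, region names, and latencies are invented for this example, not any particular database's API): latency-tolerant reads go to the nearest replica and accept possible staleness, while strongly consistent reads and all writes go to the primary and pay the cross-region round trip.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    region: str
    rtt_ms: float          # approximate round-trip time from the client to this replica
    is_primary: bool = False

class Router:
    """Hypothetical per-operation router illustrating the latency/consistency trade-off."""

    def __init__(self, replicas: list[Replica]):
        self.replicas = replicas
        self.primary = next(r for r in replicas if r.is_primary)

    def choose(self, operation: str, consistency: str) -> Replica:
        # Writes and strongly consistent reads coordinate with the primary and
        # pay cross-region latency; eventually consistent reads stay local.
        if operation == "write" or consistency == "strong":
            return self.primary
        return min(self.replicas, key=lambda r: r.rtt_ms)

router = Router([
    Replica("us-east", rtt_ms=180.0, is_primary=True),  # primary, far from this client
    Replica("ap-northeast", rtt_ms=2.0),                 # local replica near Tokyo users
])

print(router.choose("read", "eventual").region)  # ap-northeast: ~2 ms, may lag behind
print(router.choose("read", "strong").region)    # us-east: up to date, ~180 ms
print(router.choose("write", "strong").region)   # us-east: writes coordinate at the primary
```

The per-operation `consistency` argument is the design point: the application, not the database alone, decides which operations can tolerate staleness in exchange for local-replica latency.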
Beyond technical motivations, organizational structures and regulatory requirements often mandate database distribution.
Organizational Autonomy
Large enterprises consist of semi-autonomous business units, each with distinct data management needs:
Distributed databases allow each unit to maintain local database instances while participating in organization-wide data integration when needed.
Data Sovereignty and Regulatory Compliance
Governments increasingly mandate that certain data remain within national borders:
A global company cannot simply store all data in one US data center. They need database architecture that places EU customer data in EU, Chinese customer data in China, and so forth—while still enabling unified analytics and operations where permitted.
| Regulation | Jurisdiction | Data Types Covered | Requirement |
|---|---|---|---|
| GDPR | European Union | Personal data of EU residents | Data processing must comply with EU law; international transfers restricted |
| Data Localization Law | Russia | Personal data of Russian citizens | Primary storage must be on Russian servers |
| Cybersecurity Law | China | "Important data" and personal information | Security assessment required for cross-border transfers |
| HIPAA | United States | Protected Health Information (PHI) | Specific security and privacy safeguards required |
| PDPA | Singapore | Personal data | Comparable protection required for transfers |
Data sovereignty requirements fundamentally shape distributed database design. You cannot simply optimize for performance and availability—you must also satisfy legal constraints. This often means maintaining separate database deployments per jurisdiction with controlled, audited data flows between them.
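One common pattern is a placement policy consulted before any write, mapping a customer's residency to a deployment that is legally allowed to hold their data. The sketch below is a deliberately simplified, hypothetical example (the region names, residency codes, and `store_customer` helper are made up); real policies cover many more data categories and require legal review.

```python
# Hypothetical placement policy: route a customer's record to a deployment
# in an allowed jurisdiction before writing. Region names are illustrative.

RESIDENCY_TO_DEPLOYMENT = {
    "EU": "eu-central",      # GDPR: keep EU residents' personal data under EU law
    "RU": "ru-moscow",       # primary storage on Russian servers
    "CN": "cn-shanghai",     # cross-border transfer requires a security assessment
}
DEFAULT_DEPLOYMENT = "us-east"

def deployment_for(customer: dict) -> str:
    """Pick the database deployment allowed to hold this customer's personal data."""
    return RESIDENCY_TO_DEPLOYMENT.get(customer["residency"], DEFAULT_DEPLOYMENT)

def store_customer(customer: dict) -> None:
    target = deployment_for(customer)
    # A real system would now write to the region-local database for `target`.
    print(f"storing customer {customer['id']} in {target}")

store_customer({"id": 42, "residency": "EU"})   # -> eu-central
store_customer({"id": 43, "residency": "BR"})   # -> us-east (no specific rule in this sketch)
```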
Counter-intuitively, distributing a database can reduce costs compared to centralized alternatives. While distribution adds operational complexity, it also enables cost optimization strategies that centralization precludes.
Commodity Hardware Economics
Distributed systems can use many inexpensive commodity servers instead of a few expensive enterprise-grade machines.
If workload distributes well across 100 commodity servers at $15,000 each ($1.5M total) versus requiring a single $2M enterprise server, distributed architecture wins economically—and provides better availability and scalability.
Workload Isolation and Resource Optimization
Distribution enables workload isolation—separating different types of processing onto different infrastructure: transactional traffic, analytical queries, and batch reporting can each run on nodes sized and tuned for that workload.
Centralized systems must provision for peak load across all workload types simultaneously. Distributed systems can right-size each component independently.
Cloud and Elasticity
Cloud computing platforms (AWS, Azure, GCP) charge by resource-hour. Distributed architectures can scale nodes up and down with demand: add capacity during peak hours, release it overnight, and pay only for what actually runs.
Centralized architectures must provision for peak load continuously, paying for idle capacity during off-peak periods.
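A toy cost model makes the comparison concrete. The hourly price and the 24-hour demand curve below are invented purely for illustration; the point is only that static peak provisioning pays for the peak all day, while elastic scaling pays for the node-hours actually used.

```python
# Toy cost comparison: provision for peak 24/7 vs. scale node count with demand.
# The demand profile and hourly price are invented for illustration.

HOURLY_PRICE_PER_NODE = 0.50  # hypothetical $/node-hour

# Nodes needed in each hour of a day: quiet overnight, peak during business hours.
demand = [2]*7 + [6]*4 + [10]*4 + [6]*5 + [2]*4   # 24 hourly samples

peak_cost    = max(demand) * len(demand) * HOURLY_PRICE_PER_NODE   # static peak sizing
elastic_cost = sum(demand) * HOURLY_PRICE_PER_NODE                 # pay per node-hour used

print(f"peak-provisioned: ${peak_cost:.2f}/day")
print(f"elastic:          ${elastic_cost:.2f}/day")
```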
Distribution adds operational complexity that has costs: more nodes to monitor, more complex failure modes, specialized expertise requirements. A fair cost comparison considers total cost of ownership including operations, not just hardware. For many organizations, the scale and availability benefits justify these costs—but it's not universally cheaper.
Distribution is not universally appropriate. The complexity it introduces is substantial, and not every system benefits from—or can tolerate—that complexity.
Unnecessary Distribution
Many applications simply don't need distributed databases: internal tools, single-region products, and services with modest user counts are often served perfectly well by one properly tuned database instance.
Complexity Costs
Distributed systems introduce failure modes that centralized systems don't have: network partitions, partial failures in which some nodes are up and others are not, replication lag, and subtle bugs that surface only under particular timing conditions.
The "Distributed Systems Are Hard" Reality
Distributed systems require specialized expertise to design, implement, and operate. Failure modes are subtle and often emerge only at scale or under specific timing conditions. Teams without this expertise may achieve worse reliability with distribution than with a well-operated single-node database.
The pragmatic advice: Start centralized. Distribute when you must. Build your application with clean data access patterns that don't preclude distribution, but don't distribute until a well-tuned single node demonstrably cannot meet your scalability, availability, latency, or regulatory requirements.
Premature distribution is a common mistake. Startups with 10,000 users don't need globally distributed databases. They need to find product-market fit. A single properly-tuned database instance can handle remarkable scale. Instagram famously served 30 million users on a small PostgreSQL deployment. Scale problems are good problems—they mean you have users.
We've examined the fundamental drivers that motivate database distribution. Let's consolidate these insights: centralized systems hit hard limits in hardware, latency, and fault tolerance; horizontal scaling, redundancy, and geographic placement address scalability, availability, and performance; organizational structure, data sovereignty, and cost add further pressure to distribute; and yet distribution carries real complexity, so it should be adopted only when these drivers genuinely apply.
What's Next
Understanding why to distribute is the first step. The next question is how. In the following pages, we'll explore the fundamental techniques that distributed databases use: data fragmentation (partitioning data across nodes), data replication (maintaining copies on multiple nodes), distribution transparency (hiding distribution from applications), and distributed database architecture.
Each technique addresses specific motivations—fragmentation for scalability, replication for availability, transparency for usability, and architecture for coherent system design.
You now understand the fundamental motivations driving database distribution. These aren't arbitrary technical choices—they're responses to real physical, organizational, and economic constraints. Next, we'll explore how data fragmentation partitions data across distributed nodes.