Distributed databases are inherently complex. Data is fragmented across nodes, replicated for availability, and accessed through networks with latency and failures. Yet application developers shouldn't need to manage this complexity—they should interact with the database as if it were a single, cohesive system.
This abstraction is called transparency. A distributed database provides transparency when its distribution is invisible (or mostly invisible) to applications and users. The database handles routing, coordination, replication, and failure recovery internally, exposing a simplified interface that resembles a centralized system.
Transparency is what makes distributed databases usable. Without it, every application would need to know which node holds which data, track replica states, handle network failures, and implement distributed transaction protocols. This would make application development impossibly complex.
By the end of this page, you will understand the types of transparency in distributed databases—location, fragmentation, replication, failure, concurrency, and more. You'll grasp how each transparency type simplifies application development, what mechanisms enable it, and where the limits of transparency lie.
What is Transparency?
In distributed systems, transparency refers to hiding certain aspects of distribution from users and applications. A transparent distributed database behaves like a centralized one from the application's perspective, even though data and processing are distributed across multiple nodes.
The Transparency Spectrum
Transparency exists on a spectrum. Complete transparency—where applications are entirely unaware of distribution—is theoretically appealing but often impractical. Real-world systems provide partial transparency, hiding what can be hidden efficiently while exposing what applications must handle.
Why Transparency Matters
The Standard Transparency Types
The ISO Reference Model of Open Distributed Processing (RM-ODP) defines several distribution transparencies, and distributed database literature extends them. For databases, the most relevant are:
| Transparency Type | What It Hides |
|---|---|
| Location Transparency | Physical location of data |
| Fragmentation Transparency | How data is partitioned |
| Replication Transparency | Existence of multiple copies |
| Failure Transparency | Node and network failures |
| Concurrency Transparency | Concurrent access by other users |
| Access Transparency | Differences in data access methods |
| Migration Transparency | Movement of data between nodes |
| Performance Transparency | System tuning and optimization |
We'll examine each in detail, understanding what mechanisms enable them and what trade-offs they impose.
Each transparency type requires mechanisms that add overhead, latency, or complexity. Full transparency may be impossible under certain failure conditions (see CAP theorem). System designers must balance transparency benefits against their costs, sometimes intentionally exposing distribution aspects when transparency is too expensive.
Location transparency means applications access data by logical name without knowing or specifying the physical location. Whether data resides in New York, Tokyo, or a local server room, the access syntax is identical.
Without Location Transparency
Applications would specify node addresses in queries:
-- Hypothetical non-transparent query
SELECT * FROM node3.datacenter_tokyo.customers WHERE id = 12345;
This is brittle—if data moves, queries break. Applications become coupled to physical infrastructure.
With Location Transparency
Applications use logical names; the system resolves locations:
-- Transparent query
SELECT * FROM customers WHERE id = 12345;
The distributed database routes this query to wherever customer 12345 resides.
Implementation Mechanisms
1. Global Catalog / Data Dictionary
A metadata system tracks which data is located where:
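As a minimal sketch, with hypothetical table, fragment, and node names (real systems keep this in internal system tables, not application code), the catalog might record:

```python
# Hypothetical global catalog: logical table -> fragments -> physical placement.
GLOBAL_CATALOG = {
    "customers": {
        "customers_na":   {"node": "us-east-1",  "predicate": "region = 'North America'"},
        "customers_eu":   {"node": "eu-west-1",  "predicate": "region = 'Europe'"},
        "customers_apac": {"node": "ap-north-1", "predicate": "region = 'Asia-Pacific'"},
    },
    "orders": {
        "orders_na":   {"node": "us-east-1",  "predicate": "customer_region = 'North America'"},
        "orders_eu":   {"node": "eu-west-1",  "predicate": "customer_region = 'Europe'"},
        "orders_apac": {"node": "ap-north-1", "predicate": "customer_region = 'Asia-Pacific'"},
    },
}

def nodes_for(table: str) -> set[str]:
    """All nodes that hold some fragment of the given logical table."""
    return {entry["node"] for entry in GLOBAL_CATALOG[table].values()}
```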
2. Distributed Naming
Logical object names are mapped to physical locations:
customers → fragments (customers_NA, customers_EU, customers_APAC)
3. Query Routing Layer
A routing layer intercepts queries and directs them:
-- Application issues a simple query
SELECT * FROM orders WHERE customer_id = 42;

-- Behind the scenes (transparent to application):
-- 1. Query router receives the query
-- 2. Consults catalog: customer_id = 42 → customer is in region 'EU'
-- 3. EU customers' orders are on node 'eu-west-1'
-- 4. Routes query to eu-west-1
-- 5. eu-west-1 executes locally: SELECT * FROM orders_eu WHERE customer_id = 42;
-- 6. Results returned to router
-- 7. Router returns results to application

-- Application sees only the results, not the routing logic

Note that specifying a connection endpoint (hostname, port) is different from location transparency. The connection establishes where to connect; location transparency then abstracts which data is where. Even with location transparency, applications still connect to some entry point, but once connected, they needn't know data locations.
Fragmentation transparency hides how tables are partitioned. Applications query logical tables; the system handles mapping queries to fragments and reconstructing complete results.
Without Fragmentation Transparency
Applications would explicitly query fragments:
-- Must know fragmentation scheme
SELECT * FROM customers_na WHERE region = 'North America'
UNION ALL
SELECT * FROM customers_eu WHERE region = 'Europe'
UNION ALL
SELECT * FROM customers_apac WHERE region = 'Asia-Pacific';
If fragmentation changes, all queries must be rewritten.
With Fragmentation Transparency
-- Query logical table; system handles fragments
SELECT * FROM customers;
The system translates this to fragment-specific queries and combines results.
Implementation Mechanisms
Query Decomposition
The query processor translates user queries into fragment queries:
Fragmentation Schema Storage
Metadata defines how tables map to fragments:
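A sketch of what this metadata might capture, using the same hypothetical fragments as the example below (one horizontally and one vertically fragmented table):

```python
# Hypothetical fragmentation schema: how logical tables split into fragments.
FRAGMENTATION_SCHEMA = {
    "customers": {                       # horizontal fragmentation
        "type": "horizontal",
        "key": "region",
        "fragments": {
            "customers_na":   "region = 'North America'",
            "customers_eu":   "region = 'Europe'",
            "customers_apac": "region = 'Asia-Pacific'",
        },
    },
    "employees": {                       # vertical fragmentation
        "type": "vertical",
        "key": "id",                     # primary key kept in every fragment
        "fragments": {
            "employees_core":    ["id", "name", "department"],
            "employees_payroll": ["id", "salary", "bank_account"],
        },
    },
}
```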
-- User query
SELECT name, email FROM customers WHERE country = 'Japan';

-- System knows:
-- - customers is horizontally fragmented by region
-- - Japan is in the Asia-Pacific region (customers_apac fragment)

-- Generated sub-query:
SELECT name, email FROM customers_apac WHERE country = 'Japan';

-- For a query spanning regions:
SELECT COUNT(*) FROM customers;

-- Generated sub-queries:
SELECT COUNT(*) AS c1 FROM customers_na;
SELECT COUNT(*) AS c2 FROM customers_eu;
SELECT COUNT(*) AS c3 FROM customers_apac;
-- Final aggregation: c1 + c2 + c3

-- For a vertically fragmented table:
SELECT name, salary FROM employees WHERE id = 1001;

-- Generated sub-queries (assuming name in fragment A, salary in fragment B):
SELECT id, name FROM employees_core WHERE id = 1001;      -- Fragment A
SELECT id, salary FROM employees_payroll WHERE id = 1001; -- Fragment B
-- Then JOIN on id

While fragmentation transparency hides how data is fragmented, performance often exposes it. Queries aligned with fragmentation (filtering on the fragment key) are fast; queries that span all fragments (full table scans) are slow. Sophisticated users often learn the fragmentation scheme to write efficient queries; the transparency "leaks."
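To make the COUNT(*) case above concrete, here is a toy scatter-gather in Python; execute_on is a stand-in for shipping a sub-query to a node, backed by fake counts so the sketch actually runs:

```python
from concurrent.futures import ThreadPoolExecutor

FRAGMENT_NODES = {  # hypothetical fragment -> node placement
    "customers_na": "us-east-1",
    "customers_eu": "eu-west-1",
    "customers_apac": "ap-north-1",
}

FAKE_COUNTS = {"customers_na": 120, "customers_eu": 85, "customers_apac": 60}

def execute_on(node: str, sql: str) -> int:
    """Stand-in for sending a sub-query to a node; returns canned counts."""
    fragment = sql.rsplit("FROM ", 1)[1]
    return FAKE_COUNTS[fragment]

def count_customers() -> int:
    """Scatter COUNT(*) to every fragment in parallel, then sum the partial counts."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda item: execute_on(item[1], f"SELECT COUNT(*) FROM {item[0]}"),
            FRAGMENT_NODES.items(),
        )
        return sum(partials)

print(count_customers())  # 265: the application sees one number, not three sub-queries
```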
Replication transparency hides the existence of multiple copies. Applications read and write as if there's one copy; the system manages replicas internally.
What Replication Transparency Provides
Without Replication Transparency
Applications would explicitly manage replicas:
# Pseudocode for non-transparent replication handling
for replica in [primary, secondary1, secondary2]:
    try:
        replica.write(data)
    except ReplicaUnavailable:
        continue

# For reads: choose replica, handle staleness
result = secondary1.read(query)  # Might be stale!
With Replication Transparency
# Application just reads and writes
db.write(data) # System handles replication
result = db.read(query) # System chooses replica
Implementation Mechanisms
Read Routing Strategies
Write Handling
-- Application issues a query
SELECT balance FROM accounts WHERE id = 12345;

-- System decision (transparent to app):
-- Option A: Route to primary (always fresh)
-- Option B: Route to local replica (lower latency, possibly stale)
-- Option C: Route based on session consistency requirement

-- With a consistency level hint (semi-transparent):
-- Some databases allow consistency hints while maintaining transparency

-- PostgreSQL example with synchronous_commit
SET synchronous_commit = 'remote_apply'; -- Wait for replicas
INSERT INTO critical_data VALUES (...);

SET synchronous_commit = 'local'; -- Fast, primary-only durability
INSERT INTO log_data VALUES (...);

-- The *existence* of replicas is transparent
-- But the *consistency level* may be exposed as a tunable

Many systems provide partial replication transparency, hiding replica selection for reads but exposing consistency levels. This allows applications to request "read from primary" for critical reads or "accept stale" for analytics. Complete transparency would remove this control, which isn't always desirable.
Failure transparency hides failures of nodes, networks, and processes. Applications experience temporary delays or retries, but failures are handled internally without application involvement.
What Failure Transparency Provides
Levels of Failure Transparency
| Level | What's Hidden | Application Experience |
|---|---|---|
| Network glitches | Brief packet loss, retransmission | Slight delay, transparent retry |
| Node restart | Node crashes and restarts quickly | Brief delay during failover |
| Node failure | Node down, doesn't recover | Delay during failover + replica promotion |
| Partition | Network split isolates nodes | Possible reduced availability or consistency |
| Region outage | Entire data center unavailable | Cross-region failover (if configured) |
Implementation Mechanisms
Connection Pooling with Retry
Database drivers maintain connection pools and automatically retry failed operations:
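A simplified sketch of transparent retry with exponential backoff; the pool object and TransientError are stand-ins for whatever the driver actually provides:

```python
import time

class TransientError(Exception):
    """Stand-in for a driver's retryable error (connection reset, failover in progress)."""

def run_with_retry(pool, sql, max_attempts=3, base_delay=0.1):
    """Execute sql on a pooled connection, retrying transient failures.
    The application sees a result or a final exception, never the retries."""
    for attempt in range(max_attempts):
        conn = pool.get_connection()
        try:
            result = conn.execute(sql)
            pool.release(conn)                       # healthy connection goes back
            return result
        except TransientError:
            pool.discard(conn)                       # broken connection is dropped
            if attempt == max_attempts - 1:
                raise                                # transparency has limits
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff before retry
```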
Health Monitoring
Automatic Failover
Circuit Breakers
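A bare-bones sketch of the idea: after repeated failures to reach a node, stop sending it requests for a cooldown period instead of letting every caller wait on a timeout (the thresholds here are arbitrary):

```python
import time

class CircuitBreaker:
    """Trip after `threshold` consecutive failures; reject calls until `cooldown` passes."""

    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                                  # closed: traffic flows
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at, self.failures = None, 0      # cooldown elapsed: try again
            return True
        return False                                     # open: fail fast

    def record(self, success: bool):
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()        # trip the breaker
```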
During network partitions, complete failure transparency is impossible. The CAP theorem states you must choose between consistency (all nodes agree or deny service) and availability (all nodes respond, possibly with stale data). This fundamental limit means failure transparency has boundaries—some failures MUST be exposed to applications.
Concurrency transparency hides the fact that multiple users and applications access the database simultaneously. Each transaction appears to execute in isolation, as if it were the only transaction running.
What Concurrency Transparency Provides
This is the "I" in ACID: Isolation. Users don't see other transactions' uncommitted changes or partial updates, nor the locking and coordination that keep concurrent operations from interfering.
Isolation Levels and Transparency
Concurrency transparency isn't absolute—it varies by isolation level:
| Isolation Level | What You Might See | Transparency Level |
|---|---|---|
| Serializable | Nothing concurrent; behaves as sequential | Complete |
| Repeatable Read | Same data on re-read within transaction | High |
| Read Committed | Only committed data; may change between reads | Moderate |
| Read Uncommitted | Uncommitted data from other transactions | Low |
Distributed Concurrency Challenges
In distributed databases, concurrency control is more complex:
1. Distributed Locking
Locks must be acquired across nodes (see the sketch after item 3 below):
2. Distributed Deadlock Detection
Deadlocks can span nodes:
3. Global Serializability
Ensuring global serializable execution:
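One classic approach that touches all three points is strict two-phase locking with a single global lock-acquisition order; a toy sketch with invented node and resource names:

```python
import threading

# Toy per-node lock managers; the nodes and resources are illustrative only.
NODE_LOCKS = {
    "us-east-1": {"accounts:100": threading.Lock()},
    "eu-west-1": {"accounts:200": threading.Lock()},
}

def acquire_all(resources):
    """Acquire locks on every (node, resource) pair in one global order.
    Ordering all acquisitions the same way prevents cross-node deadlock;
    holding locks until commit (strict two-phase locking) gives serializability."""
    acquired = []
    for node, resource in sorted(resources):   # global order: (node, resource)
        lock = NODE_LOCKS[node][resource]
        lock.acquire()
        acquired.append(lock)
    return acquired

def release_all(locks):
    for lock in reversed(locks):
        lock.release()

locks = acquire_all([("eu-west-1", "accounts:200"), ("us-east-1", "accounts:100")])
# ... perform the distributed transaction's reads and writes ...
release_all(locks)
```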
-- User A starts a transaction
BEGIN TRANSACTION;
SELECT balance FROM accounts WHERE id = 100; -- Returns 1000

-- User B concurrently starts a transaction
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE id = 100;
COMMIT; -- Balance now 500

-- User A continues (with Repeatable Read isolation)
SELECT balance FROM accounts WHERE id = 100; -- Still returns 1000!
-- User A sees a consistent snapshot from transaction start

-- User A with Read Committed isolation
SELECT balance FROM accounts WHERE id = 100; -- Returns 500 (sees B's commit)
-- Less transparent: A sees interleaving with B

-- In a distributed setting:
-- Even with the same isolation level, additional coordination is needed
-- to ensure consistency across nodes

Multi-Version Concurrency Control (MVCC) enables concurrency transparency by maintaining multiple versions of data. Readers see consistent snapshots without blocking writers. This is how PostgreSQL, MySQL InnoDB, and most modern databases provide high concurrency with strong isolation: each transaction sees a consistent view without explicit locking for reads.
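To make the MVCC idea concrete, here is a toy versioned store in which a reader sees the newest version committed no later than its snapshot timestamp (a drastic simplification of real engines):

```python
import itertools

class MVCCStore:
    """Toy multi-version store: each write creates a new version stamped with a
    commit timestamp; a reader sees the newest version within its snapshot."""

    def __init__(self):
        self.versions = {}                 # key -> list of (commit_ts, value)
        self.clock = itertools.count(1)

    def write(self, key, value):
        ts = next(self.clock)              # commit timestamp
        self.versions.setdefault(key, []).append((ts, value))
        return ts

    def snapshot(self):
        """A reader's snapshot is just the current clock value."""
        return next(self.clock)

    def read(self, key, snapshot_ts):
        visible = [(ts, v) for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return max(visible)[1] if visible else None   # newest visible version

store = MVCCStore()
store.write("accounts:100", 1000)
snap = store.snapshot()                    # User A's snapshot
store.write("accounts:100", 500)           # User B commits a later change
print(store.read("accounts:100", snap))    # 1000: A still sees its snapshot
```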
Several additional transparency types are relevant to distributed databases:
Access Transparency
Hides differences in how data is stored or accessed. Whether data is on SSD, HDD, or in a cache; whether it's in a relational table, materialized view, or external source—the access syntax is uniform.
-- Same syntax regardless of storage
SELECT * FROM customers; -- Local table
SELECT * FROM customer_view; -- Materialized view
SELECT * FROM external_customers; -- Foreign data wrapper
Migration Transparency
Hides the movement of data between nodes. When data is rebalanced, migrated for maintenance, or moved for optimization, applications continue operating without interruption or reconfiguration.
Performance Transparency
Hides performance tuning and optimization. The database automatically maintains statistics, chooses execution plans, and tunes itself behind the scenes.
Applications get optimized execution without explicit tuning commands.
Transaction Transparency
Hides the complexity of distributed transactions, such as atomic commitment across nodes (two-phase commit) and distributed lock management.
Applications issue BEGIN, COMMIT, ROLLBACK as if everything were local.
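Under the hood this usually relies on an atomic commit protocol such as two-phase commit (2PC); here is a stripped-down sketch of the coordinator's side, where each participant is assumed to expose prepare, commit, and abort methods:

```python
def two_phase_commit(participants) -> bool:
    """Coordinator side of 2PC. Real protocols also log every step durably
    so the decision survives coordinator crashes; that is omitted here."""
    # Phase 1: ask every participant to prepare (vote)
    for p in participants:
        if not p.prepare():
            for q in participants:     # any 'no' vote aborts the transaction everywhere
                q.abort()
            return False
    # Phase 2: all voted yes, so commit everywhere
    for p in participants:
        p.commit()
    return True
```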
Schema Transparency
Hides schema evolution, so that schema changes can be rolled out without requiring changes to existing applications.
| Transparency Type | Hides | Enabling Mechanism |
|---|---|---|
| Location | Physical data location | Global catalog, query routing |
| Fragmentation | How tables are partitioned | Query decomposition, catalog |
| Replication | Multiple copies exist | Replica routing, write propagation |
| Failure | Node and network failures | Failover, retry logic, circuit breakers |
| Concurrency | Other concurrent transactions | MVCC, isolation levels, locking |
| Access | Storage differences | Unified query interface |
| Migration | Data movement between nodes | Online migration protocols |
| Performance | Optimization complexity | Automatic tuning, statistics |
| Transaction | Distributed commit complexity | 2PC, distributed lock managers |
More transparency isn't always better. Hiding all details can prevent performance tuning, make debugging harder, and prevent applications from making informed trade-offs. Good systems provide transparency by default but allow applications to "peek behind the curtain" when needed—query hints, explicit routing, consistency level selection.
Transparency is what makes distributed databases practical for application development; the summary table above consolidates the key concepts.
What's Next
We've covered the concepts that make up distributed databases: motivation, fragmentation, replication, and transparency. The final piece is architecture—how these concepts combine into coherent system designs. The next page explores distributed database architectures: shared-nothing, shared-disk, federated systems, and more.
You now understand how distributed databases hide complexity through various types of transparency. These mechanisms enable application developers to work with distributed systems using familiar SQL semantics. Next, we'll explore the architectural patterns that tie all these concepts together.