When you swipe your credit card at a coffee shop, that transaction must be recorded correctly—once and only once—before it's visible to your banking app. When you post a tweet, it must appear on your profile before your followers can see it in their feeds. When you update your profile picture on a social network, every server around the world eventually needs that update, but there can be no confusion about which picture is the 'real' one.
These scenarios share a fundamental challenge: how do we replicate data across multiple database servers while maintaining consistency?
The answer that has powered the majority of production database systems for decades is surprisingly elegant: designate a single node as the leader (also called the primary or master), and ensure that all writes flow through this one authoritative source.
By the end of this page, you will understand why single-leader replication is the dominant paradigm in database systems, how it guarantees write consistency, the architectural patterns behind leader election and write routing, and when this model is the right (or wrong) choice for your system.
At its core, leader-follower replication (also known as primary-replica, master-slave, or active-passive replication) follows a deceptively simple rule:
All write operations must be processed by exactly one node—the leader.
This single constraint solves one of the most complex problems in distributed systems: write conflicts. When multiple nodes can accept writes independently, you face the nightmare of concurrent, conflicting updates. Did User A's edit or User B's edit happen first? What if they edited different fields of the same record? What if network partitions mean neither node knows about the other's write?
By funneling all writes through a single leader, we sidestep these questions entirely. The leader sees every write in sequence, applies them in order, and produces a single, authoritative history of changes. This history—called the replication log, write-ahead log (WAL), or binlog depending on the database—becomes the source of truth that all other nodes follow.
The 'single' in single-leader isn't a limitation—it's a feature. By accepting a single point of write coordination, we gain something invaluable: a total ordering of all writes. This makes reasoning about consistency vastly simpler than multi-leader or leaderless alternatives.
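To make the total-ordering idea concrete, here is a minimal Python sketch (hypothetical `Leader` and `Follower` classes, not any real database's API): every write gets a global sequence number, and any follower that replays the log in order converges to exactly the same state, with no conflict resolution needed.

```python
import itertools

class Leader:
    """Accepts all writes and assigns each a global sequence number."""
    def __init__(self):
        self._seq = itertools.count(1)
        self.log = []  # append-only replication log: (seq, key, value)

    def write(self, key, value):
        entry = (next(self._seq), key, value)
        self.log.append(entry)
        return entry

class Follower:
    """Replays the leader's log in order; never accepts writes directly."""
    def __init__(self):
        self.data = {}
        self.applied_up_to = 0

    def replay(self, log):
        for seq, key, value in log:
            if seq > self.applied_up_to:  # skip entries already applied
                self.data[key] = value
                self.applied_up_to = seq

leader = Leader()
leader.write("alice", 900)
leader.write("alice", 800)  # later write wins: the total order removes ambiguity

f1, f2 = Follower(), Follower()
f1.replay(leader.log)
f2.replay(leader.log)
assert f1.data == f2.data == {"alice": 800}
```

Because every entry carries a sequence number, replaying the log is idempotent: a follower can safely re-read the log from the beginning and skip what it has already applied.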
To truly understand single-leader replication, let's trace the journey of a single write operation from client to durable storage across all replicas. This flow is the heartbeat of every leader-follower database, from PostgreSQL streaming replication to MySQL master-slave setups to MongoDB replica sets.
```
┌──────────────────────────────────────────────────────────────────────┐
│ CLIENT APPLICATION                                                   │
│                                                                      │
│ UPDATE users SET balance = balance - 100 WHERE user_id = 'alice';    │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │
                                    ▼  (1) Client sends write to Leader
┌──────────────────────────────────────────────────────────────────────┐
│ LEADER (PRIMARY)                                                     │
│                                                                      │
│ ┌──────────────────┐   ┌───────────────────┐   ┌─────────────────┐   │
│ │ (2) Validate &   │──▶│ (3) Write to WAL  │──▶│ (4) Apply to    │   │
│ │     Parse Query  │   │     (durable log) │   │     Data Files  │   │
│ └──────────────────┘   └─────────┬─────────┘   └─────────────────┘   │
│                                  │                                   │
│                                  ▼                                   │
│                      ┌──────────────────────┐                        │
│                      │ (5) Acknowledge      │  (synchronous mode)    │
│                      │     or Stream        │  (asynchronous mode)   │
│                      │     to Followers     │                        │
│                      └──────────┬───────────┘                        │
└─────────────────────────────────┼────────────────────────────────────┘
                                  │
                ┌─────────────────┼─────────────────┐
                │                 │                 │
                ▼                 ▼                 ▼
        ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
        │  FOLLOWER 1   │ │  FOLLOWER 2   │ │  FOLLOWER 3   │
        │               │ │               │ │               │
        │ (6) Receive   │ │ (6) Receive   │ │ (6) Receive   │
        │     WAL Entry │ │     WAL Entry │ │     WAL Entry │
        │               │ │               │ │               │
        │ (7) Apply to  │ │ (7) Apply to  │ │ (7) Apply to  │
        │     Local DB  │ │     Local DB  │ │     Local DB  │
        └───────────────┘ └───────────────┘ └───────────────┘
```

Step-by-Step Breakdown:
Step 1: Client Routes Write to Leader
The client application (or a connection proxy) must route the write to the current leader. This requires knowing which node is the leader—typically determined by querying the database cluster, using a service discovery mechanism, or through a load balancer that routes writes to the primary.

Step 2: Query Validation and Parsing
The leader validates the write—checking permissions, parsing the SQL, and planning the execution. This is identical to a single-node database.

Step 3: Write-Ahead Log (WAL)
Before modifying any data files, the leader writes the change to a durable, append-only log. This is the critical durability guarantee: if the server crashes after the WAL write but before the data file update, recovery replays the WAL to restore consistency.

Step 4: Apply to Data Files
The leader applies the change to the actual data files (tables, indexes). At this point, the change is visible to local queries.

Step 5: Propagate to Followers
The leader streams the WAL entry to followers. The timing of this step relative to client acknowledgment determines whether replication is synchronous or asynchronous (covered in detail in Page 3).

Steps 6-7: Follower Application
Each follower receives the WAL entry, writes it to its own log, and applies it to its local data files. The follower's state converges with the leader's.
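The acknowledgment timing in step 5 can be sketched in a few lines of Python (a hypothetical `handle_write` helper; real databases pipeline and batch this heavily):

```python
def handle_write(entry, leader_wal, followers, mode="async"):
    """Sketch of step 5: when does the client get its acknowledgment?

    async -> ack right after the leader's own durable WAL write; followers
             catch up in the background (fast, but a leader crash can lose
             a write the client believed was committed)
    sync  -> ack only after at least one follower has the entry
             (slower, but the write survives loss of the leader)
    """
    leader_wal.append(entry)                  # steps 3-4 on the leader
    if mode == "sync":
        followers[0].append(entry)            # wait for one follower first
        ack = "acked-after-follower"
    else:
        ack = "acked-after-leader-wal"        # followers catch up later
    for f in followers:
        if entry not in f:
            f.append(entry)                   # background replication
    return ack

f1, f2 = [], []
assert handle_write("w1", [], [f1, f2], mode="sync") == "acked-after-follower"
assert handle_write("w2", [], [f1, f2], mode="async") == "acked-after-leader-wal"
assert f1 == f2 == ["w1", "w2"]
```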
The write-ahead log is the single source of truth in replication. It's an append-only, sequential record of every change. Both durability (surviving crashes) and replication (synchronizing followers) depend on the WAL being correct and complete. Corrupting the WAL means corrupting the entire cluster's consistency.
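The crash-recovery guarantee can be sketched with a toy `Database` class (a Python list stands in for an fsync'd log file; none of this is a real database's API): if the server dies after the WAL append but before the data-file update, replaying the log on restart restores the lost change.

```python
class Database:
    """Minimal WAL sketch: log first, apply second, replay on recovery."""
    def __init__(self):
        self.wal = []    # durable, append-only (stands in for an fsync'd file)
        self.data = {}   # the "data files"

    def write(self, key, value, crash_before_apply=False):
        self.wal.append((key, value))          # step 1: durable log record
        if crash_before_apply:
            raise RuntimeError("crash!")       # simulated crash mid-write
        self.data[key] = value                 # step 2: apply to data files

    def recover(self):
        # Rebuild state purely from the log, exactly as crash recovery does.
        self.data = {}
        for key, value in self.wal:
            self.data[key] = value

db = Database()
db.write("balance:alice", 900)
try:
    db.write("balance:alice", 800, crash_before_apply=True)
except RuntimeError:
    pass                                       # the apply never happened...
db.recover()                                   # ...but WAL replay restores it
assert db.data["balance:alice"] == 800
```

Note that the same log serves double duty: locally it is the recovery mechanism, and shipped over the network it is the replication stream the followers consume.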
The elegance of single-leader replication emerges from how it transforms a complex distributed systems problem into a much simpler sequential one. Let's examine the properties that make this model so effective.
| Aspect | Single-Leader | Multi-Leader | Leaderless |
|---|---|---|---|
| Write Conflicts | Impossible by design | Must be resolved | Must be resolved |
| Write Latency | One round-trip to leader | One round-trip to nearest leader | Quorum-dependent |
| Write Throughput | Limited by single node | Higher (multiple writers) | Higher (any node accepts) |
| Consistency | Strong (leader) or eventual (followers) | Eventual with conflicts | Eventual with conflicts |
| Complexity | Low | High (conflict resolution) | High (quorum logic) |
| CAP Tradeoff | CP-leaning | AP-leaning | Tunable |
The Sequential Abstraction:
Perhaps the most powerful aspect of single-leader replication is that it allows developers to reason about the database as if it were a single machine. Despite running on multiple servers across multiple data centers, the write path behaves like a single-threaded program processing one write at a time.
This abstraction is a lie, of course—modern databases use extensive concurrency and batching. But it's a useful lie that dramatically simplifies application development. You can use familiar transaction semantics, ACID properties, and isolation levels without worrying about distributed consensus on every operation.
Don't underestimate the value of a simple consistency model. Multi-leader and leaderless systems offer higher write availability, but at the cost of conflict resolution complexity. Most applications don't need that complexity—single-leader covers 80%+ of use cases more reliably.
A single-leader system requires more than just routing writes to a leader—it requires a robust mechanism for determining which node is the leader and making that information available to clients and other nodes. This is the domain of leader election and service discovery.
The Leader Election Problem:
At any given moment, exactly one node should be the leader. If zero nodes are the leader, the system can't accept writes (unavailable). If two or more nodes believe they're the leader, we have a split-brain scenario—both accept writes, and the system loses consistency.
Leader election must solve several problems at once: detecting that the current leader has failed (not merely slow or partitioned), getting the surviving nodes to agree on exactly one new leader, preventing a deposed leader from continuing to accept writes, and informing clients of the change.
```
   INITIAL STATE            LEADER FAILURE           NEW LEADER ELECTED
  ──────────────────       ──────────────────       ──────────────────
  ┌─────────────┐          ┌─────────────┐          ┌─────────────┐
  │   Node A    │          │   Node A    │          │   Node A    │
  │  (Leader)   │◀─ writes │    ✗✗✗      │  ─────▶  │ (Follower)  │
  │   ★ ★ ★     │          │   FAILED    │          │             │
  └─────────────┘          └─────────────┘          └─────────────┘

  ┌─────────────┐          ┌─────────────┐          ┌─────────────┐
  │   Node B    │          │   Node B    │ promoted!│   Node B    │
  │ (Follower)  │          │ (Follower)  │ ───────▶ │  (Leader)   │◀─ writes
  │             │          │             │          │   ★ ★ ★     │
  └─────────────┘          └─────────────┘          └─────────────┘

  ┌─────────────┐          ┌─────────────┐          ┌─────────────┐
  │   Node C    │          │   Node C    │          │   Node C    │
  │ (Follower)  │          │ (Follower)  │          │ (Follower)  │
  └─────────────┘          └─────────────┘          └─────────────┘

  Heartbeats flow          Nodes B and C detect     Consensus reached:
  Leader → Followers       missing heartbeats       Node B is new leader
```

Common Leader Election Approaches:
1. External Consensus Systems
Many databases delegate leader election to purpose-built distributed consensus systems like ZooKeeper, etcd, or Consul. These systems implement consensus algorithms (Raft, Paxos, ZAB) and provide primitives like distributed locks and leader election APIs.
Example: Apache Kafka (older versions) uses ZooKeeper for controller election. Clients query ZooKeeper to find the current controller.
2. Built-in Consensus
Modern databases increasingly embed consensus algorithms directly. Patroni (PostgreSQL's high-availability framework) still delegates to etcd or Consul, but systems like CockroachDB and TiDB implement Raft internally.
Example: CockroachDB uses Raft for consensus at the range (partition) level—each range has its own Raft group and leader.
3. Heartbeat + Lease-Based Election
The leader periodically renews a lease (a time-limited lock). If the lease expires, followers can attempt to acquire it. This is simpler but relies on clock synchronization.
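A minimal sketch of lease-based election (a toy `LeaseStore` with an injected clock, standing in for a record in a system like etcd or Consul; the clock-synchronization caveat above is exactly why `now` must be trustworthy):

```python
class LeaseStore:
    """A single shared lease record: whoever holds it is the leader."""
    def __init__(self, ttl):
        self.ttl = ttl          # lease lifetime in seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node, now):
        # A node may take (or renew) the lease only if it is free,
        # expired, or already held by that same node.
        if self.holder is None or now >= self.expires_at or self.holder == node:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False

store = LeaseStore(ttl=5.0)
assert store.try_acquire("node-a", now=0.0)      # node A becomes leader
assert not store.try_acquire("node-b", now=2.0)  # lease still valid -> rejected
assert store.try_acquire("node-a", now=4.0)      # leader renews before expiry
assert store.try_acquire("node-b", now=20.0)     # lease expired -> failover
```

The fragility is visible in the last line: if node A's clock runs slow, it may believe it still holds the lease after node B has legitimately acquired it, which is one road to split-brain.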
Example: MongoDB replica sets use a Raft-like election protocol based on heartbeats and election terms.
Split-brain occurs when two nodes both believe they're the leader and accept conflicting writes. This can corrupt data irreparably. Proper leader election requires consensus (not just timeouts) and fencing mechanisms (like STONITH in HA clusters) to guarantee only one leader operates at a time.
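One standard fencing technique is a monotonically increasing token issued with each new leadership term: the storage layer rejects any write carrying an older token, so a deposed "zombie" leader cannot corrupt data. A minimal sketch (hypothetical `FencedStorage` class):

```python
class FencedStorage:
    """Rejects writes that carry a stale fencing token (an old leader)."""
    def __init__(self):
        self.highest_token = 0   # highest leadership term seen so far
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            raise PermissionError(f"stale leader token {token}")
        self.highest_token = token
        self.data[key] = value

storage = FencedStorage()
storage.write(1, "x", "from-old-leader")   # term-1 leader writes normally
storage.write(2, "x", "from-new-leader")   # failover: term-2 leader takes over
try:
    storage.write(1, "x", "zombie write")  # old leader wakes up -> fenced off
except PermissionError:
    pass
assert storage.data["x"] == "from-new-leader"
```

The key property is that the check happens at the storage layer, not in the leader itself: a node that wrongly believes it is still leader cannot bypass it.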
Once a leader exists, clients must route writes to it. This sounds simple but becomes complex in production environments with connection pooling, load balancers, and dynamic cluster membership. Let's examine the primary patterns for write routing.
DNS-Based Routing: A DNS record (e.g., primary.db.internal) always points to the current leader. On failover, update the DNS record. Relies on low TTLs and proper cache invalidation.
Used by: AWS RDS and many cloud-managed databases.

| Pattern | Failover Speed | Client Complexity | Operational Overhead | Best For |
|---|---|---|---|---|
| Direct Connection | Slow (app restart) | High | Low | Simple setups, dev environments |
| DNS-Based | Medium (TTL-bound) | Low | Medium | Cloud-managed databases |
| Proxy-Based | Fast (proxy-aware) | None | High | High-traffic production systems |
| Smart Driver | Fast (native) | Low (driver handles) | Low | Modern database clusters |
| Service Mesh | Fast | None | Very High | Kubernetes-native environments |
Proxies add latency (typically 1-5ms) and operational complexity, but they provide transparent failover, connection pooling, and the ability to implement read/write splitting without application changes. For high-traffic systems, the benefits usually outweigh the costs.
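A proxy's read/write splitting logic can be sketched as follows (a toy `ReadWriteRouter`; real proxies such as ProxySQL are far more sophisticated, handling transactions, prepared statements, and replication-lag awareness):

```python
import itertools

class ReadWriteRouter:
    """Sends writes to the current primary; round-robins reads to replicas."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE"):
            return self.primary        # all writes go to the leader
        return next(self._replicas)    # reads are spread across followers

    def promote(self, new_primary):
        # On failover, the proxy repoints writes with no application change.
        self.primary = new_primary

router = ReadWriteRouter("db-1:5432", ["db-2:5432", "db-3:5432"])
assert router.route("UPDATE users SET balance = 0") == "db-1:5432"
assert router.route("SELECT * FROM users") == "db-2:5432"
router.promote("db-2:5432")
assert router.route("INSERT INTO users VALUES (1)") == "db-2:5432"
```

This is exactly the transparent failover benefit described above: the application keeps one connection string (the proxy), while `promote` absorbs the leadership change.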
Leader-follower replication isn't just theory—it's the operational backbone of virtually every major database system. Let's examine how different databases implement the single-leader model.
- PostgreSQL (streaming replication): the leader ships WAL records to standbys; synchronous_commit = on/remote_apply/remote_write/local/off controls how far replication must progress before a commit is acknowledged.
- MySQL: the leader records writes in the binlog, which replicas replay; Group Replication provides built-in elections.
- MongoDB (replica sets): the oplog—a capped collection in the local database containing all write operations—drives replication. Read preferences (primary, primaryPreferred, secondary, secondaryPreferred, nearest) control read routing, and write concerns (w: 1 for leader only, w: majority, w: <number>, w: <tag set>) control acknowledgment.

Despite surface differences (WAL vs. binlog vs. oplog, Patroni vs. Group Replication vs. built-in elections), all these systems implement the same fundamental model: one leader writes, followers replicate, consensus determines leadership. Understanding the pattern lets you adapt to any specific implementation.
Single-leader replication is powerful and widely applicable, but it's not without costs. Understanding its limitations helps you know when to use it—and when to consider alternatives.
When Single-Leader is the Right Choice:
✅ Your write throughput fits within a single node's capacity
✅ You need strong consistency without conflict resolution complexity
✅ Your users are geographically concentrated (or latency is acceptable)
✅ Read scaling is more important than write scaling
✅ You value simplicity and operational familiarity
When to Consider Alternatives:
⚠️ Write throughput exceeds single-node limits (consider sharding first, then multi-leader)
⚠️ You have globally distributed users requiring low-latency writes in multiple regions
⚠️ Your availability requirements can't tolerate any failover downtime
⚠️ Your data model naturally supports conflict resolution (e.g., CRDTs)
Many engineers jump to multi-leader or leaderless systems prematurely. Before abandoning single-leader, ask: Can we shard the data? Can we tolerate the failover window? Is conflict resolution complexity worth the benefit? Usually, single-leader with proper sharding handles far more scale than expected.
We've established the foundational principle of leader-follower replication: funneling all writes through a single node to achieve consistency without conflict resolution. Let's consolidate the key insights:

- All writes flow through one leader, producing a total ordering of changes in the replication log.
- The write-ahead log serves double duty: local crash recovery and the replication stream followers consume.
- Leader election (external consensus, built-in consensus, or leases) must guarantee exactly one leader, with fencing to prevent split-brain.
- Write routing (DNS, proxies, smart drivers) delivers writes to whichever node is currently the leader.
- Single-leader trades write scalability and failover downtime for simplicity and strong consistency.
What's Next:
Now that we understand how writes flow through the leader, the next page examines the other side of the replication equation: how followers replicate from the leader. We'll explore replication protocols, log streaming, log shipping, and how followers stay in sync with the leader's state.
You now understand the core principle of single-leader replication: one node accepts all writes, creating a total ordering that simplifies distributed consistency. Next, we'll see how followers receive and apply these writes to maintain synchronized copies of the data.