A single MySQL server, no matter how powerful, represents a single point of failure. When that server fails—and eventually, it will—your application goes down with it. Hardware fails. Data centers lose power. Disks corrupt. Networks partition. The question isn't if your database will experience a failure, but when.
MySQL's replication and clustering capabilities transform a fragile single-server deployment into a resilient, highly available system. Through replication, data is copied to standby servers that can take over when the primary fails. Through clustering, multiple servers coordinate to provide automatic failover and data distribution.
Understanding MySQL's replication architecture is essential for any system running MySQL in production. Even if you're using managed cloud databases like Amazon RDS or Cloud SQL, these services build upon the same replication primitives we'll discuss here.
By the end of this page, you will understand MySQL's binary log (binlog) architecture, master asynchronous and semi-synchronous replication, learn how Group Replication enables multi-primary deployments, explore MySQL InnoDB Cluster for automatic failover, and gain practical knowledge for designing replication topologies in production systems.
All MySQL replication is built on the binary log (binlog)—a sequential log of all data modifications. Understanding the binlog is fundamental to understanding how replication works.
What Gets Logged:

The binlog records all statements or row changes that modify data:

- DML statements: INSERT, UPDATE, DELETE, REPLACE
- DDL statements: CREATE, ALTER, DROP, TRUNCATE
- Transaction boundaries (BEGIN/COMMIT) and, when enabled, GTID metadata

The binlog does NOT record:

- Read-only statements such as SELECT and SHOW
- Changes excluded by binlog filtering rules (for example, binlog-ignore-db)
Binary Log Format:
MySQL supports three binlog formats, each with trade-offs:
| Format | Records | Pros | Cons |
|---|---|---|---|
| STATEMENT | Original SQL statements | Compact logs; human-readable | Non-deterministic functions (NOW(), RAND()) may produce different results on replica |
| ROW | Actual row changes (before/after images) | Deterministic replication; handles any statement | Larger logs; bulk updates generate many row events |
| MIXED | STATEMENT when safe, ROW when uncertain | Best of both worlds | Less predictable log sizes; complexity in analysis |
ROW-based replication is the modern default (binlog_format=ROW). It guarantees replica data exactly matches the source, regardless of statement complexity.
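A toy simulation makes the STATEMENT-format hazard concrete (this is an illustrative Python model, not MySQL internals): re-executing a non-deterministic "statement" on the replica diverges, while shipping the concrete row values, as ROW format does, cannot.

```python
import random

# Toy model of why STATEMENT-format replication can diverge.
# STATEMENT format re-executes the SQL text on the replica, so a
# non-deterministic function like RAND() can yield a different value.
# ROW format ships the concrete values the source actually wrote.

def run_statement(rng):
    # stands in for: INSERT INTO t VALUES (RAND())
    return rng.random()

source_rng = random.Random(1)    # source's RNG state
replica_rng = random.Random(2)   # replica's RNG state differs

source_value = run_statement(source_rng)

# STATEMENT replication: replica re-runs the function -> divergence
replica_value_stmt = run_statement(replica_rng)

# ROW replication: replica applies the value carried in the row event
replica_value_row = source_value

print(source_value == replica_value_stmt)  # False: data diverged
print(source_value == replica_value_row)   # True: ROW is deterministic
```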
Binary Log Structure:
┌─────────────────────────────────────────────────────────────┐
│ Binary Log Files │
│ │
│ ┌────────────────────────────────────────────────┐ │
│ │ mysql-bin.000001 (oldest) │ │
│ │ [Format Description Event] │ │
│ │ [Query Event: CREATE TABLE users...] │ │
│ │ [Transaction: BEGIN → Row Events → COMMIT] │ │
│ │ ... │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────┐ │
│ │ mysql-bin.000002 │ │
│ │ [GTID Event: source_id:42] │ │
│ │ [Transaction: INSERT INTO orders...] │ │
│ │ ... │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ mysql-bin.index (lists all binlog files) │
└─────────────────────────────────────────────────────────────┘
Each transaction in the binlog is identified either by:

- File and position: the binlog file name plus a byte offset (the classic coordinates), or
- GTID: a Global Transaction Identifier that is unique across every server in the topology
Global Transaction IDs (GTIDs):
GTIDs provide a globally unique identifier for each transaction: source_uuid:transaction_id. For example: 3E11FA47-71CA-11E1-9E33-C80AA9429562:42
```sql
-- Enable GTIDs (requires restart in MySQL < 8.0.17)
-- In my.cnf:
[mysqld]
gtid_mode = ON
enforce-gtid-consistency = ON
binlog_format = ROW

-- Check GTID status
SHOW VARIABLES LIKE 'gtid%';

-- View executed GTIDs
SELECT @@global.gtid_executed;
-- Returns: "3E11FA47-71CA-11E1-9E33-C80AA9429562:1-1000,
--           8AD5F9F1-71CA-11E1-9E33-C80AA9429562:1-500"

-- View binlog events
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 10;
```

Configure binlog_expire_logs_seconds (or expire_logs_days in older versions) to automatically purge old binlogs. Balance disk usage against the need for point-in-time recovery or replica catch-up. Typical values: 7-14 days for production systems with good monitoring.
MySQL's default replication mode is asynchronous. The source (master) commits transactions without waiting for replicas to receive or apply them. This provides the best performance but offers no durability guarantee on replicas.
How Asynchronous Replication Works:
┌─────────────────────────────────────────────────────────────┐
│ Asynchronous Replication │
│ │
│ SOURCE (Primary) REPLICA (Secondary) │
│ ┌────────────────┐ ┌────────────────┐ │
│ │ │ │ │ │
│ │ Transaction │ │ I/O Thread │ │
│ │ commits to │ │ connects to │ │
│ │ binlog │ │ source │ │
│ │ │ │ │ │ │ │
│ │ ▼ │ │ ▼ │ │
│ │ Client gets │ │ Receives │ │
│ │ SUCCESS │ │ binlog events │ │
│ │ (async!) │ │ │ │ │
│ │ │ │ ▼ │ │
│ │ │ │ Writes to │ │
│ │ │ │ Relay Log │ │
│ │ │ │ │ │ │
│ │ │ │ ▼ │ │
│ │ │ │ SQL Thread │ │
│ │ │ │ applies │ │
│ │ │ │ events │ │
│ └────────────────┘ └────────────────┘ │
│ │
│ Timeline: │
│ T=0 Client commits on source │
│ T=0+ε Client receives SUCCESS │
│ T=50ms Replica receives binlog (network latency) │
│ T=60ms Replica applies transaction (processing time) │
└─────────────────────────────────────────────────────────────┘
Replica Threads:

Each MySQL replica runs two (or more) threads for replication:

- I/O (receiver) thread: connects to the source, streams binlog events, and writes them to the local relay log
- SQL (applier) thread: reads the relay log and applies events to the replica's data; with replica_parallel_workers > 0, multiple worker threads apply transactions in parallel
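The receiver/applier split can be sketched as a producer/consumer pipeline (a toy Python model; a queue stands in for the relay log):

```python
import queue
import threading

# Toy model of a replica's two replication threads: the I/O (receiver)
# thread streams binlog events into the relay log; the SQL (applier)
# thread drains and applies them independently.
relay_log = queue.Queue()
applied = []

def io_thread(binlog_events):
    for event in binlog_events:
        relay_log.put(event)      # "write to relay log"
    relay_log.put(None)           # end-of-stream marker

def sql_thread():
    while (event := relay_log.get()) is not None:
        applied.append(event)     # "apply event to local data"

events = ["INSERT 1", "INSERT 2", "UPDATE 1"]
receiver = threading.Thread(target=io_thread, args=(events,))
applier = threading.Thread(target=sql_thread)
receiver.start(); applier.start()
receiver.join(); applier.join()
print(applied)  # ['INSERT 1', 'INSERT 2', 'UPDATE 1']
```

Because the two threads are decoupled by the relay log, a slow applier does not stall the receiver, which is exactly why a replica can have received events it has not yet applied.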
Replication Lag:
The time difference between a transaction committing on the source and being applied on the replica is called replication lag. Some lag is inevitable; the question is how much is acceptable.
Causes of replication lag:

- Write bursts on the source that exceed the replica's apply rate
- Long-running transactions (a single huge UPDATE must be fully applied before later events)
- Single-threaded apply (mitigated by parallel replication workers)
- Network latency or bandwidth limits between source and replica
- Underpowered replica hardware (slower disks or fewer cores than the source)
Monitoring lag:
```sql
-- Classic status command
SHOW REPLICA STATUS\G
-- Key fields:
--   Replica_IO_Running: Yes/No
--   Replica_SQL_Running: Yes/No
--   Seconds_Behind_Source: Replication lag in seconds
--   Last_Error: Any replication errors

-- MySQL 8.0+ performance_schema method (more accurate)
SELECT * FROM performance_schema.replication_applier_status_by_coordinator;
SELECT * FROM performance_schema.replication_applier_status_by_worker;

-- Check GTID-based lag (executed on replica)
SELECT @@global.gtid_executed AS replica_executed,
       @@global.gtid_purged  AS replica_purged;

-- On source, compare with:
SELECT @@global.gtid_executed AS source_executed;

-- The difference between these sets indicates lag
```

With asynchronous replication, if the source fails before a transaction is replicated, that transaction is LOST. This is the fundamental trade-off: asynchronous replication is fast but provides no durability guarantee beyond the source. For zero data loss, you need semi-synchronous replication or group replication.
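The GTID-set comparison in the monitoring queries above can also be done in application code. Here is a minimal illustrative parser and set difference (in practice MySQL's built-in GTID_SUBTRACT() function does this server-side, and real GTID sets can contain multiple intervals per UUID):

```python
def parse_gtid_set(gtid_set):
    """Parse a gtid_executed string such as
    'uuid1:1-1000,uuid2:1-500' into {uuid: set of transaction ids}."""
    result = {}
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, _, intervals = part.partition(":")
        txns = result.setdefault(uuid, set())
        # a UUID may carry several intervals: 'uuid:1-100:200-300'
        for interval in intervals.split(":"):
            lo, _, hi = interval.partition("-")
            txns.update(range(int(lo), int(hi or lo) + 1))
    return result

def missing_transactions(source_executed, replica_executed):
    """Transactions executed on the source but not yet on the replica."""
    src = parse_gtid_set(source_executed)
    rep = parse_gtid_set(replica_executed)
    gaps = {}
    for uuid, txns in src.items():
        missing = txns - rep.get(uuid, set())
        if missing:
            gaps[uuid] = missing
    return gaps

source = "3E11FA47-71CA-11E1-9E33-C80AA9429562:1-1000"
replica = "3E11FA47-71CA-11E1-9E33-C80AA9429562:1-950"
gap = missing_transactions(source, replica)
print(sum(len(ids) for ids in gap.values()))  # 50 transactions behind
```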
Semi-synchronous replication provides a middle ground between asynchronous replication (fast but risky) and fully synchronous replication (safe but slow). The source waits for at least one replica to acknowledge receipt of the transaction before returning success to the client.
How Semi-Sync Works:
┌─────────────────────────────────────────────────────────────┐
│ Semi-Synchronous Replication │
│ │
│ SOURCE REPLICA │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ 1. Client │ │ │ │
│ │ sends │ │ │ │
│ │ COMMIT │ │ │ │
│ │ │ │ │ │ │
│ │ ▼ │ │ │ │
│ │ 2. Write to │ │ │ │
│ │ binlog │ │ │ │
│ │ │ │ binlog │ │ │
│ │ ├───────────event────────▶ 3. Receive │ │
│ │ │ │ │ event │ │
│ │ │ │ ACK │ │ │ │
│ │ │◀──────────────────────────────┘ │ │
│ │ │ │ │ │ │
│ │ ▼ │ │ ▼ │ │
│ │ 4. Return │ │ 5. Write to │ │
│ │ SUCCESS │ │ relay log │ │
│ │ to client │ │ │ │ │
│ │ │ │ ▼ │ │
│ │ │ │ 6. Apply │ │
│ │ │ │ (async) │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ Note: Client waits for ACK (step 4), but replica │
│ application (step 6) happens asynchronously. │
└─────────────────────────────────────────────────────────────┘
Key distinction: The replica acknowledges receiving the binlog event (written to the relay log), NOT that it has applied the transaction. This is why it's called "semi" synchronous—the event is durable on the replica's disk (relay log) but not yet reflected in the replica's data.
| Parameter | Default | Purpose |
|---|---|---|
| rpl_semi_sync_source_enabled | OFF | Enable semi-sync on source |
| rpl_semi_sync_replica_enabled | OFF | Enable semi-sync on replica |
| rpl_semi_sync_source_timeout | 10000 (10s) | Timeout waiting for ACK; falls back to async |
| rpl_semi_sync_source_wait_for_replica_count | 1 | Number of replicas that must ACK |
| rpl_semi_sync_source_wait_point | AFTER_SYNC | When to wait: AFTER_SYNC (before storage engine commit) or AFTER_COMMIT |
```sql
-- On SOURCE:
-- Install the plugin (MySQL 8.0+: already loaded, just enable)
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = ON;
SET GLOBAL rpl_semi_sync_source_timeout = 5000;  -- 5 second timeout

-- On REPLICA:
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = ON;

-- Verify semi-sync is active
SHOW STATUS LIKE 'Rpl_semi_sync%';
-- Key metrics:
--   Rpl_semi_sync_source_status: ON (semi-sync active)
--   Rpl_semi_sync_source_no_tx: Count of async fallback transactions
--   Rpl_semi_sync_source_yes_tx: Count of semi-sync transactions
--   Rpl_semi_sync_source_clients: Number of semi-sync replicas
```

AFTER_SYNC vs AFTER_COMMIT:
The rpl_semi_sync_source_wait_point parameter controls when the source waits for acknowledgment:
AFTER_SYNC (default): Wait after writing to the binlog, but before committing in InnoDB. If the source crashes after the ACK but before the commit, the transaction exists on the replica but was never visible to clients on the source; since no client ever saw a success, failing over to the replica loses nothing. This is often called "lossless" semi-synchronous replication.

AFTER_COMMIT: Wait after the InnoDB commit. The transaction is visible on the source before the replica ACKs, so other sessions can observe data that no replica yet has (a "phantom read" if the source then crashes and a replica without that data is promoted).
AFTER_SYNC is generally preferred because it guarantees that if a client sees a commit success, the data exists on at least one replica.
If no replica ACKs within the timeout, the source falls back to asynchronous replication. This prevents one slow/dead replica from blocking all writes. Monitor Rpl_semi_sync_source_no_tx—if it's increasing, investigate replica health or increase timeout. Frequent fallbacks defeat the purpose of semi-sync.
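A monitoring sketch for those fallback counters, assuming you periodically sample SHOW STATUS and feed in the counter values (the sampling mechanism and alert threshold are up to you; only the counter names come from MySQL):

```python
def fallback_ratio(prev, curr):
    """Fraction of recent transactions that fell back to async,
    computed from two samples of the Rpl_semi_sync_source_* counters."""
    no_tx = curr["Rpl_semi_sync_source_no_tx"] - prev["Rpl_semi_sync_source_no_tx"]
    yes_tx = curr["Rpl_semi_sync_source_yes_tx"] - prev["Rpl_semi_sync_source_yes_tx"]
    total = no_tx + yes_tx
    return no_tx / total if total else 0.0

# Two hypothetical samples taken a minute apart
prev = {"Rpl_semi_sync_source_no_tx": 10, "Rpl_semi_sync_source_yes_tx": 990}
curr = {"Rpl_semi_sync_source_no_tx": 60, "Rpl_semi_sync_source_yes_tx": 1940}

ratio = fallback_ratio(prev, curr)
print(f"{ratio:.1%} of recent commits fell back to async")  # 5.0%
```

Alerting on the rate of change, rather than the absolute counter value, catches a replica that has just started falling behind.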
MySQL replication can be arranged in various topologies, each with different characteristics for availability, scalability, and operational complexity.
Topology 1: Single Source with Replicas
The simplest and most common topology. One source handles all writes; replicas handle reads and provide failover capability.
Topology 2: Chain Replication
Replicas replicate from other replicas, forming a chain. Reduces load on the source but increases replication lag.
Source → Replica A → Replica B → Replica C
Useful when you need many replicas (dozens) but don't want them all connecting directly to the source. Each hop adds latency.
Topology 3: Multi-Source Replication
A single replica replicates from multiple sources. Useful for aggregating data from shards or combining data from different systems.
```sql
-- On the replica, configure multiple replication channels
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'source1.example.com',
  SOURCE_PORT = 3306,
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'secret',
  SOURCE_AUTO_POSITION = 1
FOR CHANNEL 'source1';

CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'source2.example.com',
  SOURCE_PORT = 3306,
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'secret',
  SOURCE_AUTO_POSITION = 1
FOR CHANNEL 'source2';

-- Start replication on all channels
START REPLICA FOR CHANNEL 'source1';
START REPLICA FOR CHANNEL 'source2';

-- Check status per channel
SHOW REPLICA STATUS FOR CHANNEL 'source1'\G
SHOW REPLICA STATUS FOR CHANNEL 'source2'\G
```

Topology 4: Ring/Circular Replication (Deprecated Pattern)
Each server replicates from another forming a ring. Historically used for multi-primary writes. Avoid this topology—it's fragile, difficult to recover, and superseded by Group Replication.
For most applications, start with single-source + replicas topology. For true multi-primary writes, use Group Replication or MySQL InnoDB Cluster. For aggregating from multiple shards, use multi-source replication. Avoid chain and ring topologies unless you have very specific requirements.
MySQL Group Replication is a plugin that implements a virtually synchronous replication protocol based on Paxos consensus. It provides automatic failover, consistent reads, and optional multi-primary writes.
How Group Replication Works:
┌─────────────────────────────────────────────────────────────┐
│ Group Replication │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Primary │ │ Primary │ │ Primary │ │
│ │(Elected) │ │ or │ │ or │ │
│ │ │ │Secondary │ │Secondary │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ └───────────────┼───────────────┘ │
│ │ │
│ ┌─────────┴──────────┐ │
│ │ Group │ │
│ │ Communication │ │
│ │ System (GCS) │ │
│ │ │ │
│ │ - Membership │ │
│ │ - Total ordering │ │
│ │ - Certification │ │
│ └────────────────────┘ │
│ │
│ All servers agree on transaction order via consensus. │
│ Conflicting transactions are detected and rolled back. │
└─────────────────────────────────────────────────────────────┘
Key Concepts:

- Group membership: the Group Communication System tracks which servers are in the group and requires a majority (quorum) to make progress
- Total-order delivery: all members agree on a single global order of transactions via consensus
- Certification: at commit time, a transaction's write set is checked against concurrent transactions; conflicting transactions are rolled back
- Single-primary vs multi-primary: one elected writer, or writes accepted on any member
| Aspect | Single-Primary | Multi-Primary |
|---|---|---|
| Write handling | All writes go to elected primary | Writes accepted by any member |
| Conflict handling | No write conflicts possible | Certification detects and rejects conflicts |
| Application complexity | Simple (just point to primary) | Complex (must handle rollbacks) |
| Use case | Most HA deployments | Geographically distributed writes |
| Foreign key support | Full support | Supported with restrictions |
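The "must handle rollbacks" entry in the table deserves emphasis: in multi-primary mode, the application should treat a certification rollback like a deadlock and retry. A hedged sketch of that retry loop (the exception class here is a stand-in for whatever error your MySQL driver raises when Group Replication rolls a transaction back at commit):

```python
import time

class CertificationConflict(Exception):
    """Stand-in for the driver error raised when Group Replication
    rolls a transaction back after losing certification."""

def commit_with_retry(txn, max_attempts=3, backoff=0.01):
    """Run a transaction callable, retrying on certification conflicts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except CertificationConflict:
            if attempt == max_attempts:
                raise
            time.sleep(backoff * attempt)  # simple linear backoff

# Demo: a transaction that loses certification once, then succeeds
attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 2:
        raise CertificationConflict()
    return "committed"

result = commit_with_retry(flaky_txn)
print(result)  # committed
```

The retry must re-run the whole transaction body, not just the COMMIT, because the rolled-back transaction's reads may no longer reflect current data.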
```sql
-- Configuration (in my.cnf for all members):
[mysqld]
# Enable Group Replication plugin
plugin_load_add = 'group_replication.so'

# Required settings
gtid_mode = ON
enforce_gtid_consistency = ON
log_bin = binlog
binlog_format = ROW
binlog_checksum = NONE  # Group Replication requirement
log_slave_updates = ON

# Group Replication settings
group_replication_group_name = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
group_replication_local_address = "server1:33061"  # GCS port
group_replication_group_seeds = "server1:33061,server2:33061,server3:33061"
group_replication_single_primary_mode = ON

-- Bootstrap on first member:
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

-- Join other members:
START GROUP_REPLICATION;

-- Check group status
SELECT * FROM performance_schema.replication_group_members;
```

Group Replication is sensitive to network latency and partitions. All members should be in the same data center or connected via low-latency links (<10ms recommended). For cross-region deployments, consider MySQL InnoDB ClusterSet, which orchestrates independent clusters with asynchronous replication between them.
MySQL InnoDB Cluster is a complete high-availability solution that bundles Group Replication with MySQL Router (for connection routing) and MySQL Shell (for administration). It's MySQL's answer to the question: "How do I set up truly automatic failover?"
InnoDB Cluster Components:
┌─────────────────────────────────────────────────────────────┐
│ MySQL InnoDB Cluster Architecture │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Application │ │
│ └─────────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌─────────────────────────▼───────────────────────────┐ │
│ │ MySQL Router │ │
│ │ - Routes writes to R/W port (6446) → Primary │ │
│ │ - Routes reads to R/O port (6447) → Any member │ │
│ │ - Automatic failover: updates routing on primary │ │
│ │ change │ │
│ └─────────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌─────────────────────────▼───────────────────────────┐ │
│ │ InnoDB Cluster (Group Replication) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Primary │ │Secondary │ │Secondary │ │ │
│ │ │(R/W) │ │(R) │ │(R) │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Managed by MySQL Shell (dba.createCluster(), etc.) │
└─────────────────────────────────────────────────────────────┘
```javascript
// Connect to MySQL Shell
// mysqlsh root@server1:3306

// Check instance is ready for cluster
dba.checkInstanceConfiguration('root@server1:3306');

// Configure instance if needed
dba.configureInstance('root@server1:3306');

// Create the cluster
var cluster = dba.createCluster('myCluster');

// Add instances to the cluster
cluster.addInstance('root@server2:3306');
cluster.addInstance('root@server3:3306');

// Check cluster status
cluster.status();
/* Output:
{
  "clusterName": "myCluster",
  "status": "OK",
  "topology": {
    "server1:3306": { "role": "PRIMARY",   "mode": "R/W", "status": "ONLINE" },
    "server2:3306": { "role": "SECONDARY", "mode": "R/O", "status": "ONLINE" },
    "server3:3306": { "role": "SECONDARY", "mode": "R/O", "status": "ONLINE" }
  }
}
*/

// Bootstrap MySQL Router
// On router host:
// mysqlrouter --bootstrap root@server1:3306 --user=mysqlrouter
```

For global deployments, MySQL 8.0.27+ offers InnoDB ClusterSet—multiple InnoDB Clusters linked by asynchronous replication. This provides geographic redundancy with region-local write performance while maintaining disaster recovery to remote regions.
Failover—the process of switching from a failed source to a replica—is the critical moment that determines whether your high-availability investment pays off. Let's examine failover strategies from manual to fully automated.
Manual Failover (Simplest, Slowest):
```sql
-- 1. Verify source is actually down (avoid split-brain!)

-- 2. On the replica to be promoted, stop replication
STOP REPLICA;

-- 3. Verify replica has applied all received events
--    (Exec_Source_Log_Pos should equal Read_Source_Log_Pos)
SHOW REPLICA STATUS\G

-- 4. If using GTIDs, ensure replica is caught up
SELECT @@global.gtid_executed;

-- 5. Make replica writeable
SET GLOBAL read_only = OFF;
SET GLOBAL super_read_only = OFF;

-- 6. Promote to source (reset replica configuration)
RESET REPLICA ALL;

-- 7. Update application connection strings to point to new source

-- 8. Optionally: Configure other replicas to replicate from new source
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'new-source-host',
  SOURCE_AUTO_POSITION = 1;
START REPLICA;
```

Automated Failover Options:
| Tool | Type | Pros | Cons |
|---|---|---|---|
| MySQL InnoDB Cluster | Built-in (Group Replication) | Native integration; no external tools | Requires Group Replication setup |
| MySQL Router + GR | Routing layer | Automatic route updates on failover | Only handles connection routing |
| Orchestrator | External tool (open source) | Feature-rich; handles complex topologies | Additional infrastructure; learning curve |
| ProxySQL | Proxy layer | Query routing + failover detection | Adds latency; needs monitoring integration |
| Cloud provider (RDS Multi-AZ) | Managed service | Zero configuration; automatic DNS update | Provider lock-in; limited customization |
Key Failover Considerations:

- Split-brain prevention: fence or shut down the old source before promoting; two writeable sources corrupt data fast
- Data loss window: with asynchronous replication, transactions not yet replicated at failure time are lost
- Replica selection: promote the most caught-up healthy replica (compare GTID sets) to minimize loss
- Connection redirection: applications must reach the new source via DNS updates, a proxy/router, or a virtual IP
- Read-only protection: keep super_read_only = ON on demoted or rejoining servers until roles are settled
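The replica-selection step can be sketched as "most caught-up healthy replica wins" (a toy decision rule in Python; real tools such as Orchestrator compare full GTID sets and check for errant transactions before promoting):

```python
def choose_promotion_candidate(replicas):
    """Pick the replica to promote: the healthy replica that has
    executed the most transactions. `replicas` maps a name to a dict
    with 'healthy' (bool) and 'executed_txns' (int)."""
    candidates = [(info["executed_txns"], name)
                  for name, info in replicas.items()
                  if info["healthy"]]
    if not candidates:
        raise RuntimeError("no healthy replica to promote")
    _, best = max(candidates)  # highest transaction count wins
    return best

# Hypothetical topology state at the moment of failure
replicas = {
    "replica-a": {"healthy": True,  "executed_txns": 10_500},
    "replica-b": {"healthy": True,  "executed_txns": 10_742},
    "replica-c": {"healthy": False, "executed_txns": 10_800},  # unreachable
}
print(choose_promotion_candidate(replicas))  # replica-b
```

Note that replica-c has applied the most transactions but is excluded because it cannot be health-checked; promoting an unreachable server is exactly how split-brain starts.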
Untested failover is not failover—it's wishful thinking. Regularly (monthly at minimum) perform planned failovers in production to verify: (1) automation works as expected, (2) RTO (recovery time) meets requirements, (3) team knows the manual procedures, (4) applications handle reconnection correctly.
We've explored MySQL's replication and clustering capabilities in depth. Let's consolidate the key takeaways:

- All replication is built on the binary log; ROW format is the modern default and keeps replicas deterministic
- GTIDs give every transaction a globally unique identifier, simplifying failover and topology changes
- Asynchronous replication is fast but can lose transactions on source failure; semi-synchronous guarantees at least one replica holds every acknowledged commit
- Group Replication adds consensus-based ordering and automatic membership; InnoDB Cluster bundles it with MySQL Router and MySQL Shell for automatic failover
- Whatever the topology, test failover regularly; untested failover is wishful thinking
What's Next:
Now that we understand MySQL's replication and clustering capabilities, we'll compare MySQL vs PostgreSQL—two titans of the open-source database world. We'll examine their architectural differences, feature sets, and when to choose each.
You now have deep knowledge of MySQL's replication architecture—from binary logs and GTIDs through asynchronous and semi-synchronous replication to Group Replication and InnoDB Cluster. This knowledge is essential for designing resilient MySQL deployments that can survive failures.