A single MySQL server, no matter how powerful, represents a single point of failure. When that server fails—and eventually, it will—your application goes down with it. Hardware fails. Data centers lose power. Disks corrupt. Networks partition. The question isn't if your database will experience a failure, but when.
MySQL's replication and clustering capabilities transform a fragile single-server deployment into a resilient, highly available system. Through replication, data is copied to standby servers that can take over when the primary fails. Through clustering, multiple servers coordinate to provide automatic failover and data distribution.
Understanding MySQL's replication architecture is essential for any system running MySQL in production. Even if you're using managed cloud databases like Amazon RDS or Cloud SQL, these services build upon the same replication primitives we'll discuss here.
By the end of this page, you will understand MySQL's binary log (binlog) architecture, master asynchronous and semi-synchronous replication, learn how Group Replication enables multi-primary deployments, explore MySQL InnoDB Cluster for automatic failover, and gain practical knowledge for designing replication topologies in production systems.
All MySQL replication is built on the binary log (binlog)—a sequential log of all data modifications. Understanding the binlog is fundamental to understanding how replication works.
What Gets Logged:

The binlog records all statements or row changes that modify data:

- DML statements: INSERT, UPDATE, DELETE, REPLACE
- DDL statements: CREATE, ALTER, DROP, TRUNCATE
- Transaction boundaries (BEGIN/COMMIT) and, when enabled, GTID metadata

The binlog does NOT record:

- Read-only statements such as SELECT and SHOW
- Changes excluded by binlog filtering rules (for example, binlog-ignore-db)
Binary Log Format:
MySQL supports three binlog formats, each with trade-offs:
| Format | Records | Pros | Cons |
|---|---|---|---|
| STATEMENT | Original SQL statements | Compact logs; human-readable | Non-deterministic functions (NOW(), RAND()) may produce different results on replica |
| ROW | Actual row changes (before/after images) | Deterministic replication; handles any statement | Larger logs; bulk updates generate many row events |
| MIXED | STATEMENT when safe, ROW when uncertain | Best of both worlds | Less predictable log sizes; complexity in analysis |
ROW-based replication is the modern default (binlog_format=ROW). It guarantees replica data exactly matches the source, regardless of statement complexity.
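A toy simulation makes the STATEMENT-format hazard concrete (this is an illustrative Python model, not MySQL internals): re-executing a non-deterministic "statement" on the replica diverges, while shipping the concrete row values, as ROW format does, cannot.

```python
import random

# Toy model of why STATEMENT-format replication can diverge.
# STATEMENT format re-executes the SQL text on the replica, so a
# non-deterministic function like RAND() can yield a different value.
# ROW format ships the concrete values the source actually wrote.

def run_statement(rng):
    # stands in for: INSERT INTO t VALUES (RAND())
    return rng.random()

source_rng = random.Random(1)    # source's RNG state
replica_rng = random.Random(2)   # replica's RNG state differs

source_value = run_statement(source_rng)

# STATEMENT replication: replica re-runs the function -> divergence
replica_value_stmt = run_statement(replica_rng)

# ROW replication: replica applies the value carried in the row event
replica_value_row = source_value

print(source_value == replica_value_stmt)  # False: data diverged
print(source_value == replica_value_row)   # True: ROW is deterministic
```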
Binary Log Structure:
┌─────────────────────────────────────────────────────────────┐
│ Binary Log Files │
│ │
│ ┌────────────────────────────────────────────────┐ │
│ │ mysql-bin.000001 (oldest) │ │
│ │ [Format Description Event] │ │
│ │ [Query Event: CREATE TABLE users...] │ │
│ │ [Transaction: BEGIN → Row Events → COMMIT] │ │
│ │ ... │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────┐ │
│ │ mysql-bin.000002 │ │
│ │ [GTID Event: source_id:42] │ │
│ │ [Transaction: INSERT INTO orders...] │ │
│ │ ... │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ mysql-bin.index (lists all binlog files) │
└─────────────────────────────────────────────────────────────┘
Each transaction in the binlog is identified either by:

- File and position: the binlog file name plus a byte offset (the classic coordinates), or
- GTID: a Global Transaction Identifier that is unique across every server in the topology
Global Transaction IDs (GTIDs):
GTIDs provide a globally unique identifier for each transaction: source_uuid:transaction_id. For example: 3E11FA47-71CA-11E1-9E33-C80AA9429562:42
```sql
-- Enable GTIDs (requires restart in MySQL < 8.0.17)
-- In my.cnf:
[mysqld]
gtid_mode = ON
enforce-gtid-consistency = ON
binlog_format = ROW

-- Check GTID status
SHOW VARIABLES LIKE 'gtid%';

-- View executed GTIDs
SELECT @@global.gtid_executed;
-- Returns: "3E11FA47-71CA-11E1-9E33-C80AA9429562:1-1000,
--           8AD5F9F1-71CA-11E1-9E33-C80AA9429562:1-500"

-- View binlog events
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 10;
```

Configure binlog_expire_logs_seconds (or expire_logs_days in older versions) to automatically purge old binlogs. Balance disk usage against the need for point-in-time recovery or replica catch-up. Typical values: 7-14 days for production systems with good monitoring.
MySQL's default replication mode is asynchronous. The source (master) commits transactions without waiting for replicas to receive or apply them. This provides the best performance but offers no durability guarantee on replicas.
How Asynchronous Replication Works:
┌─────────────────────────────────────────────────────────────┐
│ Asynchronous Replication │
│ │
│ SOURCE (Primary) REPLICA (Secondary) │
│ ┌────────────────┐ ┌────────────────┐ │
│ │ │ │ │ │
│ │ Transaction │ │ I/O Thread │ │
│ │ commits to │ │ connects to │ │
│ │ binlog │ │ source │ │
│ │ │ │ │ │ │ │
│ │ ▼ │ │ ▼ │ │
│ │ Client gets │ │ Receives │ │
│ │ SUCCESS │ │ binlog events │ │
│ │ (async!) │ │ │ │ │
│ │ │ │ ▼ │ │
│ │ │ │ Writes to │ │
│ │ │ │ Relay Log │ │
│ │ │ │ │ │ │
│ │ │ │ ▼ │ │
│ │ │ │ SQL Thread │ │
│ │ │ │ applies │ │
│ │ │ │ events │ │
│ └────────────────┘ └────────────────┘ │
│ │
│ Timeline: │
│ T=0 Client commits on source │
│ T=0+ε Client receives SUCCESS │
│ T=50ms Replica receives binlog (network latency) │
│ T=60ms Replica applies transaction (processing time) │
└─────────────────────────────────────────────────────────────┘
Replica Threads:

Each MySQL replica runs two (or more) threads for replication:

- I/O (receiver) thread: connects to the source, streams binlog events, and writes them to the local relay log
- SQL (applier) thread: reads the relay log and applies events to the replica's data; with replica_parallel_workers > 0, multiple worker threads apply transactions in parallel
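The receiver/applier split can be sketched as a producer/consumer pipeline (a toy Python model; a queue stands in for the relay log):

```python
import queue
import threading

# Toy model of a replica's two replication threads: the I/O (receiver)
# thread streams binlog events into the relay log; the SQL (applier)
# thread drains and applies them independently.
relay_log = queue.Queue()
applied = []

def io_thread(binlog_events):
    for event in binlog_events:
        relay_log.put(event)      # "write to relay log"
    relay_log.put(None)           # end-of-stream marker

def sql_thread():
    while (event := relay_log.get()) is not None:
        applied.append(event)     # "apply event to local data"

events = ["INSERT 1", "INSERT 2", "UPDATE 1"]
receiver = threading.Thread(target=io_thread, args=(events,))
applier = threading.Thread(target=sql_thread)
receiver.start(); applier.start()
receiver.join(); applier.join()
print(applied)  # ['INSERT 1', 'INSERT 2', 'UPDATE 1']
```

Because the two threads are decoupled by the relay log, a slow applier does not stall the receiver, which is exactly why a replica can have received events it has not yet applied.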
Replication Lag:
The time difference between a transaction committing on the source and being applied on the replica is called replication lag. Some lag is inevitable; the question is how much is acceptable.
Causes of replication lag:

- Write bursts on the source that exceed the replica's apply rate
- Long-running transactions (a single huge UPDATE must be fully applied before later events)
- Single-threaded apply (mitigated by parallel replication workers)
- Network latency or bandwidth limits between source and replica
- Underpowered replica hardware (slower disks or fewer cores than the source)
Monitoring lag:
```sql
-- Classic status command
SHOW REPLICA STATUS\G
-- Key fields:
--   Replica_IO_Running: Yes/No
--   Replica_SQL_Running: Yes/No
--   Seconds_Behind_Source: Replication lag in seconds
--   Last_Error: Any replication errors

-- MySQL 8.0+ performance_schema method (more accurate)
SELECT * FROM performance_schema.replication_applier_status_by_coordinator;
SELECT * FROM performance_schema.replication_applier_status_by_worker;

-- Check GTID-based lag (executed on replica)
SELECT @@global.gtid_executed AS replica_executed,
       @@global.gtid_purged  AS replica_purged;

-- On source, compare with:
SELECT @@global.gtid_executed AS source_executed;

-- The difference between these sets indicates lag
```

With asynchronous replication, if the source fails before a transaction is replicated, that transaction is LOST. This is the fundamental trade-off: asynchronous replication is fast but provides no durability guarantee beyond the source. For zero data loss, you need semi-synchronous replication or group replication.
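The GTID-set comparison in the monitoring queries above can also be done in application code. Here is a minimal illustrative parser and set difference (in practice MySQL's built-in GTID_SUBTRACT() function does this server-side, and real GTID sets can contain multiple intervals per UUID):

```python
def parse_gtid_set(gtid_set):
    """Parse a gtid_executed string such as
    'uuid1:1-1000,uuid2:1-500' into {uuid: set of transaction ids}."""
    result = {}
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, _, intervals = part.partition(":")
        txns = result.setdefault(uuid, set())
        # a UUID may carry several intervals: 'uuid:1-100:200-300'
        for interval in intervals.split(":"):
            lo, _, hi = interval.partition("-")
            txns.update(range(int(lo), int(hi or lo) + 1))
    return result

def missing_transactions(source_executed, replica_executed):
    """Transactions executed on the source but not yet on the replica."""
    src = parse_gtid_set(source_executed)
    rep = parse_gtid_set(replica_executed)
    gaps = {}
    for uuid, txns in src.items():
        missing = txns - rep.get(uuid, set())
        if missing:
            gaps[uuid] = missing
    return gaps

source = "3E11FA47-71CA-11E1-9E33-C80AA9429562:1-1000"
replica = "3E11FA47-71CA-11E1-9E33-C80AA9429562:1-950"
gap = missing_transactions(source, replica)
print(sum(len(ids) for ids in gap.values()))  # 50 transactions behind
```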
Semi-synchronous replication provides a middle ground between asynchronous replication (fast but risky) and fully synchronous replication (safe but slow). The source waits for at least one replica to acknowledge receipt of the transaction before returning success to the client.
How Semi-Sync Works:
┌─────────────────────────────────────────────────────────────┐
│ Semi-Synchronous Replication │
│ │
│ SOURCE REPLICA │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ 1. Client │ │ │ │
│ │ sends │ │ │ │
│ │ COMMIT │ │ │ │
│ │ │ │ │ │ │
│ │ ▼ │ │ │ │
│ │ 2. Write to │ │ │ │
│ │ binlog │ │ │ │
│ │ │ │ binlog │ │ │
│ │ ├───────────event────────▶ 3. Receive │ │
│ │ │ │ │ event │ │
│ │ │ │ ACK │ │ │ │
│ │ │◀──────────────────────────────┘ │ │
│ │ │ │ │ │ │
│ │ ▼ │ │ ▼ │ │
│ │ 4. Return │ │ 5. Write to │ │
│ │ SUCCESS │ │ relay log │ │
│ │ to client │ │ │ │ │
│ │ │ │ ▼ │ │
│ │ │ │ 6. Apply │ │
│ │ │ │ (async) │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ Note: Client waits for ACK (step 4), but replica │
│ application (step 6) happens asynchronously. │
└─────────────────────────────────────────────────────────────┘
Key distinction: The replica acknowledges receiving the binlog event (written to the relay log), NOT that it has applied the transaction. This is why it's called "semi" synchronous—the event is durable on the replica's disk (relay log) but not yet reflected in the replica's data.
| Parameter | Default | Purpose |
|---|---|---|
| rpl_semi_sync_source_enabled | OFF | Enable semi-sync on source |
| rpl_semi_sync_replica_enabled | OFF | Enable semi-sync on replica |
| rpl_semi_sync_source_timeout | 10000 (10s) | Timeout waiting for ACK; falls back to async |
| rpl_semi_sync_source_wait_for_replica_count | 1 | Number of replicas that must ACK |
| rpl_semi_sync_source_wait_point | AFTER_SYNC | When to wait: AFTER_SYNC (before storage engine commit) or AFTER_COMMIT |
```sql
-- On SOURCE:
-- Install the plugin (MySQL 8.0+: already loaded, just enable)
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = ON;
SET GLOBAL rpl_semi_sync_source_timeout = 5000;  -- 5 second timeout

-- On REPLICA:
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = ON;

-- Verify semi-sync is active
SHOW STATUS LIKE 'Rpl_semi_sync%';
-- Key metrics:
--   Rpl_semi_sync_source_status: ON (semi-sync active)
--   Rpl_semi_sync_source_no_tx: Count of async fallback transactions
--   Rpl_semi_sync_source_yes_tx: Count of semi-sync transactions
--   Rpl_semi_sync_source_clients: Number of semi-sync replicas
```

AFTER_SYNC vs AFTER_COMMIT:
The rpl_semi_sync_source_wait_point parameter controls when the source waits for acknowledgment:
AFTER_SYNC (default): Wait after writing to the binlog, but before committing in InnoDB. If the source crashes after the ACK but before the commit, the transaction exists on the replica but was never visible to clients on the source; since no client ever saw a success, failing over to the replica loses nothing. This is often called "lossless" semi-synchronous replication.

AFTER_COMMIT: Wait after the InnoDB commit. The transaction is visible on the source before the replica ACKs, so other sessions can observe data that no replica yet has (a "phantom read" if the source then crashes and a replica without that data is promoted).
AFTER_SYNC is generally preferred because it guarantees that if a client sees a commit success, the data exists on at least one replica.
If no replica ACKs within the timeout, the source falls back to asynchronous replication. This prevents one slow/dead replica from blocking all writes. Monitor Rpl_semi_sync_source_no_tx—if it's increasing, investigate replica health or increase timeout. Frequent fallbacks defeat the purpose of semi-sync.
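A monitoring sketch for those fallback counters, assuming you periodically sample SHOW STATUS and feed in the counter values (the sampling mechanism and alert threshold are up to you; only the counter names come from MySQL):

```python
def fallback_ratio(prev, curr):
    """Fraction of recent transactions that fell back to async,
    computed from two samples of the Rpl_semi_sync_source_* counters."""
    no_tx = curr["Rpl_semi_sync_source_no_tx"] - prev["Rpl_semi_sync_source_no_tx"]
    yes_tx = curr["Rpl_semi_sync_source_yes_tx"] - prev["Rpl_semi_sync_source_yes_tx"]
    total = no_tx + yes_tx
    return no_tx / total if total else 0.0

# Two hypothetical samples taken a minute apart
prev = {"Rpl_semi_sync_source_no_tx": 10, "Rpl_semi_sync_source_yes_tx": 990}
curr = {"Rpl_semi_sync_source_no_tx": 60, "Rpl_semi_sync_source_yes_tx": 1940}

ratio = fallback_ratio(prev, curr)
print(f"{ratio:.1%} of recent commits fell back to async")  # 5.0%
```

Alerting on the rate of change, rather than the absolute counter value, catches a replica that has just started falling behind.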
MySQL replication can be arranged in various topologies, each with different characteristics for availability, scalability, and operational complexity.
Topology 1: Single Source with Replicas
The simplest and most common topology. One source handles all writes; replicas handle reads and provide failover capability.
Topology 2: Chain Replication
Replicas replicate from other replicas, forming a chain. Reduces load on the source but increases replication lag.
Source → Replica A → Replica B → Replica C
Useful when you need many replicas (dozens) but don't want them all connecting directly to the source. Each hop adds latency.
Topology 3: Multi-Source Replication
A single replica replicates from multiple sources. Useful for aggregating data from shards or combining data from different systems.
```sql
-- On the replica, configure multiple replication channels
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'source1.example.com',
  SOURCE_PORT = 3306,
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'secret',
  SOURCE_AUTO_POSITION = 1
FOR CHANNEL 'source1';

CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'source2.example.com',
  SOURCE_PORT = 3306,
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'secret',
  SOURCE_AUTO_POSITION = 1
FOR CHANNEL 'source2';

-- Start replication on all channels
START REPLICA FOR CHANNEL 'source1';
START REPLICA FOR CHANNEL 'source2';

-- Check status per channel
SHOW REPLICA STATUS FOR CHANNEL 'source1'\G
SHOW REPLICA STATUS FOR CHANNEL 'source2'\G
```

Topology 4: Ring/Circular Replication (Deprecated Pattern)
Each server replicates from another forming a ring. Historically used for multi-primary writes. Avoid this topology—it's fragile, difficult to recover, and superseded by Group Replication.
For most applications, start with single-source + replicas topology. For true multi-primary writes, use Group Replication or MySQL InnoDB Cluster. For aggregating from multiple shards, use multi-source replication. Avoid chain and ring topologies unless you have very specific requirements.
MySQL Group Replication is a plugin that implements a virtually synchronous replication protocol based on Paxos consensus. It provides automatic failover, consistent reads, and optional multi-primary writes.
How Group Replication Works:
┌─────────────────────────────────────────────────────────────┐
│ Group Replication │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Primary │ │ Primary │ │ Primary │ │
│ │(Elected) │ │ or │ │ or │ │
│ │ │ │Secondary │ │Secondary │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ └───────────────┼───────────────┘ │
│ │ │
│ ┌─────────┴──────────┐ │
│ │ Group │ │
│ │ Communication │ │
│ │ System (GCS) │ │
│ │ │ │
│ │ - Membership │ │
│ │ - Total ordering │ │
│ │ - Certification │ │
│ └────────────────────┘ │
│ │
│ All servers agree on transaction order via consensus. │
│ Conflicting transactions are detected and rolled back. │
└─────────────────────────────────────────────────────────────┘
Key Concepts:

- Group membership: the Group Communication System tracks which servers are in the group and requires a majority (quorum) to make progress
- Total-order delivery: all members agree on a single global order of transactions via consensus
- Certification: at commit time, a transaction's write set is checked against concurrent transactions; conflicting transactions are rolled back
- Single-primary vs multi-primary: one elected writer, or writes accepted on any member
| Aspect | Single-Primary | Multi-Primary |
|---|---|---|
| Write handling | All writes go to elected primary | Writes accepted by any member |
| Conflict handling | No write conflicts possible | Certification detects and rejects conflicts |
| Application complexity | Simple (just point to primary) | Complex (must handle rollbacks) |
| Use case | Most HA deployments | Geographically distributed writes |
| Foreign key support | Full support | Supported with restrictions |
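The "must handle rollbacks" entry in the table deserves emphasis: in multi-primary mode, the application should treat a certification rollback like a deadlock and retry. A hedged sketch of that retry loop (the exception class here is a stand-in for whatever error your MySQL driver raises when Group Replication rolls a transaction back at commit):

```python
import time

class CertificationConflict(Exception):
    """Stand-in for the driver error raised when Group Replication
    rolls a transaction back after losing certification."""

def commit_with_retry(txn, max_attempts=3, backoff=0.01):
    """Run a transaction callable, retrying on certification conflicts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except CertificationConflict:
            if attempt == max_attempts:
                raise
            time.sleep(backoff * attempt)  # simple linear backoff

# Demo: a transaction that loses certification once, then succeeds
attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 2:
        raise CertificationConflict()
    return "committed"

result = commit_with_retry(flaky_txn)
print(result)  # committed
```

The retry must re-run the whole transaction body, not just the COMMIT, because the rolled-back transaction's reads may no longer reflect current data.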
```sql
-- Configuration (in my.cnf for all members):
[mysqld]
# Enable Group Replication plugin
plugin_load_add = 'group_replication.so'

# Required settings
gtid_mode = ON
enforce_gtid_consistency = ON
log_bin = binlog
binlog_format = ROW
binlog_checksum = NONE  # Group Replication requirement
log_slave_updates = ON

# Group Replication settings
group_replication_group_name = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
group_replication_local_address = "server1:33061"  # GCS port
group_replication_group_seeds = "server1:33061,server2:33061,server3:33061"
group_replication_single_primary_mode = ON

-- Bootstrap on first member:
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

-- Join other members:
START GROUP_REPLICATION;

-- Check group status
SELECT * FROM performance_schema.replication_group_members;
```

Group Replication is sensitive to network latency and partitions. All members should be in the same data center or connected via low-latency links (<10ms recommended). For cross-region deployments, consider MySQL InnoDB ClusterSet, which orchestrates independent clusters with asynchronous replication between them.
MySQL InnoDB Cluster is a complete high-availability solution that bundles Group Replication with MySQL Router (for connection routing) and MySQL Shell (for administration). It's MySQL's answer to the question: "How do I set up truly automatic failover?"
InnoDB Cluster Components:
┌─────────────────────────────────────────────────────────────┐
│ MySQL InnoDB Cluster Architecture │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Application │ │
│ └─────────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌─────────────────────────▼───────────────────────────┐ │
│ │ MySQL Router │ │
│ │ - Routes writes to R/W port (6446) → Primary │ │
│ │ - Routes reads to R/O port (6447) → Any member │ │
│ │ - Automatic failover: updates routing on primary │ │
│ │ change │ │
│ └─────────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌─────────────────────────▼───────────────────────────┐ │
│ │ InnoDB Cluster (Group Replication) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Primary │ │Secondary │ │Secondary │ │ │
│ │ │(R/W) │ │(R) │ │(R) │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Managed by MySQL Shell (dba.createCluster(), etc.) │
└─────────────────────────────────────────────────────────────┘
```javascript
// Connect to MySQL Shell
// mysqlsh root@server1:3306

// Check instance is ready for cluster
dba.checkInstanceConfiguration('root@server1:3306');

// Configure instance if needed
dba.configureInstance('root@server1:3306');

// Create the cluster
var cluster = dba.createCluster('myCluster');

// Add instances to the cluster
cluster.addInstance('root@server2:3306');
cluster.addInstance('root@server3:3306');

// Check cluster status
cluster.status();
/* Output:
{
  "clusterName": "myCluster",
  "status": "OK",
  "topology": {
    "server1:3306": { "role": "PRIMARY",   "mode": "R/W", "status": "ONLINE" },
    "server2:3306": { "role": "SECONDARY", "mode": "R/O", "status": "ONLINE" },
    "server3:3306": { "role": "SECONDARY", "mode": "R/O", "status": "ONLINE" }
  }
}
*/

// Bootstrap MySQL Router
// On router host:
// mysqlrouter --bootstrap root@server1:3306 --user=mysqlrouter
```

For global deployments, MySQL 8.0.27+ offers InnoDB ClusterSet—multiple InnoDB Clusters linked by asynchronous replication. This provides geographic redundancy with region-local write performance while maintaining disaster recovery to remote regions.
Failover—the process of switching from a failed source to a replica—is the critical moment that determines whether your high-availability investment pays off. Let's examine failover strategies from manual to fully automated.
Manual Failover (Simplest, Slowest):
```sql
-- 1. Verify source is actually down (avoid split-brain!)

-- 2. On the replica to be promoted, stop replication
STOP REPLICA;

-- 3. Verify replica has applied all received events
--    (Exec_Source_Log_Pos should equal Read_Source_Log_Pos)
SHOW REPLICA STATUS\G

-- 4. If using GTIDs, ensure replica is caught up
SELECT @@global.gtid_executed;

-- 5. Make replica writeable
SET GLOBAL read_only = OFF;
SET GLOBAL super_read_only = OFF;

-- 6. Promote to source (reset replica configuration)
RESET REPLICA ALL;

-- 7. Update application connection strings to point to new source

-- 8. Optionally: Configure other replicas to replicate from new source
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'new-source-host',
  SOURCE_AUTO_POSITION = 1;
START REPLICA;
```

Automated Failover Options:
| Tool | Type | Pros | Cons |
|---|---|---|---|
| MySQL InnoDB Cluster | Built-in (Group Replication) | Native integration; no external tools | Requires Group Replication setup |
| MySQL Router + GR | Routing layer | Automatic route updates on failover | Only handles connection routing |
| Orchestrator | External tool (open source) | Feature-rich; handles complex topologies | Additional infrastructure; learning curve |
| ProxySQL | Proxy layer | Query routing + failover detection | Adds latency; needs monitoring integration |
| Cloud provider (RDS Multi-AZ) | Managed service | Zero configuration; automatic DNS update | Provider lock-in; limited customization |
Key Failover Considerations:

- Split-brain prevention: fence or shut down the old source before promoting; two writeable sources corrupt data fast
- Data loss window: with asynchronous replication, transactions not yet replicated at failure time are lost
- Replica selection: promote the most caught-up healthy replica (compare GTID sets) to minimize loss
- Connection redirection: applications must reach the new source via DNS updates, a proxy/router, or a virtual IP
- Read-only protection: keep super_read_only = ON on demoted or rejoining servers until roles are settled
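The replica-selection step can be sketched as "most caught-up healthy replica wins" (a toy decision rule in Python; real tools such as Orchestrator compare full GTID sets and check for errant transactions before promoting):

```python
def choose_promotion_candidate(replicas):
    """Pick the replica to promote: the healthy replica that has
    executed the most transactions. `replicas` maps a name to a dict
    with 'healthy' (bool) and 'executed_txns' (int)."""
    candidates = [(info["executed_txns"], name)
                  for name, info in replicas.items()
                  if info["healthy"]]
    if not candidates:
        raise RuntimeError("no healthy replica to promote")
    _, best = max(candidates)  # highest transaction count wins
    return best

# Hypothetical topology state at the moment of failure
replicas = {
    "replica-a": {"healthy": True,  "executed_txns": 10_500},
    "replica-b": {"healthy": True,  "executed_txns": 10_742},
    "replica-c": {"healthy": False, "executed_txns": 10_800},  # unreachable
}
print(choose_promotion_candidate(replicas))  # replica-b
```

Note that replica-c has applied the most transactions but is excluded because it cannot be health-checked; promoting an unreachable server is exactly how split-brain starts.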
Untested failover is not failover—it's wishful thinking. Regularly (monthly at minimum) perform planned failovers in production to verify: (1) automation works as expected, (2) RTO (recovery time) meets requirements, (3) team knows the manual procedures, (4) applications handle reconnection correctly.
We've explored MySQL's replication and clustering capabilities in depth. Let's consolidate the key takeaways:

- All replication is built on the binary log; ROW format is the modern default and keeps replicas deterministic
- GTIDs give every transaction a globally unique identifier, simplifying failover and topology changes
- Asynchronous replication is fast but can lose transactions on source failure; semi-synchronous guarantees at least one replica holds every acknowledged commit
- Group Replication adds consensus-based ordering and automatic membership; InnoDB Cluster bundles it with MySQL Router and MySQL Shell for automatic failover
- Whatever the topology, test failover regularly; untested failover is wishful thinking
What's Next:
Now that we understand MySQL's replication and clustering capabilities, we'll compare MySQL vs PostgreSQL—two titans of the open-source database world. We'll examine their architectural differences, feature sets, and when to choose each.
You now have deep knowledge of MySQL's replication architecture—from binary logs and GTIDs through asynchronous and semi-synchronous replication to Group Replication and InnoDB Cluster. This knowledge is essential for designing resilient MySQL deployments that can survive failures.