Every production database maintains a transaction log—a sequential record of every change made to the data. This log exists for crash recovery: if the database fails mid-transaction, it can replay the log to restore consistency.
What makes log-based CDC revolutionary is a simple insight:
The database already knows exactly what changed, in exactly what order, with exactly what values. We just need to read it.
Rather than polling tables, triggering on changes, or relying on applications to emit events, log-based CDC reads the same authoritative record the database uses internally. This approach is simultaneously the most accurate, most efficient, and most decoupled method of capturing changes.
This page takes you deep into the mechanics of log-based CDC. You'll understand how databases structure their transaction logs, how CDC systems read and parse these logs, and the technical considerations that make log-based CDC the definitive approach for production data pipelines.
Before we can understand log-based CDC, we must understand what transaction logs are and why databases maintain them.
The Write-Ahead Logging (WAL) Protocol:
Almost all modern databases use some form of Write-Ahead Logging (WAL). The protocol is simple but powerful: before any change is applied to the data files, a record describing that change must first be written and flushed to the log.
This guarantees that even if the system crashes mid-transaction, the log contains enough information to either complete or roll back any in-progress work.
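The rule can be illustrated with a toy key-value store. This is a minimal sketch, not how any real engine is implemented; the names `ToyWAL`, `set`, and `crashAndRecover` are invented for illustration:

```typescript
// Toy write-ahead log: append to the log, flush it, THEN mutate data pages.
type LogRecord = { lsn: number; key: string; value: string };

class ToyWAL {
  private log: LogRecord[] = [];              // the log itself
  private flushedUpTo = 0;                    // how much of it is durable
  private pages = new Map<string, string>();  // volatile "data pages"
  private nextLsn = 1;

  set(key: string, value: string): void {
    // 1. Append the change record to the log BEFORE touching the data page
    this.log.push({ lsn: this.nextLsn++, key, value });
    // 2. Flush the log to durable storage
    this.flushedUpTo = this.log.length;
    // 3. Only now apply the change to the page
    this.pages.set(key, value);
  }

  crashAndRecover(): void {
    // Simulate losing all volatile state, then replaying the durable log
    this.pages = new Map();
    for (const rec of this.log.slice(0, this.flushedUpTo)) {
      this.pages.set(rec.key, rec.value);
    }
  }

  get(key: string): string | undefined {
    return this.pages.get(key);
  }
}

const db = new ToyWAL();
db.set("order:1", "pending");
db.set("order:1", "shipped");
db.crashAndRecover();
console.log(db.get("order:1")); // "shipped", replayed from the log
```

Because the log is flushed before the page is touched, any committed change survives a crash; that same durable, ordered record is what log-based CDC reads.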
What the Log Contains:
A typical transaction log record includes:
| Component | Purpose |
|---|---|
| Log Sequence Number (LSN) | Unique, monotonically increasing identifier for this record |
| Transaction ID | Which transaction made this change |
| Operation Type | INSERT, UPDATE, DELETE, or internal operations |
| Table/Page Reference | Where in the database this change applies |
| Before Image | The data values before the change (for UPDATE/DELETE) |
| After Image | The data values after the change (for INSERT/UPDATE) |
| Timestamp | When the change was logged |
| Commit/Rollback Markers | Transaction boundaries |
The LSN is particularly important for CDC—it provides an ordered position in the log that CDC systems use to track their progress and resume after failures.
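Because LSNs are totally ordered, comparing two of them tells a connector whether a record has already been processed. PostgreSQL prints an LSN as two hex halves of a 64-bit position (e.g. "2F/ABC123"); a sketch of parsing and comparing that format, with `shouldProcess` as a hypothetical helper name:

```typescript
// Parse a PostgreSQL-style LSN ("2F/ABC123") into a 64-bit position.
// The textual form is <high 32 bits>/<low 32 bits>, both in hex.
function parseLsn(lsn: string): bigint {
  const [high, low] = lsn.split("/");
  return (BigInt(`0x${high}`) << 32n) | BigInt(`0x${low}`);
}

// A connector resuming at `committedLsn` should skip records at or below it.
function shouldProcess(recordLsn: string, committedLsn: string): boolean {
  return parseLsn(recordLsn) > parseLsn(committedLsn);
}

console.log(shouldProcess("2F/ABC124", "2F/ABC123")); // true:  newer record
console.log(shouldProcess("2F/ABC123", "2F/ABC123")); // false: already seen
console.log(shouldProcess("30/000001", "2F/FFFFFF")); // true:  higher segment
```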
Databases may use physical logging (records exactly which bytes changed on which page) or logical logging (records the SQL-like operation). Some use a hybrid. CDC systems prefer logical logs because they're easier to parse and contain semantic information about what changed, not just byte offsets.
Each database implements its transaction log differently. Understanding these differences is crucial for CDC implementation and choosing the right tools:
PostgreSQL: Write-Ahead Log (WAL) with Logical Decoding
PostgreSQL maintains a WAL that contains all changes to data files. For CDC, PostgreSQL offers Logical Decoding—a framework that converts the physical WAL into logical change streams.
Key Concepts:
WAL files are stored in the pg_wal/ directory, in 16MB segments by default.
Configuration for CDC:
-- postgresql.conf
wal_level = logical -- Enable logical decoding
max_replication_slots = 10 -- Slots for CDC connectors
max_wal_senders = 10 -- Connections that can stream WAL
-- Create a replication slot for CDC
SELECT pg_create_logical_replication_slot(
'debezium_slot',
'pgoutput'
);
-- Create a publication (what to capture)
CREATE PUBLICATION cdc_publication FOR TABLE orders, products;
Important Considerations:
By default (REPLICA IDENTITY DEFAULT), PostgreSQL's before image for UPDATE and DELETE contains only the primary key columns. To capture the full previous row state, set the replica identity to FULL:
ALTER TABLE orders REPLICA IDENTITY FULL;
Note that FULL increases WAL volume, since every changed row is logged in its entirety.
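To see why REPLICA IDENTITY matters downstream, compare the before image an UPDATE event would carry under each setting. The envelope shape below follows the common Debezium-style `op`/`before`/`after` layout, but the payloads themselves are illustrative:

```typescript
// Illustrative UPDATE events for:
//   UPDATE orders SET status = 'shipped' WHERE order_id = 42

// REPLICA IDENTITY DEFAULT: before image carries only the key columns
const eventDefault = {
  op: "u",
  before: { order_id: 42 },  // other columns unavailable
  after: { order_id: 42, status: "shipped", amount: 1999 },
};

// REPLICA IDENTITY FULL: before image carries the entire previous row
const eventFull = {
  op: "u",
  before: { order_id: 42, status: "pending", amount: 1999 },
  after: { order_id: 42, status: "shipped", amount: 1999 },
};

// A consumer computing "which columns changed" only works reliably with FULL:
function changedColumns(ev: {
  before: Record<string, unknown>;
  after: Record<string, unknown>;
}): string[] {
  return Object.keys(ev.after).filter(
    (k) => k in ev.before && ev.before[k] !== ev.after[k],
  );
}

console.log(changedColumns(eventFull));    // ["status"]
console.log(changedColumns(eventDefault)); // [] : before image too sparse to diff
```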
CDC connectors act as specialized database clients that read the transaction log as if they were replica nodes. Here's the detailed process:
Step-by-Step Breakdown:
1. Connection Establishment
The connector connects to the database using replication protocol:
# PostgreSQL replication connection
host=db.example.com
dbname=inventory
user=cdc_user
replication=database # Special replication mode
The database treats the connector as a replica, streaming changes continuously.
2. Initial Snapshot (Optional but Common)
Before streaming changes, most connectors take an initial snapshot of existing data:
1. Lock tables (or use consistent snapshot)
2. Record current log position (LSN/binlog position)
3. SELECT * FROM each table
4. Emit 'read' events for all existing rows
5. Release locks
6. Switch to streaming from recorded position
This ensures downstream systems receive complete data, not just changes from connector start.
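The six steps above can be sketched as one procedure. Every name here (`SourceDb`, `lockTables`, `currentPosition`, `streamFrom`, the `'read'` op) is a hypothetical stand-in, not a real connector API:

```typescript
// Snapshot-then-stream: emit all existing rows, then stream changes from a
// position recorded BEFORE the snapshot began, so no change is missed.
interface SourceDb {
  lockTables(tables: string[]): Promise<void>;
  releaseLocks(): Promise<void>;
  currentPosition(): Promise<string>; // LSN / binlog position
  readAllRows(table: string): AsyncIterable<object>;
  streamFrom(position: string): AsyncIterable<{ op: string; row: object }>;
}

async function snapshotThenStream(
  db: SourceDb,
  tables: string[],
  emit: (event: { op: string; row: object }) => Promise<void>,
): Promise<void> {
  await db.lockTables(tables);                 // 1. consistent view
  const startPos = await db.currentPosition(); // 2. record where we are
  for (const table of tables) {                // 3-4. emit existing rows
    for await (const row of db.readAllRows(table)) {
      await emit({ op: "read", row });         //    'read' = snapshot event
    }
  }
  await db.releaseLocks();                     // 5. let writes resume
  for await (const change of db.streamFrom(startPos)) {
    await emit(change);                        // 6. stream from recorded pos
  }
}
```

Recording the position before the snapshot (step 2) is what closes the gap: any write that lands during the snapshot is replayed by the stream, so consumers may see it twice but never miss it.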
3. Log Streaming
After snapshotting, the connector streams changes in real-time:
WAL Record → Parse → Extract table, operation, values →
Enrich with schema → Serialize to JSON/Avro →
Publish to Kafka topic
4. Offset Management
The connector tracks its position (LSN, binlog file:position, resume token):
{
"postgres": {
"lsn": "2F/ABC123",
"txId": 571,
"ts_usec": 1704067200000000
}
}
This offset is committed to durable storage (typically Kafka itself). On restart, the connector resumes from the last committed offset, ensuring no data loss.
Initial snapshots can take hours or days for large databases. During snapshotting, the connector holds database resources and connections. Many CDC deployments fail at this stage—plan for snapshot time, consider incremental snapshots, or evaluate schema-aware snapshot strategies that can be paused and resumed.
Reliable offset tracking is the foundation of CDC durability. Without it, changes would be lost or duplicated on every restart. Let's examine how this works in detail:
interface CDCOffset {
  // Database-specific position
  position: {
    // PostgreSQL: LSN (Log Sequence Number)
    lsn?: string;            // e.g., "2F/ABC123"
    // MySQL: binlog file and position
    binlogFile?: string;     // e.g., "mysql-bin.000042"
    binlogPosition?: number; // e.g., 12345678
    gtid?: string;           // e.g., "3E11FA47-71CA-11E1-9E33:1-21"
    // MongoDB: resume token
    resumeToken?: string;    // Opaque base64 token
  };
  // Metadata for debugging and recovery
  timestamp: number;         // Wall clock when change was captured
  transactionId: string;     // Database transaction ID
  serverName: string;        // Logical name of source database
}

class CDCConnector {
  private currentOffset: CDCOffset | null = null;
  private offsetStore: OffsetStore; // Backed by Kafka, Redis, or file

  async start(): Promise<void> {
    // 1. Load last committed offset
    this.currentOffset = await this.offsetStore.load();
    if (this.currentOffset) {
      console.log(`Resuming from offset: ${JSON.stringify(this.currentOffset)}`);
      await this.resumeFromOffset(this.currentOffset);
    } else {
      console.log('No offset found, starting initial snapshot');
      await this.performSnapshot();
    }
    // 2. Begin streaming changes
    await this.streamChanges();
  }

  private async streamChanges(): Promise<void> {
    for await (const change of this.database.streamReplicationChanges()) {
      // 3. Process each change
      const event = this.convertToEvent(change);
      // 4. Publish to Kafka (batched for efficiency)
      await this.producer.send({
        topic: `${this.serverName}.${event.source.schema}.${event.source.table}`,
        messages: [{
          key: this.extractKey(event),
          value: JSON.stringify(event),
          headers: { 'lsn': change.lsn }
        }]
      });
      // 5. Update current offset (not yet committed)
      this.currentOffset = {
        position: { lsn: change.lsn },
        timestamp: Date.now(),
        transactionId: change.txId,
        serverName: this.serverName
      };
    }
  }

  // Called periodically or after batch confirmation
  async commitOffset(): Promise<void> {
    if (this.currentOffset) {
      await this.offsetStore.commit(this.currentOffset);
      console.log(`Committed offset: ${this.currentOffset.position.lsn}`);
    }
  }
}
Offset Storage Strategies:
| Storage | Pros | Cons | Best For |
|---|---|---|---|
| Kafka Connect Offsets Topic | Native integration, exactly-once with Kafka | Kafka-specific | Kafka Connect deployments |
| Database Table | Simple, queryable | Additional dependency | Small deployments |
| Redis | Fast, distributed | No transactions | High-throughput pipelines |
| File | Simplest | Not distributed | Development/testing |
The Exactly-Once Challenge:
Achieving exactly-once requires atomic commit of both the message and the offset:
1. Produce message to Kafka
2. Commit offset
--- Crash here? ---
3. Confirm both succeeded
If the connector crashes between steps 1 and 2, the message was produced but the offset was never committed, so the same change is produced again on restart, creating duplicates. If it crashes during or just after step 2, the connector cannot know whether the offset commit actually succeeded.
Solution: Transactional Produces
await producer.beginTransaction();
await producer.send(messages);
await producer.sendOffsets(offsetsToCommit);
await producer.commitTransaction(); // Atomic: all or nothing
This ensures the message and offset commit atomically—Kafka treats them as a single transaction.
One of the most complex aspects of log-based CDC is handling schema changes. When a table adds a column, changes the type of a field, or drops a column, the CDC pipeline must adapt without losing data or breaking consumers.
Typical challenges:
- Type changes: price changes from integer cents to decimal dollars. How do consumers interpret old vs new events?
- Renames: is customer_name the same as customerName? Semantic vs syntactic changes.
Schema Registry to the Rescue:
Tools like Confluent Schema Registry or AWS Glue Schema Registry provide:
// Message with schema reference
{
"magic_byte": 0,
"schema_id": 42, // Points to schema in registry
"payload": { ... actual event data ... }
}
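The `magic_byte` / `schema_id` framing shown above is, in Confluent's wire format, a literal byte layout: one zero byte, a 4-byte big-endian schema ID, then the serialized payload. A sketch of writing and reading that header; the JSON body here stands in for the Avro bytes a real producer would write:

```typescript
// Frame a payload Confluent-style: [0x00][schema id: int32 BE][payload bytes]
function frame(schemaId: number, payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(5 + payload.length);
  const view = new DataView(out.buffer);
  view.setUint8(0, 0);               // magic byte
  view.setInt32(1, schemaId, false); // big-endian schema id
  out.set(payload, 5);
  return out;
}

function unframe(message: Uint8Array): { schemaId: number; payload: Uint8Array } {
  const view = new DataView(message.buffer, message.byteOffset, message.byteLength);
  if (view.getUint8(0) !== 0) throw new Error("Unknown magic byte");
  return { schemaId: view.getInt32(1, false), payload: message.slice(5) };
}

const body = new TextEncoder().encode('{"order_id":42}');
const msg = frame(42, body);
const { schemaId, payload } = unframe(msg);
console.log(schemaId);                          // 42
console.log(new TextDecoder().decode(payload)); // {"order_id":42}
```

A consumer uses the decoded `schemaId` to fetch the writer's schema from the registry before deserializing the payload, which is how old and new events can coexist on one topic.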
Compatibility Modes:
| Mode | Definition | Typical Changes | Use Case |
|---|---|---|---|
| BACKWARD | New schema can read old data | Add optional fields, delete fields | Consumers upgraded first |
| FORWARD | Old schema can read new data | Delete optional fields, add fields | Producers upgraded first |
| FULL | Both BACKWARD and FORWARD | Add/delete optional fields only | Independent upgrades |
| NONE | No compatibility checking | Any change | Careful coordination required |
// Schema Version 1
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "long"},
    {"name": "status", "type": "string"},
    {"name": "amount", "type": "int"}  // Cents
  ]
}

// Schema Version 2 (BACKWARD compatible)
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "long"},
    {"name": "status", "type": "string"},
    {"name": "amount", "type": "int"},
    {"name": "currency", "type": "string", "default": "USD"},       // New optional field
    {"name": "notes", "type": ["null", "string"], "default": null}  // New optional field
  ]
}

// Consumers using Schema V2 can read V1 data:
// - order_id, status, amount: present in both
// - currency: defaults to "USD" for V1 data
// - notes: defaults to null for V1 data

Understanding CDC performance is critical for capacity planning and operational stability. Log-based CDC has unique performance characteristics that differ significantly from traditional database workloads.
| Dimension | Typical Value | Factors | Optimization |
|---|---|---|---|
| Latency | 100ms - 2s | Log flush interval, polling frequency, batch size | Reduce flush intervals, smaller batches |
| Throughput | 10K - 100K events/sec | Log read speed, network, serialization | Parallel readers, efficient serialization (Avro) |
| Source Impact | 1-5% CPU increase | Log decoding, replication connections | Dedicated replication slots, off-peak snapshots |
| Memory Usage | 256MB - 2GB per connector | Batch buffer size, schema cache | Tune batch settings, limit concurrent tables |
Latency Breakdown:
End-to-End CDC Latency
├─────────────────────────────────────────────────────────────────┤
│ DB Commit│ Log Flush │ CDC Read │ Parse │ Network │ Broker Commit │
│ ~1ms │ ~10-100ms│ ~10-50ms│ ~5-20ms│ ~5-50ms │ ~10-100ms │
│ │ │ │ │ │ │
└──────────┴───────────┴──────────┴────────┴─────────┴───────────────┘
Total: 50ms - 500ms typical
Key Performance Insights:
Log flush is often the bottleneck: Most databases don't instantly persist WAL. The wal_writer_delay (PostgreSQL) or sync_binlog (MySQL) controls this.
Large transactions cause latency spikes: A transaction modifying millions of rows is captured as one huge batch, causing downstream processing delays.
Schema lookups add overhead: Each table needs its schema. Caching schemas is essential.
Network matters for geo-distributed CDC: Streaming logs across regions adds significant latency.
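The large-transaction spike described above is usually mitigated by bounding batch size inside the connector. A minimal sketch of splitting one transaction's change list into batches bounded by both count and bytes; the function name and limits are illustrative:

```typescript
// Split one transaction's changes into batches bounded by count AND bytes,
// so a 10-million-row transaction can't become one enormous in-memory batch.
function* boundedBatches<T>(
  changes: T[],
  sizeOf: (c: T) => number,
  maxCount: number,
  maxBytes: number,
): Generator<T[]> {
  let batch: T[] = [];
  let bytes = 0;
  for (const change of changes) {
    const size = sizeOf(change);
    // Flush the current batch before it would exceed either limit
    if (batch.length > 0 && (batch.length >= maxCount || bytes + size > maxBytes)) {
      yield batch;
      batch = [];
      bytes = 0;
    }
    batch.push(change);
    bytes += size;
  }
  if (batch.length > 0) yield batch;
}

// Example: 10 changes of ~100 "bytes" each, at most 4 per batch
const changes = Array.from({ length: 10 }, (_, i) => ({ id: i, payload: "x".repeat(100) }));
const batches = [...boundedBatches(changes, (c) => c.payload.length, 4, 1_000_000)];
console.log(batches.map((b) => b.length)); // [4, 4, 2]
```

Debezium's max.batch.size and max.queue.size.in.bytes settings (shown in the tuning example below) apply the same count-plus-bytes idea.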
# Debezium connector performance tuning

# Reduce latency by polling more frequently (tradeoff: more overhead)
poll.interval.ms=100

# Batch changes for throughput (tradeoff: increases latency)
max.batch.size=2048

# Snapshot performance for large tables
snapshot.max.threads=4
snapshot.fetch.size=10240

# Handle large transactions (prevent OOM)
max.queue.size=8192
max.queue.size.in.bytes=67108864  # 64MB queue limit

# Heartbeat to detect low-traffic periods
heartbeat.interval.ms=10000

# Schema history storage
schema.history.internal.kafka.topic=schema-history
schema.history.internal.kafka.bootstrap.servers=kafka:9092

A transaction that modifies 10 million rows creates a 10-million-event batch in CDC. This can exhaust memory, cause timeout failures, and create massive downstream processing delays. Monitor for large transactions; consider breaking them into smaller batches if they originate from your applications.
Running log-based CDC in production requires careful attention to operational concerns. These are the areas that cause the most production incidents:
Disk growth from log retention is the classic failure mode: a stopped or lagging connector forces the database to retain log files it would otherwise recycle, which can fill the disk. Bound retention with wal_keep_size (PostgreSQL) or expire_logs_days (MySQL).
-- PostgreSQL: Monitor replication slot lag
SELECT
  slot_name,
  active,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag_size,
  pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes,
  CASE
    WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 1073741824 THEN 'CRITICAL: >1GB lag'
    WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 104857600 THEN 'WARNING: >100MB lag'
    ELSE 'OK'
  END AS status
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- MySQL: Monitor binlog position lag
SHOW MASTER STATUS;
SHOW REPLICA STATUS\G
-- Compare Exec_Master_Log_Pos vs Read_Master_Log_Pos

-- PostgreSQL: Check WAL disk usage
SELECT pg_size_pretty(sum(size)) AS wal_size, count(*) AS wal_files
FROM pg_ls_waldir();

-- Alert: WAL exceeds threshold (shell script)
-- WAL_SIZE=$(psql -t -c "SELECT sum(size) FROM pg_ls_waldir()")
-- if [ $WAL_SIZE -gt 10737418240 ]; then alert "WAL > 10GB"; fi

Recovery Runbook Template:
| Scenario | Detection | Recovery Action | RTO |
|---|---|---|---|
| Connector crash | No events for 5 min | Restart connector, resumes from offset | <5 min |
| Offset corruption | Events with wrong sequence | Stop, delete offset, re-snapshot | Hours |
| Slot deleted accidentally | Connector fails to connect | Recreate slot, re-snapshot | Hours |
| Source DB failover | Connection errors | Reconnect to new primary, may need re-snapshot | Minutes-Hours |
| Schema incompatibility | Serialization errors | Pause CDC, update consumers, resume | Hours |
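The detection column above usually ends up as an automated check. A sketch of the slot-lag classification from the monitoring SQL earlier, expressed as a reusable function; the thresholds match that query (100MB warning, 1GB critical), and the slot data here is illustrative:

```typescript
// Classify replication slot lag in bytes, mirroring the monitoring query.
type LagStatus = "OK" | "WARNING" | "CRITICAL";

function classifyLag(lagBytes: number): LagStatus {
  if (lagBytes > 1_073_741_824) return "CRITICAL"; // > 1 GB
  if (lagBytes > 104_857_600) return "WARNING";    // > 100 MB
  return "OK";
}

// Example: poll results for several slots and surface each status
const slots = [
  { name: "debezium_slot", lagBytes: 52_428_800 },     // 50 MB
  { name: "analytics_slot", lagBytes: 2_147_483_648 }, // 2 GB (stalled consumer)
];
for (const slot of slots) {
  console.log(`${slot.name}: ${classifyLag(slot.lagBytes)}`);
}
// debezium_slot: OK
// analytics_slot: CRITICAL
```

In practice a check like this runs on a schedule against pg_replication_slots and pages the on-call before WAL retention fills the disk.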
We've deeply explored the technical mechanics of log-based CDC. Let's consolidate the key insights:
- Transaction logs (WAL/binlog) are the database's authoritative record of change; log-based CDC reads them instead of polling tables or firing triggers.
- LSNs and committed offsets give connectors an ordered position in the log, so they can track progress and resume after failures without losing or duplicating data.
- An initial snapshot plus streaming from a position recorded before the snapshot delivers complete data, not just changes after connector start.
- Schema registries and compatibility modes let producers and consumers evolve schemas independently.
- Production CDC depends on monitoring replication slot lag, WAL disk usage, and large transactions, with a recovery runbook ready for each failure scenario.
What's Next:
Now that you understand how log-based CDC works at a technical level, we'll examine the tools that implement these patterns. The next page covers Debezium and other CDC tools—comparing capabilities, architectures, and when to use each.
You now understand the deep mechanics of log-based CDC—how databases write transaction logs, how CDC systems read them, offset tracking for reliability, schema evolution challenges, and operational considerations. Next, we'll explore the ecosystem of tools that bring these concepts to life.