A single PostgreSQL server, no matter how powerful, represents a single point of failure and a scalability ceiling. Replication addresses both concerns by maintaining copies of your data on multiple servers. When implemented correctly, replication provides high availability (automatic failover when primary fails), read scaling (distribute read load across replicas), and disaster recovery (survive datacenter failures).
PostgreSQL offers sophisticated replication options that have evolved significantly over its history. Understanding the differences between physical and logical replication, synchronous and asynchronous modes, and various failover architectures is essential for designing resilient systems.
This page covers PostgreSQL's replication mechanisms in depth: Write-Ahead Log (WAL) streaming for physical replication, logical replication for selective data distribution, synchronous and asynchronous configurations, cascading replicas, and high availability architectures including Patroni and pgpool-II.
PostgreSQL provides two fundamentally different approaches to replication, each suited for different use cases:
Physical Replication (Streaming Replication):
Physical replication transmits the Write-Ahead Log (WAL)—the binary transaction log—to standby servers. Standbys apply WAL records to their data files, resulting in byte-for-byte identical copies of the primary. This is a complete database copy at the storage level.
Logical Replication:
Logical replication transmits changes as logical operations (INSERT, UPDATE, DELETE on specific tables). The standby reconstructs changes from these logical operations. This allows selective table replication, cross-version replication, and even schema differences between publisher and subscriber.
| Aspect | Physical Replication | Logical Replication |
|---|---|---|
| Data Transmitted | Binary WAL records | Logical row changes (INSERT/UPDATE/DELETE) |
| Replication Scope | Entire cluster (all databases) | Selected tables within a database |
| Standby Writability | Read-only (hot standby) | Writable (can have local tables) |
| Version Compatibility | Same major version required | Cross-version replication possible |
| Schema Requirements | Identical schemas | Compatible schemas (flexible) |
| DDL Replication | Automatic (in WAL) | Not replicated (must apply manually) |
| Initial Setup | pg_basebackup (full copy) | Table-by-table sync + streaming |
| Performance Overhead | Minimal (WAL shipping) | Higher (decoding + apply) |
| Use Cases | HA failover, read replicas | Data integration, subset replication |
Choose physical replication for: high availability with automatic failover, read replicas serving the same application, and disaster recovery to a remote site. Choose logical replication for: replicating specific tables to analytics systems, migrating between PostgreSQL versions, consolidating data from multiple sources into one target, and enabling writes on the replica for local data.
Streaming replication is PostgreSQL's primary mechanism for high availability. The primary server streams WAL records to standby servers in near-real-time, maintaining synchronized copies of the entire database cluster.
How Streaming Replication Works:
```
            Primary Server                                 Standby Server
┌─────────────────────────────────────┐      ┌─────────────────────────────────────┐
│ Backend Process (handling queries)  │      │ Startup Process (applies WAL)       │
│              │                      │      │              ▲                      │
│              ▼                      │      │              │                      │
│     ┌─────────────────┐             │      │     ┌────────┴────────┐             │
│     │   WAL Buffer    │             │      │     │    WAL Files    │             │
│     │   (in memory)   │             │      │     │    (pg_wal/)    │             │
│     └────────┬────────┘             │      │     └────────▲────────┘             │
│              │                      │      │              │                      │
│              ▼                      │      │              │  Hot Standby Queries │
│     ┌─────────────────┐             │      │              │  ┌────────────────┐  │
│     │    WAL Files    │             │      │              │  │  Client Conns  │  │
│     │    (pg_wal/)    │             │      │              │  │  (read-only)   │  │
│     └────────┬────────┘             │      │              │  └────────────────┘  │
│              │                      │      │              │                      │
│              ▼                      │      │     ┌────────┴────────┐             │
│     ┌─────────────────┐             │      │     │  WAL Receiver   │             │
│     │   WAL Sender    │─────────────┼TCP/IP┼────►│     Process     │             │
│     │     Process     │             │      │     └─────────────────┘             │
│     └─────────────────┘             │      │                                     │
└──────────────▲──────────────────────┘      └──────────────┬──────────────────────┘
               │                                            │
               └─ Feedback: flush/apply positions ──────────┘
```
```bash
# === On Primary Server ===

# 1. Configure postgresql.conf
wal_level = replica        # Enable replication info in WAL
max_wal_senders = 10       # Max number of standbys
wal_keep_size = 1GB        # Retain WAL for slow standbys
hot_standby = on           # Allow queries on standby

# 2. Create replication user
psql -c "CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secure_password';"

# 3. Configure pg_hba.conf (allow replication connections)
# host  replication  replicator  standby_ip/32  scram-sha-256

# 4. Restart PostgreSQL
pg_ctl restart -D /var/lib/postgresql/data

# === On Standby Server ===

# 1. Stop PostgreSQL if running
pg_ctl stop -D /var/lib/postgresql/data

# 2. Clear data directory (IMPORTANT: this deletes existing data)
rm -rf /var/lib/postgresql/data/*

# 3. Take base backup from primary
pg_basebackup -h primary_ip -D /var/lib/postgresql/data -U replicator -P -R
# The -R flag creates standby.signal and configures recovery settings

# 4. Start standby
pg_ctl start -D /var/lib/postgresql/data
# Standby is now in recovery and accepting read-only queries
```

Monitoring Replication Lag:
Replication lag—the delay between a transaction committing on primary and being visible on standby—is the critical metric for replica health.
```sql
-- On Primary: check connected standbys
SELECT client_addr, state,
       sent_lsn, write_lsn, flush_lsn, replay_lsn,
       pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS replay_lag_bytes,
       sync_state   -- 'async', 'sync', 'potential', 'quorum'
FROM pg_stat_replication;

-- On Primary: replication lag as time intervals (PostgreSQL 10+)
SELECT client_addr, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;

-- On Standby: check recovery status and time lag
SELECT pg_is_in_recovery() AS is_standby,
       pg_last_wal_receive_lsn() AS last_received,
       pg_last_wal_replay_lsn() AS last_applied,
       pg_last_xact_replay_timestamp() AS last_tx_time,
       CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
            THEN interval '0 seconds'
            ELSE now() - pg_last_xact_replay_timestamp()
       END AS replication_lag;

-- Count WAL files currently in pg_wal/ (a rough capacity check)
SELECT count(*) AS wal_files FROM pg_ls_waldir();
```

Replication slots (created with pg_create_physical_replication_slot(), or automatically by tools like pg_basebackup) ensure WAL is retained until the standby confirms receipt, preventing "requested WAL segment has already been removed" errors. However, if a standby goes offline while its slot remains, WAL accumulates indefinitely on the primary, potentially filling the disk. Monitor slot lag via pg_replication_slots, and cap retention with max_slot_wal_keep_size (PostgreSQL 13+) for standbys you can't monitor closely.
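As a concrete sketch of slot-based physical replication (the slot name is illustrative, and the cap value is a policy choice, not a recommendation):

```sql
-- On primary: create a physical replication slot for a standby
SELECT pg_create_physical_replication_slot('standby1_slot');

-- On standby: point at the slot (postgresql.auto.conf; pg_basebackup -R
-- can write this for you when given --slot)
-- primary_slot_name = 'standby1_slot'

-- On primary (PostgreSQL 13+): cap how much WAL any slot may retain,
-- as a disk-fill safety valve for unmonitored standbys
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();

-- Drop the slot when a standby is decommissioned so WAL can be recycled
SELECT pg_drop_replication_slot('standby1_slot');
```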
By default, streaming replication is asynchronous—the primary doesn't wait for standbys before confirming commits. This maximizes throughput but means committed transactions might be lost if the primary fails before standbys receive the WAL.
Synchronous replication addresses this by requiring the primary to wait for one or more standbys to confirm WAL receipt before returning success to the client.
Confirmation Levels:
| Level | Wait For | Durability | Performance Impact | Use Case |
|---|---|---|---|---|
| off | Nothing (async) | Data only in primary WAL buffer; up to 3 × wal_writer_delay of commits can be lost | Lowest latency | Logs, metrics, ephemeral data |
| local | Primary disk flush | Data survives primary crash | Low latency | Standard operations |
| remote_write | Standby OS cache | Data in standby memory; lost only if primary fails and standby OS crashes | Moderate latency | Good performance / safety balance |
| on (remote_flush) | Standby disk flush | Data on standby disk; survives primary loss | Higher latency | Important transactions |
| remote_apply | Standby has applied | Data queryable on standby immediately after commit | Highest latency | Read-after-write consistency needed |
```bash
# postgresql.conf on Primary

# Define the synchronous standby policy
# FIRST: wait for the first N standbys in priority order
# ANY:   wait for any N from the list (quorum-based)

synchronous_standby_names = 'FIRST 1 (standby1, standby2)'
# - Wait for standby1 (or standby2 if standby1 is unavailable)
# - standby2 becomes sync if standby1 disconnects

# Quorum-based alternative: require any 2 of 3 standbys
synchronous_standby_names = 'ANY 2 (standby1, standby2, standby3)'

# Combined with the commit level:
synchronous_commit = on    # Wait for sync standbys to flush

# Priority-based with two sync standbys: the first 2 reachable standbys
# in list order must confirm
synchronous_standby_names = 'FIRST 2 (standby1, standby2, standby3)'
```
```sql
-- Override for specific transactions (less critical data)
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO audit_log (event, timestamp) VALUES ('user_clicked_button', NOW());
COMMIT;  -- Returns immediately, doesn't wait for standby

-- Override for critical transactions (even if the default is weaker)
BEGIN;
SET LOCAL synchronous_commit = remote_apply;
UPDATE accounts SET balance = balance - 1000 WHERE id = 123;
COMMIT;  -- Waits until the standby has applied and can serve reads

-- Check the current setting
SHOW synchronous_commit;
```

If all synchronous standbys become unavailable while synchronous_commit = on, the primary blocks every commit until at least one standby reconnects. This protects data but can cause a complete application outage. Some architectures fall back to synchronous_commit = local when standbys are unavailable, trading durability for availability in failure scenarios.
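A minimal sketch of that fallback, applied cluster-wide by an operator or automation when all sync standbys are down (the trade-off should be a deliberate policy decision, not a default):

```sql
-- Temporarily accept local-only durability so the application keeps working
ALTER SYSTEM SET synchronous_commit = 'local';
SELECT pg_reload_conf();

-- Restore full synchronous durability once a standby reconnects
ALTER SYSTEM RESET synchronous_commit;
SELECT pg_reload_conf();
```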
Logical replication, introduced in PostgreSQL 10, replicates changes at the row level using a publish/subscribe model. It's more flexible than physical replication but requires more configuration and has different characteristics.
Key Concepts:
- Publication: a named set of tables (or all tables) on the source database whose changes are streamed out
- Subscription: the receiving side; it connects to a publication, performs the initial table sync, then applies ongoing changes
- Each subscription creates a logical replication slot on the publisher to track its progress
```sql
-- === On Publisher (source database) ===

-- Ensure wal_level supports logical decoding
-- postgresql.conf: wal_level = logical

-- Create a publication for specific tables
CREATE PUBLICATION sales_pub FOR TABLE orders, customers, products;

-- Or publish all tables
CREATE PUBLICATION all_tables_pub FOR ALL TABLES;

-- Add/remove tables from a publication
ALTER PUBLICATION sales_pub ADD TABLE order_items;
ALTER PUBLICATION sales_pub DROP TABLE products;

-- View publications
SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
FROM pg_publication;

-- === On Subscriber (target database) ===

-- Tables must exist with a compatible schema (create manually or via pg_dump -s)
CREATE TABLE orders (...);
CREATE TABLE customers (...);

-- Create the subscription (starts initial sync automatically)
CREATE SUBSCRIPTION sales_sub
  CONNECTION 'host=publisher_ip dbname=source user=replicator password=...'
  PUBLICATION sales_pub;

-- View subscription status
SELECT subname, subenabled, subslotname FROM pg_subscription;

-- View replication state per table
SELECT * FROM pg_subscription_rel;

-- Check replication lag (on the publisher)
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

Logical Replication Use Cases:
- Feeding selected tables to analytics or reporting systems
- Migrating between PostgreSQL major versions with minimal downtime
- Consolidating data from multiple sources into a single target
- Replicas that also accept local writes alongside replicated tables
Logical replication does NOT replicate: DDL changes (schema changes must be applied manually), sequence values (they don't sync automatically), large objects, or TRUNCATE (on versions before PostgreSQL 11). Tables must have a primary key or a REPLICA IDENTITY for UPDATE/DELETE operations to replicate. Consider these limitations in your replication strategy.
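For example, UPDATEs and DELETEs on a published table without a primary key fail on the publisher until a replica identity exists. A sketch with a hypothetical `page_views` table:

```sql
-- Option 1: use the full old row as the identity
-- (works with no key at all, but bloats WAL and slows apply)
ALTER TABLE page_views REPLICA IDENTITY FULL;

-- Option 2 (preferred): identify rows via a unique index
-- (the index must be unique, non-partial, and cover NOT NULL columns)
CREATE UNIQUE INDEX page_views_uidx ON page_views (session_id, viewed_at);
ALTER TABLE page_views REPLICA IDENTITY USING INDEX page_views_uidx;
```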
PostgreSQL supports advanced replication topologies beyond simple primary-standby configurations:
Cascading Replication:
Standbys can themselves act as sources for other standbys, creating a replication chain. This reduces load on the primary and enables complex topologies for geographic distribution.
```
                  ┌─────────────────┐
                  │     Primary     │
                  │    (US-East)    │
                  └────────┬────────┘
                           │ WAL Stream
            ┌──────────────┴──────────────┐
            │                             │
   ┌────────▼────────┐           ┌────────▼────────┐
   │   Standby #1    │           │   Standby #2    │
   │  (US-East DC2)  │           │    (EU-West)    │
   └────────┬────────┘           └────────┬────────┘
            │ Cascade                     │ Cascade
   ┌────────▼────────┐           ┌────────▼────────┐
   │   Standby #3    │           │   Standby #4    │
   │    (US-West)    │           │   (Asia-Pac)    │
   └─────────────────┘           └─────────────────┘
```

Benefits:
- Standby #1/#2 absorb the cascade load; the primary focuses on writes
- Regional standbys serve local reads with low latency
- Reduces cross-ocean bandwidth from the primary
```bash
# On the Cascading Source Standby (Standby #1)
# postgresql.conf additions:
hot_standby = on
max_wal_senders = 5    # Allow this standby to send WAL

# On the Downstream Standby (Standby #3)
# pg_basebackup from Standby #1, not the Primary:
pg_basebackup -h standby1_ip -D /var/lib/postgresql/data -U replicator -P -R
```

Delayed Standbys:
A delayed standby intentionally applies WAL with a time delay. This provides a recovery point for human errors—if someone accidentally drops a table, you have a window to recover from the delayed standby before it applies the destructive command.
```sql
-- On the Delayed Standby's postgresql.conf
-- (or postgresql.auto.conf via ALTER SYSTEM)
recovery_min_apply_delay = '1 hour'

-- This standby will always be 1 hour behind the primary.
-- If disaster strikes, you have 1 hour to:
--   1. Stop replication
--   2. Promote the delayed standby
--   3. Extract the data before the destructive operation

-- Common delay values:
--   15 minutes: quick recovery from operational errors
--   1 hour:     buffer for detection and response
--   24 hours:   overnight protection for major changes

-- Check the delay on the standby
SELECT pg_last_wal_replay_lsn(),
       pg_last_xact_replay_timestamp(),
       NOW() - pg_last_xact_replay_timestamp() AS actual_delay;
```

When recovering from a delayed standby: (1) pause replay at the right point using pg_wal_replay_pause(), (2) advance to just before the problematic transaction, (3) promote the standby to recover the data. This requires knowing approximately when the bad transaction occurred; combine it with good monitoring and alerting for effective disaster recovery.
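A sketch of that recovery flow (the timestamp is illustrative; pg_promote() requires PostgreSQL 12+):

```sql
-- 1. On the delayed standby: stop replay immediately, before the bad
--    transaction is applied
SELECT pg_wal_replay_pause();
SELECT pg_is_wal_replay_paused();   -- verify replay is paused

-- 2. If finer positioning is needed, stop the server and restart it with a
--    recovery target just before the destructive change (postgresql.conf):
-- recovery_target_time   = '2024-01-15 14:29:00'
-- recovery_target_action = 'pause'

-- 3. Extract the lost data (e.g., pg_dump the affected tables), or promote
--    the standby to take over as the new primary:
SELECT pg_promote();
```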
Replication alone doesn't provide high availability—you need automated failover to promote a standby when the primary fails. Several tools and architectures enable this:
Common HA Solutions:
| Solution | Approach | Pros | Cons |
|---|---|---|---|
| Patroni + etcd/Consul | Cluster manager with consensus store | Industry standard, well-documented | Requires external consensus cluster |
| Stolon | Cloud-native, Kubernetes-focused | Great for K8s, automatic healing | Steeper learning curve for non-K8s |
| repmgr | Traditional cluster manager | Simple setup, SSH-based | Less robust failure detection |
| pg_auto_failover | Citus-backed, built-in monitor | Simple architecture, no external deps | Less flexible topology options |
| pgpool-II | Connection pooler with HA | Combined pooling and failover | Can be complex, legacy code concerns |
| Cloud Managed (RDS, Cloud SQL) | Provider-managed HA | Zero operational burden | Limited customization, vendor lock-in |
```yaml
# patroni.yml - Patroni configuration for high availability

scope: postgres-cluster
namespace: /service/
name: postgresql-node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1:8008

etcd3:
  hosts: etcd1:2379,etcd2:2379,etcd3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # 1MB - don't fail over to a far-behind replica
    postgresql:
      use_pg_rewind: true  # Enable fast rejoin after failover
      parameters:
        max_connections: 200
        shared_buffers: 2GB
        wal_level: replica
        hot_standby: on
        max_wal_senders: 10
        synchronous_commit: on
        synchronous_standby_names: '*'  # Require a sync replica
  initdb:
    - encoding: UTF8
    - data-checksums

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    replication:
      username: replicator
      password: secret
    superuser:
      username: postgres
      password: secret
```
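Day-to-day cluster operations go through `patronictl`. A brief sketch, assuming the configuration above (cluster name `postgres-cluster`, member `postgresql-node1`):

```bash
# Show cluster members, roles, and replication lag
patronictl -c patroni.yml list

# Planned, graceful role change (e.g., before maintenance on the primary)
patronictl -c patroni.yml switchover postgres-cluster

# Restart a single member without changing roles
patronictl -c patroni.yml restart postgres-cluster postgresql-node1
```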
```
              ┌──────────────────────────────────────────┐
              │         Load Balancer (HAProxy)          │
              │      Reads:  port 5001 → all nodes       │
              │      Writes: port 5000 → primary only    │
              └──────────────────┬───────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     Node 1      │     │     Node 2      │     │     Node 3      │
│    (Primary)    │     │    (Replica)    │     │    (Replica)    │
│                 │     │                 │     │                 │
│ ┌─────────────┐ │     │ ┌─────────────┐ │     │ ┌─────────────┐ │
│ │ PostgreSQL  │ │     │ │ PostgreSQL  │ │     │ │ PostgreSQL  │ │
│ └─────────────┘ │     │ └─────────────┘ │     │ └─────────────┘ │
│ ┌─────────────┐ │     │ ┌─────────────┐ │     │ ┌─────────────┐ │
│ │   Patroni   │ │     │ │   Patroni   │ │     │ │   Patroni   │ │
│ └──────┬──────┘ │     │ └──────┬──────┘ │     │ └──────┬──────┘ │
└────────┼────────┘     └────────┼────────┘     └────────┼────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                ┌────────────────▼────────────────┐
                │    etcd Cluster (Consensus)     │
                │    Stores leader lock, config   │
                └─────────────────────────────────┘

  WAL streaming: Node 1 (primary) ──► Node 2 and Node 3 (replicas)
```

Failover Process:
1. The primary becomes unreachable
2. Patroni on the replicas detects this via etcd leader-lock expiry
3. The most up-to-date replica acquires the leader lock
4. Patroni promotes that replica to primary
5. The other replicas reconfigure to follow the new primary
6. HAProxy health checks detect the change and route writes to the new primary

With well-tuned Patroni, failover typically completes in 10-30 seconds. Most of this time is failure detection (waiting for timeouts to confirm the primary is truly down). Aggressive timeouts speed up failover but increase the risk of unnecessary failovers from transient network issues; balance them carefully against your availability requirements.
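The HAProxy front end in the diagram is typically wired to Patroni's REST API, whose `/primary` and `/replica` endpoints return HTTP 200 only on the leader and on healthy replicas, respectively. A minimal sketch (hostnames, ports, and timeouts are illustrative):

```
# haproxy.cfg excerpt — route writes to the leader, reads to replicas

listen postgres_write
    bind *:5000
    mode tcp
    option httpchk GET /primary          # 200 only on the current leader
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 node1:5432 check port 8008
    server node2 node2:5432 check port 8008
    server node3 node3:5432 check port 8008

listen postgres_read
    bind *:5001
    mode tcp
    option httpchk GET /replica          # 200 on healthy replicas
    http-check expect status 200
    server node1 node1:5432 check port 8008
    server node2 node2:5432 check port 8008
    server node3 node3:5432 check port 8008
```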
Effective replication requires attention to operational practices beyond initial setup:
```sql
-- Comprehensive replication health check (run on the primary)
SELECT client_addr AS standby_ip,
       state,
       sync_state,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) AS send_lag_bytes,
       pg_size_pretty(pg_wal_lsn_diff(sent_lsn, flush_lsn)) AS flush_lag_bytes,
       pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn)) AS replay_lag_bytes,
       write_lag, flush_lag, replay_lag   -- time-based lag columns (PostgreSQL 10+)
FROM pg_stat_replication;

-- Check replication slot health
SELECT slot_name,
       slot_type,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
       CASE
         WHEN active THEN 'Healthy'
         WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 1073741824
           THEN 'CRITICAL: >1GB retained'
         ELSE 'Inactive'
       END AS status
FROM pg_replication_slots;

-- Approximate WAL generation rate (for capacity planning)
SELECT pg_size_pretty(sum(size)) AS wal_24h,
       pg_size_pretty(sum(size) / 24) AS wal_per_hour
FROM pg_ls_waldir()
WHERE modification > now() - interval '24 hours';
```

The worst replication failure mode is split-brain: two nodes both accepting writes as primary. This usually happens when failover promotes a standby but the old primary recovers and resumes accepting writes. Prevention requires proper fencing (STONITH, "Shoot The Other Node In The Head") and consensus-based leader election. Never manually force a standby to primary without confirming the old primary is truly down.
We've explored PostgreSQL's comprehensive replication capabilities:
- Physical streaming replication for byte-identical standbys and HA failover
- Logical replication's publish/subscribe model for selective, cross-version replication
- Synchronous commit levels, from off through remote_apply, and their durability trade-offs
- Cascading and delayed standbys for geographic distribution and human-error recovery
- Automated failover architectures built on Patroni, etcd, and HAProxy
What's Next:
Now that we understand PostgreSQL's replication capabilities, the final page explores when to choose PostgreSQL—decision criteria, use case fit, and comparison with alternatives.
You now understand PostgreSQL's replication options in depth—from basic streaming replication to sophisticated HA architectures. These capabilities enable building systems that survive failures, scale reads, and recover from disasters. Next, we'll synthesize everything into guidance on when PostgreSQL is the right choice.