Database Management Systems / Backup Strategies

Point-in-Time Recovery

Level: Advanced

Duration: 75 mins

Topic: Backup Strategies


PITR Concept: The Foundations of Point-in-Time Recovery

The Time Machine for Your Database

Imagine you're the database administrator for a major financial institution. At 3:47 PM on a busy trading day, a bug in a newly deployed application triggers a cascade of erroneous transactions that corrupt critical account balances. The bug is discovered at 4:15 PM, but by then thousands of transactions, some legitimate and some erroneous, have been committed.

You have your nightly backup from midnight, but restoring it would mean losing more than 16 hours of legitimate transactions worth millions of dollars. What you need is the ability to restore your database to exactly 3:46 PM, one minute before the disaster began, keeping every transaction committed before the bug struck and none of the corruption that followed.

This is the problem that Point-in-Time Recovery (PITR) solves.

PITR replaces the blunt choice of "restore the backup and lose everything after it" with a precision instrument that can return your database to any moment in its recorded history. It is, in essence, a time machine for your data.

What You Will Learn

By the end of this page, you will understand the foundational concepts of Point-in-Time Recovery, including how it differs from traditional backup/restore, the architectural components that enable temporal navigation, the relationship between transaction logs and PITR capability, and why PITR has become essential for mission-critical database systems.

Understanding Point-in-Time Recovery

Point-in-Time Recovery (PITR) is a database recovery technique that enables restoration to a specific moment in the database's transaction history, rather than being limited to discrete backup snapshots. PITR combines the stability of periodic backups with the granularity of continuous transaction logging to achieve temporal precision that traditional backup strategies cannot match.

The Fundamental Insight

At its core, PITR rests on a profound insight: a database's state at any moment is the cumulative result of all transactions committed up to that moment. This means that if we have:

A known baseline state (a backup)
A complete record of all changes since that baseline (transaction logs)

We can reconstruct the database's state at any point between the backup and the present by replaying transactions up to the desired moment and stopping there.

This insight transforms transaction logs from mere recovery tools into a temporal navigation system—a complete historical record that allows us to travel to any point in the database's past.

The State Reconstruction Principle

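If S₀ represents your database state at backup time, and T₁, T₂, ..., Tₙ represent all transactions committed after the backup, in commit order, then the database state at any time t is obtained by applying the first k of them to S₀ in that order: S(t) = Tₖ(…T₂(T₁(S₀))…), where Tₖ is the last transaction committed at or before time t. Because each transaction is applied to the state left by the previous one, the order of replay matters. PITR exploits this property of database state evolution.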
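The principle can be sketched in a few lines of Python. The model is an assumption made for illustration: the database is a dictionary, and each transaction is a hypothetical Txn record holding its commit time and the keys it writes. Real systems replay physical log records rather than row dictionaries.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Txn:
    """Hypothetical model of a committed transaction: when it committed and what it wrote."""
    commit_time: datetime
    writes: dict  # key -> new value; None means the key was deleted


def state_at(base: dict, txns: list[Txn], t: datetime) -> dict:
    """Return S(t): apply, in commit order, every transaction committed at or before t."""
    state = dict(base)  # work on a copy so S0 (the backup) is never modified
    for txn in sorted(txns, key=lambda x: x.commit_time):
        if txn.commit_time > t:
            break  # first commit after the target: replay stops here
        for key, value in txn.writes.items():
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state


s0 = {"acct_1": 100, "acct_2": 50}
log = [
    Txn(datetime(2024, 1, 15, 15, 40), {"acct_1": 90, "acct_2": 60}),  # legitimate transfer
    Txn(datetime(2024, 1, 15, 15, 47), {"acct_1": -999_999}),          # erroneous write
]
print(state_at(s0, log, datetime(2024, 1, 15, 15, 46)))  # {'acct_1': 90, 'acct_2': 60}
```

Sorting by commit time stands in for reading the log in sequence; in a real system the log order itself is authoritative.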

The Three Pillars of PITR

PITR capability rests on three essential pillars, each of which must be properly implemented for the system to function:

1. Base Backups (Restoration Foundation)

A base backup provides the starting point for PITR. This is a complete, consistent snapshot of the database at a specific moment. Without this anchor point, there's no baseline from which to begin reconstruction.

The base backup must be:

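Consistent: Restorable to a valid database state with no partially applied transactions (for a backup taken while the database is running, once the log generated during the backup has been replayed)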
Complete: Containing all data files and necessary metadata
Restorable: Capable of producing a functional database when loaded

2. Transaction Logs (Change History)

Transaction logs (also called Write-Ahead Logs, redo logs, or journal files depending on the database system) provide the complete record of all changes since the base backup. These logs must be:

Complete: No gaps in the sequence from backup to recovery target
Archived: Preserved beyond the normal log rotation cycle
Ordered: Maintaining strict temporal and logical ordering

3. Log Sequence Tracking (Temporal Correlation)

The system must maintain precise correlation between log records and wall-clock time. This typically involves:

Log Sequence Numbers (LSNs) that provide total ordering
Timestamps embedded in log records
Checkpoint markers that correlate LSNs with specific database states

The Three Pillars of PITR and Their Roles
Pillar	What It Provides	Failure Impact	Key Requirements
Base Backup	Starting state for reconstruction	Cannot begin recovery; must use older backup	Consistency, completeness, tested restorability
Transaction Logs	Change history from backup to target	Recovery limited to last continuous log segment	Complete archival, no gaps, corruption protection
LSN/Timestamp Tracking	Temporal correlation; where to stop	Cannot target specific time; recovery imprecise	Accurate clocks, LSN integrity, checkpoint correlation

PITR Versus Traditional Backup/Restore

To fully appreciate PITR's significance, we must understand how it differs from traditional backup and restore approaches. The distinction goes beyond capability—it represents a fundamental shift in how we think about data protection.

Traditional Backup/Restore: The Snapshot Model

Traditional backup approaches create periodic snapshots of database state. Recovery means selecting the most recent backup that precedes the failure and restoring from that snapshot. This approach has several inherent limitations:

Limitations of Traditional Backup/Restore

•Discrete Recovery Points: You can only restore to specific backup times. If backups run daily at midnight, you can only recover to midnight states—never to 3:47 PM.
•Recovery Point Objective (RPO) Gap: Everything committed between the last backup and the failure is lost. Daily backups mean up to 24 hours of potential data loss.
•All-or-Nothing Recovery: You must accept the entire backup state. There's no way to exclude specific transactions or recover selectively.
•No Temporal Navigation: You cannot examine the database at different historical points to diagnose issues.
•Wasted Valid Work: All legitimate transactions after the last backup are lost, even if only specific transactions caused problems.

PITR: The Continuous Model

PITR transcends these limitations by treating recovery as navigation through a continuous timeline rather than selection among discrete snapshots. The difference is illustrated in the recovery options:

Traditional Backup Recovery

•Problem discovered at 4:15 PM
•Last backup: midnight
•Recovery option: midnight state only
•Data loss: 16 hours, 15 minutes
•Impact: Thousands of valid transactions lost
•RPO: Worst case = backup interval

PITR-Enabled Recovery

•Problem discovered at 4:15 PM
•Problem started at 3:47 PM
•Recovery target: 3:46 PM
•Data loss: transactions committed between 3:46 and 4:15 PM (about 29 minutes), legitimate ones included
•Impact: Only transactions after 3:46 PM affected
•RPO: Worst case = log written but not yet archived at the time of failure
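Both data-loss figures are simple differences between the restore point and the moment the problem is discovered. A small sketch, assuming the system stops accepting work at discovery; discarded_window is a hypothetical helper. Note that the PITR case still discards the 29 minutes of transactions committed after 3:46 PM, legitimate ones included.

```python
from datetime import datetime


def discarded_window(restore_point: datetime, discovered: datetime):
    """Span of committed work thrown away by restoring to restore_point, assuming the
    system is stopped as soon as the problem is discovered."""
    return discovered - restore_point


day = datetime(2024, 1, 15)
discovered = day.replace(hour=16, minute=15)   # 4:15 PM
midnight = day                                 # nightly backup
pitr_target = day.replace(hour=15, minute=46)  # 3:46 PM

print("Traditional:", discarded_window(midnight, discovered))  # 16:15:00
print("PITR:", discarded_window(pitr_target, discovered))      # 0:29:00
```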

The Granularity Advantage

PITR shrinks the RPO from the backup interval (hours or days) to the archive lag, the amount of log written but not yet archived: minutes with segment-based archival, seconds or less with streaming archival. For many critical systems, this difference represents millions of dollars in protected transactions and operational continuity.

The Conceptual Shift

The shift from traditional backup to PITR represents a change in mental model:

Traditional Model: Database state exists as a series of discrete photographs taken at backup times. Recovery means selecting the right photograph.

PITR Model: Database state exists as a continuous film, with every frame preserved. Recovery means seeking to the right frame.

This shift has profound implications:

Forensic Capability: PITR enables examination of the database at multiple historical points, facilitating root cause analysis
Surgical Recovery: Instead of wholesale restoration, administrators can make informed decisions about exactly when to stop recovery
Reduced Recovery Complexity: Paradoxically, PITR simplifies many recovery scenarios by providing more options rather than forcing all-or-nothing choices
Compliance Support: Many regulatory frameworks require the ability to reconstruct database state at specific times, which PITR provides for the entire database

The Architecture of Point-in-Time Recovery

PITR capability doesn't emerge from a single feature—it requires a carefully orchestrated architecture with multiple interacting components. Understanding this architecture is essential for implementing, operating, and troubleshooting PITR systems.

Architectural Components

A complete PITR implementation consists of several critical components, each with specific responsibilities:

Core PITR Components

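•Write-Ahead Log (WAL) Subsystem: The engine that creates the transaction log entries. It should be configured for durability (commits flushed to the log before they are acknowledged) and completeness (full logging rather than minimal logging; PostgreSQL, for example, refuses to enable WAL archiving when wal_level is minimal).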
•Log Archive Process: A mechanism that preserves log segments that would otherwise be recycled, typically by copying each completed segment to archive storage.
•Base Backup Infrastructure: The ability to create consistent database snapshots while recording the LSN at which the backup was taken.
•Archive Catalog: A metadata store tracking available base backups, archived log segments, and the LSN ranges each covers.
•Recovery Manager: The component that orchestrates the actual recovery, loading the base backup and replaying logs to the target point.
•Time-to-LSN Translation: The mapping between wall-clock time and log sequence numbers, enabling time-based recovery targets.

The Log Sequence Number (LSN)

The Log Sequence Number (LSN) is the linchpin of PITR architecture. An LSN is a unique, monotonically increasing identifier assigned to each log record. LSNs provide:

Total Ordering: Every database change has a definitive position in the log sequence
Recovery Targeting: PITR can specify "recover to LSN X" for precise positioning
Progress Tracking: The recovery process knows exactly where it is in the log stream
Consistency Verification: LSNs in data pages can be compared to logs to verify synchronization

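In many implementations, an LSN identifies a physical location in the log, either directly as a byte position or as a pair of:

Log file identifier: Which log file contains this record
Offset: The byte position within that file

PostgreSQL uses the first form: an LSN such as 1/4A3B7C00 is a 64-bit byte position in the WAL stream, printed as its high and low 32 bits in hexadecimal (here 0x14A3B7C00). Which segment file holds the record, and at what offset, follows from that position and the configured segment size.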
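The translation from an LSN to a segment file can be sketched as follows. This assumes PostgreSQL's default 16 MB segment size and timeline 1; for LSNs not exactly on a segment boundary, the result should agree with what PostgreSQL's pg_walfile_name_offset() reports for such a cluster. The function names here are hypothetical.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # PostgreSQL default; other sizes can be chosen at initdb


def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '1/4A3B7C00' into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_for(lsn: int, timeline: int = 1, seg_size: int = WAL_SEGMENT_SIZE) -> tuple[str, int]:
    """Return (segment file name, byte offset inside it), assuming seg_size-byte segments."""
    segno = lsn // seg_size
    segs_per_id = 0x1_0000_0000 // seg_size  # segments per 4 GiB of WAL
    name = f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"
    return name, lsn % seg_size


lsn = parse_lsn("1/4A3B7C00")
print(hex(lsn))           # 0x14a3b7c00
print(wal_file_for(lsn))  # ('00000001000000010000004A', 3898368)
```

The same LSN maps to a different file name and offset if the cluster uses a different segment size, which is why the sketch takes seg_size as a parameter.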

LSN Is a Logical Clock

The LSN functions as a logical clock for the database: it orders events by their position in the log rather than by wall-clock time. Every action that modifies the database advances the LSN, creating an unambiguous timeline of all changes.

The Base Backup: Capturing the Starting Point

A PITR-compatible base backup differs from a simple file copy in crucial ways:

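1. LSN Recording: The backup process must record the LSNs at which it started and completed. This creates the correlation:

Start LSN: Where replay must begin (in PostgreSQL, the redo point of the checkpoint taken when the backup starts)
End LSN: The point replay must reach before the copied files form a consistent database; no recovery target earlier than this is usable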

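2. Consistency Without Blocking: Modern databases take base backups without stopping operations. Depending on the system, this is achieved through:

Snapshots (storage-level or MVCC) that capture a single consistent point even as changes occur
Copying files while they change, then making the copy consistent by replaying the log generated during the backup (PostgreSQL forces full-page writes during a backup for this reason)
Tracking which data pages were modified during backup
Recording checkpoint position for recovery start point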

3. Metadata Inclusion: The backup must include:

Database configuration parameters affecting recovery
Tablespace mapping information
The backup label with LSN correlation

4. Integrity Verification: PITR base backups should include checksums or other verification allowing detection of corrupted backups before a recovery emergency.

backup_manifest.json (illustrative; a simplified format, not PostgreSQL's own backup_manifest layout)
{
  "backup_type": "full_base_backup",
  "database_version": "15.4",
  "backup_start_time": "2024-01-15T00:00:05.283Z",
  "backup_end_time": "2024-01-15T00:47:22.891Z",
  "start_lsn": "2/1A000028",
  "end_lsn": "2/1C3F7B60",
  "checkpoint_lsn": "2/1A000028",
  "wal_start_segment": "00000001000000020000001A",
  "required_wal_segments": [
    "00000001000000020000001A",
    "00000001000000020000001B",
    "00000001000000020000001C"
  ],
  "tablespace_map": {
    "16384": "/data/tablespaces/fast_ssd",
    "16385": "/data/tablespaces/archive"
  },
  "backup_size_bytes": 157286400,
  "backup_checksum": "sha256:e3b0c44298fc1c14...",
  "compatible_for_pitr": true
}
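Because the manifest fields depend on one another, they can be cross-checked long before an emergency. The sketch below assumes the simplified manifest format shown above, 16 MB segments, and timeline 1; check_manifest is a hypothetical helper. It recomputes the WAL segments implied by start_lsn and end_lsn and rejects recovery targets earlier than end_lsn.

```python
import json

SEG_SIZE = 16 * 1024 * 1024  # assumed WAL segment size (PostgreSQL default)


def parse_lsn(text: str) -> int:
    """'2/1A000028' -> 64-bit byte position in the WAL stream."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_name(segno: int, timeline: int = 1) -> str:
    """24-hex-digit WAL file name for a segment number, assuming SEG_SIZE segments."""
    per_id = 0x1_0000_0000 // SEG_SIZE
    return f"{timeline:08X}{segno // per_id:08X}{segno % per_id:08X}"


def check_manifest(manifest: dict, target_lsn: str) -> list[str]:
    """Return the problems that would block PITR from this backup to target_lsn (empty = OK)."""
    problems = []
    start = parse_lsn(manifest["start_lsn"])
    end = parse_lsn(manifest["end_lsn"])
    # Every segment from the one holding start_lsn to the one holding end_lsn is required.
    expected = [segment_name(s) for s in range(start // SEG_SIZE, end // SEG_SIZE + 1)]
    if manifest["required_wal_segments"] != expected:
        problems.append(f"required_wal_segments should be {expected}")
    # Stopping before end_lsn would leave the copied files inconsistent.
    if parse_lsn(target_lsn) < end:
        problems.append("target precedes end_lsn")
    return problems


manifest = json.loads("""{
  "start_lsn": "2/1A000028",
  "end_lsn": "2/1C3F7B60",
  "required_wal_segments": ["00000001000000020000001A",
                            "00000001000000020000001B",
                            "00000001000000020000001C"]
}""")
print(check_manifest(manifest, "2/30000000"))  # []
print(check_manifest(manifest, "2/1B000000"))  # ['target precedes end_lsn']
```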

Log Archival: Building the Temporal Bridge

The gap between base backups and recovery targets is bridged by archived transaction logs. Log archival is the process of preserving log segments beyond their normal lifecycle, creating the continuous record that enables PITR.

The Log Lifecycle Without Archival

In a database without PITR, transaction logs follow a simple lifecycle:

Creation: New log records written as transactions execute
Active Use: Log records referenced for crash recovery and replication
Checkpoint: Logs before checkpoint marked as reusable
Recycling: Old log segments overwritten with new content

This recycling is essential—without it, logs would grow unbounded. But recycling destroys the historical record needed for PITR.

The Log Lifecycle With Archival

Log archival interrupts this lifecycle by copying segments to archive storage before recycling:

Creation: Log records written as before
Active Use: Normal database operations
Archive Trigger: Segment fills up or time threshold reached
Archive Copy: Complete segment copied to durable archive storage
Archive Confirmation: System confirms successful archival
Recycling: Only after confirmed archival is segment recycled

The critical invariant: No log segment is recycled until successfully archived.
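A toy model makes the invariant concrete. WalDirectory and fake_archive are hypothetical names, not a real API. The sketch assumes the archiver processes segments in order and retries a failed segment rather than moving past it, loosely mirroring PostgreSQL's archiver.

```python
from typing import Callable


class WalDirectory:
    """Toy log directory enforcing 'no segment is recycled until archived' (hypothetical model)."""

    def __init__(self, archive: Callable[[str], bool]):
        self.archive = archive           # copies one segment; True only on confirmed success
        self.pending: list[str] = []     # completed segments awaiting archival, oldest first
        self.archived: set[str] = set()  # segments confirmed in archive storage
        self.recycled: list[str] = []    # segments whose space has been reused

    def segment_completed(self, name: str) -> None:
        self.pending.append(name)

    def run_archiver(self) -> None:
        # Archive in order and stop at the first failure; the failed segment is retried next run.
        while self.pending and self.archive(self.pending[0]):
            self.archived.add(self.pending.pop(0))

    def checkpoint(self, reusable: list[str]) -> None:
        # A checkpoint makes old segments reusable, but only archived ones may be recycled.
        for name in reusable:
            if name in self.archived and name not in self.recycled:
                self.recycled.append(name)


segs = [f"{1:08X}{0:08X}{i:08X}" for i in (1, 2, 3)]
failing = {segs[1]}  # pretend the archive rejects the second segment


def fake_archive(name: str) -> bool:
    return name not in failing


wal = WalDirectory(fake_archive)
for s in segs:
    wal.segment_completed(s)
wal.run_archiver()
wal.checkpoint(segs)
print(wal.recycled)  # only the first segment; the second and third are retained
```

Stopping at the first failure, rather than skipping ahead, keeps the archive contiguous: a later segment is never archived while an earlier one is missing.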

The Archive Gap Catastrophe

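If archiving fails and the database keeps running, unarchived logs accumulate. PostgreSQL, for example, keeps every unarchived segment until the disk fills and the server stops; an administrator who deletes segments to free space, or a system configured to discard unarchived logs, creates an 'archive gap': a period with no log coverage. A gap permanently limits PITR from every earlier base backup. You can only recover up to the last continuous log segment before the gap, and later points become reachable again only from a base backup taken after it.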
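Locating the edge of an archive gap comes down to walking consecutive segment numbers. This sketch assumes PostgreSQL-style 24-digit file names, the default 16 MB segment size, and a single timeline; last_recoverable is a hypothetical helper, not a PostgreSQL tool.

```python
from typing import Optional

SEG_SIZE = 16 * 1024 * 1024              # assumed WAL segment size (PostgreSQL default)
SEGS_PER_ID = 0x1_0000_0000 // SEG_SIZE  # 256 segments per 4 GiB with 16 MB segments


def segno(name: str) -> int:
    """Segment number encoded in a 24-hex-digit WAL file name (timeline digits ignored)."""
    return int(name[8:16], 16) * SEGS_PER_ID + int(name[16:24], 16)


def last_recoverable(archived: list[str], first_needed: str) -> Optional[str]:
    """Walk forward from the base backup's first required segment and return the last
    segment reachable without a gap, or None if even the first one is missing."""
    have = {segno(n): n for n in archived}
    current = segno(first_needed)
    if current not in have:
        return None
    while current + 1 in have:
        current += 1
    return have[current]


archive = [
    "0000000100000002000000FE",
    "0000000100000002000000FF",
    "000000010000000300000000",
    # 000000010000000300000001 was never archived: this is the gap
    "000000010000000300000002",
]
print(last_recoverable(archive, "0000000100000002000000FE"))  # 000000010000000300000000
```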

Archive Storage Considerations

The choice of archive storage profoundly affects PITR reliability and recovery speed:

Local File System

Simplest implementation
Fastest archive and retrieval
Single point of failure (same disk failure affects database and archives)
Inadequate for true disaster recovery

Network Attached Storage (NAS)

Isolates archives from database host failures
Shared infrastructure enables centralized management
Network latency affects archive speed
May still share failure domains (same data center)

Object Storage (S3, Azure Blob, GCS)

Highest durability (11 nines typical)
Geographic redundancy options
Higher latency than local storage
Requires network connectivity for both archive and restore
Compression and tiering options for cost management

Tape/Cold Storage

Lowest cost per gigabyte for long-term retention
Very high retrieval latency (minutes to hours)
Appropriate for compliance archives, not operational recovery
Requires separate hot tier for recent logs

Archive Storage Trade-offs
Storage Type	Durability	Retrieval Speed	Cost	Best For
Local Filesystem	Low (single disk)	Fastest (<1ms)	Medium	Development, testing
NAS/SAN	Medium (RAID)	Fast (1-10ms)	Medium-High	Operational recovery
Object Storage	Very High (11 nines)	Moderate (100ms-1s)	Low	Primary archive tier
Cold/Tape	Very High	Slow (minutes-hours)	Very Low	Long-term compliance

Continuous vs. Segment-Based Archival

Databases offer two approaches to WAL archival:

Segment-Based Archival: The traditional approach, in which complete log segments are archived as units:

Pro: Simpler implementation, atomic units
Pro: Natural fit for object storage (complete files)
Con: RPO bounded by the time it takes to fill a segment (16 MB by default in PostgreSQL), unless a timeout forces an early segment switch (archive_timeout in PostgreSQL)

Continuous (Streaming) Archival: Log records are streamed continuously to archive storage:

Pro: Near-zero RPO (seconds of data at risk)
Pro: No dependency on segment rotation
Con: More complex implementation
Con: Requires reliable streaming infrastructure

Modern enterprise deployments often combine both: streaming archival for minimal RPO, with segment-based archival as a backup mechanism.
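The RPO of segment-based archival is a matter of arithmetic: a segment is shipped when it fills or when a timeout forces a switch (PostgreSQL's archive_timeout), so the unarchived window is the smaller of the two. A sketch, assuming a steady log generation rate and ignoring the copy time itself; the function name is hypothetical.

```python
def worst_case_unarchived_seconds(wal_bytes_per_sec: float, segment_bytes: int,
                                  archive_timeout_s: float = 0) -> float:
    """Upper bound on how long written log can wait before segment-based archival ships it.

    A segment is shipped when it fills or, if archive_timeout_s > 0, when the timeout forces a
    segment switch, whichever comes first. Copy time to the archive is ignored.
    """
    fill_time = segment_bytes / wal_bytes_per_sec if wal_bytes_per_sec > 0 else float("inf")
    return min(fill_time, archive_timeout_s) if archive_timeout_s > 0 else fill_time


MB = 1024 * 1024
print(worst_case_unarchived_seconds(10 * MB, 16 * MB))        # busy system: about 1.6 s
print(worst_case_unarchived_seconds(10 * 1024, 16 * MB))      # quiet system: about 1638 s (27 min)
print(worst_case_unarchived_seconds(10 * 1024, 16 * MB, 60))  # timeout caps the window at 60 s
```

The counterintuitive result is that quiet systems have the worst segment-based RPO, which is why a timeout, or streaming archival, matters most there.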

Time-to-LSN Correlation: Navigating the Timeline

The ability to recover to a specific time (rather than a specific LSN) is what makes PITR practically useful. Administrators think in terms of "restore to 3:46 PM before the incident" rather than "restore to LSN 5/2A7C3400." This requires accurate correlation between wall-clock time and log positions.

How Time-LSN Mapping Works

Every transaction commit records a commit timestamp alongside its LSN. This creates a mapping:

LSN         | Commit Timestamp        | Transaction
------------|-------------------------|------------
5/2A000100  | 2024-01-15 15:45:22.103 | T1
5/2A000228  | 2024-01-15 15:45:22.847 | T2
5/2A000390  | 2024-01-15 15:45:23.201 | T3
5/2A000510  | 2024-01-15 15:45:24.592 | T4
...

When you specify a recovery target like "2024-01-15 15:45:23," the recovery process:

Begins replaying from the base backup
Applies each log record, checking commit timestamps
Stops when it encounters the first transaction committed after the target time
That transaction and all following are not applied

The result is the database state as it existed at the target time—all transactions committed before that moment are present, and none committed after.
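The stopping rule can be sketched against the commit table above. replay_until is a hypothetical helper; the inclusive flag defaults to True to mirror PostgreSQL's recovery_target_inclusive default. It models only commit records; real replay also applies every other log record up to the stopping point.

```python
from datetime import datetime

# (LSN, commit timestamp, transaction) rows from the table above, listed in log order
COMMITS = [
    ("5/2A000100", datetime(2024, 1, 15, 15, 45, 22, 103000), "T1"),
    ("5/2A000228", datetime(2024, 1, 15, 15, 45, 22, 847000), "T2"),
    ("5/2A000390", datetime(2024, 1, 15, 15, 45, 23, 201000), "T3"),
    ("5/2A000510", datetime(2024, 1, 15, 15, 45, 24, 592000), "T4"),
]


def replay_until(commits, target: datetime, inclusive: bool = True) -> list[str]:
    """Names of transactions whose commits are applied for a time-based recovery target.

    Commits are visited in log order; replay stops at the first commit after the target
    (or at/after it when inclusive=False), and nothing past that point is applied.
    """
    applied = []
    for _lsn, committed_at, name in commits:
        past_target = committed_at > target if inclusive else committed_at >= target
        if past_target:
            break
        applied.append(name)
    return applied


print(replay_until(COMMITS, datetime(2024, 1, 15, 15, 45, 23)))  # ['T1', 'T2']
```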

Commit Time vs. Start Time

PITR uses transaction commit timestamps, not start timestamps. A transaction that started at 15:44 but committed at 15:46 will be excluded if the recovery target is 15:45. This matches the database's durability guarantees—a transaction isn't durable until committed.

The Clock Synchronization Challenge

Time-based recovery introduces a hidden dependency: clock accuracy. The timestamps in log records reflect the database server's clock at commit time. If that clock is wrong, the mapping between "real" time and LSN is distorted.

Consider this scenario:

Database server clock runs 5 minutes slow
Incident occurs at 15:45 "real" time
Server records the incident at 15:40 server time
Administrator requests recovery to 15:44 to be "one minute before the incident"
Recovery actually goes to 15:44 server time = 15:49 "real" time
Corrupt transactions are included in recovery!

Mitigation strategies:

NTP Synchronization: All database servers synchronized to reliable time sources
Clock Monitoring: Alerting on clock skew exceeding acceptable thresholds
LSN-Based Recovery: For critical situations, use LSN targets determined by log inspection
Pre-Recovery Investigation: Examine log timestamps near the incident time before committing to a target
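If the server's clock offset at the time of the incident is known (for example, from clock-monitoring history), the real-world target can be translated into server-clock time before it is used as a recovery target. This sketch assumes the offset was constant over the period in question, which a drifting clock does not guarantee; to_server_clock is a hypothetical helper.

```python
from datetime import datetime, timedelta


def to_server_clock(real_target: datetime, server_minus_real: timedelta) -> datetime:
    """Translate a target expressed in true time into the server's (skewed) clock.

    server_minus_real is the measured offset: negative when the server clock runs slow.
    """
    return real_target + server_minus_real


# Scenario above: server clock 5 minutes slow; we want one minute before the 15:45 incident.
skew = timedelta(minutes=-5)
target = to_server_clock(datetime(2024, 1, 15, 15, 44), skew)
print(target)  # 2024-01-15 15:39:00, the value to use as the time-based recovery target
```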

Transaction Boundary Considerations

PITR recovery targets the boundary between transactions, not arbitrary points within transactions. This follows from atomicity itself rather than from any shortcoming of the implementation:

Why Transaction Boundaries Matter:

Transactions are atomic—partial transactions would violate ACID properties
Durability guarantees apply only to committed transactions
Work from transactions that never committed may appear in the log, but recovery treats it as rolled back, so there is no durable state partway through a transaction

Practical Implications:

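Recovery will "snap" to the last transaction boundary at or before the target
A long-running transaction becomes visible all at once when it commits, so a target can include all of its work or none of it, never just the part completed before an incident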
Systems with many short transactions have finer-grained recovery options

Example:

Target Time: 15:45:30.000

Transaction Commits:
T1 committed at 15:45:29.847    <- Last transaction BEFORE target
T2 committed at 15:45:30.291    <- First transaction AFTER target

Recovery Result: Database state includes T1, does not include T2
Actual recovery time: 15:45:29.847 (snapped to T1 commit)

Recovery Target Modes

Modern PITR implementations offer multiple ways to specify recovery targets, each suited to different scenarios:

Recovery Target Time

The most intuitive mode—specify a wall-clock timestamp:

# PostgreSQL example (postgresql.conf; recovery.conf before version 12)
recovery_target_time = '2024-01-15 15:45:30 America/New_York'

Best for: Incidents with known approximate times, compliance requirements
Precision: The last transaction boundary at or before the specified time
Requirements: Accurate server clocks, known timezone

Recovery Target LSN

Specify an exact position in the log sequence:

recovery_target_lsn = '5/2A7C3400'

Best for: Precise recovery after log inspection, repeatable recovery procedures
Precision: Exact transaction boundary at specified LSN
Requirements: Prior log analysis to determine target LSN

Recovery Target Transaction ID

Specify a transaction ID (XID) as the stopping point:

recovery_target_xid = '12847294'

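Best for: Recovering to include/exclude specific known transactions
Precision: Exact transaction boundary. Transaction IDs are assigned when transactions start, but transactions can commit in a different order, so recovery includes the transactions that committed before (and optionally including) the target, not simply every transaction with a lower ID
Requirements: Knowledge of specific transaction identifiers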
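The difference between ID order and commit order can be shown with a short sketch. The transaction IDs are made up for illustration, and recovered_for_xid_target is a hypothetical helper that models only commit records, listed in the order they appear in the log.

```python
def recovered_for_xid_target(commits_in_log_order: list[int], target_xid: int,
                             inclusive: bool = True) -> list[int]:
    """XIDs whose commits are replayed when recovery stops at target_xid's commit.

    commits_in_log_order lists XIDs in the order their commit records appear in the log.
    """
    recovered = []
    for xid in commits_in_log_order:
        if xid == target_xid:
            if inclusive:
                recovered.append(xid)
            break
        recovered.append(xid)
    return recovered


# XID 12847290 started before the target but committed after it, so it is not recovered.
log_order = [12847291, 12847293, 12847294, 12847290, 12847295]
print(recovered_for_xid_target(log_order, 12847294))  # [12847291, 12847293, 12847294]
```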

Recovery Target Named Point

Recover to a named restore point created by the application:

-- Creating a restore point in PostgreSQL (SQL)
SELECT pg_create_restore_point('before_migration_v5.2');

# Recovering to that point (postgresql.conf)
recovery_target_name = 'before_migration_v5.2'

Best for: Planned procedures with intentional checkpoints, release deployments
Precision: Exact point where restore point was created
Requirements: Proactive restore point creation before significant operations

Recovery Target Modes Comparison
Mode	Target Specification	Precision	Use Case
Time	Timestamp with timezone	Transaction boundary	Known incident times
LSN	Log sequence number	Exact boundary	Post-analysis recovery
XID	Transaction identifier	Exact transaction	Specific transaction
Named Point	Restore point name	Exact point	Planned procedures

Recovery Target Inclusive Options

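Some databases offer an 'inclusive' option for recovery targets: when enabled, recovery includes the transaction at the target; when disabled, it stops just before it. In PostgreSQL the parameter is recovery_target_inclusive, and it defaults to on, so it must be turned off when the target transaction should be excluded. The setting matters whenever a specific transaction's effects must be kept in, or left out of, the recovered database.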

PITR in Context: Related Technologies

PITR exists within a broader ecosystem of database protection technologies. Understanding the relationships and distinctions helps administrators select appropriate tools for different scenarios.

PITR vs. Flashback/Temporal Tables

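Flashback (Oracle) and Temporal Tables (defined in SQL:2011 and supported natively by systems such as SQL Server and MariaDB; PostgreSQL currently relies on extensions for system-versioned history) provide row-level time travel within the running database: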

Capability: Query historical states without recovery
Granularity: Row-level, table-level
Overhead: Undo retention (Flashback) or history tables (temporal tables), both costing storage
Use Case: Application-level auditing, data correction

PITR:

Capability: Restore entire database to historical state
Granularity: Whole database
Overhead: Log archival storage
Use Case: Disaster recovery, major incident recovery

Key Distinction: Flashback is for querying history; PITR is for restoring to history.

PITR vs. Logical Replication

Logical Replication streams logical changes (row changes) to replicas:

Purpose: Data distribution, migration, heterogeneous replication
Recovery Capability: Limited; replicas track the primary, with no arbitrary time targeting
Data Preserved: Logical changes only

PITR:

Purpose: Point-in-time restoration capability
Recovery Capability: Arbitrary time targeting within archive retention
Data Preserved: Physical changes (complete database)

Key Distinction: Replication distributes current state; PITR preserves historical states.

PITR vs. Continuous Data Protection (CDP)

CDP (often a storage-layer feature) provides continuous capture of all I/O:

Granularity: Block-level, continuous
Database Awareness: None (treats database as opaque files)
Consistency: May capture inconsistent states
Use Case: Near-zero RPO for entire systems

PITR:

Granularity: Transaction-level
Database Awareness: Full (understands transaction boundaries)
Consistency: Always produces consistent database states
Use Case: Consistent database recovery

Key Distinction: CDP captures physical blocks; PITR ensures logical consistency.

When to Use Each Technology

•PITR: Major incident recovery, disaster recovery, compliance point-in-time snapshots
•Flashback/Temporal: Application data auditing, user error correction, historical queries
•Logical Replication: Data distribution, zero-downtime migrations, reporting replicas
•CDP: Near-zero RPO for entire systems, especially non-database workloads

Summary: The PITR Foundation

We've established the foundational understanding of Point-in-Time Recovery. Let's consolidate the essential concepts:

Key Takeaways

•PITR enables temporal navigation — Unlike discrete backup snapshots, PITR allows recovery to any moment within the archive retention window.
•Three pillars support PITR — Base backups provide the starting state, archived logs provide the change history, and LSN tracking provides temporal correlation.
•Log archival is the bridge — The continuous preservation of transaction logs connects backups to recovery targets, enabling fine-grained recovery.
•Time-to-LSN mapping enables practical use — Converting wall-clock times to log positions makes PITR accessible to operators thinking in terms of 'when' not 'where in the log.'
•Multiple target modes exist — Time, LSN, XID, and named restore points offer different precision and use case tradeoffs.
•PITR is transaction-aware — Recovery always produces consistent, complete transaction states, never partial transactions.
•PITR complements other technologies — It serves a distinct role from flashback queries, replication, and block-level protection.

What's next:

Now that we understand the conceptual foundations of PITR, we'll examine Log Archiving in depth. The next page explores the mechanics of preserving transaction logs, archive storage strategies, monitoring archival health, and the critical operational practices that ensure PITR capability is maintained through the routine operation of the database system.


You now understand the foundational concepts of Point-in-Time Recovery. You've learned what PITR is, how it differs from traditional backup/restore, the architectural components that enable it, and the role of log sequence numbers in temporal navigation. Next, we'll explore the critical process of log archiving that makes PITR possible.

1 / 5

Loading learning content...

Database Management SystemsBackup Strategies

Point-in-Time Recovery

LevelAdvanced

Duration75 mins

TopicBackup Strategies

1 / 5

PITR Concept: The Foundations of Point-in-Time Recovery

The Time Machine for Your Database

This is the problem that Point-in-Time Recovery (PITR) solves.

What You Will Learn

Understanding Point-in-Time Recovery

The Fundamental Insight

At its core, PITR rests on a profound insight: a database's state at any moment is the cumulative result of all transactions executed up to that moment. This means that if we have:

A known baseline state (a backup)
A complete record of all changes since that baseline (transaction logs)

We can reconstruct the database's state at any point between the backup and the present by replaying transactions up to the desired moment and stopping there.

This insight transforms transaction logs from mere recovery tools into a temporal navigation system—a complete historical record that allows us to travel to any point in the database's past.

The State Reconstruction Principle

The Three Pillars of PITR

PITR capability rests on three essential pillars, each of which must be properly implemented for the system to function:

1. Base Backups (Restoration Foundation)

The base backup must be:

Consistent: Representing a valid database state (not partially committed transactions)
Complete: Containing all data files and necessary metadata
Restorable: Capable of producing a functional database when loaded

2. Transaction Logs (Change History)

Transaction logs (also called Write-Ahead Logs, redo logs, or journal files depending on the database system) provide the complete record of all changes since the base backup. These logs must be:

Complete: No gaps in the sequence from backup to recovery target
Archived: Preserved beyond the normal log rotation cycle
Ordered: Maintaining strict temporal and logical ordering

3. Log Sequence Tracking (Temporal Correlation)

The system must maintain precise correlation between log records and physical time. This typically involves:

Log Sequence Numbers (LSNs) that provide total ordering
Timestamps embedded in log records
Checkpoint markers that correlate LSNs with specific database states

The Three Pillars of PITR and Their Roles
Pillar	What It Provides	Failure Impact	Key Requirements
Base Backup	Starting state for reconstruction	Cannot begin recovery; must use older backup	Consistency, completeness, tested restorability
Transaction Logs	Change history from backup to target	Recovery limited to last continuous log segment	Complete archival, no gaps, corruption protection
LSN/Timestamp Tracking	Temporal correlation; where to stop	Cannot target specific time; recovery imprecise	Accurate clocks, LSN integrity, checkpoint correlation

PITR Versus Traditional Backup/Restore

Traditional Backup/Restore: The Snapshot Model

Limitations of Traditional Backup/Restore

•Discrete Recovery Points: You can only restore to specific backup times. If backups run daily at midnight, you can only recover to midnight states—never to 3:47 PM.
•Recovery Point Objective (RPO) Gap: The gap between failures and the last backup represents guaranteed data loss. Daily backups mean up to 24 hours of potential data loss.
•All-or-Nothing Recovery: You must accept the entire backup state. There's no way to exclude specific transactions or recover selectively.
•No Temporal Navigation: You cannot examine the database at different historical points to diagnose issues.
•Wasted Valid Work: All legitimate transactions after the last backup are lost, even if only specific transactions caused problems.

PITR: The Continuous Model

Traditional Backup Recovery

•Problem discovered at 4:15 PM
•Last backup: midnight
•Recovery option: midnight state only
•Data loss: 16 hours, 15 minutes
•Impact: Thousands of valid transactions lost
•RPO: Worst case = backup interval

PITR-Enabled Recovery

•Problem discovered at 4:15 PM
•Problem started at 3:47 PM
•Recovery target: 3:46 PM
•Data loss: approximately 1 minute
•Impact: Only transactions after 3:46 PM affected
•RPO: Worst case = last log flush

The Granularity Advantage

The Conceptual Shift

The shift from traditional backup to PITR represents a change in mental model:

Traditional Model: Database state exists as a series of discrete photographs taken at backup times. Recovery means selecting the right photograph.

PITR Model: Database state exists as a continuous film, with every frame preserved. Recovery means seeking to the right frame.

This shift has profound implications:

Forensic Capability: PITR enables examination of the database at multiple historical points, facilitating root cause analysis
Surgical Recovery: Instead of wholesale restoration, administrators can make informed decisions about exactly when to stop recovery
Reduced Recovery Complexity: Paradoxically, PITR simplifies many recovery scenarios by providing more options rather than forcing all-or-nothing choices
Compliance Support: Many regulatory frameworks require the ability to reconstruct database state at specific times—something only PITR can reliably provide

The Architecture of Point-in-Time Recovery

Architectural Components

A complete PITR implementation consists of several critical components, each with specific responsibilities:

Core PITR Components

•Write-Ahead Log (WAL) Subsystem: The engine that creates the transaction log entries. This must be configured for durability (synchronous commits) and completeness (full logging rather than minimal logging).
•Log Archive Process: A mechanism that preserves log segments after they would normally be recycled. This typically involves copying completed log segments to archive storage.
•Base Backup Infrastructure: The ability to create consistent database snapshots while recording the LSN at which the backup was taken.
•Archive Catalog: A metadata store tracking available base backups, archived log segments, and the LSN ranges each covers.
•Recovery Manager: The component that orchestrates the actual recovery, loading the base backup and replaying logs to the target point.
•Time-to-LSN Translation: The mapping between wall-clock time and log sequence numbers, enabling time-based recovery targets.

The Log Sequence Number (LSN)

The Log Sequence Number (LSN) is the linchpin of PITR architecture. An LSN is a unique, monotonically increasing identifier assigned to each log record. LSNs provide:

Total Ordering: Every database change has a definitive position in the log sequence
Recovery Targeting: PITR can specify "recover to LSN X" for precise positioning
Progress Tracking: The recovery process knows exactly where it is in the log stream
Consistency Verification: LSNs in data pages can be compared to logs to verify synchronization

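In many implementations, the LSN maps directly to a physical location in the log:

Log file identifier: Which log file contains this record
Offset: The byte position within that file

PostgreSQL uses a variant of this scheme. There, an LSN is a 64-bit byte position in the WAL stream, written as two 32-bit hexadecimal halves. The LSN 1/4A3B7C00 therefore means byte position 0x14A3B7C00. The WAL segment file and the offset within it are derived from that position and the segment size (16 MB by default).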
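To make the PostgreSQL encoding concrete, here is a minimal Python sketch. It converts the textual LSN into its 64-bit byte position and derives the WAL segment file name and the offset within it. It assumes the default 16 MB segment size and timeline 1; both are parameters because clusters can be configured differently. The helper names are hypothetical. On a live server, the built-in pg_walfile_name_offset() function reports comparable information.

```python
DEFAULT_SEGMENT_SIZE = 16 * 1024 * 1024  # PostgreSQL's default WAL segment size


def parse_lsn(text: str) -> int:
    """Turn 'X/Y' (two 32-bit hex halves) into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_for_lsn(position: int, timeline: int = 1,
                     segment_size: int = DEFAULT_SEGMENT_SIZE) -> tuple[str, int]:
    """Return (WAL segment file name, byte offset inside that segment)."""
    segno = position // segment_size
    segments_per_id = 0x100000000 // segment_size  # 256 for 16 MB segments
    # File names are three 8-digit hex fields: timeline, high part, low part.
    name = (f"{timeline:08X}"
            f"{segno // segments_per_id:08X}"
            f"{segno % segments_per_id:08X}")
    return name, position % segment_size


position = parse_lsn("1/4A3B7C00")
print(hex(position))               # 0x14a3b7c00
print(wal_file_for_lsn(position))  # ('00000001000000010000004A', 3898368)
```

Because the position is a plain integer, comparing two LSNs reduces to comparing two numbers. This is what makes "recover to LSN X" unambiguous.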

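LSN Is a Logical Clock

LSNs only ever increase, so they order changes independently of wall-clock time. Two changes can be compared by LSN even when server clocks are wrong. This makes LSN targets the most reliable way to position recovery.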

The Base Backup: Capturing the Starting Point

A PITR-compatible base backup differs from a simple file copy in crucial ways:

1. LSN Recording: The backup process must record the LSN at which it started and completed. This creates the correlation:

Start LSN: First log record that might contain changes since backup began
End LSN: Log replay must reach at least this point before the restored backup is consistent

2. Consistency Without Blocking: Modern databases take base backups without stopping operations. This is achieved through:

Snapshots (engine- or storage-level) that present a consistent view even as changes occur, or a "fuzzy" file copy that later log replay makes consistent
Tracking which data pages were modified during backup
Recording checkpoint position for recovery start point

3. Metadata Inclusion: The backup must include:

Database configuration parameters affecting recovery
Tablespace mapping information
The backup label with LSN correlation

4. Integrity Verification: PITR base backups should include checksums or other verification data, so that a corrupted backup is detected before a recovery emergency.
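A restored copy is only consistent once replay passes its end LSN. The recovery manager therefore has to choose a base backup that finished before the recovery target. Below is a minimal Python sketch of that choice. It assumes a hypothetical in-memory catalog (real tools keep this information in their archive catalog) and PostgreSQL-style LSN strings. The BaseBackup class and choose_backup function are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional


def parse_lsn(text: str) -> int:
    """PostgreSQL-style 'X/Y' LSN -> 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class BaseBackup:
    label: str
    start_lsn: str  # replay starts here
    end_lsn: str    # the restored copy is consistent only after replay reaches this


def choose_backup(catalog: list[BaseBackup], target_lsn: str) -> Optional[BaseBackup]:
    """Pick the newest backup whose end LSN is at or before the target.

    A backup that ends after the target cannot be used: at the target
    it has not yet reached a consistent state.
    """
    target = parse_lsn(target_lsn)
    usable = [b for b in catalog if parse_lsn(b.end_lsn) <= target]
    return max(usable, key=lambda b: parse_lsn(b.end_lsn), default=None)


# Hypothetical catalog entries.
catalog = [
    BaseBackup("sun", "2/1A000028", "2/1C3F7B60"),
    BaseBackup("mon", "4/05000028", "4/0612AA00"),
]
print(choose_backup(catalog, "5/2A7C3400").label)  # mon
print(choose_backup(catalog, "3/00000000").label)  # sun
print(choose_backup(catalog, "2/1B000000"))        # None
```

Preferring the newest usable backup minimizes the amount of log that must be replayed. The archived logs must still cover everything from that backup's start LSN up to the target.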

An illustrative manifest for such a backup follows. The field layout is an example, not a specific tool's format.

backup_manifest.json
{
  "backup_type": "full_base_backup",
  "database_version": "15.4",
  "backup_start_time": "2024-01-15T00:00:05.283Z",
  "backup_end_time": "2024-01-15T00:47:22.891Z",
  "start_lsn": "2/1A000028",
  "end_lsn": "2/1C3F7B60",
  "checkpoint_lsn": "2/1A000028",
  "wal_start_segment": "00000001000000020000001A",
  "required_wal_segments": [
    "00000001000000020000001A",
    "00000001000000020000001B",
    "00000001000000020000001C"
  ],
  "tablespace_map": {
    "16384": "/data/tablespaces/ts_fast",
    "16385": "/data/tablespaces/ts_archive"
  },
  "backup_size_bytes": 157286400,
  "backup_checksum": "sha256:e3b0c44298fc1c14...",
  "compatible_for_pitr": true
}
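In a manifest like this, the required WAL segments follow directly from the two LSNs. Every segment from the one holding start_lsn through the one holding end_lsn must be present. The sketch below recomputes that list so it can be checked against what the manifest claims. It assumes 16 MB segments and timeline 1; the function names are hypothetical.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size


def parse_lsn(text: str) -> int:
    """PostgreSQL-style 'X/Y' LSN -> 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_name(segno: int, timeline: int = 1) -> str:
    """Absolute segment number -> 24-character WAL file name."""
    per_id = 0x100000000 // SEGMENT_SIZE
    return f"{timeline:08X}{segno // per_id:08X}{segno % per_id:08X}"


def required_segments(start_lsn: str, end_lsn: str, timeline: int = 1) -> list[str]:
    """Every segment from the one holding start_lsn to the one holding end_lsn."""
    first = parse_lsn(start_lsn) // SEGMENT_SIZE
    last = parse_lsn(end_lsn) // SEGMENT_SIZE
    return [segment_name(n, timeline) for n in range(first, last + 1)]


manifest = {
    "start_lsn": "2/1A000028",
    "end_lsn": "2/1C3F7B60",
    "required_wal_segments": [
        "00000001000000020000001A",
        "00000001000000020000001B",
        "00000001000000020000001C",
    ],
}
computed = required_segments(manifest["start_lsn"], manifest["end_lsn"])
print(computed == manifest["required_wal_segments"])  # True
```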

Log Archival: Building the Temporal Bridge

The Log Lifecycle Without Archival

In a database without PITR, transaction logs follow a simple lifecycle:

Creation: New log records written as transactions execute
Active Use: Log records referenced for crash recovery and replication
Checkpoint: Logs before checkpoint marked as reusable
Recycling: Old log segments overwritten with new content

This recycling is essential—without it, logs would grow unbounded. But recycling destroys the historical record needed for PITR.

The Log Lifecycle With Archival

Log archival interrupts this lifecycle by copying segments to archive storage before recycling:

Creation: Log records written as before
Active Use: Normal database operations
Archive Trigger: Segment fills up or time threshold reached
Archive Copy: Complete segment copied to durable archive storage
Archive Confirmation: System confirms successful archival
Recycling: Only after confirmed archival is segment recycled

The critical invariant: No log segment is recycled until successfully archived.
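The invariant can be sketched as two routines. The copy routine reports success only after the archived copy is durable and verified. The recycle step refuses to act without that confirmation. This is an illustrative Python sketch with hypothetical function names; local directories stand in for archive storage, and deleting the file stands in for recycling. In PostgreSQL the corresponding hook is archive_command (or, since version 15, an archive library). A segment is not recycled until that hook reports success.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str) -> str:
    """Checksum used to confirm that the archived copy matches the original."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_segment(segment_path: str, archive_dir: str) -> bool:
    """Copy a completed segment to the archive; True only once it is stored and verified."""
    dest = os.path.join(archive_dir, os.path.basename(segment_path))
    if os.path.exists(dest):
        # Never overwrite an archived segment: an identical copy counts as
        # archived, while a different one signals a problem needing a human.
        return sha256_of(dest) == sha256_of(segment_path)
    tmp = dest + ".partial"
    shutil.copyfile(segment_path, tmp)
    with open(tmp, "r+b") as f:
        os.fsync(f.fileno())  # flush the copy's data to stable storage
    os.replace(tmp, dest)     # rename into place so no half-written file is visible
    # (A production version would also fsync the archive directory.)
    return sha256_of(dest) == sha256_of(segment_path)


def recycle_segment(segment_path: str, archive_dir: str) -> bool:
    """The invariant: remove a local segment only after confirmed archival."""
    if not archive_segment(segment_path, archive_dir):
        return False  # keep the segment; archival will be retried
    os.remove(segment_path)
    return True


# Usage with throwaway directories standing in for the WAL and archive locations.
with tempfile.TemporaryDirectory() as wal_dir, tempfile.TemporaryDirectory() as archive_dir:
    seg = os.path.join(wal_dir, "00000001000000020000001A")
    with open(seg, "wb") as f:
        f.write(os.urandom(4096))  # stand-in for a completed segment
    print(recycle_segment(seg, archive_dir))  # True
    print(os.path.exists(seg))                # False
```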

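The Archive Gap Catastrophe

Log replay is strictly sequential. A single missing or corrupt archived segment halts recovery at that point. Every segment archived after the gap becomes unusable for PITR, no matter how many there are.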
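Because segment names are sequential, a gap can be detected long before an emergency. The sketch below walks the archive from a starting segment. It reports the last segment reachable by continuous replay and the segment that breaks the chain. It assumes PostgreSQL-style names, 16 MB segments and a single timeline (timeline switches are ignored). The function names are hypothetical.

```python
from typing import Optional

SEGMENT_SIZE = 16 * 1024 * 1024
PER_ID = 0x100000000 // SEGMENT_SIZE  # segments per middle-field value (256)


def segno_of(name: str) -> int:
    """'TTTTTTTTXXXXXXXXYYYYYYYY' -> absolute segment number on its timeline."""
    return int(name[8:16], 16) * PER_ID + int(name[16:24], 16)


def seg_name(timeline: str, segno: int) -> str:
    return f"{timeline}{segno // PER_ID:08X}{segno % PER_ID:08X}"


def find_gap(archived: list[str], start: str) -> tuple[str, Optional[str], int]:
    """Walk the archive from `start` and report where continuous replay ends.

    Returns (last reachable segment, first missing segment or None,
    number of archived segments stranded beyond the gap).
    """
    timeline = start[:8]
    have = {segno_of(n) for n in archived if n[:8] == timeline}
    current = segno_of(start)
    if current not in have:
        raise ValueError(f"starting segment {start} is not in the archive")
    while current + 1 in have:
        current += 1
    stranded = [n for n in have if n > current]
    gap = seg_name(timeline, current + 1) if stranded else None
    return seg_name(timeline, current), gap, len(stranded)


archive = [
    "00000001000000020000001A",
    "00000001000000020000001B",
    "00000001000000020000001D",  # ...1C is missing
    "00000001000000020000001E",
]
print(find_gap(archive, "00000001000000020000001A"))
# ('00000001000000020000001B', '00000001000000020000001C', 2)
```

Running a check like this on a schedule turns a silent archival failure into an alert while the missing segment may still exist on the database host.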

Archive Storage Considerations

The choice of archive storage profoundly affects PITR reliability and recovery speed:

Local File System

Simplest implementation
Fastest archive and retrieval
Single point of failure (same disk failure affects database and archives)
Inadequate for true disaster recovery

Network Attached Storage (NAS)

Isolates archives from database host failures
Shared infrastructure enables centralized management
Network latency affects archive speed
May still share failure domains (same data center)

Object Storage (S3, Azure Blob, GCS)

Highest durability (11 nines typical)
Geographic redundancy options
Higher latency than local storage
Requires network connectivity for both archive and restore
Compression and tiering options for cost management

Tape/Cold Storage

Lowest cost per gigabyte for long-term retention
Very high retrieval latency (minutes to hours)
Appropriate for compliance archives, not operational recovery
Requires separate hot tier for recent logs

Archive Storage Trade-offs
Storage Type	Durability	Retrieval Speed	Cost	Best For
Local Filesystem	Low (single disk)	Fastest (<1ms)	Medium	Development, basic DR
NAS/SAN	Medium (RAID)	Fast (1-10ms)	Medium-High	Operational recovery
Object Storage	Very High (11 nines)	Moderate (100ms-1s)	Low	Primary archive tier
Cold/Tape	Very High	Slow (minutes-hours)	Very Low	Long-term compliance

Continuous vs. Segment-Based Archival

Databases offer two approaches to WAL archival:

Segment-Based Archival: The traditional approach, where complete log segments are archived as units:

Pro: Simpler implementation, atomic units
Pro: Natural fit for object storage (complete files)
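Con: RPO bounded by the time it takes to fill a segment (typically 16-256MB worth of transactions), unless a timeout forces an earlier segment switch (archive_timeout in PostgreSQL)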

Continuous (Streaming) Archival: Log records are streamed continuously to archive storage:

Pro: Near-zero RPO (seconds of data at risk)
Pro: No dependency on segment rotation
Con: More complex implementation
Con: Requires reliable streaming infrastructure

Modern enterprise deployments often combine both: streaming archival for minimal RPO, with segment-based archival as a backup mechanism.
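For segment-based archival, the exposure window is the time needed to fill a segment, unless a timer forces an earlier switch. PostgreSQL's archive_timeout setting is one such timer. The back-of-the-envelope sketch below computes that window. The WAL generation rates are made-up inputs, not measurements, and the function name is hypothetical.

```python
from typing import Optional


def worst_case_rpo_seconds(segment_bytes: int, wal_bytes_per_sec: float,
                           switch_timeout_sec: Optional[float] = None) -> float:
    """Upper bound on seconds of committed work at risk when only full segments are archived.

    Ignores the time the archive copy itself takes.
    """
    fill_time = segment_bytes / wal_bytes_per_sec
    if switch_timeout_sec is None:
        return fill_time
    return float(min(fill_time, switch_timeout_sec))


MB = 1024 * 1024
print(worst_case_rpo_seconds(16 * MB, 1 * MB))         # 16.0   (busy system)
print(worst_case_rpo_seconds(16 * MB, 10 * 1024))      # 1638.4 (quiet system)
print(worst_case_rpo_seconds(16 * MB, 10 * 1024, 60))  # 60.0   (forced switch)
```

The quiet-system case shows why segment-only archival can be riskier on lightly loaded databases. A timeout or streaming archival caps the window regardless of write volume.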

Time-to-LSN Correlation: Navigating the Timeline

How Time-LSN Mapping Works

Every transaction commit records a commit timestamp alongside its LSN. This creates a mapping:

LSN         | Commit Timestamp        | Transaction
------------|-------------------------|------------
5/2A000100  | 2024-01-15 15:45:22.103 | T1
5/2A000228  | 2024-01-15 15:45:22.847 | T2
5/2A000390  | 2024-01-15 15:45:23.201 | T3
5/2A000510  | 2024-01-15 15:45:24.592 | T4
...

When you specify a recovery target like "2024-01-15 15:45:23," the recovery process:

Begins replaying from the base backup
Applies each log record, checking commit timestamps
Stops when it encounters the first transaction committed after the target time
That transaction and all following are not applied

The result is the database state as it existed at the target time—all transactions committed before that moment are present, and none committed after.
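The stopping rule can be sketched over the commit records in the table above. This is a simplified model, not a real recovery engine. It assumes commits arrive in LSN order, ignores non-commit log records, and uses a hypothetical function name. It applies every commit at or before the target and stops at the first commit after it.

```python
from datetime import datetime
from typing import NamedTuple


class Commit(NamedTuple):
    lsn: str
    committed_at: datetime
    txn: str


def replay_to_time(commits: list[Commit], target: datetime) -> list[Commit]:
    """Return the commits that recovery applies for a time target."""
    applied = []
    for c in commits:                # commits are in LSN (log) order
        if c.committed_at > target:  # first commit after the target: stop here
            break
        applied.append(c)
    return applied


ts = datetime.fromisoformat
log = [
    Commit("5/2A000100", ts("2024-01-15 15:45:22.103"), "T1"),
    Commit("5/2A000228", ts("2024-01-15 15:45:22.847"), "T2"),
    Commit("5/2A000390", ts("2024-01-15 15:45:23.201"), "T3"),
    Commit("5/2A000510", ts("2024-01-15 15:45:24.592"), "T4"),
]
applied = replay_to_time(log, ts("2024-01-15 15:45:23"))
print([c.txn for c in applied])  # ['T1', 'T2']
print(applied[-1].committed_at)  # 2024-01-15 15:45:22.847000
```

The last applied commit, not the requested time, is the moment the recovered database actually reflects.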

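Commit Time vs. Start Time

Time targets are compared against commit timestamps, not transaction start times. A transaction that began before the target but committed after it is excluded, because at the target time its changes were not yet visible to anyone.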

The Clock Synchronization Challenge

Consider this scenario:

Database server clock runs 5 minutes slow
Incident occurs at 15:45 "real" time
Server records the incident at 15:40 server time
Administrator requests recovery to 15:44 to be "one minute before the incident"
Recovery actually goes to 15:44 server time = 15:49 "real" time
Corrupt transactions are included in recovery!

Mitigation strategies:

NTP Synchronization: All database servers synchronized to reliable time sources
Clock Monitoring: Alerting on clock skew exceeding acceptable thresholds
LSN-Based Recovery: For critical situations, use LSN targets determined by log inspection
Pre-Recovery Investigation: Examine log timestamps near the incident time before committing to a target
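Once skew has been measured, converting an incident time from a trusted clock into the server's clock is simple arithmetic. The log timestamps, and therefore the recovery target, are in server time. The sketch below defines skew as server time minus true time, so it is negative when the server runs slow. It assumes the skew is known from clock monitoring and did not drift during the incident. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def to_server_clock(true_time: datetime, skew: timedelta) -> datetime:
    """Translate a true (NTP-referenced) time into the server's clock.

    skew = server_time - true_time, e.g. timedelta(minutes=-5) for a
    server running 5 minutes slow.
    """
    return true_time + skew


incident_true = datetime(2024, 1, 15, 15, 45)
skew = timedelta(minutes=-5)
target_true = incident_true - timedelta(minutes=1)  # one minute before the incident
print(to_server_clock(target_true, skew))           # 2024-01-15 15:39:00
```

In the scenario above, the correct target is therefore 15:39 server time, not 15:44.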

Transaction Boundary Considerations

PITR recovery targets the boundary between transactions, not arbitrary points within transactions. This is a fundamental constraint, not a limitation:

Why Transaction Boundaries Matter:

Transactions are atomic—partial transactions would violate ACID properties
Durability guarantees apply only to committed transactions
Work from transactions that never committed is never made visible by recovery, even if some of its changes were written to the log

Practical Implications:

Recovery will "snap" to the nearest transaction boundary before the target
Long-running transactions create large "gaps" in recovery precision
Systems with many short transactions have finer-grained recovery options

Example:

Target Time: 15:45:30.000

Transaction Commits:
T1 committed at 15:45:29.847    <- Last transaction BEFORE target
T2 committed at 15:45:30.291    <- First transaction AFTER target

Recovery Result: Database state includes T1, does not include T2
Actual recovery time: 15:45:29.847 (snapped to T1 commit)

Recovery Target Modes

Modern PITR implementations offer multiple ways to specify recovery targets, each suited to different scenarios:

Recovery Target Time

The most intuitive mode—specify a wall-clock timestamp:

# PostgreSQL example (postgresql.conf)
recovery_target_time = '2024-01-15 15:45:30 America/New_York'

Best for: Incidents with known approximate times, compliance requirements
Precision: The last transaction boundary at or before the specified time
Requirements: Accurate server clocks, known timezone

Recovery Target LSN

Specify an exact position in the log sequence:

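recovery_target_lsn = '5/2A7C3400'

Best for: Recovery after log inspection has pinpointed the last good record
Precision: Exact position in the log
Requirements: Log analysis (in PostgreSQL, for example with pg_waldump) to identify the target LSN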

Recovery Target Transaction ID

Specify a transaction ID (XID) as the stopping point:

recovery_target_xid = '12847294'

Best for: Recovering to include/exclude specific known transactions
Precision: Exact transaction boundary
Requirements: Knowledge of specific transaction identifiers

Recovery Target Named Point

Recover to a named restore point created by the application:

-- Creating a restore point in PostgreSQL
SELECT pg_create_restore_point('before_migration_v5.2');

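# Recovering to that point (postgresql.conf)
recovery_target_name = 'before_migration_v5.2'

Best for: Planned procedures such as schema migrations
Precision: Exactly the point where the restore point was created
Requirements: The restore point must be created before the operation that may need to be undone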

Recovery Target Modes Comparison
Mode	Target Specification	Precision	Use Case
Time	Timestamp with timezone	Transaction boundary	Known incident times
LSN	Log sequence number	Exact boundary	Post-analysis recovery
XID	Transaction identifier	Exact transaction	Specific transaction
Named Point	Restore point name	Exact point	Planned procedures

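Recovery Target Inclusive Options

In PostgreSQL, recovery_target_inclusive controls what happens exactly at the target. With the default (on), recovery stops just after the target. A transaction committed exactly at the target time, or the target XID itself, is therefore included. With off, recovery stops just before the target. The setting applies to time, LSN, and XID targets, not to named restore points.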
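PostgreSQL (version 12 and later, where recovery settings live in postgresql.conf and a recovery.signal file requests targeted recovery) accepts at most one recovery_target_* setting and raises an error if several are given. The sketch below builds the settings for one chosen mode and enforces that rule, along with the inclusive option. The helper function is hypothetical; the parameter names are PostgreSQL's.

```python
TARGET_SETTINGS = {
    "time": "recovery_target_time",
    "lsn": "recovery_target_lsn",
    "xid": "recovery_target_xid",
    "name": "recovery_target_name",
}


def recovery_settings(inclusive: bool = True, **target: str) -> str:
    """Render postgresql.conf lines for exactly one recovery target."""
    if len(target) != 1:
        raise ValueError("specify exactly one of: " + ", ".join(TARGET_SETTINGS))
    mode, value = next(iter(target.items()))
    if mode not in TARGET_SETTINGS:
        raise ValueError(f"unknown target mode: {mode}")
    quoted = value.replace("'", "''")  # postgresql.conf escapes a quote by doubling it
    lines = [f"{TARGET_SETTINGS[mode]} = '{quoted}'"]
    if mode != "name":  # recovery_target_inclusive does not apply to restore points
        lines.append(f"recovery_target_inclusive = {'on' if inclusive else 'off'}")
    return "\n".join(lines)


print(recovery_settings(lsn="5/2A7C3400", inclusive=False))
# recovery_target_lsn = '5/2A7C3400'
# recovery_target_inclusive = off
```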

PITR in Context: Related Technologies

PITR exists within a broader ecosystem of database protection technologies. Understanding the relationships and distinctions helps administrators select appropriate tools for different scenarios.

PITR vs. Flashback/Temporal Tables

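Flashback Query (Oracle) and system-versioned Temporal Tables (SQL:2011; implemented in SQL Server, MariaDB, and Db2, among others, and available in PostgreSQL through extensions) provide row-level time travel within the running database: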

Capability: Query historical states without recovery
Granularity: Row-level, table-level
Overhead: Requires undo retention (Flashback) or history tables (temporal tables), both of which cost storage
Use Case: Application-level auditing, data correction

PITR:

Capability: Restore entire database to historical state
Granularity: Whole database
Overhead: Log archival storage
Use Case: Disaster recovery, major incident recovery

Key Distinction: Flashback is for querying history; PITR is for restoring to history.

PITR vs. Logical Replication

Logical Replication streams logical changes (row changes) to replicas:

Purpose: Data distribution, migration, heterogeneous replication
Recovery Capability: Limited—replicas track master, no arbitrary time targeting
Data Preserved: Logical changes only

PITR:

Purpose: Point-in-time restoration capability
Recovery Capability: Arbitrary time targeting within archive retention
Data Preserved: Physical changes (complete database)

Key Distinction: Replication distributes current state; PITR preserves historical states.

PITR vs. Continuous Data Protection (CDP)

CDP (often a storage-layer feature) provides continuous capture of all I/O:

Granularity: Block-level, continuous
Database Awareness: None (treats database as opaque files)
Consistency: May capture inconsistent states
Use Case: Near-zero RPO for entire systems

PITR:

Granularity: Transaction-level
Database Awareness: Full (understands transaction boundaries)
Consistency: Always produces consistent database states
Use Case: Consistent database recovery

Key Distinction: CDP captures physical blocks; PITR ensures logical consistency.

When to Use Each Technology

•PITR: Major incident recovery, disaster recovery, compliance point-in-time snapshots
•Flashback/Temporal: Application data auditing, user error correction, historical queries
•Logical Replication: Data distribution, zero-downtime migrations, reporting replicas
•CDP: Near-zero RPO for entire systems, especially non-database workloads

Summary: The PITR Foundation

We've established the foundational understanding of Point-in-Time Recovery. Let's consolidate the essential concepts:

Key Takeaways

•PITR enables temporal navigation — Unlike discrete backup snapshots, PITR allows recovery to any moment within the archive retention window.
•Three pillars support PITR — Base backups provide the starting state, archived logs provide the change history, and LSN tracking provides temporal correlation.
•Log archival is the bridge — The continuous preservation of transaction logs connects backups to recovery targets, enabling fine-grained recovery.
•Time-to-LSN mapping enables practical use — Converting wall-clock times to log positions makes PITR accessible to operators thinking in terms of 'when' not 'where in the log.'
•Multiple target modes exist — Time, LSN, XID, and named restore points offer different precision and use case tradeoffs.
•PITR is transaction-aware — Recovery always produces consistent, complete transaction states, never partial transactions.
•PITR complements other technologies — It serves a distinct role from flashback queries, replication, and block-level protection.
