Loading learning content...
The batch job that corrupted your database ran from 14:31:47.234 to 14:31:48.891. In that 1.657 seconds, it executed 47 transactions that left your data in an inconsistent state. But the job itself was triggered by a legitimate operation at 14:31:47.119—which you need to preserve.
Your recovery target must be between 14:31:47.119 (include) and 14:31:47.234 (exclude).
That's a window of 115 milliseconds.
This scenario illustrates why target time specification is both an art and a science. It requires understanding not just the technical mechanisms of recovery targeting, but also the investigation techniques to identify the precise moment that separates "good" from "bad" in your database's history.
This page explores the full depth of recovery target specification: the different modes available, the critical considerations around time zones and transaction boundaries, and the investigative techniques that help you identify the optimal recovery point.
By the end of this page, you will understand how to specify recovery targets with precision, navigate timezone complexities, work with transaction boundaries, use alternative target modes (LSN, XID, named restore points), and investigate incidents to determine the optimal recovery moment.
A recovery target defines the point in the database's transaction history where replay should stop. Understanding the fundamental mechanics ensures accurate target specification.
When you specify a recovery target, the database recovery process:
The critical insight: targets are evaluated against transaction commit records, not individual operations.
This means:
Most databases offer control over whether the target transaction is included:
Exclusive (Default in PostgreSQL)
recovery_target_inclusive = false
Inclusive
recovery_target_inclusive = true
Example:
Target: LSN 5/2A000400
WAL Stream:
LSN 5/29FFF800 - Transaction T1 COMMIT
LSN 5/2A000400 - Transaction T2 COMMIT <- Target LSN
LSN 5/2A000800 - Transaction T3 COMMIT
With inclusive=false: Recovered state includes T1, excludes T2, T3
With inclusive=true: Recovered state includes T1, T2, excludes T3
When recovering from a known bad transaction, always use exclusive mode (the default). This ensures the problematic transaction isn't applied. Only use inclusive mode when you specifically need to include a particular transaction's effects.
What should happen when the target is reached?
pause (Recommended for Production)
recovery_target_action = 'pause'
promote
recovery_target_action = 'promote'
shutdown
recovery_target_action = 'shutdown'
Timezone handling in recovery targets is a frequent source of errors. Understanding how timestamps are interpreted and stored is essential for accurate recovery.
Internally, PostgreSQL (and most databases) store commit timestamps in UTC. When a transaction commits, the timestamp recorded is the current UTC time, regardless of the session's timezone setting.
When you specify a recovery target time, you must account for timezone:
Explicit Timezone (Recommended)
recovery_target_time = '2024-01-15 14:30:00 America/New_York'
recovery_target_time = '2024-01-15 14:30:00-05:00'
recovery_target_time = '2024-01-15 19:30:00 UTC'
No Timezone (Dangerous)
recovery_target_time = '2024-01-15 14:30:00'
A recovery target of '2024-01-15 14:30:00' without a timezone will be interpreted using the recovered database's timezone setting—which may have been different before the incident or between environments. Always specify an explicit timezone.
DST transitions create ambiguous or non-existent times:
Non-Existent Times (Spring Forward)
In US Eastern, March 10, 2024 jumps from 1:59 AM to 3:00 AM. The time "2:30 AM" doesn't exist.
-- This time doesn't exist!
recovery_target_time = '2024-03-10 02:30:00 America/New_York'
Behavior varies by database. Most will interpret as 3:30 AM or reject the timestamp.
Ambiguous Times (Fall Back)
In US Eastern, November 3, 2024 repeats 1:00 AM to 1:59 AM twice. The time "1:30 AM" occurs twice.
-- Which 1:30 AM? First or second occurrence?
recovery_target_time = '2024-11-03 01:30:00 America/New_York'
Best Practice: During DST transitions, use UTC or explicit offset:
-- Clear and unambiguous
recovery_target_time = '2024-11-03 05:30:00 UTC'
recovery_target_time = '2024-11-03 01:30:00-04:00' -- Before fall-back
recovery_target_time = '2024-11-03 01:30:00-05:00' -- After fall-back
| Method | Example | Risk Level | Recommendation |
|---|---|---|---|
| Explicit zone name | '2024-01-15 14:30:00 America/New_York' | Low | Preferred for operational use |
| Explicit offset | '2024-01-15 14:30:00-05:00' | Low | Preferred when offset is known |
| UTC | '2024-01-15 19:30:00 UTC' | Low | Preferred for automation |
| Implicit (no zone) | '2024-01-15 14:30:00' | High | Never use in production |
| With 'Z' suffix | '2024-01-15 19:30:00Z' | Low | ISO format, clear UTC indication |
Before executing recovery, verify how your target time will be interpreted:
-- Check database timezone setting (from backup)
-- This is how timestamps without zones will be interpreted
SHOW timezone;
-- Convert your intended target to UTC to verify
SELECT '2024-01-15 14:30:00 America/New_York'::timestamptz;
-- Expected: 2024-01-15 19:30:00+00
-- If you have a UTC time, convert to local for verification
SELECT '2024-01-15 19:30:00 UTC'::timestamptz
AT TIME ZONE 'America/New_York';
-- Expected: 2024-01-15 14:30:00
Recovery targets operate at transaction boundaries—you can never recover to the middle of a transaction. Understanding this constraint is essential for setting realistic expectations.
Transactions are atomic: they either happen completely or not at all. This ACID property extends to PITR:
Example Timeline:
14:30:00.000 - Transaction T1 BEGINS
14:30:00.500 - T1: INSERT row A
14:30:01.000 - T1: UPDATE row B
14:30:01.500 - T1: DELETE row C
14:30:02.000 - Transaction T1 COMMITS <- This is T1's time
Recovery to 14:30:01.000:
- T1 is NOT included (didn't commit until 14:30:02)
- Row A: not inserted
- Row B: not updated
- Row C: not deleted
Recovery to 14:30:02.500:
- T1 IS included (committed at 14:30:02)
- Row A: inserted
- Row B: updated
- Row C: deleted
Long-running transactions create large 'gaps' in recovery precision. If a transaction runs for 10 minutes, you cannot recover to any point within that 10-minute window—you get either the pre-transaction or post-transaction state.
Commit timestamps have limited resolution, typically microseconds. In a high-throughput system, multiple transactions may share the same millisecond.
Practical Implications:
Commit timestamps at 14:30:00.123456:
- T1 COMMIT at 14:30:00.123456 (LSN: 100)
- T2 COMMIT at 14:30:00.123456 (LSN: 101) <- Target transaction
- T3 COMMIT at 14:30:00.123456 (LSN: 102)
Recovery to '14:30:00.123456' with inclusive=true:
- Which transaction gets included?
- Behavior may vary by database
When Precision Matters: Use LSN-based targeting:
-- Know exactly which transactions get included
recovery_target_lsn = '5/2A000800'
Before specifying a recovery target, investigate the transaction landscape around your intended time:
-- PostgreSQL: Find commits near a timestamp (requires pg_stat_statements/logging)
-- From your application logs or audit tables:
SELECT
xact_id,
commit_time,
LEAD(commit_time) OVER (ORDER BY commit_time) AS next_commit,
LEAD(commit_time) OVER (ORDER BY commit_time) - commit_time AS gap_to_next
FROM transaction_audit_log
WHERE commit_time BETWEEN '2024-01-15 14:29:00' AND '2024-01-15 14:32:00'
ORDER BY commit_time;
This reveals:
When you specify a recovery target time, recovery doesn't stop exactly at that time. It stops at the nearest transaction boundary before that time (with inclusive=false):
Target: 14:30:00.500
Transaction Commits:
T1 at 14:29:59.800 <- Last commit BEFORE target
T2 at 14:30:00.600 <- First commit AFTER target
Recovery stops at 14:29:59.800 (T1 commit time)
Actual recovered state: 14:29:59.800, NOT 14:30:00.500
Documenting Actual Recovery Point:
After recovery pauses, always check the actual recovered point:
SELECT pg_last_xact_replay_timestamp();
-- Returns the actual recovery point, which may differ from target
While time-based recovery is intuitive, other target modes offer advantages in specific scenarios.
Specifying a Log Sequence Number provides the highest precision:
recovery_target_lsn = '5/2A3B7C00'
Advantages:
Challenges:
Finding the Target LSN:
-- From WAL inspection tools (pg_waldump)
-- Example output:
rmgr: Transaction len (rec/tot): 34/ 34,
tx: 123456, lsn: 5/2A3B7C00,
prev 5/2A3B7B80, desc: COMMIT 2024-01-15 14:30:00.123456 UTC
-- The lsn value is your target
# PostgreSQL: Dump WAL to find LSNs
pg_waldump /archive/wal/000000010000000500000002A \
--start=5/2A000000 --end=5/2B000000 \
| grep -E "(COMMIT|ABORT)" \
| head -20
Target a specific transaction by its internal identifier:
recovery_target_xid = '12847294'
Advantages:
Challenges:
When to Use:
Proactively create restore points for planned operations:
-- Before a risky operation, create a restore point
SELECT pg_create_restore_point('before_schema_migration_v5.2');
-- Then perform the risky operation
ALTER TABLE customers ADD COLUMN loyalty_tier VARCHAR(20);
-- ... more migration steps ...
-- If something goes wrong, recover to the named point
recovery_target_name = 'before_schema_migration_v5.2'
Advantages:
Best Practices for Restore Points:
SELECT * FROM pg_catalog.pg_replication_slots;
SELECT pg_current_wal_lsn(); -- Note LSN for reference
| Target Mode | Precision | Usability | Best For |
|---|---|---|---|
| Time | Snap to nearest transaction | Very intuitive | Most incident responses |
| LSN | Exact transaction boundary | Requires investigation | Forensic recovery, precise control |
| XID | Exact transaction | Requires special logging | Specific transaction targeting |
| Named Point | Exact named position | Very clear | Planned operations |
Determining the optimal recovery target requires investigation. The goal is to identify the precise moment that separates "good" state from "bad" state.
Step 1: Establish the Symptom Timeline
Gather all available information about when symptoms appeared:
Step 2: Work Backward to Root Cause
Symptoms often appear after the causal event:
Causal Transaction: 14:30:00
→ Database corruption occurs
→ Application reads corrupted data: 14:30:15
→ User reports problem: 14:32:00
→ Support ticket created: 14:45:00
Step 3: Identify the Causal Transaction
Use all available sources:
-- Check application audit tables
SELECT *
FROM audit_log
WHERE table_name = 'affected_table'
AND change_time BETWEEN '2024-01-15 14:25:00' AND '2024-01-15 14:35:00'
ORDER BY change_time;
-- Check for bulk operations
SELECT *
FROM pg_stat_statements
WHERE query LIKE '%affected_table%'
AND rows > 1000; -- Bulk operations
For detailed investigation, analyze the WAL directly:
Using pg_waldump (PostgreSQL):
# Find transactions affecting a specific table
pg_waldump /archive/wal/000000010000000500000002A \
--start=5/2A000000 \
--rmgr=Heap \
| grep -E "(INSERT|UPDATE|DELETE)" \
| grep "rel 16384/16385/12345" # OID of affected table
Correlating Log Entries with LSN:
# Find the exact LSN of suspicious activity
pg_waldump /archive/wal/* \
--start=5/29000000 --end=5/2B000000 \
| grep -B2 -A2 "COMMIT" \
| less
# Example output interpretation:
# rmgr: Heap len (rec/tot): 54/54, tx: 123456, lsn: 5/2A3B7B00
# desc: INSERT off 5
# rmgr: Transaction len: 34, tx: 123456, lsn: 5/2A3B7C00
# desc: COMMIT 2024-01-15 14:30:00.123
#
# Transaction 123456 commits at LSN 5/2A3B7C00
# To exclude this transaction: recovery_target_lsn = '5/2A3B7B00'
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161
#!/usr/bin/env python3"""PITR Target Investigation ToolHelps identify optimal recovery targets by analyzing transaction history""" import subprocessimport refrom datetime import datetime, timedeltafrom dataclasses import dataclassfrom typing import Optional, Listimport argparse @dataclassclass TransactionCommit: """Represents a transaction commit from WAL analysis""" lsn: str xid: int timestamp: datetime table_oids: List[str] operation_count: int @dataclassclass RecoveryWindow: """Represents a valid recovery window""" start_lsn: str end_lsn: str start_time: datetime end_time: datetime transactions_in_window: int @property def duration(self) -> timedelta: return self.end_time - self.start_time class PITRInvestigator: """Investigates transaction history to identify PITR targets""" def __init__(self, archive_path: str, start_time: str, end_time: str): self.archive_path = archive_path self.start_time = datetime.fromisoformat(start_time) self.end_time = datetime.fromisoformat(end_time) self.commits: List[TransactionCommit] = [] def analyze_wal_segment(self, segment_path: str) -> List[TransactionCommit]: """Parse a WAL segment for transaction commits""" result = subprocess.run( ['pg_waldump', segment_path, '--rmgr=Transaction'], capture_output=True, text=True ) commits = [] commit_pattern = re.compile( r'lsn: ([dA-F/]+).*tx:s+(d+).*COMMITs+(d{4}-d{2}-d{2}s+[d:.]+)' ) for line in result.stdout.split('\n'): match = commit_pattern.search(line) if match: commit = TransactionCommit( lsn=match.group(1), xid=int(match.group(2)), timestamp=datetime.fromisoformat(match.group(3)), table_oids=[], operation_count=0 ) commits.append(commit) return commits def find_recovery_windows( self, bad_transaction_start: datetime, min_window_seconds: float = 1.0 ) -> List[RecoveryWindow]: """ Find valid recovery windows before a known bad transaction. Returns windows where recovery could safely stop. """ windows = [] sorted_commits = sorted(self.commits, key=lambda c: c.timestamp) for i, commit in enumerate(sorted_commits): if commit.timestamp >= bad_transaction_start: break # Find the next commit if i + 1 < len(sorted_commits): next_commit = sorted_commits[i + 1] gap = (next_commit.timestamp - commit.timestamp).total_seconds() if gap >= min_window_seconds: window = RecoveryWindow( start_lsn=commit.lsn, end_lsn=next_commit.lsn, start_time=commit.timestamp, end_time=next_commit.timestamp, transactions_in_window=0 ) windows.append(window) return windows def recommend_target( self, bad_transaction_time: datetime, safety_margin_seconds: float = 5.0 ) -> Optional[str]: """ Recommend a recovery target based on analysis. Returns recommended LSN target with safety margin. """ target_time = bad_transaction_time - timedelta(seconds=safety_margin_seconds) # Find last commit before target time sorted_commits = sorted(self.commits, key=lambda c: c.timestamp) candidate = None for commit in sorted_commits: if commit.timestamp <= target_time: candidate = commit else: break if candidate: return candidate.lsn return None def main(): parser = argparse.ArgumentParser( description='PITR Target Investigation Tool' ) parser.add_argument('--archive-path', required=True, help='Path to WAL archive') parser.add_argument('--incident-time', required=True, help='Approximate incident time (ISO format)') parser.add_argument('--search-window-minutes', type=int, default=30, help='Minutes before/after incident to analyze') args = parser.parse_args() # Initialize investigator incident = datetime.fromisoformat(args.incident_time) start = incident - timedelta(minutes=args.search_window_minutes) end = incident + timedelta(minutes=5) investigator = PITRInvestigator( args.archive_path, start.isoformat(), end.isoformat() ) # Analyze and recommend # ... (analysis logic) print(f"\nRecommended recovery target:") print(f" Time-based: {target_time.isoformat()}") print(f" LSN-based: {recommended_lsn}") if __name__ == "__main__": main()Even with thorough investigation, uncertainty remains. Safety margins protect against incomplete information.
Several factors can cause the actual problem to occur slightly earlier than estimated:
| Scenario | Minimum Margin | Recommended Margin |
|---|---|---|
| High-confidence timing from logs | 10 seconds | 30 seconds |
| Estimated from user reports | 2 minutes | 5 minutes |
| Unknown timing, narrow corruption | 5 minutes | 10 minutes |
| Major incident, unclear scope | 15 minutes | 30 minutes |
When uncertain about the target, use an iterative approach:
Round 1: Recover with large safety margin
Round 2 (if needed): Narrow the target
Round 3: Final recovery point selection
# Iterative recovery example
# Round 1: Safe target (5 min before estimated incident)
recovery_target_time = '2024-01-15 14:25:00'
# Result: Clean, but 5 minutes of data lost
# Round 2: Closer target (2 min before)
recovery_target_time = '2024-01-15 14:28:00'
# Result: Still clean, recovered 3 more minutes
# Round 3: Final target (30 sec before)
recovery_target_time = '2024-01-15 14:29:30'
# Result: Clean, optimal recovery
Learning from common mistakes helps avoid them during high-pressure recovery situations.
Scenario: Administrator in New York specifies 2:30 PM thinking it's local time, but server is configured for UTC.
What happens: Recovery goes to 9:30 PM local time instead—hours past the incident.
Prevention:
-- Always verify interpretation
SELECT '2024-01-15 14:30:00'::timestamp AT TIME ZONE current_setting('TimeZone');
-- Always specify timezone
recovery_target_time = '2024-01-15 14:30:00 America/New_York'
Scenario: Administrator wants to recover up to and including a specific transaction, but uses default exclusive mode.
What happens: The target transaction is NOT included.
Prevention:
-- Be explicit about inclusive behavior
recovery_target_inclusive = true -- If you WANT the target transaction
recovery_target_inclusive = false -- If you want to EXCLUDE target (default)
Scenario: Recovery target is earlier than the base backup's checkpoint.
What happens: Recovery cannot proceed—there's no starting point.
Prevention:
# Always check backup's LSN/time before specifying target
cat /backup/backup_label
# START WAL LOCATION: 5/1A000028 (file 000000010000000500000001A)
# START TIME: 2024-01-15 00:00:00 UTC
# Target must be AFTER 2024-01-15 00:00:00 UTC
| Error | Symptom | Prevention |
|---|---|---|
| Timezone mismatch | Recovery at wrong time | Always specify explicit timezone |
| Inclusive confusion | Target transaction included/excluded unexpectedly | Be explicit about inclusive setting |
| Target before backup | Recovery fails to start | Verify target is after backup timestamp |
| Target in future | Recovery completes all WAL (no stop point) | Verify target is in the past |
| Missing WAL gap | Recovery stops at gap, not target | Verify archive continuity to target |
| Wrong target mode | Unexpected recovery point | Choose appropriate mode for scenario |
No matter how confident you are about the target, always use recovery_target_action = 'pause' and verify the recovered state before promoting. This is your last opportunity to detect target specification errors.
Precise target specification is where PITR theory meets practice. Let's consolidate the key concepts:
What's next:
Having mastered target specification, we'll examine PITR Limitations. The final page explores the inherent constraints of Point-in-Time Recovery, including RPO boundaries, storage requirements, performance impacts, and scenarios where PITR alone is insufficient—along with complementary strategies that address these limitations.
You now understand the nuances of recovery target specification. You've learned about timezone handling, transaction boundaries, alternative target modes, investigation techniques, and safety margins. Next, we'll explore the inherent limitations of PITR and how to address them.