Database logs are the unsung heroes of data integrity. Every insert, update, and delete passes through the transaction log before being written to data files. Without properly managed logs, databases cannot guarantee durability, cannot recover from crashes, and cannot be replicated to standby systems.
But logs are also voracious consumers of disk. Left unmanaged, they grow without bound, consuming storage until the database grinds to a halt. A transaction log that fills its disk is one of the most common causes of production database outages—and often the most preventable.
Log management is essential for both reliability and operational stability.
By the end of this page, you will understand transaction log architecture, master log size management and growth strategies, implement log backup and archival procedures, configure log shipping and rotation, and develop production log management policies.
Every modern database maintains a transaction log (also called write-ahead log, redo log, or WAL) that records all modifications before they're applied to data files. This write-ahead logging (WAL) protocol is the foundation of database durability and recovery.
Why write-ahead logging exists: it guarantees durability (a committed change survives a crash because it is already on disk in the log), it enables crash recovery (the log is replayed or rolled back when the database restarts), and it provides the ordered change stream that replication and point-in-time recovery are built on.

How the major engines name and manage these concepts:
| Concept | SQL Server | PostgreSQL | MySQL/InnoDB | Oracle |
|---|---|---|---|---|
| Transaction Log | Transaction Log (.ldf) | Write-Ahead Log (WAL) | Redo Log + Binary Log | Redo Log |
| Log Files | Single virtual log file with VLFs | Segment files (16MB each) | ib_logfile0, ib_logfile1 | Multiple redo log groups |
| Log Backup | Transaction log backup | WAL archiving | Binlog + Redo archive | Archived redo logs |
| Log Truncation | Checkpoint + backup | Checkpoint + WAL archive | Checkpoint | Log switch/archive |
| Log Reuse | VLF status becomes reusable | Old segments removed | Circular reuse after checkpoint | Circular reuse or archive |
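The queries on this page focus on SQL Server, but the same checks exist elsewhere. As a rough, hedged sketch, inspecting log configuration in PostgreSQL and MySQL might look like this (setting names vary by version):

```sql
-- PostgreSQL: inspect WAL configuration and current position
SHOW wal_level;
SELECT name, setting
FROM pg_settings
WHERE name IN ('max_wal_size', 'checkpoint_timeout', 'archive_mode', 'archive_command');
SELECT pg_current_wal_lsn();   -- current WAL write position (PostgreSQL 10+)

-- MySQL/InnoDB: inspect redo log and binary log configuration
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
SHOW BINARY LOGS;              -- list binary log files and their sizes
```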
The log lifecycle: a change is written to the log buffer, hardened to the log file when the transaction commits, later applied to the data files by a checkpoint, and the log space it occupied is then reused or archived for point-in-time recovery.
Log writes are sequential (append-only) and synchronous at commit. Data file writes are random and can be deferred. Logging may seem like extra work, but it actually improves performance: sequential I/O is much faster than random I/O, especially on spinning disks.
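To see this difference on a live SQL Server instance, you can compare cumulative write activity against the data and log files. A minimal sketch using sys.dm_io_virtual_file_stats (counters are cumulative since instance start):

```sql
-- Compare write activity on data vs. log files for the current database
SELECT
    mf.name AS file_name,
    mf.type_desc,                                        -- ROWS (data) or LOG
    vfs.num_of_writes,
    vfs.num_of_bytes_written / 1024 / 1024 AS written_mb,
    vfs.io_stall_write_ms                                -- time spent waiting on writes
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs
JOIN sys.master_files AS mf
    ON vfs.database_id = mf.database_id
   AND vfs.file_id = mf.file_id;
```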
Proper log file sizing balances several concerns: preventing uncontrolled growth, avoiding frequent auto-growth events, ensuring sufficient space for peak transaction volumes, and minimizing wasted disk space.
Common log growth problems include unbounded growth when log backups stop running, frequent small auto-growth events that fragment the log into thousands of VLFs, and logs that cannot be truncated because of long-running transactions or lagging replicas. The queries below diagnose and correct these issues:
```sql
-- SQL Server: Log file size management

-- Check current log file status
SELECT
    DB_NAME(database_id) AS database_name,
    name AS log_file_name,
    type_desc,
    size * 8 / 1024 AS size_mb,
    max_size,
    growth,
    is_percent_growth
FROM sys.master_files
WHERE type_desc = 'LOG'
  AND database_id = DB_ID();

-- Check log space usage
DBCC SQLPERF(LOGSPACE);

-- View Virtual Log Files (VLFs)
DBCC LOGINFO;
-- Too many VLFs (>500) indicates fragmented log growth

-- Check why log cannot be truncated
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = DB_NAME();
-- Common values:
-- NOTHING            - Log can be truncated
-- LOG_BACKUP         - Waiting for log backup
-- ACTIVE_TRANSACTION - Long-running transaction
-- DATABASE_MIRRORING - Mirror not caught up
-- REPLICATION        - Pending replication

-- Resize log file properly
-- Step 1: Backup the log (if full recovery)
BACKUP LOG [OrdersDB] TO DISK = 'D:\Backups\OrdersDB_Log.trn';

-- Step 2: Shrink the log (avoid in production if possible)
DBCC SHRINKFILE (OrdersDB_Log, 1024);  -- Shrink to 1GB

-- Step 3: Grow to proper size with a good increment
ALTER DATABASE [OrdersDB]
MODIFY FILE (NAME = OrdersDB_Log, SIZE = 8GB);

-- Set appropriate auto-growth (avoid small increments)
ALTER DATABASE [OrdersDB]
MODIFY FILE (NAME = OrdersDB_Log, FILEGROWTH = 512MB);

-- NEVER use percent growth for large databases
-- 10% of 100GB = 10GB growth events (too large)
```

When SQL Server grows the log file, it creates Virtual Log Files (VLFs). Many small growths create many small VLFs, slowing log operations. Target < 500 VLFs. If you have thousands, shrink and pre-grow the log in large increments (512MB-1GB) to consolidate.
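On SQL Server 2016 SP2 and later, the sys.dm_db_log_info function returns one row per VLF, which makes counting them easier than eyeballing DBCC LOGINFO output. A quick sketch:

```sql
-- Count VLFs for the current database (SQL Server 2016 SP2+)
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID());

-- VLF counts for every database, worst first
SELECT d.name, v.vlf_count
FROM sys.databases AS d
CROSS APPLY (
    SELECT COUNT(*) AS vlf_count
    FROM sys.dm_db_log_info(d.database_id)
) AS v
ORDER BY v.vlf_count DESC;
```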
Transaction log backups serve two purposes: enabling point-in-time recovery and allowing log truncation. Without regular log backups, the transaction log grows indefinitely.
Log backup frequency considerations:
| Workload Type | Recommended Frequency | RPO | Typical Log Size |
|---|---|---|---|
| Low transaction (reporting) | Every 1-4 hours | 1-4 hours data loss | Small (MB/hour) |
| Medium transaction (typical OLTP) | Every 15-30 minutes | 15-30 min data loss | Moderate (10s MB/hour) |
| High transaction (e-commerce) | Every 5-15 minutes | 5-15 min data loss | Large (100s MB/hour) |
| Critical (financial) | Every 1-5 minutes | 1-5 min data loss | Very large (GB/hour) |
| Near-zero data loss | Continuous (log shipping) | Seconds | Shipped immediately |
```sql
-- SQL Server: Transaction log backup strategies

-- Check recovery model (must be FULL for log backups)
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'OrdersDB';

-- Set full recovery model if needed
ALTER DATABASE [OrdersDB] SET RECOVERY FULL;

-- Basic log backup
BACKUP LOG [OrdersDB]
TO DISK = 'D:\Backups\OrdersDB_Log_20240115_0800.trn'
WITH COMPRESSION, CHECKSUM;

-- Log backup with automatic naming
DECLARE @BackupFile NVARCHAR(500) =
    'D:\Backups\OrdersDB_Log_' + FORMAT(GETDATE(), 'yyyyMMdd_HHmmss') + '.trn';

BACKUP LOG [OrdersDB]
TO DISK = @BackupFile
WITH COMPRESSION, CHECKSUM, STATS = 10;

-- Tail-log backup (before disaster recovery)
-- Captures log generated since last backup
BACKUP LOG [OrdersDB]
TO DISK = 'D:\Backups\OrdersDB_TailLog.trn'
WITH NO_TRUNCATE, CHECKSUM;  -- NO_TRUNCATE for damaged database

-- Copy-only log backup (doesn't break backup chain)
BACKUP LOG [OrdersDB]
TO DISK = 'D:\Backups\OrdersDB_Log_CopyOnly.trn'
WITH COPY_ONLY, COMPRESSION;

-- Log backup to multiple files (striping)
BACKUP LOG [OrdersDB]
TO DISK = 'D:\Backups\Log1.trn',
   DISK = 'E:\Backups\Log2.trn'
WITH COMPRESSION, CHECKSUM;

-- Scheduled log backup job (SQL Agent)
-- Create a job that runs every 15 minutes and
-- calls the backup command with a timestamp-based filename

-- Verify integrity of log backup
RESTORE VERIFYONLY
FROM DISK = 'D:\Backups\OrdersDB_Log_20240115_0800.trn';

-- Check backup history
SELECT
    bs.database_name,
    bs.backup_start_date,
    bs.backup_finish_date,
    bs.type AS backup_type,
    DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) AS duration_sec,
    bs.backup_size / 1024 / 1024 AS size_mb,
    bmf.physical_device_name
FROM msdb.dbo.backupset bs
JOIN msdb.dbo.backupmediaset bms ON bs.media_set_id = bms.media_set_id
JOIN msdb.dbo.backupmediafamily bmf ON bms.media_set_id = bmf.media_set_id
WHERE bs.database_name = 'OrdersDB'
  AND bs.type = 'L'  -- L = Log backup
ORDER BY bs.backup_start_date DESC;
```

Log backups form a chain from the last full backup to the present. Breaking the chain (e.g., switching from FULL to SIMPLE recovery, or missing a log backup file) prevents point-in-time recovery. Always verify backup chains are complete before deleting old backups.
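The scheduled backup job is only sketched in comments above. A minimal SQL Agent job that runs the log backup every 15 minutes might look like the following; the job name, backup path, and schedule are illustrative assumptions, not a prescribed setup:

```sql
USE msdb;
GO

-- Create the job
EXEC dbo.sp_add_job
    @job_name = N'OrdersDB - Log Backup (15 min)';

-- Single T-SQL step: back up the log to a timestamped file
EXEC dbo.sp_add_jobstep
    @job_name  = N'OrdersDB - Log Backup (15 min)',
    @step_name = N'Backup log',
    @subsystem = N'TSQL',
    @database_name = N'master',
    @command = N'
DECLARE @f NVARCHAR(500) =
    ''D:\Backups\OrdersDB_Log_'' + FORMAT(GETDATE(), ''yyyyMMdd_HHmmss'') + ''.trn'';
BACKUP LOG [OrdersDB] TO DISK = @f WITH COMPRESSION, CHECKSUM;';

-- Schedule: every 15 minutes, all day, every day
EXEC dbo.sp_add_jobschedule
    @job_name = N'OrdersDB - Log Backup (15 min)',
    @name = N'Every 15 minutes',
    @freq_type = 4,               -- daily
    @freq_interval = 1,
    @freq_subday_type = 4,        -- unit: minutes
    @freq_subday_interval = 15;

-- Target the local server
EXEC dbo.sp_add_jobserver
    @job_name = N'OrdersDB - Log Backup (15 min)',
    @server_name = N'(local)';
```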
Log shipping is a disaster recovery technique that automatically sends transaction log backups from a primary server to one or more secondary servers. It provides warm standby capability with configurable lag time.
Log shipping vs. synchronous replication: log shipping is asynchronous and applies whole log backups in batches, so the secondary typically lags by minutes and some data loss is possible on failover. Synchronous options (database mirroring, availability groups in synchronous-commit mode) harden every commit on the replica for near-zero data loss, at the cost of commit latency and tighter network requirements.
```sql
-- SQL Server: Log shipping setup

-- On PRIMARY server:
-- 1. Enable log backup job (creates .trn files)
-- 2. Configure backup share accessible to secondary

-- Create backup directory share
-- \\PRIMARY\LogShipping$

-- Backup job (run every 15 minutes)
BACKUP LOG [OrdersDB]
TO DISK = '\\PRIMARY\LogShipping$\OrdersDB_LS.trn'
WITH INIT, COMPRESSION;

-- On SECONDARY server:
-- 1. Restore full backup with NORECOVERY
RESTORE DATABASE [OrdersDB]
FROM DISK = '\\PRIMARY\LogShipping$\OrdersDB_Full.bak'
WITH NORECOVERY,
MOVE 'OrdersDB' TO 'D:\Data\OrdersDB_LS.mdf',
MOVE 'OrdersDB_log' TO 'D:\Logs\OrdersDB_LS.ldf';

-- 2. Create copy job (copies files from primary share)
--    SQL Agent job that runs every 15 minutes

-- 3. Create restore job (restores copied log files)
--    SQL Agent job example:
RESTORE LOG [OrdersDB]
FROM DISK = 'D:\LSCopy\OrdersDB_LS.trn'
WITH NORECOVERY;

-- Monitor log shipping status
SELECT
    primary_server,
    primary_database,
    secondary_server,
    secondary_database,
    last_copied_file,
    last_copied_date,
    last_restored_file,
    last_restored_date,
    DATEDIFF(MINUTE, last_restored_date, GETDATE()) AS lag_minutes
FROM msdb.dbo.log_shipping_monitor_secondary;

-- Failover (when primary fails):
-- 1. Apply remaining logs with RECOVERY
RESTORE LOG [OrdersDB]
FROM DISK = 'D:\LSCopy\OrdersDB_LS.trn'
WITH RECOVERY;
-- Database now online as primary

-- Using GUI: Database Properties > Transaction Log Shipping
-- (Easier setup for most scenarios)
```

Always monitor replication lag. A standby that falls behind may have too much to recover during failover, extending downtime. Set alerts for lag exceeding acceptable thresholds (e.g., > 1 hour for log shipping, > 1 minute for streaming replication).
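One way to act on that advice for log shipping is a small scheduled check that emails when restore lag crosses a threshold. A sketch assuming Database Mail is already configured; the profile name, recipients, and 60-minute threshold are placeholders:

```sql
-- Alert when log shipping restore lag exceeds 60 minutes
DECLARE @lag_minutes INT;

SELECT @lag_minutes = MAX(DATEDIFF(MINUTE, last_restored_date, GETDATE()))
FROM msdb.dbo.log_shipping_monitor_secondary;

IF @lag_minutes > 60
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = N'DBA',                    -- hypothetical mail profile
        @recipients   = N'dba-team@example.com',   -- hypothetical recipients
        @subject      = N'Log shipping lag alert',
        @body         = N'Log shipping restore lag has exceeded 60 minutes.';
END;
```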
Log files—whether transaction logs, error logs, slow query logs, or audit logs—must be managed proactively. Without rotation and cleanup, they consume all available storage.
| Log Type | Purpose | Rotation Strategy | Retention |
|---|---|---|---|
| Transaction/Redo Log | Durability and recovery | Circular reuse after checkpoint/backup | N/A (reused in place) |
| WAL Archive/Binlog | Point-in-time recovery | Time-based cleanup after PITR window | Last full backup + N days |
| Error Log | Troubleshooting | Size or date-based rotation | 30-90 days typically |
| Slow Query Log | Performance tuning | Size-based rotation | 7-30 days |
| Audit Log | Compliance, security | Time-based, compressed archive | Years (compliance requirements) |
| Connection Log | Security, debugging | Size or date-based rotation | 7-30 days |
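For the WAL-archive/binlog row above, cleanup is usually delegated to the engine rather than done with ad-hoc file deletion. Hedged examples for MySQL (8.0 variable names); for PostgreSQL, the pg_archivecleanup command-line utility plays the same role against the archive directory:

```sql
-- MySQL: let the server expire binary logs automatically (7 days shown)
SET GLOBAL binlog_expire_logs_seconds = 604800;

-- ...or purge explicitly, only after confirming no replica still needs them
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);
```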
```sql
-- SQL Server: Log file rotation and cleanup

-- Cycle error log (rotates, keeping 6 previous by default)
EXEC sp_cycle_errorlog;
-- Configure max error logs in SQL Server Configuration Manager

-- Increase number of error log files kept
EXEC xp_instance_regwrite
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLServer\MSSQLServer',
    N'NumErrorLogs',
    REG_DWORD,
    12;

-- Clean up old transaction log backups
-- (Use SQL Agent job or maintenance plan)
DECLARE @RetentionDays INT = 7;
DECLARE @CleanupDate DATETIME = DATEADD(DAY, -@RetentionDays, GETDATE());

-- Using xp_delete_file (recommended)
EXEC xp_delete_file
    0,                -- Type: 0 = backup files
    'D:\Backups\',    -- Folder path
    'trn',            -- Extension
    @CleanupDate,     -- Delete older than this date
    1;                -- Include subfolders

-- Clean up backup history in msdb
DECLARE @OldestDate DATETIME = DATEADD(MONTH, -6, GETDATE());
EXEC msdb.dbo.sp_delete_backuphistory @OldestDate;

-- Clean up job history
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @OldestDate;

-- Clean up mail log
EXEC msdb.dbo.sysmail_delete_mailitems_sp @sent_before = @OldestDate;

-- Maintenance plan approach:
-- Create maintenance plan with:
-- 1. Backup Log task (every 15 min)
-- 2. Cleanup task (delete files older than 7 days)
```

Log-related issues are among the most common causes of database outages. Understanding how to diagnose and resolve these issues quickly is essential for production database administration.
```sql
-- SQL Server: Log troubleshooting queries

-- 1. Why can't the log be truncated?
SELECT name, log_reuse_wait_desc, recovery_model_desc
FROM sys.databases
WHERE database_id = DB_ID();

-- Interpret log_reuse_wait_desc:
-- CHECKPOINT:           Normal, waiting for checkpoint
-- LOG_BACKUP:           Need to take log backup (FULL recovery)
-- ACTIVE_TRANSACTION:   Long-running transaction blocking
-- DATABASE_MIRRORING:   Mirror is behind
-- REPLICATION:          Transaction replication needs the log
-- AVAILABILITY_REPLICA: AG replica behind

-- 2. Find long-running transactions
SELECT
    s.session_id,
    s.login_name,
    s.host_name,
    t.transaction_id,
    t.transaction_begin_time,
    DATEDIFF(MINUTE, t.transaction_begin_time, GETDATE()) AS duration_min,
    t.transaction_state
FROM sys.dm_tran_active_transactions t
JOIN sys.dm_tran_session_transactions st ON t.transaction_id = st.transaction_id
JOIN sys.dm_exec_sessions s ON st.session_id = s.session_id
ORDER BY t.transaction_begin_time ASC;

-- 3. Check log space usage for the current database
SELECT
    DB_NAME(database_id) AS database_name,
    CAST(total_log_size_in_bytes / 1024.0 / 1024 AS DECIMAL(10,2)) AS total_log_mb,
    CAST(used_log_space_in_bytes / 1024.0 / 1024 AS DECIMAL(10,2)) AS used_log_mb,
    CAST(used_log_space_in_percent AS DECIMAL(5,2)) AS used_pct
FROM sys.dm_db_log_space_usage;

-- 4. Emergency: Log full, need immediate space
-- Option A: Take log backup immediately
BACKUP LOG [OrdersDB]
TO DISK = 'D:\Emergency\OrdersDB_Log.trn'
WITH COMPRESSION;

-- Option B: If no backup needed (will break backup chain!)
-- Switch to SIMPLE recovery temporarily
ALTER DATABASE [OrdersDB] SET RECOVERY SIMPLE;
CHECKPOINT;
ALTER DATABASE [OrdersDB] SET RECOVERY FULL;
-- MUST take new full backup to restart log chain!

-- 5. VLF count and potential consolidation
DBCC LOGINFO;
-- Count rows to get VLF count
-- If > 500, consider shrink and regrow
```

Effective log management requires proactive policies, monitoring, and automation. The following best practices help prevent log-related incidents in production.
Set alerts at 80% log space usage. This gives you time to react before reaching 100%. When the alert fires, take an immediate log backup and investigate why usage is higher than normal.
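In SQL Server, that 80% alert can be implemented as an Agent alert on the Percent Log Used performance counter. A sketch; the exact performance-condition string can vary by instance, and the operator name is a hypothetical that must already exist:

```sql
USE msdb;
GO

-- Fire when OrdersDB log usage exceeds 80%
EXEC dbo.sp_add_alert
    @name = N'OrdersDB log > 80 percent full',
    @performance_condition = N'Databases|Percent Log Used|OrdersDB|>|80';

-- Notify an existing operator by email (operator name is hypothetical)
EXEC dbo.sp_add_notification
    @alert_name = N'OrdersDB log > 80 percent full',
    @operator_name = N'DBA Team',
    @notification_method = 1;    -- 1 = email
```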
Log management is often overlooked until something goes wrong. But proper log management is essential for database durability, recoverability, and operational stability. Neglecting logs leads to outages; mastering logs prevents them.
What's next:
Logs keep databases running. But the software itself needs maintenance too. In the next page, we'll explore Patching—how to plan, test, and apply database patches with minimal risk and downtime.
You now understand transaction log architecture, sizing strategies, backup and archival procedures, log shipping, rotation policies, and troubleshooting techniques. Apply these principles to keep your databases reliable and recoverable.