When you execute a SQL query, what actually happens? A deceptively simple SELECT statement triggers a complex orchestration of components—query parsers, optimizers, execution engines, buffer managers, storage systems—all working in precise coordination to deliver results in milliseconds.
Understanding these internal components transforms your relationship with databases. You stop treating the DBMS as a black box and start understanding why certain operations are fast or slow, why specific configurations matter, and how to design systems that leverage DBMS capabilities effectively.
In this page, we'll dissect a DBMS into its constituent parts, examining each component's role and responsibilities.
By the end of this page, you'll understand the major components of a DBMS architecture: the query processor, storage manager, transaction manager, buffer manager, and their interactions. You'll see how these components collaborate to process queries, manage data, and ensure reliability.
A modern DBMS is composed of several interconnected subsystems, each responsible for a specific aspect of database management. While implementations vary across vendors, the fundamental architecture is remarkably consistent across relational systems.
The high-level architecture can be conceptualized as layers, with each layer providing services to the layers above it while depending on the layers below.
The layered architecture promotes separation of concerns. The query processor doesn't need to know how data is physically stored. The storage manager doesn't need to understand SQL syntax. This modularity enables independent evolution and optimization of each component.
The Query Processor is arguably the most sophisticated component of a DBMS. Its job is to transform high-level, declarative queries (like SQL) into efficient, low-level operations that can be executed against the stored data.
This transformation involves multiple stages, each adding value to the query processing pipeline:
The Query Parser
The parser is the first component to touch an incoming query. Its responsibilities include:
1. Lexical Analysis (Tokenization) The query string is broken into tokens: keywords (SELECT, FROM, WHERE), identifiers (table/column names), operators (+, =, <), literals ('John', 42), and punctuation.
2. Syntactic Analysis (Parsing) Tokens are organized into a parse tree according to the SQL grammar. The parser verifies that the query follows valid SQL syntax.
3. Semantic Analysis The parse tree is validated against the database schema: do the referenced tables and columns actually exist, are the data types compatible with the operators applied to them, and does the user have the required privileges?
4. Parse Tree Generation A validated parse tree (or Abstract Syntax Tree) is produced, representing the logical structure of the query.
```sql
-- Original Query
SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.dept_id = d.id
WHERE e.salary > 50000 AND d.location = 'NYC';

-- Parser Output (Conceptual Parse Tree):
/*
SELECT_STATEMENT
├── SELECT_LIST
│   ├── COLUMN_REF: e.name
│   └── COLUMN_REF: d.department_name
├── FROM_CLAUSE
│   ├── TABLE_REF: employees (alias: e)
│   └── JOIN
│       ├── TABLE_REF: departments (alias: d)
│       └── JOIN_CONDITION: e.dept_id = d.id
└── WHERE_CLAUSE
    └── AND
        ├── COMPARISON: e.salary > 50000
        └── COMPARISON: d.location = 'NYC'
*/
```

Understanding query processing helps you write better queries, interpret EXPLAIN output, create effective indexes, and diagnose performance problems. When a query is slow, knowing these components tells you where to look: Is the plan suboptimal? Is there excessive I/O? Are locks causing delays?
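To make stage 1 (tokenization) concrete, here is a toy Python sketch. The token classes and regular expressions are simplified assumptions for illustration, not how any production SQL parser is actually built (real parsers are typically generated from a full grammar):

```python
import re

# A toy SQL tokenizer: more specific patterns come first so they win.
TOKEN_SPEC = [
    ("LITERAL_STR", r"'[^']*'"),                                    # 'NYC', 'John'
    ("LITERAL_NUM", r"\d+(?:\.\d+)?"),                              # 42, 50000
    ("KEYWORD",     r"\b(?:SELECT|FROM|WHERE|JOIN|ON|AND|OR)\b"),
    ("IDENTIFIER",  r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?"),            # employees, e.salary
    ("OPERATOR",    r"[=<>]+|[+\-*/]"),
    ("PUNCT",       r"[(),;]"),
    ("SKIP",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(sql: str):
    """Yield (token_type, text) pairs for a SQL string."""
    for match in MASTER.finditer(sql):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

if __name__ == "__main__":
    for token in tokenize("SELECT e.name FROM employees e WHERE e.salary > 50000;"):
        print(token)
```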
The Storage Manager is responsible for how data is physically organized on disk and how it's efficiently retrieved. This component bridges the gap between the logical view of data (tables, rows, columns) and the physical reality of disk storage (blocks, files, sectors).
Key Challenge: The Disk Bottleneck
Disk I/O is orders of magnitude slower than memory operations: a main-memory access takes on the order of nanoseconds, while a random read takes roughly tens to hundreds of microseconds on an SSD and milliseconds on a spinning disk.
The storage manager's primary goal is to minimize disk I/O through intelligent data organization and access patterns.
Index Structures:
Indexes are the secret to fast data retrieval. Without indexes, finding a specific row in a million-row table requires scanning all million rows. With a proper index, it takes a handful of page reads.
B+ Tree Index (Most Common): Keys are kept sorted in a balanced tree whose leaf nodes are linked together, so both equality lookups and range scans need only a logarithmic number of page reads.
Hash Index: A hash function maps each key to a bucket, giving near constant-time equality lookups, but offering no help for range queries or sorted access.
Indexes accelerate reads but slow down writes. Every INSERT, UPDATE, or DELETE must update all relevant indexes. Over-indexing leads to write amplification. Under-indexing leads to slow queries. Finding the right balance requires understanding both workload patterns and storage manager mechanics.
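For intuition only, here is a toy Python sketch that uses a sorted list as a stand-in for a B+ tree and a dict as a stand-in for a hash index. Real indexes operate on disk pages rather than in-memory lists, so this only illustrates the difference in work, not the actual data structures:

```python
import bisect
import random

# One million fake rows: (id, balance). Ids are unique and already sorted.
rows = [(i, random.randint(0, 10_000)) for i in range(1_000_000)]

def full_scan(rows, target_id):
    """No index: examine every row until the match is found."""
    for row in rows:
        if row[0] == target_id:
            return row
    return None

# "B+ tree stand-in": sorted keys allow binary search and range scans.
sorted_ids = [r[0] for r in rows]
def btree_like_lookup(target_id):
    pos = bisect.bisect_left(sorted_ids, target_id)
    if pos < len(sorted_ids) and sorted_ids[pos] == target_id:
        return rows[pos]
    return None

# "Hash index stand-in": key -> row, fast equality lookups, no range support.
hash_index = {r[0]: r for r in rows}

print(full_scan(rows, 987_654))      # touches nearly a million entries
print(btree_like_lookup(987_654))    # about 20 comparisons (log2 of a million)
print(hash_index.get(987_654))       # a single bucket probe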
The Buffer Manager implements the critical caching layer between query execution and disk storage. It manages the buffer pool—a region of main memory used to cache frequently accessed disk pages.
Given the enormous speed difference between memory and disk, the buffer manager's effectiveness dramatically impacts database performance. A well-tuned buffer pool can reduce disk I/O by 90% or more for typical workloads.
| Policy | How It Works | Strengths | Weaknesses |
|---|---|---|---|
| LRU (Least Recently Used) | Evict the page unused for the longest time | Simple; intuitive; captures temporal locality | Vulnerable to sequential flooding (one scan evicts all useful pages) |
| Clock (Second Chance) | Circular buffer with reference bits; give each page a 'second chance' | O(1) amortized; avoids overhead of true LRU | Approximation of LRU; may evict suboptimally |
| LRU-K | Track last K accesses; evict based on K-th most recent use | Resistant to sequential flooding; captures frequency | Higher overhead; complexity in tracking |
| 2Q (Two Queue) | Separate queues for new and hot pages | Good scan resistance; low overhead | Tuning parameters affect performance |
| ARC (Adaptive Replacement) | Dynamically balance recency and frequency | Self-tuning; excellent hit rates | Patent issues; more complex implementation |
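As a minimal sketch of the simplest policy above (plain LRU, ignoring pinning and dirty pages), eviction can be expressed in a few lines. The example also reproduces the sequential-flooding weakness noted in the table:

```python
from collections import OrderedDict

class LRUBufferPool:
    """Toy page cache: page_id -> page contents, evicting the least recently used."""
    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.frames = OrderedDict()              # maintained in recency order

    def get(self, page_id, read_from_disk):
        if page_id in self.frames:               # cache hit
            self.frames.move_to_end(page_id)     # mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:    # cache miss with a full pool
            self.frames.popitem(last=False)      # evict the LRU page
        page = read_from_disk(page_id)
        self.frames[page_id] = page
        return page

# Sequential flooding: one large scan evicts every previously hot page.
pool = LRUBufferPool(capacity_pages=3)
fake_disk = lambda pid: f"page-{pid}"
for pid in [1, 2, 1, 2, 10, 11, 12, 13]:         # 1 and 2 are "hot"; 10-13 is a scan
    pool.get(pid, fake_disk)
print(list(pool.frames))                          # hot pages 1 and 2 are gone
```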
The Buffer Pool in Action:
Query: SELECT * FROM orders WHERE customer_id = 1001;
1. Query executor requests page containing customer 1001's orders
2. Buffer manager checks: Is this page in the buffer pool?
- If YES (cache hit): Return pointer to page in memory
- If NO (cache miss):
a. Find a free frame OR choose victim page to evict
b. If victim is dirty, write it to disk first
c. Read requested page from disk into frame
d. Return pointer to page
3. Executor pins the page (marks it in use)
4. Executor reads desired tuples from the page
5. Executor unpins the page when done
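The numbered flow above maps almost directly onto code. The following is a simplified, single-threaded Python sketch (no latching, arbitrary victim selection); the class and method names are illustrative, not any particular DBMS's API:

```python
class Frame:
    def __init__(self, page_id, data):
        self.page_id = page_id
        self.data = data
        self.pin_count = 0      # >0 means some executor is currently using the page
        self.dirty = False      # True means the memory copy differs from disk

class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                      # object providing read_page / write_page
        self.frames = {}                      # page_id -> Frame

    def fetch_page(self, page_id):
        frame = self.frames.get(page_id)
        if frame is None:                     # cache miss
            if len(self.frames) >= self.capacity:
                self._evict_one()
            frame = Frame(page_id, self.disk.read_page(page_id))
            self.frames[page_id] = frame
        frame.pin_count += 1                  # pin: page must not be evicted while in use
        return frame

    def unpin_page(self, page_id, made_dirty=False):
        frame = self.frames[page_id]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or made_dirty

    def _evict_one(self):
        # Pick any unpinned victim; a real system would use LRU, Clock, etc.
        for pid, frame in self.frames.items():
            if frame.pin_count == 0:
                if frame.dirty:               # write back before reusing the frame
                    self.disk.write_page(pid, frame.data)
                del self.frames[pid]
                return
        raise RuntimeError("all pages pinned; cannot evict")
```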
Buffer Pool Sizing:
Buffer pool size is one of the most impactful DBMS configuration parameters. Too small a buffer pool means excessive disk I/O. Too large wastes memory and may trigger OS swapping. The sweet spot depends on working set size—the set of pages actively used during normal operations.
A common guideline: allocate 70-80% of available memory to the buffer pool, but this varies based on other system components and workload characteristics.
Most DBMS expose buffer pool statistics: hit ratio, dirty page count, pages read/written. A hit ratio below 90% often indicates an undersized buffer pool or a workload that doesn't fit in memory. Monitoring these metrics is essential for performance tuning.
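As a small worked example of that 90% rule of thumb (counter names differ by system; PostgreSQL, for instance, exposes them as blks_hit and blks_read per database):

```python
def buffer_hit_ratio(reads_from_cache: int, reads_from_disk: int) -> float:
    """Fraction of page requests served from the buffer pool."""
    total = reads_from_cache + reads_from_disk
    return reads_from_cache / total if total else 1.0

ratio = buffer_hit_ratio(reads_from_cache=9_450_000, reads_from_disk=1_050_000)
print(f"hit ratio: {ratio:.1%}")   # 90.0%
if ratio < 0.90:
    print("consider growing the buffer pool or revisiting the workload")
```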
The Transaction Manager is responsible for ensuring that database operations execute correctly despite concurrent access and system failures. It implements the ACID properties that make databases reliable.
Without transaction management, concurrent operations could leave the database in inconsistent states, and system crashes could result in permanent data corruption. The transaction manager prevents these disasters.
Concurrency Control
Multiple transactions accessing the same data simultaneously can lead to well-known anomalies: lost updates, dirty reads, non-repeatable reads, and phantom reads.
Locking Approaches: In the pessimistic approach, transactions take shared locks to read and exclusive locks to write, typically under two-phase locking; conflicting transactions block (or deadlock and abort) rather than interleave incorrectly.
MVCC (Multi-Version Concurrency Control)
Modern databases often use MVCC, which maintains multiple versions of data: readers see a consistent snapshot rather than blocking on writers, and writers create new row versions instead of overwriting data in place.
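A heavily simplified Python sketch of the idea follows. Real systems attach transaction ids and visibility metadata to each row version; the field names here are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: int
    created_by: int              # txn id that wrote this version
    deleted_by: Optional[int]    # txn id that superseded it, or None

def visible(version: RowVersion, snapshot_committed: set) -> bool:
    """A version is visible if its creator is in our snapshot of committed
    transactions and no transaction in that snapshot has superseded it."""
    return (version.created_by in snapshot_committed
            and (version.deleted_by is None
                 or version.deleted_by not in snapshot_committed))

# Version chain for one account row: balance 1000 written by txn 5,
# then overwritten with 800 by txn 9.
chain = [RowVersion(1000, created_by=5, deleted_by=9),
         RowVersion(800, created_by=9, deleted_by=None)]

old_snapshot = {1, 2, 5}        # txn 9 had not committed when this reader started
new_snapshot = {1, 2, 5, 9}

print([v.value for v in chain if visible(v, old_snapshot)])   # [1000]
print([v.value for v in chain if visible(v, new_snapshot)])   # [800]
```

The two snapshots mirror the account example below: a reader that started before the deduction still sees 1000, while a later reader sees 800.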
Isolation Levels: SQL defines four standard levels (Read Uncommitted, Read Committed, Repeatable Read, and Serializable), each permitting fewer anomalies than the last.
Stronger isolation = more correct but slower. Most applications use Read Committed or Repeatable Read as a practical compromise.
```sql
-- Session 1: Begin transaction, read balance
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 100;
-- Returns: 1000

-- Session 2: (Concurrent) Deduct from same account
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 200 WHERE id = 100;
COMMIT;
-- Balance is now 800 on disk

-- Session 1: Read again (within same transaction)
SELECT balance FROM accounts WHERE id = 100;
-- Still returns: 1000 (snapshot isolation)
-- Session 1 sees the database as of its start time

-- Session 1 tries to update
UPDATE accounts SET balance = balance - 100 WHERE id = 100;
-- What happens? Depends on DBMS:
-- PostgreSQL: Sees conflict, aborts transaction
-- Others: May proceed with stale read

COMMIT; -- May fail if conflict detected
```

Lower isolation levels improve concurrency but risk anomalies. Higher levels prevent anomalies but may cause transactions to abort. Understanding your application's consistency requirements is crucial for choosing the right isolation level.
The Recovery Manager ensures database durability and enables recovery from failures. In a world where power outages, crashes, and hardware failures are inevitable, the recovery manager is what makes databases reliable.
Its core responsibility: After any failure, restore the database to a consistent state that reflects exactly those transactions that committed before the crash.
Write-Ahead Logging (WAL):
The foundation of recovery is Write-Ahead Logging, following a simple but crucial rule:
Before any change is written to the database, a log record describing the change must be written to stable storage.
This seemingly simple rule enables powerful recovery guarantees: committed changes can always be redone from the log, and uncommitted changes can always be undone using the before-images the log records.
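A minimal Python sketch of the rule, assuming a single JSON-lines log file and ignoring log sequence numbers, buffering, and group commit:

```python
import json
import os

class WriteAheadLog:
    def __init__(self, path="wal.log"):
        self.log = open(path, "a", encoding="utf-8")

    def append(self, record: dict):
        """Write a log record and force it to stable storage before returning."""
        self.log.write(json.dumps(record) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())          # the crucial step: survive a crash

def update_balance(wal, data_pages, account_id, old_value, new_value, txn_id):
    # 1. Log the change first (before- and after-images enable undo/redo)...
    wal.append({"txn": txn_id, "op": "update", "key": account_id,
                "before": old_value, "after": new_value})
    # 2. ...only then modify the in-memory page; it may reach disk much later.
    data_pages[account_id] = new_value

wal = WriteAheadLog()
pages = {12345: 1000}
update_balance(wal, pages, 12345, old_value=1000, new_value=900, txn_id=42)
wal.append({"txn": 42, "op": "commit"})      # durable once this record is on disk
```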
Log Record Types: Typical records include update records (carrying before- and after-images of the changed data), transaction commit and abort records, and checkpoint records.
Checkpointing:
Without checkpoints, recovery would require processing the entire log from the beginning of time. Checkpoints limit recovery time: periodically, the DBMS flushes dirty pages to disk and writes a checkpoint record marking a known-good point in the log.
Recovery only needs to process log records after the last checkpoint, dramatically reducing recovery time.
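Continuing the toy log format from the WAL sketch above, a minimal recovery pass replays only committed work found after the most recent checkpoint record (this greatly simplifies real recovery algorithms such as ARIES, but shows the bounded-work idea):

```python
import json

def recover(log_lines, data_pages):
    """Redo committed updates found after the last checkpoint record."""
    records = [json.loads(line) for line in log_lines]
    start = 0
    for i, rec in enumerate(records):        # find the most recent checkpoint
        if rec["op"] == "checkpoint":
            start = i + 1
    tail = records[start:]
    committed = {r["txn"] for r in tail if r["op"] == "commit"}
    for rec in tail:
        if rec["op"] == "update" and rec["txn"] in committed:
            data_pages[rec["key"]] = rec["after"]        # redo committed change
        # updates from transactions with no commit record are simply not applied
    return data_pages

log = [
    '{"txn": 7, "op": "update", "key": 1, "before": 50, "after": 60}',
    '{"txn": 7, "op": "commit"}',
    '{"op": "checkpoint"}',                  # dirty pages flushed up to here
    '{"txn": 8, "op": "update", "key": 2, "before": 10, "after": 99}',
    '{"txn": 8, "op": "commit"}',
    '{"txn": 9, "op": "update", "key": 3, "before": 5, "after": 0}',   # never committed
]
print(recover(log, {1: 60, 2: 10, 3: 5}))    # {1: 60, 2: 99, 3: 5}
```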
With proper WAL implementation, a DBMS can guarantee: (1) No committed transaction is ever lost, (2) No uncommitted transaction ever becomes visible, (3) Recovery completes in bounded time proportional to log since last checkpoint.
Understanding individual components is essential, but the real magic happens in their orchestration. Let's trace a complete query through all components to see how they collaborate.
```sql
-- User submits this query:
UPDATE accounts SET balance = balance - 100 WHERE account_id = 12345;
```

1. Parser and semantic analysis: verify that the accounts table and the account_id and balance columns exist in the catalog.
2. Optimizer: sees an index on account_id, chooses an index scan rather than a full table scan, and generates an execution plan.
3. Executor and buffer manager: the executor follows the plan, asking the buffer manager for the pages holding the row with account_id = 12345; pages are served from the buffer pool or read from disk as needed.
4. Transaction and recovery managers: the change is recorded in the write-ahead log before the modified page is marked dirty, so the update survives a crash once the transaction commits.

Notice how each component has a clear responsibility and trusts others to do their jobs. The executor doesn't worry about disk I/O—that's the buffer manager's job. The buffer manager doesn't worry about recovery—the log ensures durability. This separation of concerns is what makes a DBMS both reliable and maintainable.
We've explored the internal machinery of a Database Management System. Let's consolidate the key insights: the query processor turns declarative SQL into efficient execution plans, the storage and buffer managers organize data and minimize disk I/O, and the transaction and recovery managers keep data consistent and durable in the face of concurrency and failures.
What's Next:
With a solid understanding of DBMS components, we'll explore data abstraction levels—how the DBMS presents different views of data to different stakeholders. This concept of multiple perspectives on the same underlying data is fundamental to DBMS design and usage.
You now understand the major components that comprise a DBMS and how they work together to process queries, manage data, and ensure reliability. This architectural knowledge forms the foundation for understanding more advanced DBMS topics.