We've examined each level of the three-level architecture in isolation—external views, conceptual structure, and internal storage. But a database is not a set of static diagrams and definitions; it's a living system processing thousands of queries, updates, and transactions every second.
How do the three levels work together when you run a query? When you insert a row? When a DBA adds an index? When a schema evolves?
This page explores the dynamic interactions between levels—the transformation pipeline that translates user requests into physical operations and the complex choreography that makes database systems work seamlessly.
By the end of this page, you will understand how queries flow through all three levels, how the query processor uses mappings to translate requests, how data modifications propagate, how schema changes affect the architecture, and why understanding these interactions is essential for performance optimization.
When a user submits a SQL query, it undergoes a series of transformations that traverse all three architecture levels. Let's trace a query's journey:
Phase 1: Parsing and Validation (External Level)
The query processor first interacts with the external level:
• Resolve Employee_View in the external schema catalog
• Verify that department is a valid column name

Phase 2: View Expansion (External → Conceptual)
The query is transformed from external to conceptual terms:
Phase 3: Optimization (Conceptual + Internal)
The optimizer uses both conceptual and internal knowledge:
Phase 4: Execution (Internal Level)
The physical plan executes against storage:
Notice how a single query touches all three levels: it starts in external (user's view), crosses to conceptual (logical operations), dips into internal (physical execution), and returns results back up through the levels. The mappings handle each boundary crossing seamlessly.
View expansion is the process of replacing a view reference with its definition (the underlying query). This is how the external-level query becomes a conceptual-level query.
```sql
-- STEP 1: View definition (stored in catalog)
CREATE VIEW HighValueCustomers AS
SELECT c.customer_id, c.name, c.email, SUM(o.amount) AS total_purchases
FROM Customer c
JOIN Orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY c.customer_id, c.name, c.email
HAVING SUM(o.amount) > 10000;

-- STEP 2: User query (external level)
SELECT name, email, total_purchases
FROM HighValueCustomers
WHERE total_purchases > 50000
ORDER BY total_purchases DESC
LIMIT 10;

-- STEP 3: After view expansion (conceptual level)
-- The system replaces "HighValueCustomers" with its definition:
SELECT name, email, total_purchases
FROM (
  SELECT c.customer_id, c.name, c.email, SUM(o.amount) AS total_purchases
  FROM Customer c
  JOIN Orders o ON c.customer_id = o.customer_id
  WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
  GROUP BY c.customer_id, c.name, c.email
  HAVING SUM(o.amount) > 10000
) AS HighValueCustomers
WHERE total_purchases > 50000
ORDER BY total_purchases DESC
LIMIT 10;

-- STEP 4: After logical optimization (still conceptual level)
-- Optimizer merges predicates and simplifies:
SELECT c.name, c.email, SUM(o.amount) AS total_purchases
FROM Customer c
JOIN Orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY c.customer_id, c.name, c.email
HAVING SUM(o.amount) > 50000  -- Merged: 10000 AND 50000 → 50000
ORDER BY total_purchases DESC
LIMIT 10;
```

Views can reference other views, creating chains of expansion:
```sql
CREATE VIEW BaseView AS SELECT ... FROM Table1 ...;
CREATE VIEW MiddleView AS SELECT ... FROM BaseView ...;
CREATE VIEW TopView AS SELECT ... FROM MiddleView ...;
```
When a user queries TopView, the system expands all three levels recursively until only base tables remain. Deep nesting can impact optimization time.
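As a concrete sketch of recursive expansion (the table and view names here are illustrative, not from any earlier example), a two-level chain unfolds like this:

```sql
-- Hypothetical nested views
CREATE VIEW ActiveUsers AS
  SELECT user_id, name FROM Users WHERE active = true;

CREATE VIEW ActiveAdmins AS
  SELECT a.user_id, a.name
  FROM ActiveUsers a
  JOIN Admins ad ON a.user_id = ad.user_id;

-- A query against the top view...
SELECT name FROM ActiveAdmins;

-- ...is expanded recursively until only base tables remain:
SELECT a.name
FROM (SELECT user_id, name FROM Users WHERE active = true) AS a
JOIN Admins ad ON a.user_id = ad.user_id;
```

Each expansion step adds a derived subquery the optimizer must then flatten, which is why deep chains increase planning time.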
Some optimizations become harder after view expansion. For example, a view with DISTINCT followed by a user query with COUNT(*) might not optimize as well as the same logic written directly. Database developers continuously improve the optimizer's ability to 'see through' views.
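A sketch of that pitfall, using a hypothetical Customer table:

```sql
-- View with DISTINCT (hypothetical table)
CREATE VIEW DistinctCities AS
  SELECT DISTINCT city FROM Customer;

-- User query over the view:
SELECT COUNT(*) FROM DistinctCities;

-- After expansion the optimizer sees:
SELECT COUNT(*) FROM (SELECT DISTINCT city FROM Customer) AS DistinctCities;

-- Logically this equals the direct form below, but recognizing the
-- equivalence (and picking the cheaper plan) is up to the optimizer:
SELECT COUNT(DISTINCT city) FROM Customer;
```

Which form executes faster is engine-specific; the point is that the optimizer must prove the equivalence before it can choose.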
Query optimization is where conceptual and internal levels collaborate most intensively. The optimizer's job is to find the most efficient physical execution plan for a logical query.
Logical optimization works at the conceptual level, transforming queries into equivalent but more efficient forms:
```sql
-- Original query
SELECT e.name, d.department_name
FROM Employee e
JOIN Department d ON e.dept_id = d.dept_id
WHERE d.department_name = 'Engineering'
  AND e.salary > 100000;

-- After predicate pushdown (logically equivalent)
SELECT e.name, d.department_name
FROM (SELECT * FROM Employee WHERE salary > 100000) e
JOIN (SELECT * FROM Department WHERE department_name = 'Engineering') d
  ON e.dept_id = d.dept_id;

-- Why this helps: Filtered tables are smaller, join is faster
-- The optimizer does this automatically

-- Another example: Join reordering
-- Suppose: Orders has 10M rows, Products has 1K rows, Categories has 50 rows
SELECT o.*, p.name, c.category_name
FROM Orders o
JOIN Products p ON o.product_id = p.product_id
JOIN Categories c ON p.category_id = c.category_id
WHERE c.category_name = 'Electronics';

-- Optimal join order (smallest first):
-- 1. Filter Categories to 1 row (Electronics)
-- 2. Join with Products (maybe 100 Electronics products)
-- 3. Join with Orders (only orders for those 100 products)
-- This can be 1000x faster than starting with Orders
```

The optimizer's quality depends entirely on accurate statistics. Stale statistics lead to bad plans. Run ANALYZE regularly (or enable autovacuum in PostgreSQL, auto-statistics in other databases) to keep statistics current.
Data modifications follow a multi-level path similar to that of queries, but with additional complexity around constraint checking, index maintenance, and transaction logging.
```
INSERT INTO Orders (customer_id, product_id, quantity, amount)
VALUES (1001, 5042, 3, 299.97);

┌───────────────────────────────────────────────────────────────┐
│ STEP 1: PARSE & VALIDATE (External Level)                     │
│ • Parse SQL syntax                                            │
│ • Resolve "Orders" table                                      │
│ • Check INSERT permission                                     │
│ • Validate column names match                                 │
└────────────────────────┬──────────────────────────────────────┘
                         ▼
┌───────────────────────────────────────────────────────────────┐
│ STEP 2: CONSTRAINT CHECK (Conceptual Level)                   │
│ • Check NOT NULL constraints (all required columns present?)  │
│ • Validate data types (customer_id is INTEGER?)               │
│ • Check domain constraints (quantity > 0?)                    │
│ • Verify foreign keys:                                        │
│   - Does customer_id=1001 exist in Customers table?           │
│   - Does product_id=5042 exist in Products table?             │
│ • Check unique constraints (if order_id auto-generated)       │
│ • Execute BEFORE triggers                                     │
└────────────────────────┬──────────────────────────────────────┘
                         ▼
┌───────────────────────────────────────────────────────────────┐
│ STEP 3: PHYSICAL INSERT (Internal Level)                      │
│ • Find target page with free space                            │
│ • Format row in internal representation                       │
│ • Write to buffer pool (mark page dirty)                      │
│ • Write to Write-Ahead Log (WAL) for durability               │
│ • Update all indexes:                                         │
│   - idx_orders_customer: add (1001, new_row_location)         │
│   - idx_orders_product: add (5042, new_row_location)          │
│   - idx_orders_date: add (current_date, new_row_location)     │
│ • Update table statistics (if incremental)                    │
└────────────────────────┬──────────────────────────────────────┘
                         ▼
┌───────────────────────────────────────────────────────────────┐
│ STEP 4: POST-PROCESSING                                       │
│ • Execute AFTER triggers                                      │
│ • If view insert (via INSTEAD OF trigger): run trigger logic  │
│ • Update materialized views if affected                       │
│ • Return success/generated values                             │
└───────────────────────────────────────────────────────────────┘
```

UPDATE involves:
• Locating the target rows (index lookup or scan)
• Re-checking constraints on the modified columns
• Logging old and new values (undo log + WAL)
• Writing the new row version (or modifying the row in place)
• Updating only the indexes whose key columns changed
DELETE involves:
• Locating the target rows
• Enforcing foreign keys that reference them (RESTRICT, CASCADE, SET NULL)
• Marking row versions dead (MVCC) rather than erasing them immediately
• Logging the deletion to the WAL
• Removing index entries, often deferred to vacuum/cleanup
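The referential actions checked during DELETE live at the conceptual level. A minimal sketch (simplified, hypothetical schema) of how the FK definition decides what a single DELETE triggers:

```sql
-- The FK's ON DELETE action is part of the conceptual schema
CREATE TABLE Customer (
  customer_id INTEGER PRIMARY KEY,
  name        TEXT NOT NULL
);

CREATE TABLE Orders (
  order_id    SERIAL PRIMARY KEY,
  customer_id INTEGER NOT NULL
              REFERENCES Customer(customer_id) ON DELETE CASCADE
);

-- With ON DELETE CASCADE, this one statement also deletes the
-- customer's orders (and maintains every index on both tables):
DELETE FROM Customer WHERE customer_id = 1001;

-- With ON DELETE RESTRICT (or the default NO ACTION), the same
-- DELETE would instead fail while matching orders exist.
```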
Every index must be updated for every INSERT/UPDATE/DELETE. A table with 10 indexes means 11 writes (1 table + 10 indexes) for each insert. This is why write-heavy workloads require careful index management—every index improves reads but slows writes.
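A practical follow-up: in PostgreSQL you can look for indexes that are rarely read but still maintained on every write. This sketch uses the standard pg_stat catalog views; the scan-count threshold is illustrative, not a rule.

```sql
-- Indexes that are seldom used for reads but add cost to every write
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan < 50                 -- illustrative threshold
ORDER BY pg_relation_size(indexrelid) DESC;
```

Large, never-scanned indexes are candidates for dropping—after confirming they aren't enforcing a constraint or serving rare but critical queries.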
Schema modifications (DDL: CREATE, ALTER, DROP) impact the architecture differently than data modifications (DML). DDL changes the structure itself, potentially affecting all three levels.
| Operation | External Level | Conceptual Level | Internal Level |
|---|---|---|---|
| CREATE TABLE | — | New table in catalog | Allocate storage files |
| ALTER TABLE ADD COLUMN | Views may need update | Schema updated | Existing rows extended/rewritten |
| CREATE INDEX | — | Index metadata | Build index structure on disk |
| CREATE VIEW | New view available | View definition stored | — |
| DROP TABLE | Dependent views invalidated | Table removed | Files deallocated |
| RENAME TABLE | View definitions may break | Catalog updated | — |
```sql
-- Adding a column: What happens at each level

-- Operation:
ALTER TABLE Customer ADD COLUMN loyalty_tier VARCHAR(20) DEFAULT 'bronze';

-- CONCEPTUAL LEVEL IMPACT:
-- • pg_attribute gets a new row for the column
-- • pg_class.relnatts incremented
-- • Default value stored in pg_attrdef

-- INTERNAL LEVEL IMPACT (depends on implementation):
-- Option A: Table rewrite (older approach)
--   Every row physically rewritten with the new column
--   Very expensive for large tables
--
-- Option B: Fast add (modern approach, e.g., PostgreSQL 11+)
--   New column with a non-volatile default: metadata-only change
--   Existing rows implicitly have the default value
--   New rows written with the column included
--   Much faster, no table rewrite

-- EXTERNAL LEVEL IMPACT:
-- Views selecting * will now include loyalty_tier
-- Views selecting specific columns unaffected
-- Views depending on column order may need review

-- Check which views depend on a table (view-on-table dependencies
-- are recorded via the view's rewrite rule in pg_rewrite):
SELECT DISTINCT r.ev_class::regclass AS view_name
FROM pg_depend d
JOIN pg_rewrite r ON d.classid = 'pg_rewrite'::regclass
                 AND d.objid = r.oid
WHERE d.refclassid = 'pg_class'::regclass
  AND d.refobjid = 'customer'::regclass
  AND r.ev_class <> 'customer'::regclass;
```

Modern databases increasingly support 'online DDL'—schema changes that don't block ongoing operations. PostgreSQL's concurrent index creation, MySQL's online ALTER TABLE, and Oracle's online redefinition all aim to minimize the disruption of DDL across levels.
Transactions—the ACID guarantees—must be coordinated across all three levels. Let's trace how a transaction flows through the architecture:
```sql
-- A transfer transaction
BEGIN TRANSACTION;
UPDATE Accounts SET balance = balance - 1000 WHERE account_id = 'A';
UPDATE Accounts SET balance = balance + 1000 WHERE account_id = 'B';
COMMIT;

/* LEVEL-BY-LEVEL FLOW:

═══════════════════════════════════════════════════════════════════
EXTERNAL LEVEL:
• Transaction start recorded in session state
• Application sees isolated view of data
• Uncommitted changes visible only to this session
═══════════════════════════════════════════════════════════════════

═══════════════════════════════════════════════════════════════════
CONCEPTUAL LEVEL:
• Logical locking: Rows for 'A' and 'B' logically locked
• Constraint checking: balance >= 0 checked after each UPDATE
• Referential integrity: No FK issues here
• Trigger execution: Any AFTER UPDATE triggers queued
═══════════════════════════════════════════════════════════════════

═══════════════════════════════════════════════════════════════════
INTERNAL LEVEL:

FOR UPDATE #1 (Account A):
• Acquire row-level lock on page containing A's row
• Write old balance value to undo log
• Write "UPDATE account A: balance -= 1000" to WAL (redo log)
• Modify buffer page (dirty page, not yet on disk)

FOR UPDATE #2 (Account B):
• Acquire row-level lock on page containing B's row
• Write old balance to undo log
• Write "UPDATE account B: balance += 1000" to WAL
• Modify buffer page

FOR COMMIT:
• Write COMMIT record to WAL
• Force WAL to disk (durability guarantee)
• Release locks
• Dirty pages written to disk later (checkpoint)
═══════════════════════════════════════════════════════════════════
*/
```

| Property | Primary Level | Mechanism |
|---|---|---|
| Atomicity | Internal | Undo log allows rollback |
| Consistency | Conceptual | Constraint checking before commit |
| Isolation | All levels | Locking (internal), MVCC (conceptual+internal) |
| Durability | Internal | WAL (write-ahead log) to disk |
Multi-Version Concurrency Control (MVCC) allows readers and writers to proceed without blocking each other. Each transaction sees a consistent snapshot of the database—an 'external view' frozen at a point in time. This isolation creates a virtual external schema per transaction.
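A sketch of how this plays out in PostgreSQL, using the Accounts table from the transfer example above (two concurrent sessions; under REPEATABLE READ the snapshot is frozen at the transaction's first query):

```sql
-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM Accounts WHERE account_id = 'A';  -- snapshot taken here

-- Session 2, concurrently
BEGIN;
UPDATE Accounts SET balance = balance - 1000 WHERE account_id = 'A';
COMMIT;

-- Session 1, again: still sees the pre-update balance;
-- its snapshot is a frozen "external view" of the database
SELECT balance FROM Accounts WHERE account_id = 'A';
COMMIT;

-- A new transaction in Session 1 now sees Session 2's update
SELECT balance FROM Accounts WHERE account_id = 'A';
```

Neither session blocked the other's read: the writer created a new row version (internal level) while the reader kept consulting the old one chosen by its snapshot.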
Understanding level interactions reveals where performance bottlenecks occur and how to address them.
```sql
-- PostgreSQL: Identify level where time is spent

-- Is view expansion causing problems?
EXPLAIN (ANALYZE, VERBOSE, FORMAT TEXT)
SELECT * FROM complex_nested_view WHERE id = 1;
-- Look at planning time vs execution time

-- Are statistics stale? (Conceptual level issue)
SELECT schemaname, relname, last_analyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE last_analyze < NOW() - INTERVAL '7 days'
   OR n_dead_tup > n_live_tup * 0.1;

-- Missing indexes? (Internal level issue)
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan        -- More seq scans than index scans
  AND seq_tup_read > 10000       -- Reading significant data
ORDER BY seq_tup_read DESC;

-- Buffer cache efficiency? (Internal level)
SELECT heap_blks_read, heap_blks_hit,
       round(heap_blks_hit::numeric /
             nullif(heap_blks_hit + heap_blks_read, 0) * 100, 2) AS hit_ratio
FROM pg_statio_user_tables
WHERE relname = 'your_table';
-- Goal: > 99% hit ratio for OLTP

-- Lock contention? (Internal level)
SELECT locktype, relation::regclass, mode, granted, count(*)
FROM pg_locks
WHERE NOT granted
GROUP BY locktype, relation, mode, granted;
```

When diagnosing performance issues, work through the levels: First, is the query correct at the external level? Then, does the optimizer make good choices at the conceptual level? Finally, is physical execution efficient at the internal level? Each level requires different diagnostic tools.
We've explored how the three architectural levels work together dynamically—during query processing, data modification, schema evolution, and transactions.
Module Complete:
You've now mastered the Three-Level Architecture—the foundational framework for understanding database systems. You understand the external level (user views), conceptual level (logical structure), internal level (physical storage), the formal ANSI-SPARC framework, and how these levels interact dynamically during database operations.
Congratulations! You've completed the Three-Level Architecture module. You can now explain the purpose and components of each level, how they interact during queries and updates, why data independence matters, and how to diagnose performance issues across levels. This understanding forms the foundation for all advanced database topics.