Three Level Architecture - Learning Module

Internal Level (Physical)

Where Data Lives: The Physical Reality

We've explored the external level (what users see) and the conceptual level (the logical structure). But data doesn't live in abstractions—it lives on physical storage devices: spinning hard drives, solid-state drives, network-attached storage systems, and cloud storage volumes.

The internal level is where abstraction meets reality. It defines how the logical structures of the conceptual schema are actually stored, organized, and accessed on physical media. This is the domain of storage engines, file systems, buffer managers, and indexing algorithms.

Why does this matter? Because the difference between a query taking 10 milliseconds and 10 seconds often comes down to physical storage decisions. The conceptual schema might define a table with 10 million rows—but how those rows are stored, indexed, and accessed determines whether the database is usable or unbearably slow.

What You Will Learn

By the end of this page, you will understand the internal level's responsibility for physical data storage, including file organization methods, indexing structures, storage allocation, and buffer management. You'll see how physical design decisions impact performance and why the three-level architecture keeps these details separate from logical design.

Understanding the Internal Level

The internal level (also called the physical level or storage level) describes how data is actually stored on storage devices. It is the lowest level of the three-level architecture, hidden from users and even from application programmers.

Definition and Scope

The internal level contains the internal schema (or storage schema)—a complete specification of how the conceptual schema is mapped to physical storage. This includes:

File structures — How data files are organized on disk
Record formats — How individual rows are physically laid out
Index structures — How to quickly locate specific data
Compression — How data is compacted for storage efficiency
Encryption — How data is protected at rest
Allocation — How storage space is managed

Formal Definition:

The internal schema is a specification of data storage structures, access paths, and file organizations used to store the database on physical storage devices.

Key Characteristics of the Internal Level
Characteristic	Description	Impact
Physical Storage Focus	Deals with bytes on disk, not logical concepts	Performance is primary concern, not semantics
Invisible to Users	Completely hidden from external and most conceptual operations	Users never see file structures or index details
Performance Critical	Physical decisions determine query speed	Wrong choices can cause 1000x performance difference
Hardware Dependent	Must consider device characteristics (HDD vs SSD)	Optimal strategies differ by storage type
Tunable	Can be modified without changing conceptual schema	Enables performance optimization without application changes

The Warehouse Analogy

Think of the internal level as warehouse management. The conceptual level defines what products exist (tables). The internal level decides: Which warehouse building stores which products? How are shelves organized? Where are the frequently accessed items placed? How do workers (queries) find what they need quickly? Good warehouse organization makes operations efficient; bad organization means workers spend hours searching for items.

Storage Media and the Memory Hierarchy

To understand internal-level design, we must understand the storage devices the database uses and their characteristics. Modern systems use a memory hierarchy with dramatically different performance at each level.

Memory Hierarchy in Database Systems
Level	Technology	Typical Size	Access Time	$ per GB	Volatility
CPU Registers	SRAM in CPU	~1 KB	< 1 ns	N/A	Volatile
L1 Cache	SRAM on chip	32-64 KB	~1 ns	~$10,000	Volatile
L2 Cache	SRAM on chip	256 KB-1 MB	~4 ns	~$1,000	Volatile
L3 Cache	SRAM on/near chip	4-50 MB	~15 ns	~$100	Volatile
Main Memory (RAM)	DRAM	16 GB - 6 TB	~100 ns	~$5	Volatile
SSD (NVMe)	NAND Flash	256 GB - 30 TB	~100 μs	~$0.10	Non-volatile
SSD (SATA)	NAND Flash	256 GB - 8 TB	~500 μs	~$0.08	Non-volatile
HDD (Spinning)	Magnetic disk	1 TB - 20 TB	~10 ms	~$0.02	Non-volatile
Tape/Archive	Magnetic tape	Petabytes	seconds-minutes	~$0.004	Non-volatile

The Critical Implication

The access time difference between RAM (~100 nanoseconds) and HDD (~10 milliseconds) is a factor of 100,000. This means:

An operation taking 1 second in RAM would take 27+ hours if done the same way on disk
Database systems must be designed to minimize disk I/O above all else
Caching, buffering, and intelligent data placement are essential

Disk I/O is the bottleneck. The entire architecture of database storage is designed to minimize the number of disk reads and writes required for any operation.
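
The arithmetic behind these figures is easy to check. The short Python sketch below uses the approximate access times from the table above (order-of-magnitude figures, not measurements) and converts the slowdown into hours. The variable names are ours, chosen for illustration.

```python
# Approximate access times from the memory hierarchy table (orders of magnitude only).
RAM_ACCESS_S = 100e-9   # ~100 ns
HDD_ACCESS_S = 10e-3    # ~10 ms

slowdown = HDD_ACCESS_S / RAM_ACCESS_S

ram_workload_s = 1.0    # a workload that takes 1 second of RAM accesses
disk_workload_h = ram_workload_s * slowdown / 3600

print(f"HDD is about {slowdown:,.0f}x slower per access")
print(f"1 s of RAM accesses is about {disk_workload_h:.1f} h of HDD accesses")
```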

Sequential vs Random Access

On HDDs, sequential reads (reading consecutive bytes) are 100-200x faster than random reads (jumping to different locations). On SSDs, the gap is smaller but still significant (3-10x). This is why the internal level carefully organizes data to maximize sequential access and minimize random seeks.
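
A rough cost model shows where the HDD gap comes from. The sketch below assumes a 10 ms repositioning cost (seek plus rotation) and 150 MB/s of sequential throughput; both numbers are illustrative assumptions, not figures for any particular drive.

```python
PAGE_BYTES = 8 * 1024          # 8 KB page
SEEK_PLUS_ROTATION_S = 10e-3   # assumed cost of repositioning the head
TRANSFER_BYTES_PER_S = 150e6   # assumed sustained sequential throughput


def random_read_time(pages: int) -> float:
    """Every page pays the repositioning cost plus its own transfer time."""
    return pages * (SEEK_PLUS_ROTATION_S + PAGE_BYTES / TRANSFER_BYTES_PER_S)


def sequential_read_time(pages: int) -> float:
    """One repositioning, then the pages stream at the transfer rate."""
    return SEEK_PLUS_ROTATION_S + pages * PAGE_BYTES / TRANSFER_BYTES_PER_S


pages = 100_000
ratio = random_read_time(pages) / sequential_read_time(pages)
print(f"random / sequential = {ratio:.0f}x")
```

Under these assumptions the ratio lands inside the 100-200x range quoted above. On an SSD the repositioning term nearly vanishes, which is why the gap shrinks.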

File Organization Methods

The internal level must decide how records (rows) are physically arranged in files. This file organization profoundly affects query performance.

Heap organization stores records in no particular order—new records are placed wherever there's space.

Characteristics

Insert: O(1) — Append to end of file
Search (by any attribute): O(n) — Must scan entire file
Delete: O(n) search + O(1) mark as deleted
Update: O(n) search + O(1) in place (if size unchanged)

When to Use

Bulk loading large datasets
Tables accessed primarily by full table scans
Staging tables for ETL processes
Log tables where new records are appended

heap_organization.txt

Text

Heap File Structure:
┌─────────────────────────────────────────────────────────┐
│ Page 1                                                  │
│ ┌───────────┬───────────┬───────────┬───────────┐      │
│ │ Record 1  │ Record 2  │ Record 3  │ Record 4  │      │
│ │ ID=7      │ ID=3      │ ID=12     │ ID=1      │      │
│ └───────────┴───────────┴───────────┴───────────┘      │
├─────────────────────────────────────────────────────────┤
│ Page 2                                                  │
│ ┌───────────┬───────────┬───────────┬───────────┐      │
│ │ Record 5  │ Record 6  │ Record 7  │ (Empty)   │      │
│ │ ID=9      │ ID=2      │ ID=5      │           │      │
│ └───────────┴───────────┴───────────┴───────────┘      │
└─────────────────────────────────────────────────────────┘
 
Notice: Records are NOT sorted by ID.
To find ID=5, we must scan from Record 1 through Record 7.
 
Query: SELECT * FROM table WHERE id = 5
Plan:  Sequential Scan, filter (id = 5)
Pages Read: Potentially ALL pages in the table
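
The heap behaviour in the diagram can be sketched in a few lines of Python. This is a toy model, not how any real storage engine is written: `HeapFile`, its four-records-per-page capacity, and the method names are assumptions made for illustration.

```python
class HeapFile:
    """Toy heap file: fixed-capacity pages, records appended in arrival order."""

    def __init__(self, records_per_page=4):
        self.records_per_page = records_per_page
        self.pages = [[]]

    def insert(self, record):
        # O(1): append to the last page, opening a new page when it is full.
        if len(self.pages[-1]) == self.records_per_page:
            self.pages.append([])
        self.pages[-1].append(record)

    def find_by_id(self, wanted_id):
        # O(n): there is no ordering to exploit, so scan page by page.
        pages_read = 0
        for page in self.pages:
            pages_read += 1
            for record in page:
                if record["id"] == wanted_id:
                    return record, pages_read
        return None, pages_read


heap = HeapFile()
for record_id in [7, 3, 12, 1, 9, 2, 5]:   # same arrival order as the diagram
    heap.insert({"id": record_id})

record, pages_read = heap.find_by_id(5)
missing, pages_read_for_miss = heap.find_by_id(42)
print(record, pages_read, missing, pages_read_for_miss)
```

A lookup for a missing ID is the worst case: every page is read before the search can report failure.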

Clustered vs Unclustered

A table can only be physically sorted by ONE attribute (the clustering attribute). All other attributes cannot benefit from the physical order. This is why indexes are essential—they provide alternative "sorted views" of data without physically reorganizing it.
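
To make the distinction concrete, here is a minimal sketch in which the rows are stored sorted by `id` (the clustering attribute) and a separate structure provides a sorted view by `email`. The sample rows and function names are hypothetical.

```python
import bisect

# Rows stored physically sorted by id (the clustering attribute).
rows = [
    {"id": 1, "email": "dana@example.com"},
    {"id": 2, "email": "alex@example.com"},
    {"id": 3, "email": "cory@example.com"},
    {"id": 5, "email": "blake@example.com"},
]
ids = [row["id"] for row in rows]


def find_by_id(wanted_id):
    # Binary search works because the rows themselves are ordered by id.
    pos = bisect.bisect_left(ids, wanted_id)
    if pos < len(rows) and rows[pos]["id"] == wanted_id:
        return rows[pos]
    return None


# Secondary index: (email, row position) pairs kept sorted by email.
email_index = sorted((row["email"], pos) for pos, row in enumerate(rows))
email_keys = [email for email, _ in email_index]


def find_by_email(wanted_email):
    # Binary search in the index, then one hop to the row it points at.
    i = bisect.bisect_left(email_keys, wanted_email)
    if i < len(email_index) and email_index[i][0] == wanted_email:
        return rows[email_index[i][1]]
    return None
```

The rows are never reordered for the email lookup; only the small index is sorted, and each entry points back at the row's position.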

Indexing Structures

Indexes are the most important performance tool in the internal level. An index is a separate data structure that enables fast lookup without scanning entire tables.

The Phone Book Analogy

Think of a phone book. Its entries are sorted by last name, so you can jump close to the right page instead of reading every entry from the start. A database index works the same way: it keeps the indexed values in order, each with a pointer to the row it came from, so the database can locate a row without scanning the whole table.

Types of Indexes

Common Index Types
Index Type	Structure	Best For	Not Good For
B-tree/B+ tree	Balanced tree with sorted keys	Range queries, equality, ORDER BY	Pattern matching, full-text
Hash Index	Hash table	Equality lookups only	Range queries, sorting
Bitmap Index	Bit vectors per value	Low-cardinality columns, AND/OR	High-cardinality, frequent updates
GiST	Generalized search tree	Geometric, full-text, complex types	Simple equality lookups
GIN	Generalized inverted index	Full-text search, arrays, JSONB	Simple scalar values
BRIN	Block range index	Very large tables with natural order	Random distribution, point queries

B+ Tree: The Workhorse

The B+ tree is the dominant index structure in relational databases. It provides:

O(log n) search — Even with billions of rows
Efficient range queries — Leaf nodes are linked sequentially
Self-balancing — Stays efficient as data grows
High fanout — Each node contains many keys, reducing tree height
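
High fanout is what keeps the tree shallow. The sketch below estimates the number of levels under a deliberately simple assumption: every node, including the leaves, holds the same number of entries. The fanout of 500 is an illustrative figure, not a property of any particular database.

```python
def btree_levels(row_count: int, fanout: int) -> int:
    """Smallest number of levels whose capacity covers row_count keys."""
    levels, capacity = 1, fanout
    while capacity < row_count:
        capacity *= fanout
        levels += 1
    return levels


for row_count in (1_000_000, 1_000_000_000):
    print(row_count, btree_levels(row_count, fanout=500))
```

Going from a million to a billion rows adds only one level in this model, which is the practical meaning of O(log n) with a large base.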

btree_structure.txt

Text

B+ Tree Index Structure (order=3):
 
                        ┌─────────────┐
                        │   [30,60]   │              (Root Node)
                        └──────┬──────┘
               ┌───────────────┼───────────────┐
               ▼               ▼               ▼
        ┌───────────┐   ┌───────────┐   ┌───────────┐
        │ [10,20]   │   │ [40,50]   │   │ [70,80]   │  (Internal Nodes)
        └─────┬─────┘   └─────┬─────┘   └─────┬─────┘
         ┌────┼────┐     ┌────┼────┐     ┌────┼────┐
         ▼    ▼    ▼     ▼    ▼    ▼     ▼    ▼    ▼
       ┌───┐┌───┐┌───┐ ┌───┐┌───┐┌───┐ ┌───┐┌───┐┌───┐
       │5,8││15 ││25,│ │35,││45 ││55,│ │65,││75 ││85,│ (Leaf Nodes)
       │   ││18 ││28 │ │38 ││48 ││58 │ │68 ││78 ││90 │
       └─┬─┘└─┬─┘└─┬─┘ └─┬─┘└─┬─┘└─┬─┘ └─┬─┘└─┬─┘└─┬─┘
         │    │    │     │    │    │     │    │    │
         ▼    ▼    ▼     ▼    ▼    ▼     ▼    ▼    ▼
      [Row][Row][Row] [Row][Row][Row] [Row][Row][Row]  (Data Pages/Row Pointers)
 
Key Properties:
1. All data pointers are in leaf nodes only
2. Leaf nodes are linked for sequential access (→→→)
3. Internal nodes only guide the search
4. Tree is always balanced (all leaves at same level)
 
Search for key=45:
  Root: 45 ≥ 30 and 45 < 60, go middle
  Internal: 45 ≥ 40 and 45 < 50, go middle  
  Leaf: Found! Return row pointer
 
Disk reads: 3 (one per level). The count equals the tree height, which grows only logarithmically with table size.
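
The search walk-through above can be traced in code. This sketch rebuilds the tree from the diagram as nested dictionaries and counts one visited node per level. The helper names (`leaf`, `node`, `search`) are ours, and the sketch covers lookup only, not insertion or node splitting.

```python
import bisect


def leaf(*keys):
    return {"keys": list(keys), "children": None}


def node(keys, children):
    return {"keys": keys, "children": children}


# The tree from the diagram: root -> internal nodes -> leaves.
tree = node([30, 60], [
    node([10, 20], [leaf(5, 8), leaf(15, 18), leaf(25, 28)]),
    node([40, 50], [leaf(35, 38), leaf(45, 48), leaf(55, 58)]),
    node([70, 80], [leaf(65, 68), leaf(75, 78), leaf(85, 90)]),
])


def search(root, key):
    """Return (found, nodes_visited); each visited node stands for one page read."""
    current, visited = root, 1
    while current["children"] is not None:
        # bisect_right selects the child whose key range contains `key`:
        # smaller than keys[0] goes left, keys[-1] or larger goes right.
        child_index = bisect.bisect_right(current["keys"], key)
        current = current["children"][child_index]
        visited += 1
    return key in current["keys"], visited


print(search(tree, 45))
```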

index_creation.sql
SQL
-- B-tree index (default)
CREATE INDEX idx_customer_email ON Customer(email);

-- Composite index (multiple columns)
-- Note: ORDER is a reserved word in SQL, so the table is named Orders here
CREATE INDEX idx_order_customer_date ON Orders(customer_id, order_date);
-- Efficient for: WHERE customer_id = ? AND order_date = ?
-- Also efficient for: WHERE customer_id = ? (leftmost prefix)
-- NOT efficient for: WHERE order_date = ? (not leftmost)

-- Unique index (also enforces constraint)
CREATE UNIQUE INDEX idx_product_sku ON Product(sku);

-- Partial index (only index rows matching condition)
CREATE INDEX idx_active_orders ON Orders(order_date)
WHERE status = 'active';
-- Smaller index, faster for queries that include the same condition

-- Covering index (includes all columns needed by query; INCLUDE needs PostgreSQL 11+)
CREATE INDEX idx_order_covering ON Orders(customer_id)
INCLUDE (order_date, total_amount, status);
-- Query can be answered from the index alone, no table access needed

-- Hash index (PostgreSQL)
CREATE INDEX idx_customer_id_hash ON Customer USING HASH(customer_id);
-- Can be slightly faster for equality lookups; no range support

-- GIN index for full-text search (PostgreSQL)
-- coalesce() keeps a row searchable when description is NULL
CREATE INDEX idx_product_search ON Product 
USING GIN(to_tsvector('english', product_name || ' ' || coalesce(description, '')));
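
The leftmost-prefix rule for the composite index on (customer_id, order_date) follows directly from sort order. In the sketch below the index is modeled as a sorted list of tuples. The sample entries and function names are hypothetical, and a real B-tree stores its entries in pages rather than one list.

```python
import bisect

# A composite index modeled as sorted (customer_id, order_date, row_id) entries.
index = sorted([
    (1, "2024-01-05", 101),
    (1, "2024-03-17", 102),
    (2, "2024-01-05", 103),
    (2, "2024-02-11", 104),
    (3, "2024-01-05", 105),
])


def by_customer(customer_id):
    """Leftmost-prefix lookup: binary search to the start, then read a contiguous run."""
    start = bisect.bisect_left(index, (customer_id,))
    matches = []
    for entry in index[start:]:
        if entry[0] != customer_id:
            break
        matches.append(entry[2])
    return matches


def by_order_date(order_date):
    """No usable prefix: matches are scattered, so every entry must be checked."""
    return [row_id for _, date, row_id in index if date == order_date]


print(by_customer(2))
print(by_order_date("2024-01-05"))
```

Entries for one customer sit next to each other, while entries for one date are spread across the whole index. That is why the second lookup degenerates into a full scan.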

Index Overhead

Indexes are not free. Each index consumes disk space and must be maintained as the table changes. An INSERT into a table with 10 indexes has to update up to 11 structures (the table plus each index); an UPDATE typically touches only the indexes whose columns changed, though the details depend on the engine. Choose indexes based on actual query patterns.

Storage Allocation and Page Structure

The internal level manages physical space through pages (also called blocks): fixed-size units of storage that are read and written as a whole.

Page Fundamentals

Page size typically 4KB, 8KB, or 16KB (PostgreSQL: 8KB, MySQL InnoDB: 16KB)
Pages are the unit of I/O—even reading 1 byte reads an entire page
Database buffer pool caches pages in memory
Records must fit within pages (large objects stored separately)

page_structure.txt

Text

Typical Database Page Structure (8KB example):
 
┌────────────────────────────────────────────────────────────────┐
│ Page Header (24-100 bytes)                                     │
│ ┌────────────┬────────────┬────────────┬────────────────────┐ │
│ │ Page ID    │ LSN        │ Checksum   │ Free Space Pointer │ │
│ │ [4 bytes]  │ [8 bytes]  │ [4 bytes]  │ [2 bytes]          │ │
│ └────────────┴────────────┴────────────┴────────────────────┘ │
├────────────────────────────────────────────────────────────────┤
│ Item Pointers (Line Pointer Array) - grows downward            │
│ ┌────────┬────────┬────────┬────────┬────────┬───────────┐    │
│ │ Ptr 1  │ Ptr 2  │ Ptr 3  │ Ptr 4  │ Ptr 5  │ ...       │    │
│ │ →Row1  │ →Row2  │ →Row3  │ →Row4  │ →Row5  │           │    │
│ └────────┴────────┴────────┴────────┴────────┴───────────┘    │
│                              ↓                                  │
│ ╔════════════════════════════════════════════════════════════╗ │
│ ║              F R E E   S P A C E                           ║ │
│ ║                                                            ║ │
│ ╚════════════════════════════════════════════════════════════╝ │
│                              ↑                                  │
├────────────────────────────────────────────────────────────────┤
│ Tuple Data (Row Data) - grows upward from bottom               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Row 5: customer_id=105, name='Eve Garcia', email=...      │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Row 4: customer_id=104, name='David Lee', email=...       │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Row 3: customer_id=103, name='Carol White', email=...     │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Row 2: customer_id=102, name='Bob Jones', email=...       │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Row 1: customer_id=101, name='Alice Smith', email=...     │ │
│ └───────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
 
Key Design Elements:
1. Item pointers allow rows to move within page (for compaction)
2. Free space between pointers and data allows efficient insertion
3. Header contains page metadata for recovery and integrity
4. Fixed page size enables efficient I/O and buffer management
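
The layout above is often called a slotted page. The following sketch is a simplified model of it: the 24-byte header is left empty, each slot holds a 2-byte offset and a 2-byte length, and deletion and compaction are omitted. `SlottedPage` and its field names are assumptions for illustration, not the on-disk format of any specific database.

```python
import struct

PAGE_SIZE = 8192
HEADER_SIZE = 24      # assumed fixed header size for this sketch
SLOT_SIZE = 4         # each slot: 2-byte offset + 2-byte length


class SlottedPage:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.slot_count = 0
        self.free_end = PAGE_SIZE      # row data grows upward from the end

    def free_space(self) -> int:
        slots_end = HEADER_SIZE + self.slot_count * SLOT_SIZE
        return self.free_end - slots_end

    def insert(self, row: bytes) -> int:
        """Store the row bytes and return the slot number; raise if the page is full."""
        if len(row) + SLOT_SIZE > self.free_space():
            raise ValueError("page full")
        self.free_end -= len(row)
        self.data[self.free_end:self.free_end + len(row)] = row
        slot_pos = HEADER_SIZE + self.slot_count * SLOT_SIZE
        struct.pack_into("<HH", self.data, slot_pos, self.free_end, len(row))
        self.slot_count += 1
        return self.slot_count - 1

    def read(self, slot: int) -> bytes:
        if not 0 <= slot < self.slot_count:
            raise IndexError("no such slot")
        slot_pos = HEADER_SIZE + slot * SLOT_SIZE
        offset, length = struct.unpack_from("<HH", self.data, slot_pos)
        return bytes(self.data[offset:offset + length])


page = SlottedPage()
first = page.insert(b"101|Alice Smith")
second = page.insert(b"102|Bob Jones")
print(page.read(first), page.read(second), page.free_space())
```

Because callers hold a slot number rather than a byte offset, the page is free to move row bytes around internally as long as it updates the slot entry.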

Tablespaces and File Organization

Above individual pages, databases organize storage into:

Files — Operating system files containing multiple pages
Segments — Logical groupings of pages for a single object (table, index)
Tablespaces — Collections of database files that can span multiple disks
Datafiles — Physical files on the filesystem

tablespace_management.sql
SQL
-- Create tablespace on specific storage
-- PostgreSQL example
CREATE TABLESPACE fast_ssd LOCATION '/mnt/nvme/pg_data';
CREATE TABLESPACE archive_hdd LOCATION '/mnt/hdd/pg_archive';
 
-- Create table on specific tablespace
CREATE TABLE hot_data (
    id SERIAL PRIMARY KEY,
    data JSONB,
    created_at TIMESTAMP DEFAULT NOW()
) TABLESPACE fast_ssd;
 
CREATE TABLE cold_archive (
    id BIGINT PRIMARY KEY,
    data JSONB,
    archived_at TIMESTAMP
) TABLESPACE archive_hdd;
 
-- Move existing table to different tablespace
ALTER TABLE order_history SET TABLESPACE archive_hdd;
 
-- Create index on different tablespace than table
CREATE INDEX idx_hot_data_created 
ON hot_data(created_at) 
TABLESPACE fast_ssd;
 
-- Oracle: Create tablespace with specific parameters
-- CREATE TABLESPACE sales_data
--   DATAFILE '/u01/oradata/sales01.dbf' SIZE 10G
--   EXTENT MANAGEMENT LOCAL
--   SEGMENT SPACE MANAGEMENT AUTO;

Strategic Storage Placement

Use fast storage (NVMe SSD) for frequently accessed tables and indexes, transaction logs, and temp space. Use slower storage (HDD, object storage) for historical data, backups, and rarely-accessed archives. This tiered storage approach optimizes cost and performance simultaneously.

Buffer Management and Caching

Since disk I/O is slow, databases maintain a buffer pool (also called buffer cache or shared buffers)—an area of RAM that caches frequently accessed pages.

Buffer Pool Mechanics

Page Request: Query needs data from a specific page
Buffer Lookup: Check if page is already in buffer pool
Hit: If found, return directly from memory (fast!)
Miss: If not found, read from disk into buffer pool
Eviction: If buffer pool is full, evict least useful page
Write-Back: Modified ("dirty") pages are written to disk later
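
These steps can be sketched as a small LRU buffer pool. In this toy model a plain dictionary stands in for the disk, and the `BufferPool` class with its counters is hypothetical. Production buffer managers add pinning, latching, and background writers that are left out here.

```python
from collections import OrderedDict


class BufferPool:
    """Toy LRU buffer pool over a dict that stands in for the disk."""

    def __init__(self, disk: dict, capacity: int):
        self.disk = disk
        self.capacity = capacity
        self.frames = OrderedDict()    # page_id -> [contents, dirty flag]
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.frames:                 # hit: served from memory
            self.hits += 1
            self.frames.move_to_end(page_id)
        else:                                      # miss: read from "disk"
            self.misses += 1
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], False]
        return self.frames[page_id][0]

    def put(self, page_id, contents):
        self.get(page_id)                          # make sure the page is resident
        self.frames[page_id] = [contents, True]    # mark dirty; written back later

    def _evict(self):
        victim_id, (contents, dirty) = self.frames.popitem(last=False)
        if dirty:                                  # write-back happens on eviction
            self.disk[victim_id] = contents


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(disk, capacity=2)
pool.get(1)
pool.get(2)
pool.get(1)          # hit
pool.put(2, "B")     # hit, page 2 becomes dirty
pool.get(3)          # evicts page 1 (clean, least recently used)
value_before_flush = disk[2]
pool.get(1)          # evicts page 2 and writes it back
print(pool.hits, pool.misses, disk)
```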

Buffer Pool Configuration
Database	Parameter	Recommended Setting	Notes
PostgreSQL	shared_buffers	25% of RAM	Start here, tune based on workload
PostgreSQL	effective_cache_size	50-75% of RAM	Hint for query planner
MySQL InnoDB	innodb_buffer_pool_size	70-80% of RAM	Most critical InnoDB setting
Oracle	SGA_TARGET	40-80% of RAM	Includes buffer cache + other caches
SQL Server	max server memory	Leave 2-4GB for OS	SQL Server manages the rest

Page Replacement Policies

When the buffer pool is full and a new page is needed, which existing page should be evicted?

LRU (Least Recently Used) — Evict the page not accessed for longest time
Clock/Second Chance — Approximation of LRU with lower overhead
LRU-K — Track last K references, not just last reference
2Q — Separate queues for new pages and proven hot pages
ARC (Adaptive Replacement Cache) — Balances recency and frequency

Most databases use sophisticated variants that prevent sequential scans from flushing the entire useful cache (the "table scan problem").
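
As an illustration of one policy from the list, here is a minimal sketch of Clock (second chance). It is one common textbook formulation, in which a newly loaded page starts with its reference bit set. The class and attribute names are ours, and real implementations differ in detail.

```python
class ClockCache:
    """Second-chance (clock) replacement over a fixed number of frames."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = []          # page id held by each frame
        self.referenced = []     # reference bit per frame
        self.hand = 0

    def access(self, page_id) -> bool:
        """Touch a page; return True on a hit, False on a miss."""
        if page_id in self.pages:
            self.referenced[self.pages.index(page_id)] = True
            return True
        if len(self.pages) < self.capacity:        # a free frame is available
            self.pages.append(page_id)
            self.referenced.append(True)
            return False
        # Sweep: clear reference bits until a frame with a cleared bit is found.
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % self.capacity
        self.pages[self.hand] = page_id
        self.referenced[self.hand] = True
        self.hand = (self.hand + 1) % self.capacity
        return False


cache = ClockCache(capacity=3)
results = [cache.access(p) for p in [1, 2, 3, 4, 2, 5]]
print(results, cache.pages)
```

In this trace page 2 is touched again before the hand reaches it, so it survives the next sweep and page 3 is evicted instead. The policy needs only one bit per frame, which is why it is cheaper than maintaining a full LRU list.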

buffer_monitoring.sql
SQL
-- PostgreSQL: Check buffer cache hit ratio
SELECT 
    sum(blks_hit) / nullif(sum(blks_hit + blks_read), 0) * 100 AS cache_hit_ratio
FROM pg_stat_database;
-- Goal: > 99% cache hit ratio for OLTP workloads
 
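-- PostgreSQL: See what's in the buffer cache (requires pg_buffercache extension)
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT 
    c.relname AS table_name,
    count(*) AS buffers,
    round(100.0 * count(*) / (SELECT count(*) FROM pg_buffercache WHERE relfilenode IS NOT NULL), 2) AS percent_of_cache
FROM pg_class c
JOIN pg_buffercache b ON b.relfilenode = pg_relation_filenode(c.oid)
-- relfilenode values are only unique within a database, so restrict to the
-- current database (reldatabase = 0 covers shared catalogs)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 20;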
 
-- MySQL: InnoDB buffer pool statistics
SHOW STATUS LIKE 'Innodb_buffer_pool%';
-- Key metrics: 
-- Innodb_buffer_pool_read_requests (logical reads)
-- Innodb_buffer_pool_reads (physical reads from disk)
-- Hit ratio = 1 - (reads / read_requests)
 
-- MySQL: Table and index sizes (compare the totals against innodb_buffer_pool_size)
SELECT 
    TABLE_NAME,
    ENGINE,
    TABLE_ROWS,
    ROUND(DATA_LENGTH / 1024 / 1024, 2) AS data_mb,
    ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY DATA_LENGTH DESC;
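
The hit-ratio formula quoted in the MySQL comments above is simple enough to compute by hand, but a small helper makes the zero-traffic edge case explicit. The counter values below are hypothetical and stand in for numbers you would read from the server's status output.

```python
def buffer_hit_ratio(read_requests: int, physical_reads: int) -> float:
    """Fraction of logical reads served from memory; 0.0 when there was no traffic."""
    if read_requests == 0:
        return 0.0
    return 1.0 - physical_reads / read_requests


# Hypothetical counter values, for illustration only.
ratio = buffer_hit_ratio(read_requests=2_000_000, physical_reads=10_000)
print(f"{ratio:.2%}")
```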

The Double Buffering Problem

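Both the database and the operating system maintain caches, so the same page can end up in memory twice. PostgreSQL is designed to work with the OS page cache alongside shared_buffers, which is one reason its suggested shared_buffers setting is modest. Other systems, such as Oracle and SQL Server, are commonly run with direct (unbuffered) I/O to avoid the double-caching overhead. Check how your database performs I/O before sizing its buffer pool, to avoid configuration conflicts.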

Physical Design Decisions

DBAs make numerous physical design decisions that don't affect logical structure but dramatically impact performance. These are pure internal-level concerns.

Key Physical Design Decisions

•Index Selection — Which columns to index, what index types, covering vs non-covering indexes. Too few indexes = slow reads. Too many = slow writes.
•Partitioning — Split large tables into smaller physical pieces by range (date), list (category), or hash. Enables parallel processing and selective pruning.
•Clustering/Ordering — Physical row ordering. Which column determines physical placement? Usually the primary key or a frequently queried column.
•Compression — Reduce storage size at cost of CPU cycles. Particularly effective for cold data and repeated values.
•Fill Factor — How full to make pages initially. Lower fill factor leaves room for updates without page splits, at cost of more pages.
•Tablespace Assignment — Place hot data on fast storage, cold data on slower storage. Put indexes and data on separate devices for parallel I/O.
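
The cost side of the fill factor trade-off can be estimated with simple arithmetic. The sketch below assumes fixed-size rows and a flat 24 bytes of page overhead, and it ignores per-row headers and alignment. Treat the results as rough estimates, and the function name and parameters as hypothetical.

```python
import math


def pages_needed(row_count: int, row_bytes: int, fillfactor: int = 100,
                 page_bytes: int = 8192, overhead_bytes: int = 24) -> int:
    """Estimate pages for fixed-size rows when pages are filled to `fillfactor` percent."""
    usable = (page_bytes - overhead_bytes) * fillfactor // 100
    rows_per_page = usable // row_bytes
    if rows_per_page == 0:
        raise ValueError("row does not fit in the usable part of a page")
    return math.ceil(row_count / rows_per_page)


full = pages_needed(1_000_000, row_bytes=100)
loose = pages_needed(1_000_000, row_bytes=100, fillfactor=70)
print(full, loose)
```

With these assumptions, a fill factor of 70 needs roughly 40% more pages for the same rows. That is the price paid for leaving room for later updates.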

physical_design_examples.sql
SQL
-- Partitioning by date range (PostgreSQL)
CREATE TABLE order_history (
    order_id        BIGINT,
    order_date      DATE NOT NULL,
    customer_id     INTEGER,
    total_amount    DECIMAL(12,2)
) PARTITION BY RANGE (order_date);
 
CREATE TABLE order_history_2023 PARTITION OF order_history
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
    
CREATE TABLE order_history_2024 PARTITION OF order_history
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
 
-- Query only scans relevant partition
SELECT * FROM order_history WHERE order_date = '2024-06-15';
-- Only order_history_2024 is accessed!
 
-- Compression (PostgreSQL TOAST + external compression)
-- Large values automatically compressed and stored out-of-line
 
-- MySQL InnoDB compression
ALTER TABLE archive_data ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
 
-- Fill factor (PostgreSQL)
CREATE INDEX idx_product_name ON product(name) WITH (fillfactor = 70);
-- Leaves 30% free space per page for updates
 
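-- Clustered table (PostgreSQL)
-- CLUSTER needs an existing index; on a partitioned table it requires PostgreSQL 15 or later
CREATE INDEX idx_order_date ON order_history(order_date);
CLUSTER order_history USING idx_order_date;
-- Physically reorders rows by order_date. One-time operation: later inserts are not kept in order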
 
-- Oracle IOT (Index-Organized Table) - data stored in index structure
-- CREATE TABLE hot_data (
--     id NUMBER PRIMARY KEY,
--     data VARCHAR2(100)
-- ) ORGANIZATION INDEX;
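
Partition pruning for the range-partitioned table above comes down to finding which range contains the value in the WHERE clause. The sketch below models that routing with a binary search over the partition bounds. It illustrates the idea only and is not how the PostgreSQL planner is implemented.

```python
import bisect
from datetime import date

# Partition bounds, mirroring the two yearly partitions defined above.
bounds = [date(2023, 1, 1), date(2024, 1, 1), date(2025, 1, 1)]
names = ["order_history_2023", "order_history_2024"]


def partition_for(order_date: date):
    """Return the partition whose [from, to) range contains order_date, or None."""
    i = bisect.bisect_right(bounds, order_date) - 1
    if 0 <= i < len(names):
        return names[i]
    return None


print(partition_for(date(2024, 6, 15)))
```

The lower bound of each range is inclusive and the upper bound exclusive, matching FOR VALUES FROM ... TO ... in the DDL. A date outside every range maps to no partition.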

Measure, Don't Guess

Physical design decisions should be based on actual workload patterns, not intuition. Use EXPLAIN ANALYZE to see query plans and I/O costs. Monitor buffer hit ratios, wait events, and I/O latencies. The best physical design for your system depends on YOUR data and YOUR queries.

Summary: The Internal Level

We've explored the internal level—where database abstractions meet physical reality. Here are the key takeaways:

Key Takeaways

•The internal level manages physical storage: it translates logical structures into files, pages, and bytes on storage devices.
•Disk I/O is the primary bottleneck: the roughly 100,000x access-time gap between RAM and spinning disks shapes most storage design decisions.
•File organization involves trade-offs: heap (fast insert), sorted (fast range), and hash (fast equality) each suit different access patterns.
•Indexes enable fast access: B+ trees provide O(log n) lookup, and proper indexing is usually the most effective performance optimization.
•Buffer pools cache hot pages: high cache hit ratios (above 99% for OLTP workloads) are essential for good performance.
•Physical design is separate from logical design: you can change indexes, partitioning, and storage without changing the conceptual schema.

What's Next:

We've now covered all three levels: external (user views), conceptual (logical structure), and internal (physical storage). Next, we'll examine the ANSI-SPARC architecture—the formal framework that defined this three-level model and explains how the levels interact through mappings.

Page Complete

You now understand the internal level—how databases physically store and access data. You can explain file organization methods, indexing structures, buffer management, and physical design decisions. This knowledge enables you to make informed choices about database performance and storage configuration.

Internal Level (Physical)

Where Data Lives: The Physical Reality

What You Will Learn

Understanding the Internal Level

Definition and Scope

The internal level contains the internal schema (or storage schema)—a complete specification of how the conceptual schema is mapped to physical storage. This includes:

File structures — How data files are organized on disk
Record formats — How individual rows are physically laid out
Index structures — How to quickly locate specific data
Compression — How data is compacted for storage efficiency
Encryption — How data is protected at rest
Allocation — How storage space is managed

Formal Definition:

The internal schema is a specification of data storage structures, access paths, and file organizations used to store the database on physical storage devices.

Key Characteristics of the Internal Level
Characteristic	Description	Impact
Physical Storage Focus	Deals with bytes on disk, not logical concepts	Performance is primary concern, not semantics
Invisible to Users	Completely hidden from external and most conceptual operations	Users never see file structures or index details
Performance Critical	Physical decisions determine query speed	Wrong choices can cause 1000x performance difference
Hardware Dependent	Must consider device characteristics (HDD vs SSD)	Optimal strategies differ by storage type
Tunable	Can be modified without changing conceptual schema	Enables performance optimization without application changes

The Warehouse Analogy

Storage Media and the Memory Hierarchy

Memory Hierarchy in Database Systems
Level	Technology	Typical Size	Access Time	$ per GB	Volatility
CPU Registers	SRAM in CPU	~1 KB	< 1 ns	N/A	Volatile
L1 Cache	SRAM on chip	32-64 KB	~1 ns	~$10,000	Volatile
L2 Cache	SRAM on chip	256 KB-1 MB	~4 ns	~$1,000	Volatile
L3 Cache	SRAM on/near chip	4-50 MB	~15 ns	~$100	Volatile
Main Memory (RAM)	DRAM	16 GB - 6 TB	~100 ns	~$5	Volatile
SSD (NVMe)	NAND Flash	256 GB - 30 TB	~100 μs	~$0.10	Non-volatile
SSD (SATA)	NAND Flash	256 GB - 8 TB	~500 μs	~$0.08	Non-volatile
HDD (Spinning)	Magnetic disk	1 TB - 20 TB	~10 ms	~$0.02	Non-volatile
Tape/Archive	Magnetic tape	Petabytes	seconds-minutes	~$0.004	Non-volatile

The Critical Implication

The access time difference between RAM (~100 nanoseconds) and HDD (~10 milliseconds) is a factor of 100,000. This means:

An operation taking 1 second in RAM would take 27+ hours if done the same way on disk
Database systems must be designed to minimize disk I/O above all else
Caching, buffering, and intelligent data placement are essential

Disk I/O is the bottleneck. The entire architecture of database storage is designed to minimize the number of disk reads and writes required for any operation.

Sequential vs Random Access

File Organization Methods

The internal level must decide how records (rows) are physically arranged in files. This file organization profoundly affects query performance.

Heap organization stores records in no particular order—new records are placed wherever there's space.

Characteristics

Insert: O(1) — Append to end of file
Search (by any attribute): O(n) — Must scan entire file
Delete: O(n) search + O(1) mark as deleted
Update: O(n) search + O(1) in place (if size unchanged)

When to Use

Bulk loading large datasets
Tables accessed primarily by full table scans
Staging tables for ETL processes
Log tables where new records are appended

heap_organization.txt

Text

Heap File Structure:
┌─────────────────────────────────────────────────────────┐
│ Page 1                                                  │
│ ┌───────────┬───────────┬───────────┬───────────┐      │
│ │ Record 1  │ Record 2  │ Record 3  │ Record 4  │      │
│ │ ID=7      │ ID=3      │ ID=12     │ ID=1      │      │
│ └───────────┴───────────┴───────────┴───────────┘      │
├─────────────────────────────────────────────────────────┤
│ Page 2                                                  │
│ ┌───────────┬───────────┬───────────┬───────────┐      │
│ │ Record 5  │ Record 6  │ Record 7  │ (Empty)   │      │
│ │ ID=9      │ ID=2      │ ID=5      │           │      │
│ └───────────┴───────────┴───────────┴───────────┘      │
└─────────────────────────────────────────────────────────┘
 
Notice: Records are NOT sorted by ID.
To find ID=5, we must scan from Record 1 through Record 7.
 
Query: SELECT * FROM table WHERE id = 5
Plan:  Sequential Scan, filter (id = 5)
Pages Read: Potentially ALL pages in the table

Clustered vs Unclustered

Indexing Structures

Indexes are the most important performance tool in the internal level. An index is a separate data structure that enables fast lookup without scanning entire tables.

The Phone Book Analogy

Types of Indexes

Common Index Types
Index Type	Structure	Best For	Not Good For
B-tree/B+ tree	Balanced tree with sorted keys	Range queries, equality, ORDER BY	Pattern matching, full-text
Hash Index	Hash table	Equality lookups only	Range queries, sorting
Bitmap Index	Bit vectors per value	Low-cardinality columns, AND/OR	High-cardinality, frequent updates
GiST	Generalized search tree	Geometric, full-text, complex types	Simple equality lookups
GIN	Generalized inverted index	Full-text search, arrays, JSONB	Simple scalar values
BRIN	Block range index	Very large tables with natural order	Random distribution, point queries

B+ Tree: The Workhorse

The B+ tree is the dominant index structure in relational databases. It provides:

O(log n) search — Even with billions of rows
Efficient range queries — Leaf nodes are linked sequentially
Self-balancing — Stays efficient as data grows
High fanout — Each node contains many keys, reducing tree height

btree_structure.txt

Text

B+ Tree Index Structure (order=3):
 
                        ┌─────────────┐
                        │   [30,60]   │              (Root Node)
                        └──────┬──────┘
               ┌───────────────┼───────────────┐
               ▼               ▼               ▼
        ┌───────────┐   ┌───────────┐   ┌───────────┐
        │ [10,20]   │   │ [40,50]   │   │ [70,80]   │  (Internal Nodes)
        └─────┬─────┘   └─────┬─────┘   └─────┬─────┘
         ┌────┼────┐     ┌────┼────┐     ┌────┼────┐
         ▼    ▼    ▼     ▼    ▼    ▼     ▼    ▼    ▼
       ┌───┐┌───┐┌───┐ ┌───┐┌───┐┌───┐ ┌───┐┌───┐┌───┐
       │5,8││15 ││25,│ │35,││45 ││55,│ │65,││75 ││85,│ (Leaf Nodes)
       │   ││18 ││28 │ │38 ││48 ││58 │ │68 ││78 ││90 │
       └─┬─┘└─┬─┘└─┬─┘ └─┬─┘└─┬─┘└─┬─┘ └─┬─┘└─┬─┘└─┬─┘
         │    │    │     │    │    │     │    │    │
         ▼    ▼    ▼     ▼    ▼    ▼     ▼    ▼    ▼
      [Row][Row][Row] [Row][Row][Row] [Row][Row][Row]  (Data Pages/Row Pointers)
 
Key Properties:
1. All data pointers are in leaf nodes only
2. Leaf nodes are linked for sequential access (→→→)
3. Internal nodes only guide the search
4. Tree is always balanced (all leaves at same level)
 
Search for key=45:
  Root: 45 ≥ 30 and 45 < 60, go middle
  Internal: 45 ≥ 40 and 45 < 50, go middle  
  Leaf: Found! Return row pointer
 
Disk reads: 3 (one per level) - regardless of table size!

index_creation.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- B-tree index (default)
CREATE INDEX idx_customer_email ON Customer(email);

-- Composite index (multiple columns)
-- ORDER is a reserved word in SQL, so the table name must be quoted
CREATE INDEX idx_order_customer_date ON "Order"(customer_id, order_date);
-- Efficient for: WHERE customer_id = ? AND order_date = ?
-- Also efficient for: WHERE customer_id = ? (leftmost prefix)
-- NOT efficient for: WHERE order_date = ? (not leftmost)

-- Unique index (also enforces constraint)
CREATE UNIQUE INDEX idx_product_sku ON Product(sku);

-- Partial index (PostgreSQL): only index rows matching a condition
CREATE INDEX idx_active_orders ON "Order"(order_date)
WHERE status = 'active';
-- Smaller index, faster for queries that filter on status = 'active'

-- Covering index (PostgreSQL 11+, SQL Server): INCLUDE adds non-key columns
CREATE INDEX idx_order_covering ON "Order"(customer_id)
INCLUDE (order_date, total_amount, status);
-- A query that reads only these columns can be answered from the index alone

-- Hash index (PostgreSQL)
CREATE INDEX idx_customer_id_hash ON Customer USING HASH(customer_id);
-- Equality lookups only; no range scans or ordering

-- GIN index for full-text search (PostgreSQL)
-- coalesce() keeps a NULL description from turning the whole expression NULL
CREATE INDEX idx_product_search ON Product 
USING GIN(to_tsvector('english', product_name || ' ' || coalesce(description, '')));
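
The leftmost-prefix rule for the composite index above follows from how the entries are ordered. As a rough sketch, a composite index on (customer_id, order_date) can be modelled as a sorted list of tuples; the sample data and the function names below are illustrative assumptions, not a database API.

```python
from bisect import bisect_left

# A composite index on (customer_id, order_date) behaves like a sorted list of tuples.
index = sorted([
    (1, "2024-01-05"), (1, "2024-03-10"),
    (2, "2024-01-05"), (2, "2024-02-20"),
    (3, "2024-01-05"),
])

def by_customer(index, customer_id):
    """Leftmost-prefix lookup: two binary searches bound one contiguous slice."""
    lo = bisect_left(index, (customer_id,))
    hi = bisect_left(index, (customer_id + 1,))
    return index[lo:hi]

def by_customer_and_date(index, customer_id, order_date):
    """Full-key lookup: binary search straight to the matching entry."""
    pos = bisect_left(index, (customer_id, order_date))
    if pos < len(index) and index[pos] == (customer_id, order_date):
        return [index[pos]]
    return []

def by_date(index, order_date):
    """No usable prefix: matches are scattered, so every entry must be checked."""
    return [entry for entry in index if entry[1] == order_date]

print(by_customer(index, 2))   # [(2, '2024-01-05'), (2, '2024-02-20')]
```

Entries for one customer sit next to each other, while entries for one date are spread across the whole list, which is why a filter on order_date alone cannot use the sort order.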

Index Overhead

Storage Allocation and Page Structure

The internal level manages physical space through pages (also called blocks): fixed-size units of storage that are read from and written to disk as a whole.

Page Fundamentals

Page size is typically 4KB, 8KB, or 16KB (defaults: PostgreSQL 8KB, MySQL InnoDB 16KB)
Pages are the unit of I/O—even reading 1 byte reads an entire page
Database buffer pool caches pages in memory
Records must fit within pages (large objects stored separately)

page_structure.txt

Text

Typical Database Page Structure (8KB example):
 
┌────────────────────────────────────────────────────────────────┐
│ Page Header (24-100 bytes)                                     │
│ ┌────────────┬────────────┬────────────┬────────────────────┐ │
│ │ Page ID    │ LSN        │ Checksum   │ Free Space Pointer │ │
│ │ [4 bytes]  │ [8 bytes]  │ [4 bytes]  │ [2 bytes]          │ │
│ └────────────┴────────────┴────────────┴────────────────────┘ │
├────────────────────────────────────────────────────────────────┤
│ Item Pointers (Line Pointer Array) - grows downward            │
│ ┌────────┬────────┬────────┬────────┬────────┬───────────┐    │
│ │ Ptr 1  │ Ptr 2  │ Ptr 3  │ Ptr 4  │ Ptr 5  │ ...       │    │
│ │ →Row1  │ →Row2  │ →Row3  │ →Row4  │ →Row5  │           │    │
│ └────────┴────────┴────────┴────────┴────────┴───────────┘    │
│                              ↓                                  │
│ ╔════════════════════════════════════════════════════════════╗ │
│ ║              F R E E   S P A C E                           ║ │
│ ║                                                            ║ │
│ ╚════════════════════════════════════════════════════════════╝ │
│                              ↑                                  │
├────────────────────────────────────────────────────────────────┤
│ Tuple Data (Row Data) - grows upward from bottom               │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Row 5: customer_id=105, name='Eve Garcia', email=...      │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Row 4: customer_id=104, name='David Lee', email=...       │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Row 3: customer_id=103, name='Carol White', email=...     │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Row 2: customer_id=102, name='Bob Jones', email=...       │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Row 1: customer_id=101, name='Alice Smith', email=...     │ │
│ └───────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
 
Key Design Elements:
1. Item pointers allow rows to move within page (for compaction)
2. Free space between pointers and data allows efficient insertion
3. Header contains page metadata for recovery and integrity
4. Fixed page size enables efficient I/O and buffer management
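
The layout above is often called a slotted page. The following Python sketch shows the two regions growing toward each other; it assumes a 24-byte header and 4-byte item pointers (2-byte offset plus 2-byte length), and the `SlottedPage` class is a hypothetical simplification with no deletion, compaction, or header contents.

```python
import struct

PAGE_SIZE = 8192     # assumed page size, as in the 8KB example
HEADER_SIZE = 24     # assumed header size; the header itself is left empty here
SLOT_SIZE = 4        # each item pointer: 2-byte offset + 2-byte length

class SlottedPage:
    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)
        self.num_slots = 0
        self.data_start = PAGE_SIZE   # row data grows from the end toward the front

    def free_space(self):
        """Bytes left between the end of the pointer array and the row data."""
        slots_end = HEADER_SIZE + self.num_slots * SLOT_SIZE
        return self.data_start - slots_end

    def insert(self, record: bytes):
        """Store a record and return its slot number, or None if it does not fit."""
        if len(record) + SLOT_SIZE > self.free_space():
            return None
        self.data_start -= len(record)
        self.buf[self.data_start:self.data_start + len(record)] = record
        slot_pos = HEADER_SIZE + self.num_slots * SLOT_SIZE
        struct.pack_into("<HH", self.buf, slot_pos, self.data_start, len(record))
        self.num_slots += 1
        return self.num_slots - 1

    def read(self, slot):
        """Follow the item pointer for a slot and return the record bytes."""
        offset, length = struct.unpack_from("<HH", self.buf, HEADER_SIZE + slot * SLOT_SIZE)
        return bytes(self.buf[offset:offset + length])

page = SlottedPage()
first = page.insert(b"customer_id=101,name=Alice Smith")
print(page.read(first))   # b'customer_id=101,name=Alice Smith'
```

Because callers hold only the slot number, a row could later be moved within the page by updating its item pointer, which is the indirection the first design element describes.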

Tablespaces and File Organization

Above individual pages, databases organize storage into:

Files — Operating system files containing multiple pages
Segments — Logical groupings of pages for a single object (table, index)
Tablespaces — Collections of database files that can span multiple disks
Datafiles — The name some systems (such as Oracle) give to these physical files on the filesystem

tablespace_management.sql
SQL
-- Create tablespace on specific storage
-- PostgreSQL example
CREATE TABLESPACE fast_ssd LOCATION '/mnt/nvme/pg_data';
CREATE TABLESPACE archive_hdd LOCATION '/mnt/hdd/pg_archive';
 
-- Create table on specific tablespace
CREATE TABLE hot_data (
    id SERIAL PRIMARY KEY,
    data JSONB,
    created_at TIMESTAMP DEFAULT NOW()
) TABLESPACE fast_ssd;
 
CREATE TABLE cold_archive (
    id BIGINT PRIMARY KEY,
    data JSONB,
    archived_at TIMESTAMP
) TABLESPACE archive_hdd;
 
-- Move existing table to different tablespace
ALTER TABLE order_history SET TABLESPACE archive_hdd;
 
-- Indexes take their own TABLESPACE clause, independent of the table's
-- (here the index is placed on the same fast storage as hot_data)
CREATE INDEX idx_hot_data_created 
ON hot_data(created_at) 
TABLESPACE fast_ssd;
 
-- Oracle: Create tablespace with specific parameters
-- CREATE TABLESPACE sales_data
--   DATAFILE '/u01/oradata/sales01.dbf' SIZE 10G
--   EXTENT MANAGEMENT LOCAL
--   SEGMENT SPACE MANAGEMENT AUTO;

Strategic Storage Placement

Buffer Management and Caching

Since disk I/O is slow, databases maintain a buffer pool (also called buffer cache or shared buffers)—an area of RAM that caches frequently accessed pages.

Buffer Pool Mechanics

Page Request: Query needs data from a specific page
Buffer Lookup: Check if page is already in buffer pool
Hit: If found, return directly from memory (fast!)
Miss: If not found, read from disk into buffer pool
Eviction: If buffer pool is full, evict least useful page
Write-Back: Modified ("dirty") pages are written to disk later
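
The steps above can be sketched as a small Python class. This is a toy model under stated assumptions: a dict stands in for the disk, eviction uses plain least-recently-used order, and `BufferPool`, `get`, and `put` are hypothetical names rather than any database's interface.

```python
from collections import OrderedDict

class BufferPool:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk               # dict page_id -> contents, stands in for storage
        self.frames = OrderedDict()    # page_id -> [contents, dirty flag], oldest first
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        """Return page contents, reading from 'disk' only on a miss."""
        if page_id in self.frames:
            self.hits += 1
            self.frames.move_to_end(page_id)      # mark as most recently used
        else:
            self.misses += 1
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], False]
        return self.frames[page_id][0]

    def put(self, page_id, contents):
        """Modify a page in memory only; it becomes dirty until written back."""
        self.get(page_id)
        self.frames[page_id] = [contents, True]

    def _evict(self):
        """Drop the least recently used page, writing it back first if dirty."""
        victim, (contents, dirty) = self.frames.popitem(last=False)
        if dirty:
            self.disk[victim] = contents

disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(capacity=2, disk=disk)
pool.get(1)
pool.put(2, "B")       # page 2 is now dirty in memory; disk still holds "b"
pool.get(3)            # pool is full, so page 1 (least recently used) is evicted
print(disk[2])         # b
```

The modified page reaches the disk only when it is evicted, which is the write-back behaviour in the last step. Real systems also flush dirty pages in the background and coordinate with the log for recovery.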

Buffer Pool Configuration
Database	Parameter	Recommended Setting	Notes
PostgreSQL	shared_buffers	25% of RAM	Start here, tune based on workload
PostgreSQL	effective_cache_size	50-75% of RAM	Planner hint only; allocates no memory
MySQL InnoDB	innodb_buffer_pool_size	70-80% of RAM	On a dedicated database server; most critical InnoDB setting
Oracle	SGA_TARGET	40-80% of RAM	Includes buffer cache + other caches
SQL Server	max server memory	Leave 2-4GB for OS	SQL Server manages the rest

Page Replacement Policies

When the buffer pool is full and a new page is needed, which existing page should be evicted?

LRU (Least Recently Used) — Evict the page not accessed for longest time
Clock/Second Chance — Approximation of LRU with lower overhead
LRU-K — Track last K references, not just last reference
2Q — Separate queues for new pages and proven hot pages
ARC (Adaptive Replacement Cache) — Balances recency and frequency

Most databases use sophisticated variants that prevent sequential scans from flushing the entire useful cache (the "table scan problem").
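
As one concrete case, the Clock (second chance) policy from the list above can be sketched as follows. This is a minimal textbook version, not the variant any specific database ships; the `ClockCache` class and its `access` method are illustrative names.

```python
class ClockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = []    # frame number -> page id
        self.ref = []      # frame number -> reference bit
        self.hand = 0      # next frame the clock hand will inspect

    def access(self, page_id):
        """Touch a page; return the id of the page evicted to make room, if any."""
        if page_id in self.pages:
            self.ref[self.pages.index(page_id)] = 1    # hit: set the reference bit
            return None
        if len(self.pages) < self.capacity:
            self.pages.append(page_id)                 # free frame available
            self.ref.append(1)
            return None
        # Sweep: a set bit buys the page one more pass, a clear bit marks the victim.
        while self.ref[self.hand] == 1:
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % self.capacity
        victim = self.pages[self.hand]
        self.pages[self.hand] = page_id
        self.ref[self.hand] = 1
        self.hand = (self.hand + 1) % self.capacity
        return victim

cache = ClockCache(3)
for page in (1, 2, 3):
    cache.access(page)
print(cache.access(4))   # 1
cache.access(2)          # hit: page 2 gets its reference bit set again
print(cache.access(5))   # 3
```

Page 2 survives the second eviction because it was touched recently, while page 3 was not. Clock gets this LRU-like behaviour from one bit per frame instead of reordering a list on every access.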

buffer_monitoring.sql
SQL
-- PostgreSQL: Check buffer cache hit ratio
SELECT 
    sum(blks_hit) / nullif(sum(blks_hit + blks_read), 0) * 100 AS cache_hit_ratio
FROM pg_stat_database;
-- Goal: > 99% cache hit ratio for OLTP workloads
 
-- PostgreSQL: See what's in the buffer cache (requires pg_buffercache extension)
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT 
    c.relname AS table_name,
    count(*) AS buffers,
    round(100.0 * count(*) / (SELECT count(*) FROM pg_buffercache WHERE relfilenode IS NOT NULL), 2) AS percent_of_cache
FROM pg_class c
JOIN pg_buffercache b ON b.relfilenode = pg_relation_filenode(c.oid)
    -- Restrict to the current database (0 = shared catalogs); relfilenodes can repeat across databases
    AND b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 20;
 
-- MySQL: InnoDB buffer pool statistics
SHOW STATUS LIKE 'Innodb_buffer_pool%';
-- Key metrics: 
-- Innodb_buffer_pool_read_requests (logical reads)
-- Innodb_buffer_pool_reads (physical reads from disk)
-- Hit ratio = 1 - (reads / read_requests)
 
-- MySQL: Table and index sizes on disk (not buffer pool contents; TABLE_ROWS is an estimate for InnoDB)
SELECT 
    TABLE_NAME,
    ENGINE,
    TABLE_ROWS,
    ROUND(DATA_LENGTH / 1024 / 1024, 2) AS data_mb,
    ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY DATA_LENGTH DESC;

The Double Buffering Problem

Physical Design Decisions

DBAs make numerous physical design decisions that don't affect logical structure but dramatically impact performance. These are pure internal-level concerns.

Key Physical Design Decisions

•Index Selection — Which columns to index, what index types, covering vs non-covering indexes. Too few indexes = slow reads. Too many = slow writes.
•Partitioning — Split large tables into smaller physical pieces by range (date), list (category), or hash. Enables parallel processing and selective pruning.
•Clustering/Ordering — Physical row ordering. Which column determines physical placement? Usually the primary key or a frequently queried column.
•Compression — Reduce storage size at cost of CPU cycles. Particularly effective for cold data and repeated values.
•Fill Factor — How full to make pages initially. Lower fill factor leaves room for updates without page splits, at cost of more pages.
•Tablespace Assignment — Place hot data on fast storage, cold data on slower storage. Put indexes and data on separate devices for parallel I/O.

physical_design_examples.sql
SQL
-- Partitioning by date range (PostgreSQL)
CREATE TABLE order_history (
    order_id        BIGINT,
    order_date      DATE NOT NULL,
    customer_id     INTEGER,
    total_amount    DECIMAL(12,2)
) PARTITION BY RANGE (order_date);
 
CREATE TABLE order_history_2023 PARTITION OF order_history
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
    
CREATE TABLE order_history_2024 PARTITION OF order_history
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
 
-- Query only scans relevant partition
SELECT * FROM order_history WHERE order_date = '2024-06-15';
-- Only order_history_2024 is accessed!
 
-- Compression (PostgreSQL TOAST)
-- Large values (roughly over 2KB) are automatically compressed and/or moved out-of-line
 
-- MySQL InnoDB compression
ALTER TABLE archive_data ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
 
-- Fill factor (PostgreSQL)
CREATE INDEX idx_product_name ON product(name) WITH (fillfactor = 70);
-- Leaves 30% free space per page for updates
 
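-- Clustered table (PostgreSQL)
-- CLUSTER needs an existing index; on a partitioned table it requires PostgreSQL 15 or later
CREATE INDEX idx_order_date ON order_history(order_date);
CLUSTER order_history USING idx_order_date;
-- Physically reorders rows by order_date - one-time operation; later inserts are not kept in order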
 
-- Oracle IOT (Index-Organized Table) - data stored in index structure
-- CREATE TABLE hot_data (
--     id NUMBER PRIMARY KEY,
--     data VARCHAR2(100)
-- ) ORGANIZATION INDEX;
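
The partition pruning shown in the range-partitioning example above comes down to an interval-overlap test. A minimal Python sketch, reusing the two partition names and bounds from that example; the `prune` function is a hypothetical helper, and a real planner handles many more cases.

```python
from datetime import date

# (partition name, inclusive lower bound, exclusive upper bound)
PARTITIONS = [
    ("order_history_2023", date(2023, 1, 1), date(2024, 1, 1)),
    ("order_history_2024", date(2024, 1, 1), date(2025, 1, 1)),
]

def prune(partitions, lo, hi):
    """Keep partitions whose [start, end) range overlaps the query range [lo, hi]."""
    return [name for name, start, end in partitions if start <= hi and lo < end]

# WHERE order_date = '2024-06-15'
print(prune(PARTITIONS, date(2024, 6, 15), date(2024, 6, 15)))   # ['order_history_2024']
```

A query spanning the year boundary would keep both partitions, and one outside all bounds would keep none, so the amount of data scanned tracks the predicate rather than the table size.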

Measure, Don't Guess

Summary: The Internal Level

We've explored the internal level—where database abstractions meet physical reality. Here are the key takeaways:

Key Takeaways

•The internal level manages physical storage — It translates logical structures into files, pages, and bytes on storage devices.
•Disk I/O is the primary bottleneck — The roughly 100,000x latency gap between RAM and spinning disks shapes nearly every storage design decision.
•File organization methods involve trade-offs — Heap (fast insert), sorted (fast range), and hash (fast equality) files each suit different access patterns.
•Indexes enable fast access — B+ trees provide O(log n) lookup; proper indexing is usually the single most effective performance optimization.
•Buffer pools cache hot pages — High cache hit ratios (>99%) are essential for good performance.
•Physical design is separate from logical — You can change indexes, partitioning, and storage without changing the conceptual schema.

