Direct access lets you seek to byte 1,000,000—but how do you know that's where the record you want is stored? How does a database find account #12345 among millions of records? How does an email client locate message ID 'abc123def' without scanning the entire mailbox?
The answer is indexed access: an approach that maintains auxiliary data structures (indexes) mapping logical identifiers (keys) to physical locations (file offsets or block numbers). The index transforms human-meaningful lookups into efficient random access operations.
The fundamental insight:
An index is a map from search keys to storage locations, usually kept sorted, that trades space and write overhead for dramatically faster reads.
This transformation—from linear search to logarithmic or constant-time lookup—is arguably the most important optimization technique in all of computer science. Every database, search engine, file system, and key-value store depends fundamentally on indexed access.
By the end of this page, you will understand indexed file access at a deep level—the conceptual model, common index structures (hash, B-tree, LSM), how indexes interact with file organization, the space/time trade-offs involved, and practical implementation patterns used in production systems.
Indexed access introduces a layer of indirection between logical data identity and physical storage location:
```
Application              Index                       Data File
-----------              -----                       ---------
"Find user 'alice'"  →   index['alice'] = 47520  →   seek(47520) → read
```
Components of indexed access:
Data File — Contains the actual records, stored sequentially, by insertion order, or in some other primary organization.
Index — A separate data structure mapping search keys to record locations. May be stored in a dedicated index file, in a reserved region of the data file, or entirely in memory (rebuilt or reloaded when the program starts).
Search Key — The attribute(s) used for lookup. Can be a single attribute (a user ID, an email address) or a composite of several attributes (for example, last name plus first name).
Index entry structure:
Each index entry typically contains the search key value and a pointer to the record: a file offset, a block number, or the record's primary key.
Types of indexes:
| Classification | Types | Characteristics |
|---|---|---|
| By uniqueness | Primary (unique), Secondary (non-unique) | Primary: exactly one record per key; Secondary: multiple records possible |
| By organization | Clustered, Non-clustered | Clustered: data file sorted by index key; Non-clustered: data in different order |
| By structure | Hash, Tree (B-tree), Bitmap, etc. | Each structure optimizes for different query patterns |
| By density | Dense, Sparse | Dense: entry for every record; Sparse: entry for every block |
Dense vs. Sparse indexes:
Dense index: Contains an entry for every record in the data file. Allows direct lookup but requires more space.
Sparse index: Contains entries only for some records (e.g., one per disk block). Requires data file to be sorted by index key. Smaller but requires block search after index lookup.
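As a sketch of how a sparse index is consulted, assuming the data file is sorted by key and an in-memory array holds one entry per block (names and sizes are illustrative):

```c
#include <string.h>

/* One sparse entry per data block: the first key stored in that block. */
typedef struct {
    char first_key[32];
    long block_no;
} SparseEntry;

/* Binary-search for the last block whose first_key <= key; the caller
 * then reads just that block and scans its records for the exact key. */
long sparse_find_block(const SparseEntry *idx, int n, const char *key) {
    int lo = 0, hi = n - 1;
    long block = -1;                      /* -1: key precedes every block */
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (strcmp(idx[mid].first_key, key) <= 0) {
            block = idx[mid].block_no;    /* candidate; keep looking right */
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return block;
}
```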
Clustered vs. Non-clustered:
Clustered index: The data file is physically sorted according to the index key. Only one clustered index possible per table. Range queries are extremely efficient.
Non-clustered index: The index order differs from data file order. Multiple non-clustered indexes possible. Each lookup may require random access to scattered locations.
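The I/O difference shows up most clearly in a range scan. A rough sketch, assuming fixed-size records and illustrative function names:

```c
#include <unistd.h>

typedef struct { int key; char payload[60]; } Record;

/* Clustered: matching records are physically contiguous, so a range
 * scan is one seek plus a single sequential read. */
int scan_clustered(int fd, off_t first_match, Record *out, int n) {
    ssize_t got = pread(fd, out, (size_t)n * sizeof(Record), first_match);
    return got < 0 ? -1 : (int)(got / (ssize_t)sizeof(Record));
}

/* Non-clustered: the index yields scattered offsets, so every matching
 * record costs its own random read. */
int scan_nonclustered(int fd, const off_t *offsets, int n, Record *out) {
    for (int i = 0; i < n; i++) {
        if (pread(fd, &out[i], sizeof(Record), offsets[i]) != (ssize_t)sizeof(Record))
            return -1;
    }
    return n;
}
```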
Indexes occupy storage space and slow down writes (every insert/update must maintain the index). In exchange, they dramatically accelerate reads. The decision to create an index is always a trade-off: Is the read speedup worth the space overhead and write slowdown for this particular access pattern?
The simplest form of indexing uses a hash table: compute a hash function on the search key, and use the result to determine where the record pointer is stored.
How hash indexes work:
Hash function: h(key) → bucket number (integer in range 0 to B-1, where B = number of buckets)
Bucket: Each bucket stores entries for keys that hash to that bucket number. Buckets may be fixed-size slots in the index file, in-memory arrays, or chains that spill into overflow buckets when they fill up.
Lookup: Compute h(key), go to that bucket, find the entry, follow pointer to data
Insert: Compute h(key), add entry to appropriate bucket (handle overflow if needed)
Strengths of hash indexes: O(1) average-case lookups, a simple structure, and low per-lookup cost when keys hash uniformly across buckets.
Weaknesses of hash indexes: no support for range queries, prefix matching, or ordered iteration; lookups degrade toward O(n) when many keys collide in one bucket; and growing the bucket array generally requires expensive rehashing (unless extendible or linear hashing is used).
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>     /* pread, pwrite, off_t */

/**
 * Simple on-disk hash index implementation
 * Demonstrates the core concepts of hash-based file indexing
 */

#define BUCKET_COUNT 1024
#define ENTRIES_PER_BUCKET 16

typedef struct {
    char key[32];           // Search key
    off_t offset;           // File offset of data record
} HashEntry;

typedef struct {
    HashEntry entries[ENTRIES_PER_BUCKET];
    int count;
    off_t overflow_ptr;     // Pointer to overflow bucket (0 if none)
} HashBucket;

typedef struct {
    int index_fd;           // File descriptor for index file
    int data_fd;            // File descriptor for data file
    int bucket_count;
} HashIndex;

// Simple djb2 hash function
unsigned long hash_key(const char *key) {
    unsigned long hash = 5381;
    int c;
    while ((c = *key++)) {
        hash = ((hash << 5) + hash) + c;
    }
    return hash;
}

// Lookup: Find record offset by key
off_t hash_index_lookup(HashIndex *idx, const char *key) {
    // 1. Compute hash and determine bucket number
    unsigned long bucket_num = hash_key(key) % idx->bucket_count;

    // 2. Read the bucket from index file
    HashBucket bucket;
    off_t bucket_offset = bucket_num * sizeof(HashBucket);
    pread(idx->index_fd, &bucket, sizeof(bucket), bucket_offset);

    // 3. Search within the bucket
    for (int i = 0; i < bucket.count; i++) {
        if (strcmp(bucket.entries[i].key, key) == 0) {
            return bucket.entries[i].offset;  // Found!
        }
    }

    // 4. Check overflow bucket if it exists
    if (bucket.overflow_ptr != 0) {
        // Follow the overflow chain here
        // (simplified: in production, iterate through overflow buckets)
        return -1;  // Not found (simplified)
    }

    return -1;  // Key not found
}

// Insert: Add a key-offset pair to the index
int hash_index_insert(HashIndex *idx, const char *key, off_t offset) {
    unsigned long bucket_num = hash_key(key) % idx->bucket_count;

    HashBucket bucket;
    off_t bucket_offset = bucket_num * sizeof(HashBucket);
    pread(idx->index_fd, &bucket, sizeof(bucket), bucket_offset);

    // Check if bucket has room
    if (bucket.count < ENTRIES_PER_BUCKET) {
        // Add entry to bucket (null-terminate the copied key)
        strncpy(bucket.entries[bucket.count].key, key, 31);
        bucket.entries[bucket.count].key[31] = '\0';
        bucket.entries[bucket.count].offset = offset;
        bucket.count++;

        // Write bucket back
        pwrite(idx->index_fd, &bucket, sizeof(bucket), bucket_offset);
        return 0;
    }

    // Bucket full: would need overflow handling
    // (allocate overflow bucket, link from parent)
    return -1;  // Overflow not implemented in simplified example
}

/*
 * Performance characteristics:
 *
 * - Lookup: O(1) average, O(n) worst case (all keys in one bucket)
 * - Insert: O(1) average, O(1) amortized with rehashing
 * - Range query: O(n) - must scan all buckets
 * - Space: O(n) entries + O(B) buckets overhead
 *
 * Real implementations use:
 * - Extendible hashing (grows without full rehash)
 * - Linear hashing (incremental growth)
 * - Cuckoo hashing (guaranteed O(1) lookup)
 */
```

Use hash indexes when: (1) you only need equality lookups (key = value), (2) keys are uniformly distributed, (3) the number of keys is relatively stable. Avoid them for range queries, prefix matching, ordered iteration, or highly dynamic key sets.
The B-tree (and its variant, the B+tree) is the dominant index structure in databases, file systems, and storage engines worldwide. It provides logarithmic lookup time while supporting range queries and maintaining sorted order.
Why B-trees dominate: they keep keys in sorted order (so one structure serves both point lookups and range scans), they remain shallow even for billions of keys, and their node size can be matched to the disk page size so each I/O retrieves many keys at once.
B-tree structure:
A B-tree of order m has these properties: each node holds at most m children (m - 1 keys), every non-root node stays at least half full, all leaves sit at the same depth, and the keys within each node are kept in sorted order.
B+tree enhancement:
Databases typically use B+trees, where all record pointers live in the leaf level, internal nodes hold only keys for routing, and the leaves are chained together so a range scan can walk sequentially from one leaf to the next.
Example structure with m=4:
```
B+Tree of order 4 (max 3 keys per node):

                   [30 | 60 |   ]                    ← Root (internal)
                  /       |       \
                 /        |        \
      [10|20|  ]     [40|50|  ]     [70|80|90]       ← Internal nodes
      /  |   \       /   |   \      /  |  |  \
     ↓   ↓    ↓     ↓    ↓    ↓    ↓   ↓  ↓   ↓
  [L1] [L2] [L3]  [L4] [L5] [L6]  [L7][L8][L9][L10]  ← Leaf nodes

Leaf nodes contain:
- Actual key values
- Pointers to data records
- Link to next leaf (for range scans)

Example leaf node L4:
+-------------------------+
| 31 → offset 4500        |
| 35 → offset 2200        |
| 38 → offset 9100        |
| next_leaf → L5          |
+-------------------------+

To find key 35:
1. Start at root [30|60]
2. 35 > 30 and 35 < 60, go to middle child [40|50]
3. 35 < 40, go to leftmost child (L4)
4. Linear search in L4, find 35 → offset 2200
5. Seek to offset 2200 in data file

Total I/Os: 3 (root page + internal page + leaf page)
For 1 billion records, tree height ≈ 5 → only 5 I/Os per lookup!
```

B-tree performance analysis:
For n records with nodes holding m keys each, the tree height is roughly log_m(n), so a lookup reads O(log_m n) pages; inserts and deletes touch the same number of pages plus the occasional node split or merge.
Practical numbers:
| Records | Keys/Node | Height | Lookups to Find Any Key |
|---|---|---|---|
| 1,000 | 100 | 2 | 2 page reads |
| 1,000,000 | 100 | 4 | 4 page reads |
| 1,000,000,000 | 100 | 5 | 5 page reads |
With 100+ keys per node (typical for 4KB pages with small keys), a B-tree covering a billion records is only 5 levels deep. The root and upper levels are usually cached in RAM, so practical lookups often require just 1-2 disk reads.
Binary search trees have O(log₂ n) height; B-trees have O(log_m n) where m might be 100+. This dramatically reduces tree height and thus disk reads. A BST with 1 billion nodes has height ≈30, requiring 30 disk reads per lookup. A B-tree needs only 5. Since disk I/O is ~10,000x slower than RAM access, this difference is existential.
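A quick back-of-the-envelope check of those heights (a standalone sketch; real heights also depend on node fill factors):

```c
/* Compile with -lm */
#include <math.h>
#include <stdio.h>

/* Smallest height h such that m^h >= n, i.e. ceil(log_m(n)). */
static int tree_height(double n, double m) {
    return (int)ceil(log(n) / log(m));
}

int main(void) {
    printf("Binary tree (m=2):   1e9 keys -> %d levels\n", tree_height(1e9, 2));    /* ~30 */
    printf("B-tree      (m=100): 1e9 keys -> %d levels\n", tree_height(1e9, 100));  /* ~5  */
    return 0;
}
```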
The Log-Structured Merge-tree (LSM tree) is an alternative index structure optimized for write-heavy workloads. It's the foundation of modern storage engines like LevelDB, RocksDB, Cassandra, and many others.
The problem with B-trees for writes:
B-tree inserts cause random I/O: each insert must locate the leaf page that owns the key and rewrite that page in place, and successive inserts typically touch pages scattered all over the file.
On HDDs, random writes are catastrophically slow. Even on SSDs, random small writes cause write amplification.
The LSM approach:
Instead of updating in place, LSM trees buffer recent writes in an in-memory table (the memtable), flush it to disk as a sorted, immutable file once it fills, and merge (compact) those files in the background.
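A minimal sketch of that write path, assuming a small sorted array as the memtable and a plain binary dump as the flushed run; real engines use skiplists, proper SSTable formats, and a write-ahead log:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MEMTABLE_MAX 4   /* tiny on purpose; real memtables hold megabytes */

typedef struct { char key[32]; char value[32]; } KV;
typedef struct { KV entries[MEMTABLE_MAX]; int count; int run_no; } Memtable;

/* Flush: the memtable is already sorted, so one sequential write produces
 * a sorted run on disk. Compaction would later merge runs together. */
static void flush_memtable(Memtable *mt) {
    char path[64];
    snprintf(path, sizeof(path), "run-%04d.sst", mt->run_no++);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    write(fd, mt->entries, mt->count * sizeof(KV));   /* sequential I/O */
    close(fd);
    mt->count = 0;
}

/* Write path: insert into the sorted in-memory table; flush when full. */
void lsm_put(Memtable *mt, const char *key, const char *value) {
    if (mt->count == MEMTABLE_MAX) flush_memtable(mt);
    int pos = mt->count;
    while (pos > 0 && strcmp(mt->entries[pos - 1].key, key) > 0) {
        mt->entries[pos] = mt->entries[pos - 1];      /* keep keys sorted */
        pos--;
    }
    strncpy(mt->entries[pos].key, key, sizeof(mt->entries[pos].key) - 1);
    mt->entries[pos].key[sizeof(mt->entries[pos].key) - 1] = '\0';
    strncpy(mt->entries[pos].value, value, sizeof(mt->entries[pos].value) - 1);
    mt->entries[pos].value[sizeof(mt->entries[pos].value) - 1] = '\0';
    mt->count++;
}
```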
LSM structure:
```
LSM Tree Structure:

┌─────────────────────────────────────────────────┐
│ MEMORY                                          │
│  ┌─────────────────────────────────────────┐    │
│  │ Memtable (red-black tree or skiplist)   │    │
│  │ Holds recent writes, sorted by key      │    │
│  │ Write: O(log n)   Read: O(log n)        │    │
│  └─────────────────────────────────────────┘    │
│               ↓ (when full, flush to L0)        │
├─────────────────────────────────────────────────┤
│ DISK                                            │
│                                                 │
│  Level 0:  [SST1] [SST2] [SST3]                 │
│            (recently flushed, may overlap)      │
│               ↓ (compact to L1)                 │
│                                                 │
│  Level 1:  [ SST-A ] [ SST-B ] [ SST-C ]        │
│            (non-overlapping, 10x size of L0)    │
│               ↓ (compact to L2)                 │
│                                                 │
│  Level 2:  [ SST-X ] [ SST-Y ] [ SST-Z ]        │
│            (10x size of L1)                     │
│               ↓                                 │
│  Level N:  [...largest files...]                │
└─────────────────────────────────────────────────┘

SSTable (Sorted String Table) format:
┌─────────────────────────────────────────────────┐
│ [Data Block 0] [Data Block 1] ... [Data Block N]│
│ [Index Block: key → block offset]               │
│ [Bloom Filter: fast negative lookups]           │
│ [Footer: metadata and offsets]                  │
└─────────────────────────────────────────────────┘

Read path for key K:
1. Check memtable (RAM) — O(log n)
2. Check bloom filters for each L0 SSTable
3. Binary search in L0 SSTables that might contain K
4. Check bloom filter for L1, binary search, etc.
5. Continue down levels until found or confirmed absent

Write path for key K:
1. Write to memtable — O(log n)
2. Done! (Flush and compaction are background)

This is why LSM is fast for writes!
```

LSM vs. B-tree trade-offs:
| Aspect | B-Tree | LSM Tree |
|---|---|---|
| Write pattern | Random I/O | Sequential I/O |
| Write throughput | Lower (seeks) | Higher (batched) |
| Write amplification | 1-2x | 10-30x |
| Read amplification | 1 (single path) | Multiple levels |
| Point reads | Faster | Slower (check levels) |
| Range scans | Excellent | Good (merge iterators) |
| Space amplification | Lower (~1.5x) | Higher (~2x+) |
| Predictable latency | Yes | No (compaction spikes) |
B-trees excel for read-heavy workloads, predictable latency requirements, and frequently updated data. LSM trees excel for write-heavy workloads, append-mostly data (logs, time series), and flash storage, where their large sequential writes are gentler than many small in-place updates. Many modern databases offer both options (e.g., MySQL InnoDB vs. MyRocks).
So far we've focused on the primary index—usually organized around a unique identifier. But real applications need to search by multiple attributes:
```sql
SELECT * FROM users    WHERE email = 'alice@example.com';
SELECT * FROM products WHERE category = 'electronics' AND price < 500;
SELECT * FROM orders   WHERE customer_id = 42 ORDER BY date DESC;
```
Secondary indexes provide this capability: additional indexes on non-primary columns that point back to the primary record.
Structure of secondary indexes:
A secondary index maps:
secondary_key_value → primary_key (or record pointer)
The indirection through the primary key is common because the data file (or clustered index) can then move records around freely; only the primary index has to track physical locations, and secondary indexes never need rewriting when records move.
Types of secondary index entries:
Storing record pointers directly:
email='alice@example.com' → {offset: 47520, page: 12}

Storing primary keys:

email='alice@example.com' → {user_id: 1001}

Handling non-unique keys:
Unlike primary indexes, secondary indexes often have duplicate keys:
category = 'electronics' → [product_101, product_205, product_892, ...]
Storage options: repeat the key once per matching record, store a posting list (the key followed by all matching record IDs, as shown above), or point to a separate overflow area that holds the list.
```c
/**
 * Secondary index example: Index users by email
 * Primary index:   user_id → user record
 * Secondary index: email → user_id
 */

typedef struct {
    int user_id;
    char email[128];
    char name[64];
    // ... other fields
} User;

// Primary index:       user_id → file offset (B-tree)
// secondary_email_idx: email → user_id (B-tree)

// Find user by email (using secondary index)
User* find_user_by_email(DB *db, const char *email) {
    // Step 1: Look up email in secondary index
    //         Returns the user_id for this email
    int user_id = btree_lookup(db->email_index, email);
    if (user_id < 0) {
        return NULL;  // Email not found
    }

    // Step 2: Look up user_id in primary index
    //         Returns the file offset for this user
    off_t offset = btree_lookup(db->primary_index, user_id);

    // Step 3: Read the actual record
    User *user = malloc(sizeof(User));
    pread(db->data_fd, user, sizeof(User), offset);
    return user;
}

// Insert new user (must update both indexes)
int insert_user(DB *db, User *user) {
    // 1. Write record to data file
    off_t offset = append_record(db->data_fd, user, sizeof(User));

    // 2. Add to primary index: user_id → offset
    btree_insert(db->primary_index, user->user_id, offset);

    // 3. Add to secondary index: email → user_id
    btree_insert(db->email_index, user->email, user->user_id);

    return 0;
}

// Find all users in a city (non-unique secondary index)
// city_index stores: city → list of user_ids
UserList* find_users_by_city(DB *db, const char *city) {
    // Get posting list: all user_ids in this city
    int *user_ids;
    int count = btree_get_posting_list(db->city_index, city, &user_ids);

    UserList *result = create_user_list(count);

    // Fetch each user via primary index
    for (int i = 0; i < count; i++) {
        off_t offset = btree_lookup(db->primary_index, user_ids[i]);
        User user;
        pread(db->data_fd, &user, sizeof(User), offset);
        add_to_list(result, &user);
    }

    free(user_ids);
    return result;
}
```

Every secondary index multiplies write cost. Inserting one user record requires: 1 data write + 1 primary index update + N secondary index updates. With 5 secondary indexes, a single insert becomes 7 writes. This is why database tuning often involves carefully selecting which columns to index.
Let's implement a complete, practical index system for a file of records. We'll use a simplified B-tree-like structure that demonstrates core principles without full B-tree complexity.
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

/**
 * Practical file index implementation
 * Uses a sorted index file with binary search
 * Suitable for read-heavy workloads with batch updates
 */

#define INDEX_ENTRY_SIZE 40
#define KEY_SIZE 32

typedef struct {
    char key[KEY_SIZE];
    off_t offset;
} IndexEntry;

typedef struct {
    int index_fd;          // Index file descriptor
    int data_fd;           // Data file descriptor
    IndexEntry *cache;     // In-memory index cache
    int entry_count;       // Number of entries
    int cache_dirty;       // Needs write-back
} FileIndex;

// Load index into memory
FileIndex* index_open(const char *index_path, const char *data_path) {
    FileIndex *idx = calloc(1, sizeof(FileIndex));
    idx->index_fd = open(index_path, O_RDWR | O_CREAT, 0644);
    idx->data_fd = open(data_path, O_RDWR | O_CREAT, 0644);

    // Determine index size
    off_t size = lseek(idx->index_fd, 0, SEEK_END);
    idx->entry_count = size / sizeof(IndexEntry);

    // Load entire index into RAM
    if (idx->entry_count > 0) {
        idx->cache = malloc(idx->entry_count * sizeof(IndexEntry));
        pread(idx->index_fd, idx->cache, size, 0);
    }

    return idx;
}

// Binary search for key (O(log n))
off_t index_lookup(FileIndex *idx, const char *key) {
    if (idx->entry_count == 0) return -1;

    int left = 0, right = idx->entry_count - 1;
    while (left <= right) {
        int mid = left + (right - left) / 2;
        int cmp = strcmp(idx->cache[mid].key, key);

        if (cmp == 0) {
            return idx->cache[mid].offset;  // Found
        } else if (cmp < 0) {
            left = mid + 1;
        } else {
            right = mid - 1;
        }
    }
    return -1;  // Not found
}

// Range query: find all keys in [start_key, end_key]
int index_range_query(FileIndex *idx, const char *start_key,
                      const char *end_key, off_t *results, int max_results) {
    // Find starting position with binary search
    int left = 0, right = idx->entry_count - 1;
    int start_pos = idx->entry_count;  // Default: past end

    while (left <= right) {
        int mid = left + (right - left) / 2;
        int cmp = strcmp(idx->cache[mid].key, start_key);
        if (cmp >= 0) {
            start_pos = mid;
            right = mid - 1;
        } else {
            left = mid + 1;
        }
    }

    // Scan forward collecting matching entries
    int count = 0;
    for (int i = start_pos; i < idx->entry_count && count < max_results; i++) {
        if (strcmp(idx->cache[i].key, end_key) > 0) {
            break;  // Past end of range
        }
        results[count++] = idx->cache[i].offset;
    }
    return count;
}

// Insert new entry (maintains sorted order)
int index_insert(FileIndex *idx, const char *key, off_t offset) {
    // Find insertion point
    int pos = 0;
    while (pos < idx->entry_count && strcmp(idx->cache[pos].key, key) < 0) {
        pos++;
    }

    // Expand cache
    idx->cache = realloc(idx->cache,
                         (idx->entry_count + 1) * sizeof(IndexEntry));

    // Shift entries to make room
    memmove(&idx->cache[pos + 1], &idx->cache[pos],
            (idx->entry_count - pos) * sizeof(IndexEntry));

    // Insert new entry (zero first so the key is always null-terminated)
    memset(&idx->cache[pos], 0, sizeof(IndexEntry));
    strncpy(idx->cache[pos].key, key, KEY_SIZE - 1);
    idx->cache[pos].offset = offset;
    idx->entry_count++;
    idx->cache_dirty = 1;

    return 0;
}

// Flush index to disk
void index_sync(FileIndex *idx) {
    if (!idx->cache_dirty) return;

    // Rewrite entire index file (simple but not optimal)
    ftruncate(idx->index_fd, 0);
    pwrite(idx->index_fd, idx->cache,
           idx->entry_count * sizeof(IndexEntry), 0);
    fsync(idx->index_fd);
    idx->cache_dirty = 0;
}

// Close and cleanup
void index_close(FileIndex *idx) {
    index_sync(idx);
    close(idx->index_fd);
    close(idx->data_fd);
    free(idx->cache);
    free(idx);
}
```

This simplified implementation loads the entire index into RAM and rewrites the whole index file on every sync.
Production systems use: (1) Page-based B-trees written incrementally, (2) Write-ahead logging for crash recovery, (3) Memory-mapped files for efficient large indexes, (4) Bloom filters to avoid unnecessary disk reads for absent keys.
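Of those production techniques, the bloom filter is the easiest to sketch: a compact bit array that can answer "definitely absent" without touching the index or data file. A minimal version follows; the sizes and hash choice are illustrative assumptions.

```c
#include <stdint.h>

#define BLOOM_BITS   8192          /* 1 KiB of bits */
#define BLOOM_HASHES 4

/* Caller must zero-initialize the filter before use. */
typedef struct { uint8_t bits[BLOOM_BITS / 8]; } Bloom;

/* 64-bit FNV-1a, salted so we get several roughly independent hashes. */
static uint64_t bloom_hash(const char *key, uint64_t seed) {
    uint64_t h = 0xcbf29ce484222325ULL ^ seed;
    while (*key) {
        h ^= (uint8_t)*key++;
        h *= 0x100000001b3ULL;
    }
    return h;
}

void bloom_add(Bloom *bf, const char *key) {
    for (int i = 0; i < BLOOM_HASHES; i++) {
        uint64_t bit = bloom_hash(key, (uint64_t)i) % BLOOM_BITS;
        bf->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns 0 only if the key was definitely never added; 1 means "maybe",
 * and the caller must still consult the real index (false positives). */
int bloom_maybe_contains(const Bloom *bf, const char *key) {
    for (int i = 0; i < BLOOM_HASHES; i++) {
        uint64_t bit = bloom_hash(key, (uint64_t)i) % BLOOM_BITS;
        if (!(bf->bits[bit / 8] & (uint8_t)(1u << (bit % 8)))) return 0;
    }
    return 1;
}
```

Checked before every on-disk probe, a negative answer skips the I/O entirely, which is how SSTables avoid reading disk for keys that are absent.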
Designing indexes involves balancing competing concerns. Understanding these trade-offs is essential for building efficient storage systems.
1. Space Overhead vs. Read Performance
Indexes consume disk space and memory: each index adds roughly one entry per record (or per block, for sparse indexes), and hot index pages compete with data pages for cache. As a rough example, a 40-byte entry per record over 100 million records is about 4 GB of index before any node or page overhead.
2. Write Performance vs. Read Performance

Every additional index speeds up the queries it serves but adds work to every insert, update, and delete that touches the indexed columns.

3. Consistency vs. Performance

Updating indexes synchronously with the data keeps them always accurate but lengthens every write path; deferring or batching index maintenance is faster but risks briefly stale indexes and more complex crash recovery.

4. Memory vs. Disk

Keeping an index (or at least its upper levels) in RAM makes lookups nearly free, while purely on-disk indexes conserve memory at the cost of extra I/O per lookup.
| Scenario | Recommended Structure | Reason |
|---|---|---|
| Point lookups only | Hash index | O(1) lookup, simple |
| Range queries needed | B-tree / B+tree | Sorted, efficient range scans |
| Write-heavy workload | LSM tree | Batched sequential writes |
| Read-heavy, rarely updated | Sorted array + binary search | Simple, minimal overhead |
| Full-text search | Inverted index | Maps terms to documents |
| Geospatial data | R-tree / Quadtree | 2D/3D range queries |
| Time-series data | Time-partitioned + B-tree | Partition pruning + indexing |
You can always make reads faster by adding more indexes—until you can't. At some point, the write overhead of maintaining indexes exceeds the read benefit. Finding the right balance requires understanding your actual query patterns and measuring performance empirically.
We've explored indexed file access—the crucial technique that transforms key-based lookups from linear scans into efficient targeted retrieval. Let's consolidate the key concepts:

- An index is an auxiliary map from search keys to record locations, trading space and write overhead for much faster reads.
- Indexes are classified by uniqueness (primary vs. secondary), organization (clustered vs. non-clustered), density (dense vs. sparse), and structure (hash, tree, bitmap).
- Hash indexes give O(1) equality lookups but cannot serve range queries; B-trees and B+trees give logarithmic lookups plus ordered range scans; LSM trees turn random writes into sequential I/O for write-heavy workloads.
- Secondary indexes enable lookups by non-primary attributes, but every extra index adds cost to every write.
- Index selection is always a trade-off that should be driven by the actual query and write patterns, measured empirically.
What's next:
We've covered three distinct access methods: sequential, direct, and indexed. But how do these compare in practice? When should you choose one over another? The next page provides a systematic Access Method Comparison, examining performance characteristics, use cases, and decision criteria for selecting the right approach.
You now possess deep knowledge of indexed file access—from fundamental concepts to B-trees, LSM trees, and secondary indexes. This understanding is essential for database internals, file system design, and building any application that must efficiently locate data by key.