For decades, database systems were designed around a fundamental constraint: disk is slow, but memory is scarce. Every component of traditional DBMS architecture—buffer pools, I/O schedulers, page management, write-ahead logging—exists primarily to minimize the latency penalty of reading from and writing to persistent storage.
But what happens when we remove this constraint entirely?
In-memory databases (IMDBs) represent a paradigm shift in database architecture. By storing the entire dataset in main memory (RAM), these systems bypass the I/O bottleneck that has defined database design for half a century. The result isn't just incremental improvement—it's a fundamental transformation in what databases can achieve.
By the end of this page, you will understand the core concepts of in-memory databases: why RAM-based storage enables dramatic performance gains, how memory hierarchies affect database design, the architectural differences between disk-based and in-memory systems, and the trade-offs that define when in-memory databases are the right choice.
To appreciate why in-memory databases represent such a radical departure, we must first understand the constraint that shaped traditional database design: the I/O wall.
The Disk Access Problem
In a traditional disk-based database, data resides on persistent storage—historically spinning hard disk drives (HDDs), and increasingly solid-state drives (SSDs). When the database needs to read or write data, it must transfer that data between disk and memory. This transfer is extraordinarily slow relative to CPU operations.
| Storage Type | Typical Latency | Relative Speed | Capacity (Typical Server) |
|---|---|---|---|
| CPU L1 Cache | ~1 nanosecond | 1x (baseline) | 64-128 KB |
| CPU L2 Cache | ~4 nanoseconds | 4x slower | 256 KB - 1 MB |
| CPU L3 Cache | ~12 nanoseconds | 12x slower | 8-64 MB |
| Main Memory (RAM) | ~100 nanoseconds | 100x slower | 64-2048 GB |
| NVMe SSD | ~10-100 microseconds | 10,000-100,000x slower | 1-32 TB |
| SATA SSD | ~100 microseconds | 100,000x slower | 1-16 TB |
| HDD (7200 RPM) | ~5-10 milliseconds | 5,000,000-10,000,000x slower | 1-18 TB |
If an L1 cache access took one second, reading from RAM would take about 100 seconds, reading from an SSD would take anywhere from a few hours to a full day, and reading from a spinning hard drive would take two to four months. This astronomical spread in access times is why disk I/O has historically been the dominant bottleneck in database performance.
How Traditional Databases Cope
Because disk access is so expensive, traditional DBMS architectures invest enormous complexity in minimizing I/O: buffer pools keep hot pages cached in RAM, write-ahead logging turns random writes into sequential appends, I/O schedulers reorder and batch disk requests, and page-oriented layouts align data with disk blocks.
All of this complexity exists for one reason: to hide the latency of disk from the application. But what if we eliminated the underlying problem instead?
An in-memory database (IMDB), also called a main memory database system (MMDB), is a database management system that primarily relies on main memory for data storage, rather than disk-based storage mechanisms.
Core Principle
The fundamental idea is deceptively simple: if all data resides in RAM, we never wait for disk I/O during query execution. Every data access becomes a memory reference—nanoseconds instead of milliseconds.
Beyond Simple Caching
It's crucial to understand that an in-memory database is not simply a disk-based database with a very large cache. When all data is guaranteed to be in memory, the entire architecture can be redesigned:
No Page Management: Disk-based databases organize data into fixed-size pages (typically 4KB-16KB) optimized for disk block reads. In-memory databases can use arbitrary data structures optimized for pointer traversal and cache locality.
Direct Memory Pointers: Instead of storing disk addresses that must be translated through buffer pool lookups, in-memory databases can use direct memory pointers for relationships between data objects (see the sketch after this list).
Lock-Free Data Structures: Without I/O waits, transactions complete in microseconds. This makes fine-grained locking and lock-free concurrency control practical.
Different Indexing Strategies: B+-trees are designed to minimize tree height (disk accesses). In-memory databases can use T-trees, adaptive radix trees, or cache-conscious structures that exploit CPU cache hierarchies.
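To make the "Direct Memory Pointers" point concrete, here is a minimal C++ sketch (types and field names are illustrative, not drawn from any particular system) contrasting a disk-style page/slot reference with the raw pointer an in-memory engine can use:

```cpp
#include <cstdint>
#include <string>

// Disk-based design: a record refers to another record by page ID and
// slot number; every traversal goes through a buffer-pool lookup that
// may trigger disk I/O.
struct DiskRecordRef {
    uint32_t page_id;  // which page holds the target record
    uint16_t slot;     // which slot within that page
};

// In-memory design: with all data pinned in RAM, a plain pointer is a
// stable reference, so traversing a relationship is a single
// dereference instead of a page-table lookup.
struct Order;

struct Customer {
    uint64_t    id;
    std::string name;
    Order*      latest_order;  // direct memory pointer, no translation
};

struct Order {
    uint64_t  id;
    double    total;
    Customer* customer;  // back-pointer, also a raw dereference
};
```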
In 2000, RAM cost approximately $1,000 per GB; by 2024 it had dropped below $5 per GB, a 200x reduction. Meanwhile, the datasets behind many critical applications (OLTP systems, session stores, real-time analytics) fit comfortably in the memory of a modern server (256 GB-2 TB). This economic shift has made in-memory databases practical for workloads where they were previously unaffordable.
Understanding in-memory databases requires appreciating the memory hierarchy of modern computer systems. While "in-memory" sounds uniform, the reality is a complex hierarchy with its own performance cliffs.
The CPU Cache Hierarchy
Modern CPUs don't access main memory directly. Instead, data flows through multiple levels of cache, each faster and smaller than the next:
┌─────────────────────────────────────────────────────────────────┐
│ CPU CORE                                                        │
│ ┌─────────────────────────────────────────────────────────┐     │
│ │ Registers (< 1 KB) - Access: ~0.25 ns                   │     │
│ └─────────────────────────────────────────────────────────┘     │
│ ┌─────────────────────────────────────────────────────────┐     │
│ │ L1 Cache (64-128 KB) - Access: ~1 ns                    │     │
│ │ • Split: L1d (Data) + L1i (Instructions)                │     │
│ │ • Per-core, very fast, very small                       │     │
│ └─────────────────────────────────────────────────────────┘     │
│ ┌─────────────────────────────────────────────────────────┐     │
│ │ L2 Cache (256 KB - 1 MB) - Access: ~4 ns                │     │
│ │ • Per-core, larger than L1                              │     │
│ └─────────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│ L3 Cache (8-64 MB) - Access: ~12 ns                             │
│ • Shared across all cores                                       │
│ • Last-level cache (LLC) before main memory                     │
└─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│ Main Memory (DRAM) - Access: ~100 ns                            │
│ • 64 GB - 2 TB+ capacity                                        │
│ • Accessed via memory controller                                │
│ • The "in-memory" in in-memory databases                        │
└─────────────────────────────────────────────────────────────────┘

Cache-Conscious Database Design
In-memory databases must be designed with cache behavior in mind. A naive implementation that stores all data in RAM but ignores cache effects will leave most of the potential performance gains on the table.
Key Cache Optimization Principles:
Cache Line Awareness: CPUs fetch data in 64-byte cache lines. Data structures should be organized so that accessing one element brings related elements into cache.
Sequential Access Patterns: Sequential memory access is significantly faster than random access, even in RAM. Prefetchers can predict and preload sequential data.
Data Structure Layout: Traditional pointer-heavy structures (linked lists, tree nodes with child pointers) cause cache misses. Flat, array-based structures are dramatically faster (see the sketch after this list).
NUMA Considerations: On multi-socket systems, memory access latency varies depending on which socket owns the memory. In-memory databases must be NUMA-aware.
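The layout principle is easiest to see in code. The sketch below (illustrative names, not a real engine) contrasts an array-of-structs layout, where summing one field drags entire records through cache, with a struct-of-arrays layout that scans a single contiguous array:

```cpp
#include <cstdint>
#include <vector>

// Array-of-structs: summing salaries touches every field of every
// record, so most of each 64-byte cache line fetched is wasted.
struct EmployeeRow {
    uint64_t id;
    char     name[40];
    uint32_t age;
    double   salary;
};

double sum_salaries_aos(const std::vector<EmployeeRow>& rows) {
    double total = 0.0;
    for (const auto& r : rows) total += r.salary;  // strided access
    return total;
}

// Struct-of-arrays: salaries are contiguous, so every byte fetched is
// useful and the hardware prefetcher streams the array sequentially.
struct EmployeeColumns {
    std::vector<uint64_t> id;
    std::vector<double>   salary;  // one cache line holds 8 doubles
};

double sum_salaries_soa(const EmployeeColumns& cols) {
    double total = 0.0;
    for (double s : cols.salary) total += s;  // pure sequential scan
    return total;
}
```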
With disk I/O eliminated, the next bottleneck becomes memory bandwidth: the rate at which data can be transferred between RAM and CPU. A modern server might offer 100+ GB/s, yet analytical workloads that scan large datasets can still saturate it; a full scan of a 200 GB table at 100 GB/s takes two seconds no matter how fast the CPU is. In-memory databases must therefore be designed to minimize data movement and maximize computation per byte transferred.
In-memory databases aren't monolithic—they span multiple architectural approaches, each optimized for different workloads. Understanding these paradigms is essential for selecting and designing IMDB systems.
1. Row-Oriented In-Memory Databases
These systems store data row-by-row, similar to traditional OLTP databases, but entirely in memory. They excel at transactional workloads where queries access complete records.
Examples: VoltDB, MemSQL (now SingleStore), Oracle TimesTen
Row Store Memory Layout:

┌──────────────────────────────────────────────────────────────┐
│ Row 1: [ID₁ | Name₁ | Age₁ | Salary₁ | Dept₁ | ...]          │
├──────────────────────────────────────────────────────────────┤
│ Row 2: [ID₂ | Name₂ | Age₂ | Salary₂ | Dept₂ | ...]          │
├──────────────────────────────────────────────────────────────┤
│ Row 3: [ID₃ | Name₃ | Age₃ | Salary₃ | Dept₃ | ...]          │
├──────────────────────────────────────────────────────────────┤
│ ...                                                          │
└──────────────────────────────────────────────────────────────┘

Advantages:
• Reading a complete row requires a single contiguous memory access
• Excellent for INSERTs and UPDATEs of complete records
• Natural fit for transactional workloads

Disadvantages:
• Analytical queries (e.g., SUM(Salary)) read unneeded columns
• Poor cache utilization for columnar aggregations

2. Column-Oriented In-Memory Databases
Column stores organize data by column rather than by row. Each column is stored as a contiguous array in memory. This layout provides dramatic performance improvements for analytical queries.
Examples: SAP HANA (columnar tables), ClickHouse, Apache Druid
Column Store Memory Layout:

┌──────────────────────────────────────────────────┐
│ ID Column:     [ID₁ | ID₂ | ID₃ | ID₄ | ...]     │
└──────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────┐
│ Name Column:   [Name₁ | Name₂ | Name₃ | ...]     │
└──────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────┐
│ Age Column:    [Age₁ | Age₂ | Age₃ | ...]        │
└──────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────┐
│ Salary Column: [Sal₁ | Sal₂ | Sal₃ | ...]        │
└──────────────────────────────────────────────────┘

Advantages:
• SUM(Salary) reads only the Salary column - perfect cache utilization
• Excellent compression (similar values cluster together)
• SIMD operations process 4-8+ values simultaneously
• Vectorized query execution

Disadvantages:
• Reconstructing full rows requires reading all columns
• Point updates may touch multiple memory locations
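A hedged sketch of why this layout pays off (illustrative types, not any specific engine): a filtered aggregate touches only the columns it names, and each column is a dense array the compiler can auto-vectorize:

```cpp
#include <cstddef>
#include <vector>

// Columnar table fragment: each column is its own contiguous array.
// All column vectors are assumed to have the same length (one entry
// per row); Name/Dept columns would live in arrays of their own.
struct EmployeeColumnStore {
    std::vector<double> salary;
    std::vector<int>    age;
};

// SELECT SUM(salary) WHERE age < 30: scans exactly two column arrays;
// the other columns are never brought into cache, and the tight loop
// over fixed-width values is amenable to SIMD auto-vectorization.
double sum_young_salaries(const EmployeeColumnStore& t) {
    double total = 0.0;
    for (std::size_t i = 0; i < t.age.size(); ++i)
        if (t.age[i] < 30) total += t.salary[i];
    return total;
}
```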
3. Hybrid Row-Column Stores
Modern in-memory databases increasingly support both layouts, choosing the optimal representation for each workload.
Examples: SAP HANA, Oracle Database In-Memory, SingleStore
Hybrid Approach: a common pattern keeps recent writes in a row-oriented (or lightly compressed) delta store for fast inserts and updates, then periodically merges them into a read-optimized, compressed columnar main store. Other systems maintain a columnar copy of row-organized tables and route each query to whichever representation serves it better.
4. Key-Value In-Memory Stores
The simplest in-memory data model: a hash table or tree mapping keys to values.
Examples: Redis, Memcached, Aerospike
Characteristics: O(1) point lookups and writes by key; a deliberately minimal API (get/put/delete) with no joins or ad hoc queries; values are either opaque blobs (Memcached) or simple typed structures such as lists, sets, and sorted sets (Redis). The simplicity is the point: less work per operation translates into very high throughput per node.
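A toy sketch of the core key-value model (a hash table guarded for concurrent access; this is illustrative, not how Redis or Memcached are actually implemented):

```cpp
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Minimal in-memory key-value store: the entire "database" is one
// hash table, so every operation is an O(1) expected-time probe.
class KvStore {
public:
    void put(const std::string& key, std::string value) {
        std::lock_guard<std::mutex> g(mu_);
        map_[key] = std::move(value);
    }
    std::optional<std::string> get(const std::string& key) const {
        std::lock_guard<std::mutex> g(mu_);
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }
    bool erase(const std::string& key) {
        std::lock_guard<std::mutex> g(mu_);
        return map_.erase(key) > 0;
    }
private:
    mutable std::mutex mu_;
    std::unordered_map<std::string, std::string> map_;
};
```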
Without disk I/O constraints, in-memory databases can choose data structures based purely on CPU and cache efficiency. This has led to the adoption—and invention—of specialized structures.
Index Structures
While B+-trees remain common (and still work well), several alternatives emerged specifically for in-memory scenarios:
| Structure | Characteristics | Best For |
|---|---|---|
| T-Trees | Binary search tree variant; stores multiple keys per node; an early index design built specifically for main memory | Legacy IMDB systems |
| Adaptive Radix Trees (ART) | Radix tree with adaptive node sizes; O(k) lookup where k is key length; cache-efficient | Variable-length keys, range scans |
| Hash Indexes | O(1) average lookup; no range query support; highly cache-efficient | Point queries, exact matches |
| Masstree | Trie of B+-trees; combines prefix compression with cache efficiency | Concurrent workloads, mixed key types |
| Bw-Trees | Lock-free B+-tree variant; uses logical pointers; designed for modern CPUs | High-concurrency OLTP |
| Skip Lists | Probabilistic balanced structure; lock-free variants available | Concurrent sorted maps |
Cache-Conscious Data Layouts
Traditional data structures often perform poorly on modern CPUs because they ignore cache behavior. In-memory databases employ several techniques to maximize cache utilization:
1. Cache-Sensitive B+-Trees (CSB+-Trees)
Standard B+-trees store pointers to child nodes. CSB+-trees instead store all children of a node contiguously in memory, requiring only a single pointer per node. This dramatically improves cache utilization during tree traversal.
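A structural sketch of the CSB+-tree idea (field names and sizes are illustrative): because all children of a node sit contiguously, one pointer plus an offset reaches any child, and the reclaimed space holds more keys per cache line:

```cpp
#include <cstdint>

// Cache-sensitive B+-tree internal node (sketch). A standard B+-tree
// node stores one child pointer per key; a CSB+-tree node stores ONE
// pointer to a contiguous block of children and addresses child i as
// first_child + i. The space saved holds extra keys, so each cache
// line fetched during traversal yields more comparisons.
struct CsbNode {
    static constexpr int kMaxKeys = 14;  // sized so a node spans ~2 cache lines

    uint16_t  num_keys;
    int64_t   keys[kMaxKeys];
    CsbNode*  first_child;  // children laid out contiguously in memory

    CsbNode* child(int i) const { return first_child + i; }  // no pointer array
};
```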
2. Column Compression
Columnar data can be heavily compressed because similar values are stored together: common schemes include dictionary encoding, run-length encoding, delta encoding, and bit-packing.
Compression reduces memory footprint and improves performance by increasing the amount of data that fits in cache.
Dictionary Encoding Example:

Original Column (Country):
["USA", "Canada", "USA", "USA", "Germany", "Canada", "USA", ...]

Dictionary:
0 → "Canada"
1 → "Germany"
2 → "USA"

Encoded Column (integers):
[2, 0, 2, 2, 1, 0, 2, ...]

Benefits:
• Storage: "USA" (3 bytes) → 2 (two bits can represent values 0-2)
• Comparison: Integer comparison instead of string comparison
• Cache: More values fit in each cache line
• SIMD: Process multiple encoded values in parallel

Modern in-memory databases process data in batches (vectors) of values rather than one tuple at a time. Combined with SIMD instructions (SSE, AVX), this allows processing 4-16 values with a single CPU instruction. Columnar layout enables this naturally: a column is already a vector of values ready for parallel processing.
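A minimal, illustrative dictionary encoder tying these two ideas together (simplified: real systems typically sort the dictionary to preserve range predicates and bit-pack the codes):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Encodes a string column as (dictionary, integer codes). Scans and
// comparisons then run over small fixed-width integers instead of
// variable-length strings.
struct DictColumn {
    std::vector<std::string> dictionary;  // code -> original value
    std::vector<uint32_t>    codes;       // one code per row
};

DictColumn dictionary_encode(const std::vector<std::string>& column) {
    DictColumn out;
    std::unordered_map<std::string, uint32_t> lookup;
    for (const auto& value : column) {
        auto [it, inserted] =
            lookup.emplace(value, static_cast<uint32_t>(out.dictionary.size()));
        if (inserted) out.dictionary.push_back(value);
        out.codes.push_back(it->second);
    }
    return out;
}

// Predicate "country == 'USA'" becomes one integer comparison per row,
// a tight loop over the codes array that compilers can vectorize.
std::size_t count_equal(const DictColumn& col, const std::string& value) {
    uint32_t target = UINT32_MAX;  // sentinel: matches no valid code
    for (uint32_t i = 0; i < col.dictionary.size(); ++i)
        if (col.dictionary[i] == value) { target = i; break; }
    std::size_t n = 0;
    for (uint32_t code : col.codes) n += (code == target);
    return n;
}
```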
The absence of disk I/O fundamentally changes concurrency control. In traditional databases, I/O wait times dwarf lock hold times, so coarse-grained locking has minimal impact. In in-memory databases, transactions complete in microseconds—making lock contention the primary scalability bottleneck.
The Serial Execution Approach
Some in-memory databases (notably VoltDB and Redis) take a radical approach: execute all transactions serially on a single thread. This eliminates concurrency control overhead entirely.
Advantages:
• No locks, latches, or deadlocks: concurrency-control overhead is zero
• Serializability holds by construction, with no validation or aborts
• Deterministic execution simplifies replication and recovery

Disadvantages:
• Throughput is capped at one CPU core (per partition)
• One long-running transaction stalls every transaction behind it
• Cross-partition transactions require expensive coordination
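The execution model itself is tiny. Here is a hedged sketch of a single-threaded command loop in the spirit of Redis or a single VoltDB partition (names are illustrative; real systems add networking, scheduling, and durability):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// All transactions funnel into one queue and execute one at a time on
// one thread. No transaction ever observes another mid-flight, so no
// locks, latches, or validation are needed inside the transactions
// themselves (the mutex below only guards the intake queue).
class SerialExecutor {
public:
    SerialExecutor() : worker_([this] { run(); }) {}
    ~SerialExecutor() {
        { std::lock_guard<std::mutex> g(mu_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    void submit(std::function<void()> txn) {
        { std::lock_guard<std::mutex> g(mu_); queue_.push(std::move(txn)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lk(mu_);
            cv_.wait(lk, [this] { return done_ || !queue_.empty(); });
            if (queue_.empty()) return;  // shutting down and drained
            auto txn = std::move(queue_.front());
            queue_.pop();
            lk.unlock();
            txn();  // exactly one transaction runs at any moment
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool done_ = false;
    std::thread worker_;
};
```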
Optimistic Concurrency Control (OCC)
OCC assumes conflicts are rare. Transactions execute without acquiring locks and validate at commit time: during the read phase, a transaction runs against the data while recording the version of every item it reads; at commit, a validation phase confirms that none of those items has since changed; if validation succeeds, the writes are installed atomically, and if it fails, the transaction aborts and retries.
In-memory databases favor OCC because it eliminates lock wait time, and the fast memory access means the validation phase is cheap.
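A miniature of version-based OCC validation (illustrative; real engines also track write sets and make the validate-then-install step atomic):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Each record carries a version counter bumped on every committed write.
struct Record {
    std::atomic<uint64_t> version{0};
    int64_t value{0};
};

// A transaction remembers the version of each record it read.
struct ReadEntry {
    Record*  record;
    uint64_t seen_version;
};

// Validation: the transaction may commit only if nothing it read has
// been modified in the meantime. No locks were held while it executed.
bool validate(const std::vector<ReadEntry>& read_set) {
    for (const auto& e : read_set)
        if (e.record->version.load(std::memory_order_acquire) != e.seen_version)
            return false;  // conflict detected: abort and retry
    return true;
}
```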
Multi-Version Concurrency Control (MVCC)
MVCC maintains multiple versions of each data item, allowing readers to see consistent snapshots without blocking writers: every write creates a new version stamped with the writer's commit timestamp, the versions of a record are chained from newest to oldest, and a reader follows the chain to the newest version whose timestamp does not exceed its own read timestamp. Versions no longer visible to any active transaction are garbage-collected.
MVCC is particularly powerful for in-memory databases because version chains can be traversed quickly in memory, and the read-write separation eliminates most conflicts.
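A minimal visibility check under those assumptions (a singly linked version chain with commit timestamps; names are illustrative), which the chain diagram below then walks through:

```cpp
#include <cstdint>

// One version of a record; newer versions point to older ones.
struct Version {
    uint64_t commit_ts;  // timestamp of the writing transaction
    int64_t  salary;
    Version* prev;       // next-older version, or nullptr
};

// A reader with read timestamp `read_ts` walks the chain from newest
// to oldest and takes the first version it is allowed to see. Writers
// prepend new versions; readers never block them.
const Version* visible_version(const Version* newest, uint64_t read_ts) {
    for (const Version* v = newest; v != nullptr; v = v->prev)
        if (v->commit_ts <= read_ts)
            return v;            // newest version within the snapshot
    return nullptr;              // record did not exist at read_ts
}
```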
MVCC Version Chain Example:

Record: Employee #1001

┌─────────────────────────────────────────────────────────────────┐
│ Latest Version (timestamp: 105)                                 │
│ { id: 1001, name: "Alice", salary: 95000 }                      │
│ ← Written by Transaction 105                                    │
└─────────────────────────────────────────────────────────────────┘
              │ (prev pointer)
              ▼
┌─────────────────────────────────────────────────────────────────┐
│ Previous Version (timestamp: 98)                                │
│ { id: 1001, name: "Alice", salary: 90000 }                      │
│ ← Written by Transaction 98                                     │
└─────────────────────────────────────────────────────────────────┘
              │ (prev pointer)
              ▼
┌─────────────────────────────────────────────────────────────────┐
│ Original Version (timestamp: 42)                                │
│ { id: 1001, name: "Alice", salary: 85000 }                      │
│ ← Written by Transaction 42                                     │
└─────────────────────────────────────────────────────────────────┘

A transaction with read timestamp 100 sees salary = 90000; one with read timestamp 110 sees salary = 95000. Both proceed without blocking.

When locking is necessary, in-memory databases use lightweight primitives: spin locks instead of OS mutexes (avoiding context-switch overhead), lock-free data structures built on atomic compare-and-swap operations, and hardware transactional memory (HTM) on supporting CPUs (Intel TSX, IBM POWER). These techniques reduce lock overhead from microseconds to nanoseconds.
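A hedged sketch of the first two primitives, a spin lock on an atomic flag and a lock-free update via compare-and-swap (illustrative, not any engine's actual code):

```cpp
#include <atomic>
#include <cstdint>

// Spin lock: a waiting thread burns a few nanoseconds spinning instead
// of paying microseconds for an OS-level sleep and context switch.
// Only sensible when critical sections are extremely short.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait; on x86 a pause instruction would go here
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Lock-free update: retry a compare-and-swap until our increment wins.
// No thread ever blocks, so there is no lock to contend for at all.
void add_to_balance(std::atomic<int64_t>& balance, int64_t delta) {
    int64_t cur = balance.load(std::memory_order_relaxed);
    while (!balance.compare_exchange_weak(
        cur, cur + delta,
        std::memory_order_acq_rel, std::memory_order_relaxed)) {
        // cur was reloaded with the latest value; retry with it
    }
}
```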
In-memory databases aren't universally superior to disk-based systems. They involve trade-offs that make them ideal for specific scenarios while less suitable for others.
Ideal Use Cases
In-memory databases shine when the working set fits in RAM and latency or throughput dominates: high-throughput OLTP, session and application-state stores, caching layers in front of slower systems, and real-time analytics over hot data.
The term 'in-memory database' can be misleading. Production in-memory databases are NOT merely volatile caches—they include persistence mechanisms to survive restarts and failures. We'll explore these persistence options in detail later in this module. For now, understand that 'in-memory' refers to the primary data access path, not the absence of durability.
We've explored the foundational concepts of in-memory databases. Let's consolidate the key insights:
• Disk I/O shaped half a century of database architecture; removing it invalidates many traditional design decisions, from buffer pools to page-oriented storage.
• "In-memory" is not uniform: performance now hinges on the CPU cache hierarchy, sequential access patterns, NUMA placement, and memory bandwidth.
• Layout follows workload: row stores for transactional access, column stores (with compression and SIMD) for analytics, hybrid and key-value designs in between.
• Freed from disk, indexes and concurrency control are rebuilt around cache efficiency and microsecond transactions: ART, CSB+-trees, serial execution, OCC, MVCC, and lock-free primitives.
• "In-memory" describes the primary access path, not the durability story; production systems still persist data.
What's Next:
With the conceptual foundation established, we'll next quantify the performance benefits that make in-memory databases compelling. We'll examine benchmark results, real-world latency measurements, and the mathematical basis for performance gains across different workload types.
You now understand the fundamental concepts underlying in-memory databases: the storage hierarchy problem that motivated their development, the architectural changes they enable, and the scenarios where they deliver transformative value. Next, we'll examine the quantified performance benefits in detail.