Main memory, commonly referred to as RAM (Random Access Memory), is the primary workspace of a computer system. It holds the operating system kernel, running applications, and their data during execution. While caches provide speed and storage provides capacity, main memory occupies the critical middle ground—large enough to hold active workloads, fast enough to keep the CPU reasonably fed.
From an operating system perspective, main memory is one of the most precious and carefully managed resources. The OS must:
- track which physical page frames are free, in use, or reserved;
- allocate frames to processes and to the kernel itself;
- reclaim memory when it runs low (trimming the page cache, swapping);
- place memory sensibly on NUMA systems.
Understanding how RAM works at the hardware level is foundational to understanding all of these OS functions.
By the end of this page, you will understand: how DRAM technology works at the cell level; how memory is organized into channels, DIMMs, ranks, and banks; the evolution of DDR memory standards; memory timing and access patterns; memory controllers and their scheduling algorithms; and how the OS views and manages physical memory.
Main memory in nearly all modern computers uses DRAM (Dynamic Random Access Memory), a technology that stores each bit as a charge in a tiny capacitor. Understanding DRAM's fundamental operation explains many of its performance characteristics and why it behaves differently from caches.
The DRAM cell:
Each DRAM cell consists of:
- one access transistor, which connects the cell to its bitline when the row's wordline is asserted; and
- one capacitor, whose charge (present or absent) stores the bit.
This "1T1C" design is extremely compact—the smallest practical memory cell—which is why DRAM achieves high densities at low cost. Compare this to SRAM (cache), which uses 6 transistors per bit.
The problem with capacitors:
Capacitors leak charge over time. Left alone, a DRAM cell would lose its stored value within milliseconds. This creates two critical consequences:
- every row must be periodically rewritten (refreshed), typically every 32-64 ms, which is what makes this memory "dynamic"; and
- refresh consumes power and bandwidth, since a bank cannot serve requests while it is being refreshed.
Sensing the bit:
DRAM cells are arranged in a 2D grid of rows and columns. Reading a cell involves:
1. precharging the bitlines to a reference voltage;
2. activating the target row (asserting its wordline), which shares each cell's tiny charge onto its bitline;
3. letting the sense amplifiers detect and amplify the resulting voltage swing, latching the entire row; and
4. selecting the desired columns from the latched row and driving them onto the data pins.
This process takes tens of nanoseconds—orders of magnitude slower than an SRAM read. The sense amplifiers act as a row buffer, holding the entire activated row (typically 8KB) for subsequent column accesses.
When a row is activated, subsequent accesses to different columns within that row are much faster (column access time ~15ns) than accesses requiring a new row activation (row access time ~30-50ns). This is row buffer locality—a critical consideration for memory-efficient code and OS page allocation.
| Characteristic | DRAM | SRAM |
|---|---|---|
| Transistors per bit | 1 | 6 |
| Density | Very high | Low |
| Cost per bit | Low ($) | High ($$$) |
| Speed | ~10-20 ns | ~1-2 ns |
| Power (static) | Low (but refresh) | Higher (leakage) |
| Volatility | Volatile | Volatile |
| Refresh required | Yes (every 32-64 ms) | No |
| Typical use | Main memory | Cache memory |
Modern memory systems are hierarchically organized to maximize bandwidth and parallelism while managing physical constraints. Understanding this organization is essential for understanding memory performance characteristics.
Memory hierarchy (from CPU outward):
Memory controller (on the CPU die) → memory channels → DIMMs (memory modules) → ranks (sets of chips that respond together) → DRAM chips → banks → rows and columns of cells.
Address mapping example:
When the memory controller receives a physical address, it decodes it into:
- channel select bits,
- rank select bits,
- bank group and bank select bits,
- row address bits, and
- column address bits (see the decoding sketch below).
The exact bit positions vary by system and can significantly impact performance. Interleaving lower address bits across channels/banks improves parallelism for sequential accesses.
Modern memory controllers exploit bank-level parallelism: while one bank is activating a row (slow), another bank can be serving a read from its already-activated row (fast). Address mappings that spread sequential accesses across banks achieve higher bandwidth than those that concentrate accesses in one bank.
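To make the decoding concrete, here is a minimal sketch of one possible mapping. The field widths and bit positions are illustrative assumptions (two channels, 4 bank groups of 4 banks, 8 KB rows), not the layout of any particular controller — real controllers often also hash the bank bits to avoid pathological conflicts.

```c
#include <stdint.h>
#include <stdio.h>

// Illustrative physical-address decoding (field widths are assumptions, not a
// real controller's layout). The low 6 bits select the byte within a 64-byte
// cache line; the next bits interleave consecutive lines across channels and
// bank groups so that sequential streams spread out.
typedef struct {
    unsigned channel, bank_group, bank, rank, column, row;
} dram_addr_t;

static dram_addr_t decode(uint64_t paddr) {
    dram_addr_t d;
    uint64_t line = paddr >> 6;               // 64-byte cache-line index
    d.channel    = line & 0x1;  line >>= 1;   // 1 bit  -> 2 channels
    d.bank_group = line & 0x3;  line >>= 2;   // 2 bits -> 4 bank groups
    d.bank       = line & 0x3;  line >>= 2;   // 2 bits -> 4 banks per group
    d.column     = line & 0x7F; line >>= 7;   // 7 bits -> 128 lines per 8 KB row
    d.rank       = line & 0x1;  line >>= 1;   // 1 bit  -> 2 ranks
    d.row        = (unsigned)line;            // remaining bits select the row
    return d;
}

int main(void) {
    // Consecutive cache lines land on alternating channels, so a sequential
    // stream uses both channels (and, further up, several banks) in parallel.
    for (uint64_t a = 0x40000000; a < 0x40000100; a += 64) {
        dram_addr_t d = decode(a);
        printf("%#llx -> ch %u bg %u bank %u row %u col %u\n",
               (unsigned long long)a, d.channel, d.bank_group, d.bank, d.row, d.column);
    }
    return 0;
}
```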
| Component | Typical Values | Purpose |
|---|---|---|
| Channels per DIMM | 2 | DDR5 splits each DIMM into 2 independent 32-bit channels |
| Ranks per channel | 1-2 | More ranks = more capacity but also more command bus contention |
| Banks per rank | 32 (8 bank groups × 4 banks) | More banks = more parallelism |
| Row buffer size | 8 KB per bank | Larger = more row buffer hits |
| Chip width | ×8 typical | Wider chips = fewer chips per rank |
DDR SDRAM (Double Data Rate Synchronous DRAM) is the dominant memory technology in computers. "Double data rate" means data is transferred on both the rising and falling edges of the clock signal, effectively doubling bandwidth compared to single data rate (SDR) memory at the same clock frequency.
Each DDR generation roughly doubles bandwidth through higher clock speeds and architectural improvements, while the underlying DRAM cell technology remains similar.
| Generation | Data Rate (MT/s) | Voltage | Bandwidth (per channel) | Key Features | Era |
|---|---|---|---|---|---|
| DDR | 200-400 | 2.5V | 1.6-3.2 GB/s | Double data rate, prefetch 2n | 2000-2003 |
| DDR2 | 400-1066 | 1.8V | 3.2-8.5 GB/s | Prefetch 4n, higher density | 2003-2008 |
| DDR3 | 800-2133 | 1.5V | 6.4-17 GB/s | Prefetch 8n, lower power | 2007-2015 |
| DDR4 | 1600-3200 | 1.2V | 12.8-25.6 GB/s | Bank groups, higher density | 2014-present |
| DDR5 | 3200-8800+ | 1.1V | 25.6-70+ GB/s | Dual channel per DIMM, on-DIMM power management, 32 banks | 2021-present |
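The per-channel bandwidth figures above follow directly from the transfer rate: a standard channel is 64 bits (8 bytes) wide, so peak bandwidth is simply transfers per second × 8 bytes (DDR5 splits the DIMM into two 32-bit subchannels, but the total width per DIMM is unchanged). A quick sketch of the arithmetic, using example speed grades from the table:

```c
#include <stdio.h>

// Peak bandwidth of a 64-bit (8-byte wide) channel: transfers/sec x 8 bytes.
// The speed grades below are examples matching the table above.
int main(void) {
    const char *grade[] = {"DDR-400", "DDR2-1066", "DDR3-2133",
                           "DDR4-3200", "DDR5-6400"};
    double mts[]        = {400, 1066, 2133, 3200, 6400};
    for (int i = 0; i < 5; i++) {
        double gb_per_s = mts[i] * 1e6 * 8.0 / 1e9;   // bytes/sec -> GB/s
        printf("%-10s ~%.1f GB/s per channel\n", grade[i], gb_per_s);
    }
    return 0;
}
```

Real sustained bandwidth is lower than these peaks because of refresh, bank conflicts, and read/write turnaround on the bus.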
Key evolutionary improvements:
Prefetch architecture: Each DDR generation increases the prefetch width—the number of bits fetched from the memory array in a single internal access. DDR5 prefetches 16n bits (16 bits per data pin per access). Higher prefetch enables higher external data rates while keeping internal array speeds manageable.
Bank groups: DDR4 introduced bank groups—clusters of banks that can serve back-to-back requests faster than banks in different groups. This helps maintain high bandwidth for interleaved access patterns.
DDR5 innovations:
- Each DIMM is split into two independent 32-bit subchannels, doubling the number of channels the controller can schedule.
- Power management moves onto the module (an on-DIMM PMIC), improving efficiency and signal integrity.
- 32 banks per rank, up from 16 in DDR4, for more bank-level parallelism.
- On-die ECC inside each DRAM chip to cope with the higher error rates of denser cells (distinct from the full ECC DIMMs discussed later).
While bandwidth has increased ~30× from DDR to DDR5, latency has improved only modestly. DDR5-4800 has similar absolute latency (~14-16 ns first access) to DDR-400. Modern architectures are increasingly bandwidth-limited, not latency-limited. Larger caches and out-of-order execution hide latency; parallelism exploits bandwidth.
DRAM operations are governed by precise timing requirements. Understanding these timings helps explain why memory access patterns dramatically affect performance and why the memory controller is so complex.
The fundamental timing parameters:
- tCL (CAS latency): time from issuing a column read (with the row already open) to the first data appearing on the bus.
- tRCD (RAS-to-CAS delay): time from activating a row until a column command may be issued.
- tRP (row precharge time): time to close (precharge) the currently open row before a different row in the same bank can be activated.
- tRAS (row active time): minimum time a row must stay open before it may be precharged.
Access latency scenarios:
Row buffer hit (best case): the requested row is already latched in the sense amplifiers, so only the column access is paid — tCL, roughly 14-15 ns.
Row buffer miss (row already activated, needs different row): the open row must first be precharged, the new row activated, then the column read issued — tRP + tRCD + tCL, roughly 40-50 ns.
Row buffer closed (no row activated): the bank is idle and already precharged, so the access pays tRCD + tCL, roughly 28-30 ns. (A worked conversion from cycles to nanoseconds follows below.)
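To put rough numbers on these cases, DRAM timings quoted in cycles can be converted to nanoseconds using the I/O clock (half the transfer rate). The sketch below assumes illustrative DDR4-3200 CL22-22-22 timings, not any specific module:

```c
#include <stdio.h>

// Convert DRAM timings (in clock cycles) to nanoseconds and compare the
// three access scenarios. Timings are illustrative (roughly DDR4-3200 CL22).
int main(void) {
    double clock_mhz    = 1600.0;          // DDR4-3200: 3200 MT/s = 1600 MHz clock
    double ns_per_cycle = 1000.0 / clock_mhz;
    int tCL = 22, tRCD = 22, tRP = 22;     // cycles (assumed timings)

    double hit    = tCL * ns_per_cycle;                 // row already open
    double closed = (tRCD + tCL) * ns_per_cycle;        // bank idle, precharged
    double miss   = (tRP + tRCD + tCL) * ns_per_cycle;  // wrong row currently open

    printf("row buffer hit:    %.1f ns\n", hit);        // ~13.8 ns
    printf("row buffer closed: %.1f ns\n", closed);     // ~27.5 ns
    printf("row buffer miss:   %.1f ns\n", miss);       // ~41.2 ns
    return 0;
}
```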
Impact on programming:
Sequential memory access achieves row buffer hits—only paying tCL for each cache line after the first. Random access within a large region pays full tRP + tRCD + tCL for most accesses. This 3-4× latency difference is why access patterns matter so much.
```c
#include <stddef.h>

// Sequential access: achieves row buffer hits
// ~12 GB/s on a typical DDR4 single-channel system
long sequential_sum(int* arr, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += arr[i];              // Sequential: exploits row buffer
    }
    return sum;
}

// Random access: constant row buffer misses
// ~1-2 GB/s on the same system (6-10× slower!)
long random_sum(int* arr, size_t n, size_t* indices) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += arr[indices[i]];     // Random: row miss each time
    }
    return sum;
}

// Strided access: may or may not hit the row buffer
// Stride < 8 KB: stays in the same row; stride >= 8 KB: always misses
long strided_sum(int* arr, size_t n, size_t stride) {
    long sum = 0;
    for (size_t i = 0; i < n; i += stride) {
        sum += arr[i];              // Stride-dependent performance
    }
    return sum;
}
```

The memory controller is the hardware unit that translates CPU memory requests into DRAM commands (activate, precharge, read, write, refresh). Modern memory controllers are integrated into the CPU die and implement sophisticated scheduling algorithms to maximize performance.
Memory controller responsibilities:
- queue and reorder incoming read/write requests;
- translate them into DRAM command sequences while respecting every timing constraint (tRCD, tRP, tCL, tRAS, refresh intervals);
- decide when to keep rows open or close them (row buffer policy);
- schedule the periodic refresh of every row;
- arbitrate fairly among requests from multiple cores and devices.
Command scheduling policies:
FCFS (First Come First Served): Process requests in arrival order. Simple but poor performance—ignores row buffer locality.
FR-FCFS (First Ready - First Come First Served): Prioritize row buffer hits over misses, using FCFS as a tiebreaker. Much better performance, but it can starve requests to other rows if one row stays hot (a simplified sketch of the selection logic follows this list).
ATLAS (Adaptive per-Thread Least-Attained-Service): Tracks how much service each thread has received; prioritizes under-served threads. Improves fairness in multi-core systems.
BLISS (Blacklisting memory scheduler): Identifies and temporarily de-prioritizes memory-hogging threads to prevent them from blocking others. Improves quality of service for latency-sensitive workloads.
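To illustrate the FR-FCFS idea, here is a deliberately simplified model of the per-bank selection step (a sketch only, not how any real controller is built): among pending requests, service the oldest one that hits the currently open row; if there is none, fall back to the oldest request overall.

```c
#include <stddef.h>

// Simplified FR-FCFS selection over one bank's request queue. A request is a
// "row hit" if it targets the row currently open in that bank. This is an
// illustrative model, not a hardware implementation.
typedef struct {
    unsigned row;           // target DRAM row
    unsigned long arrival;  // arrival timestamp (smaller = older)
    int valid;              // slot holds a pending request
} request_t;

// Returns the index of the request to service next, or -1 if the queue is empty.
int fr_fcfs_pick(const request_t *q, size_t n, unsigned open_row, int row_is_open) {
    int oldest_hit = -1, oldest_any = -1;
    for (size_t i = 0; i < n; i++) {
        if (!q[i].valid)
            continue;
        // Track the oldest pending request (the plain FCFS fallback).
        if (oldest_any < 0 || q[i].arrival < q[oldest_any].arrival)
            oldest_any = (int)i;
        // Track the oldest request that would be a row buffer hit ("first ready").
        if (row_is_open && q[i].row == open_row &&
            (oldest_hit < 0 || q[i].arrival < q[oldest_hit].arrival))
            oldest_hit = (int)i;
    }
    return oldest_hit >= 0 ? oldest_hit : oldest_any;
}
```

The starvation risk mentioned above is visible here: as long as hits to the open row keep arriving, an older request to a different row is never chosen, which is why real schedulers add age caps or fairness mechanisms.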
Row buffer policies:
- Open-page policy: leave the row open after an access, betting that the next access to the bank hits the same row. Best for workloads with spatial locality.
- Closed-page policy: precharge immediately after each access so the next access never pays tRP. Better for random traffic.
- Adaptive policies: modern controllers monitor recent hit rates per bank and switch between the two.
The memory controller operates below the OS abstraction layer—the operating system cannot directly control scheduling decisions. However, the OS affects memory behavior through physical page placement, huge pages (which improve row buffer hit rates), and NUMA-aware allocation.
The operating system must manage physical memory as a precious, finite resource. The OS doesn't see DRAM cells and timing parameters—it sees a contiguous range of physical addresses that must be partitioned, tracked, and allocated efficiently.
Physical address space layout:
Not all physical addresses correspond to RAM. The physical address space includes:
- usable RAM regions (often with holes between them);
- memory-mapped I/O regions (device registers, framebuffers, PCIe BARs);
- firmware and option ROM areas; and
- regions reserved by the platform (ACPI tables, SMM memory).
The BIOS/UEFI provides a memory map to the OS at boot time, describing which regions are usable RAM, reserved, or memory-mapped I/O.
Page-based memory management:
The OS manages physical memory in fixed-size units called page frames (typically 4 KB on x86). The page frame allocator tracks which frames are:
- free and available for allocation;
- in use by user processes;
- in use by the kernel (code, data, DMA buffers);
- holding page cache contents; or
- reserved and unusable (firmware regions, device memory).
Key data structures:
- a per-frame metadata array (in Linux, the struct page array, mem_map) recording each frame's state, reference count, and owner;
- free lists grouped by block size, managed by a buddy allocator so that physically contiguous runs can be found quickly; and
- per-zone bookkeeping (DMA, Normal, and so on) reflecting hardware addressing constraints.

A toy illustration of the frame-tracking idea appears below.
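As a sketch of the bookkeeping (not how Linux's buddy allocator actually works), a bitmap allocator keeps one bit per frame and scans for a free one. The sizes and the linear scan are simplifications for clarity.

```c
#include <stdint.h>
#include <stddef.h>

// Toy bitmap page-frame allocator: one bit per 4 KB frame.
// Real kernels use richer structures (e.g. Linux's buddy allocator plus the
// per-frame struct page array); this only shows the bookkeeping idea.
#define FRAME_SIZE  4096UL
#define NUM_FRAMES  (1UL << 20)               /* enough to track 4 GB of RAM */
#define NO_FRAME    ((uint64_t)-1)

static uint8_t frame_bitmap[NUM_FRAMES / 8];  /* bit = 1 -> frame in use */

static int  frame_in_use(size_t f) { return (frame_bitmap[f / 8] >> (f % 8)) & 1; }
static void frame_mark(size_t f)   { frame_bitmap[f / 8] |=  (uint8_t)(1u << (f % 8)); }
static void frame_unmark(size_t f) { frame_bitmap[f / 8] &= (uint8_t)~(1u << (f % 8)); }

// Allocate one free frame; returns its physical address, or NO_FRAME if
// physical memory is exhausted. (Linear scan: simple, not fast.)
uint64_t alloc_frame(void) {
    for (size_t f = 0; f < NUM_FRAMES; f++) {
        if (!frame_in_use(f)) {
            frame_mark(f);
            return (uint64_t)f * FRAME_SIZE;
        }
    }
    return NO_FRAME;
}

// Return a previously allocated frame to the free pool.
void free_frame(uint64_t paddr) {
    frame_unmark((size_t)(paddr / FRAME_SIZE));
}
```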
Modern CPUs support larger page sizes (2 MB and 1 GB on x86-64). Huge pages reduce TLB misses for large allocations and improve row buffer locality (a 2 MB huge page spans ~250 DRAM rows, keeping more accesses on the same row). Databases, VMs, and HPC applications commonly use huge pages for performance.
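On Linux, an application can request huge pages explicitly with mmap's MAP_HUGETLB flag; this succeeds only if hugetlb pages have been reserved (for example via /proc/sys/vm/nr_hugepages), and transparent huge pages may otherwise be applied automatically. A minimal sketch:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

// Map 64 MB backed by 2 MB huge pages. Requires reserved hugetlb pages,
// e.g.: echo 64 > /proc/sys/vm/nr_hugepages (and possibly privileges).
int main(void) {
    size_t len = 64UL << 20;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");   // most likely: no hugetlb pages reserved
        return 1;
    }
    memset(p, 0, len);                 // touch the range so pages are populated
    printf("mapped %zu MB using huge pages at %p\n", len >> 20, p);
    munmap(p, len);
    return 0;
}
```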
Memory pressure and reclamation:
When physical memory runs low, the OS must reclaim pages. Strategies include:
- dropping clean page cache pages (they can be re-read from disk);
- writing back dirty page cache pages, then freeing them;
- swapping out anonymous pages (heap, stack) to swap space;
- compacting memory to create contiguous free blocks; and
- as a last resort, killing a process (the Linux OOM killer).
The OS continuously balances page cache size (for I/O performance) against free memory (for allocation headroom). The kswapd daemon in Linux proactively reclaims pages when free memory drops below thresholds.
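One simple way to observe memory pressure from user space is to read /proc/meminfo: MemAvailable estimates how much memory could be made available without swapping (it includes reclaimable page cache), so it is usually a better indicator than MemFree. A small sketch:

```c
#include <stdio.h>
#include <string.h>

// Print the MemTotal, MemFree and MemAvailable lines from /proc/meminfo.
// MemAvailable accounts for reclaimable page cache, so it reflects how much
// headroom the system really has before reclaim or swapping kicks in.
int main(void) {
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) { perror("/proc/meminfo"); return 1; }
    char line[256];
    while (fgets(line, sizeof line, f)) {
        if (!strncmp(line, "MemTotal:", 9) ||
            !strncmp(line, "MemFree:", 8) ||
            !strncmp(line, "MemAvailable:", 13))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```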
NUMA (Non-Uniform Memory Access) describes architectures where memory access time depends on which processor accesses which memory. In NUMA systems, each processor (or group of processors) has "local" memory that it can access faster than "remote" memory attached to other processors.
Why NUMA exists:
As core counts increased, memory bandwidth became a bottleneck. If all cores share a single memory controller, that controller becomes a chokepoint. NUMA distributes memory controllers across sockets, providing:
- higher aggregate bandwidth (each socket has its own controllers and channels);
- lower latency to local memory; and
- capacity and bandwidth that scale with the number of sockets.
| Access Type | Latency (ns) | Relative Cost | Bandwidth |
|---|---|---|---|
| Local memory (same socket) | ~70-80 | 1x | Full local bandwidth |
| Remote memory (other socket) | ~120-150 | 1.5-2x | Shared interconnect |
| Cross-NUMA write | ~150-200 | 2-2.5x | Often worse than reads |
OS NUMA support:
Operating systems expose NUMA topology to applications and implement NUMA-aware policies:
Allocation policies:
- Local (first-touch): allocate a page on the node of the CPU that first touches it — the default on Linux.
- Interleave: spread pages round-robin across nodes, trading latency for balanced bandwidth.
- Bind / preferred: restrict (or prefer) allocation to specific nodes, set via mbind()/set_mempolicy() or numactl.
Process scheduling:
- The scheduler tries to keep threads running on the node that holds their memory.
- Linux's automatic NUMA balancing periodically samples access patterns and migrates pages (or tasks) to improve locality.
Linux tools:
- `numactl`: Run programs with specific NUMA policies
- `numastat`: Display NUMA memory statistics
- `/sys/devices/system/node/`: NUMA topology information
- `mbind()`, `set_mempolicy()`: System calls for memory policies (a libnuma-based example follows below)

NUMA-unaware applications can suffer severe performance degradation. Common mistakes: allocating all memory on first touch from one thread (so it all lands on one node), spawning threads that access memory allocated by other threads, and reading/writing shared data structures from multiple sockets. Profiling with `perf stat -e numa_hit,numa_miss` reveals NUMA access patterns.
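For programmatic placement, the libnuma library (a wrapper around mbind()/set_mempolicy(); link with -lnuma) is commonly used. The sketch below allocates a buffer on one node and pins the calling thread there; the node number is purely illustrative.

```c
#include <numa.h>     // libnuma; build with: gcc numa_demo.c -lnuma
#include <stdio.h>
#include <string.h>

// Allocate a buffer bound to NUMA node 0 and run the calling thread on that
// node, so the memory it touches stays local. Node 0 is an illustrative choice.
int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }
    int node = 0;
    size_t len = 64UL << 20;                   // 64 MB
    void *buf = numa_alloc_onnode(len, node);  // pages bound to 'node'
    if (!buf) { perror("numa_alloc_onnode"); return 1; }

    numa_run_on_node(node);                    // keep this thread near its memory
    memset(buf, 0, len);                       // first touch happens on 'node'

    printf("allocated %zu MB on node %d (highest node id: %d)\n",
           len >> 20, node, numa_max_node());
    numa_free(buf, len);
    return 0;
}
```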
Memory errors occur in real systems—cosmic rays, electrical noise, manufacturing defects, and aging can all cause bit flips. For systems where reliability matters (servers, storage, scientific computing), ECC (Error Correcting Code) memory provides protection.
Types of memory errors:
- Soft errors: transient bit flips caused by cosmic-ray strikes or electrical noise. The data is corrupted but the hardware is fine; rewriting the location fixes it.
- Hard errors: permanent faults from manufacturing defects or aging, where the same cell, row, or chip fails repeatedly.
How ECC works:
ECC memory uses additional bits to store error detection/correction codes. The most common scheme is SECDED (Single Error Correction, Double Error Detection):
- every 64 bits of data are stored with 8 extra check bits (a 72-bit word; ECC DIMMs carry an extra DRAM chip per rank to hold them);
- on every read, the memory controller recomputes the code and compares it with the stored check bits;
- a single flipped bit produces a syndrome that pinpoints its position, so it is corrected transparently; and
- two flipped bits are detected (and typically reported as a machine-check event) but cannot be corrected.

A toy illustration of the mechanism follows.
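The sketch below shows the mechanism on a toy scale — a SECDED Hamming code over 8 data bits with 4 check bits plus one overall parity bit. Real ECC DIMMs apply the same idea to 64 data bits with 8 check bits; this is an illustration, not the exact code used by any controller.

```c
#include <stdint.h>
#include <stdio.h>

// Toy SECDED Hamming code over 8 data bits. Codeword positions 1..12:
// positions 1, 2, 4, 8 hold check bits; 3, 5, 6, 7, 9, 10, 11, 12 hold data.
// An extra overall-parity bit distinguishes single from double errors.
static const int data_pos[8] = {3, 5, 6, 7, 9, 10, 11, 12};

typedef struct { uint16_t bits; int overall; } codeword_t;  // bits: positions 1..12

static codeword_t encode(uint8_t data) {
    codeword_t c = {0, 0};
    for (int i = 0; i < 8; i++)
        if (data >> i & 1) c.bits |= (uint16_t)(1 << data_pos[i]);
    // Each check bit p makes the parity of all positions containing p even.
    for (int p = 1; p <= 8; p <<= 1) {
        int parity = 0;
        for (int pos = 1; pos <= 12; pos++)
            if ((pos & p) && (c.bits >> pos & 1)) parity ^= 1;
        if (parity) c.bits |= (uint16_t)(1 << p);
    }
    for (int pos = 1; pos <= 12; pos++) c.overall ^= c.bits >> pos & 1;
    return c;
}

// Returns 0 = clean, 1 = single-bit error corrected, 2 = double error detected.
static int decode(codeword_t *c, uint8_t *data_out) {
    int syndrome = 0, parity = 0;
    for (int pos = 1; pos <= 12; pos++)
        if (c->bits >> pos & 1) { syndrome ^= pos; parity ^= 1; }
    int status = 0;
    if (syndrome && parity != c->overall) {      // single error: syndrome = position
        c->bits ^= (uint16_t)(1 << syndrome);
        status = 1;
    } else if (syndrome) {                       // syndrome set but parity consistent
        status = 2;                              // => two bits flipped, uncorrectable
    }
    uint8_t d = 0;
    for (int i = 0; i < 8; i++)
        if (c->bits >> data_pos[i] & 1) d |= (uint8_t)(1 << i);
    *data_out = d;
    return status;
}

int main(void) {
    codeword_t c = encode(0xA7);
    c.bits ^= 1 << 6;                            // simulate a cosmic-ray bit flip
    uint8_t d;
    int status = decode(&c, &d);
    printf("status %d, data %#x\n", status, (unsigned)d);  // status 1, data 0xa7
    return 0;
}
```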
ECC overhead:
- Capacity: 8 check bits per 64 data bits means 12.5% more DRAM, which is why ECC DIMMs are wider (72-bit) and cost more.
- Latency: checking and correcting adds a small delay to every access.
- Platform: the memory controller and motherboard must support ECC; many consumer platforms do not.
Error rates in practice:
Google's study (2009) found:
- correctable error rates in the field were far higher than earlier laboratory estimates;
- roughly a third of machines, and more than 8% of DIMMs, saw at least one correctable error per year;
- error rates correlated strongly with utilization and DIMM age, and only weakly with temperature; and
- hard errors were more common relative to soft errors than previously assumed.
For enterprise workloads and systems with large memory (TB scale), ECC is essential—without it, silent data corruption is statistically likely.
Advanced ECC schemes like Intel's Chipkill or AMD's SDDC (Single Device Data Correction) can correct all errors caused by the complete failure of a single DRAM chip (e.g., an entire ×4 device). This protects against hard failures taking out a whole chip, not just single bit flips.
We've explored main memory in depth—from DRAM cell physics to operating system memory management. Main memory sits at a critical juncture in the memory hierarchy: large enough to hold working data, but slow enough that access patterns dramatically affect performance.
What's next:
With volatile memory covered, we'll now explore secondary storage—the persistent tier of the memory hierarchy. We'll examine storage technologies from magnetic disks to SSDs, their performance characteristics, and how they interface with the operating system through storage drivers and file systems.
You now understand main memory from DRAM physics to OS management. This knowledge is essential for writing memory-efficient software, understanding system performance, and designing operating system memory management subsystems.