Every database deployment is unique. The e-commerce platform processing millions of transactions differs fundamentally from the data warehouse analyzing petabytes of historical data. The real-time gaming leaderboard has nothing in common with the compliance archive storing decades of financial records. Yet each requires storage—and the right storage makes the difference between a system that performs brilliantly and one that struggles under load.
This page synthesizes everything we've learned about the memory hierarchy, performance characteristics, and economics into a practical storage selection framework. You'll learn to analyze workloads systematically, map requirements to storage technologies, navigate common pitfalls, and validate your choices through testing and monitoring.
Storage selection is where theory meets practice. By the end of this page, you will have:
- A systematic workload analysis methodology
- Decision frameworks for storage technology selection
- An understanding of common workload archetypes and their optimal storage configurations
- Practical validation and testing approaches
- The ability to document and justify storage architecture decisions
Before selecting storage, you must understand your workload. Workload analysis systematically characterizes how your database uses storage—what patterns dominate, what performance matters, and what constraints exist.
The Five Dimensions of Workload Analysis:
These are the five sections of the profile template below: access pattern, data volume, performance requirements, availability requirements, and constraints.
Gathering Workload Data:
For existing systems:
| Source | Metrics Available | Collection Method |
|---|---|---|
| Database statistics | Query patterns, buffer pool hits, I/O wait | Built-in views (pg_stat_*, v$sysstat, sys.dm_*) |
| OS-level metrics | Disk IOPS, throughput, latency | iostat, sar, perfmon |
| Storage array | Array-level stats, cache hits | Vendor tools |
| Application logs | Query frequency, response times | Log analysis |
| APM tools | End-to-end latency breakdown | Datadog, New Relic, etc. |
For new systems: estimate from application design targets (expected transaction rates, data volumes, concurrency) and from measurements of comparable existing systems, then revisit once real metrics exist.
The single most important metric to determine is working set size: the amount of data actively accessed. If the working set fits in RAM, storage performance matters less; if it exceeds RAM, storage becomes critical. Measure it by tracking buffer pool evictions and page fault rates under load.
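As a rough sketch of this measurement, the snippet below turns cumulative buffer pool counters into a cache hit ratio. In PostgreSQL such counters come from the `pg_stat_database` view (`blks_hit`, `blks_read`); the numbers used here are purely illustrative.

```python
# Sketch: estimate buffer cache effectiveness from cumulative counters.
# In PostgreSQL these come from pg_stat_database (blks_hit, blks_read);
# the numbers below are illustrative, not real measurements.
blks_hit = 9_200_000      # reads satisfied from the buffer pool
blks_read = 1_300_000     # reads that had to go to storage

hit_ratio = blks_hit / (blks_hit + blks_read)
print(f"cache hit ratio: {hit_ratio:.1%}")

# A sustained hit ratio well below ~99% under steady load suggests the
# working set exceeds the buffer pool, making storage latency critical.
if hit_ratio < 0.99:
    print("working set likely exceeds RAM; storage performance is critical")
```

Sample the counters twice and diff them to get the ratio for a specific interval rather than since server start.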
Workload Profile Template:
Document your workload using a structured format:
=== WORKLOAD PROFILE: [Application Name] ===
1. ACCESS PATTERN
- Read/Write Ratio: ___% read, ___% write
- Random/Sequential: ___% random, ___% sequential
- Average I/O Size: ___ KB (reads), ___ KB (writes)
- IOPS Requirement: ___ read IOPS, ___ write IOPS
- Throughput Requirement: ___ MB/s read, ___ MB/s write
2. DATA VOLUME
- Current Database Size: ___ TB
- Working Set Size: ___ GB
- Annual Growth Rate: ___% / ___ TB/year
- Retention Period: ___ years
3. PERFORMANCE REQUIREMENTS
- Transaction Latency Target: p50 < ___ ms, p99 < ___ ms
- Query Latency Target: p50 < ___ ms, p99 < ___ ms
- Batch Processing Window: ___ hours
4. AVAILABILITY REQUIREMENTS
- Uptime SLA: ___% (e.g., 99.99%)
- RPO: ___ (seconds/minutes/hours)
- RTO: ___ (minutes/hours)
- Redundancy: (none/local RAID/synchronous replica/geo-distributed)
5. CONSTRAINTS
- Budget: $___/year for storage
- Physical: ___ U rack space, ___ kW power
- Regulatory: (HIPAA/PCI-DSS/GDPR/other)
- Integration: Must work with ___
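For a new system, the IOPS and throughput fields of the template can be derived from application-level estimates. The sketch below shows one way to do that; all input numbers are hypothetical and the cache hit ratio is a planning assumption, not a measurement.

```python
# Sketch: derive the profile's IOPS and throughput fields from
# application-level estimates (all numbers hypothetical).
peak_tps = 2_000            # peak transactions per second
reads_per_txn = 10          # logical reads per transaction
writes_per_txn = 3
cache_hit_ratio = 0.95      # assumed fraction of reads served from RAM
io_size_kb = 8              # typical database page size

read_iops = peak_tps * reads_per_txn * (1 - cache_hit_ratio)
write_iops = peak_tps * writes_per_txn   # writes must reach storage
read_mbps = read_iops * io_size_kb / 1024
write_mbps = write_iops * io_size_kb / 1024

print(f"read IOPS: {read_iops:.0f}, write IOPS: {write_iops:.0f}")
print(f"read MB/s: {read_mbps:.1f}, write MB/s: {write_mbps:.1f}")
```

Note how sensitive the read IOPS figure is to the cache hit ratio assumption: dropping it from 95% to 90% doubles the storage read load.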
Once you understand your workload, map its requirements to appropriate storage technologies. This section provides decision rules for each requirement dimension.
Access Pattern Mapping:
| Access Pattern | Recommended Storage | Rationale | Avoid |
|---|---|---|---|
| High random read IOPS | NVMe SSD | 100K+ IOPS, low latency | HDD (limited to ~150 IOPS) |
| High random write IOPS | NVMe SSD with overprovisioning | Sustained random writes need FTL headroom | Consumer SSD (weak write) |
| Sequential read-heavy | HDD acceptable, SSD preferred | Throughput matters more than IOPS | Expensive high-IOPS SSD |
| Sequential write (logs) | NVMe SSD | Durability + low commit latency | HDD (adds commit latency) |
| Mixed random + sequential | SSD with tiering | Handle both patterns well | HDD-only (random kills performance) |
| Read-heavy, write-rare | QLC/TLC SSD or archive | Cost-optimized for reads | Expensive high-endurance SSD |
Volume Mapping:
| Data Size | Working Set | Recommended Approach |
|---|---|---|
| < 100 GB | Fits in RAM | Almost any storage works; RAM dominates |
| 100 GB - 1 TB | Mostly fits in RAM | Fast SSD for buffer misses |
| 1 TB - 10 TB | Partial RAM cache | Tiered SSD with HDD cold tier |
| 10 TB - 100 TB | Small % in RAM | NVMe hot tier, SATA SSD warm, HDD cold |
| 100 TB+ | Massive I/O requirements | Distributed storage, multiple tiers, careful partitioning |
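Tier sizing should account for growth over the planning horizon, not just today's volume. A minimal sketch, with hypothetical inputs:

```python
# Sketch: size SSD/HDD tiers for a tiered deployment (numbers hypothetical).
db_size_tb = 20.0
hot_fraction = 0.15          # share of data that is actively accessed
annual_growth = 0.30
planning_horizon_years = 3

# Compound the growth rate over the horizon, then split by access heat.
future_size = db_size_tb * (1 + annual_growth) ** planning_horizon_years
ssd_tier_tb = future_size * hot_fraction
hdd_tier_tb = future_size - ssd_tier_tb

print(f"projected size: {future_size:.1f} TB")
print(f"SSD hot tier: {ssd_tier_tb:.1f} TB, HDD cold tier: {hdd_tier_tb:.1f} TB")
```

In practice, pad the hot tier: working sets tend to grow faster than planned, and SSDs need free space to sustain write performance.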
Performance Requirement Mapping:
| Latency Requirement | Storage Recommendation | Configuration Notes |
|---|---|---|
| p99 < 1 ms | In-memory + NVMe | Working set must fit in RAM |
| p99 < 10 ms | NVMe SSD | Adequate for most OLTP |
| p99 < 100 ms | SATA SSD | Cost-effective for less demanding |
| p99 < 1 second | HDD acceptable | Analytics, batch processing |
| Throughput > 1 GB/s | NVMe SSD (multiple) | May need RAID or parallel drives |
| Throughput > 10 GB/s | High-end SSD array | Enterprise hardware |
Availability Requirement Mapping:
| Availability Need | Storage Approach |
|---|---|
| Standard (99.9%) | Single node, RAID, regular backups |
| High (99.99%) | Synchronous local replica, fast failover |
| Very High (99.999%) | Synchronous cross-zone replication |
| Mission Critical | Geo-distributed synchronous writes, stretched cluster |
Higher availability requires synchronous replication, which adds latency equal to round-trip time to the replica. Synchronous cross-datacenter replication can add 10-50ms to every commit. Balance availability against latency requirements—async replication is faster but allows data loss.
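The additive latency model described above can be made concrete. This sketch uses hypothetical round-trip times; measure your own with ping or application-level probes.

```python
# Sketch: commit-latency impact of synchronous replication.
# RTT figures are hypothetical placeholders; measure your own.
local_fsync_ms = 0.5         # NVMe log flush
rtt_same_zone_ms = 0.3
rtt_cross_region_ms = 30.0

# Simplified additive model: sync replication adds the round trip
# to the replica on top of the local log flush.
commit_local = local_fsync_ms
commit_sync_zone = local_fsync_ms + rtt_same_zone_ms
commit_sync_region = local_fsync_ms + rtt_cross_region_ms

for label, ms in [("local only", commit_local),
                  ("sync same-zone", commit_sync_zone),
                  ("sync cross-region", commit_sync_region)]:
    print(f"{label}: ~{ms:.1f} ms per commit")
```

The cross-region case shows why geo-synchronous replication is reserved for workloads that can absorb tens of milliseconds per commit.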
Most database workloads fit into recognizable archetypes. Understanding these patterns accelerates storage selection by providing proven configurations as starting points.
Archetype 1: High-Frequency OLTP
Examples: Payment processing, trading systems, real-time gaming
| Component | Recommendation | Rationale |
|---|---|---|
| Buffer Pool | Maximize RAM (80%+ of working set) | Minimize disk access |
| Transaction Log | NVMe SSD, dedicated drive | Commit latency critical |
| Data Files | NVMe SSD | Random read IOPS essential |
| Temp Space | SSD (local if possible) | Sort/hash operations |
| Backups | Secondary storage, HDD/object | Cost-effective, off primary path |
Archetype 2: Data Warehouse / Analytics
Examples: Business intelligence, reporting, historical analysis
| Component | Recommendation | Rationale |
|---|---|---|
| Data Files | Mixed SSD/HDD tiered | Large sequential scans, cost-sensitive |
| Temp Space | NVMe SSD | Heavy sort/hash/aggregation |
| Indexes | SSD preferred | May still use random access |
| Archive | HDD or object storage | Rarely accessed, large volume |
| Working Set | Columnar format with compression | Maximize scan throughput |
Archetype 3: Mixed Workload (HTAP)
Examples: Operational reporting, real-time analytics
| Component | Recommendation | Rationale |
|---|---|---|
| Primary Data | NVMe SSD | Must handle OLTP requirements |
| Read Replicas | SATA SSD | Analytics on replicas reduces primary load |
| Buffer Pool | Based on OLTP working set | Analytics scans may pollute cache—consider separate pool |
| Indexes | SSD | Used by both workloads |
| Strategy | Separate OLTP and analytics pathways | Prevent resource contention |
Archetype 4: Document/Content Store
Examples: CMS, media libraries, document management
| Component | Recommendation | Rationale |
|---|---|---|
| Metadata | SSD | Random lookups by document ID |
| Content (BLOBs) | HDD or object storage | Large objects, sequential access |
| Index | SSD | Search and filtering |
| CDN Cache | Edge SSD | For frequently-accessed content |
| Strategy | Separate metadata from content | Different access patterns |
Archetype 5: Time-Series / IoT
Examples: Sensor data, metrics, monitoring systems
| Component | Recommendation | Rationale |
|---|---|---|
| Recent Data (hot) | NVMe SSD | High write throughput, recent queries |
| Historical (warm) | SATA SSD or HDD | Time-range queries, compacted |
| Archive (cold) | Object storage | Long retention, rarely accessed |
| Pattern | Append-only with time-based partitioning | Matches time-series access |
| Compression | Columnar with time-optimized encoding | 10-20x compression typical |
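Retention sizing across the three tiers follows directly from ingest rate, retention windows, and compression ratio. A sketch with hypothetical figures (the 15x ratio sits in the 10-20x range noted above):

```python
# Sketch: time-series retention sizing across tiers (numbers hypothetical).
ingest_gb_per_day = 50.0
compression_ratio = 15       # columnar, time-optimized encoding
hot_days, warm_days, cold_days = 7, 90, 365 * 3

compressed_per_day = ingest_gb_per_day / compression_ratio
hot_gb = ingest_gb_per_day * hot_days    # hot tier often kept uncompressed
warm_gb = compressed_per_day * warm_days
cold_gb = compressed_per_day * cold_days

print(f"hot (NVMe): {hot_gb:.0f} GB")
print(f"warm (SSD/HDD): {warm_gb:.0f} GB")
print(f"cold (object): {cold_gb / 1024:.1f} TB")
```

The striking result is typical for time-series systems: three years of cold data can cost less to store than one week of hot data, because compression and cheap tiers compound.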
Archetypes provide proven defaults, not rigid rules. Your specific workload may combine elements from multiple archetypes. Use the workload analysis methodology to understand your actual patterns, then adapt archetype configurations to match.
Combining workload analysis with technology mapping yields a structured decision process. Follow this framework to systematically select storage architecture.
Decision Tree for Primary Storage:
Start: What is the primary workload characteristic?
│
├─► High Random IOPS (> 10K/s)
│ │
│ └─► NVMe SSD required
│ ├─► Budget constrained? → SATA SSD (accept lower IOPS)
│ └─► Extreme IOPS (>100K)? → Multiple NVMe or storage array
│
├─► High Throughput (> 500 MB/s)
│ │
│ ├─► Sequential pattern? → HDD RAID or large SSD
│ └─► Random pattern? → Multiple NVMe SSD
│
├─► Large Capacity (> 50 TB)
│ │
│ ├─► Hot data < 20%? → Tiered: SSD hot + HDD cold
│ └─► Mostly random access? → Distributed SSD (expensive)
│
├─► Low Latency (< 1ms p99)
│ │
│ ├─► Working set in RAM? → NVMe SSD for misses
│ └─► Working set > RAM? → Maximize RAM first, then NVMe
│
└─► Cost Constrained
│
├─► Can tolerate latency? → HDD with SSD caching
└─► Need balance? → SATA SSD with HDD archive tier
Storage is one component in a chain. Don't over-invest in storage if the network, CPU, or application code is the bottleneck. Profile the entire stack before concluding storage is limiting. Conversely, premium storage is wasted if the database or application can't utilize it.
Learning from others' mistakes is more efficient than making your own. These pitfalls frequently derail storage architectures.
| Pitfall | Symptom | Root Cause | Correction |
|---|---|---|---|
| Log on HDD | High commit latency, reduced TPS | Even sequential writes incur rotational/seek latency on HDD | Move logs to dedicated SSD |
| Small buffer pool | Low cache hit ratio, high I/O wait | Working set doesn't fit | Increase RAM allocation |
| Single SSD for everything | I/O contention, variable latency | Log, data, temp competing | Separate drives by function |
| No RAID/redundancy | Drive failure = extended downtime | Cost optimization gone wrong | At minimum RAID-1 for critical paths |
| Ignoring IOPS limits | SSD throttles under sustained load | Consumer SSDs slow sharply once their SLC write cache fills | Use enterprise SSDs rated for sustained performance |
Technical debt in storage architecture is expensive to repay. Data migrations, extended downtime, and application changes may be required to fix poor initial decisions. Invest in proper analysis upfront—it's far cheaper than remediation.
Before committing to a storage architecture, validate it through testing. Paper designs often reveal surprises when subjected to real workloads.
Testing Methodology: benchmark in two stages. First run synthetic I/O tests to establish raw device capability, then database-level benchmarks against realistic data volumes and concurrency.
Key Benchmarks to Run:
fio (Flexible I/O Tester):
# Random read test (iodepth raised so the device queue is actually exercised)
fio --name=randread --ioengine=libaio --direct=1 --rw=randread \
--bs=4k --iodepth=32 --numjobs=16 --size=10G --runtime=60 \
--time_based --group_reporting
# Random write test
fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
--bs=4k --iodepth=32 --numjobs=16 --size=10G --runtime=60 \
--time_based --group_reporting
# Mixed 70/30 read/write
fio --name=mixed --ioengine=libaio --direct=1 --rw=randrw --rwmixread=70 \
--bs=4k --iodepth=32 --numjobs=16 --size=10G --runtime=60 \
--time_based --group_reporting
Database Benchmark (PostgreSQL example):
# Initialize benchmark database
pgbench -i -s 100 testdb
# Run benchmark
pgbench -c 20 -j 4 -T 300 testdb
Metrics to Capture:
| Metric | Target Range | What It Tells You |
|---|---|---|
| IOPS (4K random) | Compare to spec | Raw drive capability |
| Latency p50/p99 | Per requirements | Consistency under load |
| Throughput | Compare to spec | Sequential bandwidth |
| I/O wait (CPU %) | < 10% | If high, storage is the bottleneck |
| Queue depth | Below saturation | Optimal concurrency level |
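Whichever benchmark produces them, raw latency samples must be reduced to the p50/p99 figures the table compares against. A minimal sketch with synthetic samples:

```python
# Sketch: reduce raw latency samples (ms) to p50/p99 percentiles.
# The sample list here is synthetic, standing in for benchmark output.
import statistics

samples = [0.4] * 90 + [2.0] * 9 + [25.0]   # 100 synthetic latencies, ms

# statistics.quantiles with n=100 yields the 99 percentile cut points.
cuts = statistics.quantiles(samples, n=100)
p50, p99 = cuts[49], cuts[98]
print(f"p50 = {p50:.1f} ms, p99 = {p99:.1f} ms")
```

Note how a single 25 ms outlier barely moves p50 but dominates p99; this is why latency targets are stated as percentiles rather than averages.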
Empty databases and unrealistic datasets produce misleading results. Use production-like data volumes, realistic query patterns, and representative concurrency levels. If possible, use anonymized production data for testing.
Storage architecture decisions should be documented, justified, and governed. This creates organizational knowledge, facilitates reviews, and enables future evolution.
Architecture Decision Record (ADR) Template:
=== STORAGE ARCHITECTURE DECISION RECORD ===
Title: [Descriptive title, e.g., "Primary Storage for Order Processing System"]
Date: [Decision date]
Status: [Proposed | Accepted | Superseded]
Deciders: [Names of decision makers]
## Context
[Describe the workload, requirements, and constraints that led to this decision]
## Decision
[State the storage architecture chosen]
## Rationale
[Explain WHY this architecture was selected, with reference to:
- Workload analysis results
- Requirement mapping
- Alternative options considered and rejected
- Cost analysis
- Testing results]
## Consequences
[Both positive and negative impacts of this decision:
- Expected performance characteristics
- Cost implications
- Operational complexity
- Growth limitations
- Risks accepted]
## Metrics and Validation
[How will success be measured? What metrics to monitor?]
## Review Schedule
[When will this decision be re-evaluated?]
Governance Practices:
| Practice | Description | Benefit |
|---|---|---|
| Architecture Review Board | Peer review of storage decisions | Catch issues, share knowledge |
| Capacity Review (quarterly) | Compare actual vs projected usage | Catch deviations early |
| Performance Baseline | Document expected performance at deployment | Enable regression detection |
| Cost Tracking | Monthly cost per workload | Identify optimization opportunities |
| Technology Refresh Planning | Evaluate new storage tech annually | Avoid obsolescence, capture improvements |
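The cost-tracking practice above reduces to a simple recurring calculation per workload. A sketch; the $/GB-month figures are illustrative placeholders, not vendor quotes.

```python
# Sketch: monthly storage cost per workload across tiers.
# Unit prices are illustrative placeholders, not real quotes.
tiers = {
    "NVMe hot":   {"gb": 2_000,  "usd_per_gb_month": 0.20},
    "SATA warm":  {"gb": 10_000, "usd_per_gb_month": 0.08},
    "HDD/object": {"gb": 60_000, "usd_per_gb_month": 0.02},
}

total = 0.0
for name, t in tiers.items():
    cost = t["gb"] * t["usd_per_gb_month"]
    total += cost
    print(f"{name}: ${cost:,.2f}/month")
print(f"total: ${total:,.2f}/month")
```

Tracked monthly, this number makes tier-migration opportunities visible: a warm tier growing faster than the hot tier often signals data that should age out to the cold tier sooner.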
Monitoring Post-Deployment:
After deployment, continuous monitoring validates that the storage architecture performs as expected:
- Latency percentiles (p50/p99) against the targets in the workload profile
- IOPS and throughput against the capability measured during testing
- Buffer pool hit ratio, to detect working set growth
- Capacity consumption against growth projections
- Storage cost per workload
Storage architecture is not set-and-forget. Workloads evolve, data grows, new technologies emerge, costs change. Plan for periodic review and evolution. A decision that was optimal two years ago may be suboptimal today.
Selecting the right storage architecture is a principled process, not guesswork. This page developed a complete methodology: analyze the workload, map requirements to technologies, start from a proven archetype, validate through testing, and govern the decision over time.
The Module Complete:
This concludes the Storage Hierarchy module. You now possess a systematic workload analysis methodology, decision frameworks for technology selection, a library of workload archetypes, and practical validation and governance techniques.
These concepts form the foundation for the remaining storage and indexing chapters. Next, you'll dive into Disk Structure—exploring the physical organization of data on persistent storage and how databases leverage disk characteristics for optimal performance.
Congratulations! The fundamental storage concepts you now understand underpin all database system design and will inform every subsequent topic: disk organization, buffer management, index structures, and query optimization.