RAID was designed to solve two problems: reliability (surviving disk failures) and performance (exceeding single-disk capabilities). While reliability is often the primary motivation for deploying RAID, performance characteristics frequently determine whether a storage system meets application requirements.
Performance in RAID is not a single number—it's a multi-dimensional space defined by throughput (MB/s), IOPS (I/O operations per second), latency (response time), and how these metrics change under different workloads and failure conditions. Understanding these dimensions is essential for capacity planning, array design, and troubleshooting bottlenecks.
By the end of this page, you will understand how to model RAID throughput and IOPS for different configurations, identify performance bottlenecks in storage systems, predict performance degradation during disk failures and rebuilds, optimize RAID configuration for specific workload patterns, and apply queuing theory concepts to understand latency behavior.
Before analyzing RAID-specific performance, we must establish a foundation in storage performance metrics. These metrics describe different aspects of storage system capability:
Throughput (Bandwidth)
Measured in MB/s (megabytes per second) or GB/s. Throughput represents the raw data transfer rate of the storage system. It's the primary metric for sequential workloads like video streaming, backup, or large file transfers.
IOPS (Input/Output Operations Per Second)
The number of discrete read or write operations completed per second. IOPS is the primary metric for random-access workloads like database transactions, email servers, or virtualization. A workload may be IOPS-bound even when throughput is low.
Latency (Response Time)
The time between issuing an I/O request and receiving the response. Components of latency include queue wait time, seek and rotational delay (for HDDs), data transfer time, and controller overhead.
The Relationship Between Metrics:
These metrics are interrelated through a fundamental equation:
$$\text{Throughput} = \text{IOPS} \times \text{I/O Size}$$
For example, 1,000 IOPS at a 64 KB I/O size yields 1,000 × 64 KB ≈ 64 MB/s, while the same 64 MB/s could also come from 16,000 IOPS at 4 KB.
Little's Law for Storage:
Queuing theory provides insight into latency:
$$\text{Average Queue Length} = \text{Arrival Rate} \times \text{Average Wait Time}$$
Or equivalently: $$L = \lambda \times W$$
Combined with the queuing behavior of a busy device, this means that as utilization increases, queue lengths and latencies grow non-linearly: at 70% utilization, average latency is roughly 3.3× the service time; at 90%, approximately 10× the service time.
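A minimal sketch of both relationships in Python, using an illustrative 5 ms service time (the function names are ours, not from any benchmarking library); it reproduces the multipliers quoted above:

```python
def littles_law_outstanding_ios(arrival_rate_iops: float, avg_response_s: float) -> float:
    """L = lambda * W: average number of I/Os in the system (queued + in service)."""
    return arrival_rate_iops * avg_response_s

def mm1_response_time(service_time_s: float, utilization: float) -> float:
    """M/M/1 approximation: response time grows as 1 / (1 - utilization)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1 - utilization)

# Illustrative numbers: a disk with a 5 ms service time
service = 0.005
for rho in (0.5, 0.7, 0.9, 0.95):
    resp = mm1_response_time(service, rho)
    # At utilization rho, the arrival rate is rho / service_time
    in_flight = littles_law_outstanding_ios(rho / service, resp)
    print(f"util={rho:.0%}  response={resp*1000:.1f} ms  avg I/Os in system={in_flight:.1f}")
```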
| Media Type | Sequential Read | Random Read IOPS | Random Write IOPS | Latency |
|---|---|---|---|---|
| 7200 RPM HDD | 150-200 MB/s | 80-150 | 80-150 | 4-8 ms |
| 10K RPM HDD | 200-250 MB/s | 150-200 | 150-200 | 3-5 ms |
| 15K RPM HDD | 250-300 MB/s | 200-300 | 200-300 | 2-4 ms |
| SATA SSD | 500-550 MB/s | 50K-100K | 30K-70K | 0.1-0.2 ms |
| NVMe SSD | 3-7 GB/s | 500K-1M | 300K-700K | 0.02-0.1 ms |
Real workloads are mixtures of reads and writes, sequential and random access, and various I/O sizes. Characterizing your workload (read/write ratio, random/sequential ratio, I/O size distribution) is the first step in predicting RAID performance.
Throughput in RAID arrays depends on the ability to parallelize I/O across multiple disks. Let's analyze each RAID level:
RAID 0 Throughput:
With n disks, theoretical maximum: $$T_{RAID0} = n \times T_{disk}$$
For large sequential I/O that spans all disks, this linear scaling is achievable. With four 150 MB/s HDDs: $$T_{RAID0} = 4 \times 150 = 600 \text{ MB/s}$$
RAID 1 Throughput:
Reads: Can load-balance across mirrors $$T_{RAID1,read} = n \times T_{disk}$$ (for n-way mirror)
Writes: Must write to all mirrors; limited by slowest $$T_{RAID1,write} = T_{disk}$$
For a 2-way mirror: 2× read throughput, 1× write throughput.
RAID 5 Throughput:
Reads: All data disks contribute $$T_{RAID5,read} = n \times T_{disk}$$
Full-stripe writes: Very efficient $$T_{RAID5,fullstripe} = (n-1) \times T_{disk}$$
Random small writes: Severely limited by read-modify-write $$T_{RAID5,random\ write} = \frac{n \times T_{disk}}{4}$$
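That 4× factor comes from the read-modify-write sequence: read the old data block, read the old parity, write the new data, write the new parity. The following toy sketch shows the parity arithmetic behind it, using plain Python bytes rather than real disk I/O:

```python
def update_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """P_new = P_old XOR D_old XOR D_new -- the core of a RAID 5 small write."""
    return bytes(p ^ d_old ^ d_new for p, d_old, d_new in zip(old_parity, old_data, new_data))

# Toy 4-byte "blocks" on three data disks plus parity
d0, d1, d2 = b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"
parity = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))

# Small write to d1: two reads (old d1, old parity) + two writes (new d1, new parity)
new_d1 = b"\xff\xee\xdd\xcc"
parity = update_parity(d1, new_d1, parity)
d1 = new_d1

# The updated parity still equals the XOR of all current data blocks
assert parity == bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))
```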
RAID 6 Throughput:
Similar to RAID 5, but with dual parity overhead:
Reads: All disks contribute $$T_{RAID6,read} = n \times T_{disk}$$
Full-stripe writes: $$T_{RAID6,fullstripe} = (n-2) \times T_{disk}$$
Random small writes: Limited by a 6× read-modify-write penalty $$T_{RAID6,random\ write} = \frac{n \times T_{disk}}{6}$$
RAID 10 Throughput:
Reads: All disks can contribute (mirror pairs share load) $$T_{RAID10,read} = n \times T_{disk}$$
Writes: Limited by mirror pair throughput $$T_{RAID10,write} = \frac{n}{2} \times T_{disk}$$
For 8 disks (4 mirror pairs) at 150 MB/s each: reads reach 8 × 150 = 1,200 MB/s, while writes reach 4 × 150 = 600 MB/s.
```python
def calculate_raid_throughput(
    num_disks: int,
    disk_throughput_mbs: float,
    raid_level: str,
    workload: str = "sequential_read"
) -> float:
    """
    Calculate theoretical RAID throughput.

    Args:
        num_disks: Total number of drives in the array
        disk_throughput_mbs: Sequential throughput of single disk in MB/s
        raid_level: One of "0", "1", "5", "6", "10"
        workload: "sequential_read", "sequential_write", "random_write"

    Returns:
        Theoretical throughput in MB/s
    """
    T = disk_throughput_mbs
    n = num_disks

    if raid_level == "0":
        # All stripes contribute for both reads and writes
        return n * T
    elif raid_level == "1":
        if workload == "sequential_read":
            return 2 * T  # 2-way mirror
        else:
            return T  # Writes go to both mirrors
    elif raid_level == "5":
        if workload == "sequential_read":
            return n * T
        elif workload == "sequential_write":
            return (n - 1) * T  # One disk for parity
        else:  # random_write
            return n * T / 4  # 4x write penalty
    elif raid_level == "6":
        if workload == "sequential_read":
            return n * T
        elif workload == "sequential_write":
            return (n - 2) * T  # Two disks for parity
        else:  # random_write
            return n * T / 6  # 6x write penalty
    elif raid_level == "10":
        num_pairs = n // 2
        if workload == "sequential_read":
            return n * T  # All disks contribute
        else:  # writes
            return num_pairs * T  # One write per pair


# Example: Compare 8-disk arrays
for raid in ["0", "5", "6", "10"]:
    for workload in ["sequential_read", "sequential_write", "random_write"]:
        throughput = calculate_raid_throughput(8, 150, raid, workload)
        print(f"RAID {raid:>2} - {workload:<18}: {throughput:>6.0f} MB/s")
    print()

# Output:
# RAID  0 - sequential_read   :   1200 MB/s
# RAID  0 - sequential_write  :   1200 MB/s
# RAID  0 - random_write      :   1200 MB/s
#
# RAID  5 - sequential_read   :   1200 MB/s
# RAID  5 - sequential_write  :   1050 MB/s
# RAID  5 - random_write      :    300 MB/s
#
# RAID  6 - sequential_read   :   1200 MB/s
# RAID  6 - sequential_write  :    900 MB/s
# RAID  6 - random_write      :    200 MB/s
#
# RAID 10 - sequential_read   :   1200 MB/s
# RAID 10 - sequential_write  :    600 MB/s
# RAID 10 - random_write      :    600 MB/s
```

These are theoretical maximums. Real-world throughput is limited by controller processing power, bus bandwidth (PCIe, SAS lanes), cache hit rates, workload alignment, and the gap between spindle speed and interface speed. Expect 60-85% of theoretical in well-tuned systems.
For transaction-oriented workloads, IOPS is the critical metric. RAID's impact on IOPS differs significantly between reads and writes.
Read IOPS:
For all RAID levels, read IOPS scales roughly linearly with disk count:
$$IOPS_{array,read} = n \times IOPS_{disk}$$
With 8 HDDs at 150 IOPS each: 1,200 read IOPS. With 8 SSDs at 50K IOPS each: 400K read IOPS.
Mirror-based RAID (1, 10) can sometimes exceed this slightly due to read load balancing optimizations.
Write IOPS:
The write penalty dramatically impacts effective write IOPS:
| RAID Level | Write Operations per Logical Write | Effective Write IOPS |
|---|---|---|
| RAID 0 | 1 | n × IOPS_disk |
| RAID 1 | 2 (parallel) | n × IOPS_disk / 2 |
| RAID 5 | 4 (2 read + 2 write) | n × IOPS_disk / 4 |
| RAID 6 | 6 (3 read + 3 write) | n × IOPS_disk / 6 |
| RAID 10 | 2 (parallel) | n × IOPS_disk / 2 |
Mixed Workload IOPS:
Real applications perform both reads and writes. For a workload with read fraction r and write fraction w (where r + w = 1):
$$IOPS_{effective} = \frac{IOPS_{disk} \times n}{r \times 1 + w \times \text{write penalty}}$$
Example: 70% read, 30% write workload on 8-disk RAID 5
$$IOPS_{effective} = \frac{150 \times 8}{0.7 \times 1 + 0.3 \times 4}$$ $$= \frac{1200}{0.7 + 1.2} = \frac{1200}{1.9} \approx 632\ IOPS$$
Compare to RAID 10 with same workload: $$IOPS_{effective} = \frac{150 \times 8}{0.7 \times 1 + 0.3 \times 2}$$ $$= \frac{1200}{0.7 + 0.6} = \frac{1200}{1.3} \approx 923\ IOPS$$
RAID 10 delivers ~46% more IOPS for this write-moderate workload.
```python
def calculate_mixed_iops(
    num_disks: int,
    disk_iops: float,
    raid_level: str,
    read_percent: float
) -> dict:
    """
    Calculate effective IOPS for mixed read/write workload.

    Returns dictionary with detailed breakdown.
    """
    write_percent = 1.0 - read_percent

    write_penalties = {
        "0": 1,
        "1": 2,
        "5": 4,
        "6": 6,
        "10": 2
    }
    penalty = write_penalties[raid_level]
    raw_iops = num_disks * disk_iops

    # Weighted average of read (1x) and write (penalty x)
    effective_multiplier = read_percent * 1 + write_percent * penalty
    effective_iops = raw_iops / effective_multiplier

    # Calculate breakdown
    read_iops = effective_iops * read_percent
    write_iops = effective_iops * write_percent
    physical_ops = read_iops + (write_iops * penalty)

    return {
        "raid_level": raid_level,
        "raw_iops": raw_iops,
        "effective_iops": effective_iops,
        "read_iops": read_iops,
        "write_iops": write_iops,
        "physical_ops_per_second": physical_ops,
        "write_penalty": penalty,
        "efficiency": (effective_iops / raw_iops) * 100
    }


# Compare RAID levels for database workload (70% read, 30% write)
print("8-disk array, 150 IOPS/disk, 70% read / 30% write workload:\n")
print(f"{'RAID':^6} | {'Effective':^10} | {'Efficiency':^10} | {'Writes':^10}")
print("-" * 50)

for raid in ["0", "5", "6", "10"]:
    result = calculate_mixed_iops(8, 150, raid, 0.70)
    print(f"{raid:^6} | {result['effective_iops']:>8.0f} | "
          f"{result['efficiency']:>8.1f}% | {result['write_iops']:>8.0f}")

print("\n100% write workload (worst case):\n")
for raid in ["0", "5", "6", "10"]:
    result = calculate_mixed_iops(8, 150, raid, 0.0)
    print(f"RAID {raid}: {result['effective_iops']:.0f} effective IOPS "
          f"({result['efficiency']:.1f}% efficiency)")
```

When sizing storage, work backwards: determine required application IOPS, apply the write penalty formula to find required raw IOPS, then calculate the number of disks needed. Always add headroom (30-50%) because performance degrades sharply at high utilization.
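A minimal sketch of that backwards-sizing calculation, assuming the write penalties used throughout this page and an illustrative 40% headroom (the helper name and example inputs are ours):

```python
import math

WRITE_PENALTY = {"0": 1, "1": 2, "5": 4, "6": 6, "10": 2}

def disks_needed(required_iops: float, read_fraction: float,
                 raid_level: str, disk_iops: float,
                 headroom: float = 0.40) -> int:
    """Work backwards from application IOPS to a disk count, with headroom."""
    penalty = WRITE_PENALTY[raid_level]
    write_fraction = 1.0 - read_fraction
    # Physical IOPS the disks must deliver per logical IOPS
    multiplier = read_fraction + write_fraction * penalty
    raw_iops_needed = required_iops * multiplier * (1 + headroom)
    return math.ceil(raw_iops_needed / disk_iops)

# Example: 5,000 application IOPS at 70% read on 15K RPM HDDs (250 IOPS each)
for raid in ("5", "10"):
    print(f"RAID {raid}: {disks_needed(5000, 0.70, raid, 250)} disks")
# With these illustrative inputs: RAID 5 -> 54 disks, RAID 10 -> 37 disks
```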
Latency is often the most critical performance metric for interactive applications. Users notice latency; they rarely notice throughput directly. RAID impacts latency through several mechanisms:
Components of RAID Latency:
Media latency: Time for the disk to complete the I/O
Controller overhead: RAID calculations, cache lookup
Queue wait time: Time waiting for disk availability
Parity operations: Additional I/O for parity reads/writes
The Queue Depth Effect:
Latency is not constant—it increases as the array becomes more loaded. For a disk with service time S and utilization ρ (fraction of time busy), the average response time follows the M/M/1 queuing model:
$$T_{response} = \frac{S}{1 - \rho}$$
This relationship is non-linear and becomes severe at high utilization:
| Utilization | Response Time (multiple of S) |
|---|---|
| 50% | 2× |
| 70% | 3.3× |
| 80% | 5× |
| 90% | 10× |
| 95% | 20× |
| 99% | 100× |
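For example, with an illustrative 5 ms service time at 90% utilization:

$$T_{response} = \frac{5\ \text{ms}}{1 - 0.9} = 50\ \text{ms}$$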
This is why storage systems should never run at sustained high utilization—latency becomes unacceptable well before theoretical IOPS limits.
Latency Impact by RAID Level:
Read Latency: A read is normally serviced by a single disk in every RAID level, so read latency stays close to single-disk latency plus a small controller overhead.
Write Latency (without cache): RAID 0, 1, and 10 complete a write in roughly one disk service time (mirror copies are written in parallel); RAID 5 and 6 writes take noticeably longer because the read-modify-write sequence adds extra, partly serialized I/Os.
Write Latency (with write-back cache): The controller acknowledges the write as soon as it lands in cache, hiding most of the parity penalty; under sustained write load the cache eventually fills and latency reverts toward uncached behavior.
Average latency can be misleading. For user-facing applications, the 99th or 99.9th percentile latency (tail latency) often determines user experience. RAID 5/6 can have severe tail latency spikes during parity operations, even when average latency is acceptable.
When a disk fails, the array enters degraded mode. Understanding degraded performance is crucial because this is precisely when you need your storage to keep working—and it's when performance is worst.
RAID 5/6 Degraded Performance:
Every read to a block that was on the failed disk now requires reading the corresponding blocks from all surviving disks and XORing them together to reconstruct the missing data.
For a 5-disk RAID 5 array, a single logical read of data on the failed disk becomes four physical reads plus a parity computation.
This means every surviving disk absorbs extra load on top of its normal workload:
| RAID Level | Read Performance | Write Performance | Overall Impact |
|---|---|---|---|
| RAID 0 | N/A (no fault tolerance) | N/A | Array fails completely |
| RAID 1 | ~50% of normal (one mirror left) | Essentially unchanged | Minimal; only redundancy is lost |
| RAID 5 | ~25-50% of normal | Severely degraded (6× ops) | Major degradation |
| RAID 6 (1 failure) | ~25-50% of normal | Severely degraded (8× ops) | Major degradation |
| RAID 6 (2 failures) | ~10-25% of normal | Extremely degraded | Critical degradation |
| RAID 10 | ~87% of normal | ~87% of normal | Minor degradation |
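To make the reconstruction cost concrete, here is a toy sketch of a degraded RAID 5 read: the block that lived on the failed disk is rebuilt by XORing the corresponding blocks from every surviving disk. Plain Python bytes stand in for disk blocks; no real I/O is performed:

```python
from functools import reduce

def reconstruct_missing(surviving_blocks: list[bytes]) -> bytes:
    """XOR all surviving blocks of a stripe (data + parity) to rebuild the lost one."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), surviving_blocks)

# One stripe on a 5-disk RAID 5: four data blocks plus their parity
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"]
parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), data)

# The disk holding data[2] fails: reading that block now costs 4 physical reads + XOR
survivors = [data[0], data[1], data[3], parity]
assert reconstruct_missing(survivors) == data[2]
```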
Why RAID 10 Degrades Gracefully:
In RAID 10, a disk failure affects only its mirror pair: reads that would have gone to the failed disk are served by its surviving mirror, while every other pair continues at full speed, so only a small fraction of the array's capability is lost.
Rebuild Performance Impact:
During rebuild, the array must serve normal application I/O, read the surviving disks to reconstruct the lost data, and write the reconstructed data to the replacement disk.
This triple demand creates severe contention for disk time, controller cycles, and bus bandwidth.
Empirical degradation during rebuild:
| Priority Setting | Application Performance | Rebuild Time |
|---|---|---|
| High (aggressive) | 20-40% of normal | Fastest |
| Medium | 50-70% of normal | Moderate |
| Low (background) | 80-90% of normal | Very slow |
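A rough rebuild-time estimate helps put these settings in context. A minimal sketch, where the rebuild rate stands for whatever sustained MB/s the chosen priority leaves for resilvering (the 16 TB capacity and the rates are illustrative):

```python
def rebuild_time_hours(disk_capacity_tb: float, rebuild_rate_mbs: float) -> float:
    """Hours to resilver one disk at a sustained rebuild rate in MB/s."""
    capacity_mb = disk_capacity_tb * 1_000_000  # decimal TB -> MB
    return capacity_mb / rebuild_rate_mbs / 3600

# 16 TB drive: aggressive rebuild at ~150 MB/s vs. a heavily throttled one at ~30 MB/s
for rate in (150, 30):
    print(f"{rate:>3} MB/s -> {rebuild_time_hours(16, rate):.0f} hours")
# ~30 hours at 150 MB/s vs. ~148 hours (about six days) at 30 MB/s
```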
A RAID 5 array in degraded mode while rebuilding is at maximum vulnerability with minimum performance. This is when second failures occur (due to rebuild stress) and when users complain most (due to slowness). Plan for this: schedule rebuilds during low-activity periods, consider RAID 6/10 for critical systems.
RAID arrays can be bottlenecked at multiple points. Identifying the limiting factor is essential for optimization:
Potential Bottlenecks:
Disk spindle IOPS (for random workloads)
Disk throughput (for sequential workloads)
Controller processing power
Controller cache
Interface bandwidth (SAS, PCIe)
123456789101112131415161718192021222324252627282930313233
```bash
# Monitor disk I/O utilization
iostat -xz 1

# Key metrics to watch:
# %util    - Disk utilization (100% = bottleneck)
# await    - Average I/O wait time in ms
# r/s, w/s - Read/write IOPS
# avgqu-sz - Average queue depth

# Example output:
# Device    r/s     w/s      rkB/s   wkB/s    await   %util
# sda       12.00   284.00   48.00   1136.00  2.84    88.40
# sdb       8.00    280.00   32.00   1120.00  3.12    85.60

# Monitor RAID array status (mdadm for Linux software RAID)
cat /proc/mdstat
mdadm --detail /dev/md0

# Monitor cache effectiveness
# For hardware RAID, check controller-specific tools
# For bcache or dm-cache:
cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio

# Check for I/O wait bottleneck at system level
vmstat 1
# Look at 'wa' column - high values indicate I/O bottleneck

# Trace I/O latency distribution
# Using bcc/BPF tools
biolatency -D 10

# Check queue depths per disk
cat /sys/block/sd*/queue/nr_requests
```

Diagnostic Decision Tree:
1. Is any disk at 100% utilization?
YES → Disk spindle bottleneck
NO → Continue
2. Is total throughput near theoretical max?
YES → Array performing optimally (or interface bottleneck)
NO → Continue
3. Is RAID controller CPU high?
YES → Controller processing bottleneck
NO → Continue
4. Is cache hit rate low with working set > cache size?
YES → Cache capacity bottleneck
NO → Continue
5. Are latencies high despite low utilization?
YES → Check for lock contention, misalignment, or controller issues
NO → Investigate application-level issues
Common Performance Anti-patterns:
RAID 5 for database transaction logs: Transaction logs require fast synchronous sequential writes. RAID 5's write penalty makes this a poor choice; RAID 1 or RAID 10 is preferred.
Too-small stripe size for random I/O: Small stripes spread single operations across multiple disks, adding coordination overhead without benefit for random access.
Misaligned partitions: If the partition start doesn't align with stripe boundaries, every I/O potentially crosses stripes, requiring multiple disk accesses.
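One way to reason about that last anti-pattern is to check whether a partition's starting offset is a whole multiple of the full stripe width. A minimal sketch with illustrative numbers; in practice the start sector would come from the partition table (for example via `fdisk -l` or `/sys/block/<disk>/<partition>/start`):

```python
def is_stripe_aligned(partition_start_sector: int, sector_size: int,
                      stripe_unit_kb: int, data_disks: int) -> bool:
    """True if the partition begins on a full-stripe boundary."""
    start_bytes = partition_start_sector * sector_size
    full_stripe_bytes = stripe_unit_kb * 1024 * data_disks
    return start_bytes % full_stripe_bytes == 0

# 5-disk RAID 5 (4 data disks per stripe) with a 64 KB stripe unit (256 KB full stripe)
print(is_stripe_aligned(2048, 512, 64, 4))  # start at 1 MiB -> True, aligned
print(is_stripe_aligned(63, 512, 64, 4))    # legacy 63-sector start -> False, misaligned
```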
Armed with an understanding of RAID performance characteristics, let's explore optimization strategies:
Workload Matching:
| Workload Profile | Optimal Configuration |
|---|---|
| Read-heavy, large sequential | RAID 5/6 with small stripe, many disks |
| Write-heavy, small random | RAID 10, large cache with write-back |
| Mixed, transaction processing | RAID 10 with SSD |
| Streaming media | RAID 0 (if replaceable) or RAID 5 |
| Virtualization | RAID 10 with SSD, separate arrays per purpose |
SSD-Specific Considerations:
SSDs fundamentally change RAID performance calculus:
No seek time: Random and sequential IOPS are similar. Stripe size matters less.
Parallelism within SSDs: A single SSD already has internal parallelism (multiple channels, dies). RAID adds another layer.
Write amplification: Both RAID parity and SSD wear-leveling cause write amplification. Combined effect can be severe.
TRIM/UNMAP: File system must support TRIM, RAID layer must pass it through, and SSDs must support it. Chain must be complete.
Latency still matters: At 100K IOPS, queuing effects dominate. Keep utilization below 70% for consistent latency.
80% of performance issues come from fundamental choices: wrong RAID level for workload, insufficient disk count, or missing write-back cache. Address these first before pursuing advanced optimizations. Measure before and after every change.
We've explored the multi-dimensional nature of RAID performance: throughput scales with how many disks can work in parallel, write penalties dominate IOPS for parity RAID, latency grows non-linearly with utilization, and degraded-mode and rebuild behavior can matter more than healthy-state numbers.
What's Next:
The final page of this module examines RAID reliability from a mathematical perspective. We'll calculate Mean Time To Data Loss (MTTDL) for various configurations, understand how drive capacity, array size, and rebuild time affect failure probability, and learn to make informed reliability trade-offs.
You now have a comprehensive understanding of RAID performance characteristics, modeling techniques, and optimization strategies. This knowledge enables you to design storage systems that meet performance requirements while maintaining appropriate reliability—the central engineering challenge in storage architecture.