Every RAID configuration ultimately derives from two fundamental techniques: striping for performance and mirroring for redundancy. These concepts are deceptively simple on the surface—dividing data across disks, copying data between disks—but their implementation details, tuning parameters, and interaction with workload patterns create a rich landscape of engineering trade-offs.
Understanding striping and mirroring at a deep level is essential because these primitives appear not only in traditional RAID but also in modern distributed storage systems, flash storage arrays, and cloud storage architectures. The principles we explore here extend far beyond spinning hard drives.
By the end of this page, you will understand exactly how data is divided and distributed in striping, how mirrors maintain consistency across copies, the performance mathematics of both techniques, and the critical tuning decisions that determine whether these techniques help or hinder your specific workload.
Striping is the technique of dividing a single logical data unit (file, logical volume, database tablespace) into smaller pieces and distributing those pieces across multiple physical storage devices. The fundamental goal is to harness parallelism: if data can be read from or written to multiple disks simultaneously, the aggregate throughput multiplies.
The Anatomy of a Stripe:
Consider an array of 4 disks with a stripe size of 64KB. A 256KB file would be divided as follows: stripe unit 0 (bytes 0-65,535) on Disk 0, stripe unit 1 on Disk 1, stripe unit 2 on Disk 2, and stripe unit 3 (bytes 196,608-262,143) on Disk 3.
A full stripe encompasses one stripe unit from each disk. Reading the entire 256KB file can theoretically engage all 4 disks in parallel, achieving 4x the throughput of a single disk.
Stripe Size: The Critical Tuning Parameter
The stripe size (also called chunk size or stripe width) fundamentally determines which I/O patterns benefit from striping:
Small Stripe Sizes (4KB - 32KB): even moderately sized requests span several disks, so a single large sequential transfer engages the whole array, but each small random request is more likely to cross a stripe boundary and occupy more than one spindle.
Large Stripe Sizes (128KB - 1MB): a small random request almost always falls within a single stripe unit, so independent requests can be serviced by different disks concurrently, while only very large transfers gain full-array parallelism.
```python
def calculate_stripe_location(logical_address: int, stripe_size: int, num_disks: int) -> dict:
    """
    Calculate which disk and position contains a given logical address.

    Args:
        logical_address: Byte offset in the logical volume
        stripe_size: Size of each stripe unit in bytes
        num_disks: Number of data disks in the array

    Returns:
        Dictionary with disk number, stripe row, and offset within stripe
    """
    # Which stripe unit does this address fall into?
    stripe_unit = logical_address // stripe_size

    # Which disk holds this stripe unit?
    disk_number = stripe_unit % num_disks

    # Which row of stripes (which full stripe) is this?
    stripe_row = stripe_unit // num_disks

    # Offset within the stripe unit
    offset_in_stripe = logical_address % stripe_size

    # Physical address on the disk
    physical_address = (stripe_row * stripe_size) + offset_in_stripe

    return {
        'disk': disk_number,
        'stripe_row': stripe_row,
        'offset_in_stripe': offset_in_stripe,
        'physical_address': physical_address
    }


# Example: 4 disks, 64KB stripe size
# Where does byte 200,000 reside?
result = calculate_stripe_location(
    logical_address=200000,
    stripe_size=65536,  # 64KB
    num_disks=4
)
print(f"Logical address 200,000 maps to:")
print(f"  Disk: {result['disk']}")
print(f"  Stripe row: {result['stripe_row']}")
print(f"  Offset in stripe: {result['offset_in_stripe']}")
print(f"  Physical address on disk: {result['physical_address']}")

# Output:
# Logical address 200,000 maps to:
#   Disk: 3
#   Stripe row: 0
#   Offset in stripe: 3392
#   Physical address on disk: 3392
```

The performance benefits of striping are not automatic—they depend on the relationship between I/O request size, stripe size, number of disks, and workload patterns. Understanding these relationships is essential for predicting and optimizing striped array performance.
Sequential Throughput Model:
For large sequential I/O, the theoretical throughput of a striped array is:
$$T_{array} = n \times T_{disk}$$
where n is the number of disks and T_disk is the throughput of a single disk. In practice, this linear scaling is limited by controller and bus bandwidth, per-request processing overhead, and timing variation between disks, as the typical measurements below illustrate:
| Disks | Single Disk (MB/s) | Theoretical Max | Typical Achieved | Efficiency |
|---|---|---|---|---|
| 2 | 150 | 300 MB/s | 280 MB/s | 93% |
| 4 | 150 | 600 MB/s | 520 MB/s | 87% |
| 8 | 150 | 1,200 MB/s | 950 MB/s | 79% |
| 16 | 150 | 2,400 MB/s | 1,700 MB/s | 71% |
| 32 | 150 | 4,800 MB/s | 2,800 MB/s | 58% |
Random I/O and Stripe Size Interaction:
The relationship between random I/O performance and stripe size is nuanced. Consider random 4KB reads on an array with different stripe sizes:
Stripe size = 4KB: an aligned 4KB read maps to exactly one stripe unit, but any read not aligned to a 4KB boundary straddles two disks, turning one request into two smaller ones.
Stripe size = 64KB: every 4KB read fits comfortably inside a single stripe unit, so each request is served by one disk and concurrent independent requests spread naturally across the array.
Stripe size = 1MB: each read still touches only one disk, but when the access pattern has locality, many consecutive requests can land in the same stripe unit and queue up on the same disk, limiting parallelism.
For predominantly large sequential I/O (video, backups): Use small stripes (16-64KB). For predominantly small random I/O (OLTP databases): Use larger stripes (128KB-256KB) to reduce stripe boundary crossings. For mixed workloads: Default to 64KB-128KB as a balanced starting point, then tune based on measurements.
The Stripe Alignment Problem:
Performance degradation occurs when I/O operations are misaligned with stripe boundaries. Consider a 64KB stripe size and a 64KB write starting at byte offset 32,768: the write covers the second half of stripe unit 0 and the first half of stripe unit 1, which reside on two different disks.
This misaligned write requires two partial stripe-unit writes on two disks instead of one full stripe-unit write on a single disk, doubling the number of operations (and, in parity-based levels, forcing read-modify-write cycles on both affected stripe units).
Modern file systems and databases typically align allocations to stripe boundaries, but application-level I/O patterns can still cause misalignment. This is particularly problematic for virtualization where guest OS alignment may not match host array alignment.
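To make the boundary-crossing effect concrete, here is a small sketch (the helper name is ours, not from any particular RAID implementation) that reports which stripe units, and therefore which disks, an I/O request touches; it reproduces the misaligned 64KB write at offset 32,768 described above.

```python
def stripe_units_touched(offset: int, length: int, stripe_size: int, num_disks: int) -> dict:
    """Report which stripe units (and therefore disks) an I/O request touches."""
    first_unit = offset // stripe_size
    last_unit = (offset + length - 1) // stripe_size
    units = list(range(first_unit, last_unit + 1))
    return {
        'units': units,
        'disks': sorted({u % num_disks for u in units}),
        'aligned': offset % stripe_size == 0 and length % stripe_size == 0,
        'crosses_boundary': len(units) > 1,
    }

# The misaligned example from the text: a 64KB write at offset 32,768 with 64KB stripes
print(stripe_units_touched(offset=32768, length=65536, stripe_size=65536, num_disks=4))
# -> touches stripe units [0, 1] on disks [0, 1]; not aligned, crosses a boundary

# The same write issued at offset 0 stays within a single stripe unit on one disk
print(stripe_units_touched(offset=0, length=65536, stripe_size=65536, num_disks=4))
```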
Mirroring is the technique of maintaining identical copies of data on multiple physical devices. Unlike parity-based redundancy, mirroring provides complete data duplication with no computation required—every byte exists in its entirety on every mirror copy.
Mirror Architectures:
Two-Way Mirror (RAID 1): The most common configuration, maintaining exactly two copies. Usable capacity is 50% of raw capacity, and the array survives the failure of either disk.
Three-Way Mirror (RAID 1 with three copies): Used for mission-critical data. Usable capacity drops to one third of raw capacity, but the array tolerates two disk failures and remains redundant even while one copy is rebuilding.
N-Way Mirror: Some systems support arbitrary mirror counts for extreme reliability requirements.
Write Propagation Strategies:
When a write occurs, all mirror copies must be updated. The timing and ordering of these updates creates critical design choices:
Synchronous Mirroring:
```
   Application write request
              |
      ┌───────┴───────┐
      v               v
Write to Disk A   Write to Disk B   (issued in parallel)
      |               |
      └───────┬───────┘
              v
  Wait for BOTH writes to complete
              |
              v
 Return success to the application
```
Synchronous mirroring guarantees that when a write returns successfully, the data exists on all mirror copies. This is the safest approach but introduces latency equal to the slowest mirror's response time.
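To make the ordering concrete, here is a minimal sketch of a synchronous mirror write, assuming a placeholder write_block helper rather than any vendor's controller code: the copies are written concurrently, and success is reported only after every copy acknowledges.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def write_block(disk_id: str, lba: int, data: bytes) -> bool:
    """Placeholder for a real device write; assume it returns True on success."""
    print(f"disk {disk_id}: wrote {len(data)} bytes at LBA {lba}")
    return True

def synchronous_mirror_write(disks: List[str], lba: int, data: bytes) -> bool:
    """Issue the write to all mirror copies in parallel and wait for every one."""
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        results = list(pool.map(lambda d: write_block(d, lba, data), disks))
    # Report success only once ALL copies have acknowledged the write
    return all(results)

if synchronous_mirror_write(["A", "B"], lba=2048, data=b"\x00" * 4096):
    print("write acknowledged on every mirror copy")
```

The latency of the call is dictated by the slowest copy, which is exactly the trade-off described above.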
Asynchronous Mirroring (rarely used for local RAID): the write is acknowledged as soon as the first copy reaches stable storage, and the remaining copies are updated in the background. This hides the latency of the slower mirror but opens a window in which the copies disagree, which is why the approach is mostly confined to remote replication rather than local arrays.
The Write Hole Problem:
A critical vulnerability in mirrored arrays occurs during the window when a write has completed on some mirrors but not others. If a power failure occurs in that window, one copy holds the new data while another still holds the old data.
Upon recovery, which disk contains the "correct" data? Without additional information, the system cannot know. Solutions include write-intent bitmaps or dirty-region logs that record which regions had writes in flight, journaling writes before applying them to the mirrors, and non-volatile controller cache that preserves in-flight writes across the power loss.
Enterprise RAID controllers typically include battery- or capacitor-backed cache to close this hole.
Since mirror copies are identical, any copy can satisfy a read request. This creates opportunities for sophisticated read optimization. RAID controllers employ various strategies to maximize read performance:
Round-Robin Reading:
The simplest approach alternates read requests between mirrors: request 1 goes to Disk A, request 2 to Disk B, request 3 back to Disk A, and so on.
This achieves load balancing but ignores disk state (e.g., current head position, queue depth).
Queue-Depth-Aware Reading:
More sophisticated controllers track the outstanding I/O queue on each disk and route new requests to the least-loaded disk:
```
Disk A queue: [op1, op2, op3]   (depth = 3)
Disk B queue: [op4]             (depth = 1)

New read request → Route to Disk B (shorter queue)
```
This approach adapts to asymmetric loads and can significantly reduce latency variability.
Seek-Aware Reading:
For HDDs with seek and rotational latency, advanced controllers consider head position: the read is routed to whichever mirror's head is currently closest to the target address, minimizing mechanical delay.
This optimization is less relevant for SSDs where "seek time" is negligible.
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    id: str
    queue_depth: int = 0
    head_position: int = 0  # Logical block address


class MirrorReadScheduler:
    def __init__(self, disks: List[Disk]):
        self.disks = disks

    def round_robin(self, request_num: int) -> Disk:
        """Simple alternating strategy"""
        return self.disks[request_num % len(self.disks)]

    def shortest_queue(self) -> Disk:
        """Select disk with smallest queue"""
        return min(self.disks, key=lambda d: d.queue_depth)

    def nearest_seek(self, target_lba: int) -> Disk:
        """Select disk with head closest to target"""
        return min(self.disks, key=lambda d: abs(d.head_position - target_lba))

    def adaptive(self, target_lba: int) -> Disk:
        """Combined strategy: balance queue depth and seek distance"""
        def cost(disk: Disk) -> float:
            seek_cost = abs(disk.head_position - target_lba) / 1000000  # Normalize
            queue_cost = disk.queue_depth * 0.5  # Weight queue more
            return seek_cost + queue_cost
        return min(self.disks, key=cost)


# Simulate selection
disk_a = Disk("A", queue_depth=2, head_position=100000)
disk_b = Disk("B", queue_depth=4, head_position=500000)

scheduler = MirrorReadScheduler([disk_a, disk_b])
target = 120000

print(f"Target LBA: {target}")
print(f"Disk A: queue={disk_a.queue_depth}, head={disk_a.head_position}")
print(f"Disk B: queue={disk_b.queue_depth}, head={disk_b.head_position}")
print(f"Round-robin (req 0): {scheduler.round_robin(0).id}")
print(f"Shortest queue: {scheduler.shortest_queue().id}")
print(f"Nearest seek: {scheduler.nearest_seek(target).id}")
print(f"Adaptive: {scheduler.adaptive(target).id}")
```

Read Performance Gains:
For a 2-way mirror with optimal read scheduling: random read throughput approaches twice that of a single disk, sequential reads gain less in practice, and write throughput remains that of a single disk because every write must be applied to both copies.
For 3-way or higher mirrors: read throughput continues to scale roughly with the number of copies, while write throughput stays pinned at single-disk speed and usable capacity keeps shrinking.
This asymmetry makes n-way mirrors attractive for extremely read-heavy workloads with extremely high reliability requirements.
Some enterprise arrays periodically perform 'read verification' where they read the same block from all mirrors and compare. Mismatches indicate silent data corruption (bit rot) and trigger repair from a good copy. This background scrubbing is essential for long-term data integrity.
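A minimal sketch of such a scrub pass, using a simulated three-copy mirror so a majority vote can identify the bad copy (a real two-way mirror needs checksums or other metadata to break the tie; the helpers here are illustrative, not an actual controller API):

```python
from collections import Counter

# Simulated mirror contents: disk "B" has a silently corrupted copy of LBA 7
MIRROR = {
    "A": {7: b"good data"},
    "B": {7: b"bad  data"},
    "C": {7: b"good data"},
}

def read_block(disk_id: str, lba: int) -> bytes:
    return MIRROR[disk_id][lba]

def write_block(disk_id: str, lba: int, data: bytes) -> None:
    MIRROR[disk_id][lba] = data

def scrub_block(disk_ids: list, lba: int) -> None:
    """Read the same block from every mirror copy and repair any divergent copy."""
    copies = {d: read_block(d, lba) for d in disk_ids}
    if len(set(copies.values())) == 1:
        return  # all copies agree; nothing to do
    # With three or more copies, take the majority value as the good data
    good_data, _ = Counter(copies.values()).most_common(1)[0]
    for disk, data in copies.items():
        if data != good_data:
            print(f"repairing disk {disk} at LBA {lba} from a good copy")
            write_block(disk, lba, good_data)

scrub_block(["A", "B", "C"], lba=7)
assert MIRROR["B"][7] == b"good data"
```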
The true power of RAID emerges when striping and mirroring are combined. Two distinct approaches exist, and the differences between them have profound implications for performance and reliability.
RAID 0+1 (Stripe, Then Mirror):
First, create striped arrays. Then, mirror those arrays:
```
            [Mirrored Pair]
                   |
           ┌───────┴───────┐
       [Stripe1]        [Stripe2]
           |                |
        ┌──┴──┐          ┌──┴──┐
       D0     D1        D2     D3
      A,C    B,D       A,C    B,D
```
If Disk 0 fails: Stripe1 can no longer serve data, so the entire left stripe set drops out and the array runs solely on Stripe2. All redundancy is gone; a subsequent failure of either D2 or D3 loses data, and recovery requires copying the whole surviving stripe set back to the repaired side.
RAID 1+0 (Mirror, Then Stripe):
First, create mirrored pairs. Then, stripe across pairs:
```
            [Striped Array]
                   |
           ┌───────┴───────┐
       [Mirror1]        [Mirror2]
           |                |
        ┌──┴──┐          ┌──┴──┐
       D0     D1        D2     D3
      A,C    A,C       B,D    B,D
```
If Disk 0 fails: Mirror1 keeps serving reads from D1 and the array stays fully online. Only a failure of D1 specifically (Disk 0's mirror partner) would now cause data loss; D2 and D3 are unaffected, and the rebuild copies only D1's contents to the replacement disk.
Probability Analysis:
Consider a 4-disk array where each disk has failure probability p:
RAID 0+1 Failure Probability:
The array fails if: (first stripe loses any disk) AND (second stripe loses any disk)
P(first stripe degraded) = 1 - (1-p)² ≈ 2p for small p
After the first failure, data loss requires only that any disk in the remaining stripe fails, so P(data loss | one failure) ≈ 2p
Overall: P(data loss) = [1 - (1-p)²]² ≈ 4p² for small p, roughly double the RAID 1+0 figure derived below, and the gap widens as disks are added
RAID 1+0 Failure Probability:
The array fails only if both disks in ANY mirror pair fail:
P(mirror pair fails) = p × (p during rebuild window) ≈ p²
With 2 mirror pairs: P(data loss) = 1 - (1 - p²)² ≈ 2p² for small p
This is substantially better: for the same four disks, the data-loss probability is about 2p² versus roughly 4p² for RAID 0+1, and after a single failure only one specific surviving disk (the failed disk's mirror partner) is critical rather than the entire opposite stripe.
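A quick numeric check of these approximations, where p is the assumed per-disk failure probability over the window of interest and the formulas follow the derivation above:

```python
def raid01_loss(p: float, disks_per_stripe: int = 2) -> float:
    """RAID 0+1: data is lost when BOTH stripe sets have at least one failed disk."""
    stripe_degraded = 1 - (1 - p) ** disks_per_stripe
    return stripe_degraded ** 2

def raid10_loss(p: float, mirror_pairs: int = 2) -> float:
    """RAID 1+0: data is lost only when both disks of SOME mirror pair fail."""
    return 1 - (1 - p ** 2) ** mirror_pairs

for p in (0.01, 0.02, 0.05):
    print(f"p={p:.2f}  RAID 0+1: {raid01_loss(p):.6f}  "
          f"RAID 1+0: {raid10_loss(p):.6f}  "
          f"approx 4p^2={4 * p * p:.6f}  2p^2={2 * p * p:.6f}")
```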
Industry terminology is inconsistent. Some vendors call mirror-then-stripe 'RAID 10' while others call it 'RAID 1+0'. Similarly, stripe-then-mirror is sometimes 'RAID 01' and sometimes 'RAID 0+1'. Always verify the actual architecture rather than relying on naming conventions.
The number of disks participating in a stripe (stripe width) significantly impacts performance, reliability, and capacity. This section examines the trade-offs involved in selecting appropriate disk counts.
Stripe Width Trade-offs:
| More Disks (Wider Stripe) | Fewer Disks (Narrower Stripe) |
|---|---|
| Higher sequential throughput | Lower sequential throughput |
| Higher total capacity | Lower total capacity |
| Higher failure probability (more disks to fail) | Lower failure probability |
| Longer rebuild times | Shorter rebuild times |
| More complex controller logic | Simpler controller logic |
| Better parallelism for large I/O | Less parallelism benefit |
Optimal Stripe Width by RAID Level:
RAID 0: No upper limit from reliability perspective (but controller limits apply). More disks = more performance = more failure risk. Typical: 2-8 disks.
RAID 5: Recommended: 3-8 disks. Fewer than 3 is impossible (need data + parity). More than 8 increases rebuild time dangerously. Each additional disk increases capacity efficiency but also increases rebuild vulnerability.
4-disk RAID 5: 75% efficiency, moderate rebuild time
8-disk RAID 5: 87.5% efficiency, long rebuild time
RAID 6: Recommended: 4-16 disks. The dual parity allows larger arrays with acceptable reliability. Still, extremely large arrays should be split into multiple smaller RAID 6 groups.
RAID 10: Stripe width = number of mirror pairs. Recommended: 2-8 mirror pairs (4-16 disks total). More pairs = more performance, but also more components to manage.
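The capacity-efficiency figures quoted above follow directly from each level's redundancy scheme; a small helper makes the trade-off explicit (a sketch, not tied to any particular controller):

```python
def usable_fraction(level: str, num_disks: int) -> float:
    """Fraction of raw capacity available for data, by RAID level."""
    if level == "raid0":
        return 1.0
    if level == "raid5":                    # one disk's worth of parity
        return (num_disks - 1) / num_disks
    if level == "raid6":                    # two disks' worth of parity
        return (num_disks - 2) / num_disks
    if level in ("raid1", "raid10"):        # every byte stored twice
        return 0.5
    raise ValueError(f"unknown RAID level: {level}")

for level, n in [("raid5", 4), ("raid5", 8), ("raid6", 8), ("raid10", 8)]:
    print(f"{level} with {n} disks: {usable_fraction(level, n):.1%} usable")
# raid5/4 -> 75.0%, raid5/8 -> 87.5%, raid6/8 -> 75.0%, raid10/8 -> 50.0%
```

The rebuild-time estimates below show the other side of the trade-off: wider arrays rebuild more slowly.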
| Disk Size | RAID 5 (4 disks) | RAID 5 (8 disks) | RAID 6 (8 disks) | RAID 10 (8 disks) |
|---|---|---|---|---|
| 1 TB | ~3 hours | ~6 hours | ~6 hours | ~3 hours |
| 4 TB | ~12 hours | ~24 hours | ~24 hours | ~12 hours |
| 8 TB | ~24 hours | ~48 hours | ~48 hours | ~24 hours |
| 16 TB | ~48 hours | ~96 hours | ~96 hours | ~48 hours |
The Rebuild Storm Problem:
During rebuild, the array must read from all surviving disks while simultaneously writing to the replacement disk. This creates several challenges:
Performance Impact: Normal I/O competes with rebuild I/O, degrading application performance by 30-70%
Temperature Stress: Continuous heavy I/O raises drive temperatures, potentially accelerating failures
Unrecoverable Read Errors (URE): Enterprise HDDs specify a URE rate of ~1 in 10^15 bits read. For a 1TB read:
$$P(URE) = 1 - (1 - 10^{-15})^{8 \times 10^{12}} \approx 0.8\%$$
For an 8-disk RAID 5 with 8TB drives, a rebuild must read all 7 surviving drives in full, about 56TB: $$P(URE) = 1 - (1 - 10^{-15})^{448 \times 10^{12}} \approx 36\%$$
A roughly 36% chance of hitting an unrecoverable error during rebuild, causing data loss even though only one disk failed!
The URE problem makes RAID 5 increasingly dangerous as drive sizes grow. This is the primary reason RAID 6 (which can tolerate one URE + one failure) or RAID 10 (which only reads from one disk during rebuild) are becoming mandatory for large-capacity drives.
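The URE figures above generalize to any drive size and array width; a short calculation sketch (the 10^-15 error rate is the enterprise HDD specification quoted above, and the helper name is ours):

```python
def p_ure_during_rebuild(surviving_disks: int, disk_tb: float,
                         ure_rate: float = 1e-15) -> float:
    """Probability of at least one unrecoverable read error while reading
    every surviving disk in full during a rebuild."""
    bits_read = surviving_disks * disk_tb * 1e12 * 8   # TB -> bits
    return 1 - (1 - ure_rate) ** bits_read

# 8-disk RAID 5 with 8TB drives: read 7 surviving disks (56TB) to rebuild
print(f"RAID 5, 8x8TB : {p_ure_during_rebuild(7, 8):.0%}")
# RAID 10 rebuild reads only the failed disk's single mirror partner
print(f"RAID 10, 8TB  : {p_ure_during_rebuild(1, 8):.0%}")
```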
A hot spare is an idle disk kept powered and ready in the array enclosure. When a drive fails, the controller immediately begins rebuilding to the hot spare without human intervention. This automated response dramatically reduces the vulnerability window.
Hot Spare Strategies:
Dedicated Hot Spares: assigned to a single array (or RAID group) and used only when a disk in that array fails. Simple to manage and reason about, but a spare sitting idle next to a healthy array cannot help a neighboring array in trouble.
Global Hot Spares: shared across all arrays managed by the controller; whichever array suffers the first failure claims the spare. Fewer spare disks cover more arrays, at the cost of slightly more complex management.
Virtual Hot Spares (Distributed Spare Space): rather than a physical standby disk, spare capacity is reserved across all members of the array, so rebuild writes are spread over many disks in parallel and complete faster.
Rebuild Priority and Scheduling:
Reconstruction must balance two competing goals: completing the rebuild as quickly as possible to close the window of vulnerability, and preserving enough throughput for the applications still running on the degraded array.
Modern controllers offer rebuild priority settings: a low priority lets application I/O dominate and stretches the rebuild out, a high priority finishes the rebuild fastest at a visible cost to foreground performance, and medium settings split the difference.
Some advanced systems offer adaptive rebuild: the controller rebuilds at full speed while the array is idle and automatically throttles back as soon as application I/O arrives, as in the sketch below.
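A toy illustration of that adaptive policy; the thresholds and rates below are invented for the example and would be tuned (or hidden) by a real controller:

```python
def rebuild_rate_mbps(app_iops: int,
                      max_rate: int = 400,    # assumed full-speed rebuild rate, MB/s
                      min_rate: int = 40,     # assumed floor so the rebuild always progresses
                      idle_threshold: int = 50,
                      busy_threshold: int = 5000) -> int:
    """Throttle rebuild bandwidth based on current application load."""
    if app_iops <= idle_threshold:
        return max_rate                        # array is idle: rebuild at full speed
    if app_iops >= busy_threshold:
        return min_rate                        # array is saturated: back off to the floor
    # Scale linearly between the two thresholds
    span = busy_threshold - idle_threshold
    fraction = (app_iops - idle_threshold) / span
    return int(max_rate - fraction * (max_rate - min_rate))

for load in (0, 500, 2500, 5000, 8000):
    print(f"app load {load:>5} IOPS -> rebuild at {rebuild_rate_mbps(load)} MB/s")
```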
After rebuilding to a hot spare, some systems support 'copyback'—automatically copying data back to a replacement disk inserted in the original slot, then restoring the original disk to hot spare status. This maintains consistent physical disk layouts and simplifies spare management.
We've explored the fundamental techniques underlying all RAID configurations. Let's consolidate the key insights: striping multiplies throughput by spreading data across disks, with stripe size determining which I/O patterns benefit; mirroring duplicates data, scaling reads with the number of copies while writes remain bound by the slowest copy; combining the two as RAID 1+0 (mirror, then stripe) tolerates failures far better than RAID 0+1; wider stripes buy capacity and bandwidth at the cost of longer, riskier rebuilds; and hot spares with sensible rebuild scheduling shrink the window of vulnerability.
What's Next:
In the following page, we'll dive deep into parity—the mathematical technique that enables RAID 5 and RAID 6 to achieve fault tolerance without the storage overhead of full mirroring. We'll explore XOR operations, Galois Field arithmetic, and the algorithms that calculate and verify parity across disk arrays.
You now possess a thorough understanding of striping and mirroring—the two fundamental building blocks of RAID. These concepts extend beyond traditional RAID to distributed storage, software-defined storage, and cloud architectures. Next, we'll explore how parity enables efficient redundancy.