Every RAID configuration ultimately derives from two fundamental techniques: striping for performance and mirroring for redundancy. These concepts are deceptively simple on the surface—dividing data across disks, copying data between disks—but their implementation details, tuning parameters, and interaction with workload patterns create a rich landscape of engineering trade-offs.
Understanding striping and mirroring at a deep level is essential because these primitives appear not only in traditional RAID but also in modern distributed storage systems, flash storage arrays, and cloud storage architectures. The principles we explore here extend far beyond spinning hard drives.
By the end of this page, you will understand exactly how data is divided and distributed in striping, how mirrors maintain consistency across copies, the performance mathematics of both techniques, and the critical tuning decisions that determine whether these techniques help or hinder your specific workload.
Striping is the technique of dividing a single logical data unit (file, logical volume, database tablespace) into smaller pieces and distributing those pieces across multiple physical storage devices. The fundamental goal is to harness parallelism: if data can be read from or written to multiple disks simultaneously, the aggregate throughput multiplies.
The Anatomy of a Stripe:
Consider an array of 4 disks with a stripe size of 64KB. A 256KB file would be divided as follows: stripe unit 0 (bytes 0-65,535) on Disk 0, stripe unit 1 on Disk 1, stripe unit 2 on Disk 2, and stripe unit 3 (bytes 196,608-262,143) on Disk 3.
A full stripe encompasses one stripe unit from each disk. Reading the entire 256KB file can theoretically engage all 4 disks in parallel, achieving 4x the throughput of a single disk.
Stripe Size: The Critical Tuning Parameter
The stripe size (also called chunk size or stripe width) fundamentally determines which I/O patterns benefit from striping:
Small Stripe Sizes (4KB - 32KB): even moderately sized requests span several disks, so a single large sequential transfer engages the whole array, but each small random request is more likely to cross a stripe boundary and occupy more than one spindle.
Large Stripe Sizes (128KB - 1MB): a small random request almost always falls within a single stripe unit, so independent requests can be serviced by different disks concurrently, while only very large transfers gain full-array parallelism.
```python
def calculate_stripe_location(logical_address: int, stripe_size: int, num_disks: int) -> dict:
    """
    Calculate which disk and position contains a given logical address.

    Args:
        logical_address: Byte offset in the logical volume
        stripe_size: Size of each stripe unit in bytes
        num_disks: Number of data disks in the array

    Returns:
        Dictionary with disk number, stripe row, and offset within stripe
    """
    # Which stripe unit does this address fall into?
    stripe_unit = logical_address // stripe_size

    # Which disk holds this stripe unit?
    disk_number = stripe_unit % num_disks

    # Which row of stripes (which full stripe) is this?
    stripe_row = stripe_unit // num_disks

    # Offset within the stripe unit
    offset_in_stripe = logical_address % stripe_size

    # Physical address on the disk
    physical_address = (stripe_row * stripe_size) + offset_in_stripe

    return {
        'disk': disk_number,
        'stripe_row': stripe_row,
        'offset_in_stripe': offset_in_stripe,
        'physical_address': physical_address
    }


# Example: 4 disks, 64KB stripe size
# Where does byte 200,000 reside?
result = calculate_stripe_location(
    logical_address=200000,
    stripe_size=65536,  # 64KB
    num_disks=4
)
print(f"Logical address 200,000 maps to:")
print(f"  Disk: {result['disk']}")
print(f"  Stripe row: {result['stripe_row']}")
print(f"  Offset in stripe: {result['offset_in_stripe']}")
print(f"  Physical address on disk: {result['physical_address']}")

# Output:
# Logical address 200,000 maps to:
#   Disk: 3
#   Stripe row: 0
#   Offset in stripe: 3392
#   Physical address on disk: 3392
```

The performance benefits of striping are not automatic—they depend on the relationship between I/O request size, stripe size, number of disks, and workload patterns. Understanding these relationships is essential for predicting and optimizing striped array performance.
Sequential Throughput Model:
For large sequential I/O, the theoretical throughput of a striped array is:
$$T_{array} = n \times T_{disk}$$
where n is the number of disks and T_disk is the throughput of a single disk. In practice, this linear scaling is limited by controller and bus bandwidth, per-request processing overhead, and timing variation between disks, as the typical measurements below illustrate:
| Disks | Single Disk (MB/s) | Theoretical Max | Typical Achieved | Efficiency |
|---|---|---|---|---|
| 2 | 150 | 300 MB/s | 280 MB/s | 93% |
| 4 | 150 | 600 MB/s | 520 MB/s | 87% |
| 8 | 150 | 1,200 MB/s | 950 MB/s | 79% |
| 16 | 150 | 2,400 MB/s | 1,700 MB/s | 71% |
| 32 | 150 | 4,800 MB/s | 2,800 MB/s | 58% |
Random I/O and Stripe Size Interaction:
The relationship between random I/O performance and stripe size is nuanced. Consider random 4KB reads on an array with different stripe sizes:
Stripe size = 4KB: an aligned 4KB read maps to exactly one stripe unit, but any read not aligned to a 4KB boundary straddles two disks, turning one request into two smaller ones.
Stripe size = 64KB: every 4KB read fits comfortably inside a single stripe unit, so each request is served by one disk and concurrent independent requests spread naturally across the array.
Stripe size = 1MB: each read still touches only one disk, but when the access pattern has locality, many consecutive requests can land in the same stripe unit and queue up on the same disk, limiting parallelism.
For predominantly large sequential I/O (video, backups): Use small stripes (16-64KB). For predominantly small random I/O (OLTP databases): Use larger stripes (128KB-256KB) to reduce stripe boundary crossings. For mixed workloads: Default to 64KB-128KB as a balanced starting point, then tune based on measurements.
The Stripe Alignment Problem:
Performance degradation occurs when I/O operations are misaligned with stripe boundaries. Consider a 64KB stripe size and a 64KB write starting at byte offset 32,768: the write covers the second half of stripe unit 0 and the first half of stripe unit 1, which reside on two different disks.
This misaligned write requires two partial stripe-unit writes on two disks instead of one full stripe-unit write on a single disk, doubling the number of operations (and, in parity-based levels, forcing read-modify-write cycles on both affected stripe units).
Modern file systems and databases typically align allocations to stripe boundaries, but application-level I/O patterns can still cause misalignment. This is particularly problematic for virtualization where guest OS alignment may not match host array alignment.
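To make the boundary-crossing effect concrete, here is a small sketch (the helper name is ours, not from any particular RAID implementation) that reports which stripe units, and therefore which disks, an I/O request touches; it reproduces the misaligned 64KB write at offset 32,768 described above.

```python
def stripe_units_touched(offset: int, length: int, stripe_size: int, num_disks: int) -> dict:
    """Report which stripe units (and therefore disks) an I/O request touches."""
    first_unit = offset // stripe_size
    last_unit = (offset + length - 1) // stripe_size
    units = list(range(first_unit, last_unit + 1))
    return {
        'units': units,
        'disks': sorted({u % num_disks for u in units}),
        'aligned': offset % stripe_size == 0 and length % stripe_size == 0,
        'crosses_boundary': len(units) > 1,
    }

# The misaligned example from the text: a 64KB write at offset 32,768 with 64KB stripes
print(stripe_units_touched(offset=32768, length=65536, stripe_size=65536, num_disks=4))
# -> touches stripe units [0, 1] on disks [0, 1]; not aligned, crosses a boundary

# The same write issued at offset 0 stays within a single stripe unit on one disk
print(stripe_units_touched(offset=0, length=65536, stripe_size=65536, num_disks=4))
```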
Mirroring is the technique of maintaining identical copies of data on multiple physical devices. Unlike parity-based redundancy, mirroring provides complete data duplication with no computation required—every byte exists in its entirety on every mirror copy.
Mirror Architectures:
Two-Way Mirror (RAID 1): The most common configuration, maintaining exactly two copies. Usable capacity is 50% of raw capacity, and the array survives the failure of either disk.
Three-Way Mirror (RAID 1 with three copies): Used for mission-critical data. Usable capacity drops to one third of raw capacity, but the array tolerates two disk failures and remains redundant even while one copy is rebuilding.
N-Way Mirror: Some systems support arbitrary mirror counts for extreme reliability requirements.
Write Propagation Strategies:
When a write occurs, all mirror copies must be updated. The timing and ordering of these updates creates critical design choices:
Synchronous Mirroring:
```
   Application write request
              |
      ┌───────┴───────┐
      v               v
Write to Disk A   Write to Disk B   (issued in parallel)
      |               |
      └───────┬───────┘
              v
  Wait for BOTH writes to complete
              |
              v
 Return success to the application
```
Synchronous mirroring guarantees that when a write returns successfully, the data exists on all mirror copies. This is the safest approach but introduces latency equal to the slowest mirror's response time.
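To make the ordering concrete, here is a minimal sketch of a synchronous mirror write, assuming a placeholder write_block helper rather than any vendor's controller code: the copies are written concurrently, and success is reported only after every copy acknowledges.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def write_block(disk_id: str, lba: int, data: bytes) -> bool:
    """Placeholder for a real device write; assume it returns True on success."""
    print(f"disk {disk_id}: wrote {len(data)} bytes at LBA {lba}")
    return True

def synchronous_mirror_write(disks: List[str], lba: int, data: bytes) -> bool:
    """Issue the write to all mirror copies in parallel and wait for every one."""
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        results = list(pool.map(lambda d: write_block(d, lba, data), disks))
    # Report success only once ALL copies have acknowledged the write
    return all(results)

if synchronous_mirror_write(["A", "B"], lba=2048, data=b"\x00" * 4096):
    print("write acknowledged on every mirror copy")
```

The latency of the call is dictated by the slowest copy, which is exactly the trade-off described above.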
Asynchronous Mirroring (rarely used for local RAID): the write is acknowledged as soon as the first copy reaches stable storage, and the remaining copies are updated in the background. This hides the latency of the slower mirror but opens a window in which the copies disagree, which is why the approach is mostly confined to remote replication rather than local arrays.
The Write Hole Problem:
A critical vulnerability in mirrored arrays occurs during the window when a write has completed on some mirrors but not others. If a power failure occurs in that window, one copy holds the new data while another still holds the old data.
Upon recovery, which disk contains the "correct" data? Without additional information, the system cannot know. Solutions include write-intent bitmaps or dirty-region logs that record which regions had writes in flight, journaling writes before applying them to the mirrors, and non-volatile controller cache that preserves in-flight writes across the power loss.
Enterprise RAID controllers typically include battery- or capacitor-backed cache to close this hole.
Since mirror copies are identical, any copy can satisfy a read request. This creates opportunities for sophisticated read optimization. RAID controllers employ various strategies to maximize read performance:
Round-Robin Reading:
The simplest approach alternates read requests between mirrors: request 1 goes to Disk A, request 2 to Disk B, request 3 back to Disk A, and so on.
This achieves load balancing but ignores disk state (e.g., current head position, queue depth).
Queue-Depth-Aware Reading:
More sophisticated controllers track the outstanding I/O queue on each disk and route new requests to the least-loaded disk:
```
Disk A queue: [op1, op2, op3]   (depth = 3)
Disk B queue: [op4]             (depth = 1)

New read request → Route to Disk B (shorter queue)
```
This approach adapts to asymmetric loads and can significantly reduce latency variability.
Seek-Aware Reading:
For HDDs with seek and rotational latency, advanced controllers consider head position: the read is routed to whichever mirror's head is currently closest to the target address, minimizing mechanical delay.
This optimization is less relevant for SSDs where "seek time" is negligible.
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    id: str
    queue_depth: int = 0
    head_position: int = 0  # Logical block address


class MirrorReadScheduler:
    def __init__(self, disks: List[Disk]):
        self.disks = disks

    def round_robin(self, request_num: int) -> Disk:
        """Simple alternating strategy"""
        return self.disks[request_num % len(self.disks)]

    def shortest_queue(self) -> Disk:
        """Select disk with smallest queue"""
        return min(self.disks, key=lambda d: d.queue_depth)

    def nearest_seek(self, target_lba: int) -> Disk:
        """Select disk with head closest to target"""
        return min(self.disks, key=lambda d: abs(d.head_position - target_lba))

    def adaptive(self, target_lba: int) -> Disk:
        """Combined strategy: balance queue depth and seek distance"""
        def cost(disk: Disk) -> float:
            seek_cost = abs(disk.head_position - target_lba) / 1000000  # Normalize
            queue_cost = disk.queue_depth * 0.5  # Weight queue more
            return seek_cost + queue_cost
        return min(self.disks, key=cost)


# Simulate selection
disk_a = Disk("A", queue_depth=2, head_position=100000)
disk_b = Disk("B", queue_depth=4, head_position=500000)

scheduler = MirrorReadScheduler([disk_a, disk_b])
target = 120000

print(f"Target LBA: {target}")
print(f"Disk A: queue={disk_a.queue_depth}, head={disk_a.head_position}")
print(f"Disk B: queue={disk_b.queue_depth}, head={disk_b.head_position}")
print(f"Round-robin (req 0): {scheduler.round_robin(0).id}")
print(f"Shortest queue: {scheduler.shortest_queue().id}")
print(f"Nearest seek: {scheduler.nearest_seek(target).id}")
print(f"Adaptive: {scheduler.adaptive(target).id}")
```

Read Performance Gains:
For a 2-way mirror with optimal read scheduling: random read throughput approaches twice that of a single disk, sequential reads gain less in practice, and write throughput remains that of a single disk because every write must be applied to both copies.
For 3-way or higher mirrors: read throughput continues to scale roughly with the number of copies, while write throughput stays pinned at single-disk speed and usable capacity keeps shrinking.
This asymmetry makes n-way mirrors attractive for extremely read-heavy workloads with extremely high reliability requirements.
Some enterprise arrays periodically perform 'read verification' where they read the same block from all mirrors and compare. Mismatches indicate silent data corruption (bit rot) and trigger repair from a good copy. This background scrubbing is essential for long-term data integrity.
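A minimal sketch of such a scrub pass, using a simulated three-copy mirror so a majority vote can identify the bad copy (a real two-way mirror needs checksums or other metadata to break the tie; the helpers here are illustrative, not an actual controller API):

```python
from collections import Counter

# Simulated mirror contents: disk "B" has a silently corrupted copy of LBA 7
MIRROR = {
    "A": {7: b"good data"},
    "B": {7: b"bad  data"},
    "C": {7: b"good data"},
}

def read_block(disk_id: str, lba: int) -> bytes:
    return MIRROR[disk_id][lba]

def write_block(disk_id: str, lba: int, data: bytes) -> None:
    MIRROR[disk_id][lba] = data

def scrub_block(disk_ids: list, lba: int) -> None:
    """Read the same block from every mirror copy and repair any divergent copy."""
    copies = {d: read_block(d, lba) for d in disk_ids}
    if len(set(copies.values())) == 1:
        return  # all copies agree; nothing to do
    # With three or more copies, take the majority value as the good data
    good_data, _ = Counter(copies.values()).most_common(1)[0]
    for disk, data in copies.items():
        if data != good_data:
            print(f"repairing disk {disk} at LBA {lba} from a good copy")
            write_block(disk, lba, good_data)

scrub_block(["A", "B", "C"], lba=7)
assert MIRROR["B"][7] == b"good data"
```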
The true power of RAID emerges when striping and mirroring are combined. Two distinct approaches exist, and the differences between them have profound implications for performance and reliability.
RAID 0+1 (Stripe, Then Mirror):
First, create striped arrays. Then, mirror those arrays:
```
            [Mirrored Pair]
                   |
           ┌───────┴───────┐
       [Stripe1]        [Stripe2]
           |                |
        ┌──┴──┐          ┌──┴──┐
       D0     D1        D2     D3
      A,C    B,D       A,C    B,D
```
If Disk 0 fails: Stripe1 can no longer serve data, so the entire left stripe set drops out and the array runs solely on Stripe2. All redundancy is gone; a subsequent failure of either D2 or D3 loses data, and recovery requires copying the whole surviving stripe set back to the repaired side.
RAID 1+0 (Mirror, Then Stripe):
First, create mirrored pairs. Then, stripe across pairs:
```
            [Striped Array]
                   |
           ┌───────┴───────┐
       [Mirror1]        [Mirror2]
           |                |
        ┌──┴──┐          ┌──┴──┐
       D0     D1        D2     D3
      A,C    A,C       B,D    B,D
```
If Disk 0 fails: Mirror1 keeps serving reads from D1 and the array stays fully online. Only a failure of D1 specifically (Disk 0's mirror partner) would now cause data loss; D2 and D3 are unaffected, and the rebuild copies only D1's contents to the replacement disk.
Probability Analysis:
Consider a 4-disk array where each disk has failure probability p:
RAID 0+1 Failure Probability:
The array fails if: (first stripe loses any disk) AND (second stripe loses any disk)
P(first stripe degraded) = 1 - (1-p)² ≈ 2p for small p
After the first failure, data loss requires only that any disk in the remaining stripe fails, so P(data loss | one failure) ≈ 2p
Overall: P(data loss) = [1 - (1-p)²]² ≈ 4p² for small p, roughly double the RAID 1+0 figure derived below, and the gap widens as disks are added
RAID 1+0 Failure Probability:
The array fails only if both disks in ANY mirror pair fail:
P(mirror pair fails) = p × (p during rebuild window) ≈ p²
With 2 mirror pairs: P(data loss) = 1 - (1 - p²)² ≈ 2p² for small p
This is substantially better: for the same four disks, the data-loss probability is about 2p² versus roughly 4p² for RAID 0+1, and after a single failure only one specific surviving disk (the failed disk's mirror partner) is critical rather than the entire opposite stripe.
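A quick numeric check of these approximations, where p is the assumed per-disk failure probability over the window of interest and the formulas follow the derivation above:

```python
def raid01_loss(p: float, disks_per_stripe: int = 2) -> float:
    """RAID 0+1: data is lost when BOTH stripe sets have at least one failed disk."""
    stripe_degraded = 1 - (1 - p) ** disks_per_stripe
    return stripe_degraded ** 2

def raid10_loss(p: float, mirror_pairs: int = 2) -> float:
    """RAID 1+0: data is lost only when both disks of SOME mirror pair fail."""
    return 1 - (1 - p ** 2) ** mirror_pairs

for p in (0.01, 0.02, 0.05):
    print(f"p={p:.2f}  RAID 0+1: {raid01_loss(p):.6f}  "
          f"RAID 1+0: {raid10_loss(p):.6f}  "
          f"approx 4p^2={4 * p * p:.6f}  2p^2={2 * p * p:.6f}")
```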
Industry terminology is inconsistent. Some vendors call mirror-then-stripe 'RAID 10' while others call it 'RAID 1+0'. Similarly, stripe-then-mirror is sometimes 'RAID 01' and sometimes 'RAID 0+1'. Always verify the actual architecture rather than relying on naming conventions.
The number of disks participating in a stripe (stripe width) significantly impacts performance, reliability, and capacity. This section examines the trade-offs involved in selecting appropriate disk counts.
Stripe Width Trade-offs:
| More Disks (Wider Stripe) | Fewer Disks (Narrower Stripe) |
|---|---|
| Higher sequential throughput | Lower sequential throughput |
| Higher total capacity | Lower total capacity |
| Higher failure probability (more disks to fail) | Lower failure probability |
| Longer rebuild times | Shorter rebuild times |
| More complex controller logic | Simpler controller logic |
| Better parallelism for large I/O | Less parallelism benefit |
Optimal Stripe Width by RAID Level:
RAID 0: No upper limit from reliability perspective (but controller limits apply). More disks = more performance = more failure risk. Typical: 2-8 disks.
RAID 5: Recommended: 3-8 disks. Fewer than 3 is impossible (need data + parity). More than 8 increases rebuild time dangerously. Each additional disk increases capacity efficiency but also increases rebuild vulnerability.
4-disk RAID 5: 75% efficiency, moderate rebuild time
8-disk RAID 5: 87.5% efficiency, long rebuild time
RAID 6: Recommended: 4-16 disks. The dual parity allows larger arrays with acceptable reliability. Still, extremely large arrays should be split into multiple smaller RAID 6 groups.
RAID 10: Stripe width = number of mirror pairs. Recommended: 2-8 mirror pairs (4-16 disks total). More pairs = more performance, but also more components to manage.
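The capacity-efficiency figures quoted above follow directly from each level's redundancy scheme; a small helper makes the trade-off explicit (a sketch, not tied to any particular controller):

```python
def usable_fraction(level: str, num_disks: int) -> float:
    """Fraction of raw capacity available for data, by RAID level."""
    if level == "raid0":
        return 1.0
    if level == "raid5":                    # one disk's worth of parity
        return (num_disks - 1) / num_disks
    if level == "raid6":                    # two disks' worth of parity
        return (num_disks - 2) / num_disks
    if level in ("raid1", "raid10"):        # every byte stored twice
        return 0.5
    raise ValueError(f"unknown RAID level: {level}")

for level, n in [("raid5", 4), ("raid5", 8), ("raid6", 8), ("raid10", 8)]:
    print(f"{level} with {n} disks: {usable_fraction(level, n):.1%} usable")
# raid5/4 -> 75.0%, raid5/8 -> 87.5%, raid6/8 -> 75.0%, raid10/8 -> 50.0%
```

The rebuild-time estimates below show the other side of the trade-off: wider arrays rebuild more slowly.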
| Disk Size | RAID 5 (4 disks) | RAID 5 (8 disks) | RAID 6 (8 disks) | RAID 10 (8 disks) |
|---|---|---|---|---|
| 1 TB | ~3 hours | ~6 hours | ~6 hours | ~3 hours |
| 4 TB | ~12 hours | ~24 hours | ~24 hours | ~12 hours |
| 8 TB | ~24 hours | ~48 hours | ~48 hours | ~24 hours |
| 16 TB | ~48 hours | ~96 hours | ~96 hours | ~48 hours |
The Rebuild Storm Problem:
During rebuild, the array must read from all surviving disks while simultaneously writing to the replacement disk. This creates several challenges:
Performance Impact: Normal I/O competes with rebuild I/O, degrading application performance by 30-70%
Temperature Stress: Continuous heavy I/O raises drive temperatures, potentially accelerating failures
Unrecoverable Read Errors (URE): Enterprise HDDs specify a URE rate of ~1 in 10^15 bits read. For a 1TB read:
$$P(URE) = 1 - (1 - 10^{-15})^{8 \times 10^{12}} \approx 0.8\%$$
For an 8-disk RAID 5 with 8TB drives, a rebuild must read all 7 surviving drives in full, about 56TB: $$P(URE) = 1 - (1 - 10^{-15})^{448 \times 10^{12}} \approx 36\%$$
A roughly 36% chance of hitting an unrecoverable error during rebuild, causing data loss even though only one disk failed!
The URE problem makes RAID 5 increasingly dangerous as drive sizes grow. This is the primary reason RAID 6 (which can tolerate one URE + one failure) or RAID 10 (which only reads from one disk during rebuild) are becoming mandatory for large-capacity drives.
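The URE figures above generalize to any drive size and array width; a short calculation sketch (the 10^-15 error rate is the enterprise HDD specification quoted above, and the helper name is ours):

```python
def p_ure_during_rebuild(surviving_disks: int, disk_tb: float,
                         ure_rate: float = 1e-15) -> float:
    """Probability of at least one unrecoverable read error while reading
    every surviving disk in full during a rebuild."""
    bits_read = surviving_disks * disk_tb * 1e12 * 8   # TB -> bits
    return 1 - (1 - ure_rate) ** bits_read

# 8-disk RAID 5 with 8TB drives: read 7 surviving disks (56TB) to rebuild
print(f"RAID 5, 8x8TB : {p_ure_during_rebuild(7, 8):.0%}")
# RAID 10 rebuild reads only the failed disk's single mirror partner
print(f"RAID 10, 8TB  : {p_ure_during_rebuild(1, 8):.0%}")
```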
A hot spare is an idle disk kept powered and ready in the array enclosure. When a drive fails, the controller immediately begins rebuilding to the hot spare without human intervention. This automated response dramatically reduces the vulnerability window.
Hot Spare Strategies:
Dedicated Hot Spares: assigned to a single array (or RAID group) and used only when a disk in that array fails. Simple to manage and reason about, but a spare sitting idle next to a healthy array cannot help a neighboring array in trouble.
Global Hot Spares: shared across all arrays managed by the controller; whichever array suffers the first failure claims the spare. Fewer spare disks cover more arrays, at the cost of slightly more complex management.
Virtual Hot Spares (Distributed Spare Space): rather than a physical standby disk, spare capacity is reserved across all members of the array, so rebuild writes are spread over many disks in parallel and complete faster.
Rebuild Priority and Scheduling:
Reconstruction must balance two competing goals: completing the rebuild as quickly as possible to close the window of vulnerability, and preserving enough throughput for the applications still running on the degraded array.
Modern controllers offer rebuild priority settings: a low priority lets application I/O dominate and stretches the rebuild out, a high priority finishes the rebuild fastest at a visible cost to foreground performance, and medium settings split the difference.
Some advanced systems offer adaptive rebuild: the controller rebuilds at full speed while the array is idle and automatically throttles back as soon as application I/O arrives, as in the sketch below.
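A toy illustration of that adaptive policy; the thresholds and rates below are invented for the example and would be tuned (or hidden) by a real controller:

```python
def rebuild_rate_mbps(app_iops: int,
                      max_rate: int = 400,    # assumed full-speed rebuild rate, MB/s
                      min_rate: int = 40,     # assumed floor so the rebuild always progresses
                      idle_threshold: int = 50,
                      busy_threshold: int = 5000) -> int:
    """Throttle rebuild bandwidth based on current application load."""
    if app_iops <= idle_threshold:
        return max_rate                        # array is idle: rebuild at full speed
    if app_iops >= busy_threshold:
        return min_rate                        # array is saturated: back off to the floor
    # Scale linearly between the two thresholds
    span = busy_threshold - idle_threshold
    fraction = (app_iops - idle_threshold) / span
    return int(max_rate - fraction * (max_rate - min_rate))

for load in (0, 500, 2500, 5000, 8000):
    print(f"app load {load:>5} IOPS -> rebuild at {rebuild_rate_mbps(load)} MB/s")
```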
After rebuilding to a hot spare, some systems support 'copyback'—automatically copying data back to a replacement disk inserted in the original slot, then restoring the original disk to hot spare status. This maintains consistent physical disk layouts and simplifies spare management.
We've explored the fundamental techniques underlying all RAID configurations. Let's consolidate the key insights: striping multiplies throughput by spreading data across disks, with stripe size determining which I/O patterns benefit; mirroring duplicates data, scaling reads with the number of copies while writes remain bound by the slowest copy; combining the two as RAID 1+0 (mirror, then stripe) tolerates failures far better than RAID 0+1; wider stripes buy capacity and bandwidth at the cost of longer, riskier rebuilds; and hot spares with sensible rebuild scheduling shrink the window of vulnerability.
What's Next:
In the following page, we'll dive deep into parity—the mathematical technique that enables RAID 5 and RAID 6 to achieve fault tolerance without the storage overhead of full mirroring. We'll explore XOR operations, Galois Field arithmetic, and the algorithms that calculate and verify parity across disk arrays.
You now possess a thorough understanding of striping and mirroring—the two fundamental building blocks of RAID. These concepts extend beyond traditional RAID to distributed storage, software-defined storage, and cloud architectures. Next, we'll explore how parity enables efficient redundancy.