In the previous page, we established that stable storage is achieved through redundancy—storing data on multiple independent devices such that no single failure can cause data loss. But how do we actually implement this redundancy in practice?
The answer, for nearly four decades, has been RAID (Redundant Array of Independent Disks). RAID is not a single technology but a family of techniques for combining multiple physical storage devices into a logical unit that provides improved reliability, performance, or both.
RAID is so fundamental to database systems that you will encounter it in virtually every production deployment. Understanding RAID levels, their tradeoffs, and their failure characteristics is essential knowledge for any database professional.
By the end of this page, you will understand RAID levels 0, 1, 5, 6, and 10, their performance and reliability characteristics, how to choose RAID levels for different database workloads, and the limitations and failure scenarios of each configuration.
RAID was introduced in 1988 by Patterson, Gibson, and Katz at UC Berkeley. Their seminal paper, "A Case for Redundant Arrays of Inexpensive Disks (RAID)," proposed using multiple low-cost drives to match or exceed the performance and reliability of expensive mainframe storage.
The Core RAID Concepts:
1. Striping
Data is divided into chunks (called stripes or strips) and distributed across multiple drives. This allows parallel access—multiple drives can satisfy a single request simultaneously, increasing throughput.
2. Mirroring
Data is duplicated across multiple drives. If one drive fails, the mirror contains an identical copy. This provides fault tolerance at the cost of storage capacity.
3. Parity
Redundant information is computed and stored that allows reconstruction of lost data. Parity provides fault tolerance with less storage overhead than full mirroring.
The original RAID acronym used 'Inexpensive' to contrast with expensive mainframe storage. Modern usage typically expands RAID as 'Independent' to emphasize that the drives can fail independently—a property essential for actual redundancy.
RAID Implementations:
RAID can be implemented at different levels of the storage stack:
Hardware RAID
A dedicated RAID controller with its own processor handles all RAID operations. The controller presents a single logical drive to the operating system. High-end controllers include battery-backed cache for write performance and durability.
Advantages: best performance, OS-independent, transparent to software.
Disadvantages: expensive, vendor lock-in, a failed controller requires a compatible replacement.
Software RAID
The operating system or storage driver implements RAID logic using the host CPU. Linux mdadm, Windows Storage Spaces, and ZFS all implement software RAID.
Advantages: no special hardware required, portable across systems, CPU overhead is trivial on modern processors.
Disadvantages: uses host CPU cycles, slightly lower peak performance, OS-dependent configuration.
Database-Level RAID
Some database systems implement their own redundancy. Oracle ASM (Automatic Storage Management) provides mirroring and striping independent of the OS or hardware RAID.
| Implementation | CPU Usage | Performance | Portability | Cost |
|---|---|---|---|---|
| Hardware RAID | Minimal (dedicated processor) | Excellent | Low (vendor-specific) | High |
| Software RAID | Low (modern CPUs) | Very Good | High (OS-specific) | Low |
| Database RAID (Oracle ASM) | Low | Good | Database-specific | Part of license |
RAID 0, despite the name, provides no redundancy whatsoever. It uses striping to distribute data across multiple drives for improved performance, but a single drive failure causes complete data loss.
How RAID 0 Works:
Data is divided into stripes (typically 64KB to 256KB) and written round-robin across all drives in the array:
Drive 0 Drive 1 Drive 2 Drive 3
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Stripe 0│ │ Stripe 1│ │ Stripe 2│ │ Stripe 3│
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ Stripe 4│ │ Stripe 5│ │ Stripe 6│ │ Stripe 7│
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ Stripe 8│ │ Stripe 9│ │ Stripe10│ │ Stripe11│
└─────────┘ └─────────┘ └─────────┘ └─────────┘
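To make the round-robin layout concrete, here is a minimal Python sketch (not tied to any particular RAID implementation) that maps a logical byte offset to the drive and row that hold it, using the 64KB stripe size and 4-drive layout from the diagram above:

```python
STRIPE_SIZE = 64 * 1024   # 64 KB stripe unit (a common default; configurable)
NUM_DRIVES = 4            # matches the 4-drive diagram above

def locate(logical_offset: int) -> tuple[int, int, int]:
    """Map a logical byte offset to (drive index, row on that drive, offset within stripe)."""
    stripe_number = logical_offset // STRIPE_SIZE      # which stripe overall
    drive = stripe_number % NUM_DRIVES                 # round-robin across drives
    row = stripe_number // NUM_DRIVES                  # which row on that drive
    offset_in_stripe = logical_offset % STRIPE_SIZE
    return drive, row, offset_in_stripe

# Example: stripe 5 lands on Drive 1, row 1 -- exactly as in the diagram.
print(locate(5 * STRIPE_SIZE))   # (1, 1, 0)
```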
Performance Characteristics:
With n drives, sequential read and write throughput scale roughly n× that of a single drive, since every drive works on a different stripe in parallel (see the properties table below).
Reliability is the opposite story. RAID 0 does not provide any data protection. In fact, it increases the probability of data loss compared to a single drive: with n drives, if any one fails, all data is lost. For 4 drives with a 2% annual failure rate each:
P(RAID 0 failure) = 1 - (0.98)^4 ≈ 7.8% annual failure rate
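The same arithmetic generalizes to any drive count and failure rate. A quick sketch (the 2% annual failure rate is the illustrative figure from above, not measured data):

```python
def raid0_annual_failure_probability(num_drives: int, drive_afr: float) -> float:
    """RAID 0 fails if ANY drive fails: 1 - P(all drives survive the year)."""
    return 1 - (1 - drive_afr) ** num_drives

print(raid0_annual_failure_probability(4, 0.02))   # ~0.0776, i.e. about 7.8%
```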
RAID 0 should NEVER be used for database data or transaction logs unless combined with other protection (like replication).
RAID 0 Properties:
| Property | Value | Notes |
|---|---|---|
| Minimum Drives | 2 | No upper limit |
| Usable Capacity | 100% (n × drive size) | All capacity is usable |
| Fault Tolerance | None | Any drive failure = total data loss |
| Read Performance | n × single drive | Scales with drive count |
| Write Performance | n × single drive | Scales with drive count |
| Rebuild Time | N/A | No rebuild possible—data is lost |
Legitimate Uses for RAID 0:
Despite its dangers, RAID 0 has valid applications: temporary or scratch space, caches, and other data that can be regenerated or restored from another source if a drive fails.
For database systems, RAID 0 alone is almost never appropriate. However, it forms a building block for other RAID levels (like RAID 10).
RAID 1 is the simplest form of redundancy: every write is duplicated to two (or more) drives. Either drive can satisfy any read, and if one drive fails, the other contains a complete copy of all data.
How RAID 1 Works:
Write Request
│
┌────┴────┐
│ │
▼ ▼
┌─────────┐ ┌─────────┐
│ Drive A │ │ Drive B │
│ (Copy 1)│ │ (Copy 2)│
│ │ │ │
│ Block 0 │ │ Block 0 │
│ Block 1 │ │ Block 1 │
│ Block 2 │ │ Block 2 │
│ ... │ │ ... │
└─────────┘ └─────────┘
│ │
└────┬────┘
│
Read Request
(Either drive can respond)
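A toy sketch of the mirroring idea, with in-memory byte arrays standing in for drives (real implementations work at the block layer and must handle partially completed writes, which this sketch ignores):

```python
import random

class Mirror:
    """Toy RAID 1: every write goes to both copies, reads may use either."""

    def __init__(self, size: int):
        self.copies = [bytearray(size), bytearray(size)]   # Drive A and Drive B

    def write(self, offset: int, data: bytes) -> None:
        for copy in self.copies:                           # duplicate the write
            copy[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        copy = random.choice(self.copies)                  # either copy can serve the read
        return bytes(copy[offset:offset + length])

    def fail_drive(self, index: int) -> None:
        self.copies.pop(index)                             # surviving copy still has all data

m = Mirror(1024)
m.write(0, b"commit record")
m.fail_drive(0)                # lose Drive A
print(m.read(0, 13))           # b'commit record' -- still readable from Drive B
```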
Performance Characteristics:
| Property | Value | Notes |
|---|---|---|
| Minimum Drives | 2 | Can use 3+ for multi-way mirror |
| Usable Capacity | 50% (n/2 × drive size) | Half capacity for mirrors |
| Fault Tolerance | 1 drive | More with multi-way mirrors |
| Read Performance | Up to 2× single drive | Depends on implementation |
| Write Performance | 1× single drive | Limited by slowest mirror |
| Rebuild Time | Hours to 1 day | Sequential copy of entire drive |
Transaction logs have high write intensity with small, sequential writes. RAID 1 is often the best choice because: it has no write penalty (unlike RAID 5/6), failed drive recovery doesn't impact write performance, and the 50% overhead is acceptable for critical data. Many database administrators use RAID 1 for logs and RAID 5/6 or RAID 10 for data files.
RAID 5 strikes a balance between the performance of RAID 0 and the protection of RAID 1. It uses striping with distributed parity—redundant information computed from the data that allows reconstruction of any single failed drive.
How RAID 5 Works:
Data is striped across drives like RAID 0, but one stripe in each row is a parity stripe computed from the other stripes using XOR:
Drive 0 Drive 1 Drive 2 Drive 3
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ D0 │ │ D1 │ │ D2 │ │ P0 │ Row 0
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ D3 │ │ D4 │ │ P1 │ │ D5 │ Row 1
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ D6 │ │ P2 │ │ D7 │ │ D8 │ Row 2
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ P3 │ │ D9 │ │ D10 │ │ D11 │ Row 3
└─────────┘ └─────────┘ └─────────┘ └─────────┘
P0 = D0 XOR D1 XOR D2
P1 = D3 XOR D4 XOR D5
... and so on
If Drive 1 fails:
D1 = D0 XOR D2 XOR P0 (can be reconstructed!)
The parity is distributed across all drives rather than concentrated on a single drive. This prevents the parity drive from becoming a bottleneck.
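The XOR relationship above is easy to verify directly. A minimal sketch using three data blocks, matching the 4-drive layout:

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
p0 = xor_blocks(d0, d1, d2)            # parity for row 0

# Simulate losing Drive 1: reconstruct D1 from the survivors plus parity.
reconstructed = xor_blocks(d0, d2, p0)
assert reconstructed == d1
print(reconstructed)                   # b'BBBB'
```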
Every write to RAID 5 requires updating both the data stripe and the corresponding parity stripe. This requires reading the old data, reading the old parity, computing new parity, writing new data, and writing new parity—a 4:1 I/O ratio for small random writes. This 'write penalty' makes RAID 5 less suitable for write-intensive database workloads.
RAID 5 Write Operations in Detail:
Small Write (Read-Modify-Write):
1. Read old data block
2. Read old parity block
3. Compute: new_parity = old_parity XOR old_data XOR new_data
4. Write new data block
5. Write new parity block
Total: 2 reads + 2 writes = 4 I/O operations per logical write
Full Stripe Write (Optimal):
If writing all data blocks in a stripe simultaneously:
1. Compute parity from all new data blocks
2. Write all data blocks + parity block in parallel
Total: n writes (no reads required)
The full stripe write path is why sequential write workloads perform well on RAID 5, while random writes suffer significantly.
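Both write paths reduce to the same XOR arithmetic. A self-contained sketch contrasting them:

```python
def xor_blocks(*blocks: bytes) -> bytes:
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Small (read-modify-write) path: 2 reads + 2 writes per logical write.
def small_write_new_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    # new_parity = old_parity XOR old_data XOR new_data
    return xor_blocks(old_parity, old_data, new_data)

# Full-stripe path: parity computed from the new data alone, no reads needed.
def full_stripe_parity(*new_data_blocks: bytes) -> bytes:
    return xor_blocks(*new_data_blocks)

# Both paths agree: updating D1 in the stripe (D0, D1, D2) yields the same parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
p_old = full_stripe_parity(d0, d1, d2)
new_d1 = b"ZZZZ"
assert small_write_new_parity(d1, p_old, new_d1) == full_stripe_parity(d0, new_d1, d2)
```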
| Property | Value | Notes |
|---|---|---|
| Minimum Drives | 3 | More drives = better efficiency |
| Usable Capacity | (n-1)/n × total | One drive's worth of capacity used for parity |
| Fault Tolerance | 1 drive | Second failure = data loss |
| Read Performance | ~(n-1) × single drive | All drives contribute to reads |
| Write Performance | Varies | Excellent for sequential, poor for random |
| Rebuild Time | Many hours to days | Must read all surviving drives |
Capacity Efficiency vs. Drive Count:
| Drives | Usable Capacity | Overhead |
|---|---|---|
| 3 | 66.7% | 33.3% |
| 4 | 75.0% | 25.0% |
| 5 | 80.0% | 20.0% |
| 6 | 83.3% | 16.7% |
| 8 | 87.5% | 12.5% |
| 12 | 91.7% | 8.3% |
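The figures in the table follow directly from the (n-1)/n formula. A small helper that also covers RAID 6 (parity_drives is 1 for RAID 5, 2 for RAID 6):

```python
def usable_fraction(num_drives: int, parity_drives: int = 1) -> float:
    """Fraction of raw capacity available to data for RAID 5 (parity_drives=1) or RAID 6 (=2)."""
    return (num_drives - parity_drives) / num_drives

for n in (3, 4, 5, 6, 8, 12):
    print(n, f"{usable_fraction(n):.1%}")   # 66.7%, 75.0%, 80.0%, 83.3%, 87.5%, 91.7%
```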
RAID 6 extends RAID 5 by adding a second parity block to each stripe, computed using a different algorithm. This allows the array to survive two simultaneous drive failures.
Why RAID 6 Matters:
RAID 5's single-drive fault tolerance has become increasingly risky as drives grow larger:
Larger drives take longer to rebuild: a 10TB drive might take 24-48 hours to rebuild, during which the array is vulnerable.
Unrecoverable Read Errors (UREs): modern drives are typically specified at a URE rate of about 1 in 10^14 bits read (roughly one error per 12.5TB read). During a rebuild of a large array, hitting a URE on another drive is increasingly likely, causing data loss.
Rebuild stress can trigger failures: the intensive I/O during a rebuild can trigger latent failures in other aging drives.
Many storage architects now consider RAID 5 unsuitable for drives larger than 2TB. With 8 x 10TB drives in RAID 5, you would read 70TB during rebuild. Given typical URE rates, you have a significant probability of hitting an unrecoverable error during rebuild—losing the entire array. RAID 6 provides the additional protection needed for large-capacity drives.
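The rebuild-risk claim can be quantified with a back-of-the-envelope calculation. A sketch, assuming the often-quoted 1-in-10^14-bits URE specification and treating read errors as independent (real drives do not behave exactly like this, so treat the result as illustrative only):

```python
import math

def p_ure_during_rebuild(bytes_read: float, ure_rate_per_bit: float = 1e-14) -> float:
    """Probability of hitting at least one unrecoverable read error while reading."""
    bits = bytes_read * 8
    # P(at least one URE) = 1 - (1 - rate)^bits ~= 1 - exp(-rate * bits)
    return 1 - math.exp(-ure_rate_per_bit * bits)

# Rebuilding 8 x 10TB RAID 5: the 7 surviving drives (70TB) must be read in full.
print(f"{p_ure_during_rebuild(70e12):.1%}")   # very likely to hit at least one URE
```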
How RAID 6 Works:
RAID 6 uses two independent parity calculations, typically denoted P and Q:
Drive 0 Drive 1 Drive 2 Drive 3 Drive 4
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ D0 │ │ D1 │ │ D2 │ │ P0 │ │ Q0 │
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ D3 │ │ D4 │ │ P1 │ │ Q1 │ │ D5 │
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ D6 │ │ P2 │ │ Q2 │ │ D7 │ │ D8 │
├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤ ├─────────┤
│ P3 │ │ Q3 │ │ D9 │ │ D10 │ │ D11 │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
P = Simple XOR parity
Q = Reed-Solomon or other algebraic code
With both P and Q, any two drives can be reconstructed
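One way to see why the second parity block adds real protection is a small Galois-field sketch. The field parameters below (GF(2^8) with the reducing polynomial 0x11d and generator 2) are one common choice, used for example by the Linux md RAID-6 code; other products may use different codes, so this is illustrative rather than a description of any specific controller. The example loses one data drive plus the P block and recovers the data from Q alone:

```python
# Build exp/log tables for GF(2^8) with the reducing polynomial 0x11d.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a: int, b: int) -> int:
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

# One byte per data drive for simplicity (real arrays repeat this for every byte of a stripe).
data = [0x11, 0x22, 0x33]                       # D0, D1, D2

# P = D0 ^ D1 ^ D2;  Q = g^0*D0 ^ g^1*D1 ^ g^2*D2   (g = 2)
P, Q = 0, 0
for i, d in enumerate(data):
    P ^= d
    Q ^= gf_mul(EXP[i], d)

# Failure scenario: lose D1 *and* the P block. P cannot help, but Q can:
# Q = sum_j g^j * D_j  =>  D1 = (Q ^ g^0*D0 ^ g^2*D2) / g^1
partial = Q ^ gf_mul(EXP[0], data[0]) ^ gf_mul(EXP[2], data[2])
recovered_d1 = gf_div(partial, EXP[1])
assert recovered_d1 == data[1]
print(hex(recovered_d1))                        # 0x22
```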
RAID 6 Write Penalty:
RAID 6's write penalty is worse than RAID 5's because two parity blocks must be updated:
Small Write:
1. Read old data block
2. Read old P block
3. Read old Q block
4. Compute new P and new Q
5. Write new data block
6. Write new P block
7. Write new Q block
Total: 3 reads + 3 writes = 6 I/O operations per logical write
| Property | Value | Notes |
|---|---|---|
| Minimum Drives | 4 | More practical with 5+ |
| Usable Capacity | (n-2)/n × total | Two drives' worth of capacity used for parity |
| Fault Tolerance | 2 drives | Third failure = data loss |
| Read Performance | ~(n-2) × single drive | Similar to RAID 5 |
| Write Performance | Lower than RAID 5 | Additional parity calculation |
| Rebuild Time | Many hours to days | Same as RAID 5, but safer |
For database data files on large-capacity drives, RAID 6 is increasingly the standard choice. The additional protection against double failures and UREs during rebuild outweighs the modest performance penalty. For write-intensive transaction logs, RAID 1 or RAID 10 remains preferred.
RAID 10 (also written as RAID 1+0) combines the performance benefits of RAID 0 striping with the reliability of RAID 1 mirroring. It's often considered the premium choice for database storage that requires both high performance and high reliability.
How RAID 10 Works:
Data is first mirrored (RAID 1), then the mirrored pairs are striped (RAID 0):
RAID 0 (Striping)
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Mirror Pair 0 │ │ Mirror Pair 1 │ │ Mirror Pair 2 │
│ │ │ │ │ │
│ ┌─────┐ ┌─────┐│ │ ┌─────┐ ┌─────┐│ │ ┌─────┐ ┌─────┐│
│ │ A │ │ A' ││ │ │ B │ │ B' ││ │ │ C │ │ C' ││
│ └─────┘ └─────┘│ │ └─────┘ └─────┘│ │ └─────┘ └─────┘│
│ │ │ │ │ │
│ Drive 0 Drive 1│ │ Drive 2 Drive 3│ │ Drive 4 Drive 5│
└─────────────────┘ └─────────────────┘ └─────────────────┘
RAID 1 RAID 1 RAID 1
Stripe 0: Written to Mirror Pair 0 (drives 0 and 1)
Stripe 1: Written to Mirror Pair 1 (drives 2 and 3)
Stripe 2: Written to Mirror Pair 2 (drives 4 and 5)
... continues striping across mirror pairs
Performance Characteristics:
| Property | Value | Notes |
|---|---|---|
| Minimum Drives | 4 | Always an even number |
| Usable Capacity | 50% | Half used for mirrors |
| Fault Tolerance | 1 per mirror pair | Can tolerate n/2 failures if in different pairs |
| Read Performance | n × single drive | All drives contribute |
| Write Performance | (n/2) × single drive | Limited by mirror pairs |
| Rebuild Time | Hours (single drive copy) | Much faster than RAID 5/6 |
RAID 10 can lose data if both drives in any mirror pair fail. With 6 drives (3 mirror pairs), losing drive 0 and drive 1 (same pair) = data loss. But losing drive 0, drive 2, and drive 4 (different pairs) = array survives. The failure pattern matters more than the number of failures.
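The "pattern matters more than the count" rule is easy to express in code. A sketch for the 6-drive layout above, where drives (0,1), (2,3), and (4,5) form the mirror pairs:

```python
def raid10_survives(failed_drives: set[int]) -> bool:
    """RAID 10 survives as long as no mirror pair has lost BOTH of its members.

    Assumes the 6-drive layout above: (0,1), (2,3), (4,5) are the mirror pairs.
    """
    pairs = [(0, 1), (2, 3), (4, 5)]
    return all(not (a in failed_drives and b in failed_drives) for a, b in pairs)

print(raid10_survives({0, 1}))      # False: both halves of pair 0 are gone
print(raid10_survives({0, 2, 4}))   # True: three failures, but only one per pair
```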
RAID 10 vs. RAID 01:
RAID 10 (a stripe of mirrors: mirror first, then stripe across the pairs) is not the same as RAID 01 (a mirror of stripes: stripe first, then mirror the whole stripe set):
RAID 10 is almost always preferred because it has better fault tolerance characteristics and rebuilds affect only the failed drive's mirror pair.
Choosing the right RAID level for database storage requires understanding your workload characteristics. Different database components have different I/O patterns:
Transaction Logs (WAL/Redo): write-heavy, largely sequential I/O in small units; log write latency directly gates commit latency.
Data Files (Tables, Indexes): predominantly random reads and writes for OLTP workloads, more sequential and read-heavy for analytics.
Temporary/Working Space: high-throughput, short-lived data (sorts, spills, intermediate results) that can be recreated if lost.
| Component | I/O Pattern | Recommended RAID | Alternative |
|---|---|---|---|
| Transaction Log | Write-heavy, sequential | RAID 1 | RAID 10 |
| Data Files (OLTP) | Random read/write | RAID 10 | RAID 6 |
| Data Files (OLAP) | Sequential read-heavy | RAID 6 | RAID 5 |
| Temp Space | High throughput, temporary | RAID 10 | RAID 0 |
| Backup Staging | Sequential write | RAID 6 | RAID 5 |
Decision Matrix:
| RAID | Capacity | Read Perf | Write Perf | Redundancy | Best For |
|---|---|---|---|---|---|
| RAID 0 | 100% | Excellent | Excellent | None | Temp space, scratch |
| RAID 1 | 50% | Good | Good | 1 drive | Logs, boot, critical |
| RAID 5 | (n-1)/n | Good | Poor (random) | 1 drive | Read-heavy data, smaller/older drives |
| RAID 6 | (n-2)/n | Good | Poor (random) | 2 drives | Large drives, archival |
| RAID 10 | 50% | Excellent | Excellent | 1 per pair | OLTP databases |
For mission-critical databases, RAID 10 for both logs and data is increasingly common despite the 50% capacity overhead. Storage costs have dropped dramatically, but downtime costs have increased. The performance and reliability benefits often justify the expense. When budget is constrained, use RAID 1 for logs and RAID 6 for data.
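When sizing storage, the write penalties discussed earlier translate directly into effective IOPS. A rough sizing sketch; the penalty factors are the classic rules of thumb (1 for RAID 0, 2 for RAID 1/10, 4 for RAID 5, 6 for RAID 6), and real arrays with write-back caches and full-stripe writes will generally do better:

```python
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 10": 2, "RAID 5": 4, "RAID 6": 6}

def effective_iops(level: str, drives: int, iops_per_drive: int,
                   read_fraction: float) -> float:
    """Rough random-I/O capacity of an array for a given read/write mix."""
    raw = drives * iops_per_drive
    penalty = WRITE_PENALTY[level]
    # Reads cost 1 back-end I/O; each write costs `penalty` back-end I/Os.
    return raw / (read_fraction + (1 - read_fraction) * penalty)

# Example: 8 drives at 200 IOPS each, 70% reads (a typical OLTP-ish mix).
for level in ("RAID 10", "RAID 5", "RAID 6"):
    print(level, round(effective_iops(level, 8, 200, 0.70)))
```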
RAID technology transforms unreliable individual drives into reliable storage systems. To consolidate the key concepts: striping spreads data across drives for throughput, mirroring duplicates it for fault tolerance, and parity trades a smaller capacity overhead for the ability to reconstruct lost data. RAID 0 offers performance with no protection, RAID 1 and RAID 10 favor write-heavy and latency-sensitive workloads such as transaction logs and OLTP data, and RAID 5 and RAID 6 favor capacity and read-heavy workloads at the cost of a write penalty, with RAID 6 preferred on large drives because of rebuild times and URE risk.
What's Next:
RAID protects against drive failures within a single system, but what about failures that affect the entire system—power surges, controller failures, or physical damage? The next page explores mirroring at the system level, including synchronous replication and hot standby configurations.
You now understand RAID levels, their characteristics, and how to select the appropriate RAID configuration for different database workloads. This knowledge is essential for designing and managing reliable database storage systems.