When a database requests data from a hard disk, the storage system doesn't respond instantly. Mechanical components must physically move, platters must rotate, and data must be transferred through multiple interfaces. Understanding the components of disk access time is fundamental to database performance engineering.
Why Access Time Matters:
Disk access time often dominates query execution time: a single random disk read takes on the order of 10 ms, while a RAM access takes roughly 100 ns.
This gap (roughly 100,000× slower than RAM) means that reducing disk access count and optimizing access patterns has outsized impact on database performance.
In this page, we dissect every component of disk access time, providing the quantitative foundation for understanding why databases are designed the way they are.
By the end of this page, you will understand the complete breakdown of disk access time: seek time (and its variants), rotational latency, data transfer time, and controller overhead. You will be able to calculate expected access times for different workloads and understand how each component affects database design decisions.
The total time to access data on a disk is the sum of several components:
Total Access Time = Seek Time + Rotational Latency + Transfer Time + Controller Overhead
Component Breakdown:
| Component | Description | Typical Value |
|---|---|---|
| Seek Time | Time for actuator to move heads to correct cylinder | 3-15 ms |
| Rotational Latency | Time for platter to rotate desired sector under head | 2-6 ms (average) |
| Transfer Time | Time to read/write the data | 0.01-0.1 ms per sector |
| Controller Overhead | Command processing, bus transfer | 0.1-0.5 ms |
Order of Magnitude:
For typical random access: seek (~8 ms) + rotational latency (~4 ms) + transfer and overhead (<1 ms) ≈ 12 ms total.
This means maximum random IOPS ≈ 1000 ms / 12 ms ≈ 83, i.e. roughly 80-100 IOPS per drive depending on drive class.
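This arithmetic can be sketched as a small helper. The default component values below are the illustrative figures used on this page (a 7200 RPM-class drive), not measurements of any specific product:

```python
def access_time_ms(seek=9.0, rotation=4.17, transfer=0.03, overhead=0.2):
    """Total access time (ms) as the sum of its four components."""
    return seek + rotation + transfer + overhead

def max_random_iops(access_ms):
    """Upper bound on random IOPS for a drive servicing one request at a time."""
    return 1000.0 / access_ms

t = access_time_ms()  # ~13.4 ms for a random 4 KB read
print(round(t, 1), round(max_random_iops(t)))  # 13.4 75
```

Note that transfer and overhead barely register: shrinking the I/O size further cannot make random access meaningfully faster, because the mechanical terms dominate.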
| Component | 5400 RPM Desktop | 7200 RPM Enterprise | 15000 RPM Enterprise | SSD (Reference) |
|---|---|---|---|---|
| Average Seek Time | 12-14 ms | 8-10 ms | 3-4 ms | N/A (0 ms) |
| Avg Rotational Latency | 5.56 ms | 4.17 ms | 2.00 ms | N/A (0 ms) |
| Max Transfer Rate | ~150 MB/s | ~250 MB/s | ~300 MB/s | 500-7000 MB/s |
| Random Read Latency | ~18 ms | ~12 ms | ~5 ms | ~0.1 ms |
| Random IOPS | ~50-60 | ~80-100 | ~180-200 | 10,000-1,000,000 |
The Dominance of Mechanical Latency:
For random access patterns: seek time and rotational latency account for the overwhelming majority of each access (often >95%); transfer time is negligible.
For sequential access patterns: seek and rotation are paid once, after which transfer time dominates and throughput approaches the media transfer rate.
Database performance depends on whether workload is latency-bound or throughput-bound. OLTP (random small I/O) is latency-bound—limited by seek + rotation. OLAP (sequential large I/O) is throughput-bound—limited by transfer rate. Optimizing the wrong component wastes effort.
Seek time is the duration required for the actuator arm to move the read/write heads from their current position to the target cylinder. It is typically the largest component of random access latency.
Seek Time Components:
The seek operation consists of three phases: acceleration (the actuator arm speeds up toward the target), coasting and deceleration (the arm travels at speed for long seeks, then slows as it approaches the target cylinder), and settling (fine positioning until the head locks onto the track and can reliably read or write).
Seek Time Specifications:
Drive specifications include several seek time metrics:
| Seek Type | Cylinders Moved | Typical Time | Database Scenario |
|---|---|---|---|
| Track-to-Track | 1 | 0.5-1.0 ms | Sequential scan crossing track |
| Short Seek | 1-100 | 1-3 ms | Localized random access |
| Average Seek | ~1/3 of max | 8-12 ms | Random access (typical) |
| Long Seek | 1/2 of max | 12-18 ms | Cross-partition access |
| Full Stroke | Maximum | 15-25 ms | Worst case (inner ↔ outer) |
The Seek Time Formula:
Seek time as a function of distance is approximately:
T_seek = a + b × sqrt(d)
Where T_seek is the total seek time, a is a fixed overhead (command startup plus head settling), b is a constant determined by the actuator's acceleration, and d is the seek distance in cylinders.
The square root relationship occurs because the arm accelerates for roughly the first half of the travel and decelerates for the second half: under constant acceleration, distance covered grows with the square of time, so the time to cover a distance grows with its square root.
Why Average Seek ≠ Half of Full Stroke:
For uniformly distributed random seeks, the average distance is one-third of the maximum distance, not one-half. This is because both the current head position and the target cylinder are random: for two independent uniform positions x and y on [0, 1], the expected distance E|x − y| = 1/3.
Therefore, average seek time is less than half of full stroke time.
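The one-third average can be checked numerically. This quick Monte Carlo sketch models the current head position and the target cylinder as independent uniform draws and averages their distance:

```python
import random

def avg_seek_fraction(trials=200_000, seed=42):
    """Estimate E|x - y| for two independent uniform positions on [0, 1]."""
    rng = random.Random(seed)
    return sum(abs(rng.random() - rng.random()) for _ in range(trials)) / trials

print(avg_seek_fraction())  # close to 1/3 (0.333...), not 1/2
```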
To minimize seek time impact: (1) Use clustering indexes to keep related data together, (2) Employ short-stroking (using only outer cylinders), (3) Partition tables to localize access, (4) Use SSDs for random-access-heavy workloads, (5) Increase I/O queue depth to allow elevator sorting.
Rotational latency (also called rotational delay) is the time waiting for the desired sector to rotate under the read/write head after the seek completes.
The Rotation Dynamics:
After the head reaches the target track, it must wait for the correct sector: in the best case the sector is just arriving and the wait is near zero; in the worst case it has just passed and the head must wait a full revolution; on average the wait is half a revolution.
Calculating Rotational Latency:
Time per Revolution = 60,000 ms / RPM
Average Rotational Latency = (Time per Revolution) / 2
| RPM | Time per Revolution | Avg Rotational Latency | Latency Range |
|---|---|---|---|
| 4200 RPM | 14.29 ms | 7.14 ms | 0 - 14.29 ms |
| 5400 RPM | 11.11 ms | 5.56 ms | 0 - 11.11 ms |
| 7200 RPM | 8.33 ms | 4.17 ms | 0 - 8.33 ms |
| 10000 RPM | 6.00 ms | 3.00 ms | 0 - 6.00 ms |
| 15000 RPM | 4.00 ms | 2.00 ms | 0 - 4.00 ms |
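The table above follows directly from the two formulas; a two-line function reproduces it for any spindle speed:

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in ms."""
    time_per_rev_ms = 60_000 / rpm  # 60,000 ms per minute
    return time_per_rev_ms / 2

for rpm in (4200, 5400, 7200, 10_000, 15_000):
    print(f"{rpm} RPM -> {rotational_latency_ms(rpm):.2f} ms average")
```

The hyperbolic shape of 60,000/RPM is why each doubling of spindle speed buys less absolute latency than the last.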
Rotational Latency Variance:
Unlike seek time (predictable based on distance), rotational latency is essentially random: the target sector's angular position is uncorrelated with where the platter happens to be when the seek completes, so the wait is uniformly distributed between zero and one full revolution.
The Rotational Position Sensing (RPS) Optimization:
Some drives and controllers implement RPS to reduce rotational latency waste: the controller tracks the platter's angular position and, among queued requests, schedules the one whose target sector will pass under the head soonest, rather than serving requests in arrival order.
Head Switching and Rotation:
When switching heads within a cylinder, no seek is needed, but the electronic switch and fine repositioning still take a fraction of a millisecond; tracks are therefore offset from one another (head skew) so that the next logical sector arrives under the new head just as the switch completes, avoiding a wasted revolution.
15000 RPM drives cost significantly more than 7200 RPM drives, yet the rotational latency improvement is only ~2 ms (4.17 ms → 2.00 ms). For random access workloads, this translates to roughly 20% lower total latency and 20% higher IOPS. For enterprise OLTP workloads where every millisecond matters, this premium is often justified. For sequential/batch workloads, the benefit is minimal.
Transfer time (also called data transfer time or read/write time) is the duration to transfer data between the platter surface and the drive's buffer, and subsequently to/from the host system.
Transfer Time Components:
Transfer involves two stages: media transfer (platter surface to the drive's buffer, limited by rotation speed and areal density) and interface transfer (the drive's buffer to the host over SATA/SAS). Typically, media transfer is the bottleneck for HDDs, while interface transfer may bottleneck older connections.
Calculating Transfer Time:
Transfer Time = (Data Size) / (Transfer Rate)
Where transfer rate depends on zone (outer zones are faster due to ZBR) and drive specifications.
| Data Size | Transfer Rate | Transfer Time | Context |
|---|---|---|---|
| 4 KB (1 sector) | 200 MB/s | 0.02 ms | Single database page |
| 64 KB (16 sectors) | 200 MB/s | 0.32 ms | Read-ahead buffer |
| 1 MB | 200 MB/s | 5.0 ms | Large LOB read |
| 10 MB | 200 MB/s | 50 ms | Backup chunk |
| 1 GB | 200 MB/s | 5 seconds | Large table scan segment |
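A useful follow-on question: at what I/O size does transfer time stop being noise and overtake the mechanical penalty? A sketch using this page's illustrative figures (200 MB/s transfer, ~13 ms seek plus rotation):

```python
def transfer_time_ms(size_kb, rate_mb_s=200):
    """Time to move size_kb KB at rate_mb_s MB/s, in ms (1 MB = 1024 KB)."""
    return size_kb / (rate_mb_s * 1024) * 1000

def crossover_size_kb(mechanical_ms=13.17, rate_mb_s=200):
    """Smallest power-of-two I/O size (KB) whose transfer time exceeds
    the mechanical (seek + rotation) penalty."""
    size_kb = 4
    while transfer_time_ms(size_kb, rate_mb_s) < mechanical_ms:
        size_kb *= 2
    return size_kb

print(transfer_time_ms(1024))  # 5.0 ms for 1 MB, matching the table
print(crossover_size_kb())     # 4096 KB: only multi-MB I/O amortizes the mechanics
```

In other words, anything smaller than a few megabytes per random access is paying mostly for head movement, not for data.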
Zone-Dependent Transfer Rates:
Due to Zone Bit Recording (ZBR), transfer rates vary significantly: outer zones hold more sectors per track and, at constant RPM, deliver roughly twice the throughput of inner zones (for example, ~200 MB/s outer versus ~100 MB/s inner).
This ~2x variation means data placement matters: throughput-critical sequential data belongs on outer cylinders, and benchmark figures quoting only the maximum (outer-zone) rate overstate average performance.
Interface Bandwidth:
The drive interface must keep up with media transfer rates:
| Interface | Max Bandwidth | Bottleneck Status |
|---|---|---|
| SATA I | 150 MB/s | May bottleneck outer zones |
| SATA II | 300 MB/s | Generally sufficient for HDDs |
| SATA III | 600 MB/s | Ample headroom for HDDs |
| SAS-1 | 300 MB/s | Generally sufficient |
| SAS-2 | 600 MB/s | Ample headroom |
| SAS-3 | 1200 MB/s | Designed for SSDs |
| SAS-4 | 2400 MB/s | Designed for SSDs |
For sequential workloads (backup, restore, bulk loading, table scans), transfer time dominates. Optimization strategies: (1) Use larger I/O sizes to reduce command overhead, (2) Place performance-critical data on outer zones, (3) Use RAID striping to parallelize across drives, (4) Ensure interface bandwidth is not the bottleneck.
Beyond the mechanical components, additional time is consumed by command processing, data movement through the I/O stack, and controller operations.
Sources of Overhead:
Overhead comes from command decoding and queue management in the drive controller, ECC computation and verification, bus protocol handshaking, and the host-side driver and I/O stack.
Total Overhead Impact:
Combined overhead is typically 0.1-0.5 ms, which is negligible next to the ~13 ms mechanical latency of a random access, but dominant for cache hits and a meaningful fraction of small sequential I/Os.
Cache Hit Shortcut:
When data is found in the drive's buffer cache, seek, rotation, and media transfer are all skipped; only controller overhead and interface transfer remain, so the request completes in roughly 0.1-0.5 ms.
Write Cache and Synchronous Writes:
Writing behavior significantly affects overhead: with write-back caching the drive acknowledges as soon as data reaches its cache (sub-millisecond), deferring the mechanical work, while synchronous (write-through) writes wait until data is on the platter, paying full seek and rotational latency.
Database systems rely on write ordering guarantees for crash recovery. If the write cache reorders writes and power fails, WAL (Write-Ahead Logging) guarantees may be violated. Enterprise drives with battery-backed write cache, or explicit cache flush commands (fsync), are required for database integrity. Never assume writes are durable until confirmed by the storage system.
Let's work through detailed access time calculations for various scenarios, building intuition for performance analysis.
Example Drive Specifications: 7200 RPM, average seek time 9.0 ms, average sustained transfer rate 150 MB/s, controller overhead 0.2 ms.
Rotational Latency:
Time per revolution = 60,000 / 7200 = 8.33 ms
Average rotational latency = 8.33 / 2 = 4.17 ms
| Scenario | Seek | Rotation | Transfer | Overhead | Total |
|---|---|---|---|---|---|
| Random 4 KB read | 9.0 ms | 4.17 ms | 0.03 ms | 0.2 ms | 13.4 ms |
| Random 64 KB read | 9.0 ms | 4.17 ms | 0.43 ms | 0.2 ms | 13.8 ms |
| Random 1 MB read | 9.0 ms | 4.17 ms | 6.67 ms | 0.2 ms | 20.0 ms |
| Sequential 4 KB (after first) | 0 ms* | 0 ms* | 0.03 ms | 0.2 ms | 0.23 ms |
| Sequential 1 MB (after first) | 0 ms* | 0 ms* | 6.67 ms | 0.2 ms | 6.87 ms |
| Cache hit 4 KB | 0 ms | 0 ms | 0 ms | 0.2 ms | 0.2 ms |
*Sequential access: First I/O pays seek + rotation; subsequent reads are transfer-limited.
IOPS Calculations:
From access times, we can calculate maximum I/O operations per second:
Random IOPS = 1000 ms / access_time
Random 4 KB IOPS = 1000 / 13.4 ≈ 75 IOPS
Throughput Calculations:
Random Throughput = IOPS × I/O size
Random 4 KB Throughput = 75 × 4 KB = 300 KB/s = 0.3 MB/s
Sequential Throughput:
Sequential Throughput ≈ Transfer Rate = 150 MB/s (average)
Sequential IOPS (4 KB) = 150 MB/s / 4 KB ≈ 38,400 IOPS
The 500× Difference:
Random vs Sequential IOPS: 75 vs 38,400 (>500× difference!) This massive gap explains why databases work so hard to convert random access to sequential.
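The whole chain of calculations above fits in a few lines; the figures are the worked example's (13.4 ms random access, 150 MB/s sustained transfer):

```python
KB = 1024

def random_iops(access_ms=13.4):
    """Random 4 KB operations per second, one outstanding request."""
    return 1000 / access_ms

def sequential_iops(rate_mb_s=150, io_kb=4):
    """Equivalent 4 KB operations per second at sequential transfer rate."""
    return rate_mb_s * KB // io_kb

r, s = random_iops(), sequential_iops()
print(round(r), s, round(s / r))  # roughly 75 vs 38,400: a >500x gap
```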
These calculations assume ideal conditions. Real-world performance is affected by: command queuing and reordering (improves random by bundling), cache hits (eliminates mechanical latency), concurrent access (queue depth affects throughput), controller saturation (overhead increases under load), and workload mix (random + sequential simultaneously).
Modern drives and interfaces support command queueing, allowing multiple I/O requests to be outstanding simultaneously. This enables the drive to optimize access order and significantly improves effective performance.
Native Command Queueing (NCQ) - SATA: allows up to 32 outstanding commands, which the drive reorders internally to minimize seek distance and rotational waiting.
Tagged Command Queueing (TCQ) - SAS: the SCSI/SAS equivalent, typically supporting deeper queues (up to 256 tags) and richer ordering controls.
| Queue Depth | Effective Behavior | Approx IOPS Improvement |
|---|---|---|
| 1 | Strictly sequential execution | Baseline (~75 IOPS) |
| 4 | Some reordering possible | 1.5-2× (~120 IOPS) |
| 16 | Significant optimization opportunity | 2-3× (~180 IOPS) |
| 32 | Maximum NCQ utilization | 2.5-4× (~200-300 IOPS) |
| 256+ | Saturates drive capability | Limited by drive mechanics |
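Much of the IOPS gain in the table comes from reduced head travel. A toy model (with hypothetical cylinder numbers) compares FIFO service order against an elevator-style sorted sweep over the same 32 queued requests:

```python
import random

def total_travel(positions, start=0):
    """Sum of head movement distances visiting positions in the given order."""
    travel, head = 0, start
    for p in positions:
        travel += abs(p - head)
        head = p
    return travel

rng = random.Random(7)
queue = [rng.randrange(10_000) for _ in range(32)]  # 32 queued cylinder targets
fifo = total_travel(queue)
elevator = total_travel(sorted(queue))              # one sweep across the disk
print(fifo, elevator, round(fifo / elevator, 1))    # sorted sweep travels far less
```

The sorted sweep's total travel is bounded by one full stroke regardless of queue depth, while FIFO travel grows linearly with the number of requests; this is the elevator-scheduling intuition behind the table's 2-4× figures.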
How Queueing Reduces Access Time:
With many requests visible at once, the drive can sort them by cylinder (elevator scheduling) and by angular position, so the average seek distance and rotational wait per request drop well below the single-request averages.
Database Implications:
High queue depth benefits database workloads by raising aggregate IOPS, letting many concurrent queries keep the drive busy, and overlapping CPU work with outstanding I/O.
However, fairness concerns arise: aggressive reordering can starve requests to distant cylinders, so individual query latency (especially tail latency) may rise even as overall throughput improves.
Asynchronous I/O:
Databases often use asynchronous I/O (AIO) to maximize queue depth:
```c
/* Synchronous: blocks until the read completes */
ssize_t n = read(fd, buffer, size);

/* Asynchronous (Linux libaio): submit, do other work, reap completions */
struct iocb cb;
struct iocb *cbs[1] = { &cb };
io_prep_pread(&cb, fd, buffer, size, offset);  /* fill the control block */
io_submit(ctx, 1, cbs);                        /* returns immediately */
/* ... do other work while the drive services the request ... */
io_getevents(ctx, 1, 1, events, NULL);         /* wait for completion */
```
For HDD-based database systems: use queue depth 8-32 for random workloads, enable NCQ/TCQ at drive level, use asynchronous I/O or multiple I/O threads, monitor I/O wait times to detect queuing bottlenecks. Too deep a queue can increase latency variance without throughput gains.
Understanding access time components directly informs database architecture and optimization decisions. Each component's characteristics drive specific design patterns.
Design Patterns Driven by Access Time Components:
The table below maps each access time component to the optimization strategies and database features that address it:
| Component | Optimization Strategy | Database Feature |
|---|---|---|
| Seek Time | Minimize seeks | Clustered indexes, table clustering |
| Seek Time | Reduce seek distance | Partitioning, short-stroking |
| Seek Time | Reorder seeks | Query optimizer cost models |
| Rotational Latency | Read more per access | Multi-page I/O, large extents |
| Rotational Latency | Predict access | Sequential prefetch, read-ahead |
| Transfer Time | Maximize throughput | Sequential scan for analytics |
| Transfer Time | Use fast zones | Place hot data on outer tracks |
| All Components | Avoid disk entirely | Buffer pool, caching, in-memory DB |
Query Optimizer Cost Models:
Query optimizers use disk access time models to choose execution plans:
Cost(Index Scan) = (index_levels × seek) + (index_pages × page_read)
+ (data_pages × (seek + page_read))
Cost(Full Scan) = (initial_seek) + (table_pages × sequential_page_read)
The optimizer compares these costs to choose between an index scan (a few random I/Os, proportional to matching rows) and a full table scan (many cheap sequential I/Os).
Break-Even Point:
At what selectivity does index scan beat full scan? Because a random page read costs roughly 50× a sequential one (13.4 ms versus 0.23 ms in our example), the index typically wins only at low selectivity, often in the low single-digit percentages of rows; above that, the full scan's sequential throughput prevails.
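A rough break-even check, reusing this page's example drive figures (13.4 ms random read, 0.23 ms sequential page read) and assuming a hypothetical 3-level index (the index depth and table size are illustrative assumptions, not a real optimizer's model):

```python
def full_scan_ms(table_pages, seq_read_ms=0.23, initial_seek_ms=13.4):
    """Cost of scanning every page sequentially after one initial seek."""
    return initial_seek_ms + table_pages * seq_read_ms

def index_scan_ms(matching_pages, random_read_ms=13.4, index_levels=3):
    """Cost of descending an assumed 3-level index, then one random
    read per matching data page."""
    return index_levels * random_read_ms + matching_pages * random_read_ms

table_pages = 100_000
for selectivity in (0.001, 0.01, 0.05):
    pages = int(table_pages * selectivity)
    winner = "index" if index_scan_ms(pages) < full_scan_ms(table_pages) else "full scan"
    print(f"{selectivity:.1%} of pages -> {winner}")
```

Under these assumptions the crossover falls around 1-2% selectivity, which is why optimizers abandon indexes surprisingly early on spinning disks.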
I/O Scheduling at Database Level:
Databases also schedule I/O themselves: prefetchers sort page requests by physical address before issuing them, checkpoints flush dirty pages in page-number order, and background writers batch writes so the drive's command queue can reorder them effectively.
A well-tuned buffer pool with high hit rate can make disk access time nearly irrelevant for hot data. With 99% hit rate, effective average access time = 0.01 × 13.4 ms = 0.134 ms. This is why buffer pool sizing is often the single most important performance tuning parameter.
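The hit-rate arithmetic generalizes to a one-line expected-value formula (miss cost is the example drive's 13.4 ms; a cache hit is treated as free):

```python
def effective_access_ms(hit_rate, miss_ms=13.4, hit_ms=0.0):
    """Expected access time given a buffer-pool hit rate."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms

for h in (0.90, 0.99, 0.999):
    print(f"{h:.1%} hit rate -> {effective_access_ms(h):.3f} ms")
```

Note the nonlinearity: going from 99% to 99.9% cuts effective latency another 10×, which is why the last few percent of hit rate are worth chasing.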
We have completed a comprehensive examination of disk access time components and their implications for database design. Let's consolidate the key concepts: total access time is the sum of seek, rotation, transfer, and overhead; random access (~13 ms) is dominated by mechanical latency while sequential access is transfer-limited; command queueing recovers 2-4× random IOPS through reordering; and database design works relentlessly to convert random I/O into sequential I/O or to avoid the disk entirely through caching.
Module Complete:
With this page, we have concluded Module 2: Disk Structure. You now possess a deep, comprehensive understanding of: disk geometry (platters, tracks, cylinders, sectors), zone bit recording and its effect on transfer rates, the mechanics of seek and rotation, transfer and controller overhead costs, command queueing, and how each of these constraints shapes database design.
This knowledge forms the foundation for understanding storage optimization in database systems—from buffer pool management to index design to query optimization.
Congratulations! You have mastered the physical structure of magnetic disk drives and understand how hardware constraints shape database storage design. This knowledge enables you to make informed decisions about storage configuration, data layout, and performance optimization. The principles learned here apply even as storage technologies evolve, as the fundamental tradeoffs between latency, throughput, and capacity remain central to database engineering.