I/O Software Layers - Learning Module

Device-Independent Software

The Universal Translator

Consider the remarkable fact that a single read() system call works identically whether you're reading from a hard disk, an SSD, a USB drive, a network socket, or a keyboard. The application doesn't need to know the hardware details—it simply requests bytes, and bytes arrive. This magical uniformity isn't accidental; it's the result of careful engineering in the device-independent I/O software layer.

This layer sits between user-level I/O code and the device-specific drivers below. Its purpose is to provide common services that all devices need, implementing them once rather than duplicating logic across dozens of drivers. It's the layer that transforms the chaotic diversity of hardware into a unified, coherent abstraction.

What You Will Learn

By completing this page, you will understand the responsibilities of device-independent I/O software: uniform naming through device files, access control and protection mechanisms, device-independent buffering and caching, error reporting and handling, block size management, and the critical task of device allocation and scheduling. These concepts are fundamental to understanding how operating systems provide consistent I/O services.

Why Device Independence Matters

The principle of device independence is one of the most important abstractions in operating system design. It means that programs can be written without knowledge of which specific physical device they'll use, and can often be redirected to different devices without modification.

Historical Context:

In early computing, programs were tied to specific hardware. A program written for one disk model wouldn't work with another. This was a nightmare for software development and maintenance. The Unix revolution of the 1970s introduced a radical simplification: everything is a file.

Benefits of Device Independence

•Portability — Programs work across different hardware configurations without recompilation
•Flexibility — Devices can be swapped, upgraded, or virtualized transparently
•Simplicity — Application developers don't need to understand hardware details
•Composition — Programs can be connected via pipes regardless of underlying devices
•Maintainability — Device-specific code is isolated in drivers, not spread throughout applications
•Testing — Programs can be tested with simulated devices or logs redirected to files

The Device-Independent Layer's Mission:

The device-independent layer has a dual responsibility:

Upward Interface: Present a uniform API to user-level software—the same system calls work for all devices
Downward Interface: Provide a standardized framework for device drivers—common infrastructure they all can use

This layer is the bridge that connects the generic world of applications with the specific world of hardware.

Uniform Naming and Device Files

One of the most elegant aspects of Unix-style device independence is the use of the filesystem namespace for device naming. Instead of special APIs for each device type, devices appear as files in the /dev directory. This allows standard file operations to work on devices:

# Write to a device using shell redirection (destructive: overwrites the partition's contents)
cat data.txt > /dev/sda1

# Read from a device
dd if=/dev/sda of=backup.img bs=1M

# Interact with terminals
echo "Hello" > /dev/pts/0

# Access random numbers
head -c 32 /dev/urandom | base64
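
The same uniformity holds at the system-call level. The following is a minimal sketch, assuming a Linux-like system where /dev/zero exists; the helper name read_from_path is hypothetical and not part of any library. It uses the identical open()/read()/close() sequence regardless of whether the path names a regular file or a device node.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read up to `count` bytes from any path, whether it
 * names a regular file or a device node. Returns the number of bytes read,
 * or -1 with errno set if open() or read() fails. */
ssize_t read_from_path(const char *path, void *buf, size_t count) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }

    ssize_t n = read(fd, buf, count);

    /* Preserve the errno from read() across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return n;
}
```

Calling read_from_path("/dev/zero", buf, 16) and read_from_path("data.txt", buf, 16) goes through exactly the same code path in the application; only the kernel knows which driver ends up serving the request.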

Device File Anatomy:

Device files in /dev are special files that represent devices rather than storing data. They're characterized by a device type and two numbers:

Device File Components
Component	Purpose	Example
Major Number	Identifies the device driver to use	8 = SCSI disk driver
Minor Number	Identifies the specific device instance	0 = /dev/sda (whole disk), 1 = /dev/sda1 (first partition)
Device Type	Block (buffered) or Character (unbuffered)	`b` for disks, `c` for terminals

device_files.sh
Bash
# List device files with details
$ ls -la /dev/sd* /dev/tty0 /dev/null /dev/zero 2>/dev/null
 
brw-rw---- 1 root disk 8, 0 Jan 15 10:00 /dev/sda
brw-rw---- 1 root disk 8, 1 Jan 15 10:00 /dev/sda1
brw-rw---- 1 root disk 8, 2 Jan 15 10:00 /dev/sda2
crw-rw-rw- 1 root root 1, 3 Jan 15 10:00 /dev/null
crw-rw-rw- 1 root root 1, 5 Jan 15 10:00 /dev/zero
crw--w---- 1 root tty  4, 0 Jan 15 10:00 /dev/tty0
 
# Reading the output:
# First character: 'b' = block device, 'c' = character device
# Major,Minor numbers shown before date
# 
# /dev/sda: Major 8 (SCSI), Minor 0 (first disk)
# /dev/sda1: Major 8, Minor 1 (first partition)
# /dev/null: Major 1 (mem), Minor 3 (null sink)
 
# Create a device file manually (requires root)
$ mknod /dev/my_device c 250 0
$ ls -la /dev/my_device
crw-r--r-- 1 root root 250, 0 Jan 15 10:05 /dev/my_device
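
The type and major/minor numbers that ls prints can also be read programmatically. This is a sketch assuming Linux with glibc, where major() and minor() come from <sys/sysmacros.h>; the helper name device_info is hypothetical.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor() on Linux/glibc */

/* Hypothetical helper: reports the device type ('b' for block, 'c' for
 * character, '-' for anything else) and the major/minor numbers of `path`.
 * Returns 0 on success, -1 if stat() fails. */
int device_info(const char *path, char *type,
                unsigned int *major_out, unsigned int *minor_out) {
    struct stat st;
    if (stat(path, &st) == -1) {
        return -1;
    }

    if (S_ISBLK(st.st_mode)) {
        *type = 'b';
    } else if (S_ISCHR(st.st_mode)) {
        *type = 'c';
    } else {
        *type = '-';
    }

    /* st_rdev holds the device numbers for special files. */
    *major_out = major(st.st_rdev);
    *minor_out = minor(st.st_rdev);
    return 0;
}
```

On a typical Linux system, calling this on /dev/null should report type 'c' with major 1 and minor 3, matching the ls output above.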

The /dev Hierarchy:

Modern Linux systems organize device files into logical groups:

Common Device File Categories
Path Pattern	Device Type	Example
/dev/sd*	SCSI/SATA disks and partitions	/dev/sda, /dev/sda1
/dev/nvme*	NVMe SSDs	/dev/nvme0n1, /dev/nvme0n1p1
/dev/tty*	Terminals, consoles, and serial ports	/dev/tty0, /dev/ttyUSB0
/dev/pts/*	Pseudo-terminals (SSH, tmux)	/dev/pts/0, /dev/pts/1
/dev/loop*	Loop devices (mount files as disks)	/dev/loop0
/dev/null	Data sink (discards all input)	Write goes nowhere
/dev/zero	Zero source (infinite zero bytes)	Read returns 0x00
/dev/random, /dev/urandom	Random number generators	Cryptographic randomness

Dynamic Device Management: udev

Modern Linux systems use udev (systemd-udevd on systemd-based distributions) to manage device files dynamically as hardware is attached and detached. When you plug in a USB drive, the kernel creates the basic device node in devtmpfs and emits an event; udev receives that event, runs its matching rules, and sets ownership, permissions, and stable symlinks such as those under /dev/disk/by-uuid. This replaces the old static device file approach and enables hot-plugging without manual intervention.

Protection and Access Control

Because devices appear as files, Unix leverages the standard file permission system for device access control. This elegant design reuses existing mechanisms rather than inventing special device permission systems.

Device Permissions in Practice:

Device files have the same permission structure as regular files: owner, group, and other, each with read, write, and execute bits. However, the meaning differs slightly for devices:

Permission Meanings for Devices
Permission	Block Devices	Character Devices
Read (r)	Can read device contents	Can receive input from device
Write (w)	Can write to device	Can send output to device
Execute (x)	Not typically meaningful	Not typically meaningful

Group-Based Access:

Linux uses groups to manage device access. Users in specific groups gain access to related devices:

device_permissions.sh
Bash
# View device permissions and ownership
$ ls -la /dev/sda /dev/tty1 /dev/audio 2>/dev/null
 
brw-rw---- 1 root disk  8, 0 Jan 15 10:00 /dev/sda
crw--w---- 1 root tty   4, 1 Jan 15 10:00 /dev/tty1
crw-rw---- 1 root audio 14, 4 Jan 15 10:00 /dev/audio
 
# /dev/sda: Only root and 'disk' group members can read/write
# /dev/tty1: Root can read/write, tty group can write
# /dev/audio: Root and 'audio' group can read/write
 
# Check your groups
$ groups
user disk audio video
 
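# Add user to a device group (as root)
# Caution: 'disk' membership allows raw reads and writes on every disk,
# which is effectively root access; grant it sparingly
$ usermod -aG disk username
# User must log out and back in for new group to take effect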
 
# Temporary permission elevation for programs (setgid):
# a setgid program runs with the privileges of the file's group
$ chmod g+s /usr/bin/write
$ ls -la /usr/bin/write
-rwxr-sr-x 1 root tty 19K Jan 10 10:00 /usr/bin/write
# The 's' in the group triad marks the setgid bit

Why Device Protection Matters:

Device access control is critical for system security:

Security Implications of Device Access

•Raw disk access bypasses filesystem permissions — Reading /dev/sda directly can access any file on the disk, even those you wouldn't normally have permission to read
•Terminal access allows session hijacking — Write access to another user's terminal can inject commands into their session
•Memory devices expose kernel secrets — /dev/mem and /dev/kmem (now restricted) provided direct RAM access
•GPU access enables DMA attacks — Video devices can access system memory through Direct Memory Access
•USB devices can be security risks — BadUSB attacks exploit overly permissive USB device access

The Principle of Least Privilege

Applications should only have access to the devices they need. Modern container systems like Docker carefully control which devices are exposed inside containers. The --device flag explicitly passes specific devices, while --privileged mode (which grants access to all devices) should be avoided in production.

Device-Independent Buffering

While user-level libraries implement buffering to reduce system calls, the kernel's device-independent layer implements another level of buffering with different goals. This kernel buffering serves several purposes that transcend individual devices:

Why Kernel Buffering?

Purposes of Kernel-Level Buffering

•Speed Mismatch Absorption — Producers and consumers operate at different rates; buffers smooth the flow
•Data Aggregation — Small writes can be combined into efficient larger transfers
•Copy Reduction — Data can sometimes stay in kernel buffers, reducing CPU cycles
•Decoupling — Applications don't block waiting for slow devices; they write to buffers and continue
•Read-Ahead — The kernel can prefetch data it predicts will be needed soon
•Write-Behind — Data can be written to fast buffers immediately, then flushed to slow devices later
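
The data aggregation and write-behind ideas can be modeled in a few lines. This is a user-space sketch, not kernel code: the names agg_buffer, agg_init, agg_write, and agg_flush are hypothetical, and the 4096-byte capacity is an assumption chosen for illustration. Small writes accumulate in memory, and the underlying write() is issued only when the buffer fills or is flushed explicitly.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define AGG_CAPACITY 4096  /* assumed buffer size for this sketch */

struct agg_buffer {
    int fd;                   /* destination file descriptor */
    size_t used;              /* bytes currently buffered */
    unsigned long flushes;    /* number of flushes that reached the fd */
    char data[AGG_CAPACITY];
};

void agg_init(struct agg_buffer *b, int fd) {
    b->fd = fd;
    b->used = 0;
    b->flushes = 0;
}

/* Push all buffered bytes to the fd. Returns 0 on success, -1 on error. */
int agg_flush(struct agg_buffer *b) {
    size_t done = 0;
    if (b->used == 0) {
        return 0;
    }
    while (done < b->used) {
        ssize_t n = write(b->fd, b->data + done, b->used - done);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        done += (size_t)n;
    }
    b->used = 0;
    b->flushes++;
    return 0;
}

/* Buffer `len` bytes, flushing only when the buffer becomes full. */
int agg_write(struct agg_buffer *b, const char *src, size_t len) {
    while (len > 0) {
        size_t room = AGG_CAPACITY - b->used;
        size_t chunk = len < room ? len : room;
        memcpy(b->data + b->used, src, chunk);
        b->used += chunk;
        src += chunk;
        len -= chunk;
        if (b->used == AGG_CAPACITY && agg_flush(b) == -1) {
            return -1;
        }
    }
    return 0;
}
```

With this model, a thousand 7-byte writes (7000 bytes in total) result in one flush when the buffer first fills and a second when the caller flushes the remainder, rather than a thousand separate transfers.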

The Buffer Cache Architecture:

Linux maintains a caching layer between its filesystems and the block devices, so repeated access to the same data can be served from memory instead of the device.

Page Cache vs Buffer Cache:

Historically, Unix systems had separate caches:

Buffer Cache: Cached raw disk blocks by block number
Page Cache: Cached file contents by file and offset

Modern Linux unifies these—the page cache is primary, and the buffer cache is now essentially a view into the page cache for block-aligned access. This eliminates the old problem of double-caching where the same data existed in both caches.

check_buffers.sh
Bash
# View memory usage including buffer/cache
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           31Gi       8.2Gi       1.5Gi       412Mi        21Gi        22Gi
Swap:         8.0Gi       256Mi       7.7Gi
 
# The "buff/cache" column shows kernel buffering
# "available" is the kernel's estimate of memory usable without swapping (roughly free + reclaimable cache)
 
# Detailed view from /proc/meminfo
$ grep -E "^(Buffers|Cached|SwapCached):" /proc/meminfo
Buffers:          2531456 kB
Cached:          19847632 kB
SwapCached:        128456 kB
 
# Drop caches (for testing, not production!)
$ sync                            # Flush dirty buffers to disk first
$ echo 3 > /proc/sys/vm/drop_caches   # Drop page cache, dentries, inodes
 
# Watch I/O buffering in action
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0 262656 1567432 2531456 19847632 0    0     8    45  128  256  2  1 97  0  0
 0  0 262656 1567180 2531460 19847888 0    0     0    24  156  298  1  0 99  0  0

Write Barriers and Data Safety

Kernel buffering improves performance but can risk data loss. If the system crashes before dirty buffers are flushed, data is lost. File systems use write barriers to ensure critical metadata is persisted before dependent operations proceed. The sync command and fsync() system call force immediate flushing.
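
A minimal sketch of forcing data out of the kernel's buffers with fsync() follows. The helper name write_durably is hypothetical. Note the stated limitation: for a newly created file, making the directory entry itself durable would additionally require an fsync() on the containing directory, which this sketch omits.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to `path` and return only after
 * fsync() reports that the data has been handed to stable storage.
 * Returns 0 on success, -1 with errno set on failure. */
int write_durably(const char *path, const void *data, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        return -1;
    }

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            int saved_errno = errno;
            close(fd);
            errno = saved_errno;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Until this returns, the data may exist only in dirty kernel buffers. */
    if (fsync(fd) == -1) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return close(fd);
}
```

The design choice here is to pay the flush latency once, at the end, rather than opening the file with O_SYNC and paying it on every write() call.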

Block Size Independence

Different devices have different natural transfer sizes—their block sizes. A disk might use 512-byte sectors or 4KB blocks, while network interfaces transfer variable-sized packets, and keyboards send single characters. The device-independent layer hides these differences from applications.

The Block Size Challenge:

Natural Block Sizes by Device Type
Device Type	Typical Block Size	Character/Block
Old hard drives	512 bytes	Block
Modern hard drives (Advanced Format)	4096 bytes physical (512e exposes 512-byte logical sectors; 4K native exposes 4096)	Block
SSDs (NVMe)	4096 bytes typical	Block
CD/DVD	2048 bytes	Block
Tape drives	Variable (up to 64KB+ records)	Block/Character
Terminals	1 character	Character
Network (Ethernet)	Up to 1500 bytes (MTU)	Character-like

How the Kernel Achieves Block Size Independence:

User requests arbitrary sizes: Applications can request any number of bytes (e.g., read 100 bytes, write 7 bytes)
Kernel translates to device blocks: The device-independent layer converts byte-oriented requests to block-oriented operations:
- Reading 100 bytes from offset 50 might require reading a 512-byte block, then extracting bytes 50-149
- Writing 7 bytes might require read-modify-write: read the block, modify 7 bytes, write it back
Buffering absorbs misalignment: Cache buffers hold complete blocks, so partial access doesn't immediately force device I/O
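
The translation steps above can be sketched against a tiny in-memory "device". Everything here is an illustrative assumption rather than kernel code: the 512-byte block size, the 8-block device, and the names dev_read_block, dev_write_block, byte_read, and byte_write. The counters make the read-modify-write cost visible.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512u   /* assumed device block size */
#define NUM_BLOCKS 8u     /* tiny in-memory "device" for illustration */

unsigned char device[NUM_BLOCKS][BLOCK_SIZE];
unsigned long block_reads, block_writes;   /* whole-block I/O counters */

/* The "device" only transfers whole blocks. */
void dev_read_block(size_t blk, unsigned char *out) {
    memcpy(out, device[blk], BLOCK_SIZE);
    block_reads++;
}

void dev_write_block(size_t blk, const unsigned char *in) {
    memcpy(device[blk], in, BLOCK_SIZE);
    block_writes++;
}

/* Byte-granular read built from whole-block reads. */
int byte_read(size_t offset, void *buf, size_t len) {
    if (offset > sizeof device || len > sizeof device - offset) return -1;
    unsigned char block[BLOCK_SIZE];
    unsigned char *out = buf;
    while (len > 0) {
        size_t blk = offset / BLOCK_SIZE;      /* which block */
        size_t in_blk = offset % BLOCK_SIZE;   /* where inside it */
        size_t chunk = BLOCK_SIZE - in_blk;
        if (chunk > len) chunk = len;
        dev_read_block(blk, block);
        memcpy(out, block + in_blk, chunk);
        out += chunk; offset += chunk; len -= chunk;
    }
    return 0;
}

/* Byte-granular write; partial blocks need read-modify-write. */
int byte_write(size_t offset, const void *buf, size_t len) {
    if (offset > sizeof device || len > sizeof device - offset) return -1;
    unsigned char block[BLOCK_SIZE];
    const unsigned char *in = buf;
    while (len > 0) {
        size_t blk = offset / BLOCK_SIZE;
        size_t in_blk = offset % BLOCK_SIZE;
        size_t chunk = BLOCK_SIZE - in_blk;
        if (chunk > len) chunk = len;
        if (chunk < BLOCK_SIZE) {
            dev_read_block(blk, block);        /* keep the untouched bytes */
        }
        memcpy(block + in_blk, in, chunk);
        dev_write_block(blk, block);
        in += chunk; offset += chunk; len -= chunk;
    }
    return 0;
}
```

Writing 7 bytes at offset 50 costs one block read plus one block write, while a full, aligned 512-byte write skips the read entirely. A 100-byte read starting at offset 500 straddles a block boundary and therefore touches two blocks.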

block_size_demo.c
C
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKSSZGET, BLKPBSZGET, BLKIOOPT, BLKGETSIZE64 */
 
int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <device>\n", argv[0]);
        return 1;
    }
    
    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    
    // Get logical block size (smallest unit the device will address)
    int logical_block_size;              // BLKSSZGET expects int *
    if (ioctl(fd, BLKSSZGET, &logical_block_size) == -1) {
        perror("BLKSSZGET");
    } else {
        printf("Logical block size: %d bytes\n", logical_block_size);
    }
    
    // Get physical block size (actual hardware sectors)
    unsigned int physical_block_size;    // BLKPBSZGET expects unsigned int *
    if (ioctl(fd, BLKPBSZGET, &physical_block_size) == -1) {
        perror("BLKPBSZGET");
    } else {
        printf("Physical block size: %u bytes\n", physical_block_size);
    }
    
    // Get optimal I/O size (0 means the device reports no preference)
    unsigned int optimal_io_size;        // BLKIOOPT expects unsigned int *
    if (ioctl(fd, BLKIOOPT, &optimal_io_size) == -1) {
        perror("BLKIOOPT");
    } else {
        printf("Optimal I/O size: %u bytes\n", optimal_io_size);
    }
    
    // Get device size in bytes
    uint64_t size;                       // BLKGETSIZE64 expects a 64-bit value
    if (ioctl(fd, BLKGETSIZE64, &size) == -1) {
        perror("BLKGETSIZE64");
    } else {
        printf("Device size: %llu bytes (%.2f GB)\n",
               (unsigned long long)size, size / (1024.0 * 1024.0 * 1024.0));
    }
    
    close(fd);
    return 0;
}
 
/*
 * Example output for a modern NVMe drive:
 *
 * $ sudo ./block_size_demo /dev/nvme0n1
 * Logical block size: 512 bytes
 * Physical block size: 512 bytes
 * Optimal I/O size: 0 bytes
 * Device size: 500107862016 bytes (465.76 GB)
 *
 * Note: Many drives with 4096-byte physical sectors still report
 * 512-byte logical sectors for compatibility (512e emulation);
 * 4Kn drives report 4096 bytes for both.
 */

Performance Implications of Block Alignment

While the kernel abstracts block sizes, aligned I/O is still more efficient. When applications access data in units matching the device block size and at block-aligned offsets, the kernel can avoid read-modify-write cycles. High-performance applications like databases often open files with O_DIRECT to bypass the page cache and manage their own caching. O_DIRECT in turn typically requires that buffers, file offsets, and transfer lengths be aligned to the device's logical block size, so these applications issue aligned, block-sized transfers by construction.

Device-Independent Error Reporting

I/O errors originate from diverse sources—mechanical failures, electrical interference, software bugs, resource exhaustion—yet applications need a consistent way to handle them. The device-independent layer provides error translation and reporting that abstracts away the hardware-specific details.

Error Categories in I/O:

I/O Error Categories and Examples
Category	Example Causes	Typical errno Values
Resource Errors	Out of disk space, too many open files	`ENOSPC`, `EMFILE`, `ENFILE`
Permission Errors	Access denied, read-only filesystem	`EACCES`, `EPERM`, `EROFS`
Media Errors	Bad sectors, corrupted data	`EIO`, `ENXIO`
Network Errors	Connection reset, host unreachable	`ECONNRESET`, `EHOSTUNREACH`
Protocol Errors	Invalid argument, broken pipe	`EINVAL`, `EPIPE`
Transient Errors	Resource temporarily unavailable	`EAGAIN`, `EINTR`

Error Translation:

Device drivers report errors using device-specific codes. The device-independent layer translates these to standard errno values. For example:

A SCSI disk might report MEDIUM_ERROR → translated to EIO
An NVMe drive might report DATA_TRANSFER_ERROR → translated to EIO
A network interface might report CARRIER_LOST → translated to ENETDOWN

This translation allows applications to handle errors without understanding the specifics of each device technology.
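
A translation step of this kind can be sketched as a simple mapping function. The dev_status enumeration and the function name dev_status_to_errno are hypothetical, invented for this illustration; real drivers and the block layer use their own status types. Only the errno constants are real.

```c
#include <errno.h>

/* Hypothetical device-specific status codes, invented for this sketch. */
enum dev_status {
    DEV_OK,
    DEV_MEDIUM_ERROR,          /* e.g. reported by a SCSI disk */
    DEV_DATA_TRANSFER_ERROR,   /* e.g. reported by an NVMe drive */
    DEV_CARRIER_LOST,          /* e.g. reported by a network interface */
    DEV_VENDOR_SPECIFIC        /* anything without a more precise mapping */
};

/* Map a device-specific status to a standard errno value (0 = success). */
int dev_status_to_errno(enum dev_status status) {
    switch (status) {
    case DEV_OK:
        return 0;
    case DEV_MEDIUM_ERROR:
    case DEV_DATA_TRANSFER_ERROR:
        return EIO;
    case DEV_CARRIER_LOST:
        return ENETDOWN;
    default:
        /* No specific errno applies: fall back to the generic I/O error. */
        return EIO;
    }
}
```

The default branch shows why distinct hardware failures often collapse into the same errno: the mapping is many-to-one, so the original detail survives only in driver logs.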

robust_io.c
C
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
 
/*
 * Robust I/O: Handle transient errors with retry logic
 */
 
/* Retry read() on EINTR and optionally EAGAIN */
ssize_t robust_read(int fd, void *buf, size_t count, int max_retries) {
    ssize_t result;
    int retries = 0;
    
    while (1) {
        result = read(fd, buf, count);
        
        if (result >= 0) {
            return result;  // Success
        }
        
        // Handle retryable errors
        if (errno == EINTR) {
            // Interrupted by signal, always retry
            continue;
        }
        
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // Non-blocking I/O would block
            if (++retries < max_retries) {
                usleep(1000);  // Brief delay before retry
                continue;
            }
        }
        
        // Non-retryable error or max retries exceeded
        return -1;
    }
}
 
/* Robust write that handles partial writes */
ssize_t robust_write_all(int fd, const void *buf, size_t count) {
    const char *ptr = (const char *)buf;
    size_t remaining = count;
    
    while (remaining > 0) {
        ssize_t written = write(fd, ptr, remaining);
        
        if (written < 0) {
            if (errno == EINTR) {
                continue;  // Retry on interrupt
            }
            return -1;  // Real error
        }
        
        if (written == 0) {
            // Unusual: write returned 0
            // Could indicate device error or resource issue
            errno = EIO;
            return -1;
        }
        
        ptr += written;
        remaining -= written;
    }
    
    return (ssize_t)count;  // All bytes written
}
 
/* Error classification for handling decisions */
typedef enum {
    ERROR_CLASS_RETRYABLE,
    ERROR_CLASS_PERMISSION,
    ERROR_CLASS_NOT_FOUND,
    ERROR_CLASS_RESOURCE,
    ERROR_CLASS_FATAL
} ErrorClass;
 
ErrorClass classify_error(int err) {
    switch (err) {
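        case EINTR:
        case EAGAIN:
#if EWOULDBLOCK != EAGAIN
        // On Linux EWOULDBLOCK equals EAGAIN, and a duplicate case label
        // would not compile, so include it only where the values differ
        case EWOULDBLOCK:
#endif
            return ERROR_CLASS_RETRYABLE;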
            
        case EACCES:
        case EPERM:
        case EROFS:
            return ERROR_CLASS_PERMISSION;
            
        case ENOENT:
        case ENXIO:
        case ENODEV:
            return ERROR_CLASS_NOT_FOUND;
            
        case ENOSPC:
        case EMFILE:
        case ENFILE:
        case ENOMEM:
            return ERROR_CLASS_RESOURCE;
            
        default:
            return ERROR_CLASS_FATAL;
    }
}

EIO: The Catch-All Error

EIO (I/O Error) is often used when no more specific error applies. When you see EIO, the actual cause could be: bad disk sectors, cable problems, driver bugs, overheating, or dozens of other issues. Check dmesg or syslog for the actual hardware-level error message that preceded the EIO.

Device Allocation and Scheduling

Some devices require exclusive access—only one process can use them at a time. Other devices can be shared, with multiple processes using them concurrently. The device-independent layer manages device allocation and, when multiple requests compete, schedules access fairly and efficiently.

Device Classification by Shareability:

Dedicated (Exclusive) Devices

•Tape drives — Sequential access only
•Printers — Output integrity requires exclusivity
•CD/DVD burners — Writing requires exclusive control
•Some specialized sensors
•Capture cards for video

Shared Devices

•Disks — Multiple processes read/write
•Network interfaces — Packet multiplexing
•Terminals — Multiple virtual consoles
•Sound cards — Audio mixing
•GPUs — Multiple render contexts
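
Exclusive allocation of a dedicated device can be approximated from user space with an advisory lock. This sketch is an assumption-laden analogue, not how the kernel allocates devices: it uses flock() with LOCK_EX | LOCK_NB, the helper names open_exclusive and release_exclusive are hypothetical, and the lock only constrains programs that cooperate by taking it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and claim it exclusively.
 * Returns a file descriptor on success. Returns -1 on failure; errno is
 * EWOULDBLOCK when another open file description already holds the lock. */
int open_exclusive(const char *path) {
    int fd = open(path, O_RDWR);
    if (fd == -1) {
        return -1;
    }

    /* LOCK_NB: fail immediately instead of waiting for the holder. */
    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}

/* Closing the descriptor releases the lock it holds. */
void release_exclusive(int fd) {
    close(fd);
}
```

A second caller that finds the device busy can report the conflict, retry later, or queue the job, which is the role a print spooler plays for a printer.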

I/O Scheduling:

When multiple processes compete for a shared device (especially disks), the kernel schedules their requests to optimize:

Throughput — Maximize bytes transferred per second
Latency — Minimize wait time for individual requests
Fairness — Prevent starvation of any process

Linux provides multiple I/O schedulers (selectable per-device):

Linux I/O Schedulers
Scheduler	Algorithm	Best For
`none` (noop)	FIFO queue, no reordering	NVMe and SSDs (fast random access)
`mq-deadline`	Two deadline queues (read/write)	Databases, mixed workloads
`bfq` (Budget Fair Queueing)	Per-process budgets with low latency	Desktop, interactive workloads
`kyber`	Token-based, targets latency SLOs	High-performance SSDs
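
To see why reordering matters on seek-bound devices, the following sketch compares the total head travel for a FIFO order against a sector-sorted order. It is a toy model under stated assumptions: head travel is measured as the absolute difference between sector numbers, and the function names are hypothetical. None of the Linux schedulers listed above is implemented this way.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: ascending sector number. */
static int cmp_sector(const void *a, const void *b) {
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Total head travel when serving `req` in the given order,
 * starting with the head at sector `start`. */
unsigned long total_seek(unsigned long start,
                         const unsigned long *req, size_t n) {
    unsigned long pos = start;
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += req[i] > pos ? req[i] - pos : pos - req[i];
        pos = req[i];
    }
    return total;
}

/* Reorder the queue in place so requests are served in sector order. */
void sort_by_sector(unsigned long *req, size_t n) {
    qsort(req, n, sizeof *req, cmp_sector);
}
```

For the request queue {98, 183, 37, 122, 14, 124, 65, 67} with the head at sector 53, FIFO order travels 640 sectors while sector-sorted order travels 208. On an SSD the travel distance is irrelevant, which is why the none scheduler skips reordering altogether.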

io_scheduler.sh
Bash
# View current I/O scheduler for a device
$ cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
 
# The one in brackets is currently active
 
# Change scheduler (requires root)
$ echo bfq > /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
mq-deadline kyber [bfq] none
 
# View I/O statistics for the device
$ cat /sys/block/sda/stat
    8652     1234    234567     12345     5678      456    123456      9876
 
# First eight fields: reads completed, reads merged, sectors read, ms reading,
# writes completed, writes merged, sectors written, ms writing
 
# Monitor I/O per process
$ sudo iotop -o
Total DISK READ:         5.23 M/s | Total DISK WRITE:        12.45 M/s
Actual DISK READ:        4.98 M/s | Actual DISK WRITE:       10.23 M/s
    PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  12345 be/4  postgres     4.50 M/s    8.20 M/s  0.00 %  4.12 %  postgres
  23456 be/4  mysql        0.00 B/s    3.50 M/s  0.00 %  1.23 %  mysqld

SSD vs HDD Scheduling

SSDs have fundamentally different characteristics than HDDs—no seek time, uniform access latency across addresses. For SSDs and NVMe drives, the none (noop) scheduler is often optimal because the device itself has sophisticated internal scheduling. The kernel's scheduler becomes overhead without benefit. HDDs, with their seek-dependent performance, benefit from mq-deadline or bfq.

Summary: The Device-Independent Layer

The device-independent I/O software layer is the unsung hero of the I/O stack. It transforms the chaos of diverse hardware into a uniform, coherent interface that applications can rely upon. Let's consolidate the key concepts:

Key Takeaways

•Device independence enables portability — Programs work across different hardware without modification, thanks to uniform system call interfaces.
•Device files unify naming — The /dev filesystem integrates devices into the standard file namespace, enabling powerful Unix composition with pipes and redirection.
•Standard permissions protect devices — Unix file permissions apply to device files, controlling who can access sensitive hardware.
•Kernel buffering optimizes throughput — The page cache and buffer cache decouple application I/O rates from device speeds, enabling efficient data flow.
•Block size abstraction simplifies access — Applications use byte-oriented I/O while the kernel translates to device-specific block operations.
•Unified error handling — Device-specific errors are translated to standard errno values, allowing consistent application error handling.
•I/O scheduling balances competing requests — The kernel schedules device access for throughput, latency, and fairness.

What's Next:

Below the device-independent layer lies the realm of device drivers—the device-specific code that actually communicates with hardware. In the next page, we'll explore driver architecture, interfaces, development practices, and the critical role drivers play in system stability.

Page Complete

You now understand how the device-independent layer provides uniform I/O services: naming through device files, access control via permissions, buffering for performance, block size abstraction, consistent error reporting, and fair device scheduling. This layer is the foundation upon which portable, efficient I/O applications are built.

Device-Independent Software

The Universal Translator

What You Will Learn

Why Device Independence Matters

Historical Context:

Benefits of Device Independence

•Portability — Programs work across different hardware configurations without recompilation
•Flexibility — Devices can be swapped, upgraded, or virtualized transparently
•Simplicity — Application developers don't need to understand hardware details
•Composition — Programs can be connected via pipes regardless of underlying devices
•Maintainability — Device-specific code is isolated in drivers, not spread throughout applications
•Testing — Programs can be tested with simulated devices or logs redirected to files

The Device-Independent Layer's Mission:

The device-independent layer has a dual responsibility:

Upward Interface: Present a uniform API to user-level software—the same system calls work for all devices
Downward Interface: Provide a standardized framework for device drivers—common infrastructure they all can use

This layer is the bridge that connects the generic world of applications with the specific world of hardware.

Converting Mermaid diagram...

Uniform Naming and Device Files

# Write to a device using cat
cat data.txt > /dev/sda1

# Read from a device
dd if=/dev/sda of=backup.img bs=1M

# Interact with terminals
echo "Hello" > /dev/pts/0

# Access random numbers
head -c 32 /dev/urandom | base64

Device File Anatomy:

Device files in /dev are special files that represent devices rather than storing data. They're characterized by two numbers:

Device File Components
Component	Purpose	Example
Major Number	Identifies the device driver to use	8 = SCSI disk driver
Minor Number	Identifies the specific device instance	0 = first disk, 1 = second disk
Device Type	Block (buffered) or Character (unbuffered)	`b` for disks, `c` for terminals

device_files.sh
Bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
# List device files with details
$ ls -la /dev/sd* /dev/tty0 /dev/null /dev/zero 2>/dev/null
 
brw-rw---- 1 root disk 8, 0 Jan 15 10:00 /dev/sda
brw-rw---- 1 root disk 8, 1 Jan 15 10:00 /dev/sda1
brw-rw---- 1 root disk 8, 2 Jan 15 10:00 /dev/sda2
crw-rw-rw- 1 root root 1, 3 Jan 15 10:00 /dev/null
crw-rw-rw- 1 root root 1, 5 Jan 15 10:00 /dev/zero
crw--w---- 1 root tty  4, 0 Jan 15 10:00 /dev/tty0
 
# Reading the output:
# First character: 'b' = block device, 'c' = character device
# Major,Minor numbers shown before date
# 
# /dev/sda: Major 8 (SCSI), Minor 0 (first disk)
# /dev/sda1: Major 8, Minor 1 (first partition)
# /dev/null: Major 1 (mem), Minor 3 (null sink)
 
# Create a device file manually (requires root)
$ mknod /dev/my_device c 250 0
$ ls -la /dev/my_device
crw-r--r-- 1 root root 250, 0 Jan 15 10:05 /dev/my_device

The /dev Hierarchy:

Modern Linux systems organize device files into logical groups:

Common Device File Categories
Path Pattern	Device Type	Example
/dev/sd*	SCSI/SATA disks and partitions	/dev/sda, /dev/sda1
/dev/nvme*	NVMe SSDs	/dev/nvme0n1, /dev/nvme0n1p1
/dev/tty*	Terminals and consoles	/dev/tty0, /dev/ttyUSB0
/dev/pts/*	Pseudo-terminals (SSH, tmux)	/dev/pts/0, /dev/pts/1
/dev/loop*	Loop devices (mount files as disks)	/dev/loop0
/dev/null	Data sink (discards all input)	Write goes nowhere
/dev/zero	Zero source (infinite zero bytes)	Read returns 0x00
/dev/random, /dev/urandom	Random number generators	Cryptographic randomness

Dynamic Device Management: udev

Protection and Access Control

Device Permissions in Practice:

Device files have the same permission structure as regular files: owner, group, and other, each with read, write, and execute bits. However, the meaning differs slightly for devices:

Permission Meanings for Devices
Permission	Block Devices	Character Devices
Read (r)	Can read device contents	Can receive input from device
Write (w)	Can write to device	Can send output to device
Execute (x)	Not typically meaningful	Not typically meaningful

Group-Based Access:

Linux uses groups to manage device access. Users in specific groups gain access to related devices:

device_permissions.sh
Bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# View device permissions and ownership
$ ls -la /dev/sda /dev/tty1 /dev/audio 2>/dev/null
 
brw-rw---- 1 root disk  8, 0 Jan 15 10:00 /dev/sda
crw--w---- 1 root tty   4, 1 Jan 15 10:00 /dev/tty1
crw-rw---- 1 root audio 14, 4 Jan 15 10:00 /dev/audio
 
# /dev/sda: Only root and 'disk' group members can read/write
# /dev/tty1: Root can read/write, tty group can write
# /dev/audio: Root and 'audio' group can read/write
 
# Check your groups
$ groups
user disk audio video
 
# Add user to a device group (as root)
$ usermod -aG disk username
# User must log out and back in for new group to take effect
 
# Temporary permission elevation for programs (setgid)
$ ls -la /usr/bin/ssh-agent
-rwxr-xr-x 1 root root 309K Jan 10 10:00 /usr/bin/ssh-agent
 
# Programs can be setgid to run with group privileges
$ chmod g+s /usr/bin/write
$ ls -la /usr/bin/write
-rwxr-sr-x 1 root tty 19K Jan 10 10:00 /usr/bin/write

Why Device Protection Matters:

Device access control is critical for system security:

Security Implications of Device Access

•Raw disk access bypasses filesystem permissions — Reading /dev/sda directly can access any file on the disk, even those you wouldn't normally have permission to read
•Terminal access allows session hijacking — Write access to another user's terminal can inject commands into their session
•Memory devices expose kernel secrets — /dev/mem and /dev/kmem (now restricted) provided direct RAM access
•GPU access enables DMA attacks — Video devices can access system memory through Direct Memory Access
•USB devices can be security risks — BadUSB attacks exploit overly permissive USB device access

The Principle of Least Privilege

Device-Independent Buffering

Why Kernel Buffering?

Purposes of Kernel-Level Buffering

•Speed Mismatch Absorption — Producers and consumers operate at different rates; buffers smooth the flow
•Data Aggregation — Small writes can be combined into efficient larger transfers
•Copy Reduction — Data can sometimes stay in kernel buffers, reducing CPU cycles
•Decoupling — Applications don't block waiting for slow devices; they write to buffers and continue
•Read-Ahead — The kernel can prefetch data it predicts will be needed soon
•Write-Behind — Data can be written to fast buffers immediately, then flushed to slow devices later

The Buffer Cache Architecture:

Linux maintains a sophisticated caching system that operates at the block device level:

Converting Mermaid diagram...

Page Cache vs Buffer Cache:

Historically, Unix systems had separate caches:

Buffer Cache: Cached raw disk blocks by block number
Page Cache: Cached file contents by file and offset

check_buffers.sh
Bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# View memory usage including buffer/cache
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           31Gi       8.2Gi       1.5Gi       412Mi        21Gi        22Gi
Swap:         8.0Gi       256Mi       7.7Gi
 
# The "buff/cache" column shows kernel buffering
# "available" estimates memory usable without swapping (roughly free + reclaimable cache)
 
# Detailed view from /proc/meminfo
$ grep -E "^(Buffers|Cached|SwapCached):" /proc/meminfo
Buffers:          2531456 kB
Cached:          19847632 kB
SwapCached:        128456 kB
 
# Drop caches (for testing, not production! Requires root)
$ sync                            # Flush dirty buffers to disk first
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'   # Drop page cache, dentries, inodes
 
# Watch I/O buffering in action
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0 262656 1567432 2531456 19847632 0    0     8    45  128  256  2  1 97  0  0
 0  0 262656 1567180 2531460 19847888 0    0     0    24  156  298  1  0 99  0  0

Write Barriers and Data Safety

Block Size Independence

The Block Size Challenge:

Natural Block Sizes by Device Type
Device Type	Typical Block Size	Device Class
Old hard drives	512 bytes	Block
Modern hard drives (Advanced Format)	4096 bytes physical (512e or 4K native)	Block
SSDs (NVMe)	4096 bytes typical	Block
CD/DVD	2048 bytes	Block
Tape drives	Variable (up to 64KB+ records)	Block/Character
Terminals	1 character	Character
Network (Ethernet)	Up to 1500 bytes (MTU)	Character-like

How the Kernel Achieves Block Size Independence:

User requests arbitrary sizes: Applications can request any number of bytes (e.g., read 100 bytes, write 7 bytes)
Kernel translates to device blocks: The device-independent layer converts byte-oriented requests to block-oriented operations:
- Reading 100 bytes from offset 50 might require reading a 512-byte block, then extracting bytes 50-149
- Writing 7 bytes might require read-modify-write: read the block, modify 7 bytes, write it back
Buffering absorbs misalignment: Cache buffers hold complete blocks, so partial access doesn't immediately force device I/O
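The translation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not kernel code: BlockSpan, byte_range_to_blocks, and rmw_partial_write are hypothetical names, and the "device" is just an in-memory block that is only ever transferred whole.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t first_block;      /* Index of the first block touched */
    uint64_t block_count;      /* Number of blocks touched */
    uint32_t offset_in_first;  /* Byte offset inside the first block */
} BlockSpan;

/* Map a byte range to whole blocks. Assumes length > 0 and block_size > 0. */
BlockSpan byte_range_to_blocks(uint64_t offset, uint64_t length,
                               uint32_t block_size) {
    BlockSpan s;
    uint64_t last_byte = offset + length - 1;

    s.first_block = offset / block_size;
    s.block_count = last_byte / block_size - s.first_block + 1;
    s.offset_in_first = (uint32_t)(offset % block_size);
    return s;
}

/*
 * Partial write inside one block: read the whole block into scratch,
 * patch the requested bytes, then write the whole block back.
 * Returns 0 on success, -1 if the range does not fit in the block.
 */
int rmw_partial_write(unsigned char *device_block, unsigned char *scratch,
                      uint32_t block_size, uint32_t offset_in_block,
                      const void *src, uint32_t len) {
    if (offset_in_block > block_size || len > block_size - offset_in_block) {
        return -1;
    }
    memcpy(scratch, device_block, block_size);     /* Read */
    memcpy(scratch + offset_in_block, src, len);   /* Modify */
    memcpy(device_block, scratch, block_size);     /* Write back */
    return 0;
}
```

For the example in the text, reading 100 bytes at offset 50 with 512-byte blocks maps to block 0, one block, starting at byte 50 within it. The same request at offset 500 would straddle two blocks.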

block_size_demo.c
C
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <device>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    // Get logical block size (what the device reports); BLKSSZGET fills an int
    int logical_block_size;
    if (ioctl(fd, BLKSSZGET, &logical_block_size) == -1) {
        perror("BLKSSZGET");
    } else {
        printf("Logical block size: %d bytes\n", logical_block_size);
    }

    // Get physical block size (actual hardware sectors); fills an unsigned int
    unsigned int physical_block_size;
    if (ioctl(fd, BLKPBSZGET, &physical_block_size) == -1) {
        perror("BLKPBSZGET");
    } else {
        printf("Physical block size: %u bytes\n", physical_block_size);
    }

    // Get optimal I/O size (0 means the device reports no preference)
    unsigned int optimal_io_size;
    if (ioctl(fd, BLKIOOPT, &optimal_io_size) == -1) {
        perror("BLKIOOPT");
    } else {
        printf("Optimal I/O size: %u bytes\n", optimal_io_size);
    }

    // Get device size in bytes (64-bit value)
    unsigned long long size;
    if (ioctl(fd, BLKGETSIZE64, &size) == -1) {
        perror("BLKGETSIZE64");
    } else {
        printf("Device size: %llu bytes (%.2f GB)\n",
               size, size / (1024.0 * 1024.0 * 1024.0));
    }

    close(fd);
    return 0;
}

/*
 * Example output for a modern NVMe drive:
 *
 * $ sudo ./block_size_demo /dev/nvme0n1
 * Logical block size: 512 bytes
 * Physical block size: 512 bytes
 * Optimal I/O size: 0 bytes
 * Device size: 500107862016 bytes (465.76 GB)
 *
 * Note: Many Advanced Format drives with 4096-byte physical
 * sectors still report 512-byte logical sectors for
 * compatibility (512e emulation). 4Kn drives report 4096
 * bytes for both.
 */

Performance Implications of Block Alignment

Device-Independent Error Reporting

Error Categories in I/O:

I/O Error Categories and Examples
Category	Example Causes	Typical errno Values
Resource Errors	Out of disk space, too many open files	`ENOSPC`, `EMFILE`, `ENFILE`
Permission Errors	Access denied, read-only filesystem	`EACCES`, `EPERM`, `EROFS`
Media Errors	Bad sectors, corrupted data	`EIO`, `ENXIO`
Network Errors	Connection reset, host unreachable	`ECONNRESET`, `EHOSTUNREACH`
Protocol Errors	Invalid argument, broken pipe	`EINVAL`, `EPIPE`
Transient Errors	Resource temporarily unavailable	`EAGAIN`, `EINTR`

Error Translation:

Device drivers report errors using device-specific codes. The device-independent layer translates these to standard errno values. For example:

A SCSI disk might report MEDIUM_ERROR → translated to EIO
An NVMe drive might report DATA_TRANSFER_ERROR → translated to EIO
A network interface might report CARRIER_LOST → translated to ENETDOWN

This translation allows applications to handle errors without understanding the specifics of each device technology.
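One way to picture this translation is a lookup table from device status codes to errno values. The sketch below is illustrative only: the DeviceStatus enum and device_status_to_errno are hypothetical names, and the three mappings are simply the examples listed above, not an actual kernel table.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device-specific status codes, for illustration only */
typedef enum {
    DEV_OK = 0,
    DEV_SCSI_MEDIUM_ERROR,
    DEV_NVME_DATA_TRANSFER_ERROR,
    DEV_NET_CARRIER_LOST
} DeviceStatus;

typedef struct {
    DeviceStatus status;
    int err;
} ErrnoMapping;

static const ErrnoMapping errno_map[] = {
    { DEV_SCSI_MEDIUM_ERROR,        EIO      },
    { DEV_NVME_DATA_TRANSFER_ERROR, EIO      },
    { DEV_NET_CARRIER_LOST,         ENETDOWN },
};

/* Translate a device status to an errno value; 0 means success. */
int device_status_to_errno(DeviceStatus status) {
    size_t i;

    if (status == DEV_OK) {
        return 0;
    }
    for (i = 0; i < sizeof errno_map / sizeof errno_map[0]; i++) {
        if (errno_map[i].status == status) {
            return errno_map[i].err;
        }
    }
    return EIO;  /* Unrecognized failures fall back to the generic I/O error */
}
```

The fallback to EIO in this sketch reflects a design choice: a status the table does not recognize still surfaces as an error the application already knows how to handle.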

robust_io.c
C
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
 
/*
 * Robust I/O: Handle transient errors with retry logic
 */
 
/* Retry read() on EINTR and optionally EAGAIN */
ssize_t robust_read(int fd, void *buf, size_t count, int max_retries) {
    ssize_t result;
    int retries = 0;
    
    while (1) {
        result = read(fd, buf, count);
        
        if (result >= 0) {
            return result;  // Success
        }
        
        // Handle retryable errors
        if (errno == EINTR) {
            // Interrupted by signal, always retry
            continue;
        }
        
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // Non-blocking I/O would block
            if (++retries < max_retries) {
                usleep(1000);  // Brief delay before retry
                continue;
            }
        }
        
        // Non-retryable error or max retries exceeded
        return -1;
    }
}
 
/* Robust write that handles partial writes */
ssize_t robust_write_all(int fd, const void *buf, size_t count) {
    const char *ptr = (const char *)buf;
    size_t remaining = count;
    
    while (remaining > 0) {
        ssize_t written = write(fd, ptr, remaining);
        
        if (written < 0) {
            if (errno == EINTR) {
                continue;  // Retry on interrupt
            }
            return -1;  // Real error
        }
        
        if (written == 0) {
            // Unusual: write returned 0
            // Could indicate device error or resource issue
            errno = EIO;
            return -1;
        }
        
        ptr += written;
        remaining -= written;
    }
    
    return (ssize_t)count;  // All bytes written
}
 
/* Error classification for handling decisions */
typedef enum {
    ERROR_CLASS_RETRYABLE,
    ERROR_CLASS_PERMISSION,
    ERROR_CLASS_NOT_FOUND,
    ERROR_CLASS_RESOURCE,
    ERROR_CLASS_FATAL
} ErrorClass;
 
ErrorClass classify_error(int err) {
    switch (err) {
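        case EINTR:
        case EAGAIN:
#if EWOULDBLOCK != EAGAIN
        // On Linux EWOULDBLOCK equals EAGAIN; a second label would not compile
        case EWOULDBLOCK:
#endif
            return ERROR_CLASS_RETRYABLE;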
            
        case EACCES:
        case EPERM:
        case EROFS:
            return ERROR_CLASS_PERMISSION;
            
        case ENOENT:
        case ENXIO:
        case ENODEV:
            return ERROR_CLASS_NOT_FOUND;
            
        case ENOSPC:
        case EMFILE:
        case ENFILE:
        case ENOMEM:
            return ERROR_CLASS_RESOURCE;
            
        default:
            return ERROR_CLASS_FATAL;
    }
}

EIO: The Catch-All Error

Device Allocation and Scheduling

Device Classification by Shareability:

Dedicated (Exclusive) Devices

•Tape drives — Sequential access only
•Printers — Output integrity requires exclusivity
•CD/DVD burners — Writing requires exclusive control
•Some specialized sensors
•Capture cards for video

Shared Devices

•Disks — Multiple processes read/write
•Network interfaces — Packet multiplexing
•Terminals — Multiple virtual consoles
•Sound cards — Audio mixing
•GPUs — Multiple render contexts
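For dedicated devices, allocation amounts to tracking a single owner and refusing everyone else. The following is a minimal sketch of that idea, assuming a single-threaded caller; DedicatedDevice, device_acquire, and device_release are hypothetical names, and a real kernel would also need locking.

```c
#include <errno.h>

#define NO_OWNER (-1)

typedef struct {
    int owner;  /* Id of the process holding the device, or NO_OWNER */
} DedicatedDevice;

void device_init(DedicatedDevice *d) {
    d->owner = NO_OWNER;
}

/* Returns 0 on success, -EBUSY if another process holds the device. */
int device_acquire(DedicatedDevice *d, int pid) {
    if (d->owner != NO_OWNER && d->owner != pid) {
        return -EBUSY;
    }
    d->owner = pid;
    return 0;
}

/* Returns 0 on success, -EPERM if the caller is not the current owner. */
int device_release(DedicatedDevice *d, int pid) {
    if (d->owner != pid) {
        return -EPERM;
    }
    d->owner = NO_OWNER;
    return 0;
}
```

A second process that tries to acquire a held device sees EBUSY immediately. An alternative design would queue the request and block until the device is released.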

I/O Scheduling:

When multiple processes compete for a shared device (especially disks), the kernel schedules their requests to optimize:

Throughput — Maximize bytes transferred per second
Latency — Minimize wait time for individual requests
Fairness — Prevent starvation of any process

Linux provides multiple I/O schedulers (selectable per-device):

Linux I/O Schedulers
Scheduler	Algorithm	Best For
`none` (noop)	FIFO queue, no reordering	NVMe and SSDs (fast random access)
`mq-deadline`	Two deadline queues (read/write)	Databases, mixed workloads
`bfq` (Budget Fair Queueing)	Per-process budgets with low latency	Desktop, interactive workloads
`kyber`	Token-based, targets latency SLOs	High-performance SSDs
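To make the trade-off between throughput and latency concrete, here is a heavily simplified, deadline-flavored selection routine. It is a sketch under stated assumptions and not the mq-deadline implementation, which keeps separate read and write queues and dispatches in batches. IoRequest and pick_next_request are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t sector;    /* Starting sector of the request */
    uint64_t deadline;  /* Time by which it should be dispatched */
} IoRequest;

/*
 * Pick the next request to dispatch:
 *  1. If any request has expired, take the one with the earliest deadline
 *     (bounds latency and prevents starvation).
 *  2. Otherwise take the nearest request at or beyond the head position
 *     (keeps the access pattern sequential for throughput).
 *  3. If nothing lies ahead, wrap around to the lowest sector.
 * Returns the index into reqs, or -1 if the queue is empty.
 */
int pick_next_request(const IoRequest *reqs, size_t n,
                      uint64_t head_sector, uint64_t now) {
    int expired = -1, ahead = -1, lowest = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (reqs[i].deadline <= now &&
            (expired < 0 || reqs[i].deadline < reqs[expired].deadline)) {
            expired = (int)i;
        }
        if (reqs[i].sector >= head_sector &&
            (ahead < 0 || reqs[i].sector < reqs[ahead].sector)) {
            ahead = (int)i;
        }
        if (lowest < 0 || reqs[i].sector < reqs[lowest].sector) {
            lowest = (int)i;
        }
    }

    if (expired >= 0) {
        return expired;
    }
    if (ahead >= 0) {
        return ahead;
    }
    return lowest;
}
```

Sector ordering matters mostly for rotating disks, where seeks are expensive. On devices with fast random access the reordering step buys little, which is consistent with `none` being listed for NVMe and SSDs in the table above.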

io_scheduler.sh
Bash
# View current I/O scheduler for a device
$ cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
 
# The one in brackets is currently active
 
# Change scheduler (requires root)
$ sudo sh -c 'echo bfq > /sys/block/sda/queue/scheduler'
$ cat /sys/block/sda/queue/scheduler
mq-deadline kyber [bfq] none
 
# View device I/O statistics
$ cat /sys/block/sda/stat
    8652     1234    234567     12345     5678      456    123456      9876
 
# First eight fields: reads completed, reads merged, sectors read, ms reading,
# writes completed, writes merged, sectors written, ms writing
 
# Monitor I/O per process
$ sudo iotop -o
Total DISK READ:         5.23 M/s | Total DISK WRITE:        12.45 M/s
Actual DISK READ:        4.98 M/s | Actual DISK WRITE:       10.23 M/s
    PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  12345 be/4  postgres     4.50 M/s    8.20 M/s  0.00 %  4.12 %  postgres
  23456 be/4  mysql        0.00 B/s    3.50 M/s  0.00 %  1.23 %  mysqld

SSD vs HDD Scheduling

Summary: The Device-Independent Layer

Key Takeaways

•Device independence enables portability — Programs work across different hardware without modification, thanks to uniform system call interfaces.
•Device files unify naming — The /dev filesystem integrates devices into the standard file namespace, enabling powerful Unix composition with pipes and redirection.
•Standard permissions protect devices — Unix file permissions apply to device files, controlling who can access sensitive hardware.
•Kernel buffering optimizes throughput — The page cache and buffer cache decouple application I/O rates from device speeds, enabling efficient data flow.
•Block size abstraction simplifies access — Applications use byte-oriented I/O while the kernel translates to device-specific block operations.
•Unified error handling — Device-specific errors are translated to standard errno values, allowing consistent application error handling.
•I/O scheduling balances competing requests — The kernel schedules device access for throughput, latency, and fairness.

What's Next:
