When you call read() on a file descriptor and the data isn't immediately available, what happens? Does your process continue executing? Does it spin in a loop? Does it yield the CPU? The answer to these questions defines the most fundamental I/O model in computing: blocking I/O.
Blocking I/O is the default behavior for virtually all I/O operations in Unix-like systems and Windows. It's the model that most programmers encounter first, and it shapes how we think about program execution. Understanding blocking I/O deeply is essential—not just to use it correctly, but to understand why other I/O models exist and when they're necessary.
By the end of this page, you will understand the precise semantics of blocking I/O, how the kernel manages blocked processes, the relationship between blocking and process scheduling, and the implications for application design. You'll see exactly what happens in the kernel when a process blocks, and why this seemingly simple model has profound implications for system architecture.
Blocking I/O is an I/O model where a system call does not return to the calling process until the requested operation is complete—meaning the data is available for a read or the data has been accepted for a write. During this waiting period, the process is said to be blocked or sleeping, and the CPU is available for other processes.
The formal definition:
A blocking I/O operation suspends the execution of the calling thread until:

- For a read: at least one byte of data is available, end-of-file is reached, or an error occurs
- For a write: at least one byte of data has been accepted into a kernel buffer, or an error occurs
This is in contrast to other models where the system call returns immediately, regardless of whether the operation is complete.
When you open a file, socket, pipe, or any I/O resource, it's opened in blocking mode by default. This means every read() and write() can potentially block. Non-blocking behavior must be explicitly requested using flags like O_NONBLOCK.
The mental model:
Think of blocking I/O like ordering at a restaurant with table service. You place your order (make a system call), and then you wait at your table (process is suspended). You can't do anything else until your food arrives (data becomes available). The waiter (kernel) serves other tables (runs other processes) while your food is being prepared.
This model is synchronous—the caller and the I/O operation move in lockstep. The caller's thread of execution cannot proceed until the I/O completes.
| Operation | Blocks When | Returns When | Error Behavior |
|---|---|---|---|
| read(fd, buf, n) | No data available in buffer | At least 1 byte available or EOF | Returns -1 with errno set |
| write(fd, buf, n) | Kernel buffer full | At least 1 byte accepted | Returns -1 with errno set |
| accept(sockfd, ...) | No pending connections | Connection available | Returns -1 with errno set |
| connect(sockfd, ...) | TCP handshake in progress | Connection established | Returns -1 with errno set |
| recv(sockfd, ...) | No data in socket buffer | Data received or connection closed | Returns -1 with errno set |
| send(sockfd, ...) | Send buffer full | Data queued for transmission | Returns -1 with errno set |
When a process makes a blocking I/O call, a sophisticated dance occurs between user space and kernel space. Understanding this dance is crucial for system programmers and anyone diagnosing I/O performance issues.
The blocking sequence in detail:
```c
// Simplified kernel code demonstrating blocking I/O implementation
// This is representative of how wait queues work in Linux

struct wait_queue_entry {
    struct task_struct *task;
    struct list_head list;
    unsigned int flags;
};

// The wait queue associated with a socket's receive buffer
struct wait_queue_head socket_wait_queue;

// Inside the read() system call implementation
ssize_t socket_read(struct socket *sock, char __user *buf, size_t count) {
    struct sock *sk = sock->sk;
    DEFINE_WAIT(wait);  // Declare a wait queue entry for current process

    // Lock the socket to check buffer state
    lock_sock(sk);

    while (1) {
        // Check if data is available
        if (skb_queue_len(&sk->sk_receive_queue) > 0) {
            // Data available! Copy to user space and return
            ssize_t copied = copy_data_to_user(sk, buf, count);
            release_sock(sk);
            return copied;
        }

        // No data available - prepare to sleep
        // Add ourselves to the socket's wait queue
        prepare_to_wait(&sk->sk_wq->wait, &wait, TASK_INTERRUPTIBLE);

        // Release lock before sleeping (important for deadlock prevention)
        release_sock(sk);

        // Check for signals before actually sleeping
        if (signal_pending(current)) {
            finish_wait(&sk->sk_wq->wait, &wait);
            return -EINTR;  // Interrupted by signal
        }

        // Actually sleep - scheduler takes over
        // CPU is now free for other processes
        schedule();

        // We've been woken up! Clean up wait queue entry
        finish_wait(&sk->sk_wq->wait, &wait);

        // Re-acquire lock and loop to check for data
        lock_sock(sk);
    }
}

// Called by network stack when data arrives (softirq context)
void tcp_data_ready(struct sock *sk) {
    // Wake up any processes waiting for data on this socket
    wake_up_interruptible(&sk->sk_wq->wait);
}
```

Wait queues are one of the most important kernel data structures. Every blockable resource—files, sockets, pipes, devices, IPC mechanisms—has associated wait queues. When you see a system where blocking I/O performs poorly, the investigation often leads to wait queue behavior and wake-up patterns.
Blocking I/O directly affects process scheduling. When a process blocks, its state changes in ways that the scheduler understands. Let's examine the relevant process states in Linux:
TASK_RUNNING (R)

The process is either currently executing on a CPU or is waiting in the run queue to be scheduled. This is the only state from which a process can be selected by the scheduler.
TASK_INTERRUPTIBLE (S)

The process is sleeping, waiting for some condition (like I/O completion). It can be awakened by either:

- The awaited condition becoming true (e.g., data arrives and the kernel wakes the wait queue)
- A signal being delivered to the process
Most blocking I/O uses this state. When you see "S" in ps output, this is what it means.
TASK_UNINTERRUPTIBLE (D)
The process is sleeping and cannot be interrupted by signals. This is used when the kernel cannot safely stop waiting—typically for disk I/O where the operation must complete to maintain filesystem consistency. Processes in this state show as "D" in ps and are sometimes called "uninterruptible sleep."
Processes in TASK_UNINTERRUPTIBLE cannot be killed—not even with SIGKILL. If you see processes stuck in D state, it typically indicates a kernel-level issue: a hung NFS mount, a malfunctioning device driver, or disk I/O stuck waiting for unresponsive hardware. These processes will remain until the I/O completes or the system is rebooted.
```shell
# Observing process states during I/O operations

# Create a named pipe for controlled blocking
mkfifo /tmp/test_pipe

# In one terminal, start a blocking read (will block until data arrives)
cat /tmp/test_pipe &
READER_PID=$!

# Check the process state - should show 'S' (interruptible sleep)
ps -o pid,state,comm -p $READER_PID
# Output:
#   PID S COMMAND
# 12345 S cat

# The process is sleeping, waiting for data on the pipe
# Let's look at where it's blocked:
cat /proc/$READER_PID/wchan
# Output: pipe_read (or similar kernel function name)

# More detailed view with stack trace:
cat /proc/$READER_PID/stack
# Output shows kernel stack - blocked in pipe_read waiting for data

# Now let's send data to unblock:
echo "hello" > /tmp/test_pipe

# The cat process wakes up, prints "hello", and exits

# Clean up
rm /tmp/test_pipe

# For disk I/O, we might see TASK_UNINTERRUPTIBLE:
# dd if=/dev/sda of=/dev/null bs=1M count=100 &
# ps aux | grep dd
# The 'D' state may appear briefly during actual disk I/O
```

| Event | State Transition | Scheduler Action | CPU Impact |
|---|---|---|---|
| read() called, no data | RUNNING → INTERRUPTIBLE | Remove from run queue | CPU freed for other processes |
| Data arrives (interrupt) | INTERRUPTIBLE → RUNNING | Add to run queue | Will be scheduled when selected |
| Signal received while blocked | INTERRUPTIBLE → RUNNING | Add to run queue | Returns -EINTR to user space |
| disk read() initiated | RUNNING → UNINTERRUPTIBLE | Remove from run queue | Cannot be interrupted |
| Disk I/O completes | UNINTERRUPTIBLE → RUNNING | Add to run queue | Data available in page cache |
Understanding the precise semantics of blocking I/O calls is crucial for writing correct programs. Many subtle bugs arise from misunderstanding what these calls guarantee.
read() System Call

Signature: ssize_t read(int fd, void *buf, size_t count);
Blocking behavior:
- If no data is available, the call blocks until at least 1 byte arrives
- The call returns as soon as any data is available—it does not wait until count bytes are available
- The return value is the number of bytes actually read (between 1 and count)
- Returns 0 at end-of-file

Critical insight: A successful read() may return fewer bytes than requested. This is called a short read and is perfectly normal behavior, especially for network sockets, pipes, and terminals. You must loop to read exactly count bytes.
```c
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>

/**
 * Read exactly 'count' bytes from file descriptor.
 * Handles short reads by looping until all bytes are read.
 *
 * Returns:
 *   count on success (all bytes read)
 *   < count if EOF reached
 *   -1 on error
 */
ssize_t read_exact(int fd, void *buf, size_t count) {
    size_t total_read = 0;
    char *ptr = (char *)buf;

    while (total_read < count) {
        ssize_t n = read(fd, ptr + total_read, count - total_read);

        if (n < 0) {
            // Error occurred
            if (errno == EINTR) {
                // Interrupted by signal - retry
                continue;
            }
            // Actual error
            return -1;
        }

        if (n == 0) {
            // EOF reached before reading 'count' bytes
            // Return what we got
            break;
        }

        total_read += n;
    }

    return total_read;
}

/**
 * INCORRECT version - common bug!
 * Assumes read() always returns 'count' bytes.
 */
ssize_t read_exact_WRONG(int fd, void *buf, size_t count) {
    // BUG: read() may return fewer bytes than requested!
    return read(fd, buf, count);
}

// Example usage
int main() {
    char buffer[1024];
    int socket_fd = -1;  // placeholder: assume a connected socket descriptor

    // Suppose we're reading from a network socket

    // WRONG: Assumes we get all 1024 bytes
    if (read(socket_fd, buffer, 1024) == 1024) {
        // This check often fails on sockets!
    }

    // CORRECT: Loop until we have all data
    ssize_t bytes = read_exact(socket_fd, buffer, 1024);
    if (bytes < 0) {
        perror("read failed");
    } else if (bytes < 1024) {
        printf("EOF: got only %zd bytes\n", bytes);
    } else {
        // Full message received
    }

    return 0;
}
```

write() System Call

Signature: ssize_t write(int fd, const void *buf, size_t count);
Blocking behavior:
- If the kernel's output buffer is full, the call blocks until space becomes available
- The call may return before all count bytes are written
- The return value is the number of bytes actually accepted (between 1 and count)

Critical insight: Short writes are less common than short reads for regular files, but they're very common for sockets. Always check the return value and loop if necessary.
```c
#include <unistd.h>
#include <errno.h>
#include <stdio.h>

/**
 * Write exactly 'count' bytes to file descriptor.
 * Handles short writes and EINTR by looping.
 *
 * Returns:
 *   count on success
 *   -1 on error (partial writes may have occurred!)
 */
ssize_t write_exact(int fd, const void *buf, size_t count) {
    size_t total_written = 0;
    const char *ptr = (const char *)buf;

    while (total_written < count) {
        ssize_t n = write(fd, ptr + total_written, count - total_written);

        if (n < 0) {
            if (errno == EINTR) {
                // Interrupted by signal - retry
                continue;
            }
            // Actual error - note that some bytes may have been written!
            return -1;
        }

        // Note: write() returning 0 is unusual but possible
        // for some special files. For regular files and sockets,
        // it indicates an error condition.
        if (n == 0) {
            // This shouldn't happen for blocking writes
            // to regular files or sockets, but handle defensively
            continue;
        }

        total_written += n;
    }

    return total_written;
}

// Example: Sending a complete message over a socket
int send_message(int sock, const char *message, size_t len) {
    ssize_t written = write_exact(sock, message, len);
    if (written < 0) {
        perror("send_message failed");
        return -1;
    }
    // All bytes sent
    return 0;
}
```

When a process receives a signal while blocked in a system call, the call may return early with errno set to EINTR. This is NOT an error—it's the kernel telling you "something else needs your attention." Proper code must check for EINTR and retry the operation. Alternatively, use the SA_RESTART flag when installing signal handlers to have the kernel automatically restart interrupted calls.
Different I/O resources exhibit different blocking behaviors. Understanding these nuances is essential for building reliable systems.
Blocking on regular files is typically brief—data is either in the page cache (instant), or it must be read from disk (milliseconds to seconds).
Key characteristics:

- read() always finds the data "available" eventually—a regular file cannot block indefinitely waiting for data to exist (barring hardware faults or hung network filesystems)
- write() usually returns quickly after copying data into the page cache; the actual disk write happens asynchronously later
- Block durations are bounded by storage latency, not by another party's behavior
Network sockets exhibit the most variable blocking behavior—from microseconds to indefinitely.
Key characteristics:

- read()/recv() blocks until the peer sends data—potentially forever if the peer goes silent
- write()/send() blocks when the socket's send buffer is full, typically because the receiver is slow (TCP flow control)
- accept() blocks until a client connects; connect() blocks for the duration of the TCP handshake
- Block duration depends on network conditions and peer behavior, not just the local machine
Critical consideration: TCP sockets can block indefinitely if the peer becomes unreachable. Always use timeouts for production code.
```c
#include <sys/socket.h>
#include <sys/time.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/**
 * Set socket receive timeout.
 * After timeout expires, read() will return -1 with errno EAGAIN or EWOULDBLOCK.
 */
int set_socket_timeout(int sockfd, int timeout_seconds) {
    struct timeval tv;
    tv.tv_sec = timeout_seconds;
    tv.tv_usec = 0;

    // SO_RCVTIMEO: Receive timeout
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        perror("setsockopt SO_RCVTIMEO");
        return -1;
    }

    // SO_SNDTIMEO: Send timeout
    if (setsockopt(sockfd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0) {
        perror("setsockopt SO_SNDTIMEO");
        return -1;
    }

    return 0;
}

// Usage example
int main() {
    int server_sock = -1;  // placeholder: a listening socket created elsewhere
    int client_sock = accept(server_sock, NULL, NULL);

    // Set 30-second timeout for all I/O on this socket
    set_socket_timeout(client_sock, 30);

    char buffer[1024];
    ssize_t n = read(client_sock, buffer, sizeof(buffer));

    if (n < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // Timeout occurred - client too slow
            printf("Read timed out\n");
        } else {
            perror("read failed");
        }
    }

    return 0;
}
```

Pipes exhibit blocking behavior based on their internal buffer state.
Key characteristics:

- Reading from an empty pipe blocks until a writer provides data
- Writing to a full pipe blocks until a reader drains it (pipe capacity is typically 64 KB on Linux)
- Reading from a pipe whose write end is closed returns 0 (EOF) immediately rather than blocking
- Writing to a pipe whose read end is closed delivers SIGPIPE (the write fails with EPIPE)
| Resource | Typical Block Duration | Maximum Duration | Notes |
|---|---|---|---|
| Regular file (cached) | Microseconds | Milliseconds | Page cache hit, memory copy only |
| Regular file (disk) | 1-100 milliseconds | Seconds | Depends on disk speed, queue depth |
| SSD random read | 50-200 microseconds | Milliseconds | Flash latency plus software overhead |
| HDD random read | 5-15 milliseconds | Hundreds of ms | Dominated by seek time |
| Local socket | Microseconds | Depends on peer | Loopback is fast, but peer can be slow |
| Remote socket (LAN) | Microseconds to milliseconds | Indefinite | Network latency plus peer processing |
| Remote socket (WAN) | Tens to hundreds of ms | Indefinite | Wide variation, packet loss, congestion |
| Pipe (data available) | Microseconds | Microseconds | Memory-to-memory copy |
| Pipe (empty) | Depends on writer | Indefinite | Blocks until writer provides data |
| Terminal read | Human scale | Indefinite | Waiting for user input |
Blocking I/O is the default for good reasons. Despite its limitations, it offers significant advantages that make it the right choice for many scenarios.
Blocking I/O is ideal for: command-line tools, batch processing, scripts, applications with one or few I/O sources, CPU-bound programs with occasional I/O, and situations where simplicity is more valuable than maximum concurrency.
Blocking I/O's simplicity comes with significant limitations that motivate the development of alternative I/O models.
The fundamental tension:
Blocking I/O provides an elegant per-operation model but struggles with concurrent operations. For a program that reads from one file and writes to another, blocking I/O is perfect. For a web server handling thousands of simultaneous connections, blocking I/O requires thousands of threads—and threads don't scale well to those numbers.
This tension drove the development of non-blocking I/O, I/O multiplexing, and asynchronous I/O, which we'll explore in subsequent sections.
```c
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>
#include <stdlib.h>

/**
 * Classic thread-per-connection server.
 * Simple to understand but doesn't scale.
 */
void *handle_client(void *arg) {
    int client_fd = *(int *)arg;
    free(arg);

    char buffer[1024];
    ssize_t n;

    // Blocking read from client - simple!
    while ((n = read(client_fd, buffer, sizeof(buffer))) > 0) {
        // Echo back - also blocking
        write(client_fd, buffer, n);
    }

    close(client_fd);
    return NULL;
}

int main() {
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    // ... bind and listen ...

    while (1) {
        int *client_fd = malloc(sizeof(int));
        *client_fd = accept(server_fd, NULL, NULL);  // Blocking!

        pthread_t thread;
        pthread_create(&thread, NULL, handle_client, client_fd);
        pthread_detach(thread);

        // Problem: With 10,000 clients, we have 10,000 threads!
        // Each thread consumes:
        //   - ~8KB-8MB stack space (platform dependent)
        //   - Kernel resources for thread scheduling
        //   - Context switch overhead when switching threads
        //
        // At some point, thread creation fails or performance degrades
        // severely due to memory pressure and scheduling overhead.
    }

    return 0;
}
```

Blocking I/O is the foundational I/O model that every systems programmer must understand. Let's consolidate the key concepts:

- A blocking call suspends the calling thread until the operation completes; the CPU is freed for other processes
- The kernel implements blocking with wait queues and the TASK_INTERRUPTIBLE / TASK_UNINTERRUPTIBLE sleep states
- read() and write() can return short counts; correct code loops and handles EINTR
- Blocking duration varies enormously by resource—from microseconds (cached files) to indefinite (sockets, pipes, terminals)
- The model is simple and efficient for few I/O sources, but scales poorly to thousands of concurrent connections
What's next:
Now that we understand blocking I/O's semantics and limitations, we'll explore non-blocking I/O—a model where system calls return immediately even when the operation cannot complete. Non-blocking I/O addresses some of blocking's limitations but introduces new challenges around polling and state management.
You now have a deep understanding of blocking I/O—the synchronous model that underlies most I/O programming. You understand the kernel mechanisms (wait queues, process states), the system call semantics (short reads, EINTR), and the fundamental tradeoffs. Next, we'll explore how non-blocking I/O changes the programming model.