Imagine you're a waiter in a restaurant with 100 tables. You could stand at one table until its guests order (blocking), run frantically between all tables checking if anyone needs help (polling), or have a system that alerts you when any table is ready (multiplexing). The third approach is clearly the most efficient—and it's exactly what I/O multiplexing provides for your programs.
I/O multiplexing allows a single thread to wait for I/O readiness across multiple file descriptors simultaneously. Instead of blocking on one descriptor or polling them all, you ask the kernel: "Watch these 1000 sockets and wake me when any of them have data." This is the foundation of scalable servers.
By the end of this page, you will understand the concept and importance of I/O multiplexing, why it's necessary for scalable applications, the readiness model it employs, and how it fits with non-blocking I/O. You'll be prepared to understand the specific mechanisms (select, poll, epoll) in the next section.
Consider a chat server that must handle 10,000 concurrent clients. Each client might send a message at any time. The server must read from whichever client sends data, without knowing in advance which that will be.
Why blocking I/O fails here:
With blocking I/O, if you call read() on client #1's socket and they haven't sent anything, your thread blocks. While blocked, you can't read from the other 9,999 clients—even if hundreds of them have data waiting. One slow or idle client makes the entire server unresponsive.
The thread-per-client approach:
You could create 10,000 threads, one per client. Each thread blocks on its client's socket. This works but has severe limitations:

- Memory: each thread needs its own stack (often megabytes by default), so 10,000 threads can consume gigabytes.
- Context-switch overhead: the kernel must constantly switch among thousands of mostly-idle threads.
- Scheduler pressure: wake-up and scheduling costs grow with thread count.
- Synchronization: shared state across that many threads requires locking, inviting races and deadlocks.

For high-scale servers, the thread-per-client model is impractical.
The 'C10K problem' (coined in 1999) describes the challenge of handling 10,000 concurrent connections. A decade earlier, even 1,000 was ambitious. Today, C100K, C1M, and beyond are achievable—but only with proper I/O multiplexing. Thread-per-connection never scaled to these numbers.
The polling approach:
With non-blocking I/O, you could poll all 10,000 sockets in a loop:
```c
while (1) {
    for (int i = 0; i < 10000; i++) {
        n = read(clients[i], buf, size);  // Non-blocking
        if (n > 0) handle_data(i, buf, n);
    }
}
```
This works but is terribly inefficient:

- The loop burns 100% CPU even when no client has sent anything.
- Each pass makes 10,000 system calls, most of which return EAGAIN for nothing.
- Work scales with the number of connections, not with the amount of actual activity.
- Latency suffers: a client that sends data just after its slot is checked waits a full loop iteration.

We need a way to wait efficiently for any of the 10,000 sockets to become ready.
I/O multiplexing provides a way to:

- Register a set of file descriptors you care about, along with the events of interest (readable, writable, error).
- Block in a single system call until at least one of them is ready.
- Learn exactly which descriptors are ready, so you act only on those.
The kernel does the work of monitoring all registered descriptors. It wakes your thread only when at least one descriptor is ready, and tells you exactly which ones.
The fundamental operation:
multiplex(descriptors[], timeout) → ready_descriptors[]
This operation:

- Takes a set of descriptors to watch and an optional timeout.
- Blocks until at least one descriptor is ready or the timeout expires.
- Returns the subset of descriptors that are ready.
The specifics vary between mechanisms (select, poll, epoll, kqueue), but this is the core abstraction.
| Aspect | Thread-per-Connection | Polling | I/O Multiplexing |
|---|---|---|---|
| CPU when idle | Low (threads sleep) | 100% (busy loop) | Near zero (blocked) |
| Memory usage | Very high (stacks) | Low | Low |
| Latency to first ready | Good | Bad (check all first) | Good (immediate) |
| System calls/iteration | N/A | N (one per fd) | 1-2 (total) |
| Code complexity | Simple per-thread | Simple but inefficient | Event-driven pattern |
| Scalability | ~thousands | Low (CPU bound) | Millions possible |
```c
// Conceptual I/O multiplexing pattern
int server_loop(int listen_fd, int *clients, int num_clients) {
    while (1) {
        // Build set of all descriptors we care about
        fd_set interest_set;
        build_interest_set(&interest_set, listen_fd, clients, num_clients);

        // Block until ANY descriptor is ready (or timeout)
        // This is the key - ONE system call, watches EVERYTHING
        // (max_fd is the highest-numbered descriptor in the set)
        int num_ready = select(max_fd + 1, &interest_set, NULL, NULL, NULL);
        if (num_ready < 0) {
            handle_error();
            continue;
        }

        // select() rewrote interest_set in place: it now contains
        // only the descriptors that are ready
        if (FD_ISSET(listen_fd, &interest_set)) {
            // New connection ready to accept
            int new_client = accept(listen_fd, NULL, NULL);  // Won't block!
            add_client(new_client);
        }
        for (int i = 0; i < num_clients; i++) {
            if (FD_ISSET(clients[i], &interest_set)) {
                // This client has data ready - read won't block
                handle_client_data(clients[i]);
            }
        }
    }
}
```

I/O multiplexing operates on the readiness model: it tells you when a file descriptor is ready for an operation, not that the operation has completed.
What "ready" means:
Read-ready: At least one byte is available to read, OR EOF/error has occurred. A non-blocking read() will not return EAGAIN.
Write-ready: At least one byte can be written. A non-blocking write() will not return EAGAIN. (Note: sockets are usually write-ready unless the send buffer is full.)
Exception/priority data: Out-of-band data is available (for sockets) or an error condition exists.
Readiness vs completion:
Multiplexing tells you an operation can proceed without blocking—not that it has completed. You still need to perform the actual read() or write() after readiness is indicated.
```c
// The readiness model workflow

// 1. Wait for readiness (this blocks)
wait_for_readiness(fd);

// 2. Perform I/O (this does NOT block because we know it's ready)
ssize_t n = read(fd, buf, sizeof(buf));

// 3. Handle the result
if (n > 0) {
    // Got data
    process(buf, n);
} else if (n == 0) {
    // EOF
    close(fd);
} else {
    // Error (shouldn't be EAGAIN since we checked readiness)
    handle_error();
}

/*
 * Important: Readiness doesn't guarantee you'll get all the data you want.
 * If you need 1000 bytes but only 100 are ready, you'll read 100.
 * For complete messages, you often need to:
 *   1. Accumulate data across multiple ready notifications
 *   2. Parse to find message boundaries
 *   3. Process complete messages only
 */
```

Between checking readiness and performing I/O, readiness can change—especially if another thread reads from the same fd. In edge-triggered mode (epoll), you must drain all available data before another notification will occur. Always use non-blocking I/O with multiplexing for safety.
Why non-blocking I/O pairs with multiplexing:
Multiplexing answers: "Which descriptors are ready?" Non-blocking I/O answers: "What happens if I try I/O and nothing's ready?"
Together they provide robust I/O handling: multiplexing tells you when an operation is likely to succeed, and non-blocking mode guarantees that if the hint was stale (a spurious wakeup, or another thread consumed the data first), the call returns EAGAIN instead of hanging the event loop.
| Event Type | Condition | Appropriate Action |
|---|---|---|
| Read ready (normal) | Data in buffer | read() - will return data |
| Read ready (EOF) | Connection closed by peer | read() returns 0 - close fd |
| Read ready (error) | Error condition | read() returns -1 - check errno |
| Write ready | Space in send buffer | write() - will accept some bytes |
| Write ready (connected) | TCP connect completed | getsockopt(SO_ERROR) to check result |
| Exception (OOB) | Urgent TCP data | recv(MSG_OOB) to read urgent data |
| Hangup | Peer closed write side | Can still read remaining data |
| Error | Socket error | Check with getsockopt(SO_ERROR) |
I/O multiplexing naturally leads to event-driven programming—a paradigm where the flow of the program is determined by events (I/O readiness, timers, signals) rather than sequential execution.
The event loop:
At the heart of event-driven programs is the event loop: an infinite loop that waits for events and dispatches them to handlers.
```c
#include <sys/epoll.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>

#define MAX_EVENTS 64

typedef void (*event_handler)(int fd, void *data);

struct event_data {
    int fd;                 // epoll_event.data is a union, so when we use
                            // data.ptr the fd must be stored here instead
    event_handler read_handler;
    event_handler write_handler;
    void *user_data;
};

/**
 * A simple event loop using epoll
 */
void event_loop(int epoll_fd) {
    struct epoll_event events[MAX_EVENTS];

    while (1) {
        // Wait for events (this is where we spend most time blocked)
        int num_events = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
        if (num_events < 0) {
            if (errno == EINTR) continue;  // Signal interrupted us
            perror("epoll_wait");
            break;
        }

        // Dispatch events to handlers
        for (int i = 0; i < num_events; i++) {
            struct event_data *data = events[i].data.ptr;
            if (events[i].events & EPOLLIN) {
                // Read event
                data->read_handler(data->fd, data->user_data);
            }
            if (events[i].events & EPOLLOUT) {
                // Write event
                data->write_handler(data->fd, data->user_data);
            }
            if (events[i].events & (EPOLLERR | EPOLLHUP)) {
                // Error or hangup
                handle_error(data->fd);
            }
        }

        // Process timers, deferred work, etc.
        process_timers();
        process_deferred();
    }
}
```

Characteristics of event-driven code:
Single-threaded concurrency — Many concurrent activities, one thread. No locks needed for most code.
Non-blocking handlers — Event handlers must not block; they do work and return quickly. Long operations must be broken into steps or offloaded.
State machines — Without blocking, complex protocols become explicit state machines. Each event advances the state.
Callback-based — Instead of "call read(), wait, process," you register "call this function when readable."
Inversion of control — The event loop controls execution flow, not your sequential code. This takes adjustment.
nginx, Node.js, Redis, libuv, Tornado, and many high-performance servers use event-driven architecture with I/O multiplexing. The pattern has proven itself at massive scale—Redis handles millions of operations per second with a single event loop thread.
I/O multiplexing mechanisms can notify you of readiness in two ways, with fundamentally different semantics:
Level-triggered (LT):

- You are notified as long as the condition holds: if unread data remains in the buffer, every wait call reports the descriptor as readable.
- Partial reads are safe; you will simply be told again next time.
Edge-triggered (ET):

- You are notified only when the state changes, e.g. when new data arrives on a previously drained socket.
- You must drain the descriptor (read until EAGAIN) before waiting again, or remaining data goes unreported.
```c
/*
 * Level-Triggered vs Edge-Triggered Behavior
 *
 * Scenario: 100 bytes arrive on a socket
 *
 * LEVEL-TRIGGERED:
 * ────────────────────────────────────────────
 * 1. epoll_wait() returns: socket is readable
 * 2. You read 50 bytes
 * 3. epoll_wait() returns AGAIN: socket still readable (50 bytes left)
 * 4. You read 50 bytes
 * 5. epoll_wait() blocks: no more data
 *
 * LT is forgiving: read what you want, come back later.
 *
 * EDGE-TRIGGERED:
 * ────────────────────────────────────────────
 * 1. epoll_wait() returns: socket became readable (edge!)
 * 2. You read 50 bytes
 * 3. epoll_wait() BLOCKS: no new edge occurred!
 * 4. You're stuck - the remaining 50 bytes go unreported until new data arrives
 *
 * ET requires: read until EAGAIN, every time.
 */

// CORRECT edge-triggered read pattern
void et_read_handler(int fd) {
    char buffer[4096];

    // MUST loop until EAGAIN - can't leave data behind!
    while (1) {
        ssize_t n = read(fd, buffer, sizeof(buffer));
        if (n > 0) {
            // Got data, process it
            process_data(buffer, n);
            continue;  // There might be more!
        }
        if (n == 0) {
            // EOF - connection closed
            close(fd);
            return;
        }
        // n < 0
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // No more data RIGHT NOW - safe to return
            // epoll will notify us when more arrives
            return;
        }
        // Actual error
        handle_error(fd);
        return;
    }
}

// WRONG edge-triggered read - WILL LOSE DATA
void et_read_handler_BROKEN(int fd) {
    char buffer[4096];

    // Only reads once!
    ssize_t n = read(fd, buffer, sizeof(buffer));
    if (n > 0) {
        process_data(buffer, n);
    }
    // BUG: If more than 4096 bytes arrived, the rest stays unread.
    // Edge-triggered won't notify again for data already present.
}
```

| Aspect | Level-Triggered | Edge-Triggered |
|---|---|---|
| Notifications | Repeated while condition true | Once per state change |
| Partial reads | Safe - notified again | Unsafe - must drain completely |
| Programming model | Simpler | More complex |
| Performance | More syscalls (repeated notifies) | Fewer syscalls |
| Risk of starvation | Low (repeated chances) | High (miss event = stuck) |
| Default in epoll | Yes (no flag needed) | Must specify EPOLLET |
| select/poll | Level-triggered only | Not available |
Edge-triggered epoll is faster but unforgiving. If you don't read all data, you won't be notified again. If you don't handle write-readiness correctly, you can get stuck. Start with level-triggered; switch to edge-triggered only when you need the performance and understand the implications.
While network servers are the classic use case for I/O multiplexing, these mechanisms work with any file descriptor:
Pipes and FIFOs: Multiplexing can wait for data on pipes, enabling parent-child communication patterns and producer-consumer architectures.
Terminal input: Watch standard input for user commands while also handling network events—common in interactive CLI tools.
Signals (signalfd): On Linux, signals can be converted to file descriptor events via signalfd(), allowing unified event handling.
Timers (timerfd): Linux's timerfd_create() makes timers into file descriptors. Combine with epoll for efficient timeout handling.
File system events (inotify): inotify provides file system change notifications as file descriptor events.
Event notification (eventfd): A simple counting semaphore as a file descriptor—useful for thread communication in event loops.
```c
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <sys/timerfd.h>
#include <signal.h>
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>

/* File-scope so both setup and the event loop can see them */
static int listen_fd, signal_fd, timer_fd;
static volatile sig_atomic_t running = 1;

/**
 * Unified event handling: sockets, signals, timers
 * All through epoll!
 */
int setup_unified_event_loop() {
    int epoll_fd = epoll_create1(0);

    // 1. Add listening socket
    listen_fd = create_listen_socket(8080);
    add_to_epoll(epoll_fd, listen_fd, EPOLLIN);

    // 2. Add signal handling via signalfd
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGTERM);
    sigprocmask(SIG_BLOCK, &mask, NULL);  // Block normal delivery
    signal_fd = signalfd(-1, &mask, SFD_NONBLOCK);
    add_to_epoll(epoll_fd, signal_fd, EPOLLIN);

    // 3. Add periodic timer
    timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
    struct itimerspec timer_spec = {
        .it_interval = { .tv_sec = 1, .tv_nsec = 0 },  // 1 second repeat
        .it_value    = { .tv_sec = 1, .tv_nsec = 0 }
    };
    timerfd_settime(timer_fd, 0, &timer_spec, NULL);
    add_to_epoll(epoll_fd, timer_fd, EPOLLIN);

    // 4. Add stdin for interactive commands
    add_to_epoll(epoll_fd, STDIN_FILENO, EPOLLIN);

    // Now epoll_wait handles:
    // - New network connections
    // - Shutdown signals (SIGINT, SIGTERM)
    // - Periodic housekeeping (timer)
    // - User commands (stdin)
    // All in one unified loop!
    return epoll_fd;
}

void event_loop(int epoll_fd) {
    struct epoll_event events[64];

    while (running) {
        int n = epoll_wait(epoll_fd, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;

            if (fd == signal_fd) {
                // Read signal info
                struct signalfd_siginfo info;
                read(signal_fd, &info, sizeof(info));
                printf("Received signal %d\n", info.ssi_signo);
                running = 0;  // Shutdown
            } else if (fd == timer_fd) {
                // Timer expired - do periodic work
                uint64_t expirations;
                read(timer_fd, &expirations, sizeof(expirations));
                do_housekeeping();
            } else if (fd == listen_fd) {
                // New connection
                accept_connection(listen_fd);
            } else if (fd == STDIN_FILENO) {
                // User command
                handle_user_input();
            } else {
                // Client data
                handle_client(fd);
            }
        }
    }
}
```

The Unix philosophy of 'everything is a file' really shines with multiplexing. Signals, timers, file change notifications—all become file descriptors that epoll can watch. This unification simplifies event-driven programming enormously.
Unix-like systems provide several I/O multiplexing mechanisms, each with different tradeoffs. The next section covers select, poll, and epoll in depth, but here's an overview:
| Mechanism | Origin | Max FDs | Performance | Portability | API Style |
|---|---|---|---|---|---|
| select() | BSD (1983) | 1024 (FD_SETSIZE) | O(n) per call | Universal | fd_set bitmasks |
| poll() | SVR3 (1986) | Unlimited | O(n) per call | POSIX | Array of pollfd |
| epoll | Linux 2.6 (2002) | Unlimited | O(1) for events | Linux only | Separate create/wait |
| kqueue | FreeBSD (2000) | Unlimited | O(1) for events | BSD/macOS | Kevent structure |
| IOCP | Windows NT | Unlimited | O(1) completion | Windows | Completion ports |
Evolution of mechanisms:
select() (1983) — The original. Simple but limited to 1024 fds and O(n) scanning.
poll() (1986) — Removed the fd limit but still O(n). Slightly cleaner API.
epoll (2002) — Linux's answer to scale. O(1) event delivery, handles millions of fds.
kqueue (2000) — BSD's equivalent to epoll. Slightly more flexible API.
io_uring (2019) — Goes beyond multiplexing to true async I/O, but can also be used for notification.
Choosing a mechanism:

- Maximum portability: poll() works everywhere POSIX does.
- Linux at scale: epoll is the standard choice.
- BSD/macOS at scale: kqueue.
- Cross-platform production code: use a library that abstracts over all of these.
Most production systems don't use raw select/poll/epoll. Libraries like libevent, libev, and libuv provide a portable abstraction over platform-specific mechanisms. They use epoll on Linux, kqueue on BSD/macOS, and IOCP on Windows.
I/O multiplexing is the technique that enables scalable, event-driven systems. Let's consolidate the key concepts:

- One thread can wait on thousands of descriptors with a single blocking call.
- The readiness model tells you when an operation can proceed, not that it has completed.
- Multiplexing pairs with non-blocking I/O to guard against stale readiness.
- Level-triggered notification is forgiving; edge-triggered is faster but requires draining to EAGAIN.
- On Linux, almost anything can be a watched descriptor: sockets, pipes, signals, timers, file events.
What's next:
Now that we understand the conceptual foundation of I/O multiplexing, we'll dive into the specific mechanisms: select, poll, and epoll. We'll see their APIs, understand their performance characteristics, and learn when to use each one.
You now understand I/O multiplexing—the technique that enables one thread to efficiently handle many concurrent I/O sources. You grasp the readiness model, event-driven programming, and level vs edge triggering. Next, we'll explore select, poll, and epoll in practical detail.