Imagine you're a waiter in a restaurant with 100 tables. You could stand at one table until its guests order (blocking), run frantically between all tables checking if anyone needs help (polling), or have a system that alerts you when any table is ready (multiplexing). The third approach is clearly the most efficient—and it's exactly what I/O multiplexing provides for your programs.
I/O multiplexing allows a single thread to wait for I/O readiness across multiple file descriptors simultaneously. Instead of blocking on one descriptor or polling them all, you ask the kernel: "Watch these 1000 sockets and wake me when any of them have data." This is the foundation of scalable servers.
By the end of this page, you will understand the concept and importance of I/O multiplexing, why it's necessary for scalable applications, the readiness model it employs, and how it fits with non-blocking I/O. You'll be prepared to understand the specific mechanisms (select, poll, epoll) in the next section.
Consider a chat server that must handle 10,000 concurrent clients. Each client might send a message at any time. The server must read from whichever client sends data, without knowing in advance which that will be.
Why blocking I/O fails here:
With blocking I/O, if you call read() on client #1's socket and they haven't sent anything, your thread blocks. While blocked, you can't read from the other 9,999 clients—even if hundreds of them have data waiting. One slow or idle client makes the entire server unresponsive.
The thread-per-client approach:
You could create 10,000 threads, one per client. Each thread blocks on its client's socket. This works but has severe limitations:

- Memory: each thread needs its own stack (often megabytes by default), so 10,000 threads can consume gigabytes.
- Context-switch overhead: the kernel must constantly switch among thousands of mostly-idle threads.
- Scheduler pressure: wake-up and scheduling costs grow with thread count.
- Synchronization: shared state across that many threads requires locking, inviting races and deadlocks.

For high-scale servers, the thread-per-client model is impractical.
The 'C10K problem' (coined in 1999) describes the challenge of handling 10,000 concurrent connections. A decade earlier, even 1,000 was ambitious. Today, C100K, C1M, and beyond are achievable—but only with proper I/O multiplexing. Thread-per-connection never scaled to these numbers.
The polling approach:
With non-blocking I/O, you could poll all 10,000 sockets in a loop:
```c
while (1) {
    for (int i = 0; i < 10000; i++) {
        n = read(clients[i], buf, size);  // Non-blocking
        if (n > 0) handle_data(i, buf, n);
    }
}
```
This works but is terribly inefficient:

- The loop burns 100% CPU even when no client has sent anything.
- Each pass makes 10,000 system calls, most of which return EAGAIN for nothing.
- Work scales with the number of connections, not with the amount of actual activity.
- Latency suffers: a client that sends data just after its slot is checked waits a full loop iteration.

We need a way to wait efficiently for any of the 10,000 sockets to become ready.
I/O multiplexing provides a way to:

- Register a set of file descriptors you care about, along with the events of interest (readable, writable, error).
- Block in a single system call until at least one of them is ready.
- Learn exactly which descriptors are ready, so you act only on those.
The kernel does the work of monitoring all registered descriptors. It wakes your thread only when at least one descriptor is ready, and tells you exactly which ones.
The fundamental operation:
multiplex(descriptors[], timeout) → ready_descriptors[]
This operation:

- Takes a set of descriptors to watch and an optional timeout.
- Blocks until at least one descriptor is ready or the timeout expires.
- Returns the subset of descriptors that are ready.
The specifics vary between mechanisms (select, poll, epoll, kqueue), but this is the core abstraction.
| Aspect | Thread-per-Connection | Polling | I/O Multiplexing |
|---|---|---|---|
| CPU when idle | Low (threads sleep) | 100% (busy loop) | Near zero (blocked) |
| Memory usage | Very high (stacks) | Low | Low |
| Latency to first ready | Good | Bad (check all first) | Good (immediate) |
| System calls/iteration | N/A | N (one per fd) | 1-2 (total) |
| Code complexity | Simple per-thread | Simple but inefficient | Event-driven pattern |
| Scalability | ~thousands | Low (CPU bound) | Millions possible |
```c
// Conceptual I/O multiplexing pattern
int server_loop(int listen_fd, int *clients, int num_clients) {
    while (1) {
        // Build set of all descriptors we care about
        fd_set interest_set;
        build_interest_set(&interest_set, listen_fd, clients, num_clients);

        // Block until ANY descriptor is ready (or timeout)
        // This is the key - ONE system call, watches EVERYTHING
        // (max_fd is the highest-numbered descriptor in the set)
        int num_ready = select(max_fd + 1, &interest_set, NULL, NULL, NULL);
        if (num_ready < 0) {
            handle_error();
            continue;
        }

        // select() rewrote interest_set in place: it now contains
        // only the descriptors that are ready
        if (FD_ISSET(listen_fd, &interest_set)) {
            // New connection ready to accept
            int new_client = accept(listen_fd, NULL, NULL);  // Won't block!
            add_client(new_client);
        }
        for (int i = 0; i < num_clients; i++) {
            if (FD_ISSET(clients[i], &interest_set)) {
                // This client has data ready - read won't block
                handle_client_data(clients[i]);
            }
        }
    }
}
```

I/O multiplexing operates on the readiness model: it tells you when a file descriptor is ready for an operation, not that the operation has completed.
What "ready" means:
Read-ready: At least one byte is available to read, OR EOF/error has occurred. A non-blocking read() will not return EAGAIN.
Write-ready: At least one byte can be written. A non-blocking write() will not return EAGAIN. (Note: sockets are usually write-ready unless the send buffer is full.)
Exception/priority data: Out-of-band data is available (for sockets) or an error condition exists.
Readiness vs completion:
Multiplexing tells you an operation can proceed without blocking—not that it has completed. You still need to perform the actual read() or write() after readiness is indicated.
```c
// The readiness model workflow

// 1. Wait for readiness (this blocks)
wait_for_readiness(fd);

// 2. Perform I/O (this does NOT block because we know it's ready)
ssize_t n = read(fd, buf, sizeof(buf));

// 3. Handle the result
if (n > 0) {
    // Got data
    process(buf, n);
} else if (n == 0) {
    // EOF
    close(fd);
} else {
    // Error (shouldn't be EAGAIN since we checked readiness)
    handle_error();
}

/*
 * Important: Readiness doesn't guarantee you'll get all the data you want.
 * If you need 1000 bytes but only 100 are ready, you'll read 100.
 * For complete messages, you often need to:
 *   1. Accumulate data across multiple ready notifications
 *   2. Parse to find message boundaries
 *   3. Process complete messages only
 */
```

Between checking readiness and performing I/O, readiness can change—especially if another thread reads from the same fd. In edge-triggered mode (epoll), you must drain all available data before another notification will occur. Always use non-blocking I/O with multiplexing for safety.
Why non-blocking I/O pairs with multiplexing:
Multiplexing answers: "Which descriptors are ready?" Non-blocking I/O answers: "What happens if I try I/O and nothing's ready?"
Together they provide robust I/O handling: multiplexing tells you when an operation is likely to succeed, and non-blocking mode guarantees that if the hint was stale (a spurious wakeup, or another thread consumed the data first), the call returns EAGAIN instead of hanging the event loop.
| Event Type | Condition | Appropriate Action |
|---|---|---|
| Read ready (normal) | Data in buffer | read() - will return data |
| Read ready (EOF) | Connection closed by peer | read() returns 0 - close fd |
| Read ready (error) | Error condition | read() returns -1 - check errno |
| Write ready | Space in send buffer | write() - will accept some bytes |
| Write ready (connected) | TCP connect completed | getsockopt(SO_ERROR) to check result |
| Exception (OOB) | Urgent TCP data | recv(MSG_OOB) to read urgent data |
| Hangup | Peer closed write side | Can still read remaining data |
| Error | Socket error | Check with getsockopt(SO_ERROR) |
I/O multiplexing naturally leads to event-driven programming—a paradigm where the flow of the program is determined by events (I/O readiness, timers, signals) rather than sequential execution.
The event loop:
At the heart of event-driven programs is the event loop: an infinite loop that waits for events and dispatches them to handlers.
```c
#include <sys/epoll.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>

#define MAX_EVENTS 64

typedef void (*event_handler)(int fd, void *data);

struct event_data {
    int fd;                 // epoll_event.data is a union, so when we use
                            // data.ptr the fd must be stored here instead
    event_handler read_handler;
    event_handler write_handler;
    void *user_data;
};

/**
 * A simple event loop using epoll
 */
void event_loop(int epoll_fd) {
    struct epoll_event events[MAX_EVENTS];

    while (1) {
        // Wait for events (this is where we spend most time blocked)
        int num_events = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
        if (num_events < 0) {
            if (errno == EINTR) continue;  // Signal interrupted us
            perror("epoll_wait");
            break;
        }

        // Dispatch events to handlers
        for (int i = 0; i < num_events; i++) {
            struct event_data *data = events[i].data.ptr;
            if (events[i].events & EPOLLIN) {
                // Read event
                data->read_handler(data->fd, data->user_data);
            }
            if (events[i].events & EPOLLOUT) {
                // Write event
                data->write_handler(data->fd, data->user_data);
            }
            if (events[i].events & (EPOLLERR | EPOLLHUP)) {
                // Error or hangup
                handle_error(data->fd);
            }
        }

        // Process timers, deferred work, etc.
        process_timers();
        process_deferred();
    }
}
```

Characteristics of event-driven code:
Single-threaded concurrency — Many concurrent activities, one thread. No locks needed for most code.
Non-blocking handlers — Event handlers must not block; they do work and return quickly. Long operations must be broken into steps or offloaded.
State machines — Without blocking, complex protocols become explicit state machines. Each event advances the state.
Callback-based — Instead of "call read(), wait, process," you register "call this function when readable."
Inversion of control — The event loop controls execution flow, not your sequential code. This takes adjustment.
nginx, Node.js, Redis, libuv, Tornado, and many high-performance servers use event-driven architecture with I/O multiplexing. The pattern has proven itself at massive scale—Redis handles millions of operations per second with a single event loop thread.
I/O multiplexing mechanisms can notify you of readiness in two ways, with fundamentally different semantics:
Level-triggered (LT):

- You are notified as long as the condition holds: if unread data remains in the buffer, every wait call reports the descriptor as readable.
- Partial reads are safe; you will simply be told again next time.
Edge-triggered (ET):

- You are notified only when the state changes, e.g. when new data arrives on a previously drained socket.
- You must drain the descriptor (read until EAGAIN) before waiting again, or remaining data goes unreported.
```c
/*
 * Level-Triggered vs Edge-Triggered Behavior
 *
 * Scenario: 100 bytes arrive on a socket
 *
 * LEVEL-TRIGGERED:
 * ────────────────────────────────────────────
 * 1. epoll_wait() returns: socket is readable
 * 2. You read 50 bytes
 * 3. epoll_wait() returns AGAIN: socket still readable (50 bytes left)
 * 4. You read 50 bytes
 * 5. epoll_wait() blocks: no more data
 *
 * LT is forgiving: read what you want, come back later.
 *
 * EDGE-TRIGGERED:
 * ────────────────────────────────────────────
 * 1. epoll_wait() returns: socket became readable (edge!)
 * 2. You read 50 bytes
 * 3. epoll_wait() BLOCKS: no new edge occurred!
 * 4. You're stuck - the remaining 50 bytes go unreported until new data arrives
 *
 * ET requires: read until EAGAIN, every time.
 */

// CORRECT edge-triggered read pattern
void et_read_handler(int fd) {
    char buffer[4096];

    // MUST loop until EAGAIN - can't leave data behind!
    while (1) {
        ssize_t n = read(fd, buffer, sizeof(buffer));
        if (n > 0) {
            // Got data, process it
            process_data(buffer, n);
            continue;  // There might be more!
        }
        if (n == 0) {
            // EOF - connection closed
            close(fd);
            return;
        }
        // n < 0
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            // No more data RIGHT NOW - safe to return
            // epoll will notify us when more arrives
            return;
        }
        // Actual error
        handle_error(fd);
        return;
    }
}

// WRONG edge-triggered read - WILL LOSE DATA
void et_read_handler_BROKEN(int fd) {
    char buffer[4096];

    // Only reads once!
    ssize_t n = read(fd, buffer, sizeof(buffer));
    if (n > 0) {
        process_data(buffer, n);
    }
    // BUG: If more than 4096 bytes arrived, the rest stays unread.
    // Edge-triggered won't notify again for data already present.
}
```

| Aspect | Level-Triggered | Edge-Triggered |
|---|---|---|
| Notifications | Repeated while condition true | Once per state change |
| Partial reads | Safe - notified again | Unsafe - must drain completely |
| Programming model | Simpler | More complex |
| Performance | More syscalls (repeated notifies) | Fewer syscalls |
| Risk of starvation | Low (repeated chances) | High (miss event = stuck) |
| Default in epoll | Yes (no flag needed) | Must specify EPOLLET |
| select/poll | Level-triggered only | Not available |
Edge-triggered epoll is faster but unforgiving. If you don't read all data, you won't be notified again. If you don't handle write-readiness correctly, you can get stuck. Start with level-triggered; switch to edge-triggered only when you need the performance and understand the implications.
While network servers are the classic use case for I/O multiplexing, these mechanisms work with any file descriptor:
Pipes and FIFOs: Multiplexing can wait for data on pipes, enabling parent-child communication patterns and producer-consumer architectures.
Terminal input: Watch standard input for user commands while also handling network events—common in interactive CLI tools.
Signals (signalfd): On Linux, signals can be converted to file descriptor events via signalfd(), allowing unified event handling.
Timers (timerfd): Linux's timerfd_create() makes timers into file descriptors. Combine with epoll for efficient timeout handling.
File system events (inotify): inotify provides file system change notifications as file descriptor events.
Event notification (eventfd): A simple counting semaphore as a file descriptor—useful for thread communication in event loops.
```c
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <sys/timerfd.h>
#include <signal.h>
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>

/* File-scope so both setup and the event loop can see them */
static int listen_fd, signal_fd, timer_fd;
static volatile sig_atomic_t running = 1;

/**
 * Unified event handling: sockets, signals, timers
 * All through epoll!
 */
int setup_unified_event_loop() {
    int epoll_fd = epoll_create1(0);

    // 1. Add listening socket
    listen_fd = create_listen_socket(8080);
    add_to_epoll(epoll_fd, listen_fd, EPOLLIN);

    // 2. Add signal handling via signalfd
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGTERM);
    sigprocmask(SIG_BLOCK, &mask, NULL);  // Block normal delivery
    signal_fd = signalfd(-1, &mask, SFD_NONBLOCK);
    add_to_epoll(epoll_fd, signal_fd, EPOLLIN);

    // 3. Add periodic timer
    timer_fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
    struct itimerspec timer_spec = {
        .it_interval = { .tv_sec = 1, .tv_nsec = 0 },  // 1 second repeat
        .it_value    = { .tv_sec = 1, .tv_nsec = 0 }
    };
    timerfd_settime(timer_fd, 0, &timer_spec, NULL);
    add_to_epoll(epoll_fd, timer_fd, EPOLLIN);

    // 4. Add stdin for interactive commands
    add_to_epoll(epoll_fd, STDIN_FILENO, EPOLLIN);

    // Now epoll_wait handles:
    // - New network connections
    // - Shutdown signals (SIGINT, SIGTERM)
    // - Periodic housekeeping (timer)
    // - User commands (stdin)
    // All in one unified loop!
    return epoll_fd;
}

void event_loop(int epoll_fd) {
    struct epoll_event events[64];

    while (running) {
        int n = epoll_wait(epoll_fd, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;

            if (fd == signal_fd) {
                // Read signal info
                struct signalfd_siginfo info;
                read(signal_fd, &info, sizeof(info));
                printf("Received signal %d\n", info.ssi_signo);
                running = 0;  // Shutdown
            } else if (fd == timer_fd) {
                // Timer expired - do periodic work
                uint64_t expirations;
                read(timer_fd, &expirations, sizeof(expirations));
                do_housekeeping();
            } else if (fd == listen_fd) {
                // New connection
                accept_connection(listen_fd);
            } else if (fd == STDIN_FILENO) {
                // User command
                handle_user_input();
            } else {
                // Client data
                handle_client(fd);
            }
        }
    }
}
```

The Unix philosophy of 'everything is a file' really shines with multiplexing. Signals, timers, file change notifications—all become file descriptors that epoll can watch. This unification simplifies event-driven programming enormously.
Unix-like systems provide several I/O multiplexing mechanisms, each with different tradeoffs. The next section covers select, poll, and epoll in depth, but here's an overview:
| Mechanism | Origin | Max FDs | Performance | Portability | API Style |
|---|---|---|---|---|---|
| select() | BSD (1983) | 1024 (FD_SETSIZE) | O(n) per call | Universal | fd_set bitmasks |
| poll() | SVR3 (1986) | Unlimited | O(n) per call | POSIX | Array of pollfd |
| epoll | Linux 2.6 (2002) | Unlimited | O(1) for events | Linux only | Separate create/wait |
| kqueue | FreeBSD (2000) | Unlimited | O(1) for events | BSD/macOS | Kevent structure |
| IOCP | Windows NT | Unlimited | O(1) completion | Windows | Completion ports |
Evolution of mechanisms:
select() (1983) — The original. Simple but limited to 1024 fds and O(n) scanning.
poll() (1986) — Removed the fd limit but still O(n). Slightly cleaner API.
epoll (2002) — Linux's answer to scale. O(1) event delivery, handles millions of fds.
kqueue (2000) — BSD's equivalent to epoll. Slightly more flexible API.
io_uring (2019) — Goes beyond multiplexing to true async I/O, but can also be used for notification.
Choosing a mechanism:

- Maximum portability: poll() works everywhere POSIX does.
- Linux at scale: epoll is the standard choice.
- BSD/macOS at scale: kqueue.
- Cross-platform production code: use a library that abstracts over all of these.
Most production systems don't use raw select/poll/epoll. Libraries like libevent, libev, and libuv provide a portable abstraction over platform-specific mechanisms. They use epoll on Linux, kqueue on BSD/macOS, and IOCP on Windows.
I/O multiplexing is the technique that enables scalable, event-driven systems. Let's consolidate the key concepts:

- One thread can wait on thousands of descriptors with a single blocking call.
- The readiness model tells you when an operation can proceed, not that it has completed.
- Multiplexing pairs with non-blocking I/O to guard against stale readiness.
- Level-triggered notification is forgiving; edge-triggered is faster but requires draining to EAGAIN.
- On Linux, almost anything can be a watched descriptor: sockets, pipes, signals, timers, file events.
What's next:
Now that we understand the conceptual foundation of I/O multiplexing, we'll dive into the specific mechanisms: select, poll, and epoll. We'll see their APIs, understand their performance characteristics, and learn when to use each one.
You now understand I/O multiplexing—the technique that enables one thread to efficiently handle many concurrent I/O sources. You grasp the readiness model, event-driven programming, and level vs edge triggering. Next, we'll explore select, poll, and epoll in practical detail.