Imagine a restaurant where the chef prepares each dish one at a time—taking an order, cooking it completely, plating it, and serving it—before even looking at the next order. During a busy evening, customers would wait hours as the chef blocks on each individual dish.
Now imagine a different model: the chef takes multiple orders, starts several dishes simultaneously, checks on cooking timers, delegates prep tasks to sous chefs, and orchestrates completion across many concurrent activities. The throughput increases dramatically without adding more chefs.
This is the fundamental distinction between synchronous and asynchronous operation. In computing, the synchronous model where operations block until completion creates the same bottleneck as our first chef. Asynchronous operations allow systems to initiate multiple activities, continue working on other tasks, and respond when those activities complete—dramatically improving throughput and responsiveness.
By the end of this page, you will understand what asynchronous operations are at the operating system level, how they differ from synchronous operations, the distinction between blocking and non-blocking I/O, and why async programming has become essential for building scalable, responsive systems.
At the most fundamental level, synchronous and asynchronous describe how operations relate to the flow of program execution.
Synchronous execution means the caller initiates an operation and waits—the calling thread is suspended until the operation completes. The word 'synchronous' derives from Greek roots meaning 'same time'—the caller and the operation proceed together, in lockstep. The caller cannot proceed until the operation returns.
Asynchronous execution means the caller initiates an operation and continues immediately—the operation proceeds independently while the caller does other work. The word 'asynchronous' means 'not at the same time'—the caller and the operation proceed on separate timelines. The caller is notified of completion through some mechanism (callback, polling, or signal).
| Characteristic | Synchronous | Asynchronous |
|---|---|---|
| Caller Behavior | Blocks until operation completes | Continues immediately after initiating |
| Thread Utilization | Thread is idle during wait | Thread remains productive |
| Control Flow | Linear, sequential | Non-linear, requires coordination |
| Result Retrieval | Return value from function call | Callback, future, or polling |
| Error Handling | Traditional try-catch | Callback parameters or future inspection |
| Code Complexity | Simple, intuitive flow | More complex, requires explicit coordination |
| Scalability | Limited by thread availability | Can handle many concurrent operations |
A concrete example—reading a file:
Synchronous approach:
```
data = read_file("/path/to/file")    // Thread blocks here
process(data)                        // Executes after read completes
```
Asynchronous approach:
```
read_file_async("/path/to/file", callback=process)   // Returns immediately
do_other_work()                                      // Executes while read proceeds
// Later: process() is called when read completes
```
In the synchronous model, do_other_work() cannot execute until read_file completes. In the asynchronous model, do_other_work() executes immediately, and process() is invoked later when the file read finishes.
Think of synchronous as placing an order and standing at the counter until it's ready (blocking). Asynchronous is placing an order, getting a buzzer, and sitting down to chat with friends—when the buzzer goes off, you pick up your food. The total preparation time is the same, but your time utilization is dramatically different.
To truly understand asynchronous programming, we must first understand what blocking means at the operating system level.
When a process or thread makes a blocking system call, the kernel moves that thread from the ready queue to a wait queue associated with the resource it's waiting for. The thread is no longer scheduled for CPU time—it's entirely suspended until the awaited event occurs.
Common blocking operations include:

- read() from a socket, pipe, or terminal when no data is available
- write() when the output buffer is full
- accept() when no connection is pending
- connect() while the TCP handshake is in progress
- Disk reads and fsync() waiting on the storage device
- Lock acquisition (mutexes, file locks) when the lock is held elsewhere
```c
// Example of a blocking system call in C
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main() {
    char buffer[1024];
    ssize_t bytes_read;

    // This call BLOCKS the thread until data is available
    // The thread is moved to a wait queue by the kernel
    printf("About to block on read...\n");
    bytes_read = read(STDIN_FILENO, buffer, sizeof(buffer));

    // Execution resumes here ONLY after read() completes
    // Thread was not consuming CPU during the wait
    printf("Read %zd bytes\n", bytes_read);

    return 0;
}

/*
 * What happens at the OS level during the blocking read():
 *
 * 1. User process calls read() system call
 * 2. Kernel checks if data is available in the input buffer
 * 3. If no data: kernel moves thread to TASK_INTERRUPTIBLE state
 * 4. Thread is added to a wait queue for the terminal device
 * 5. Scheduler selects another thread to run (context switch)
 * 6. When data arrives: device driver signals the wait queue
 * 7. Kernel moves thread back to TASK_RUNNING state
 * 8. Thread is added back to the run queue
 * 9. Eventually scheduler runs the thread again
 * 10. read() returns with the data
 */
```

The kernel wait queue mechanism:
The kernel maintains wait queues for every resource that can cause blocking. When a thread blocks:
- The thread's state changes from RUNNING to INTERRUPTIBLE (or UNINTERRUPTIBLE for non-abortable waits)
- The thread is removed from the run queue and added to the wait queue for the resource
- The scheduler picks another runnable thread (a context switch)

When the awaited event occurs (e.g., I/O completion):

- The thread's state changes from INTERRUPTIBLE back to RUNNING
- The thread is moved back to the run queue, and the scheduler will eventually run it again

Blocking operations are efficient from a CPU perspective—a blocked thread consumes zero CPU cycles. The problem isn't CPU waste but thread waste. Each blocked thread ties up kernel resources (stack, scheduling structures, memory mappings). With limited threads available, blocking limits concurrency.
Non-blocking I/O takes a different approach: instead of suspending the caller when an operation cannot complete immediately, the system call returns immediately with an indication that the operation is not yet complete.
The caller can then:

- Retry the operation later (polling)
- Do other useful work in the meantime
- Ask to be notified when the descriptor becomes ready (via select, poll, or epoll) and retry then
The key insight: Non-blocking I/O separates the initiation of an operation from the completion. The thread remains schedulable and can make progress on other tasks.
```c
// Non-blocking read example in C
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

// Placeholder for useful work done while no data is available
void do_other_computation(void) {
    printf("Doing other useful work...\n");
}

int main() {
    char buffer[1024];
    ssize_t bytes_read;

    // Set stdin to non-blocking mode
    int flags = fcntl(STDIN_FILENO, F_GETFL, 0);
    fcntl(STDIN_FILENO, F_SETFL, flags | O_NONBLOCK);

    // Now read() returns immediately even if no data is available
    bytes_read = read(STDIN_FILENO, buffer, sizeof(buffer));

    if (bytes_read == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        // EAGAIN means "no data available, try again later"
        // The thread is NOT blocked - we can do other work!
        printf("No data available - would have blocked\n");
        printf("But thread continues executing...\n");

        // Do some other useful work here
        do_other_computation();

        // Maybe try again later
    } else if (bytes_read > 0) {
        printf("Read %zd bytes immediately\n", bytes_read);
    }

    return 0;
}

/*
 * Non-blocking I/O characteristics:
 *
 * - Returns immediately, never suspends the thread
 * - Returns EAGAIN/EWOULDBLOCK if operation would block
 * - Caller is responsible for retrying at appropriate times
 * - Enables single thread to handle multiple I/O streams
 * - Foundation for event-driven architectures
 */
```

EAGAIN and EWOULDBLOCK:
These error codes are central to non-blocking I/O. They indicate that the operation cannot complete right now but is not an error—the caller should try again later when conditions change.
On most modern systems, these are the same value. The semantics: "What you asked for isn't ready, but it's not your fault and nothing's broken."
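A minimal sketch of the canonical retry pattern (the helper name `read_when_ready` is illustrative; real code would retry from an event loop rather than spin):

```c
#include <errno.h>
#include <unistd.h>

// Sketch: read from a non-blocking descriptor, treating
// EAGAIN/EWOULDBLOCK as "not ready" rather than as failure.
ssize_t read_when_ready(int fd, char *buf, size_t len) {
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                 // data (or EOF) arrived
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            continue;                 // not ready yet - busy-wait (demo only)
        if (errno == EINTR)
            continue;                 // interrupted by a signal, retry
        return -1;                    // a real error
    }
}
```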
Non-blocking connects vs reads:
Different operations have different non-blocking behaviors:
| Operation | Blocking Behavior | Non-Blocking Behavior |
|---|---|---|
| `read()` | Waits for data | Returns EAGAIN if no data |
| `write()` | Waits for buffer space | Returns EAGAIN if buffer full |
| `accept()` | Waits for connection | Returns EAGAIN if no pending connections |
| `connect()` | Waits for handshake | Returns EINPROGRESS, must poll for completion |
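To make the connect() row concrete, here is a minimal sketch of a non-blocking connect (the address 127.0.0.1:8080 is a placeholder and error handling is trimmed): initiate, wait for writability, then check SO_ERROR.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <poll.h>
#include <errno.h>

int connect_nonblocking(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1
            && errno != EINPROGRESS) {
        return -1;                     // immediate failure
    }

    // EINPROGRESS: the handshake continues in the background.
    // Wait until the socket becomes writable...
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    poll(&pfd, 1, 5000);               // 5 second timeout

    // ...then check whether the connect actually succeeded.
    int err = 0;
    socklen_t len = sizeof(err);
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
    return (err == 0) ? fd : -1;
}
```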
Naive non-blocking I/O with constant polling (busy-waiting) wastes CPU cycles checking if operations are ready. This is worse than blocking! The solution is event-driven I/O multiplexing (select, poll, epoll, kqueue) which we cover in the Event Loops page.
There's an important distinction between non-blocking I/O and true asynchronous I/O (AIO). They're often conflated, but the mechanisms differ significantly.
Non-blocking I/O tells you that an operation would block, returning immediately. You must still perform the actual I/O operation yourself, typically when an event notifies you that the descriptor is ready.
Asynchronous I/O initiates the entire operation in the kernel, which completes it in the background. The kernel notifies you when the data is already transferred—no additional read/write call is needed.
Think of it this way:
| Model | Initiation | Completion | Notification | CPU Efficiency |
|---|---|---|---|---|
| Blocking | Synchronous | After wait | Function return | Good (thread sleeps) |
| Non-blocking | Immediate | Polling required | Poll result or event | Depends on polling strategy |
| Async I/O | Submit to kernel | Kernel performs I/O | Signal, callback, or queue | Excellent |
```c
#include <stdio.h>
#include <stdlib.h>
#include <aio.h>
#include <fcntl.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define BUFFER_SIZE 65536

// Placeholder helpers standing in for real application logic
void process_data(char *buf, ssize_t len) { /* consume the data */ }
void do_computation(int i) { /* unrelated CPU work */ }

// Callback function invoked when AIO completes
void aio_completion_handler(sigval_t sigval) {
    struct aiocb *req = (struct aiocb *)sigval.sival_ptr;

    // Check the result of the async operation
    int status = aio_error(req);
    if (status == 0) {
        ssize_t bytes = aio_return(req);
        printf("AIO completed: read %zd bytes\n", bytes);

        // Data is NOW in the buffer - kernel already did the transfer
        process_data((char *)req->aio_buf, bytes);
    } else {
        printf("AIO error: %s\n", strerror(status));
    }
}

int main() {
    int fd = open("largefile.dat", O_RDONLY);
    if (fd < 0) {
        perror("open");
        exit(1);
    }

    // Allocate buffer for the data
    char *buffer = malloc(BUFFER_SIZE);

    // Set up the async I/O control block
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;            // File descriptor
    cb.aio_buf = buffer;           // Buffer for data
    cb.aio_nbytes = BUFFER_SIZE;   // How much to read
    cb.aio_offset = 0;             // Where in file to read

    // Set up notification via thread callback
    cb.aio_sigevent.sigev_notify = SIGEV_THREAD;
    cb.aio_sigevent.sigev_notify_function = aio_completion_handler;
    cb.aio_sigevent.sigev_notify_attributes = NULL;
    cb.aio_sigevent.sigev_value.sival_ptr = &cb;

    // Submit the async read - THIS RETURNS IMMEDIATELY
    // The kernel will perform the read in the background
    if (aio_read(&cb) == -1) {
        perror("aio_read");
        exit(1);
    }

    printf("AIO submitted - doing other work...\n");

    // We can now do OTHER work while the kernel reads the file
    // This is the key advantage of true async I/O
    for (int i = 0; i < 10; i++) {
        printf("Working on task %d while read progresses...\n", i);
        do_computation(i);
    }

    // Callback will fire automatically when I/O completes
    // In production: proper event loop or main thread coordination
    sleep(2); // Simplified: wait for completion

    free(buffer);
    close(fd);
    return 0;
}
```

Linux AIO implementations:
Linux has evolved through several async I/O implementations:
1. POSIX AIO (aio_read/aio_write): portable, but implemented largely in user space by glibc, which emulates asynchrony with a pool of worker threads
2. Native Linux AIO (io_submit/io_getevents): true kernel-side async I/O, but historically reliable only for unbuffered (O_DIRECT) disk access
3. io_uring (Linux 5.1+): a modern interface built around shared submission and completion rings, covering both file and network I/O
io_uring represents a paradigm shift in Linux I/O. By using ring buffers in shared memory between user and kernel space, it eliminates system call overhead for submissions and completions. A single io_uring can handle thousands of concurrent operations with minimal CPU usage.
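A minimal sketch of that submit/complete cycle, assuming the liburing helper library is available (the filename and queue depth are placeholders):

```c
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>

int main() {
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);       // ring with 8 entries

    int fd = open("data.txt", O_RDONLY);
    char buf[4096];

    // Fill a submission queue entry (SQE) describing the read...
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);

    // ...and hand it to the kernel. Many SQEs can be batched
    // into a single submit call.
    io_uring_submit(&ring);

    // Later: reap the completion queue entry (CQE).
    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);
    printf("read returned %d bytes\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);          // mark CQE as consumed

    io_uring_queue_exit(&ring);
    return 0;
}
```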
The importance of asynchronous programming crystallized around the C10K problem—the challenge of handling 10,000 concurrent connections on a single server. This problem, articulated by Dan Kegel in 1999, exposed the fundamental limitations of thread-per-connection architectures.
The thread-per-connection model:
In traditional server designs, each client connection gets its own thread:
```
while (true) {
    client = accept(server_socket);    // Block waiting for connection
    spawn_thread(handle_client, client);
}

void handle_client(socket) {
    while (true) {
        request = read(socket);        // BLOCKS waiting for client
        response = process(request);
        write(socket, response);       // BLOCKS until buffer available
    }
}
```
Why this breaks at scale:

- Every connection consumes a full thread stack, so memory grows linearly with connection count
- The kernel must schedule and context-switch among thousands of threads
- Most of those threads are blocked at any given moment, holding resources while waiting on slow network I/O
The math of thread limits:
| Threads | Stack Memory (8 MB per thread) | Approximate Total | Kernel Overhead (~6 KB per thread) |
|---|---|---|---|
| 100 | 800 MB | ~1 GB | 600 KB |
| 1,000 | 8 GB | ~10 GB | 6 MB |
| 10,000 | 80 GB | ~100 GB | 60 MB |
| 100,000 | 800 GB | Impossible | 600 MB |
Even with reduced stack sizes (256KB), 10,000 threads consume 2.5 GB of stack memory. More critically, most of these threads are blocked most of the time, doing nothing but waiting for slow network I/O.
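For reference, a reduced stack is set per thread at creation time. A minimal pthreads sketch (the 256 KB figure mirrors the text; this mitigates, but does not solve, the scaling problem):

```c
#include <pthread.h>

void *handle_client(void *arg) {
    // ... per-connection work would go here ...
    return NULL;
}

int main() {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 256 * 1024);  // 256 KB stack

    pthread_t tid;
    pthread_create(&tid, &attr, handle_client, NULL);
    pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    return 0;
}
```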
The async solution:
Asynchronous I/O inverts the model. Instead of one thread per connection, a small pool of threads (often equal to CPU cores) handles all connections:
```
while (true) {
    events = wait_for_events();   // Single syscall monitors ALL connections

    for each event in events {
        if (event.type == READ_READY) {
            data = non_blocking_read(event.socket);
            process_data(event.socket, data);
        }
        if (event.type == WRITE_READY) {
            send_pending_data(event.socket);
        }
    }
}
```
This event-driven approach means:

- A small, fixed pool of threads can service tens of thousands of connections
- Memory scales with per-connection state (a small struct), not with thread stacks
- Threads sleep only when there is genuinely nothing to do, eliminating the context-switch storm
Today's challenge isn't C10K but C10M—10 million concurrent connections. Systems like nginx and HAProxy achieve this through aggressive use of async I/O, event loops, and careful memory management. A single modern server can handle more connections than entire data centers could in 1999.
Different programming languages provide varying levels of async support, from manual callback management to sophisticated runtime systems. Understanding these differences helps you choose the right tool and understand what's happening beneath abstractions.
```c
// C: Manual async with epoll (Linux)
// Explicit state machine management required

#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_EVENTS 1024

typedef enum { STATE_READING, STATE_PROCESSING, STATE_WRITING } State;

typedef struct {
    int fd;
    State state;          // Manual state tracking
    char buffer[4096];
    size_t bytes_read;
} Connection;

// Handlers for each state (bodies stubbed in this skeleton)
void handle_read(Connection *conn)    { /* read into conn->buffer */ }
void handle_process(Connection *conn) { /* act on the request */ }
void handle_write(Connection *conn)   { /* write the response */ }

int main() {
    int epoll_fd = epoll_create1(0);
    struct epoll_event events[MAX_EVENTS];

    // (Registration of sockets with epoll_ctl() omitted in this skeleton)

    while (1) {
        int nfds = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);

        for (int i = 0; i < nfds; i++) {
            Connection *conn = events[i].data.ptr;

            // Manual state machine for each connection
            switch (conn->state) {
                case STATE_READING:    handle_read(conn);    break;
                case STATE_PROCESSING: handle_process(conn); break;
                case STATE_WRITING:    handle_write(conn);   break;
            }
        }
    }
}

// Characteristics:
// - Maximum control and performance
// - Significant boilerplate and complexity
// - Manual memory and state management
// - Used in: nginx, Redis, high-performance servers
```

All these language-level async features ultimately rely on OS-level async primitives (epoll, kqueue, IOCP). The language runtime translates high-level async constructs into efficient system calls. Understanding the OS layer helps you debug performance issues that transcend any single language.
A common source of confusion is conflating asynchronous with parallel. These are related but distinct concepts.
Asynchronous: Operations that don't block the calling thread. The operations may or may not run simultaneously—the key property is that the caller continues without waiting.
Parallel: Operations that run simultaneously on multiple CPU cores. This requires multiple threads or processes executing at the same instant.
The key insight: Async is about waiting efficiently, while parallel is about computing simultaneously.
Single-threaded async example:
```
Time:     |--0ms--|--1ms--|--2ms--|--3ms--|--4ms--|--5ms--|--6ms--|--7ms--|
Thread 1: [Start A] [Start B] [Start C]   [A done] [B done] [C done]
              |         |         |           |        |        |
              +---------|---------|-----------+        |        |
                        +---------|--------------------+        |
                                  +------------------------------+

  All I/O operations overlap in wall-clock time
  But only ONE thread is executing at any moment
```
Multi-threaded parallel example:
```
Time:     |--0ms--|--1ms--|--2ms--|--3ms--|
Thread 1: [=====COMPUTE A=====]
Thread 2: [=====COMPUTE B=====]
Thread 3: [=====COMPUTE C=====]

  All three threads execute SIMULTANEOUSLY
  Uses 3 CPU cores at once
```
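A minimal pthreads sketch of the parallel picture above; `compute` is a stand-in for genuinely CPU-bound work:

```c
#include <pthread.h>
#include <stdio.h>

// Stand-in for CPU-bound work (e.g., image processing, hashing)
void *compute(void *arg) {
    long id = (long)arg;
    volatile double x = 0;
    for (long i = 0; i < 100000000; i++)
        x += i * 0.5;                  // burn CPU cycles
    printf("compute %ld done\n", id);
    return NULL;
}

int main() {
    pthread_t t[3];

    // Three threads run simultaneously on (up to) three cores
    for (long i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, compute, (void *)i);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);

    return 0;
}
```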
Combined async + parallel:
Modern systems often combine both:

- One event loop per CPU core (as in nginx worker processes), each handling many connections asynchronously
- Async I/O inside each loop for network- and disk-bound work
- Worker thread pools for CPU-bound tasks, so heavy computation never stalls the event loop
Use async when your bottleneck is waiting (I/O-bound): network requests, database queries, file operations. Use parallelism when your bottleneck is computation (CPU-bound): image processing, cryptography, simulations. Use both when you have mixed workloads.
We've established the foundation of asynchronous programming at the operating system level. Let's consolidate the key concepts:

- Synchronous calls block the caller until completion; asynchronous calls return immediately and signal completion later
- Blocking parks a thread on a kernel wait queue: it costs no CPU, but it ties up thread resources
- Non-blocking I/O returns EAGAIN instead of waiting; the caller still performs the I/O itself
- True asynchronous I/O (POSIX AIO, io_uring) hands the whole operation to the kernel, which notifies the caller after the data has been transferred
- The C10K problem showed that thread-per-connection does not scale; event-driven async I/O does
- Async is about waiting efficiently; parallelism is about computing simultaneously
What's next:
With async operations understood, we need a mechanism to be notified when operations complete. The oldest and most fundamental pattern is the callback—a function passed to an async operation that's invoked upon completion. The next page explores callbacks in depth: their mechanics, patterns, and the problems they introduce.
You now understand asynchronous operations at the OS level—the distinction between sync/async, blocking/non-blocking, and why async programming enables the high-concurrency systems that power the modern web. Next, we'll explore the callback pattern that makes async orchestration possible.