We've traced the journey of network data through multiplexing at the sender, demultiplexing at the receiver, port identification, and connection identification. But there's one final step in this journey: process mapping—the mechanism by which the operating system delivers demultiplexed data to the correct application process.
When a TCP segment arrives and is matched to a socket, what happens next? The data doesn't magically appear in the application's memory. The operating system must identify which process owns the socket, hold the data in a kernel buffer, notify or wake that process, and finally copy the data into the application's own memory.
This page explores the complete process mapping system—how sockets, file descriptors, processes, and the I/O subsystem work together to complete data delivery.
By the end of this page, you will understand how sockets connect to processes through file descriptors, how the operating system kernel manages socket-to-process mapping, how processes are notified of incoming data, and how various I/O models (blocking, non-blocking, multiplexed) affect network programming. You'll see the complete picture of data flow from network interface to application memory.
In Unix-like operating systems (Linux, macOS, BSD), sockets are represented as file descriptors. This elegant design means that network I/O uses the same interface as file I/O—read(), write(), close()—providing a unified programming model.
What is a File Descriptor?
A file descriptor (fd) is a small non-negative integer that serves as a handle to an open I/O resource. When a process opens a file, creates a socket, or opens a pipe, the kernel assigns a file descriptor and returns it to the process.
Process File Descriptor Table:
┌────┬──────────────────────────────┐
│ FD │ Resource                     │
├────┼──────────────────────────────┤
│ 0  │ stdin (standard input)       │
│ 1  │ stdout (standard output)     │
│ 2  │ stderr (standard error)      │
│ 3  │ /var/log/app.log (file)      │
│ 4  │ TCP socket (listen :8080)    │
│ 5  │ TCP socket (connected)       │
│ 6  │ UDP socket                   │
└────┴──────────────────────────────┘
The first three descriptors (0, 1, 2) are reserved by convention. New resources get the lowest available number.
The Unix philosophy 'everything is a file' extends to network sockets. This means tools like 'read' and 'write' work on sockets, shell redirection can involve sockets, and file-oriented utilities can often work with network data. This unification simplifies programming and enables powerful composition.
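As a small illustration of this unification, here is a minimal sketch (the socket is assumed to already be connected; error handling omitted) that uses the same read() and write() calls on a TCP socket that you would use on an ordinary file:
// Sketch: 'fd' is assumed to be an already-connected TCP socket.
// Because a socket is a file descriptor, generic file I/O works on it.
#include <string.h>
#include <unistd.h>
void echo_request(int fd) {
    const char *req = "GET / HTTP/1.0\r\n\r\n";
    char buf[4096];
    write(fd, req, strlen(req));             // same call used for files
    ssize_t n = read(fd, buf, sizeof(buf));  // same call used for files
    if (n > 0)
        write(STDOUT_FILENO, buf, n);        // fd 1 (stdout) is just another fd
}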
Socket Creation and File Descriptor Assignment:
// Create a TCP socket
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
// sockfd now holds a file descriptor, e.g., 3
// Bind to local address
bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
// Listen for connections
listen(sockfd, backlog);
// Accept a connection - creates NEW file descriptor
int clientfd = accept(sockfd, NULL, NULL);
// clientfd is a different fd, e.g., 4
// Now we have:
// sockfd (3) - listening socket
// clientfd (4) - connected socket for specific client
Windows Socket Handles:
Windows uses a different model—sockets are handles (SOCKET type), not file descriptors. They use different API calls (recv/send vs read/write) but the conceptual mapping is similar.
// Windows
SOCKET sock = socket(AF_INET, SOCK_STREAM, 0);
recv(sock, buffer, length, 0); // Not read()
send(sock, buffer, length, 0); // Not write()
closesocket(sock); // Not close()
When demultiplexing delivers data to a socket, it actually delivers to a kernel data structure. Understanding this structure reveals how process mapping works.
The Socket Kernel Object:
Each socket file descriptor references a kernel socket structure that contains:
# Simplified socket kernel structure
struct socket {
    # Identification and state
    int family;                          # AF_INET, AF_INET6
    int type;                            # SOCK_STREAM, SOCK_DGRAM
    int protocol;                        # IPPROTO_TCP, IPPROTO_UDP

    # Process ownership
    struct file *file;                   # Pointer to file structure
    pid_t owner_pid;                     # Process ID of owner
    uid_t owner_uid;                     # User ID of owner

    # Address information
    struct sockaddr_in local_addr;       # Local IP:port
    struct sockaddr_in remote_addr;      # Remote IP:port (TCP)

    # Protocol-specific state
    union {
        struct tcp_sock *tcp;            # TCP-specific data
        struct udp_sock *udp;            # UDP-specific data
    } protocol_data;

    # I/O buffers
    struct sk_buff_head receive_queue;   # Incoming data
    struct sk_buff_head send_queue;      # Outgoing data

    # Wait queues for blocking I/O
    wait_queue_head_t wait;

    # Socket options
    struct socket_options opts;
}

struct file {
    # Links file descriptor to socket
    struct socket *socket;
    struct inode *inode;
    unsigned int f_flags;                # O_NONBLOCK, etc.

    # Reference counting
    atomic_t f_count;
}

The Path from File Descriptor to Socket:
Process calls recv(fd=5, buffer, len)
│
▼
Kernel looks up fd 5 in process's file descriptor table
│
▼
File descriptor table entry points to struct file
│
▼
struct file contains pointer to struct socket
│
▼
struct socket contains receive_queue with buffered data
│
▼
Data is copied from receive_queue to user buffer
Key Insight:
The file descriptor is just an index. The actual socket state (buffers, addresses, etc.) lives in kernel memory, protected from direct user access. System calls like recv() provide the controlled interface to access this kernel state.
Kernel buffers allow data to arrive even when the application isn't ready. If a TCP segment arrives but the application hasn't called recv() yet, the data waits in the kernel's receive queue. This decouples network timing from application timing, essential for reliable operation.
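One quick way to see this decoupling in practice is the FIONREAD ioctl, which asks the kernel how many bytes are already queued on a socket before any recv() call. A minimal sketch (illustrative only; 'fd' is assumed to be a connected TCP socket):
#include <stdio.h>
#include <sys/ioctl.h>
// Sketch: query how much data the kernel has buffered for this socket.
void show_queued_bytes(int fd) {
    int queued = 0;
    if (ioctl(fd, FIONREAD, &queued) == 0)
        printf("%d bytes waiting in the kernel receive queue\n", queued);
    // The data stays in kernel memory until the process calls recv()/read().
}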
Sockets are owned by processes. This ownership determines which process receives data, which process can send, and what happens when the process terminates.
Ownership Establishment:
A process owns a socket if it created the socket with socket(), obtained it from accept(), or inherited it from a parent process across fork().
Ownership and Fork:
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
bind(sockfd, ...);
listen(sockfd, ...);
pid_t child = fork();
if (child == 0) {
// Child process
// Inherits sockfd - can accept() connections
int clientfd = accept(sockfd, NULL, NULL);
} else {
// Parent process
// Also has sockfd - both can accept()
}
After fork(), both parent and child share the same socket. This is how multi-process servers (like Apache's prefork MPM) work.
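A minimal prefork-style sketch of this pattern (illustrative only; error handling is omitted and listenfd is assumed to be a bound, listening socket; handle_client is a hypothetical application routine):
#include <unistd.h>
#include <sys/socket.h>
// Sketch: prefork workers sharing one inherited listening socket.
void prefork_workers(int listenfd, int nworkers) {
    for (int i = 0; i < nworkers; i++) {
        if (fork() == 0) {                    // child worker
            for (;;) {
                int clientfd = accept(listenfd, NULL, NULL);
                if (clientfd < 0)
                    continue;
                // handle_client(clientfd);   // application logic (assumed)
                close(clientfd);
            }
        }
    }
    // Parent typically wait()s on its children or accepts as well.
}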
Process Termination and Sockets:
When a process terminates, all its open file descriptors are closed:
Process exits with open TCP connections:
│
▼
Kernel closes all file descriptors
│
▼
For each TCP socket:
  ├── If empty send buffer: Send FIN, enter FIN_WAIT_1
  ├── If data in send buffer: Send data, then FIN
  └── Socket enters TIME_WAIT after full close
│
▼
Eventually, socket resources are freed
Orphaned Connections:
If a process is killed abruptly (kill -9), the kernel still closes its sockets and completes the TCP close sequence. However, the application never gets a chance to flush its own buffered data, so peers may receive incomplete messages. This is why applications should handle shutdown gracefully when possible.
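One common graceful-close pattern (a sketch, not the only option) is to signal end-of-output with shutdown() and drain the peer's remaining data before releasing the descriptor:
#include <unistd.h>
#include <sys/socket.h>
// Sketch: graceful TCP close — tell the peer we are done sending,
// read until it closes its side, then free the file descriptor.
void graceful_close(int fd) {
    char buf[1024];
    shutdown(fd, SHUT_WR);                    // send FIN; we will write no more
    while (read(fd, buf, sizeof(buf)) > 0)    // drain whatever the peer still sends
        ;
    close(fd);
}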
Viewing Process-Socket Relationships:
# Linux: Show sockets with process info
ss -tnp
# State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
# ESTAB  0       0       192.0.2.1:443       10.0.0.1:52341     users:(("nginx",pid=1234,fd=10))
# lsof: List open files (including sockets)
lsof -i :443
# COMMAND  PID   USER  FD   TYPE  DEVICE  SIZE/OFF  NODE  NAME
# nginx    1234  www   10u  IPv4  12345   0t0       TCP   *:443 (LISTEN)
When demultiplexing routes a segment to a socket, how does the data actually reach the application process? This involves several steps within the kernel.
Step 1: Kernel Receives Segment
Network card receives frame
│
▼
Interrupt handler copies frame to kernel memory
│
▼
IP layer processes, validates, passes to transport
│
▼
Transport layer demultiplexes to specific socket
Step 2: Data Enqueued on Socket
TCP receives validated segment
│
▼
Sequence number checking, reassembly if needed
│
▼
Data added to socket's receive_queue (sk_buff chain)
│
▼
Socket buffer counter updated
Step 3: Process Notification
Socket checks if process is waiting for data
│
├── If waiting: Wake process from sleep
│
├── If using epoll/select: Mark fd as readable
│
└── If using signals (SIGIO): Send signal to process
Step 4: Process Reads Data
Process calls recv(sockfd, buffer, len, flags)
│
▼
Kernel copies data from socket receive_queue to user buffer
│
▼
receive_queue freed, socket buffer counters updated
│
▼
Kernel sends window update to peer (more space in buffer)
Network Interface Card (NIC)
              │
              │  Interrupt: "Frame arrived!"
              ▼
┌─────────────────────────────┐
│      Interrupt Handler      │  Kernel Space
│   (softirq/NAPI context)    │
│                             │
│  1. Copy frame to sk_buff   │
│  2. Pass up network stack   │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│          IP Layer           │
│                             │
│  1. Parse IP header         │
│  2. Check destination IP    │
│  3. Pass to transport       │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│          TCP Layer          │
│                             │
│  1. Parse TCP header        │
│  2. Lookup socket (4-tuple) │
│  3. Validate sequence num   │
│  4. Add to receive_queue    │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│        Socket Layer         │
│                             │
│  1. Check for waiters       │
│  2. Wake blocked process    │
│  3. Update poll status      │
└──────────────┬──────────────┘
               │
═══════════════╪═══════════════  User/Kernel Boundary
               │
               ▼
┌─────────────────────────────┐
│     Application Process     │  User Space
│                             │
│  recv() returns with data   │
│  Data now in user buffer    │
└─────────────────────────────┘

Notice that data is copied at least twice: once from NIC to kernel buffer, once from kernel to user buffer. This copying has CPU cost. High-performance systems use techniques like zero-copy (sendfile, splice) or kernel bypass (DPDK, XDP) to reduce copies.
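As an example of the zero-copy techniques mentioned above, the sketch below (Linux-specific; the descriptors are assumed to be a connected socket and an open regular file) uses sendfile() so the file contents never pass through a user-space buffer:
#include <sys/sendfile.h>
#include <sys/stat.h>
// Sketch: send a whole file over a connected socket with one copy less.
long send_file_zero_copy(int sockfd, int filefd) {
    struct stat st;
    if (fstat(filefd, &st) < 0)
        return -1;
    off_t offset = 0;
    long total = 0;
    while (offset < st.st_size) {
        // Data moves kernel-to-kernel; it never enters a user-space buffer.
        ssize_t sent = sendfile(sockfd, filefd, &offset, st.st_size - offset);
        if (sent <= 0)
            break;
        total += sent;
    }
    return total;
}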
How a process waits for data significantly affects process mapping behavior. There are two fundamental modes: blocking and non-blocking.
Blocking I/O (Default):
When a process calls recv() on a blocking socket:
// Blocking mode (default)
int bytes = recv(sockfd, buffer, 1024, 0);
// Process sleeps here until:
// - Data arrives (bytes > 0)
// - Connection closed (bytes = 0)
// - Error occurs (bytes = -1)
Internally, the kernel places the calling process on the socket's wait queue and puts it to sleep; the process is woken only when the socket becomes readable (the wait-queue mechanics are detailed below).
Non-Blocking I/O:
When a socket is set to non-blocking:
// Set non-blocking mode
int flags = fcntl(sockfd, F_GETFL, 0);
fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
int bytes = recv(sockfd, buffer, 1024, 0);
// Returns immediately!
// If data available: bytes > 0
// If no data: bytes = -1, errno = EWOULDBLOCK/EAGAIN
// If closed: bytes = 0
The process is never put to sleep; it must poll or use I/O multiplexing.
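A minimal sketch of how an application typically reacts to EWOULDBLOCK/EAGAIN (illustrative; in real code the "come back later" case is usually handled by an event loop rather than a busy retry):
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
// Sketch: one non-blocking read attempt, distinguishing the outcomes.
// Returns bytes read, 0 on orderly close, -1 for "try again later", -2 on error.
ssize_t try_read(int fd, char *buf, size_t len) {
    ssize_t n = recv(fd, buf, len, 0);
    if (n > 0)
        return n;                              // data was already buffered in the kernel
    if (n == 0)
        return 0;                              // peer closed the connection
    if (errno == EWOULDBLOCK || errno == EAGAIN)
        return -1;                             // nothing buffered yet — come back later
    return -2;                                 // genuine error
}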
| Aspect | Blocking I/O | Non-Blocking I/O |
|---|---|---|
| recv() with no data | Process sleeps | Returns -1, EWOULDBLOCK |
| CPU usage | Efficient (sleeping uses no CPU) | Busy-wait wastes CPU if looped |
| Programming model | Simple, synchronous | Complex, must handle EWOULDBLOCK |
| Multiple sockets | Need threads per socket | Can handle many with one thread |
| Latency | Wake-up delay after data arrives | Minimal if polling frequently |
| Common usage | Simple clients, thread-per-connection | High-performance servers |
Wait Queue Mechanics:
Blocking I/O uses kernel wait queues:
Process A calls recv() on empty socket:
│
▼
Process A added to socket->wait_queue
│
▼
Process A state = TASK_INTERRUPTIBLE (sleeping)
│
▼
Scheduler switches to another process
... later, data arrives ...
Network stack adds data to socket receive_queue
│
▼
Kernel walks socket->wait_queue, wakes all waiters
│
▼
Process A state = TASK_RUNNING (runnable)
│
▼
Scheduler eventually runs Process A
│
▼
recv() copies data and returns
If multiple processes wait on the same socket (e.g., multiple workers calling accept()), all are woken when a connection arrives, but only one can accept it. This 'thundering herd' wastes CPU. Modern kernels use flags like EPOLLEXCLUSIVE or SO_REUSEPORT to mitigate this.
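For example, on Linux 4.5 and later a worker can register the shared listening socket with the EPOLLEXCLUSIVE flag (the epoll API itself is covered in the next section), so an incoming connection wakes only one sleeping worker. A sketch, assuming listenfd is the shared listening socket:
#include <sys/epoll.h>
// Sketch: each worker registers the shared listener exclusively in its own
// epoll instance, so a new connection wakes only one of the waiting workers.
int register_listener_exclusive(int listenfd) {
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;      // EPOLLEXCLUSIVE: Linux 4.5+
    ev.data.fd = listenfd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev) < 0)
        return -1;
    return epfd;                               // caller then loops on epoll_wait()
}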
A server handling thousands of connections needs an efficient way to know which sockets have data ready. I/O multiplexing APIs solve this problem.
The Problem:
Server has 10,000 connected sockets
How to efficiently wait for data on any of them?
Option A: Thread-per-socket (10,000 threads - expensive!)
Option B: Busy-poll all sockets (100% CPU wasteful)
Option C: I/O multiplexing (efficient solution ✓)
I/O Multiplexing APIs:
1. select() - The Classic:
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(sock1, &readfds);
FD_SET(sock2, &readfds);
int ready = select(maxfd + 1, &readfds, NULL, NULL, &timeout);
if (FD_ISSET(sock1, &readfds)) {
// sock1 has data
}
Limitations: O(n) work on every call, and descriptors are limited to FD_SETSIZE (typically 1024).
2. poll() - Slightly Better:
struct pollfd fds[2];
fds[0] = (struct pollfd){.fd = sock1, .events = POLLIN};
fds[1] = (struct pollfd){.fd = sock2, .events = POLLIN};
int ready = poll(fds, 2, timeout_ms);
if (fds[0].revents & POLLIN) {
// sock1 has data
}
No fd limit, still O(n) per call.
3. epoll() - Linux High Performance:
int epfd = epoll_create1(0);
struct epoll_event ev = {.events = EPOLLIN, .data.fd = sock1};
epoll_ctl(epfd, EPOLL_CTL_ADD, sock1, &ev);
struct epoll_event events[100];
int n = epoll_wait(epfd, events, 100, timeout_ms);
for (int i = 0; i < n; i++) {
// events[i].data.fd has data
}
O(1) for adding/removing, O(ready) for wait - scales to millions of fds.
| API | Add/Remove | Wait | fd Limit | Platform |
|---|---|---|---|---|
| select() | O(n) | O(n) | ~1024 | All Unix, Windows |
| poll() | O(n) | O(n) | Unlimited | All Unix |
| epoll() | O(1) | O(ready) | Unlimited | Linux only |
| kqueue() | O(1) | O(ready) | Unlimited | BSD, macOS |
| IOCP | O(1) | O(ready) | Unlimited | Windows |
How epoll Works Internally:
epoll_ctl(ADD) registers socket:
│
▼
Socket's wait_queue gets special epoll callback
│
▼
When data arrives on socket:
└── Callback adds socket to epoll's ready list
│
▼
epoll_wait() returns only ready sockets
└── No need to scan all registered sockets
This is why epoll scales: instead of asking "is socket ready?" for each socket, it gets told when any socket becomes ready.
Event-Driven Architecture:
Modern servers combine non-blocking I/O with epoll:
while (running) {
int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
for (int i = 0; i < n; i++) {
if (events[i].data.fd == listen_sock) {
accept_new_connection();
} else {
handle_client_data(events[i].data.fd);
}
}
}
A single thread can handle tens of thousands of connections efficiently.
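A sketch of what the accept_new_connection() step in the loop above might look like (the helper name comes from the pseudocode; on Linux, accept4() lets the new descriptor be created non-blocking in a single call):
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <sys/socket.h>
// Sketch: accept all pending connections non-blocking and register each
// new descriptor with the same epoll instance the event loop waits on.
void accept_new_connection(int epfd, int listen_sock) {
    for (;;) {
        int clientfd = accept4(listen_sock, NULL, NULL, SOCK_NONBLOCK);
        if (clientfd < 0)
            break;                             // EAGAIN: no more pending connections
        struct epoll_event ev = {0};
        ev.events = EPOLLIN;
        ev.data.fd = clientfd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, clientfd, &ev);
    }
}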
How sockets are distributed across threads significantly impacts process mapping and performance. Several models exist.
Model 1: Thread-Per-Connection
Main Thread:
while (true):
clientfd = accept(listenfd)
spawn_thread(handle_client, clientfd)
Worker Thread:
while (connection_open):
data = recv(clientfd) # Blocking OK
response = process(data)
send(clientfd, response)
close(clientfd)
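In C, Model 1 is typically implemented with POSIX threads. A minimal echo-style sketch (error handling and real request processing are assumed to live elsewhere):
#include <pthread.h>
#include <unistd.h>
#include <sys/socket.h>
// Sketch: one detached thread per accepted connection.
static void *handle_client(void *arg) {
    int clientfd = (int)(long)arg;
    char buf[4096];
    ssize_t n;
    while ((n = recv(clientfd, buf, sizeof(buf), 0)) > 0)   // blocking is fine here
        send(clientfd, buf, n, 0);                          // echo back; real servers process instead
    close(clientfd);
    return NULL;
}

void serve_thread_per_connection(int listenfd) {
    for (;;) {
        int clientfd = accept(listenfd, NULL, NULL);
        if (clientfd < 0)
            continue;
        pthread_t tid;
        pthread_create(&tid, NULL, handle_client, (void *)(long)clientfd);
        pthread_detach(tid);                                // thread cleans up after itself
    }
}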
Model 2: Event Loop (Single-Threaded)
Single Thread:
while (true):
ready_fds = epoll_wait(epfd)
for fd in ready_fds:
if fd is listen_socket:
new_client = accept(fd)
epoll_add(new_client)
else:
data = recv(fd) # Non-blocking
response = process(data)
send(fd, response)
Model 1: Thread-Per-Connection
─────────────────────────────────────────
                               ┌─────────┐
                          ┌───►│Thread 1 │──► Client A
┌─────────┐   accept()    │    └─────────┘
│  Main   │───────────────┤    ┌─────────┐
│ Thread  │               ├───►│Thread 2 │──► Client B
│(listen) │               │    └─────────┘
└─────────┘               │    ┌─────────┐
                          └───►│Thread N │──► Client N
                               └─────────┘

Model 2: Single-Threaded Event Loop
─────────────────────────────────────────
┌────────────────────────────────────────┐
│             Single Thread              │
│  ┌──────────────────────────────────┐  │
│  │            Event Loop            │  │
│  │                                  │  │
│  │   ┌─────────┐   ┌─────────┐      │  │
│  │   │Client A │   │Client B │ ...  │  │
│  │   │  fd=5   │   │  fd=6   │      │  │
│  │   └─────────┘   └─────────┘      │  │
│  │           epoll_wait()           │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘

Model 3: Thread Pool with Work Stealing
─────────────────────────────────────────
                                ┌─────────┐
┌───────────┐                   │Worker 1 │
│  Accept   │    Work Queue     │         │
│  Thread   │──────────────────►│  epoll  │
│           │                   └─────────┘
└───────────┘                   ┌─────────┐
                                │Worker 2 │   All sockets
                                │         │   distributed
                                │  epoll  │   across workers
                                └─────────┘

Model 3: Multi-Threaded with I/O Multiplexing
Acceptor Thread:
while (true):
clientfd = accept(listenfd)
worker = select_worker() # Round-robin or least-loaded
assign_to_worker(worker, clientfd)
Worker Thread (one per CPU core):
while (true):
ready_fds = epoll_wait(my_epfd)
for fd in ready_fds:
handle_event(fd)
This hybrid model (used by nginx and by Node.js in cluster mode) combines the strengths of the earlier two: each worker runs its own event loop, typically one worker per CPU core, giving event-loop efficiency plus multi-core parallelism.
SO_REUSEPORT: Kernel-Level Load Balancing
// Multiple processes can bind to same port
setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
bind(sockfd, ...);
listen(sockfd, ...);
// Kernel distributes incoming connections across all listeners
With SO_REUSEPORT, the kernel handles distribution—each worker has its own listening socket, and the kernel load-balances incoming connections.
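A sketch of the per-worker setup (illustrative; each forked worker would call this to create its own independent listener on the same port, with error handling omitted):
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
// Sketch: each worker process creates, binds, and listens on its OWN socket
// for the same port; the kernel spreads incoming connections across them.
int make_reuseport_listener(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 128);
    return fd;                                 // worker then runs its own accept/epoll loop
}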
While multiple threads can read from the same socket, each recv() returns an arbitrary chunk of the byte stream. For TCP streams this is problematic: the threads collectively see scrambled data. Each TCP connection should be owned by one thread at a time, or access should be coordinated with a mutex.
Let's trace a complete example from packet arrival to application processing, showing every step of process mapping.
Scenario: an nginx server is listening on 192.0.2.1:443; over an established connection, the client at 10.0.0.1:52341 sends a 500-byte TCP segment carrying an HTTP request.
Timeline:
Time    Component              Action
═════   ═════════════════════  ═══════════════════════════════════
T+0ms   Network Card           Receives Ethernet frame

T+0.01  NIC DMA                Copies frame to kernel ring buffer

T+0.02  Interrupt              Hardware interrupt signals CPU

T+0.03  Interrupt Handler      Acknowledges interrupt, schedules softirq

T+0.10  Softirq Handler        Pulls frame from ring buffer

T+0.12  Ethernet Layer         Strips Ethernet header, identifies IP

T+0.15  IP Layer               Validates IP header, destination is local
                               Protocol field = 6 (TCP)

T+0.18  TCP Layer              Extracts 4-tuple:
                               (10.0.0.1:52341, 192.0.2.1:443)

T+0.20  Socket Lookup          hash(4-tuple) → lookup in connection table
                               Found: socket fd=12, owned by nginx pid=1234

T+0.22  TCP Processing         Validate sequence number: 12000 in window ✓
                               Add 500 bytes to socket receive queue

T+0.25  Socket Layer           Check socket->wait_queue for waiters
                               nginx worker (pid=1235) is in epoll_wait()

T+0.27  Epoll                  Add fd=12 to ready list for epfd=3

T+0.28  Scheduler              Mark nginx worker (pid=1235) as RUNNABLE

T+0.50  Scheduler              Context switch to nginx worker

T+0.52  nginx (user space)     epoll_wait() returns: fd=12 is readable

T+0.55  nginx (user space)     Calls recv(12, buffer, 4096, 0)

T+0.56  Kernel                 sys_recvfrom() system call entry

T+0.58  Kernel                 Look up fd=12 → socket structure

T+0.60  Kernel                 Copy 500 bytes from socket receive_queue
                               to user buffer at 0x7fff1234abc0

T+0.62  Kernel                 Update socket buffer counters
                               Schedule TCP window update

T+0.64  Kernel                 System call returns, bytes=500

T+0.65  nginx (user space)     recv() returns 500

T+1.00  nginx (user space)     Parses HTTP request: "GET / HTTP/1.1"

T+2.00  nginx (user space)     Prepares response, calls send()

═══════════════════════════════════════════════════════════════════
Total time from NIC to application: ~0.65ms (typical modern system)

Key Observations:
The in-kernel path from the NIC to the socket's receive queue completes in well under a millisecond. The single largest delay is the scheduling gap between waking the nginx worker (T+0.28) and actually running it (T+0.50). The 500-byte payload is copied twice: once into kernel memory and once into the user-space buffer.
What Affects Latency:
| Factor | Impact | Mitigation |
|---|---|---|
| Interrupt handling | ~microseconds | Interrupt coalescing, NAPI |
| Socket lookup | ~nanoseconds | Efficient hash tables |
| Process wake-up | ~microseconds | Busy polling, DPDK |
| Memory copy | ~microseconds | Zero-copy techniques |
| Scheduler latency | ~microseconds | Real-time scheduling |
Ultra-Low-Latency Approaches:
Systems that cannot tolerate even these sub-millisecond costs bypass parts of the normal path. Kernel-bypass frameworks such as DPDK poll the NIC directly from user space, XDP processes packets before the regular stack, busy polling avoids interrupt and wake-up latency, and zero-copy techniques remove the kernel-to-user copy.
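One option from this family that stays within the normal socket API is busy polling. A sketch (Linux-specific, requires kernel and driver support; the 50-microsecond budget is an arbitrary example value):
#include <sys/socket.h>
// Sketch: ask the kernel to busy-poll the device queue for up to 'usecs'
// microseconds when this socket has no data, trading CPU for lower latency.
int enable_busy_poll(int fd, int usecs) {
    return setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs));
}
// Example: enable_busy_poll(sockfd, 50);
This removes much of the interrupt-and-wake-up path from the latency budget at the cost of a spinning CPU.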
We've completed our exploration of multiplexing and demultiplexing with process mapping—the final link connecting network segments to application processes. Let's consolidate the key concepts: sockets are exposed to applications as file descriptors; the kernel socket structure holds the buffers, addresses, and ownership information; wait queues, notification mechanisms, and I/O multiplexing determine how a process learns that data has arrived; and the threading model determines how connections are distributed across threads and cores.
Module Complete:
You've now mastered the complete multiplexing and demultiplexing system: multiplexing at the sender, demultiplexing at the receiver, port and connection identification, and process mapping from socket to application process.
This knowledge forms the foundation for understanding how all networked applications function—from simple clients to high-performance servers handling millions of connections.
Congratulations! You've completed the Multiplexing and Demultiplexing module. You now understand the complete data path from application to network and back—the fundamental mechanism enabling all Internet communication. This knowledge is essential for network programming, system administration, and understanding distributed systems.