Every network communication you've ever used—web browsing, video calls, online gaming, REST APIs, gRPC services—ultimately relies on one fundamental operating system abstraction: sockets.
A socket is an endpoint for communication, an abstraction that lets applications send and receive data over a network using the same read/write paradigm as files. The socket API, introduced in BSD Unix in 1983, has become the universal interface for network programming across virtually all operating systems.
Understanding sockets is essential because:
By the end of this page, you will understand the socket abstraction and how it maps to the network stack, the difference between stream (TCP) and datagram (UDP) sockets, the complete lifecycle of socket operations, advanced I/O models including blocking, non-blocking, and multiplexed I/O, and practical patterns for building reliable network applications.
A socket is an operating system abstraction representing an endpoint for network communication. It provides a uniform interface for communicating over various network protocols while hiding the complexity of the underlying network stack.
Sockets as File Descriptors
In Unix-like systems, sockets are file descriptors—integer handles that can be used with standard I/O operations:
int sock = socket(...); // Returns file descriptor, e.g., 3
write(sock, data, len); // Write data to socket (send)
read(sock, buf, len); // Read data from socket (receive)
close(sock); // Close the socket
This "everything is a file" philosophy means the same tools work for files, pipes, terminals, and network connections. However, sockets have additional operations specific to networking: binding to addresses, listening for connections, accepting clients, and connecting to servers.
The Socket Address
Every socket is associated with addressing information that identifies the endpoint:
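- IP address: identifies the host (e.g., 192.168.1.100 or ::1)
- Port number: identifies the application on that host (e.g., 80 for HTTP)
- Address family: IPv4 (AF_INET) or IPv6 (AF_INET6)
- Socket type: stream (SOCK_STREAM) or datagram (SOCK_DGRAM)

A single socket address names one endpoint. A connection is uniquely identified by the protocol plus both endpoints, a combination called a 5-tuple: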
(protocol, local_ip, local_port, remote_ip, remote_port)
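As a rough illustration of the tuple above, getsockname() and getpeername() report the local and remote halves of a connected socket, and the SO_TYPE option reports whether it is a stream or datagram socket. The helper name `format_connection` and its output format are assumptions made for this sketch, and it handles IPv4 only.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

// Hypothetical helper: writes "tcp local_ip:port -> remote_ip:port" into out.
// Returns 0 on success, -1 if fd is not a connected IPv4 socket.
int format_connection(int fd, char *out, size_t out_len) {
    assert(out != NULL && out_len > 0);

    struct sockaddr_in local, remote;
    socklen_t local_len = sizeof(local);
    socklen_t remote_len = sizeof(remote);
    memset(&local, 0, sizeof(local));
    memset(&remote, 0, sizeof(remote));

    // SO_TYPE tells us whether this is SOCK_STREAM or SOCK_DGRAM
    int type = 0;
    socklen_t type_len = sizeof(type);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &type_len) < 0) return -1;

    if (getsockname(fd, (struct sockaddr *)&local, &local_len) < 0) return -1;
    // getpeername() fails (ENOTCONN) when the socket has no remote endpoint
    if (getpeername(fd, (struct sockaddr *)&remote, &remote_len) < 0) return -1;
    if (local.sin_family != AF_INET || remote.sin_family != AF_INET) return -1;

    char local_ip[INET_ADDRSTRLEN], remote_ip[INET_ADDRSTRLEN];
    if (!inet_ntop(AF_INET, &local.sin_addr, local_ip, sizeof(local_ip))) return -1;
    if (!inet_ntop(AF_INET, &remote.sin_addr, remote_ip, sizeof(remote_ip))) return -1;

    snprintf(out, out_len, "%s %s:%u -> %s:%u",
             type == SOCK_STREAM ? "tcp" : "udp",
             local_ip, (unsigned)ntohs(local.sin_port),
             remote_ip, (unsigned)ntohs(remote.sin_port));
    return 0;
}
```

A listening socket has a local address but no peer, so the helper returns -1 for it; only a connected (or accepted) socket carries the full tuple.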
| Characteristic | Stream (TCP) | Datagram (UDP) |
|---|---|---|
| Socket type constant | SOCK_STREAM | SOCK_DGRAM |
| Connection | Connection-oriented | Connectionless |
| Reliability | Guaranteed delivery, ordering | Best-effort, may lose/reorder |
| Message boundaries | Byte stream (no boundaries) | Discrete messages preserved |
| Flow control | Yes (TCP windowing) | No (application must handle) |
| Overhead | Higher (connection state) | Lower (no state) |
| Use cases | HTTP, databases, SSH | DNS, video streaming, gaming |
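The "Message boundaries" row is easy to observe locally. The sketch below uses socketpair() with AF_UNIX sockets as a stand-in for TCP and UDP (an assumption made so it runs without a network); the function name `first_read_size` is hypothetical. It writes two 3-byte messages and reports how many bytes a single read() returns.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

// Writes "abc" then "def" into one end of a local socket pair and returns
// how many bytes a single read() on the other end delivers.
// type must be SOCK_STREAM or SOCK_DGRAM. Returns -1 on error.
ssize_t first_read_size(int type) {
    assert(type == SOCK_STREAM || type == SOCK_DGRAM);

    int fds[2];
    if (socketpair(AF_UNIX, type, 0, fds) < 0) return -1;

    ssize_t n = -1;
    if (write(fds[0], "abc", 3) == 3 && write(fds[0], "def", 3) == 3) {
        char buf[16];  // Big enough to hold both messages at once
        n = read(fds[1], buf, sizeof(buf));
    }

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

With SOCK_DGRAM the read returns exactly 3 bytes, one message. With SOCK_STREAM the two writes merge into one byte stream, so the read typically returns all 6 bytes, although a stream read is allowed to return any smaller count. This is why stream protocols need application-level framing.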
// Socket creation and addressing fundamentals
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

// Create a socket
int create_tcp_socket(void) {
    // socket(domain, type, protocol)
    // - AF_INET: IPv4
    // - SOCK_STREAM: TCP (reliable, connection-oriented)
    // - 0: Default protocol for type
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        perror("socket creation failed");
        return -1;
    }
    return sock;  // Returns file descriptor >= 0
}

int create_udp_socket(void) {
    // SOCK_DGRAM: UDP (unreliable, connectionless)
    return socket(AF_INET, SOCK_DGRAM, 0);
}

int create_ipv6_socket(void) {
    // AF_INET6: IPv6
    return socket(AF_INET6, SOCK_STREAM, 0);
}

// Socket address structures
void demonstrate_addressing(void) {
    // IPv4 address structure
    struct sockaddr_in addr4 = {
        .sin_family = AF_INET,
        .sin_port = htons(8080),  // Network byte order (big-endian)
        .sin_addr.s_addr = inet_addr("192.168.1.100"),
    };

    // Or use inet_pton for both IPv4 and IPv6
    struct sockaddr_in6 addr6;
    memset(&addr6, 0, sizeof(addr6));
    addr6.sin6_family = AF_INET6;
    addr6.sin6_port = htons(8080);
    inet_pton(AF_INET6, "::1", &addr6.sin6_addr);  // Localhost

    // INADDR_ANY: Bind to all local interfaces
    struct sockaddr_in any_addr = {
        .sin_family = AF_INET,
        .sin_port = htons(8080),
        .sin_addr.s_addr = INADDR_ANY,  // 0.0.0.0
    };
}

// Byte order conversion functions
void byte_order_example(void) {
    uint16_t port = 8080;
    uint32_t ip = 0xC0A80164;  // 192.168.1.100

    // Host to Network (for sending)
    uint16_t net_port = htons(port);  // host-to-network short
    uint32_t net_ip = htonl(ip);      // host-to-network long

    // Network to Host (after receiving)
    uint16_t host_port = ntohs(net_port);
    uint32_t host_ip = ntohl(net_ip);

    // These are no-ops on big-endian systems
    // but essential on little-endian ones (x86, most ARM)
}

Network protocols use big-endian byte order (most significant byte first), but most modern CPUs are little-endian. Always use htons()/htonl() when placing multi-byte values such as ports and addresses into socket structures or wire formats, and ntohs()/ntohl() when reading them back. Forgetting this causes subtle bugs: on a little-endian host, port 80 (0x0050) becomes port 20480 (0x5000) if the byte order is wrong!
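The arithmetic behind that port example can be checked directly. This is a minimal sketch; `swap_bytes16` and `byte_order_self_check` are hypothetical names introduced here, and the swap is written out by hand so the result does not depend on the host's endianness.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

// Reverses the two bytes of a 16-bit value. This is what a little-endian
// host effectively does to a port when htons()/ntohs() is forgotten.
uint16_t swap_bytes16(uint16_t v) {
    return (uint16_t)((v >> 8) | (v << 8));
}

// Returns 1 on a little-endian host, 0 on a big-endian one.
int host_is_little_endian(void) {
    return htons(1) != 1;
}

void byte_order_self_check(void) {
    assert(swap_bytes16(0x0050) == 0x5000);  // 80 misread as 20480
    assert(ntohs(htons(8080)) == 8080);      // A round trip is always lossless
    if (host_is_little_endian()) {
        assert(htons(80) == swap_bytes16(80));  // htons() really swaps here
    } else {
        assert(htons(80) == 80);                // htons() is a no-op here
    }
}
```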
TCP sockets follow a well-defined lifecycle from creation through data exchange to termination. Understanding this lifecycle is essential for building reliable network applications.
Server Lifecycle
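A TCP server follows this sequence: socket() to create the socket, bind() to assign a local address, listen() to start queuing incoming connections, accept() to take each connection off the queue, read()/write() to exchange data, and close() to end the connection.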
Client Lifecycle
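A TCP client is simpler: socket() to create the socket, connect() to reach the server, read()/write() to exchange data, and close() to end the connection.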
// Complete TCP server implementation
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define PORT 8080
#define BACKLOG 128       // Listen queue size
#define BUFFER_SIZE 4096

int main(void) {
    int server_fd, client_fd;
    struct sockaddr_in server_addr, client_addr;
    socklen_t client_len;
    char buffer[BUFFER_SIZE];

    // 1. CREATE SOCKET
    server_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (server_fd < 0) {
        perror("socket failed");
        exit(EXIT_FAILURE);
    }

    // Set SO_REUSEADDR to avoid "Address already in use" on restart
    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    // 2. BIND TO ADDRESS
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = INADDR_ANY;  // All interfaces
    server_addr.sin_port = htons(PORT);

    if (bind(server_fd, (struct sockaddr*)&server_addr, sizeof(server_addr)) < 0) {
        perror("bind failed");
        exit(EXIT_FAILURE);
    }

    // 3. LISTEN FOR CONNECTIONS
    // BACKLOG: Maximum pending connections in the queue
    if (listen(server_fd, BACKLOG) < 0) {
        perror("listen failed");
        exit(EXIT_FAILURE);
    }

    printf("Server listening on port %d\n", PORT);

    // 4. ACCEPT CONNECTIONS (main loop)
    while (1) {
        // accept() overwrites client_len, so reset it before every call
        client_len = sizeof(client_addr);

        // accept() blocks until a client connects
        client_fd = accept(server_fd, (struct sockaddr*)&client_addr, &client_len);
        if (client_fd < 0) {
            perror("accept failed");
            continue;
        }

        // Log client connection
        char client_ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &client_addr.sin_addr, client_ip, sizeof(client_ip));
        printf("Client connected: %s:%d\n", client_ip, ntohs(client_addr.sin_port));

        // 5. HANDLE CLIENT (simple echo server)
        ssize_t bytes_read;
        while ((bytes_read = read(client_fd, buffer, sizeof(buffer) - 1)) > 0) {
            buffer[bytes_read] = '\0';  // Null-terminate
            printf("Received: %s\n", buffer);

            // Echo back to client (a partial write is possible; see the
            // robust write helper later on this page)
            if (write(client_fd, buffer, bytes_read) < 0) {
                perror("write failed");
                break;
            }
        }

        // 6. CLOSE CLIENT CONNECTION
        close(client_fd);
        printf("Client disconnected\n");
    }

    close(server_fd);
    return 0;
}

// Complete TCP client implementation
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#define BUFFER_SIZE 4096

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <host> <port>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    const char *host = argv[1];
    int port = atoi(argv[2]);
    int sock;
    struct sockaddr_in server_addr;
    char buffer[BUFFER_SIZE];

    // 1. CREATE SOCKET
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        perror("socket failed");
        exit(EXIT_FAILURE);
    }

    // Resolve hostname to IP address (IPv4 only; gethostbyname() is obsolete
    // in current POSIX, and getaddrinfo() is the modern replacement)
    struct hostent *server = gethostbyname(host);
    if (server == NULL) {
        fprintf(stderr, "Host not found: %s\n", host);
        exit(EXIT_FAILURE);
    }

    // Set up server address
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    memcpy(&server_addr.sin_addr.s_addr, server->h_addr_list[0], server->h_length);
    server_addr.sin_port = htons(port);

    // 2. CONNECT TO SERVER
    // This initiates the three-way handshake
    if (connect(sock, (struct sockaddr*)&server_addr, sizeof(server_addr)) < 0) {
        perror("connect failed");
        exit(EXIT_FAILURE);
    }

    printf("Connected to %s:%d\n", host, port);

    // 3. SEND AND RECEIVE DATA
    const char *message = "Hello, server!";
    ssize_t sent = write(sock, message, strlen(message));
    printf("Sent %zd bytes: %s\n", sent, message);

    ssize_t received = read(sock, buffer, sizeof(buffer) - 1);
    if (received > 0) {
        buffer[received] = '\0';
        printf("Received %zd bytes: %s\n", received, buffer);
    }

    // 4. CLOSE CONNECTION
    close(sock);
    return 0;
}

The backlog parameter to listen() specifies the maximum number of pending connections waiting to be accepted. If the queue is full, new connection attempts are dropped or refused, depending on the system and its configuration. On modern Linux, the actual limit is min(backlog, net.core.somaxconn). For high-throughput servers, set both backlog and somaxconn appropriately (e.g., 1024 or higher).
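The client above resolves names with gethostbyname(), which only handles IPv4 and is marked obsolete in current POSIX. A protocol-independent alternative is getaddrinfo(), which returns a list of candidate addresses to try in order. The following is a minimal sketch, not a drop-in replacement; the helper name `connect_to_host` is an assumption made here.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: resolves host and returns a connected TCP socket,
// or -1 if resolution fails or no candidate address accepts the connection.
int connect_to_host(const char *host, int port) {
    assert(host != NULL && port > 0 && port <= 65535);

    // getaddrinfo() takes the port as a string ("service")
    char service[16];
    snprintf(service, sizeof(service), "%d", port);

    struct addrinfo hints, *results, *ai;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // Accept IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP

    if (getaddrinfo(host, service, &hints, &results) != 0) {
        return -1;
    }

    // Try each returned address until one connects
    int sock = -1;
    for (ai = results; ai != NULL; ai = ai->ai_next) {
        sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (sock < 0) continue;
        if (connect(sock, ai->ai_addr, ai->ai_addrlen) == 0) break;  // Success
        close(sock);
        sock = -1;
    }

    freeaddrinfo(results);
    return sock;
}
```

Because the address family comes from the resolver rather than being hard-coded to AF_INET, the same code connects to IPv4 and IPv6 servers.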
UDP (User Datagram Protocol) sockets provide a simpler, connectionless communication model. Unlike TCP, UDP doesn't establish connections, guarantee delivery, or maintain packet order.
UDP Characteristics
When to Use UDP
// UDP server and client implementation
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define PORT 8080
#define BUFFER_SIZE 65535  // Large enough for any single UDP datagram

// =================== UDP SERVER ===================
void udp_server(void) {
    int sock;
    struct sockaddr_in server_addr, client_addr;
    socklen_t client_len;
    char buffer[BUFFER_SIZE];

    // Create UDP socket
    sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) {
        perror("socket failed");
        exit(EXIT_FAILURE);
    }

    // Bind to address
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = INADDR_ANY;
    server_addr.sin_port = htons(PORT);

    if (bind(sock, (struct sockaddr*)&server_addr, sizeof(server_addr)) < 0) {
        perror("bind failed");
        exit(EXIT_FAILURE);
    }

    printf("UDP server listening on port %d\n", PORT);

    // No listen() or accept() needed!
    // Just receive datagrams directly
    while (1) {
        // recvfrom() overwrites client_len, so reset it before every call
        client_len = sizeof(client_addr);

        // recvfrom: receive data and get sender's address
        ssize_t received = recvfrom(sock, buffer, sizeof(buffer) - 1, 0,
                                    (struct sockaddr*)&client_addr, &client_len);
        if (received < 0) {
            perror("recvfrom failed");
            continue;
        }

        buffer[received] = '\0';

        char client_ip[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &client_addr.sin_addr, client_ip, sizeof(client_ip));
        printf("Received from %s:%d: %s\n",
               client_ip, ntohs(client_addr.sin_port), buffer);

        // sendto: send response to the client
        const char *response = "Message received";
        sendto(sock, response, strlen(response), 0,
               (struct sockaddr*)&client_addr, client_len);
    }

    close(sock);
}

// =================== UDP CLIENT ===================
void udp_client(const char *server_ip, int port) {
    int sock;
    struct sockaddr_in server_addr;
    socklen_t server_len = sizeof(server_addr);
    char buffer[BUFFER_SIZE];

    // Create UDP socket
    sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) {
        perror("socket failed");
        exit(EXIT_FAILURE);
    }

    // Set up server address (no connect needed!)
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(port);
    inet_pton(AF_INET, server_ip, &server_addr.sin_addr);

    // Send datagram
    const char *message = "Hello, UDP server!";
    ssize_t sent = sendto(sock, message, strlen(message), 0,
                          (struct sockaddr*)&server_addr, server_len);
    printf("Sent %zd bytes\n", sent);

    // Optional: Set receive timeout (UDP can't guarantee response)
    struct timeval timeout = {.tv_sec = 5, .tv_usec = 0};
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));

    // Receive response
    ssize_t received = recvfrom(sock, buffer, sizeof(buffer) - 1, 0,
                                (struct sockaddr*)&server_addr, &server_len);
    if (received < 0) {
        perror("recvfrom failed (timeout?)");
    } else {
        buffer[received] = '\0';
        printf("Received: %s\n", buffer);
    }

    close(sock);
}

// =================== CONNECTED UDP ===================
// You can also "connect" a UDP socket for convenience
void connected_udp_client(const char *server_ip, int port) {
    int sock;
    struct sockaddr_in server_addr;

    sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) {
        perror("socket failed");
        return;
    }

    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(port);
    inet_pton(AF_INET, server_ip, &server_addr.sin_addr);

    // "Connect" associates default destination with this socket
    // Does NOT establish a connection! Just sets default address
    connect(sock, (struct sockaddr*)&server_addr, sizeof(server_addr));

    // Now can use send/recv instead of sendto/recvfrom
    send(sock, "Hello", 5, 0);  // Goes to connected address

    char buffer[1024];
    recv(sock, buffer, sizeof(buffer), 0);  // Only from connected address

    close(sock);
}

TCP vs. UDP: A Practical Comparison
| Aspect | TCP | UDP |
|---|---|---|
| Header size | 20-60 bytes | 8 bytes |
| Connection setup | 3-way handshake (~1.5 RTT) | None (send immediately) |
| First byte latency | 1.5 RTT + send | 0.5 RTT (just send) |
| Lost packet | Automatic retransmission | Lost (app must handle) |
| Congestion | Automatic backoff | No backoff (can overwhelm network) |
| Packet ordering | Guaranteed in-order | May arrive out of order |
Modern protocols like QUIC (used by HTTP/3) build reliability on top of UDP. This allows implementing custom congestion control, avoiding head-of-line blocking, and reducing connection setup latency. UDP provides the raw transport; the protocol implements exactly the reliability features it needs.
Socket behavior is extensively configurable through socket options. These options control buffering, timeouts, connection behavior, and performance characteristics.
Setting and Getting Options
Use setsockopt() and getsockopt() to configure sockets:
int setsockopt(int socket, int level, int option_name,
const void *option_value, socklen_t option_len);
int getsockopt(int socket, int level, int option_name,
void *option_value, socklen_t *option_len);
The level parameter specifies which protocol layer the option applies to:
- SOL_SOCKET: Socket-level options (buffer sizes, reuse)
- IPPROTO_TCP: TCP-specific options (Nagle, keepalive)
- IPPROTO_IP: IP-level options (TTL, multicast)
// Common socket options and their usage
#include <stdio.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

void configure_socket(int sock) {
    int optval;
    socklen_t optlen = sizeof(optval);

    // =================== SO_REUSEADDR ===================
    // Allow reuse of local address. Essential for servers that restart.
    // Without this, bind() fails with "Address already in use" for
    // TIME_WAIT connections from previous server instance.
    optval = 1;
    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));

    // =================== SO_REUSEPORT ===================
    // Allow multiple sockets to bind to same port.
    // Enables multiple processes/threads to share one port.
    // Kernel load-balances incoming connections.
    optval = 1;
    setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval));

    // =================== SO_RCVBUF / SO_SNDBUF ===================
    // Set receive and send buffer sizes.
    // Larger buffers improve throughput on high-latency connections.
    // Set before connect() (or listen() on a server) so the TCP window
    // scale negotiated during the handshake reflects the buffer size.
    int recv_buffer = 256 * 1024;  // 256 KB
    int send_buffer = 256 * 1024;
    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &recv_buffer, sizeof(recv_buffer));
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &send_buffer, sizeof(send_buffer));

    // Read back actual size (Linux doubles the requested value)
    getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &optval, &optlen);
    printf("Actual receive buffer: %d bytes\n", optval);

    // =================== SO_RCVTIMEO / SO_SNDTIMEO ===================
    // Set timeout for blocking read/write operations.
    // Operations return -1 with errno=EAGAIN (or EWOULDBLOCK) on timeout.
    struct timeval recv_timeout = {.tv_sec = 30, .tv_usec = 0};
    struct timeval send_timeout = {.tv_sec = 30, .tv_usec = 0};
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &recv_timeout, sizeof(recv_timeout));
    setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &send_timeout, sizeof(send_timeout));

    // =================== SO_KEEPALIVE ===================
    // Enable TCP keepalive probes.
    // Detects dead connections when no data is exchanged.
    optval = 1;
    setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &optval, sizeof(optval));

    // Configure keepalive timing (Linux-specific)
    int keepalive_idle = 60;      // Seconds before first probe
    int keepalive_interval = 10;  // Seconds between probes
    int keepalive_count = 5;      // Probes before connection is dead
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &keepalive_idle, sizeof(keepalive_idle));
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &keepalive_interval, sizeof(keepalive_interval));
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &keepalive_count, sizeof(keepalive_count));

    // =================== TCP_NODELAY ===================
    // Disable Nagle's algorithm.
    // Send data immediately, don't wait to aggregate small packets.
    // Essential for low-latency applications (interactive, RPC).
    optval = 1;
    setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));

    // =================== TCP_QUICKACK ===================
    // Disable delayed acknowledgments (Linux-specific).
    // ACK immediately instead of waiting up to 40ms.
    // The flag is not permanent: the kernel may leave quick-ack mode again,
    // so latency-critical code typically re-sets it after reads.
    optval = 1;
    setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &optval, sizeof(optval));

    // =================== SO_LINGER ===================
    // Control behavior on close().
    // With linger ON and timeout > 0: close() blocks until data sent
    // With linger ON and timeout = 0: close() sends RST, discards data
    struct linger linger_opt = {
        .l_onoff = 1,    // Enable linger
        .l_linger = 30,  // Seconds to wait
    };
    setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger_opt, sizeof(linger_opt));
}

// =================== PERFORMANCE TUNING EXAMPLE ===================
// Returns a listening socket, or -1 on failure.
int create_optimized_server_socket(int port) {
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        return -1;
    }

    // Reuse address for quick restart
    int opt = 1;
    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    // Allow multiple threads to accept connections
    setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt));

    // Large buffers for high throughput
    int bufsize = 1024 * 1024;  // 1 MB
    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));

    // Fast connection behavior
    setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &opt, sizeof(opt));

    // Enable keepalive for connection health monitoring
    setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &opt, sizeof(opt));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(port),
        .sin_addr.s_addr = INADDR_ANY,
    };
    if (bind(sock, (struct sockaddr*)&addr, sizeof(addr)) < 0 ||
        listen(sock, 4096) < 0) {  // Large backlog
        return -1;
    }

    return sock;
}

Nagle's algorithm (enabled by default) delays sending small packets to aggregate them into larger ones, improving network efficiency but adding latency. For RPC, interactive protocols, or latency-sensitive applications, disable it with TCP_NODELAY. The trade-off: more small packets (higher CPU/network overhead) but lower latency.
How a program waits for I/O fundamentally affects its architecture and scalability. There are several I/O models, each with different trade-offs.
Blocking I/O
The default model. read() blocks until data is available; accept() blocks until a client connects. Simple to program but limits concurrency—one thread can handle only one socket at a time.
Non-Blocking I/O
Sockets can be set to non-blocking mode. Operations return immediately with EAGAIN if they would block. The application must poll or use event notification.
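A minimal sketch of that behavior follows. The helper names `make_nonblocking` and `try_read` are assumptions introduced here; the point is that "no data yet" is reported through errno and has to be told apart from a real error.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Puts fd into non-blocking mode. Returns 0 on success, -1 on failure.
int make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

// Attempts one read on a non-blocking descriptor.
// Returns the byte count (0 means EOF), or -1 on failure.
// *would_block is set to 1 when the failure only means "no data yet".
ssize_t try_read(int fd, void *buf, size_t len, int *would_block) {
    assert(would_block != NULL);
    *would_block = 0;

    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        *would_block = 1;  // Not an error: try again after the fd is readable
    }
    return n;
}
```

Calling try_read() in a tight loop until data arrives would burn CPU; in practice the would-block result is a cue to go back to an event notification mechanism such as the ones described next.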
I/O Multiplexing
Wait on multiple sockets simultaneously, so that one thread can handle thousands of connections. Mechanisms include select(), poll(), epoll (Linux), kqueue (BSD and macOS), and io_uring (Linux).
| Model | Complexity | Scalability | Use Case |
|---|---|---|---|
| Blocking | Simple | Thread per connection | Low-concurrency servers |
| Non-blocking + poll | Medium | Moderate (O(n) per check) | Moderate concurrency |
| epoll/kqueue | Complex | High (O(1) per event) | High-concurrency servers |
| io_uring | Most complex | Highest (zero-copy possible) | Ultra-high-performance |
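For the "Non-blocking + poll" row, a small sketch of the portable poll() call may help. The helper name `wait_readable` and the `MAX_WATCHED` cap are assumptions made for this example. The O(n) cost noted in the table is visible in the code: the whole array is rebuilt and rescanned on every call.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>  // The watched descriptors are usually sockets
#include <unistd.h>

#define MAX_WATCHED 64  // Arbitrary cap chosen for this sketch

// Waits up to timeout_ms for any of the n descriptors to become readable.
// Returns the index of the first ready descriptor, -1 on timeout,
// or -2 on error.
int wait_readable(const int *fds, int n, int timeout_ms) {
    assert(fds != NULL && n > 0 && n <= MAX_WATCHED);

    struct pollfd pfds[MAX_WATCHED];
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;  // Interested in "data available to read"
        pfds[i].revents = 0;
    }

    int ready = poll(pfds, (nfds_t)n, timeout_ms);
    if (ready < 0) return -2;
    if (ready == 0) return -1;

    // Linear scan to find which descriptor fired
    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) return i;
    }
    return -2;
}
```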
// High-performance event-driven server using epoll
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#define MAX_EVENTS 10000
#define BUFFER_SIZE 4096

// Set socket to non-blocking mode
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

int main(void) {
    int server_fd, epoll_fd;
    struct sockaddr_in server_addr;
    struct epoll_event event, events[MAX_EVENTS];

    // Create and configure server socket
    server_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (server_fd < 0) {
        perror("socket");
        exit(EXIT_FAILURE);
    }

    int opt = 1;
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
    setsockopt(server_fd, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt));
    setsockopt(server_fd, IPPROTO_TCP, TCP_NODELAY, &opt, sizeof(opt));
    set_nonblocking(server_fd);

    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = INADDR_ANY;
    server_addr.sin_port = htons(8080);

    if (bind(server_fd, (struct sockaddr*)&server_addr, sizeof(server_addr)) < 0 ||
        listen(server_fd, 4096) < 0) {
        perror("bind/listen");
        exit(EXIT_FAILURE);
    }

    // Create epoll instance
    epoll_fd = epoll_create1(0);
    if (epoll_fd < 0) {
        perror("epoll_create1");
        exit(EXIT_FAILURE);
    }

    // Add server socket to epoll
    event.data.fd = server_fd;
    event.events = EPOLLIN | EPOLLET;  // Edge-triggered mode
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, server_fd, &event);

    printf("epoll server listening on port 8080\n");

    // Event loop
    while (1) {
        // Wait for events on any of our sockets
        int n_events = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);

        for (int i = 0; i < n_events; i++) {
            int fd = events[i].data.fd;

            if (events[i].events & (EPOLLERR | EPOLLHUP)) {
                // Error or hang-up (closing the fd also removes it from epoll)
                close(fd);
                continue;
            }

            if (fd == server_fd) {
                // New connection(s) ready
                while (1) {
                    struct sockaddr_in client_addr;
                    socklen_t client_len = sizeof(client_addr);

                    int client_fd = accept(server_fd,
                                           (struct sockaddr*)&client_addr,
                                           &client_len);
                    if (client_fd < 0) {
                        if (errno == EAGAIN || errno == EWOULDBLOCK) {
                            // No more pending connections
                            break;
                        }
                        perror("accept");
                        break;
                    }

                    // Configure new client socket
                    set_nonblocking(client_fd);
                    int nodelay = 1;
                    setsockopt(client_fd, IPPROTO_TCP, TCP_NODELAY,
                               &nodelay, sizeof(nodelay));

                    // Add to epoll
                    event.data.fd = client_fd;
                    event.events = EPOLLIN | EPOLLET;
                    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_fd, &event);
                }
            } else {
                // Data ready on a client socket
                char buffer[BUFFER_SIZE];

                while (1) {
                    ssize_t count = read(fd, buffer, sizeof(buffer));

                    if (count < 0) {
                        if (errno == EAGAIN || errno == EWOULDBLOCK) {
                            // No more data to read
                            break;
                        }
                        perror("read");
                        close(fd);
                        break;
                    } else if (count == 0) {
                        // Client disconnected
                        close(fd);
                        break;
                    }

                    // Echo data back (simple example: a production server
                    // must handle partial writes and EAGAIN here)
                    write(fd, buffer, count);
                }
            }
        }
    }

    close(server_fd);
    close(epoll_fd);
    return 0;
}

Edge-Triggered vs. Level-Triggered
epoll supports two notification modes:
Level-triggered (default): Notifies as long as the condition exists. EPOLLIN fires repeatedly as long as data is available to read.
Edge-triggered (EPOLLET): Notifies only when the condition changes. EPOLLIN fires once when data first becomes available; you must read all data or you won't be notified again.
Edge-triggered is more efficient (fewer syscalls) but requires draining buffers completely. It's preferred for high-performance servers but requires more careful programming.
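The difference between the two modes can be observed with a short Linux-only experiment. The sketch below makes data available once and then calls epoll_wait() several times without ever reading it; `count_notifications` is a hypothetical name, and an AF_UNIX socket pair stands in for a TCP connection.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Registers one end of a socket pair with EPOLLIN | extra_flags, writes
// 2 bytes to the other end, then calls epoll_wait() `rounds` times WITHOUT
// reading. Returns how many rounds reported readiness, or -1 on error.
int count_notifications(uint32_t extra_flags, int rounds) {
    assert(rounds > 0);

    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | extra_flags, .data.fd = fds[0] };
    int count = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "hi", 2) == 2) {
        count = 0;
        for (int i = 0; i < rounds; i++) {
            struct epoll_event out;
            // Timeout 0: report current readiness without blocking
            if (epoll_wait(ep, &out, 1, 0) == 1) count++;
        }
    }

    close(ep);
    close(fds[0]);
    close(fds[1]);
    return count;
}
```

Called as count_notifications(0, 3), every round reports the unread data, so the result is 3. Called as count_notifications(EPOLLET, 3), only the first round reports it and the result is 1, which is why edge-triggered handlers must drain the socket until EAGAIN.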
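Linux io_uring (kernel 5.1+) is the latest I/O interface. It provides true asynchronous I/O through submission and completion queues in memory shared between the application and the kernel. Many operations can be submitted with a single syscall, and with submission-queue polling enabled, steady-state I/O can proceed with no syscalls at all. Combined with registered buffers and files, it enables near-zero-copy I/O. Libraries like liburing make it accessible. For extreme performance, io_uring can outperform epoll by 40% or more in some workloads, though the gain depends heavily on the workload and configuration.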
Writing robust socket code requires handling many edge cases. Here are essential practices learned from production systems.
// Robust socket I/O with proper error handling
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <arpa/inet.h>

// =================== ROBUST READ ===================
// Handle partial reads and interrupted syscalls
ssize_t read_fully(int fd, void *buf, size_t count) {
    size_t total_read = 0;
    char *ptr = buf;

    while (total_read < count) {
        ssize_t bytes_read = read(fd, ptr + total_read, count - total_read);

        if (bytes_read < 0) {
            if (errno == EINTR) {
                // Interrupted by signal, retry
                continue;
            }
            return -1;  // Error
        }

        if (bytes_read == 0) {
            // EOF - connection closed
            break;
        }

        total_read += bytes_read;
    }

    return total_read;
}

// =================== ROBUST WRITE ===================
// Handle partial writes and interrupted syscalls
ssize_t write_fully(int fd, const void *buf, size_t count) {
    size_t total_written = 0;
    const char *ptr = buf;

    while (total_written < count) {
        ssize_t bytes_written = write(fd, ptr + total_written,
                                      count - total_written);

        if (bytes_written < 0) {
            if (errno == EINTR) {
                continue;  // Retry
            }
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                // Non-blocking socket would block. Retrying immediately
                // busy-waits; in a real app, wait for writability with
                // epoll/poll before trying again.
                continue;
            }
            if (errno == EPIPE) {
                // Connection closed by peer
                return total_written;  // Return what we wrote
            }
            return -1;
        }

        total_written += bytes_written;
    }

    return total_written;
}

// =================== MESSAGE FRAMING ===================
// TCP is a byte stream - you need to frame messages yourself
typedef struct {
    uint32_t length;  // Message length (network byte order)
    uint8_t type;
    // payload follows
} __attribute__((packed)) message_header_t;

int send_message(int fd, uint8_t type, const void *payload, size_t len) {
    // Build header
    message_header_t header = {
        .length = htonl((uint32_t)len),
        .type = type,
    };

    // Send header
    if (write_fully(fd, &header, sizeof(header)) != (ssize_t)sizeof(header)) {
        return -1;
    }

    // Send payload
    if (len > 0) {
        if (write_fully(fd, payload, len) != (ssize_t)len) {
            return -1;
        }
    }

    return 0;
}

// Returns the payload length, or -1 on error or short read.
int receive_message(int fd, uint8_t *type, void *payload, size_t max_len) {
    message_header_t header;

    // Read header
    if (read_fully(fd, &header, sizeof(header)) != (ssize_t)sizeof(header)) {
        return -1;
    }

    *type = header.type;
    uint32_t len = ntohl(header.length);

    // Sanity check length
    if (len > max_len) {
        errno = EMSGSIZE;
        return -1;
    }

    // Read payload
    if (len > 0) {
        if (read_fully(fd, payload, len) != (ssize_t)len) {
            return -1;
        }
    }

    return (int)len;
}

// =================== CONNECTION TIMEOUT ===================
// Set timeout for connect() operation
int connect_with_timeout(int sock, struct sockaddr *addr,
                         socklen_t addrlen, int timeout_sec) {
    // Set non-blocking
    int flags = fcntl(sock, F_GETFL, 0);
    if (flags == -1 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) == -1) {
        return -1;
    }

    int result = connect(sock, addr, addrlen);

    if (result < 0) {
        if (errno != EINPROGRESS) {
            return -1;  // Immediate failure (e.g., ECONNREFUSED)
        }

        // Connection in progress - wait with timeout
        fd_set fdset;
        FD_ZERO(&fdset);
        FD_SET(sock, &fdset);

        struct timeval tv = {.tv_sec = timeout_sec, .tv_usec = 0};

        result = select(sock + 1, NULL, &fdset, NULL, &tv);

        if (result == 0) {
            errno = ETIMEDOUT;
            return -1;
        }

        if (result < 0) {
            return -1;
        }

        // Check if connection succeeded
        int error = 0;
        socklen_t len = sizeof(error);
        if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &error, &len) < 0) {
            return -1;
        }

        if (error) {
            errno = error;
            return -1;
        }
    }

    // Restore blocking mode
    fcntl(sock, F_SETFL, flags);
    return 0;
}

// =================== SIGNAL HANDLING ===================
// Ignore SIGPIPE to handle write to closed socket gracefully
void setup_signal_handling(void) {
    // SIGPIPE causes process to exit if writing to closed socket
    // We want errno=EPIPE instead
    signal(SIGPIPE, SIG_IGN);

    // Or use MSG_NOSIGNAL flag per-write:
    // send(sock, data, len, MSG_NOSIGNAL);
}

When a client closes the connection (sends FIN), the server's read() returns 0 (indicating EOF) once all buffered data has been consumed. If the server then keeps writing to a connection the client has fully closed, or if the client closed without reading all pending data, or crashed, the client's side answers with RST and the server's next I/O attempt fails with ECONNRESET or EPIPE. Always be prepared for a connection reset at any point, and handle ECONNRESET gracefully.
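The EOF and EPIPE cases can be reproduced locally. The sketch below uses an AF_UNIX socket pair as a stand-in for a TCP connection (an assumption: over real TCP the first write after the peer closes may appear to succeed, and ECONNRESET is also possible). The function name `probe_closed_peer` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

// Closes one end of a local stream socket pair, then reads and writes on
// the other end. Stores the read() result in *read_result and the errno of
// the failed write() in *write_errno (0 if the write did not fail).
// Returns 0 on success, -1 if the socket pair could not be created.
int probe_closed_peer(ssize_t *read_result, int *write_errno) {
    assert(read_result != NULL && write_errno != NULL);

    // Process-wide: report EPIPE through errno instead of dying on SIGPIPE
    signal(SIGPIPE, SIG_IGN);

    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) return -1;

    close(fds[1]);  // The peer goes away

    char buf[8];
    *read_result = read(fds[0], buf, sizeof(buf));  // 0 means EOF
    *write_errno = (write(fds[0], "x", 1) < 0) ? errno : 0;

    close(fds[0]);
    return 0;
}
```

On this local pair the read reports EOF with a return value of 0 and the write fails with EPIPE. Robust code treats both, along with ECONNRESET, as the normal end of a connection rather than as a fatal condition.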
Sockets are the foundation of all network communication. Whether you're building microservices, web applications, or distributed systems, understanding sockets gives you power over the underlying mechanics.
What's Next:
We'll explore Message Queues—asynchronous communication systems that decouple producers from consumers, enable reliable message delivery, and form the backbone of many distributed architectures.
You now understand sockets, the fundamental operating system primitive for network communication. From socket creation and configuration to advanced I/O models, you have the knowledge to build, debug, and optimize network applications at the system level.