Socket API

The Programming Interface

The Socket API is the programming interface through which applications interact with the network stack. It's a collection of system calls that together enable the full spectrum of network operations: creating communication endpoints, establishing connections, transferring data, and managing socket behavior.

Originating from BSD Unix in 1983, this API has become the universal standard for network programming. Whether you're writing in C, Python, Java, Go, or JavaScript, the underlying concepts—and often the function names—mirror the original BSD socket interface.

Mastering the socket API means understanding not just what each function does, but when to use each, what errors to expect, and how they interact to create robust network applications.

What You Will Learn

By the end of this page, you will understand every major socket API function—socket(), bind(), listen(), accept(), connect(), send(), recv(), close()—along with socket options, I/O multiplexing basics, and error handling patterns. You'll be equipped to read and write socket code with confidence.

Socket Creation: socket()

Every socket operation begins with creating a socket. The socket() system call allocates a new socket and returns a file descriptor for subsequent operations.

Function Signature:

int socket(int domain, int type, int protocol);

Parameters:

Parameter	Description	Common Values
domain	Address family (protocol family)	AF_INET (IPv4), AF_INET6 (IPv6), AF_UNIX (local)
type	Socket type (communication semantics)	SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW
protocol	Specific protocol (usually 0 for default)	0, IPPROTO_TCP, IPPROTO_UDP

Return Value:

Success: Non-negative file descriptor
Failure: -1 with errno set

socket_creation.c
C
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
 
// Create a TCP socket (IPv4)
int create_tcp_socket_v4() {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        fprintf(stderr, "socket() failed: %s\n", strerror(errno));
        return -1;
    }
    return sockfd;
}
 
// Create a TCP socket (IPv6, dual-stack capable)
int create_tcp_socket_v6() {
    int sockfd = socket(AF_INET6, SOCK_STREAM, 0);
    if (sockfd == -1) {
        fprintf(stderr, "socket() failed: %s\n", strerror(errno));
        return -1;
    }
    return sockfd;
}
 
// Create a UDP socket
int create_udp_socket() {
    int sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd == -1) {
        fprintf(stderr, "socket() failed: %s\n", strerror(errno));
        return -1;
    }
    return sockfd;
}
 
// Create a socket with explicit protocol (equivalent to above)
int create_explicit_tcp_socket() {
    int sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sockfd == -1) {
        fprintf(stderr, "socket() failed: %s\n", strerror(errno));
        return -1;
    }
    return sockfd;
}
 
// Create socket with non-blocking and close-on-exec flags (Linux 2.6.27+)
int create_nonblocking_socket() {
    int sockfd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
    if (sockfd == -1) {
        fprintf(stderr, "socket() failed: %s\n", strerror(errno));
        return -1;
    }
    return sockfd;
}

What Happens Internally:

When socket() succeeds, the kernel:

Allocates a new file descriptor in the process's fd table
Creates a socket structure in kernel memory
Associates the socket with the specified protocol stack
Initializes send/receive buffers
Sets default socket options

The returned file descriptor can be used with standard file operations (read, write, close) as well as socket-specific operations.
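
As a minimal sketch of that last point, the helper below moves data through a socket using only `write()`, `read()`, and `close()`. The function name `echo_via_file_ops` is hypothetical, and `socketpair()` is used here only to get two connected local sockets without any server setup.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

// Sends msg through one end of a local socket pair with write() and reads
// it back from the other end with read(). Returns bytes read, or -1.
ssize_t echo_via_file_ops(const char *msg, char *out, size_t out_len) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) return -1;

    ssize_t n = -1;
    size_t len = strlen(msg);
    if (write(fds[0], msg, len) == (ssize_t)len) {
        n = read(fds[1], out, out_len);   // plain read() works on a socket fd
    }
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

The socket-specific calls covered later (send(), recv()) add flags on top of this behavior; with flags set to 0 they behave the same way on a connected socket.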

Common socket() Errors

•EACCES — Permission denied. Raw sockets require root privileges.
•EAFNOSUPPORT — Address family not supported. The kernel doesn't support this protocol family.
•EMFILE — Per-process file descriptor limit reached. Increase ulimit or fix descriptor leak.
•ENFILE — System-wide file table full. System is overloaded.
•ENOBUFS/ENOMEM — Insufficient memory for socket structures.
•EPROTONOSUPPORT — Protocol not supported. Check kernel configuration.

Modern Socket Flags

Linux 2.6.27+ allows combining socket type with SOCK_NONBLOCK and SOCK_CLOEXEC flags. This is atomic and avoids race conditions compared to calling fcntl() after socket(). Always use these flags in production code for better security (CLOEXEC) and for non-blocking servers.
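
For comparison, here is a sketch of the non-atomic fallback that the flags replace, which may still be needed on systems without SOCK_NONBLOCK and SOCK_CLOEXEC. The function name `create_nonblocking_socket_portable` is an assumption made for this example.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <fcntl.h>
#include <unistd.h>

// Portable fallback: create a TCP socket, then set O_NONBLOCK and FD_CLOEXEC
// with fcntl(). Unlike SOCK_NONBLOCK | SOCK_CLOEXEC this is not atomic: another
// thread could fork() and exec() between socket() and fcntl() and leak the fd.
int create_nonblocking_socket_portable(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;

    int status_flags = fcntl(fd, F_GETFL, 0);   // file status flags
    int fd_flags = fcntl(fd, F_GETFD, 0);       // descriptor flags
    if (status_flags == -1 || fd_flags == -1 ||
        fcntl(fd, F_SETFL, status_flags | O_NONBLOCK) == -1 ||
        fcntl(fd, F_SETFD, fd_flags | FD_CLOEXEC) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that O_NONBLOCK is a file status flag (F_SETFL) while FD_CLOEXEC is a descriptor flag (F_SETFD), so two separate fcntl() calls are required.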

Binding and Listening: bind() and listen()

bind() — Assign Local Address:

The bind() call associates a socket with a local address. For servers, this determines which IP:port combination will accept connections.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

Parameters:

sockfd: Socket file descriptor from socket()
addr: Pointer to address structure (sockaddr_in, sockaddr_in6)
addrlen: Size of the address structure

Return: 0 on success, -1 on failure

listen() — Enable Connection Queue:

The listen() call marks a socket as passive—ready to accept incoming connections. It also sets the size of the connection backlog queue.

int listen(int sockfd, int backlog);

Parameters:

sockfd: Bound socket file descriptor
backlog: Maximum pending connections queue size

Return: 0 on success, -1 on failure

bind_listen.c
C
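#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   // inet_ntop()
#include <stdint.h>      // uint16_t
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>      // close()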
 
#define BACKLOG 128  // Common default; tune based on load
 
// Complete server socket setup
int setup_server_socket(uint16_t port) {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("socket");
        return -1;
    }
    
    // CRITICAL: Set SO_REUSEADDR before bind
    int optval = 1;
    if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, 
                   &optval, sizeof(optval)) == -1) {
        perror("setsockopt SO_REUSEADDR");
        close(sockfd);
        return -1;
    }
    
    // Prepare address structure
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  // All interfaces
    addr.sin_port = htons(port);
    
    // Bind to address
    if (bind(sockfd, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
        perror("bind");
        close(sockfd);
        return -1;
    }
    
    // Enable listening
    if (listen(sockfd, BACKLOG) == -1) {
        perror("listen");
        close(sockfd);
        return -1;
    }
    
    printf("Server listening on port %d\n", port);
    return sockfd;
}
 
// Retrieve the bound address (useful when binding to port 0)
int get_bound_address(int sockfd) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    
    if (getsockname(sockfd, (struct sockaddr*)&addr, &len) == -1) {
        perror("getsockname");
        return -1;
    }
    
    char ip_str[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &addr.sin_addr, ip_str, sizeof(ip_str));
    printf("Bound to %s:%d\n", ip_str, ntohs(addr.sin_port));
    
    return ntohs(addr.sin_port);
}

Understanding the Backlog:

The backlog parameter in listen() specifies the maximum length of the queue for pending connections. This queue holds connections that have completed the TCP handshake but haven't been accept()ed yet.

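Too small: Connection attempts are dropped or refused during traffic spikes (by default Linux drops the handshake packets when the queue is full, so clients see retransmissions and timeouts rather than an immediate refusal)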
Too large: Memory waste; slow accept() causes connection timeout
Recommended: At least 128; busy servers use 1024+

Linux systems have a system-wide maximum (/proc/sys/net/core/somaxconn) that caps the backlog regardless of the value passed to listen().

The SYN Flood and Backlog

Pending-connection queues make servers vulnerable to SYN flood attacks: attackers send SYN packets but never complete the handshake, filling the queue of half-open connections so that legitimate clients cannot get in. Linux mitigates this with SYN cookies, but understanding the vulnerability helps in security analysis. Monitor connection queue metrics in production.

Common bind() and listen() Errors
Function	Error	Meaning	Solution
bind()	EADDRINUSE	Address already in use	Use SO_REUSEADDR or wait for TIME_WAIT
bind()	EACCES	Permission denied	Port < 1024 requires root (or CAP_NET_BIND_SERVICE on Linux)
bind()	EADDRNOTAVAIL	Address not available	IP not assigned to interface
bind()	EINVAL	Already bound	Socket was already bound
listen()	EADDRINUSE	Port in use	Another socket is listening
listen()	EOPNOTSUPP	Not supported	Socket type doesn't support listen
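
To make the EADDRINUSE row concrete, the sketch below binds and listens on the loopback address and reports the failing call through errno. The helper name `listen_on_loopback` and the backlog of 16 are assumptions for illustration; calling it twice with the same port reproduces the error.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <stdint.h>

// Creates a TCP socket bound to 127.0.0.1:port and starts listening.
// Pass port 0 to let the kernel choose; the chosen port is stored in
// *bound_port (if non-NULL). Returns the fd, or -1 with errno set.
int listen_on_loopback(uint16_t port, uint16_t *bound_port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, 16) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        int saved = errno;      // close() may overwrite errno
        close(fd);
        errno = saved;
        return -1;
    }
    if (bound_port) *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

A first call with port 0 succeeds and reports the chosen port; a second call with that same port fails with errno set to EADDRINUSE while the first socket is still open.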

Accepting Connections: accept()

The accept() call extracts the first pending connection from the backlog queue and creates a new socket for that connection. This new socket is used for communication with the connected client.

Function Signature:

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

Parameters:

sockfd: Listening socket file descriptor
addr: Buffer to receive client's address (can be NULL)
addrlen: In/out parameter—size of addr buffer / actual address size

Return:

Success: New socket file descriptor for the accepted connection
Failure: -1 with errno set

Critical Understanding:

After accept(), you have two sockets:

Listening socket (original): Continues accepting new connections
Connected socket (new): Used to communicate with this specific client
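
The two-socket relationship can be observed directly. The following is a sketch under the assumption of a single process talking to itself over loopback; the name `accept_demo` and the three-element output array are invented for this example.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <unistd.h>

// Demonstrates that accept() returns a NEW descriptor while the listening
// socket stays open. On success fills fds[0] = listening socket,
// fds[1] = client side, fds[2] = accepted (server side) and returns 0.
// Returns -1 on any failure, with nothing left open.
int accept_demo(int fds[3]) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      // kernel picks a free port

    fds[0] = socket(AF_INET, SOCK_STREAM, 0);
    fds[1] = socket(AF_INET, SOCK_STREAM, 0);
    fds[2] = -1;
    if (fds[0] != -1 && fds[1] != -1 &&
        bind(fds[0], (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(fds[0], 8) == 0 &&
        getsockname(fds[0], (struct sockaddr *)&addr, &len) == 0 &&
        // The kernel completes the handshake; accept() only dequeues it
        connect(fds[1], (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        fds[2] = accept(fds[0], NULL, NULL);
    }
    if (fds[2] != -1) return 0;
    if (fds[0] != -1) close(fds[0]);
    if (fds[1] != -1) close(fds[1]);
    return -1;
}
```

After a successful call, data written to fds[1] arrives on fds[2], while fds[0] carries no data at all and remains available for further accept() calls.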

accept_connections.c
C
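#define _GNU_SOURCE      // Exposes accept4() in glibc; must precede all includes
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>       // getnameinfo(), NI_MAXHOST, NI_MAXSERV
#include <stdio.h>
#include <unistd.h>
 
// Application-defined connection handler (placeholder; signature assumed)
void handle_client(int client_fd);
 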
// Basic accept loop
void accept_loop(int listen_fd) {
    struct sockaddr_in client_addr;
    socklen_t addr_len;
    
    while (1) {
        addr_len = sizeof(client_addr);
        
        // accept() blocks until a connection arrives
        int client_fd = accept(listen_fd, 
                               (struct sockaddr*)&client_addr, 
                               &addr_len);
        
        if (client_fd == -1) {
            perror("accept");
            continue;  // Don't crash on transient errors
        }
        
        // Log the connection
        char ip_str[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &client_addr.sin_addr, ip_str, sizeof(ip_str));
        printf("Connection from %s:%d (fd=%d)\n", 
               ip_str, ntohs(client_addr.sin_port), client_fd);
        
        // Handle the client (simplified - real servers use threads/async)
        handle_client(client_fd);
        
        // Close when done
        close(client_fd);
    }
}
 
// accept4() - Linux-specific with flags
int accept_nonblocking(int listen_fd) {
    struct sockaddr_storage client_addr;
    socklen_t addr_len = sizeof(client_addr);
    
    // Atomically set NONBLOCK and CLOEXEC on the new socket
    int client_fd = accept4(listen_fd,
                           (struct sockaddr*)&client_addr,
                           &addr_len,
                           SOCK_NONBLOCK | SOCK_CLOEXEC);
    
    return client_fd;
}
 
// Getting peer information after accept
void log_connection_details(int client_fd) {
    struct sockaddr_storage peer_addr;
    socklen_t peer_len = sizeof(peer_addr);
    
    // Get peer (remote) address
    if (getpeername(client_fd, (struct sockaddr*)&peer_addr, &peer_len) == 0) {
        char host[NI_MAXHOST], port[NI_MAXSERV];
        getnameinfo((struct sockaddr*)&peer_addr, peer_len,
                    host, sizeof(host), port, sizeof(port),
                    NI_NUMERICHOST | NI_NUMERICSERV);
        printf("Peer: %s:%s\n", host, port);
    }
    
    // Get local address of this socket
    struct sockaddr_storage local_addr;
    socklen_t local_len = sizeof(local_addr);
    
    if (getsockname(client_fd, (struct sockaddr*)&local_addr, &local_len) == 0) {
        char host[NI_MAXHOST], port[NI_MAXSERV];
        getnameinfo((struct sockaddr*)&local_addr, local_len,
                    host, sizeof(host), port, sizeof(port),
                    NI_NUMERICHOST | NI_NUMERICSERV);
        printf("Local: %s:%s\n", host, port);
    }
}

Blocking vs. Non-Blocking Accept

By default, accept() blocks if no connections are pending. For non-blocking servers, either set the listening socket to non-blocking mode (accept() returns EAGAIN/EWOULDBLOCK when empty) or use I/O multiplexing (select/poll/epoll) to wait for incoming connections without blocking.

Common accept() Errors

•EAGAIN/EWOULDBLOCK — Non-blocking socket and no connections pending. Normal condition—retry later.
•EINTR — Interrupted by signal before any connection arrived. Retry the accept() call.
•EMFILE — Process file descriptor limit hit. Close unused fds or increase limit.
•ENFILE — System-wide fd limit hit. System overloaded.
•ECONNABORTED — Connection was aborted before accept completed. Retry.
•ENOBUFS/ENOMEM — Kernel memory exhausted for socket. Severe system issue.

Connecting as Client: connect()

The connect() call initiates a connection to a remote server. For TCP, this triggers the three-way handshake.

Function Signature:

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

Parameters:

sockfd: Socket file descriptor from socket()
addr: Address of the remote server
addrlen: Size of the address structure

Return:

Success: 0
Failure: -1 with errno set

For TCP (SOCK_STREAM):

connect() initiates the three-way handshake and blocks until the connection is established or fails. If the socket wasn't explicitly bound, the kernel assigns a local address (ephemeral port).

For UDP (SOCK_DGRAM):

connect() simply sets the default destination address. No network traffic occurs. Subsequent send() calls will use this destination, and the socket will only receive datagrams from this address.
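
The UDP case can be sketched with two sockets on the loopback interface. The helper names `udp_loopback_socket` and `udp_connected_send` are hypothetical; the point is that send() needs no destination once connect() has recorded one.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <unistd.h>

// Creates a UDP socket bound to 127.0.0.1 on a kernel-chosen port and
// stores the resulting address in *addr. Returns the fd or -1.
int udp_loopback_socket(struct sockaddr_in *addr) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1) return -1;

    socklen_t len = sizeof(*addr);
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) == -1 ||
        getsockname(fd, (struct sockaddr *)addr, &len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

// Sends one datagram over a connected UDP socket. connect() only records the
// peer address locally; after it, plain send() needs no destination argument.
ssize_t udp_connected_send(int fd, const struct sockaddr_in *peer,
                           const void *buf, size_t len) {
    if (connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) == -1) {
        return -1;
    }
    return send(fd, buf, len, 0);
}
```

The receiver can still use recvfrom() to learn the sender's address; only the connected side is restricted to a single peer.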

connect_client.c
C
#include <sys/socket.h>
#include <sys/select.h>  // select(), fd_set
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
 
// Simple blocking connect
int connect_blocking(const char *host, const char *port) {
    struct addrinfo hints, *res, *p;
    
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP
    
    int err = getaddrinfo(host, port, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }
    
    int sockfd = -1;  // Stays -1 if no address could be connected
    for (p = res; p != NULL; p = p->ai_next) {
        sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (sockfd == -1) continue;
        
        if (connect(sockfd, p->ai_addr, p->ai_addrlen) == 0) {
            break;  // Connected!
        }
        
        close(sockfd);
        sockfd = -1;
    }
    
    freeaddrinfo(res);
    
    if (p == NULL) {
        fprintf(stderr, "Failed to connect to %s:%s\n", host, port);
        return -1;
    }
    
    return sockfd;
}
 
// Non-blocking connect with timeout
int connect_with_timeout(const char *host, const char *port, int timeout_sec) {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) return -1;
    
    // Set non-blocking
    int flags = fcntl(sockfd, F_GETFL, 0);
    fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
    
    // Resolve and prepare address
    struct addrinfo hints = {0}, *res;
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    
    if (getaddrinfo(host, port, &hints, &res) != 0) {
        close(sockfd);
        return -1;
    }
    
    // Start connection (non-blocking returns immediately)
    int ret = connect(sockfd, res->ai_addr, res->ai_addrlen);
    freeaddrinfo(res);
    
    if (ret == 0) {
        // Connected immediately (rare, but possible with loopback)
        fcntl(sockfd, F_SETFL, flags);  // Restore blocking mode
        return sockfd;
    }
    
    if (errno != EINPROGRESS) {
        close(sockfd);
        return -1;
    }
    
    // Wait for connection to complete with timeout
    fd_set write_fds;
    FD_ZERO(&write_fds);
    FD_SET(sockfd, &write_fds);
    
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    
    ret = select(sockfd + 1, NULL, &write_fds, NULL, &tv);
    
    if (ret <= 0) {
        // Timeout or error
        close(sockfd);
        return -1;
    }
    
    // Check if connection succeeded
    int error = 0;
    socklen_t len = sizeof(error);
    getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &len);
    
    if (error != 0) {
        close(sockfd);
        errno = error;
        return -1;
    }
    
    // Success! Restore blocking mode
    fcntl(sockfd, F_SETFL, flags);
    return sockfd;
}

Common connect() Errors and Their Meanings
Error	Meaning	Typical Cause
ECONNREFUSED	Connection refused	Server not listening on that port
ETIMEDOUT	Connection timed out	No response to SYN (host down, or packets silently dropped by a firewall)
ENETUNREACH	Network unreachable	No route to destination network
EHOSTUNREACH	Host unreachable	Target host is down or no route
EINPROGRESS	Connection in progress	Non-blocking socket; connection started
EALREADY	Already connecting	Previous non-blocking connect not complete
EISCONN	Already connected	Socket is already connected
EADDRNOTAVAIL	Address not available	No local address available

The EINPROGRESS Dance

For non-blocking TCP connect(), EINPROGRESS means the connection is underway—not an error. You must wait for the socket to become writable (using select/poll/epoll), then check SO_ERROR to determine if the connection succeeded or failed. This pattern is essential for connection timeouts and handling unreachable hosts gracefully.

Sending Data: send(), write(), sendto()

Once a connection is established (TCP) or destination is known (UDP), applications send data using several related functions:

send() — Socket-Specific Send:

ssize_t send(int sockfd, const void *buf, size_t len, int flags);

write() — Generic File Descriptor Write:

ssize_t write(int fd, const void *buf, size_t count);

sendto() — Datagram with Explicit Destination:

ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
               const struct sockaddr *dest_addr, socklen_t addrlen);

For TCP sockets, send(sockfd, buf, len, 0) is equivalent to write(sockfd, buf, len). The difference is that send() accepts flags for special behavior.

Important send() Flags

•MSG_DONTWAIT — Non-blocking send even if socket is blocking. Returns EAGAIN if send would block.
•MSG_NOSIGNAL — Don't generate SIGPIPE if peer has closed. Return EPIPE error instead. Essential for robust servers.
•MSG_MORE (Linux) — More data coming; don't send yet (like TCP_CORK). Helps with write coalescing.
•MSG_OOB — Send as out-of-band data (TCP urgent data). Rarely used in modern applications.
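
The effect of MSG_NOSIGNAL is easiest to see against a peer that has already gone away. The sketch below uses a local socket pair for self-containment; the function name `send_to_closed_peer` is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>
#include <errno.h>

// Sends on a socket whose peer has already closed, with MSG_NOSIGNAL set.
// Returns the errno reported by send() (expected: EPIPE), 0 if send()
// unexpectedly succeeded, or -1 if the socket pair could not be created.
int send_to_closed_peer(void) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) return -1;

    close(fds[1]);                                   // peer goes away

    // Without MSG_NOSIGNAL this send would raise SIGPIPE, whose default
    // action terminates the process
    ssize_t n = send(fds[0], "x", 1, MSG_NOSIGNAL);
    int result = (n == -1) ? errno : 0;

    close(fds[0]);
    return result;
}
```

With the flag set, the failure shows up as an ordinary return value that the caller can handle per connection, rather than as a process-wide signal.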

send_data.c
C
#include <sys/socket.h>
#include <sys/types.h>   // ssize_t
#include <sys/uio.h>     // struct iovec
#include <string.h>
#include <errno.h>
#include <stdio.h>
 
// Robust send that handles partial writes
ssize_t send_all(int sockfd, const void *buf, size_t len) {
    size_t total_sent = 0;
    const char *ptr = buf;
    
    while (total_sent < len) {
        ssize_t sent = send(sockfd, ptr + total_sent, len - total_sent,
                           MSG_NOSIGNAL);  // Prevent SIGPIPE
        
        if (sent == -1) {
            if (errno == EINTR) continue;  // Interrupted, retry
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                // Non-blocking and would block
                // In real code: use select/poll/epoll to wait
                continue;
            }
            return -1;  // Real error
        }
        
        if (sent == 0) {
            // Connection closed by peer (unusual for send)
            return total_sent;
        }
        
        total_sent += sent;
    }
    
    return total_sent;
}
 
// Sending UDP datagram
ssize_t send_udp(int sockfd, const void *buf, size_t len,
                 const struct sockaddr *dest, socklen_t dest_len) {
    ssize_t sent = sendto(sockfd, buf, len, 0, dest, dest_len);
    
    if (sent == -1) {
        perror("sendto");
        return -1;
    }
    
    // For UDP, partial sends don't make sense
    // Either the entire datagram is sent or it fails
    return sent;
}
 
// Using gather I/O (several buffers in one system call; on TCP the send may still be partial)
ssize_t send_gathered(int sockfd, const char *header, size_t hdr_len,
                      const char *body, size_t body_len) {
    struct iovec iov[2];
    iov[0].iov_base = (void*)header;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = (void*)body;
    iov[1].iov_len = body_len;
    
    struct msghdr msg = {0};
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    
    return sendmsg(sockfd, &msg, MSG_NOSIGNAL);
}

Partial Sends Are Normal

send() and write() may send fewer bytes than requested—this is not an error! The kernel's send buffer may be full, or the network may be congested. Always check the return value and loop to send remaining data. This is so common that 'send_all' helper functions appear in nearly every networking codebase.

TCP vs. UDP Sending Semantics:

Aspect	TCP (SOCK_STREAM)	UDP (SOCK_DGRAM)
Partial sends	Common; must loop	No; datagram sent whole or fails
Ordering	Guaranteed	Not guaranteed
Reliability	Guaranteed delivery	Best-effort
Size limit	No inherent limit (streams)	65,507 bytes of payload per datagram (IPv4)
Buffering	Kernel manages send buffer	Usually immediate send
Destination	Implicit (connected)	Can specify per-send

Receiving Data: recv(), read(), recvfrom()

Receiving data is the counterpart to sending, with similar function variants:

recv() — Socket-Specific Receive:

ssize_t recv(int sockfd, void *buf, size_t len, int flags);

read() — Generic File Descriptor Read:

ssize_t read(int fd, void *buf, size_t count);

recvfrom() — Datagram with Source Address:

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                 struct sockaddr *src_addr, socklen_t *addrlen);

Return Value Semantics:

Positive: Number of bytes received
Zero: Peer has closed the connection (EOF)
-1: Error occurred (check errno)

Important recv() Flags

•MSG_PEEK — Return data without removing it from the receive queue. Useful for protocol detection.
•MSG_WAITALL — Block until the full requested amount is received (or error/EOF). Use with caution.
•MSG_DONTWAIT — Non-blocking receive even on blocking socket.
•MSG_TRUNC (UDP) — Return full datagram size even if buffer was too small.
•MSG_OOB — Receive out-of-band data.
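
A short sketch of MSG_PEEK in action, assuming a local socket pair as the transport: the same bytes are returned twice, first by the peek and then by the ordinary receive. The function name `peek_then_read` and its output parameters are invented for this example.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>

// Sends msg over a local socket pair, peeks at it, then reads it for real.
// Stores both recv() results and returns 0, or -1 if setup or sending fails.
int peek_then_read(const char *msg, char *peeked, char *consumed, size_t cap,
                   ssize_t *n_peek, ssize_t *n_read) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) return -1;

    int rc = -1;
    size_t len = strlen(msg);
    if (send(fds[0], msg, len, 0) == (ssize_t)len) {
        *n_peek = recv(fds[1], peeked, cap, MSG_PEEK);  // data stays queued
        *n_read = recv(fds[1], consumed, cap, 0);       // same bytes, now removed
        rc = 0;
    }
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

A typical use is sniffing the first bytes of a connection to decide which protocol handler should consume the stream.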

recv_data.c
C
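#include <sys/socket.h>
#include <sys/types.h>   // ssize_t
#include <sys/uio.h>     // struct iovec
#include <errno.h>
#include <stdio.h>
 
// Application-defined message handler (placeholder; signature assumed)
void process_message(const char *buf, size_t len);
 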
// Receive exactly n bytes (TCP)
ssize_t recv_exact(int sockfd, void *buf, size_t len) {
    size_t total_recv = 0;
    char *ptr = buf;
    
    while (total_recv < len) {
        ssize_t n = recv(sockfd, ptr + total_recv, len - total_recv, 0);
        
        if (n == -1) {
            if (errno == EINTR) continue;  // Signal interrupted, retry
            return -1;  // Real error
        }
        
        if (n == 0) {
            // Peer closed connection before we got all data
            // This might be an error depending on protocol
            break;
        }
        
        total_recv += n;
    }
    
    return total_recv;
}
 
// Peek at incoming data without consuming it
ssize_t peek_protocol_header(int sockfd, char *buf, size_t len) {
    // MSG_PEEK copies data out but leaves it in the receive queue,
    // so the next recv() returns the same bytes again
    return recv(sockfd, buf, len, MSG_PEEK);
}
 
// Receive UDP datagram with sender address
ssize_t recv_udp(int sockfd, void *buf, size_t len,
                 struct sockaddr *sender, socklen_t *sender_len) {
    ssize_t n = recvfrom(sockfd, buf, len, 0, sender, sender_len);
    
    if (n == -1) {
        perror("recvfrom");
        return -1;
    }
    
    // For UDP: entire datagram is returned (or truncated if buffer too small)
    return n;
}
 
// Common pattern: message loop
void message_loop(int sockfd) {
    char buf[4096];
    
    while (1) {
        ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
        
        if (n == -1) {
            if (errno == EINTR) continue;
            perror("recv");
            break;
        }
        
        if (n == 0) {
            printf("Connection closed by peer\n");
            break;
        }
        
        // Process the received data
        process_message(buf, n);
    }
}
 
// Scatter-gather receive
ssize_t recv_scattered(int sockfd, char *header, size_t hdr_len,
                       char *body, size_t body_len) {
    struct iovec iov[2];
    iov[0].iov_base = header;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = body;
    iov[1].iov_len = body_len;
    
    struct msghdr msg = {0};
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    
    return recvmsg(sockfd, &msg, 0);
}

Zero Return Means EOF

When recv() or read() returns 0 on a stream socket, the peer has closed their sending side of the connection. This is the normal end-of-data signal—your application should close its end and clean up. Don't confuse return value 0 with an error; it's the expected way to detect connection closure.

TCP Streaming Nature:

TCP is a byte stream protocol. If the sender sends two 100-byte messages, the receiver might receive:

One 200-byte chunk, or
One 150-byte and one 50-byte chunk, or
Four 50-byte chunks, or
Any other combination

Message boundaries are not preserved. Applications must implement their own framing:

Length-prefixed: First send message length, then message content
Delimiter-based: Use special characters (newlines, null bytes) as separators
Fixed-size: All messages are the same size
Self-describing: Messages encode their own boundaries (JSON, Protocol Buffers)
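
The length-prefixed option can be sketched as follows. The frame layout (a 4-byte big-endian length followed by the payload) and the names `send_frame` and `recv_frame` are assumptions of this example, not a standard wire format.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <errno.h>

// Writes exactly len bytes, looping over partial sends. Returns 0 or -1.
static int write_full(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

// Reads exactly len bytes. Returns 0, or -1 on error or premature EOF.
static int read_full(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == -1 && errno == EINTR) continue;
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

// Sends one frame: 4-byte big-endian length, then the payload bytes.
int send_frame(int fd, const void *payload, uint32_t len) {
    uint32_t be_len = htonl(len);
    if (write_full(fd, &be_len, sizeof(be_len)) == -1) return -1;
    return write_full(fd, payload, len);
}

// Receives one frame into buf. Returns the payload length, or -1 on error,
// EOF, or a frame larger than cap (the stream is then out of sync).
ssize_t recv_frame(int fd, void *buf, size_t cap) {
    uint32_t be_len;
    if (read_full(fd, &be_len, sizeof(be_len)) == -1) return -1;
    uint32_t len = ntohl(be_len);
    if (len > cap) return -1;
    if (read_full(fd, buf, len) == -1) return -1;
    return (ssize_t)len;
}
```

Two frames sent back to back come out as two separate messages on the receiving side, regardless of how TCP happened to chunk the bytes in transit. Checking the length against a cap before reading also keeps a malformed or hostile prefix from forcing a huge allocation.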

Closing Sockets: close() and shutdown()

Properly closing sockets is critical for resource management and correct protocol behavior.

close() — Release Socket Resources:

int close(int fd);

close() releases the file descriptor immediately and decrements the socket's reference count. When the count reaches zero, the kernel:

Sends any remaining buffered data
Initiates TCP connection termination (FIN)
Eventually frees socket resources

shutdown() — Partial Connection Closure:

int shutdown(int sockfd, int how);

shutdown() allows closing one direction of a bidirectional connection:

SHUT_RD: Close reading side; further receives return EOF
SHUT_WR: Close writing side; sends FIN to peer
SHUT_RDWR: Close both directions (similar to close)

close() vs. shutdown() Comparison
Aspect	close()	shutdown()
Reference counting	Decrements count; closes when zero	Immediate effect regardless of refs
Forked processes	Each process needs own close()	Affects all copies of socket
Partial close	No; closes everything	Yes; can close read/write separately
FIN timing	Sends FIN when refcount hits zero	SHUT_WR sends FIN immediately
Resource release	Releases fd	Does NOT release fd
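
The "partial close" and "does not release fd" rows can be observed with a small experiment. This sketch uses a local socket pair rather than TCP for self-containment, and the name `half_close_demo` is hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Half-closes one end of a local socket pair and records what each side sees.
// Outputs: *peer_read is recv() on the peer after SHUT_WR (0 means EOF);
// *reply_read is how many bytes the half-closed side still received.
// Returns 0 on success, -1 if setup fails.
int half_close_demo(ssize_t *peer_read, ssize_t *reply_read) {
    int fds[2];
    char buf[8];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) return -1;

    int rc = -1;
    if (shutdown(fds[0], SHUT_WR) == 0) {                  // fds[0] is done sending
        *peer_read = recv(fds[1], buf, sizeof(buf), 0);    // peer sees EOF
        if (send(fds[1], "ok", 2, MSG_NOSIGNAL) == 2) {    // reverse path still open
            *reply_read = recv(fds[0], buf, sizeof(buf), 0);
            rc = 0;
        }
    }
    close(fds[0]);   // shutdown() did not release the descriptors
    close(fds[1]);
    return rc;
}
```

The peer reads EOF immediately after the half-close, yet its two-byte reply still arrives, which is exactly the behavior request/response protocols rely on.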

close_shutdown.c
C
#include <sys/socket.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
 
// Simple close
void simple_close(int sockfd) {
    if (close(sockfd) == -1) {
        perror("close");
    }
}
 
// Graceful HTTP-style shutdown (send then read until EOF)
void graceful_close(int sockfd) {
    // Signal that we're done sending
    if (shutdown(sockfd, SHUT_WR) == -1) {
        perror("shutdown");
        close(sockfd);
        return;
    }
    
    // Read remaining data from peer until they close
    char buf[4096];
    while (1) {
        ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
        if (n <= 0) break;  // EOF or error
        // Process remaining data...
    }
    
    // Now fully close
    close(sockfd);
}
 
// Abortive close (immediate RST, no FIN handshake)
void abortive_close(int sockfd) {
    struct linger l = {
        .l_onoff = 1,   // Enable linger
        .l_linger = 0   // Zero timeout = RST on close
    };
    
    setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
    close(sockfd);  // Sends RST immediately
}
 
// Half-close pattern for request/response protocols
void half_close_pattern(int sockfd, const char *request, size_t req_len) {
    // Send request
    send(sockfd, request, req_len, 0);
    
    // Signal we're done sending
    shutdown(sockfd, SHUT_WR);
    
    // Read entire response until server closes
    char response[65536];
    size_t total = 0;
    ssize_t n;
    
    while ((n = recv(sockfd, response + total, 
                     sizeof(response) - total, 0)) > 0) {
        total += n;
    }
    
    // Done with connection
    close(sockfd);
}

Data Loss on Close

close() may discard unsent data in the socket buffer if the peer isn't reading. For important data, either: (1) use shutdown(SHUT_WR) and wait for application-level acknowledgment, or (2) set SO_LINGER with a non-zero timeout to wait for data delivery. The default behavior prioritizes quick resource release over guaranteed delivery.

Why shutdown() Matters:

Forked servers: After fork(), both parent and child have the socket. Parent needs shutdown(SHUT_WR) to send FIN; just close() won't work until child also closes.
Request/Response protocols: Client sends request, then shutdown(SHUT_WR). Server reads until EOF, knows request is complete, sends response, closes.
Detecting peer closure: After shutdown(SHUT_WR), you can still recv() to see if peer sends anything before closing.
End-of-stream signaling: Some protocols use shutdown(SHUT_WR) to mark the end of the outgoing stream while the socket stays open for data still arriving from the peer.

Socket Options: setsockopt() and getsockopt()

Socket behavior can be tuned through options. The setsockopt() and getsockopt() functions provide this control.

Function Signatures:

int setsockopt(int sockfd, int level, int optname,
               const void *optval, socklen_t optlen);
               
int getsockopt(int sockfd, int level, int optname,
               void *optval, socklen_t *optlen);

Levels:

SOL_SOCKET: Generic socket options
IPPROTO_TCP: TCP-specific options
IPPROTO_IP: IPv4 options
IPPROTO_IPV6: IPv6 options

Essential Socket Options
Option	Level	Purpose	Typical Value
SO_REUSEADDR	SOL_SOCKET	Reuse address in TIME_WAIT	1 (enabled)
SO_REUSEPORT	SOL_SOCKET	Multiple sockets on same port	1 (enabled)
SO_KEEPALIVE	SOL_SOCKET	Enable TCP keepalive probes	1 (enabled)
SO_RCVBUF	SOL_SOCKET	Receive buffer size	Bytes
SO_SNDBUF	SOL_SOCKET	Send buffer size	Bytes
SO_LINGER	SOL_SOCKET	Linger on close behavior	struct linger
TCP_NODELAY	IPPROTO_TCP	Disable Nagle's algorithm	1 (Nagle disabled)
TCP_CORK	IPPROTO_TCP	Cork output until uncorked	1/0
TCP_KEEPIDLE	IPPROTO_TCP	Seconds before keepalive start	Seconds
TCP_KEEPINTVL	IPPROTO_TCP	Seconds between keepalive probes	Seconds
TCP_KEEPCNT	IPPROTO_TCP	Keepalive probe count	Count
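
Because the kernel may adjust or clamp a requested value, it can be useful to read an option back after setting it. The helper below is a sketch for integer-valued options only; the name `set_and_read_int_option` is an assumption.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

// Sets an integer-valued socket option and reads it back.
// Returns 0 and stores the kernel's value in *effective, or -1 on error.
int set_and_read_int_option(int fd, int level, int optname,
                            int value, int *effective) {
    if (setsockopt(fd, level, optname, &value, sizeof(value)) == -1) return -1;

    socklen_t len = sizeof(*effective);
    return getsockopt(fd, level, optname, effective, &len);
}
```

For boolean options such as TCP_NODELAY the value read back is non-zero when enabled. For SO_RCVBUF and SO_SNDBUF the result usually differs from the request: Linux doubles the requested size to account for bookkeeping overhead and caps it at a system-wide limit.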

socket_options.c
C
#include <sys/socket.h>
#include <netinet/tcp.h>
#include <netinet/in.h>
#include <stdio.h>

// NOTE: setsockopt() return values are ignored below for brevity.
// Production code should check each call for -1 and inspect errno.

// Configure server socket for production use
void configure_server_socket(int sockfd) {
    int optval = 1;
    
    // Essential: allow address reuse after restart (set before bind())
    setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));
    
    // Enable TCP keepalive
    setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &optval, sizeof(optval));
    
#ifdef __linux__
    // Linux-specific: tune keepalive timing
    int idle = 60;     // Start keepalive after 60s idle
    int interval = 10; // Send probes every 10s
    int count = 5;     // Fail after 5 unanswered probes
    
    setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
    setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
#endif
}

// Configure client socket for low latency
void configure_low_latency_socket(int sockfd) {
    int optval = 1;
    
    // Disable Nagle's algorithm for immediate sends
    setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));
    
    // For interactive applications, a smaller send buffer can reduce latency
    int bufsize = 16 * 1024;  // 16 KB; defaults are usually larger and vary by OS
    setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
}

// Configure socket for high throughput
void configure_high_throughput_socket(int sockfd) {
    // Large buffers for high-bandwidth connections
    // (on Linux the request is capped by net.core.rmem_max / wmem_max)
    int bufsize = 1024 * 1024;  // 1 MB
    
    setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));
    setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
    
    // Keep Nagle enabled for better batching (default, but explicit)
    int optval = 0;
    setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));
}

// Query current socket options
void print_socket_info(int sockfd) {
    int optval;
    socklen_t optlen;
    
    // optlen is a value-result argument, so reset it before every call
    optlen = sizeof(optval);
    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &optval, &optlen) == 0)
        printf("Receive buffer: %d bytes\n", optval);
    
    optlen = sizeof(optval);
    if (getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &optval, &optlen) == 0)
        printf("Send buffer: %d bytes\n", optval);
    
    optlen = sizeof(optval);
    if (getsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &optval, &optlen) == 0)
        printf("TCP_NODELAY: %s\n", optval ? "on" : "off");
}

TCP_NODELAY vs. Performance

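Nagle's algorithm holds back small segments while earlier data is still unacknowledged, so that they can be batched. Disabling it (TCP_NODELAY=1) reduces latency for interactive applications but can reduce throughput by putting more small packets on the wire. In request/response protocols that issue two small writes before a read, Nagle's algorithm combined with the receiver's delayed ACKs can add artificial delays of tens of milliseconds. Understand your workload before tuning; the defaults are reasonable for most applications.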
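
Disabling Nagle is not the only way to control how small writes are batched. As a rough, Linux-specific sketch, the TCP_CORK option from the table above can hold a header and body back until both are queued, then release them together. The helper name send_corked() and the header/body split are assumptions made for illustration, and short sends are treated as failures to keep the example small.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper (Linux-only): queue header and body while the
 * socket is corked, then uncork so they are flushed together.
 * Returns 0 on success, -1 on failure. Real code would loop on
 * partial sends instead of treating them as errors. */
int send_corked(int sockfd, const void *hdr, size_t hdr_len,
                const void *body, size_t body_len) {
#ifdef TCP_CORK
    int on = 1, off = 0;
    int rc = 0;

    if (setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) == -1)
        return -1;

    if (send(sockfd, hdr, hdr_len, MSG_NOSIGNAL) != (ssize_t)hdr_len ||
        send(sockfd, body, body_len, MSG_NOSIGNAL) != (ssize_t)body_len)
        rc = -1;

    /* Always uncork, even after a failed send, to flush what was queued. */
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)) == -1)
        rc = -1;
    return rc;
#else
    (void)sockfd; (void)hdr; (void)hdr_len; (void)body; (void)body_len;
    return -1;  /* TCP_CORK is not available on this platform */
#endif
}
```

Compared with TCP_NODELAY, corking keeps batching under the application's control for one logical message, at the cost of two extra system calls and reduced portability.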

Summary: The Socket API

We've covered the complete socket API—the building blocks for all network programming. Let's consolidate the essential functions and patterns:

Key Takeaways

•socket() creates the endpoint — Specifies domain (IPv4/IPv6), type (stream/datagram), and optionally protocol.
•bind() assigns local address — Essential for servers; optional for clients (OS assigns ephemeral port).
•listen() enables connection queue — Transforms socket into passive listener with configurable backlog.
•accept() extracts pending connections — Returns a NEW socket for each client; listening socket continues accepting.
•connect() initiates client connection — Triggers TCP handshake; for UDP, sets default destination.
•send()/recv() transfer data — Handle partial transfers; check return values; use flags for special behavior.
•close()/shutdown() terminate connections — shutdown() for half-close; close() for full release.
•setsockopt()/getsockopt() tune behavior — Essential options: SO_REUSEADDR, TCP_NODELAY, SO_KEEPALIVE.
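
To see the calls from this list working together, the sketch below runs both ends of a TCP connection over loopback inside one process. It is an illustration only: loopback_echo() is a hypothetical function, the message is assumed to be small enough to fit in the socket buffers, and a real server would run its accept loop separately and loop on partial transfers.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: echo `msg` through a loopback TCP connection.
 * Copies the echoed bytes into `out` (capacity `cap`) and returns the
 * number of bytes echoed, or -1 on error. */
ssize_t loopback_echo(const char *msg, char *out, size_t cap) {
    int lfd = -1, cfd = -1, sfd = -1;
    ssize_t n, result = -1;
    char tmp[256];
    size_t len = strlen(msg);
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);

    if (len == 0 || len > sizeof(tmp) || len > cap) return -1;

    /* socket() + bind() + listen(): passive endpoint on an OS-chosen port */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1) goto done;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto done;
    if (listen(lfd, 1) == -1) goto done;
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) == -1) goto done;

    /* connect(): the kernel completes the handshake on behalf of the
     * listener, so this returns before accept() is called */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd == -1) goto done;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto done;

    /* accept(): returns a NEW connected socket; lfd keeps listening */
    sfd = accept(lfd, NULL, NULL);
    if (sfd == -1) goto done;

    /* Client: send the request, then half-close to signal its end */
    if (send(cfd, msg, len, 0) != (ssize_t)len) goto done;
    if (shutdown(cfd, SHUT_WR) == -1) goto done;

    /* Server: read until EOF, echo back, then half-close as well */
    n = recv(sfd, tmp, sizeof(tmp), MSG_WAITALL);
    if (n <= 0) goto done;
    if (send(sfd, tmp, (size_t)n, 0) != n) goto done;
    if (shutdown(sfd, SHUT_WR) == -1) goto done;

    /* Client: read the response until EOF */
    result = recv(cfd, out, cap, MSG_WAITALL);

done:
    if (sfd != -1) close(sfd);
    if (cfd != -1) close(cfd);
    if (lfd != -1) close(lfd);
    return result;
}
```

Calling loopback_echo("ping", buf, sizeof(buf)) exercises socket(), bind(), listen(), connect(), accept(), send(), recv(), shutdown(), and close() in the order a client and server would normally use them.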

What's Next:

With the API understood, we're ready to see it in action. The next page explores Client/Server Sockets—the architectural patterns for building network applications, from simple iterative servers to scalable concurrent designs using threads, processes, and event-driven I/O.

API Mastered

You now understand the complete Socket API—every major function, its parameters, return values, and common errors. This knowledge enables you to read, write, and debug socket code in any language that wraps these system calls. The patterns you've learned apply from C to Python to Go.

3 / 5

Loading learning content...

Computer NetworksTransport Layer Concepts

Sockets

LevelIntermediate

Duration60 mins

TopicTransport Layer Concepts

3 / 5

Socket API

The Programming Interface

Mastering the socket API means understanding not just what each function does, but when to use each, what errors to expect, and how they interact to create robust network applications.

What You Will Learn

Socket Creation: socket()

Every socket operation begins with creating a socket. The socket() system call allocates a new socket and returns a file descriptor for subsequent operations.

Function Signature:

int socket(int domain, int type, int protocol);

Parameters:

Parameter	Description	Common Values
domain	Address family (protocol family)	AF_INET (IPv4), AF_INET6 (IPv6), AF_UNIX (local)
type	Socket type (communication semantics)	SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW
protocol	Specific protocol (usually 0 for default)	0, IPPROTO_TCP, IPPROTO_UDP

Return Value:

Success: Non-negative file descriptor
Failure: -1 with errno set

socket_creation.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
 
// Create a TCP socket (IPv4)
int create_tcp_socket_v4() {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        fprintf(stderr, "socket() failed: %s\n", strerror(errno));
        return -1;
    }
    return sockfd;
}
 
// Create a TCP socket (IPv6, dual-stack capable)
int create_tcp_socket_v6() {
    int sockfd = socket(AF_INET6, SOCK_STREAM, 0);
    if (sockfd == -1) {
        fprintf(stderr, "socket() failed: %s\n", strerror(errno));
        return -1;
    }
    return sockfd;
}
 
// Create a UDP socket
int create_udp_socket() {
    int sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd == -1) {
        fprintf(stderr, "socket() failed: %s\n", strerror(errno));
        return -1;
    }
    return sockfd;
}
 
// Create a socket with explicit protocol (equivalent to above)
int create_explicit_tcp_socket() {
    int sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sockfd == -1) {
        fprintf(stderr, "socket() failed: %s\n", strerror(errno));
        return -1;
    }
    return sockfd;
}
 
// Create socket with non-blocking and close-on-exec flags (Linux 2.6.27+)
int create_nonblocking_socket() {
    int sockfd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
    if (sockfd == -1) {
        fprintf(stderr, "socket() failed: %s\n", strerror(errno));
        return -1;
    }
    return sockfd;
}

What Happens Internally:

When socket() succeeds, the kernel:

Allocates a new file descriptor in the process's fd table
Creates a socket structure in kernel memory
Associates the socket with the specified protocol stack
Initializes send/receive buffers
Sets default socket options

The returned file descriptor can be used with standard file operations (read, write, close) as well as socket-specific operations.

Common socket() Errors

•EACCES — Permission denied. Raw sockets require root privileges.
•EAFNOSUPPORT — Address family not supported. The kernel doesn't support this protocol family.
•EMFILE — Per-process file descriptor limit reached. Increase ulimit or fix descriptor leak.
•ENFILE — System-wide file table full. System is overloaded.
•ENOBUFS/ENOMEM — Insufficient memory for socket structures.
•EPROTONOSUPPORT — Protocol not supported. Check kernel configuration.

Modern Socket Flags

Binding and Listening: bind() and listen()

bind() — Assign Local Address:

The bind() call associates a socket with a local address. For servers, this determines which IP:port combination will accept connections.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

Parameters:

sockfd: Socket file descriptor from socket()
addr: Pointer to address structure (sockaddr_in, sockaddr_in6)
addrlen: Size of the address structure

Return: 0 on success, -1 on failure

listen() — Enable Connection Queue:

The listen() call marks a socket as passive—ready to accept incoming connections. It also sets the size of the connection backlog queue.

int listen(int sockfd, int backlog);

Parameters:

sockfd: Bound socket file descriptor
backlog: Maximum pending connections queue size

Return: 0 on success, -1 on failure

bind_listen.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
 
#define BACKLOG 128  // Common default; tune based on load
 
// Complete server socket setup
int setup_server_socket(uint16_t port) {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("socket");
        return -1;
    }
    
    // CRITICAL: Set SO_REUSEADDR before bind
    int optval = 1;
    if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, 
                   &optval, sizeof(optval)) == -1) {
        perror("setsockopt SO_REUSEADDR");
        close(sockfd);
        return -1;
    }
    
    // Prepare address structure
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  // All interfaces
    addr.sin_port = htons(port);
    
    // Bind to address
    if (bind(sockfd, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
        perror("bind");
        close(sockfd);
        return -1;
    }
    
    // Enable listening
    if (listen(sockfd, BACKLOG) == -1) {
        perror("listen");
        close(sockfd);
        return -1;
    }
    
    printf("Server listening on port %d\n", port);
    return sockfd;
}
 
// Retrieve the bound address (useful when binding to port 0)
int get_bound_address(int sockfd) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    
    if (getsockname(sockfd, (struct sockaddr*)&addr, &len) == -1) {
        perror("getsockname");
        return -1;
    }
    
    char ip_str[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &addr.sin_addr, ip_str, sizeof(ip_str));
    printf("Bound to %s:%d\n", ip_str, ntohs(addr.sin_port));
    
    return ntohs(addr.sin_port);
}

Understanding the Backlog:

Too small: Clients get connection refused during traffic spikes
Too large: Memory waste; slow accept() causes connection timeout
Recommended: At least 128; busy servers use 1024+

Linux systems have a system-wide maximum (/proc/sys/net/core/somaxconn) that caps the backlog regardless of the value passed to listen().

The SYN Flood and Backlog

Common bind() and listen() Errors
Function	Error	Meaning	Solution
bind()	EADDRINUSE	Address already in use	Use SO_REUSEADDR or wait for TIME_WAIT
bind()	EACCES	Permission denied	Port < 1024 requires root
bind()	EADDRNOTAVAIL	Address not available	IP not assigned to interface
bind()	EINVAL	Already bound	Socket was already bound
listen()	EADDRINUSE	Port in use	Another socket is listening
listen()	EOPNOTSUPP	Not supported	Socket type doesn't support listen

Accepting Connections: accept()

The accept() call extracts the first pending connection from the backlog queue and creates a new socket for that connection. This new socket is used for communication with the connected client.

Function Signature:

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

Parameters:

sockfd: Listening socket file descriptor
addr: Buffer to receive client's address (can be NULL)
addrlen: In/out parameter—size of addr buffer / actual address size

Return:

Success: New socket file descriptor for the accepted connection
Failure: -1 with errno set

Critical Understanding:

After accept(), you have two sockets:

Listening socket (original): Continues accepting new connections
Connected socket (new): Used to communicate with this specific client

Converting Mermaid diagram...

accept_connections.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <unistd.h>
 
// Basic accept loop
void accept_loop(int listen_fd) {
    struct sockaddr_in client_addr;
    socklen_t addr_len;
    
    while (1) {
        addr_len = sizeof(client_addr);
        
        // accept() blocks until a connection arrives
        int client_fd = accept(listen_fd, 
                               (struct sockaddr*)&client_addr, 
                               &addr_len);
        
        if (client_fd == -1) {
            perror("accept");
            continue;  // Don't crash on transient errors
        }
        
        // Log the connection
        char ip_str[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &client_addr.sin_addr, ip_str, sizeof(ip_str));
        printf("Connection from %s:%d (fd=%d)\n", 
               ip_str, ntohs(client_addr.sin_port), client_fd);
        
        // Handle the client (simplified - real servers use threads/async)
        handle_client(client_fd);
        
        // Close when done
        close(client_fd);
    }
}
 
// accept4() - Linux-specific with flags
int accept_nonblocking(int listen_fd) {
    struct sockaddr_storage client_addr;
    socklen_t addr_len = sizeof(client_addr);
    
    // Atomically set NONBLOCK and CLOEXEC on the new socket
    int client_fd = accept4(listen_fd,
                           (struct sockaddr*)&client_addr,
                           &addr_len,
                           SOCK_NONBLOCK | SOCK_CLOEXEC);
    
    return client_fd;
}
 
// Getting peer information after accept
void log_connection_details(int client_fd) {
    struct sockaddr_storage peer_addr;
    socklen_t peer_len = sizeof(peer_addr);
    
    // Get peer (remote) address
    if (getpeername(client_fd, (struct sockaddr*)&peer_addr, &peer_len) == 0) {
        char host[NI_MAXHOST], port[NI_MAXSERV];
        getnameinfo((struct sockaddr*)&peer_addr, peer_len,
                    host, sizeof(host), port, sizeof(port),
                    NI_NUMERICHOST | NI_NUMERICSERV);
        printf("Peer: %s:%s\n", host, port);
    }
    
    // Get local address of this socket
    struct sockaddr_storage local_addr;
    socklen_t local_len = sizeof(local_addr);
    
    if (getsockname(client_fd, (struct sockaddr*)&local_addr, &local_len) == 0) {
        char host[NI_MAXHOST], port[NI_MAXSERV];
        getnameinfo((struct sockaddr*)&local_addr, local_len,
                    host, sizeof(host), port, sizeof(port),
                    NI_NUMERICHOST | NI_NUMERICSERV);
        printf("Local: %s:%s\n", host, port);
    }
}

Blocking vs. Non-Blocking Accept

Common accept() Errors

•EAGAIN/EWOULDBLOCK — Non-blocking socket and no connections pending. Normal condition—retry later.
•EINTR — Interrupted by signal before any connection arrived. Retry the accept() call.
•EMFILE — Process file descriptor limit hit. Close unused fds or increase limit.
•ENFILE — System-wide fd limit hit. System overloaded.
•ECONNABORTED — Connection was aborted before accept completed. Retry.
•ENOBUFS/ENOMEM — Kernel memory exhausted for socket. Severe system issue.

Connecting as Client: connect()

The connect() call initiates a connection to a remote server. For TCP, this triggers the three-way handshake.

Function Signature:

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

Parameters:

sockfd: Socket file descriptor from socket()
addr: Address of the remote server
addrlen: Size of the address structure

Return:

Success: 0
Failure: -1 with errno set

For TCP (SOCK_STREAM):

connect() initiates the three-way handshake and blocks until the connection is established or fails. If the socket wasn't explicitly bound, the kernel assigns a local address (ephemeral port).

For UDP (SOCK_DGRAM):

connect() simply sets the default destination address. No network traffic occurs. Subsequent send() calls will use this destination, and the socket will only receive datagrams from this address.

connect_client.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
 
// Simple blocking connect
int connect_blocking(const char *host, const char *port) {
    struct addrinfo hints, *res, *p;
    
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP
    
    int err = getaddrinfo(host, port, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }
    
    int sockfd;
    for (p = res; p != NULL; p = p->ai_next) {
        sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (sockfd == -1) continue;
        
        if (connect(sockfd, p->ai_addr, p->ai_addrlen) == 0) {
            break;  // Connected!
        }
        
        close(sockfd);
        sockfd = -1;
    }
    
    freeaddrinfo(res);
    
    if (p == NULL) {
        fprintf(stderr, "Failed to connect to %s:%s\n", host, port);
        return -1;
    }
    
    return sockfd;
}
 
// Non-blocking connect with timeout
int connect_with_timeout(const char *host, const char *port, int timeout_sec) {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) return -1;
    
    // Set non-blocking
    int flags = fcntl(sockfd, F_GETFL, 0);
    fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
    
    // Resolve and prepare address
    struct addrinfo hints = {0}, *res;
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    
    if (getaddrinfo(host, port, &hints, &res) != 0) {
        close(sockfd);
        return -1;
    }
    
    // Start connection (non-blocking returns immediately)
    int ret = connect(sockfd, res->ai_addr, res->ai_addrlen);
    freeaddrinfo(res);
    
    if (ret == 0) {
        // Connected immediately (rare, but possible with loopback)
        fcntl(sockfd, F_SETFL, flags);  // Restore blocking mode
        return sockfd;
    }
    
    if (errno != EINPROGRESS) {
        close(sockfd);
        return -1;
    }
    
    // Wait for connection to complete with timeout
    fd_set write_fds;
    FD_ZERO(&write_fds);
    FD_SET(sockfd, &write_fds);
    
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    
    ret = select(sockfd + 1, NULL, &write_fds, NULL, &tv);
    
    if (ret <= 0) {
        // Timeout or error
        close(sockfd);
        return -1;
    }
    
    // Check if connection succeeded
    int error = 0;
    socklen_t len = sizeof(error);
    getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &len);
    
    if (error != 0) {
        close(sockfd);
        errno = error;
        return -1;
    }
    
    // Success! Restore blocking mode
    fcntl(sockfd, F_SETFL, flags);
    return sockfd;
}

Common connect() Errors and Their Meanings
Error	Meaning	Typical Cause
ECONNREFUSED	Connection refused	Server not listening on that port
ETIMEDOUT	Connection timed out	Network unreachable or firewall blocking
ENETUNREACH	Network unreachable	No route to destination network
EHOSTUNREACH	Host unreachable	Target host is down or no route
EINPROGRESS	Connection in progress	Non-blocking socket; connection started
EALREADY	Already connecting	Previous non-blocking connect not complete
EISCONN	Already connected	Socket is already connected
EADDRNOTAVAIL	Address not available	No local address available

The EINPROGRESS Dance

Sending Data: send(), write(), sendto()

Once a connection is established (TCP) or destination is known (UDP), applications send data using several related functions:

send() — Socket-Specific Send:

ssize_t send(int sockfd, const void *buf, size_t len, int flags);

write() — Generic File Descriptor Write:

ssize_t write(int fd, const void *buf, size_t count);

sendto() — Datagram with Explicit Destination:

ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
               const struct sockaddr *dest_addr, socklen_t addrlen);

For TCP sockets, send(sockfd, buf, len, 0) is equivalent to write(sockfd, buf, len). The difference is that send() accepts flags for special behavior.

Important send() Flags

•MSG_DONTWAIT — Non-blocking send even if socket is blocking. Returns EAGAIN if send would block.
•MSG_NOSIGNAL — Don't generate SIGPIPE if peer has closed. Return EPIPE error instead. Essential for robust servers.
•MSG_MORE (Linux) — More data coming; don't send yet (like TCP_CORK). Helps with write coalescing.
•MSG_OOB — Send as out-of-band data (TCP urgent data). Rarely used in modern applications.

send_data.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
#include <sys/socket.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
 
// Robust send that handles partial writes
ssize_t send_all(int sockfd, const void *buf, size_t len) {
    size_t total_sent = 0;
    const char *ptr = buf;
    
    while (total_sent < len) {
        ssize_t sent = send(sockfd, ptr + total_sent, len - total_sent,
                           MSG_NOSIGNAL);  // Prevent SIGPIPE
        
        if (sent == -1) {
            if (errno == EINTR) continue;  // Interrupted, retry
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                // Non-blocking and would block
                // In real code: use select/poll/epoll to wait
                continue;
            }
            return -1;  // Real error
        }
        
        if (sent == 0) {
            // Connection closed by peer (unusual for send)
            return total_sent;
        }
        
        total_sent += sent;
    }
    
    return total_sent;
}
 
// Sending UDP datagram
ssize_t send_udp(int sockfd, const void *buf, size_t len,
                 const struct sockaddr *dest, socklen_t dest_len) {
    ssize_t sent = sendto(sockfd, buf, len, 0, dest, dest_len);
    
    if (sent == -1) {
        perror("sendto");
        return -1;
    }
    
    // For UDP, partial sends don't make sense
    // Either the entire datagram is sent or it fails
    return sent;
}
 
// Using scatter-gather I/O (send multiple buffers atomically)
ssize_t send_gathered(int sockfd, const char *header, size_t hdr_len,
                      const char *body, size_t body_len) {
    struct iovec iov[2];
    iov[0].iov_base = (void*)header;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = (void*)body;
    iov[1].iov_len = body_len;
    
    struct msghdr msg = {0};
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    
    return sendmsg(sockfd, &msg, MSG_NOSIGNAL);
}

Partial Sends Are Normal

TCP vs. UDP Sending Semantics:

Aspect	TCP (SOCK_STREAM)	UDP (SOCK_DGRAM)
Partial sends	Common; must loop	No; datagram sent whole or fails
Ordering	Guaranteed	Not guaranteed
Reliability	Guaranteed delivery	Best-effort
Size limit	No inherent limit (streams)	~65KB per datagram
Buffering	Kernel manages send buffer	Usually immediate send
Destination	Implicit (connected)	Can specify per-send

Receiving Data: recv(), read(), recvfrom()

Receiving data is the counterpart to sending, with similar function variants:

recv() — Socket-Specific Receive:

ssize_t recv(int sockfd, void *buf, size_t len, int flags);

read() — Generic File Descriptor Read:

ssize_t read(int fd, void *buf, size_t count);

recvfrom() — Datagram with Source Address:

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                 struct sockaddr *src_addr, socklen_t *addrlen);

Return Value Semantics:

Positive: Number of bytes received
Zero: Peer has closed the connection (EOF)
-1: Error occurred (check errno)

Important recv() Flags

•MSG_PEEK — Return data without removing it from the receive queue. Useful for protocol detection.
•MSG_WAITALL — Block until the full requested amount is received (or error/EOF). Use with caution.
•MSG_DONTWAIT — Non-blocking receive even on blocking socket.
•MSG_TRUNC (UDP) — Return full datagram size even if buffer was too small.
•MSG_OOB — Receive out-of-band data.

recv_data.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
#include <sys/socket.h>
#include <errno.h>
#include <stdio.h>
 
// Receive exactly n bytes (TCP)
ssize_t recv_exact(int sockfd, void *buf, size_t len) {
    size_t total_recv = 0;
    char *ptr = buf;
    
    while (total_recv < len) {
        ssize_t n = recv(sockfd, ptr + total_recv, len - total_recv, 0);
        
        if (n == -1) {
            if (errno == EINTR) continue;  // Signal interrupted, retry
            return -1;  // Real error
        }
        
        if (n == 0) {
            // Peer closed connection before we got all data
            // This might be an error depending on protocol
            break;
        }
        
        total_recv += n;
    }
    
    return total_recv;
}
 
// Peek at incoming data without consuming it
int peek_protocol_header(int sockfd, char *buf, size_t len) {
    ssize_t n = recv(sockfd, buf, len, MSG_PEEK);
    
    if (n <= 0) return n;
    
    // Data is still in the receive queue; next recv() will return it again
    return n;
}
 
// Receive UDP datagram with sender address
ssize_t recv_udp(int sockfd, void *buf, size_t len,
                 struct sockaddr *sender, socklen_t *sender_len) {
    ssize_t n = recvfrom(sockfd, buf, len, 0, sender, sender_len);
    
    if (n == -1) {
        perror("recvfrom");
        return -1;
    }
    
    // For UDP: entire datagram is returned (or truncated if buffer too small)
    return n;
}
 
// Common pattern: message loop
void message_loop(int sockfd) {
    char buf[4096];
    
    while (1) {
        ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
        
        if (n == -1) {
            if (errno == EINTR) continue;
            perror("recv");
            break;
        }
        
        if (n == 0) {
            printf("Connection closed by peer\n");
            break;
        }
        
        // Process the received data
        process_message(buf, n);
    }
}
 
// Scatter-gather receive
ssize_t recv_scattered(int sockfd, char *header, size_t hdr_len,
                       char *body, size_t body_len) {
    struct iovec iov[2];
    iov[0].iov_base = header;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = body;
    iov[1].iov_len = body_len;
    
    struct msghdr msg = {0};
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    
    return recvmsg(sockfd, &msg, 0);
}

Zero Return Means EOF

TCP Streaming Nature:

TCP is a byte stream protocol. If the sender sends two 100-byte messages, the receiver might receive:

One 200-byte chunk, or
One 150-byte and one 50-byte chunk, or
Four 50-byte chunks, or
Any other combination

Message boundaries are not preserved. Applications must implement their own framing:

Length-prefixed: First send message length, then message content
Delimiter-based: Use special characters (newlines, null bytes) as separators
Fixed-size: All messages are the same size
Self-describing: Messages encode their own boundaries (JSON, Protocol Buffers)

Closing Sockets: close() and shutdown()

Properly closing sockets is critical for resource management and correct protocol behavior.

close() — Release Socket Resources:

int close(int fd);

close() decrements the socket's reference count. When it reaches zero, the kernel:

Sends any remaining buffered data
Initiates TCP connection termination (FIN)
Releases the file descriptor
Eventually frees socket resources

shutdown() — Partial Connection Closure:

int shutdown(int sockfd, int how);

shutdown() allows closing one direction of a bidirectional connection:

SHUT_RD: Close reading side; further receives return EOF
SHUT_WR: Close writing side; sends FIN to peer
SHUT_RDWR: Close both directions (similar to close)

close() vs. shutdown() Comparison
Aspect	close()	shutdown()
Reference counting	Decrements count; closes when zero	Immediate effect regardless of refs
Forked processes	Each process needs own close()	Affects all copies of socket
Partial close	No; closes everything	Yes; can close read/write separately
FIN timing	Sends FIN when refcount hits zero	SHUT_WR sends FIN immediately
Resource release	Releases fd	Does NOT release fd

close_shutdown.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
#include <sys/socket.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
 
// Simple close
void simple_close(int sockfd) {
    if (close(sockfd) == -1) {
        perror("close");
    }
}
 
// Graceful HTTP-style shutdown (send then read until EOF)
void graceful_close(int sockfd) {
    // Signal that we're done sending
    if (shutdown(sockfd, SHUT_WR) == -1) {
        perror("shutdown");
        close(sockfd);
        return;
    }
    
    // Read remaining data from peer until they close
    char buf[4096];
    while (1) {
        ssize_t n = recv(sockfd, buf, sizeof(buf), 0);
        if (n <= 0) break;  // EOF or error
        // Process remaining data...
    }
    
    // Now fully close
    close(sockfd);
}
 
// Abortive close (immediate RST, no FIN handshake)
void abortive_close(int sockfd) {
    struct linger l = {
        .l_onoff = 1,   // Enable linger
        .l_linger = 0   // Zero timeout = RST on close
    };
    
    setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
    close(sockfd);  // Sends RST immediately
}
 
// Half-close pattern for request/response protocols
void half_close_pattern(int sockfd, const char *request, size_t req_len) {
    // Send request
    send(sockfd, request, req_len, 0);
    
    // Signal we're done sending
    shutdown(sockfd, SHUT_WR);
    
    // Read entire response until server closes
    char response[65536];
    size_t total = 0;
    ssize_t n;
    
    while ((n = recv(sockfd, response + total, 
                     sizeof(response) - total, 0)) > 0) {
        total += n;
    }
    
    // Done with connection
    close(sockfd);
}

Data Loss on Close

Why shutdown() Matters:

Forked servers: After fork(), both parent and child have the socket. Parent needs shutdown(SHUT_WR) to send FIN; just close() won't work until child also closes.
Request/Response protocols: Client sends request, then shutdown(SHUT_WR). Server reads until EOF, knows request is complete, sends response, closes.
Detecting peer closure: After shutdown(SHUT_WR), you can still recv() to see if peer sends anything before closing.
Urgent data: Some protocols use shutdown to signal end of stream while keeping socket open for control messages.

Socket Options: setsockopt() and getsockopt()

Socket behavior can be tuned through options. The setsockopt() and getsockopt() functions provide this control.

Function Signatures:

int setsockopt(int sockfd, int level, int optname,
               const void *optval, socklen_t optlen);
               
int getsockopt(int sockfd, int level, int optname,
               void *optval, socklen_t *optlen);

Levels:

SOL_SOCKET: Generic socket options
IPPROTO_TCP: TCP-specific options
IPPROTO_IP: IPv4 options
IPPROTO_IPV6: IPv6 options
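Both calls return -1 and set errno on failure, so it can help to wrap them once and check every result. The sketch below assumes integer-valued options only; set_int_option and get_int_option are hypothetical helper names, not part of the socket API.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

// Hypothetical helper: set an int-valued option and report failures.
// Returns 0 on success, -1 on error.
int set_int_option(int sockfd, int level, int optname, int value) {
    if (setsockopt(sockfd, level, optname, &value, sizeof(value)) == -1) {
        fprintf(stderr, "setsockopt(level=%d, opt=%d): %s\n",
                level, optname, strerror(errno));
        return -1;
    }
    return 0;
}

// Hypothetical helper: read an int-valued option into *value.
// Returns 0 on success, -1 on error.
int get_int_option(int sockfd, int level, int optname, int *value) {
    // optlen is a value-result argument: buffer size in, bytes written out.
    socklen_t optlen = sizeof(*value);
    if (getsockopt(sockfd, level, optname, value, &optlen) == -1) {
        return -1;
    }
    return 0;
}
```

For example, set_int_option(fd, SOL_SOCKET, SO_REUSEADDR, 1) followed by get_int_option(fd, SOL_SOCKET, SO_REUSEADDR, &v) should leave a non-zero value in v.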

Essential Socket Options
Option	Level	Purpose	Typical Value
SO_REUSEADDR	SOL_SOCKET	Reuse address in TIME_WAIT	1 (enabled)
SO_REUSEPORT	SOL_SOCKET	Multiple sockets on same port	1 (enabled)
SO_KEEPALIVE	SOL_SOCKET	Enable TCP keepalive probes	1 (enabled)
SO_RCVBUF	SOL_SOCKET	Receive buffer size	Bytes
SO_SNDBUF	SOL_SOCKET	Send buffer size	Bytes
SO_LINGER	SOL_SOCKET	Linger on close behavior	struct linger
TCP_NODELAY	IPPROTO_TCP	Disable Nagle's algorithm	1 (Nagle off)
TCP_CORK	IPPROTO_TCP	Hold partial frames until uncorked (Linux-specific)	1/0
TCP_KEEPIDLE	IPPROTO_TCP	Seconds before keepalive start	Seconds
TCP_KEEPINTVL	IPPROTO_TCP	Seconds between keepalive probes	Seconds
TCP_KEEPCNT	IPPROTO_TCP	Keepalive probe count	Count
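Buffer sizes in the table are requests rather than guarantees: the kernel may round, scale, or clamp them (Linux, for instance, doubles the requested value to leave room for bookkeeping and caps it at a system limit). A small sketch of reading back what was actually applied, using a hypothetical helper named request_rcvbuf:

```c
#include <sys/socket.h>

// Hypothetical helper: request a receive buffer size and return the size
// the kernel actually applied, or -1 on error.
int request_rcvbuf(int sockfd, int requested) {
    int actual = 0;
    socklen_t optlen = sizeof(actual);

    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) == -1) {
        return -1;
    }
    // Read the value back instead of assuming the request was honored.
    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &actual, &optlen) == -1) {
        return -1;
    }
    return actual;
}
```

Comparing the return value with the requested size shows how the platform treated the request; the exact result is system-dependent.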

socket_options.c
C
#include <sys/socket.h>
#include <netinet/tcp.h>
#include <netinet/in.h>
#include <stdio.h>
 
// Configure server socket for production use
void configure_server_socket(int sockfd) {
    int optval = 1;
    
    // Essential: allow address reuse after restart (must be set before bind())
    setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));
    
    // Enable TCP keepalive
    setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &optval, sizeof(optval));
    
#ifdef __linux__
    // Linux-specific: tune keepalive timing
    int idle = 60;     // Start keepalive after 60s idle
    int interval = 10; // Send probes every 10s
    int count = 5;     // Fail after 5 probes
    
    setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
    setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
#endif
}
 
// Configure client socket for low latency
void configure_low_latency_socket(int sockfd) {
    int optval = 1;
    
    // Disable Nagle's algorithm for immediate sends
    setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));
    
    // For interactive applications, a smaller send buffer can reduce latency
    int bufsize = 16 * 1024;  // 16 KB; the default is system-dependent
    setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
}
 
// Configure socket for high throughput
void configure_high_throughput_socket(int sockfd) {
    // Large buffers for high-bandwidth connections
    // (a request only: the kernel may clamp it to a system limit)
    int bufsize = 1024 * 1024;  // 1 MB
    
    setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));
    setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
    
    // Keep Nagle enabled for better batching (default, but explicit)
    int optval = 0;
    setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));
}
 
// Query current socket options
void print_socket_info(int sockfd) {
    int optval;
    socklen_t optlen = sizeof(optval);
    
    // Print a value only when the query succeeded; optval is undefined otherwise
    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &optval, &optlen) == 0)
        printf("Receive buffer: %d bytes\n", optval);
    
    optlen = sizeof(optval);  // value-result argument: reset before each call
    if (getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &optval, &optlen) == 0)
        printf("Send buffer: %d bytes\n", optval);
    
    optlen = sizeof(optval);
    if (getsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &optval, &optlen) == 0)
        printf("TCP_NODELAY: %s\n", optval ? "on" : "off");
}

TCP_NODELAY vs. Performance

Summary: The Socket API

We've covered the complete socket API—the building blocks for all network programming. Let's consolidate the essential functions and patterns:

Key Takeaways

• socket() creates the endpoint: Specifies domain (IPv4/IPv6), type (stream/datagram), and optionally protocol.
• bind() assigns local address: Essential for servers; optional for clients (OS assigns ephemeral port).
• listen() enables connection queue: Transforms socket into passive listener with configurable backlog.
• accept() extracts pending connections: Returns a NEW socket for each client; listening socket continues accepting.
• connect() initiates client connection: Triggers TCP handshake; for UDP, sets default destination.
• send()/recv() transfer data: Handle partial transfers; check return values; use flags for special behavior.
• close()/shutdown() terminate connections: shutdown() for half-close; close() for full release.
• setsockopt()/getsockopt() tune behavior: Essential options: SO_REUSEADDR, TCP_NODELAY, SO_KEEPALIVE.
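The calls listed above can be exercised together in one small sketch. It is an illustration under simplifying assumptions, not a server design: a single process plays both roles over the IPv4 loopback address, the message is assumed small enough to fit in the socket buffers, and loopback_round_trip is a hypothetical function name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical demo: echo msg through a loopback TCP connection.
// Returns the number of bytes the client got back, or -1 on failure.
ssize_t loopback_round_trip(const char *msg, char *out, size_t cap) {
    int lfd = -1, cfd = -1, sfd = -1, one = 1;
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    size_t len = strlen(msg), total = 0;
    ssize_t n, result = -1;
    char buf[256];

    if (cap == 0) return -1;

    // socket() + setsockopt() + bind() + listen(): passive endpoint
    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) goto done;
    if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == -1) goto done;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);  // 0 = let the OS choose a port
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto done;
    if (listen(lfd, 1) == -1) goto done;
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) == -1) goto done;

    // connect(): the kernel completes the handshake through the listen
    // backlog, so this returns before accept() is called.
    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) goto done;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto done;

    // accept(): a new socket for this one connection
    if ((sfd = accept(lfd, NULL, NULL)) == -1) goto done;

    // Client: send, then half-close so the server sees EOF
    if (send(cfd, msg, len, 0) != (ssize_t)len) goto done;
    if (shutdown(cfd, SHUT_WR) == -1) goto done;

    // Server: echo everything until EOF, then close
    while ((n = recv(sfd, buf, sizeof(buf), 0)) > 0) {
        if (send(sfd, buf, (size_t)n, 0) != n) goto done;
    }
    if (n < 0) goto done;
    close(sfd);
    sfd = -1;

    // Client: read the echo until the server's FIN arrives
    while (total < cap - 1) {
        n = recv(cfd, out + total, cap - 1 - total, 0);
        if (n < 0) goto done;
        if (n == 0) break;
        total += (size_t)n;
    }
    out[total] = '\0';
    result = (ssize_t)total;

done:
    if (sfd != -1) close(sfd);
    if (cfd != -1) close(cfd);
    if (lfd != -1) close(lfd);
    return result;
}
```

Calling loopback_round_trip("hello", out, sizeof out) should copy the message back into out. A real server would run accept() and the echo loop in a separate process or thread, or behind an I/O multiplexer, rather than interleaving both roles in one function.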

What's Next:
