Computer Networks: Transport Layer Concepts

Sockets

Level: Intermediate

Duration: 60 mins

Topic: Transport Layer Concepts

Socket Address

Identifying Communication Endpoints

For two processes to communicate across a network, they must be able to find each other. This seemingly simple requirement encompasses one of the most critical aspects of network programming: socket addressing. How do we uniquely identify a socket among billions of potential endpoints across the global Internet?

The answer lies in a carefully designed addressing scheme that combines hierarchical components—IP addresses to identify machines, and port numbers to identify specific processes on those machines. Together, these form the socket address, the fundamental unit of endpoint identification in TCP/IP networking.

Understanding socket addressing is essential not just for programming, but for debugging, security analysis, and system design. Every "connection refused" error, every firewall rule, and every load balancer configuration ultimately deals with socket addresses.

What You Will Learn

By the end of this page, you will understand how socket addresses uniquely identify endpoints, the structure of address data types in code, how the 5-tuple identifies connections, and the mechanics of address binding. You'll be able to reason about socket addressing at both the conceptual and implementation levels.

The Socket Address Concept

A socket address is a composite identifier that uniquely locates a socket within a network. In the Internet Protocol suite, a socket address consists of two fundamental components:

1. IP Address (Network Identifier):

Identifies the machine (host) on the network
IPv4: 32-bit address (e.g., 192.168.1.100)
IPv6: 128-bit address (e.g., 2001:db8::1)
Enables routing across network boundaries

2. Port Number (Process Identifier):

Identifies the specific process/service on the machine
16-bit unsigned integer (range: 0-65535)
Enables multiple network applications on one host

Together, an IP address and port number form a transport address or endpoint. This combination is often written as IP:Port (e.g., 192.168.1.100:8080 or [2001:db8::1]:443 for IPv6).

Why Two Components?

The separation of IP address and port implements the transport layer's core function: process-to-process delivery. The IP address gets data to the right machine (network layer's job), while the port number gets it to the right process (transport layer's job). This clean separation enables independent addressing of hosts and services.

Port Number Classification:

Ports are divided into three ranges with different intended uses:

Range	Name	Purpose
0-1023	Well-Known Ports	Reserved for system services (HTTP:80, HTTPS:443, SSH:22)
1024-49151	Registered Ports	IANA-registered services (MySQL:3306, PostgreSQL:5432)
49152-65535	Dynamic/Ephemeral	Client-side ports assigned by OS for outbound connections

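On Unix-like systems, binding to a well-known port traditionally requires root privileges (on Linux, the CAP_NET_BIND_SERVICE capability is sufficient). This is a security measure that prevents unprivileged processes from impersonating critical services. Windows does not apply the same restriction.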
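The three ranges in the table can be expressed as a small lookup. The following is a minimal sketch; the enum and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for this sketch; not part of any socket API. */
enum port_class { PORT_WELL_KNOWN, PORT_REGISTERED, PORT_DYNAMIC };

/* Map a port number (host byte order) to its IANA range. */
enum port_class classify_port(uint16_t port)
{
    if (port <= 1023)
        return PORT_WELL_KNOWN;   /* 0-1023 */
    if (port <= 49151)
        return PORT_REGISTERED;   /* 1024-49151 */
    return PORT_DYNAMIC;          /* 49152-65535 */
}

/* Human-readable label for a port class. */
const char *port_class_name(enum port_class c)
{
    switch (c) {
    case PORT_WELL_KNOWN: return "well-known";
    case PORT_REGISTERED: return "registered";
    case PORT_DYNAMIC:    return "dynamic/ephemeral";
    }
    assert(!"invalid port_class value");  /* programming error */
    return "unknown";
}
```

For example, classify_port(443) yields PORT_WELL_KNOWN and classify_port(5432) yields PORT_REGISTERED. Operating systems do not necessarily draw ephemeral ports from the IANA dynamic range; Linux, for instance, defaults to 32768-60999.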

The Address Uniqueness Requirement:

Within a single machine, no two sockets can be bound to the same socket address (IP + port) simultaneously for the same protocol. This uniqueness constraint is fundamental—it's how the operating system knows which socket should receive incoming data.

However, the same port number can be used:

On different IP addresses (multi-homed hosts)
By different protocols (TCP and UDP can both use port 53)
With address reuse options (SO_REUSEADDR, SO_REUSEPORT)
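The uniqueness rule and its per-protocol scope can be observed directly. The sketch below assumes a POSIX system with a loopback interface; bind_loopback is a hypothetical helper written for this illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a socket of the given type (SOCK_STREAM or SOCK_DGRAM) and bind it
 * to 127.0.0.1:port. Pass port 0 to let the OS choose. On success, returns
 * the descriptor and stores the bound port (host byte order) in *bound_port.
 * On failure, returns -1 with errno describing the error.
 */
int bind_loopback(int type, uint16_t port, uint16_t *bound_port)
{
    assert(bound_port != NULL);

    int fd = socket(AF_INET, type, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }

    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```

Binding a TCP socket with port 0, then binding a UDP socket to the port that was assigned, normally succeeds because TCP and UDP have separate port spaces. A second TCP bind to that same port fails with EADDRINUSE.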


Address Structures in Code

The socket API represents addresses using structured data types. Understanding these structures is essential for socket programming, as they appear in nearly every socket operation.

The Generic Socket Address (sockaddr):

The socket API was designed to be protocol-independent. To achieve this, it uses a generic address structure that can represent addresses from any protocol family:

struct sockaddr {
    sa_family_t sa_family;    // Address family (AF_INET, AF_INET6, etc.)
    char        sa_data[14];  // Protocol-specific address data
};

This generic structure is 16 bytes and serves as a common base type. Functions like bind(), connect(), and accept() accept pointers to sockaddr, allowing them to work with any address family.

IPv4 Address Structure (sockaddr_in):

struct sockaddr_in {
    sa_family_t    sin_family;  // AF_INET
    in_port_t      sin_port;    // 16-bit port (network byte order)
    struct in_addr sin_addr;    // 32-bit IPv4 address
    char           sin_zero[8]; // Padding to match sockaddr size
};

struct in_addr {
    uint32_t s_addr;            // IPv4 address (network byte order)
};

IPv6 Address Structure (sockaddr_in6):

struct sockaddr_in6 {
    sa_family_t     sin6_family;   // AF_INET6
    in_port_t       sin6_port;     // 16-bit port (network byte order)
    uint32_t        sin6_flowinfo; // Flow information
    struct in6_addr sin6_addr;     // 128-bit IPv6 address
    uint32_t        sin6_scope_id; // Scope ID (for link-local addresses)
};

struct in6_addr {
    uint8_t s6_addr[16];           // IPv6 address bytes
};

Byte Order Matters

Network protocols use big-endian (network byte order), but most modern CPUs are little-endian. The sin_port, sin6_port, and s_addr fields must be stored in network byte order. Use the conversion functions: htons()/ntohs() for ports (16-bit) and htonl()/ntohl() for IPv4 addresses (32-bit). IPv6 addresses are stored as a 16-byte array and need no conversion. Forgetting these conversions is a classic bug: on a little-endian machine, an unconverted port 8080 is sent as port 36895, so the connection goes to the wrong port.
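To make the byte-order rule concrete, the sketch below exposes the bytes that htons() produces. The helper name port_wire_bytes is hypothetical and used only for this illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the in-memory bytes of a port in network byte order into out[0..1].
 * out[0] is the byte that is transmitted first (the most significant byte).
 */
void port_wire_bytes(uint16_t host_port, unsigned char out[2])
{
    assert(out != NULL);
    uint16_t net_port = htons(host_port);
    memcpy(out, &net_port, sizeof(net_port));
}
```

For port 8080 (0x1F90), the result is 0x1F followed by 0x90 on any CPU, because htons() is a no-op on big-endian hosts and a byte swap on little-endian hosts. The round trip ntohs(htons(x)) always returns x.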

socket_address_example.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
 
// Creating an IPv4 socket address for server listening on port 8080
void create_server_address() {
    struct sockaddr_in server_addr;
    
    // Zero out the structure first (good practice)
    memset(&server_addr, 0, sizeof(server_addr));
    
    // Set the address family
    server_addr.sin_family = AF_INET;
    
    // Set the port number (convert to network byte order)
    server_addr.sin_port = htons(8080);
    
    // Bind to all available interfaces (INADDR_ANY = 0.0.0.0)
    server_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    
    // Alternative: Bind to specific IP address
    // inet_pton(AF_INET, "192.168.1.100", &server_addr.sin_addr);
    
    // Now server_addr can be used with bind()
    // bind(sockfd, (struct sockaddr*)&server_addr, sizeof(server_addr));
}
 
// Creating an IPv4 socket address to connect to example.com:443
void create_client_address() {
    struct sockaddr_in remote_addr;
    
    memset(&remote_addr, 0, sizeof(remote_addr));
    remote_addr.sin_family = AF_INET;
    remote_addr.sin_port = htons(443);
    
    // Convert dotted-decimal string to binary address
    if (inet_pton(AF_INET, "93.184.216.34", &remote_addr.sin_addr) <= 0) {
        // Error handling: invalid address format
    }
    
    // Now remote_addr can be used with connect()
    // connect(sockfd, (struct sockaddr*)&remote_addr, sizeof(remote_addr));
}

Storage-Size Address Structure (sockaddr_storage):

When writing protocol-independent code that must handle both IPv4 and IPv6, use sockaddr_storage—a structure large enough to hold any socket address:

struct sockaddr_storage {
    sa_family_t ss_family;  // Address family
    // Large enough buffer for any address type
    // Implementation-defined padding and alignment
};

This is particularly important when calling accept() or recvfrom(), where you don't know in advance whether you'll receive an IPv4 or IPv6 connection. Using sockaddr_storage ensures sufficient space for either.
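As an illustration of protocol-independent handling, the following sketch inspects ss_family and casts to the matching concrete structure. The function name format_endpoint is an assumption made for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Format the address held in a sockaddr_storage as "a.b.c.d:port" (IPv4)
 * or "[v6addr]:port" (IPv6). Returns 0 on success, -1 for an unsupported
 * family or a buffer that is too small.
 */
int format_endpoint(const struct sockaddr_storage *ss, char *buf, size_t buflen)
{
    assert(ss != NULL && buf != NULL);
    char ip[INET6_ADDRSTRLEN];
    int n;

    if (ss->ss_family == AF_INET) {
        const struct sockaddr_in *a = (const struct sockaddr_in *)ss;
        if (inet_ntop(AF_INET, &a->sin_addr, ip, sizeof(ip)) == NULL)
            return -1;
        n = snprintf(buf, buflen, "%s:%u", ip, (unsigned)ntohs(a->sin_port));
    } else if (ss->ss_family == AF_INET6) {
        const struct sockaddr_in6 *a = (const struct sockaddr_in6 *)ss;
        if (inet_ntop(AF_INET6, &a->sin6_addr, ip, sizeof(ip)) == NULL)
            return -1;
        n = snprintf(buf, buflen, "[%s]:%u", ip, (unsigned)ntohs(a->sin6_port));
    } else {
        return -1;   /* family not handled by this sketch */
    }
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

A typical caller passes the sockaddr_storage that accept() or recvfrom() filled in, without needing to know in advance which address family the peer used.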

Socket Address Structures Comparison
Structure	Typical Size (Linux)	Address Family	Use Case
sockaddr	16 bytes	Generic	API function parameter type (cast target)
sockaddr_in	16 bytes	AF_INET (IPv4)	IPv4 addresses
sockaddr_in6	28 bytes	AF_INET6 (IPv6)	IPv6 addresses
sockaddr_un	110+ bytes	AF_UNIX	Unix domain socket paths
sockaddr_storage	128 bytes	Any	Protocol-independent code, large enough for all

The 5-Tuple Connection Identifier

While a single socket address (IP + port) identifies one endpoint, a complete connection is identified by a 5-tuple—five values that together uniquely distinguish any connection in the system:

Protocol (TCP or UDP)
Source IP Address
Source Port Number
Destination IP Address
Destination Port Number

This 5-tuple is critical because it enables:

Demultiplexing: The OS routes incoming data to the correct socket
NAT: Network Address Translators track (and rewrite) connections by 5-tuple
Firewall rules: Security policies match on 5-tuple components
Connection tracking: Load balancers and routers identify flows

Why 5 Elements?

Consider a server running on 10.0.0.1:80 receiving connections from two clients at 192.168.1.5:50000 and 192.168.1.5:50001. Both connections have the same destination address, same protocol, and even the same source IP. Only the source port differs—but that's enough to uniquely identify each connection. The 5-tuple captures exactly the information needed for unique identification.

Connection Uniqueness Scenarios:

Consider how the 5-tuple enables multiple simultaneous connections:

Scenario 1: Multiple clients to same server

Connection A: TCP, 192.168.1.5:50000 → 10.0.0.1:80
Connection B: TCP, 192.168.1.5:50001 → 10.0.0.1:80
Connection C: TCP, 192.168.1.6:50000 → 10.0.0.1:80

All three connections target the same server address (10.0.0.1:80), but each has a unique 5-tuple, so the server's accept() returns a separate connected socket for each.

Scenario 2: Same client to multiple servers

Connection D: TCP, 192.168.1.5:50002 → 10.0.0.1:80 (Web server)
Connection E: TCP, 192.168.1.5:50003 → 10.0.0.2:443 (API server)
Connection F: TCP, 192.168.1.5:50004 → 10.0.0.3:5432 (Database)

One client maintains multiple simultaneous connections.

Scenario 3: TCP and UDP to same endpoint

Connection G: TCP, 192.168.1.5:50005 → 10.0.0.1:53 (DNS over TCP)
Connection H: UDP, 192.168.1.5:50006 → 10.0.0.1:53 (DNS over UDP)

Same destination IP and port, but different protocols. Even if the source ports were identical, the two flows would not collide, because the protocol is part of the 5-tuple.


Maximum Concurrent Connections:

The 5-tuple constraint has implications for scalability. Consider a server accepting connections on one IP:port:

Source IPs are limited by client population
Source ports are limited to ~65,535 per client IP
Maximum connections from one client IP: ~65,535
Maximum total connections: depends on total client IPs × ports

In practice, servers can handle millions of concurrent connections, but architects must understand these theoretical limits and plan for edge cases like proxy servers (single IP) or NAT pools.

Connection Identification in Practice

Operating systems maintain hash tables mapping 5-tuples to sockets. When a packet arrives, the kernel hashes its 5-tuple and looks up the corresponding socket in expected O(1) time. This efficient lookup is critical for high-performance networking: systems handling millions of connections cannot afford linear searches.
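The idea of hashing a 5-tuple into a bucket can be sketched as follows. This is an illustration only: the struct layout, the function names, and the choice of the FNV-1a hash are assumptions made for the example, and production kernels typically use keyed hash functions to resist deliberate collisions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4-only 5-tuple; all fields in host byte order. */
struct five_tuple {
    uint8_t  protocol;   /* 6 = TCP, 17 = UDP (IANA protocol numbers) */
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
};

/* Two connections are the same only if all five fields match. */
bool five_tuple_equal(const struct five_tuple *a, const struct five_tuple *b)
{
    return a->protocol == b->protocol &&
           a->src_ip == b->src_ip && a->src_port == b->src_port &&
           a->dst_ip == b->dst_ip && a->dst_port == b->dst_port;
}

/* Fold the low nbytes bytes of value into an FNV-1a hash state. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t value, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (value >> (8 * i)) & 0xFFu;
        h *= 16777619u;              /* 32-bit FNV prime */
    }
    return h;
}

/* Hash field by field so struct padding never influences the result. */
uint32_t five_tuple_hash(const struct five_tuple *t)
{
    assert(t != NULL);
    uint32_t h = 2166136261u;        /* 32-bit FNV offset basis */
    h = fnv1a_mix(h, t->protocol, 1);
    h = fnv1a_mix(h, t->src_ip, 4);
    h = fnv1a_mix(h, t->src_port, 2);
    h = fnv1a_mix(h, t->dst_ip, 4);
    h = fnv1a_mix(h, t->dst_port, 2);
    return h;
}

/* Bucket index for a table whose size is a power of two. */
size_t five_tuple_bucket(const struct five_tuple *t, size_t table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return five_tuple_hash(t) & (table_size - 1);
}
```

A lookup computes the bucket index, then compares the stored tuples in that bucket with five_tuple_equal() to resolve any hash collisions.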

Binding Socket Addresses

Binding is the process of associating a socket with a local address. It's a critical step that determines which incoming connections or packets the socket will receive.

The bind() System Call:

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

Bind assigns the socket sockfd the local address specified in addr. After binding, the socket is associated with that address for its lifetime (until closed).

When Binding is Required:

Servers: Must bind to a known port so clients can connect to a predictable address
UDP receivers: Must bind to receive datagrams on a known port
Multicast: Must bind to receive multicast group traffic

When Binding is Optional:

TCP clients: The OS assigns an ephemeral port automatically on connect()
UDP senders: The OS assigns an ephemeral port automatically on first send
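The implicit bind can be observed with getsockname(). The sketch below uses a UDP socket so that no listener is required; connect_without_bind is a hypothetical helper name, and the destination port used by the caller is arbitrary.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Connect an unbound UDP socket to 127.0.0.1:dest_port and report the local
 * port the OS picked. connect() on a UDP socket sends no packets; it records
 * the peer and triggers the implicit bind to an ephemeral port.
 * Returns the socket descriptor, or -1 on error.
 */
int connect_without_bind(uint16_t dest_port, uint16_t *local_port)
{
    assert(local_port != NULL);

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    peer.sin_port = htons(dest_port);

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        close(fd);
        return -1;
    }

    /* No bind() was called, yet the socket now has a local address. */
    struct sockaddr_in local;
    socklen_t len = sizeof(local);
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0) {
        close(fd);
        return -1;
    }

    *local_port = ntohs(local.sin_port);
    return fd;
}
```

After the call, *local_port holds a nonzero ephemeral port chosen by the OS. A TCP client behaves the same way when it calls connect() without a prior bind().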

Address Binding Options

•INADDR_ANY (0.0.0.0) — Bind to all available network interfaces. The socket receives connections/packets arriving on any interface. Most common for servers.
•Specific IP Address — Bind to one interface only. The socket only receives traffic destined for that specific IP. Used for multi-homed hosts with distinct services per interface.
•IN6ADDR_ANY_INIT (::) — IPv6 equivalent of INADDR_ANY. On dual-stack systems, may also accept IPv4 connections (mapped to IPv6).
•Port 0 — Request OS to assign an available ephemeral port. Useful when the specific port doesn't matter (client connections, test servers).
•Loopback (127.0.0.1 / ::1) — Bind only to loopback interface. Socket is inaccessible from network—only local processes can connect. Common for development and local-only services.

binding_examples.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   // inet_pton()
#include <stdint.h>      // uint16_t
#include <string.h>
#include <stdio.h>
 
// Bind to all interfaces on specific port (typical server)
int bind_all_interfaces(int sockfd, uint16_t port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  // All interfaces
    addr.sin_port = htons(port);               // Specific port
    
    return bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
}
 
// Bind to specific interface and port (multi-homed server)
int bind_specific_interface(int sockfd, const char *ip, uint16_t port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    
    if (inet_pton(AF_INET, ip, &addr.sin_addr) <= 0) {
        return -1;  // Invalid IP address
    }
    
    return bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
}
 
// Bind to loopback only (local-only service)
int bind_loopback_only(int sockfd, uint16_t port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1
    addr.sin_port = htons(port);
    
    return bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
}
 
// Let OS choose port (useful for clients, test servers)
int bind_ephemeral_port(int sockfd) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);  // Port 0 = OS chooses
    
    if (bind(sockfd, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
        return -1;
    }
    
    // Retrieve the assigned port
    socklen_t len = sizeof(addr);
    if (getsockname(sockfd, (struct sockaddr*)&addr, &len) < 0) {
        return -1;
    }
    
    printf("OS assigned port: %d\n", ntohs(addr.sin_port));
    return 0;
}

Common Binding Errors

EADDRINUSE: Address already in use—another socket owns this address. EACCES: Permission denied—binding to port < 1024 without root privileges. EADDRNOTAVAIL: Address not available—trying to bind to an IP not assigned to any interface. These errors are among the most common in network programming—understanding them accelerates debugging.
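One way to surface these errors during debugging is to attempt the bind and translate errno into a hint. The sketch below is illustrative; try_bind_ipv4 and bind_error_hint are hypothetical helpers, and the hint texts are paraphrases of the descriptions above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try to bind a temporary TCP socket to ip:port.
 * Returns 0 on success, or the errno value that explains the failure
 * (EINVAL if ip is not a valid dotted-decimal IPv4 address).
 */
int try_bind_ipv4(const char *ip, uint16_t port)
{
    assert(ip != NULL);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    int result = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        result = errno;   /* capture before close() can change it */
    close(fd);
    return result;
}

/* Short debugging hint for the most common bind() failures. */
const char *bind_error_hint(int err)
{
    switch (err) {
    case EADDRINUSE:
        return "address in use: another socket already owns this address";
    case EACCES:
        return "permission denied: privileged port without privileges";
    case EADDRNOTAVAIL:
        return "address not available: IP is not assigned to any interface";
    default:
        return "other error: see strerror()";
    }
}
```

For example, binding to 192.0.2.1 (an address reserved for documentation) on a host that does not own it would be expected to yield EADDRNOTAVAIL, while binding to 127.0.0.1 with port 0 succeeds.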

Address Reuse and Sharing

The default behavior—one socket per address—is safe but sometimes too restrictive. Socket options enable controlled sharing of addresses:

SO_REUSEADDR:

This option allows binding to a local address that still has connections in the TIME_WAIT state. When a TCP connection closes, the endpoint that closed first (often the server) holds the connection in TIME_WAIT, for 60 seconds on Linux, so that delayed packets cannot confuse a new connection. Without SO_REUSEADDR, a server restart within this window fails with EADDRINUSE.

int optval = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));

Important: SO_REUSEADDR is almost always set on server sockets. The alternative—waiting 60 seconds after every restart—is operationally unacceptable.
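Placed in context, the option is set between socket() and bind(). The following sketch binds to loopback so that it does not expose a port; the function names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a TCP listening socket on 127.0.0.1:port with SO_REUSEADDR set.
 * Returns the descriptor, or -1 on error.
 */
int listen_with_reuseaddr(uint16_t port)
{
    struct sockaddr_in addr;
    int on = 1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* The option must be set before bind() to affect that bind. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}

/* Read the option back: 1 if enabled, 0 if disabled, -1 on error. */
int reuseaddr_enabled(int fd)
{
    assert(fd >= 0);
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

Checking the return value of setsockopt() matters here: a silently failed call leaves the server exposed to EADDRINUSE on its next restart.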

SO_REUSEPORT:

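This option (Linux 3.9+, BSD) allows multiple sockets to bind to exactly the same address and port, provided every socket sets the option before bind(). On Linux, the sockets must also belong to the same effective user, and the kernel distributes incoming connections across all listening sockets. Use cases: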

Multi-process servers: Each worker process has its own listening socket, avoiding the thundering herd problem of shared accept()
Zero-downtime restarts: New server process binds before old one exits
Load distribution: Kernel-level load balancing across processes

int optval = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval));

Address Reuse Options Comparison
Option	Purpose	Typical Use	Platform
SO_REUSEADDR	Bind to address in TIME_WAIT	All TCP servers	POSIX (semantics differ on Windows)
SO_REUSEPORT	Multiple sockets on same address	Multi-process load balancing	Linux 3.9+, BSD
SO_EXCLUSIVEADDRUSE	Prevent address stealing (Windows)	Security-critical servers	Windows only
IP_FREEBIND	Bind before address is assigned	Failover scenarios	Linux only

Security Implications:

Address reuse has security implications. On some systems (notably Windows), SO_REUSEADDR allows a second process to bind to the same address and port as an existing service, potentially hijacking connections. Linux is stricter: for TCP, SO_REUSEADDR never permits a bind to an address and port on which another socket is already listening. The behavior varies across operating systems.

Best Practices:

Always set SO_REUSEADDR on server sockets before bind()
Use SO_REUSEPORT deliberately—understand the load distribution semantics
Be aware of platform differences—test on target deployment environment
Consider security context—in multi-tenant environments, address reuse creates attack surfaces

The TIME_WAIT Problem in Depth

TIME_WAIT exists to prevent a delayed packet from an old connection being misinterpreted as belonging to a new connection on the same 5-tuple. The wait is nominally 2×MSL (Maximum Segment Lifetime). RFC 793 suggests an MSL of 2 minutes, but implementations use shorter values; Linux uses a fixed TIME_WAIT of 60 seconds. While necessary for correctness, it creates practical issues for servers that restart frequently or accept many short-lived connections.

Address Resolution and Name Lookup

Humans prefer domain names (www.example.com); sockets require IP addresses. The gap is bridged by address resolution—typically DNS (Domain Name System).

The getaddrinfo() Function:

Modern socket programming uses getaddrinfo() for protocol-independent address resolution:

int getaddrinfo(const char *node,      // Hostname or IP string
                const char *service,   // Port number or service name
                const struct addrinfo *hints,  // Desired address type
                struct addrinfo **res); // Results (linked list)

This function handles:

DNS resolution (hostname → IP address)
Service name resolution (/etc/services lookup)
IPv4/IPv6 transparency
Multiple address results (for redundancy)

address_resolution.c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>      // close()
 
// Resolve hostname and connect (protocol-independent)
int connect_to_server(const char *hostname, const char *port) {
    struct addrinfo hints, *result, *rp;
    int sockfd = -1;
    
    // Set up hints for TCP connection
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP
    hints.ai_protocol = IPPROTO_TCP;
    
    // Resolve hostname to addresses
    int err = getaddrinfo(hostname, port, &hints, &result);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }
    
    // Try each address until one succeeds
    for (rp = result; rp != NULL; rp = rp->ai_next) {
        sockfd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (sockfd == -1) continue;
        
        if (connect(sockfd, rp->ai_addr, rp->ai_addrlen) == 0) {
            break;  // Success!
        }
        
        close(sockfd);
        sockfd = -1;
    }
    
    freeaddrinfo(result);  // Free the linked list
    
    if (rp == NULL) {
        fprintf(stderr, "Could not connect to %s:%s\n", hostname, port);
        return -1;
    }
    
    return sockfd;
}
 
// Resolve for server binding
int create_server_socket(const char *port) {
    struct addrinfo hints, *result;
    
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET6;       // IPv6 (dual-stack)
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;      // For bind() - wildcard address
    
    if (getaddrinfo(NULL, port, &hints, &result) != 0) {
        return -1;
    }
    
    int sockfd = socket(result->ai_family, result->ai_socktype, 
                        result->ai_protocol);
    
    // ... set socket options and bind ...
    
    freeaddrinfo(result);
    return sockfd;
}

getaddrinfo() Advantages

•Protocol Independence — Works for IPv4, IPv6, or dual-stack without code changes
•Multiple Results — Returns all addresses for a hostname, enabling fallback on connection failure
•Service Name Resolution — Converts 'http' to 80, 'https' to 443 automatically
•Thread Safety — Unlike older gethostbyname(), getaddrinfo() is reentrant
•Future Compatibility — Will support new address families without API changes
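getaddrinfo() can also be used purely as a parser, with no DNS traffic, by passing the AI_NUMERICHOST and AI_NUMERICSERV flags. The sketch below shows this use; parse_numeric_endpoint and endpoint_port are hypothetical helper names.

```c
/* Request POSIX.1-2001 declarations (getaddrinfo) under strict -std= modes. */
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Parse a numeric host and numeric port into a sockaddr_storage.
 * The same call handles IPv4 and IPv6 text. Returns 0 on success or a
 * getaddrinfo() error code (printable with gai_strerror()).
 */
int parse_numeric_endpoint(const char *host, const char *port,
                           struct sockaddr_storage *out, socklen_t *outlen)
{
    assert(host != NULL && port != NULL && out != NULL && outlen != NULL);

    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;                       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;  /* no lookups */

    int err = getaddrinfo(host, port, &hints, &res);
    if (err != 0)
        return err;

    assert(res->ai_addrlen <= sizeof(*out));
    memset(out, 0, sizeof(*out));
    memcpy(out, res->ai_addr, res->ai_addrlen);
    *outlen = res->ai_addrlen;

    freeaddrinfo(res);
    return 0;
}

/* Port (host byte order) of an AF_INET or AF_INET6 address, or -1. */
int endpoint_port(const struct sockaddr_storage *ss)
{
    assert(ss != NULL);
    if (ss->ss_family == AF_INET)
        return ntohs(((const struct sockaddr_in *)ss)->sin_port);
    if (ss->ss_family == AF_INET6)
        return ntohs(((const struct sockaddr_in6 *)ss)->sin6_port);
    return -1;
}
```

With these flags, a hostname such as "www.example.com" is rejected with an error instead of triggering a DNS query, which is useful when input must already be an IP literal.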

Dual-Stack Considerations

On dual-stack systems (IPv4 + IPv6), an IPv6 socket with IPV6_V6ONLY disabled can accept IPv4 connections. IPv4 peers then appear as IPv4-mapped IPv6 addresses (::ffff:192.168.1.1). This simplifies server code, since one socket handles both protocols. The default value of IPV6_V6ONLY differs between operating systems, so portable code should set it explicitly before bind().

Special Addresses and Their Purposes

Several IP addresses and port numbers have special meanings in socket programming:

Special IP Addresses (IPv4):

Special IPv4 Addresses for Sockets
Address	Name	Socket Usage
0.0.0.0	INADDR_ANY	Bind to all interfaces; represents 'any' local address
127.0.0.1	INADDR_LOOPBACK	Loopback interface; local machine only
255.255.255.255	INADDR_BROADCAST	Local network broadcast (requires SO_BROADCAST)
127.0.0.0/8	Loopback block	Entire block reserved for loopback
224.0.0.0/4	Multicast	Multicast group addresses (requires special handling)

Special IP Addresses (IPv6):

Special IPv6 Addresses for Sockets
Address	Name	Socket Usage
::	in6addr_any (IN6ADDR_ANY_INIT)	Bind to all interfaces (IPv6 equivalent of 0.0.0.0)
::1	in6addr_loopback (IN6ADDR_LOOPBACK_INIT)	IPv6 loopback address
::ffff:x.x.x.x	IPv4-mapped	IPv4 address represented in IPv6 socket
fe80::/10	Link-local	Valid only on local network segment
ff00::/8	Multicast	IPv6 multicast addresses

Special Port Numbers:

Port	Special Meaning
0	Ephemeral port request—OS assigns available port
1-1023	Privileged ports on Unix-like systems; binding requires root or an equivalent capability
22, 80, 443	Well-known services (SSH, HTTP, HTTPS)
49152-65535	IANA ephemeral range (OS client ports)

Practical Implications:

Development vs. Production: Binding to 127.0.0.1 during development prevents accidental network exposure
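Container Networking: A service bound to 127.0.0.1 inside a container is reachable only from within that container's network namespace; bind to INADDR_ANY (or the container's interface address) when the port must be published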
Security: Never bind sensitive services to INADDR_ANY if they should only be locally accessible
Testing: Port 0 is invaluable for tests that need sockets but don't care about specific ports

Security: Mind Your Binds

A common security mistake: binding to INADDR_ANY when a service should only be local (database admin ports, debug endpoints). Always consciously choose between loopback-only (127.0.0.1) and all-interfaces (0.0.0.0). Firewalls are defense-in-depth—proper binding is the first line of defense.

Summary: Socket Addressing

We've explored the complete landscape of socket addressing—the mechanism that enables billions of endpoints to communicate without ambiguity. Let's consolidate the essential points:

Key Takeaways

•Socket addresses combine IP and port — IP routes to the machine; port routes to the process. Together they form the transport-layer endpoint.
•The 5-tuple uniquely identifies connections — Protocol, source IP, source port, destination IP, destination port distinguish every connection in the system.
•Address structures are protocol-specific — sockaddr_in for IPv4, sockaddr_in6 for IPv6, with sockaddr_storage for protocol-independent code.
•Byte order conversion is mandatory — Network byte order differs from host byte order on most systems; use htons/htonl consistently.
•Binding associates sockets with local addresses — Servers bind to known ports; clients can let the OS assign ephemeral ports.
•Address reuse options solve operational problems — SO_REUSEADDR handles TIME_WAIT; SO_REUSEPORT enables multi-process load balancing.
•getaddrinfo() provides protocol-independent resolution — Handles DNS, IPv4/IPv6 transparency, and service name lookup in one function.

What's Next:

With addressing understood, we're ready to explore the Socket API—the system calls that create, connect, and manage sockets. The next page provides comprehensive coverage of each socket function, from creation through data transfer to closure.

Addressing Mastered

You now understand socket addressing at both conceptual and implementation levels—how endpoints are identified, how addresses are structured in code, and how binding and reuse work. This foundation is essential for writing correct, portable, and efficient network code.

2 / 5

Loading learning content...

Computer NetworksTransport Layer Concepts

Sockets

LevelIntermediate

Duration60 mins

TopicTransport Layer Concepts

2 / 5

Socket Address

Identifying Communication Endpoints

What You Will Learn

The Socket Address Concept

A socket address is a composite identifier that uniquely locates a socket within a network. In the Internet Protocol suite, a socket address consists of two fundamental components:

1. IP Address (Network Identifier):

Identifies the machine (host) on the network
IPv4: 32-bit address (e.g., 192.168.1.100)
IPv6: 128-bit address (e.g., 2001:db8::1)
Enables routing across network boundaries

2. Port Number (Process Identifier):

Identifies the specific process/service on the machine
16-bit unsigned integer (range: 0-65535)
Enables multiple network applications on one host

Together, an IP address and port number form a transport address or endpoint. This combination is often written as IP:Port (e.g., 192.168.1.100:8080 or [2001:db8::1]:443 for IPv6).

Why Two Components?

Port Number Classification:

Ports are divided into three ranges with different intended uses:

Range	Name	Purpose
0-1023	Well-Known Ports	Reserved for system services (HTTP:80, HTTPS:443, SSH:22)
1024-49151	Registered Ports	IANA-registered services (MySQL:3306, PostgreSQL:5432)
49152-65535	Dynamic/Ephemeral	Client-side ports assigned by OS for outbound connections

Well-known ports require root/administrator privileges to bind on most systems—a security measure preventing unprivileged processes from impersonating critical services.

The Address Uniqueness Requirement:

However, the same port number can be used:

On different IP addresses (multi-homed hosts)
By different protocols (TCP and UDP can both use port 53)
With address reuse options (SO_REUSEADDR, SO_REUSEPORT)

Converting Mermaid diagram...

Address Structures in Code

The socket API represents addresses using structured data types. Understanding these structures is essential for socket programming, as they appear in nearly every socket operation.

The Generic Socket Address (sockaddr):

The socket API was designed to be protocol-independent. To achieve this, it uses a generic address structure that can represent addresses from any protocol family:

struct sockaddr {
    sa_family_t sa_family;    // Address family (AF_INET, AF_INET6, etc.)
    char        sa_data[14];  // Protocol-specific address data
};

This generic structure is 16 bytes and serves as a common base type. Functions like bind(), connect(), and accept() accept pointers to sockaddr, allowing them to work with any address family.

IPv4 Address Structure (sockaddr_in):

struct sockaddr_in {
    sa_family_t    sin_family;  // AF_INET
    in_port_t      sin_port;    // 16-bit port (network byte order)
    struct in_addr sin_addr;    // 32-bit IPv4 address
    char           sin_zero[8]; // Padding to match sockaddr size
};

struct in_addr {
    uint32_t s_addr;            // IPv4 address (network byte order)
};

IPv6 Address Structure (sockaddr_in6):

struct sockaddr_in6 {
    sa_family_t     sin6_family;   // AF_INET6
    in_port_t       sin6_port;     // 16-bit port (network byte order)
    uint32_t        sin6_flowinfo; // Flow information
    struct in6_addr sin6_addr;     // 128-bit IPv6 address
    uint32_t        sin6_scope_id; // Scope ID (for link-local addresses)
};

struct in6_addr {
    uint8_t s6_addr[16];           // IPv6 address bytes
};

Byte Order Matters

socket_address_example.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
 
// Creating an IPv4 socket address for server listening on port 8080
void create_server_address() {
    struct sockaddr_in server_addr;
    
    // Zero out the structure first (good practice)
    memset(&server_addr, 0, sizeof(server_addr));
    
    // Set the address family
    server_addr.sin_family = AF_INET;
    
    // Set the port number (convert to network byte order)
    server_addr.sin_port = htons(8080);
    
    // Bind to all available interfaces (INADDR_ANY = 0.0.0.0)
    server_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    
    // Alternative: Bind to specific IP address
    // inet_pton(AF_INET, "192.168.1.100", &server_addr.sin_addr);
    
    // Now server_addr can be used with bind()
    // bind(sockfd, (struct sockaddr*)&server_addr, sizeof(server_addr));
}
 
// Creating an IPv4 socket address to connect to example.com:443
void create_client_address() {
    struct sockaddr_in remote_addr;
    
    memset(&remote_addr, 0, sizeof(remote_addr));
    remote_addr.sin_family = AF_INET;
    remote_addr.sin_port = htons(443);
    
    // Convert dotted-decimal string to binary address
    if (inet_pton(AF_INET, "93.184.216.34", &remote_addr.sin_addr) <= 0) {
        // Error handling: invalid address format
    }
    
    // Now remote_addr can be used with connect()
    // connect(sockfd, (struct sockaddr*)&remote_addr, sizeof(remote_addr));
}

Storage-Size Address Structure (sockaddr_storage):

When writing protocol-independent code that must handle both IPv4 and IPv6, use sockaddr_storage—a structure large enough to hold any socket address:

struct sockaddr_storage {
    sa_family_t ss_family;  // Address family
    // Large enough buffer for any address type
    // Implementation-defined padding and alignment
};

Socket Address Structures Comparison
Structure	Size	Address Family	Use Case
sockaddr	16 bytes	Generic	API function parameter type (cast target)
sockaddr_in	16 bytes	AF_INET (IPv4)	IPv4 addresses
sockaddr_in6	28 bytes	AF_INET6 (IPv6)	IPv6 addresses
sockaddr_un	110+ bytes	AF_UNIX	Unix domain socket paths
sockaddr_storage	128 bytes	Any	Protocol-independent code, large enough for all

The 5-Tuple Connection Identifier

While a single socket address (IP + port) identifies one endpoint, a complete connection is identified by a 5-tuple—five values that together uniquely distinguish any connection in the system:

Protocol (TCP or UDP)
Source IP Address
Source Port Number
Destination IP Address
Destination Port Number

This 5-tuple is critical because it enables:

Demultiplexing: The OS routes incoming data to the correct socket
NAT traversal: Network Address Translators track connections by 5-tuple
Firewall rules: Security policies match on 5-tuple components
Connection tracking: Load balancers and routers identify flows

Why 5 Elements?

Connection Uniqueness Scenarios:

Consider how the 5-tuple enables multiple simultaneous connections:

Scenario 1: Multiple clients to same server

Connection A: TCP, 192.168.1.5:50000 → 10.0.0.1:80
Connection B: TCP, 192.168.1.5:50001 → 10.0.0.1:80
Connection C: TCP, 192.168.1.6:50000 → 10.0.0.1:80

All three connections target the same server address (10.0.0.1:80), but each has a unique 5-tuple because the source IP or source port differs.

Scenario 2: Same client to multiple servers

Connection D: TCP, 192.168.1.5:50002 → 10.0.0.1:80 (Web server)
Connection E: TCP, 192.168.1.5:50003 → 10.0.0.2:443 (API server)
Connection F: TCP, 192.168.1.5:50004 → 10.0.0.3:5432 (Database)

One client maintains multiple simultaneous connections.

Scenario 3: TCP and UDP to same endpoint

Connection G: TCP, 192.168.1.5:50005 → 10.0.0.1:53 (DNS over TCP)
Connection H: UDP, 192.168.1.5:50006 → 10.0.0.1:53 (DNS over UDP)

Same destination IP and port, but different protocols. TCP and UDP have separate port spaces, so these two flows would remain distinct even if both used the same source port.
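
The scenarios above can be sketched as a flow key with five fields and an equality check. This is an illustrative, IPv4-only model; the names five_tuple, same_flow, and ipv4 are hypothetical, and a real kernel uses hashed lookup tables rather than field-by-field comparison.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

// Illustrative IPv4-only flow key (hypothetical type, not a system structure).
struct five_tuple {
    uint8_t  protocol;   // IPPROTO_TCP or IPPROTO_UDP
    uint32_t src_ip;     // kept in host byte order for readability
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
};

// Two packets belong to the same flow only if all five fields match.
bool same_flow(const struct five_tuple *a, const struct five_tuple *b) {
    return a->protocol == b->protocol &&
           a->src_ip == b->src_ip && a->src_port == b->src_port &&
           a->dst_ip == b->dst_ip && a->dst_port == b->dst_port;
}

// Pack four dotted-decimal octets into one 32-bit value.
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}
```

With this model, Connection A and Connection B compare as different flows because only src_port differs, which is enough to make the whole tuple unique.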

Maximum Concurrent Connections:

The 5-tuple constraint has implications for scalability. Consider a server accepting connections on one IP:port:

Source IPs are limited by client population
Source ports are limited to ~65,535 per client IP
Maximum connections from one client IP: ~65,535
Maximum total connections: depends on total client IPs × ports

In practice, servers can handle millions of concurrent connections, but architects must understand these theoretical limits and plan for edge cases like proxy servers (single IP) or NAT pools.

Binding Socket Addresses

Binding is the process of associating a socket with a local address. It's a critical step that determines which incoming connections or packets the socket will receive.

The bind() System Call:

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

Bind assigns the socket sockfd the local address specified in addr. After binding, the socket is associated with that address for its lifetime (until closed).

When Binding is Required:

Servers: Must bind to a known port so clients can connect to a predictable address
UDP receivers: Must bind to a known port so senders know where to address datagrams
Multicast: Must bind to receive multicast group traffic

When Binding is Optional:

TCP clients: The OS assigns an ephemeral port automatically on connect()
UDP senders: The OS assigns an ephemeral port automatically on first send
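
The optional-binding case for TCP clients can be observed directly. The sketch below, assuming an IPv4 server address supplied by the caller, connects without calling bind() and then asks the OS which local port it chose. The function name connect_and_get_local_port is hypothetical.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: connect a TCP socket to `server` WITHOUT calling bind(),
// then report the local port the OS picked. Returns the connected fd, or -1.
int connect_and_get_local_port(const struct sockaddr_in *server,
                               uint16_t *local_port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // No bind() here: connect() implicitly binds to an ephemeral port.
    if (connect(fd, (const struct sockaddr *)server, sizeof(*server)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in local;
    socklen_t len = sizeof(local);
    memset(&local, 0, sizeof(local));
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0) {
        close(fd);
        return -1;
    }

    *local_port = ntohs(local.sin_port);  // Convert to host byte order
    return fd;
}
```

The reported port is the source port of the connection's 5-tuple; which range it falls in depends on the operating system's ephemeral port configuration.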

Address Binding Options

• INADDR_ANY (0.0.0.0): Bind to all available network interfaces. The socket receives connections/packets arriving on any interface. Most common for servers.
• Specific IP Address: Bind to one interface only. The socket only receives traffic destined for that specific IP. Used for multi-homed hosts with distinct services per interface.
• in6addr_any / IN6ADDR_ANY_INIT (::): IPv6 equivalent of INADDR_ANY. On dual-stack systems, the socket may also accept IPv4 connections (presented as IPv4-mapped IPv6 addresses), depending on the IPV6_V6ONLY socket option.
• Port 0: Request OS to assign an available ephemeral port. Useful when the specific port doesn't matter (client connections, test servers).
• Loopback (127.0.0.1 / ::1): Bind only to the loopback interface. The socket is inaccessible from the network, so only local processes can connect. Common for development and local-only services.

binding_examples.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   // inet_pton()
#include <stdint.h>      // uint16_t
#include <string.h>
#include <stdio.h>
 
// Bind to all interfaces on specific port (typical server)
int bind_all_interfaces(int sockfd, uint16_t port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  // All interfaces
    addr.sin_port = htons(port);               // Specific port
    
    return bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
}
 
// Bind to specific interface and port (multi-homed server)
int bind_specific_interface(int sockfd, const char *ip, uint16_t port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    
    if (inet_pton(AF_INET, ip, &addr.sin_addr) <= 0) {
        return -1;  // Invalid IP address
    }
    
    return bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
}
 
// Bind to loopback only (local-only service)
int bind_loopback_only(int sockfd, uint16_t port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1
    addr.sin_port = htons(port);
    
    return bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
}
 
// Let OS choose port (useful for clients, test servers)
int bind_ephemeral_port(int sockfd) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);  // Port 0 = OS chooses
    
    if (bind(sockfd, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
        return -1;
    }
    
    // Retrieve the assigned port
    socklen_t len = sizeof(addr);
    if (getsockname(sockfd, (struct sockaddr*)&addr, &len) < 0) {
        return -1;
    }
    
    printf("OS assigned port: %d\n", ntohs(addr.sin_port));
    return 0;
}

Address Reuse and Sharing

The default behavior—one socket per address—is safe but sometimes too restrictive. Socket options enable controlled sharing of addresses:

SO_REUSEADDR:

int optval = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));

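Important: SO_REUSEADDR lets a server rebind to a port that still has connections in TIME_WAIT from a previous run, so it is almost always set on server sockets. The alternative, waiting for TIME_WAIT to expire after every restart (60 seconds on Linux), is operationally unacceptable.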

SO_REUSEPORT:

This option (Linux 3.9+, BSD) allows multiple sockets to bind to exactly the same address and port, provided every socket sets the option before calling bind(). On Linux, the kernel distributes incoming connections across all listening sockets. Use cases:

Multi-process servers: Each worker process has its own listening socket, avoiding the thundering herd problem of shared accept()
Zero-downtime restarts: New server process binds before old one exits
Load distribution: Kernel-level load balancing across processes

int optval = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval));

Address Reuse Options Comparison
Option	Purpose	Typical Use	Platform
SO_REUSEADDR	Bind to address in TIME_WAIT	All TCP servers	POSIX (universal)
SO_REUSEPORT	Multiple sockets on same address	Multi-process load balancing	Linux 3.9+, BSD
SO_EXCLUSIVEADDRUSE	Prevent address stealing (Windows)	Security-critical servers	Windows only
IP_FREEBIND	Bind before address is assigned	Failover scenarios	Linux only

Best Practices:

Always set SO_REUSEADDR on server sockets before bind()
Use SO_REUSEPORT deliberately—understand the load distribution semantics
Be aware of platform differences—test on target deployment environment
Consider security context—in multi-tenant environments, address reuse creates attack surfaces
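
The first practice, setting SO_REUSEADDR before bind(), might look like the following sketch. The function name make_server_socket is hypothetical, and binding to loopback is a choice made here to keep the example from exposing a port; a production server would often bind to INADDR_ANY or a specific interface instead.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: create a listening TCP socket on 127.0.0.1:port
// with SO_REUSEADDR applied before bind(). Returns the fd, or -1 on failure.
int make_server_socket(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // Order matters: the option only affects a bind() that happens afterwards.
    int optval = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```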

Address Resolution and Name Lookup

Humans prefer domain names (www.example.com); sockets require IP addresses. The gap is bridged by address resolution—typically DNS (Domain Name System).

The getaddrinfo() Function:

Modern socket programming uses getaddrinfo() for protocol-independent address resolution:

int getaddrinfo(const char *node,      // Hostname or IP string
                const char *service,   // Port number or service name
                const struct addrinfo *hints,  // Desired address type
                struct addrinfo **res); // Results (linked list)

This function handles:

DNS resolution (hostname → IP address)
Service name resolution (/etc/services lookup)
IPv4/IPv6 transparency
Multiple address results (for redundancy)

address_resolution.c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>  // IPPROTO_TCP
#include <netdb.h>
#include <unistd.h>      // close()
#include <string.h>
#include <stdio.h>
 
// Resolve hostname and connect (protocol-independent)
int connect_to_server(const char *hostname, const char *port) {
    struct addrinfo hints, *result, *rp;
    int sockfd = -1;
    
    // Set up hints for TCP connection
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP
    hints.ai_protocol = IPPROTO_TCP;
    
    // Resolve hostname to addresses
    int err = getaddrinfo(hostname, port, &hints, &result);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }
    
    // Try each address until one succeeds
    for (rp = result; rp != NULL; rp = rp->ai_next) {
        sockfd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (sockfd == -1) continue;
        
        if (connect(sockfd, rp->ai_addr, rp->ai_addrlen) == 0) {
            break;  // Success!
        }
        
        close(sockfd);
        sockfd = -1;
    }
    
    freeaddrinfo(result);  // Free the linked list
    
    if (rp == NULL) {
        fprintf(stderr, "Could not connect to %s:%s\n", hostname, port);
        return -1;
    }
    
    return sockfd;
}
 
// Resolve for server binding
int create_server_socket(const char *port) {
    struct addrinfo hints, *result;
    
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET6;       // IPv6; also serves IPv4 if IPV6_V6ONLY is off
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;      // For bind() - wildcard address
    
    if (getaddrinfo(NULL, port, &hints, &result) != 0) {
        return -1;
    }
    
    int sockfd = socket(result->ai_family, result->ai_socktype, 
                        result->ai_protocol);
    
    // ... set socket options and bind ...
    
    freeaddrinfo(result);
    return sockfd;
}

getaddrinfo() Advantages

• Protocol Independence: Works for IPv4, IPv6, or dual-stack without code changes
• Multiple Results: Returns all addresses for a hostname, enabling fallback on connection failure
• Service Name Resolution: Converts 'http' to 80, 'https' to 443 automatically
• Thread Safety: Unlike the older gethostbyname(), which returns a pointer to static storage, getaddrinfo() is thread-safe
• Future Compatibility: New address families can be supported without API changes
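
The "multiple results" behavior can be explored with a small sketch that walks the linked list returned by getaddrinfo() and prints each entry in numeric form using getnameinfo(). The helper name list_addresses and the fixed buffer sizes are assumptions made for this illustration.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: print every address getaddrinfo() returns for
// host/service and return how many were printed (-1 on resolver failure).
int list_addresses(const char *host, const char *service) {
    struct addrinfo hints, *result, *rp;
    int count = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // Accept IPv4 and IPv6 results
    hints.ai_socktype = SOCK_STREAM;  // One entry per address, not per protocol

    int err = getaddrinfo(host, service, &hints, &result);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }

    for (rp = result; rp != NULL; rp = rp->ai_next) {
        char ip[64], port[16];  // Sizes chosen for this sketch
        // Numeric flags avoid a reverse DNS or service-name lookup
        if (getnameinfo(rp->ai_addr, rp->ai_addrlen, ip, sizeof(ip),
                        port, sizeof(port),
                        NI_NUMERICHOST | NI_NUMERICSERV) == 0) {
            printf("%s port %s\n", ip, port);
            count++;
        }
    }

    freeaddrinfo(result);
    return count;
}
```

Calling it with a hostname that has both A and AAAA records would typically print several lines, in the order a client should try them.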

Special Addresses and Their Purposes

Several IP addresses and port numbers have special meanings in socket programming:

Special IP Addresses (IPv4):

Special IPv4 Addresses for Sockets
Address	Name	Socket Usage
0.0.0.0	INADDR_ANY	Bind to all interfaces; represents 'any' local address
127.0.0.1	INADDR_LOOPBACK	Loopback interface; local machine only
255.255.255.255	INADDR_BROADCAST	Local network broadcast (requires SO_BROADCAST)
127.0.0.0/8	Loopback block	Entire block reserved for loopback
224.0.0.0/4	Multicast	Multicast group addresses (requires special handling)

Special IP Addresses (IPv6):

Special IPv6 Addresses for Sockets
Address	Name	Socket Usage
::	in6addr_any (IN6ADDR_ANY_INIT)	Bind to all interfaces (IPv6 equivalent of 0.0.0.0)
::1	in6addr_loopback (IN6ADDR_LOOPBACK_INIT)	IPv6 loopback address
::ffff:x.x.x.x	IPv4-mapped	IPv4 address represented in IPv6 socket
fe80::/10	Link-local	Valid only on local network segment
ff00::/8	Multicast	IPv6 multicast addresses
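
The IPv6 categories in this table can be tested in code with the IN6_IS_ADDR_* macros from <netinet/in.h>. The sketch below parses a textual address and reports which category it falls into; the enum ipv6_kind and the function classify_ipv6 are hypothetical names for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

// Hypothetical categories matching the rows of the table above.
enum ipv6_kind {
    IPV6_INVALID,
    IPV6_UNSPECIFIED,  // ::
    IPV6_LOOPBACK,     // ::1
    IPV6_V4MAPPED,     // ::ffff:x.x.x.x
    IPV6_LINKLOCAL,    // fe80::/10
    IPV6_MULTICAST,    // ff00::/8
    IPV6_OTHER
};

// Parse an IPv6 address string and classify it with the standard macros.
enum ipv6_kind classify_ipv6(const char *text) {
    struct in6_addr a;
    if (inet_pton(AF_INET6, text, &a) != 1) return IPV6_INVALID;

    if (IN6_IS_ADDR_UNSPECIFIED(&a)) return IPV6_UNSPECIFIED;
    if (IN6_IS_ADDR_LOOPBACK(&a))    return IPV6_LOOPBACK;
    if (IN6_IS_ADDR_V4MAPPED(&a))    return IPV6_V4MAPPED;
    if (IN6_IS_ADDR_LINKLOCAL(&a))   return IPV6_LINKLOCAL;
    if (IN6_IS_ADDR_MULTICAST(&a))   return IPV6_MULTICAST;
    return IPV6_OTHER;
}
```

A dual-stack server could apply the IPv4-mapped check to peer addresses returned by accept() to tell IPv4 clients apart from native IPv6 ones.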

Special Port Numbers:

Port	Special Meaning
0	Ephemeral port request—OS assigns available port
1-1023	Privileged ports: binding requires root on Unix-like systems (or CAP_NET_BIND_SERVICE on Linux)
22, 80, 443	Well-known services (SSH, HTTP, HTTPS)
49152-65535	IANA dynamic/ephemeral range; OS defaults vary (Linux uses 32768-60999 by default)

Practical Implications:

Development vs. Production: Binding to 127.0.0.1 during development prevents accidental network exposure
Container Networking: INADDR_ANY is essential in containers where network namespaces isolate interfaces
Security: Never bind sensitive services to INADDR_ANY if they should only be locally accessible
Testing: Port 0 is invaluable for tests that need sockets but don't care about specific ports

Summary: Socket Addressing

We've explored the complete landscape of socket addressing—the mechanism that enables billions of endpoints to communicate without ambiguity. Let's consolidate the essential points:

Key Takeaways

• Socket addresses combine IP and port: IP routes to the machine; port routes to the process. Together they form the transport-layer endpoint.
• The 5-tuple uniquely identifies connections: Protocol, source IP, source port, destination IP, destination port distinguish every connection in the system.
• Address structures are protocol-specific: sockaddr_in for IPv4, sockaddr_in6 for IPv6, with sockaddr_storage for protocol-independent code.
• Byte order conversion is mandatory: Network byte order differs from host byte order on most systems; use htons/htonl consistently.
• Binding associates sockets with local addresses: Servers bind to known ports; clients can let the OS assign ephemeral ports.
• Address reuse options solve operational problems: SO_REUSEADDR handles TIME_WAIT; SO_REUSEPORT enables multi-process load balancing.
• getaddrinfo() provides protocol-independent resolution: Handles DNS, IPv4/IPv6 transparency, and service name lookup in one function.
