For two processes to communicate across a network, they must be able to find each other. This seemingly simple requirement encompasses one of the most critical aspects of network programming: socket addressing. How do we uniquely identify a socket among billions of potential endpoints across the global Internet?
The answer lies in a carefully designed addressing scheme that combines hierarchical components—IP addresses to identify machines, and port numbers to identify specific processes on those machines. Together, these form the socket address, the fundamental unit of endpoint identification in TCP/IP networking.
Understanding socket addressing is essential not just for programming, but for debugging, security analysis, and system design. Every "connection refused" error, every firewall rule, and every load balancer configuration ultimately deals with socket addresses.
By the end of this page, you will understand how socket addresses uniquely identify endpoints, the structure of address data types in code, how the 5-tuple identifies connections, and the mechanics of address binding. You'll be able to reason about socket addressing at both the conceptual and implementation levels.
A socket address is a composite identifier that uniquely locates a socket within a network. In the Internet Protocol suite, a socket address consists of two fundamental components:
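1. IP Address (Network Identifier): identifies the host, or more precisely one of its network interfaces, on the network.
2. Port Number (Process Identifier): a 16-bit number (0-65535) that identifies a specific socket, and through it a process, on that host.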
Together, an IP address and port number form a transport address or endpoint. This combination is often written as IP:Port (e.g., 192.168.1.100:8080 or [2001:db8::1]:443 for IPv6).
The separation of IP address and port implements the transport layer's core function: process-to-process delivery. The IP address gets data to the right machine (network layer's job), while the port number gets it to the right process (transport layer's job). This clean separation enables independent addressing of hosts and services.
Port Number Classification:
Ports are divided into three ranges with different intended uses:
| Range | Name | Purpose |
|---|---|---|
| 0-1023 | Well-Known Ports | Reserved for system services (HTTP:80, HTTPS:443, SSH:22) |
| 1024-49151 | Registered Ports | IANA-registered services (MySQL:3306, PostgreSQL:5432) |
| 49152-65535 | Dynamic/Ephemeral | Client-side ports assigned by OS for outbound connections |
On Unix-like systems, binding to a well-known port normally requires root or an equivalent privilege (for example, the CAP_NET_BIND_SERVICE capability on Linux). This is a security measure that prevents unprivileged processes from impersonating critical services.
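The three ranges in the table can be expressed as a small classifier. This is a minimal sketch; the names `port_range` and `classify_port` are made up for illustration and are not part of any socket API.

```c
#include <stdint.h>

/* Hypothetical names for the three IANA port ranges. */
enum port_range { PORT_WELL_KNOWN, PORT_REGISTERED, PORT_DYNAMIC };

/* Returns the range a port number falls into (boundaries from the table). */
enum port_range classify_port(uint16_t port) {
    if (port <= 1023)  return PORT_WELL_KNOWN;   /* 0-1023 */
    if (port <= 49151) return PORT_REGISTERED;   /* 1024-49151 */
    return PORT_DYNAMIC;                         /* 49152-65535 */
}
```

For example, `classify_port(443)` yields `PORT_WELL_KNOWN` and `classify_port(5432)` yields `PORT_REGISTERED`.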
The Address Uniqueness Requirement:
Within a single machine, no two sockets can be bound to the same socket address (IP + port) simultaneously for the same protocol. This uniqueness constraint is fundamental—it's how the operating system knows which socket should receive incoming data.
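However, the same port number can be used by sockets bound to different local IP addresses on the same machine, and by sockets of different transport protocols, because TCP and UDP have separate port spaces.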
The socket API represents addresses using structured data types. Understanding these structures is essential for socket programming, as they appear in nearly every socket operation.
The Generic Socket Address (sockaddr):
The socket API was designed to be protocol-independent. To achieve this, it uses a generic address structure that can represent addresses from any protocol family:
```c
struct sockaddr {
    sa_family_t sa_family;   // Address family (AF_INET, AF_INET6, etc.)
    char        sa_data[14]; // Protocol-specific address data
};
```
This generic structure is 16 bytes on typical platforms and serves as a common base type. Functions like bind(), connect(), and accept() take pointers to sockaddr, allowing them to work with any address family.
IPv4 Address Structure (sockaddr_in):
```c
struct sockaddr_in {
    sa_family_t    sin_family;  // AF_INET
    in_port_t      sin_port;    // 16-bit port (network byte order)
    struct in_addr sin_addr;    // 32-bit IPv4 address
    char           sin_zero[8]; // Padding to match sockaddr size
};

struct in_addr {
    uint32_t s_addr;            // IPv4 address (network byte order)
};
```
IPv6 Address Structure (sockaddr_in6):
```c
struct sockaddr_in6 {
    sa_family_t     sin6_family;   // AF_INET6
    in_port_t       sin6_port;     // 16-bit port (network byte order)
    uint32_t        sin6_flowinfo; // Flow information
    struct in6_addr sin6_addr;     // 128-bit IPv6 address
    uint32_t        sin6_scope_id; // Scope ID (for link-local addresses)
};

struct in6_addr {
    uint8_t s6_addr[16];           // IPv6 address bytes
};
```
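Because every address structure starts with the family field, code that receives a generic `sockaddr` pointer can inspect `sa_family` and cast to the concrete type. The sketch below illustrates that pattern; the helper name `sockaddr_port` is hypothetical.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns the port in host byte order, or -1 for
 * address families this sketch does not handle. */
int sockaddr_port(const struct sockaddr *sa) {
    switch (sa->sa_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *)sa)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
    default:
        return -1;  /* e.g. AF_UNIX has no port */
    }
}
```

The same switch-and-cast shape appears whenever code has to handle an address returned by accept() or getsockname() without knowing its family in advance.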
Network protocols use big-endian (network byte order), but most modern CPUs are little-endian. Port numbers and IPv4 addresses in socket structures must be stored in network byte order. Use the conversion functions htons()/ntohs() for 16-bit ports and htonl()/ntohl() for 32-bit IPv4 addresses; IPv6 addresses are stored as byte arrays and need no conversion. Forgetting these conversions is a classic bug: the code compiles and runs, but binds or connects to the wrong port or address.
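To make the byte-order rule concrete, the following sketch exposes the in-memory bytes that htons() produces. The helper name `port_wire_bytes` is an assumption for illustration. Port 8080 is 0x1F90, so in network byte order the byte 0x1F comes first. On a little-endian host, storing 8080 without htons() would put 0x90 first, and a peer would read the port as 0x901F (36895).

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Copies the in-memory bytes of htons(port) into out[0..1].
 * In network byte order the most significant byte comes first. */
void port_wire_bytes(uint16_t port, unsigned char out[2]) {
    uint16_t net = htons(port);
    memcpy(out, &net, sizeof net);
}
```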
```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

// Creating an IPv4 socket address for server listening on port 8080
void create_server_address(void) {
    struct sockaddr_in server_addr;

    // Zero out the structure first (good practice)
    memset(&server_addr, 0, sizeof(server_addr));

    // Set the address family
    server_addr.sin_family = AF_INET;

    // Set the port number (convert to network byte order)
    server_addr.sin_port = htons(8080);

    // Bind to all available interfaces (INADDR_ANY = 0.0.0.0)
    server_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // Alternative: Bind to specific IP address
    // inet_pton(AF_INET, "192.168.1.100", &server_addr.sin_addr);

    // Now server_addr can be used with bind()
    // bind(sockfd, (struct sockaddr*)&server_addr, sizeof(server_addr));
}

// Creating an IPv4 socket address to connect to example.com:443
void create_client_address(void) {
    struct sockaddr_in remote_addr;

    memset(&remote_addr, 0, sizeof(remote_addr));
    remote_addr.sin_family = AF_INET;
    remote_addr.sin_port = htons(443);

    // Convert dotted-decimal string to binary address
    if (inet_pton(AF_INET, "93.184.216.34", &remote_addr.sin_addr) <= 0) {
        // Error handling: invalid address format
    }

    // Now remote_addr can be used with connect()
    // connect(sockfd, (struct sockaddr*)&remote_addr, sizeof(remote_addr));
}
```
Storage-Size Address Structure (sockaddr_storage):
When writing protocol-independent code that must handle both IPv4 and IPv6, use sockaddr_storage—a structure large enough to hold any socket address:
```c
struct sockaddr_storage {
    sa_family_t ss_family; // Address family
    // Large enough buffer for any address type
    // Implementation-defined padding and alignment
};
```
This is particularly important when calling accept() or recvfrom(), where you don't know in advance whether you'll receive an IPv4 or IPv6 connection. Using sockaddr_storage ensures sufficient space for either.
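As an illustration of protocol-independent handling, the sketch below stores either an IPv4 or an IPv6 endpoint in a `sockaddr_storage` and formats it in the IP:Port notation used earlier. Both helper names (`fill_endpoint`, `format_endpoint`) are hypothetical; only inet_pton(), inet_ntop(), htons(), and ntohs() are real library calls.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fills ss from a numeric IP string and a port.
 * Returns 0 on success, -1 on a bad address or unsupported family. */
int fill_endpoint(struct sockaddr_storage *ss, int family,
                  const char *ip, uint16_t port) {
    memset(ss, 0, sizeof *ss);
    if (family == AF_INET) {
        struct sockaddr_in *a = (struct sockaddr_in *)ss;
        a->sin_family = AF_INET;
        a->sin_port = htons(port);
        return inet_pton(AF_INET, ip, &a->sin_addr) == 1 ? 0 : -1;
    }
    if (family == AF_INET6) {
        struct sockaddr_in6 *a = (struct sockaddr_in6 *)ss;
        a->sin6_family = AF_INET6;
        a->sin6_port = htons(port);
        return inet_pton(AF_INET6, ip, &a->sin6_addr) == 1 ? 0 : -1;
    }
    return -1;
}

/* Hypothetical helper: writes "ip:port" or "[ip6]:port" into buf.
 * Returns 0 on success, -1 on an unsupported family or conversion error. */
int format_endpoint(const struct sockaddr_storage *ss, char *buf, size_t buflen) {
    char ip[INET6_ADDRSTRLEN];
    if (ss->ss_family == AF_INET) {
        const struct sockaddr_in *a = (const struct sockaddr_in *)ss;
        if (inet_ntop(AF_INET, &a->sin_addr, ip, sizeof ip) == NULL) return -1;
        snprintf(buf, buflen, "%s:%u", ip, (unsigned)ntohs(a->sin_port));
        return 0;
    }
    if (ss->ss_family == AF_INET6) {
        const struct sockaddr_in6 *a = (const struct sockaddr_in6 *)ss;
        if (inet_ntop(AF_INET6, &a->sin6_addr, ip, sizeof ip) == NULL) return -1;
        snprintf(buf, buflen, "[%s]:%u", ip, (unsigned)ntohs(a->sin6_port));
        return 0;
    }
    return -1;
}
```

In real code the storage would typically be filled by accept() or recvfrom() rather than by hand; `fill_endpoint` exists here only so the formatter can be exercised without a network.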
| Structure | Size | Address Family | Use Case |
|---|---|---|---|
| sockaddr | 16 bytes | Generic | API function parameter type (cast target) |
| sockaddr_in | 16 bytes | AF_INET (IPv4) | IPv4 addresses |
| sockaddr_in6 | 28 bytes | AF_INET6 (IPv6) | IPv6 addresses |
| sockaddr_un | 110+ bytes | AF_UNIX | Unix domain socket paths |
| sockaddr_storage | 128 bytes | Any | Protocol-independent code, large enough for all |
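The sizes in the table are typical values (they match common Linux builds) rather than guarantees, so it can be worth checking them on the target platform. A minimal sketch, with hypothetical function names:

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <sys/un.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if an address structure of n bytes fits in sockaddr_storage. */
int fits_in_storage(size_t n) {
    return n <= sizeof(struct sockaddr_storage);
}

/* Prints the size of each address structure on the current platform. */
void print_sockaddr_sizes(void) {
    printf("sockaddr         %zu\n", sizeof(struct sockaddr));
    printf("sockaddr_in      %zu\n", sizeof(struct sockaddr_in));
    printf("sockaddr_in6     %zu\n", sizeof(struct sockaddr_in6));
    printf("sockaddr_un      %zu\n", sizeof(struct sockaddr_un));
    printf("sockaddr_storage %zu\n", sizeof(struct sockaddr_storage));
}
```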
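While a single socket address (IP + port) identifies one endpoint, a complete connection is identified by a 5-tuple: the transport protocol (TCP or UDP), the source IP address, the source port, the destination IP address, and the destination port. Together these five values uniquely distinguish any connection in the system.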
This 5-tuple is critical because it enables:
Consider a server running on 10.0.0.1:80 receiving connections from two clients at 192.168.1.5:50000 and 192.168.1.5:50001. Both connections have the same destination address, same protocol, and even the same source IP. Only the source port differs—but that's enough to uniquely identify each connection. The 5-tuple captures exactly the information needed for unique identification.
Connection Uniqueness Scenarios:
Consider how the 5-tuple enables multiple simultaneous connections:
Scenario 1: Multiple clients to same server
Connection A: TCP, 192.168.1.5:50000 → 10.0.0.1:80
Connection B: TCP, 192.168.1.5:50001 → 10.0.0.1:80
Connection C: TCP, 192.168.1.6:50000 → 10.0.0.1:80
All three connections go to the same server socket, but each has a unique 5-tuple.
Scenario 2: Same client to multiple servers
Connection D: TCP, 192.168.1.5:50002 → 10.0.0.1:80 (Web server)
Connection E: TCP, 192.168.1.5:50003 → 10.0.0.2:443 (API server)
Connection F: TCP, 192.168.1.5:50004 → 10.0.0.3:5432 (Database)
One client maintains multiple simultaneous connections.
Scenario 3: TCP and UDP to same endpoint
Connection G: TCP, 192.168.1.5:50005 → 10.0.0.1:53 (DNS over TCP)
Connection H: UDP, 192.168.1.5:50006 → 10.0.0.1:53 (DNS over UDP)
Same destination IP and port, but different protocols. This is allowed because TCP and UDP have separate port spaces, so the 5-tuples differ.
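The scenarios above can be modelled with a small structure and a field-by-field comparison. This is an illustrative sketch only: `struct five_tuple`, `IPV4`, and `same_connection` are hypothetical names, and addresses are kept in host byte order for readability.

```c
#include <stdint.h>
#include <stdbool.h>
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */

/* Builds a host-byte-order IPv4 address from four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical representation of an IPv4 connection identifier. */
struct five_tuple {
    uint8_t  protocol;   /* IPPROTO_TCP or IPPROTO_UDP */
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
};

/* Two tuples name the same connection only if all five fields match. */
bool same_connection(const struct five_tuple *a, const struct five_tuple *b) {
    return a->protocol == b->protocol &&
           a->src_ip   == b->src_ip   &&
           a->src_port == b->src_port &&
           a->dst_ip   == b->dst_ip   &&
           a->dst_port == b->dst_port;
}
```

With these definitions, Connection A and Connection B from Scenario 1 compare as different because only `src_port` differs, which is exactly the property that lets one server socket serve both.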
Maximum Concurrent Connections:
The 5-tuple constraint has implications for scalability. Consider a server accepting connections on one IP:port:
In practice, servers can handle millions of concurrent connections, but architects must understand these theoretical limits and plan for edge cases like proxy servers (single IP) or NAT pools.
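As a rough illustration of those theoretical limits: with the protocol, destination IP, and destination port fixed, only the source IP and source port can vary. The sketch below does that arithmetic for IPv4. The function names are hypothetical, and the result is an upper bound that ignores reserved addresses, memory, and file descriptor limits.

```c
#include <stdint.h>

/* Upper bound on distinct 5-tuples for one server IP:port and protocol:
 * number of client IPs times usable source ports per client IP. */
uint64_t max_connections(uint64_t client_ips, uint64_t ports_per_ip) {
    return client_ips * ports_per_ip;
}

/* Number of ports in an inclusive range, e.g. an ephemeral port range. */
uint32_t port_count(uint16_t first, uint16_t last) {
    return (uint32_t)last - (uint32_t)first + 1u;
}
```

A single client IP (for example a proxy or a NAT gateway) can therefore hold at most 65,535 connections to one server IP:port, and only `port_count(49152, 65535)` = 16,384 if it restricts itself to the IANA ephemeral range. Across the whole IPv4 space the ceiling is 2^32 × 2^16 = 2^48 combinations.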
Operating systems maintain hash tables mapping 5-tuples to sockets. When a packet arrives, the kernel hashes its 5-tuple and looks up the corresponding socket in O(1) time. This efficient lookup is critical for high-performance networking—systems handling millions of connections cannot afford linear searches.
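The lookup idea can be sketched as hashing the five fields and reducing the hash to a bucket index. This is an assumption-laden illustration, not how any particular kernel does it: real kernels use their own keyed hash functions, and FNV-1a is used here only because it is short. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 connection identifier (same fields as the 5-tuple). */
struct five_tuple {
    uint8_t  protocol;
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
};

/* Folds len bytes into an FNV-1a hash state. */
static uint32_t fnv1a_bytes(uint32_t h, const void *data, size_t len) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;          /* FNV prime */
    }
    return h;
}

/* Hashes field by field so that struct padding never enters the hash. */
uint32_t five_tuple_hash(const struct five_tuple *t) {
    uint32_t h = 2166136261u;    /* FNV offset basis */
    h = fnv1a_bytes(h, &t->protocol, sizeof t->protocol);
    h = fnv1a_bytes(h, &t->src_ip,   sizeof t->src_ip);
    h = fnv1a_bytes(h, &t->src_port, sizeof t->src_port);
    h = fnv1a_bytes(h, &t->dst_ip,   sizeof t->dst_ip);
    h = fnv1a_bytes(h, &t->dst_port, sizeof t->dst_port);
    return h;
}

/* Maps a tuple to one of nbuckets hash-table slots (nbuckets must be > 0). */
size_t bucket_index(const struct five_tuple *t, size_t nbuckets) {
    return five_tuple_hash(t) % nbuckets;
}
```

A full table would still compare all five fields within the chosen bucket, since different tuples can share a bucket.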
Binding is the process of associating a socket with a local address. It's a critical step that determines which incoming connections or packets the socket will receive.
The bind() System Call:
```c
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
```
Bind assigns the socket sockfd the local address specified in addr. After binding, the socket is associated with that address for its lifetime (until closed).
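When Binding is Required: a server normally binds to a known address and port before calling listen() (TCP) or receiving datagrams (UDP), so that clients know where to reach it.
When Binding is Optional: a client usually skips bind(); the OS assigns a local address and an ephemeral port automatically on connect() or on the first sendto().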
```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   // inet_pton()
#include <string.h>
#include <stdio.h>

// Bind to all interfaces on specific port (typical server)
int bind_all_interfaces(int sockfd, uint16_t port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  // All interfaces
    addr.sin_port = htons(port);               // Specific port

    return bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
}

// Bind to specific interface and port (multi-homed server)
int bind_specific_interface(int sockfd, const char *ip, uint16_t port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));

    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    if (inet_pton(AF_INET, ip, &addr.sin_addr) <= 0) {
        return -1;  // Invalid IP address
    }

    return bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
}

// Bind to loopback only (local-only service)
int bind_loopback_only(int sockfd, uint16_t port) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1
    addr.sin_port = htons(port);

    return bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
}

// Let OS choose port (useful for clients, test servers)
int bind_ephemeral_port(int sockfd) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));

    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);  // Port 0 = OS chooses

    if (bind(sockfd, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
        return -1;
    }

    // Retrieve the assigned port
    socklen_t len = sizeof(addr);
    if (getsockname(sockfd, (struct sockaddr*)&addr, &len) < 0) {
        return -1;
    }

    printf("OS assigned port: %d\n", ntohs(addr.sin_port));
    return 0;
}
```
EADDRINUSE: Address already in use; another socket owns this address.
EACCES: Permission denied; binding to a port below 1024 without sufficient privileges.
EADDRNOTAVAIL: Address not available; the IP address is not assigned to any local interface.
These errors are among the most common in network programming—understanding them accelerates debugging.
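The most common of these, EADDRINUSE, is easy to reproduce. The sketch below binds one UDP socket to an OS-chosen loopback port and then tries to bind a second socket to the same address. The function name `second_bind_errno` is hypothetical, and UDP is used only to keep the example short.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Binds two UDP sockets to the same loopback address and port.
 * Returns the errno from the second bind(), 0 if it unexpectedly
 * succeeded, or -1 if the setup itself failed. */
int second_bind_errno(void) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int result = -1;
    int first = socket(AF_INET, SOCK_DGRAM, 0);
    int second = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* let the OS pick a port */

    if (first >= 0 && second >= 0 &&
        bind(first, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(first, (struct sockaddr *)&addr, &len) == 0) {
        /* addr now holds the port the OS chose; reuse it for the clash. */
        if (bind(second, (struct sockaddr *)&addr, sizeof addr) < 0)
            result = errno;                   /* save before close() */
        else
            result = 0;
    }

    if (first >= 0) close(first);
    if (second >= 0) close(second);
    return result;
}
```

When neither socket sets an address-reuse option, the second bind() is expected to fail with EADDRINUSE.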
The default behavior—one socket per address—is safe but sometimes too restrictive. Socket options enable controlled sharing of addresses:
SO_REUSEADDR:
This option allows binding to a local address that still has connections in the TIME_WAIT state. When a TCP connection closes, the endpoint that closed first keeps the connection in TIME_WAIT (60 seconds on Linux) so that delayed packets cannot be mistaken for part of a new connection. Without SO_REUSEADDR, a server restarted within this window can fail to bind with EADDRINUSE.
```c
int optval = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));
```
Important: SO_REUSEADDR is almost always set on server sockets. The alternative—waiting 60 seconds after every restart—is operationally unacceptable.
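The ordering matters: the option only has an effect if it is set before bind(). A minimal sketch of a listener set up that way follows; the function name `make_listener` is hypothetical, and it binds to loopback purely to keep the demonstration local.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Creates a TCP listening socket on 127.0.0.1:port with SO_REUSEADDR set.
 * Returns the socket descriptor, or -1 on any failure. */
int make_listener(uint16_t port) {
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* Set the option first, then bind, then listen. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Passing port 0 lets the OS pick a free port, which is convenient for tests.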
SO_REUSEPORT:
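This option (Linux 3.9+, BSD) allows multiple sockets to bind to exactly the same address and port, provided each socket sets the option before bind(). On Linux, the kernel distributes incoming connections across all listening sockets in the group. Typical use cases are multi-process or multi-threaded servers in which each worker owns its own listening socket.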
```c
int optval = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval));
```
| Option | Purpose | Typical Use | Platform |
|---|---|---|---|
| SO_REUSEADDR | Bind to address in TIME_WAIT | All TCP servers | POSIX (Windows semantics differ) |
| SO_REUSEPORT | Multiple sockets on same address | Multi-process load balancing | Linux 3.9+, BSD |
| SO_EXCLUSIVEADDRUSE | Prevent address stealing (Windows) | Security-critical servers | Windows only |
| IP_FREEBIND | Bind before address is assigned | Failover scenarios | Linux only |
Security Implications:
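Address reuse has security implications. On some systems, SO_REUSEADDR allows a malicious process to bind to the same address as an existing service, potentially hijacking connections. Linux mitigates this in two ways: SO_REUSEADDR does not permit binding to an address and port on which another socket is already listening, and SO_REUSEPORT sharing requires every socket to set the option and to run under the same effective user ID. The behavior varies across operating systems.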
Best Practices:
TIME_WAIT exists to prevent a delayed packet from an old connection from being misinterpreted as belonging to a new connection on the same 5-tuple. The wait is 2×MSL (Maximum Segment Lifetime). RFC 793 takes the MSL to be 2 minutes, but implementations commonly use 30 to 60 seconds, and Linux uses a fixed TIME_WAIT of 60 seconds. While necessary for correctness, it creates practical issues for servers that restart frequently or accept many short-lived connections.
Humans prefer domain names (www.example.com); sockets require IP addresses. The gap is bridged by address resolution—typically DNS (Domain Name System).
The getaddrinfo() Function:
Modern socket programming uses getaddrinfo() for protocol-independent address resolution:
```c
int getaddrinfo(const char *node,               // Hostname or IP string
                const char *service,            // Port number or service name
                const struct addrinfo *hints,   // Desired address type
                struct addrinfo **res);         // Results (linked list)
```
This function handles:
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>   // close()

// Resolve hostname and connect (protocol-independent)
int connect_to_server(const char *hostname, const char *port) {
    struct addrinfo hints, *result, *rp;
    int sockfd = -1;

    // Set up hints for TCP connection
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP
    hints.ai_protocol = IPPROTO_TCP;

    // Resolve hostname to addresses
    int err = getaddrinfo(hostname, port, &hints, &result);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }

    // Try each address until one succeeds
    for (rp = result; rp != NULL; rp = rp->ai_next) {
        sockfd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (sockfd == -1) continue;

        if (connect(sockfd, rp->ai_addr, rp->ai_addrlen) == 0) {
            break;  // Success!
        }

        close(sockfd);
        sockfd = -1;
    }

    freeaddrinfo(result);  // Free the linked list

    if (rp == NULL) {
        fprintf(stderr, "Could not connect to %s:%s\n", hostname, port);
        return -1;
    }

    return sockfd;
}

// Resolve for server binding
int create_server_socket(const char *port) {
    struct addrinfo hints, *result;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET6;       // IPv6 (dual-stack)
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;      // For bind() - wildcard address

    if (getaddrinfo(NULL, port, &hints, &result) != 0) {
        return -1;
    }

    int sockfd = socket(result->ai_family, result->ai_socktype, result->ai_protocol);

    // ... set socket options and bind ...

    freeaddrinfo(result);
    return sockfd;
}
```
On dual-stack systems (IPv4 + IPv6), an IPv6 socket with IPV6_V6ONLY disabled can accept IPv4 connections. IPv4 peers then appear as IPv4-mapped IPv6 addresses (::ffff:192.168.1.1). This simplifies server code, because one socket handles both protocols. However, some applications require separate handling, which is controlled by the IPV6_V6ONLY socket option; its default differs between operating systems, so portable code should set it explicitly.
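Code that logs or filters peers on a dual-stack socket often wants the original IPv4 address back. The sketch below recognizes an IPv4-mapped address using the standard IN6_IS_ADDR_V4MAPPED macro and extracts the embedded IPv4 address; the helper name `unmap_ipv4` is hypothetical.

```c
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* If addr is an IPv4-mapped IPv6 address (::ffff:a.b.c.d), copies the
 * embedded IPv4 address into *out and returns 1; otherwise returns 0. */
int unmap_ipv4(const struct in6_addr *addr, struct in_addr *out) {
    if (!IN6_IS_ADDR_V4MAPPED(addr)) return 0;
    /* The last four bytes hold the IPv4 address in network byte order,
     * which is also how in_addr stores it, so a plain copy suffices. */
    memcpy(&out->s_addr, &addr->s6_addr[12], 4);
    return 1;
}
```

For a peer reported as `::ffff:192.168.1.1`, the extracted address formats as `192.168.1.1` with inet_ntop(); a native IPv6 address such as `::1` is left alone.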
Several IP addresses and port numbers have special meanings in socket programming:
Special IP Addresses (IPv4):
| Address | Name | Socket Usage |
|---|---|---|
| 0.0.0.0 | INADDR_ANY | Bind to all interfaces; represents 'any' local address |
| 127.0.0.1 | INADDR_LOOPBACK | Loopback interface; local machine only |
| 255.255.255.255 | INADDR_BROADCAST | Local network broadcast (requires SO_BROADCAST) |
| 127.0.0.0/8 | Loopback block | Entire block reserved for loopback |
| 224.0.0.0/4 | Multicast | Multicast group addresses (requires special handling) |
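The IPv4 rows above can be checked programmatically. The following sketch classifies an address into the categories from the table; the enum and the function name `classify_ipv4` are hypothetical, and everything not listed in the table is lumped together as "ordinary".

```c
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical categories matching the special-address table. */
enum ipv4_kind { IPV4_ANY, IPV4_LOOPBACK, IPV4_BROADCAST, IPV4_MULTICAST, IPV4_ORDINARY };

/* Classifies an IPv4 address given in network byte order (as in sin_addr). */
enum ipv4_kind classify_ipv4(struct in_addr addr) {
    uint32_t h = ntohl(addr.s_addr);          /* compare in host byte order */
    if (h == INADDR_ANY)       return IPV4_ANY;        /* 0.0.0.0 */
    if (h == INADDR_BROADCAST) return IPV4_BROADCAST;  /* 255.255.255.255 */
    if ((h >> 24) == 127)      return IPV4_LOOPBACK;   /* 127.0.0.0/8 */
    if ((h >> 28) == 0xE)      return IPV4_MULTICAST;  /* 224.0.0.0/4 */
    return IPV4_ORDINARY;
}
```

Note that the loopback check covers the whole 127.0.0.0/8 block, not only 127.0.0.1.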
Special IP Addresses (IPv6):
| Address | Name | Socket Usage |
|---|---|---|
| :: | in6addr_any (IN6ADDR_ANY_INIT) | Bind to all interfaces (IPv6 equivalent of 0.0.0.0) |
| ::1 | in6addr_loopback (IN6ADDR_LOOPBACK_INIT) | IPv6 loopback address |
| ::ffff:x.x.x.x | IPv4-mapped | IPv4 address represented in IPv6 socket |
| fe80::/10 | Link-local | Valid only on local network segment |
| ff00::/8 | Multicast | IPv6 multicast addresses |
Special Port Numbers:
| Port | Special Meaning |
|---|---|
| 0 | Ephemeral port request—OS assigns available port |
| 1-1023 | Privileged ports; binding requires root or an equivalent privilege on Unix-like systems |
| 22, 80, 443 | Well-known services (SSH, HTTP, HTTPS) |
| 49152-65535 | IANA ephemeral range (OS client ports); actual defaults vary by OS, e.g. 32768-60999 on Linux |
Practical Implications:
A common security mistake: binding to INADDR_ANY when a service should only be local (database admin ports, debug endpoints). Always consciously choose between loopback-only (127.0.0.1) and all-interfaces (0.0.0.0). Firewalls are defense-in-depth—proper binding is the first line of defense.
We've explored the complete landscape of socket addressing—the mechanism that enables billions of endpoints to communicate without ambiguity. Let's consolidate the essential points:
What's Next:
With addressing understood, we're ready to explore the Socket API—the system calls that create, connect, and manage sockets. The next page provides comprehensive coverage of each socket function, from creation through data transfer to closure.
You now understand socket addressing at both conceptual and implementation levels—how endpoints are identified, how addresses are structured in code, and how binding and reuse work. This foundation is essential for writing correct, portable, and efficient network code.