Consider a popular web server like Google's. At any given moment, millions of users are connected to google.com on port 443. If the server only used the destination port (443) to identify connections, all these users would be indistinguishable—their data would mix chaotically, and reliable communication would be impossible.
Clearly, port numbers alone are insufficient for TCP's connection-oriented model. TCP needs a way to uniquely identify each individual connection among millions of simultaneous connections to the same server port.
The solution is the 4-tuple: a combination of four values that uniquely identifies every TCP connection on the entire Internet:
(Source IP, Source Port, Destination IP, Destination Port)
This 4-tuple is the complete "name" of a TCP connection—no two active connections anywhere can have the same 4-tuple.
By the end of this page, you will understand TCP connection identification in depth—why the 4-tuple is necessary, how it enables massive concurrency, how connections are created and tracked, and how the operating system manages millions of connection identities efficiently. You'll also understand the contrast with UDP's simpler identification model.
What is a 4-Tuple?
A 4-tuple (also called a quad or socket pair) is the combination of four network identifiers that together uniquely specify a TCP connection:
Source IP Address (32 bits for IPv4, 128 bits for IPv6)
Source Port (16 bits)
Destination IP Address (32 bits for IPv4, 128 bits for IPv6)
Destination Port (16 bits)
Formal Notation:
4-tuple notation:      (192.168.1.100, 52341, 203.0.113.50, 443)
                       (src_ip,        src_port, dst_ip,     dst_port)
Alternative notation:  192.168.1.100:52341 <-> 203.0.113.50:443
From the server's perspective, the 4-tuple appears reversed: what was the source becomes the destination, and vice versa. The same connection has 4-tuple (A, a, B, b) from A's view and (B, b, A, a) from B's view. Both representations identify the same connection.
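You can observe both views directly: open a loopback connection and ask each end for its local and peer addresses. Here is a minimal sketch using Python's standard socket module (the OS chooses all ports at runtime):

import socket

# Server side: listen on an OS-assigned port on localhost.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0 = let the OS choose
listener.listen(1)

# Client side: connect; the OS also picks the ephemeral source port.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
conn, _ = listener.accept()          # per-connection socket on the server

# Same connection, two views of the one 4-tuple:
print("client view:", client.getsockname(), "->", client.getpeername())
print("server view:", conn.getsockname(), "<-", conn.getpeername())

conn.close(); client.close(); listener.close()

The client's (local, peer) pair is exactly the server's (peer, local) pair: one connection, named from opposite ends.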
Why All Four Components Are Necessary:
| If We Omit... | Problem |
|---|---|
| Source IP | Can't distinguish connections from different clients |
| Source Port | Can't distinguish multiple connections from same client |
| Destination IP | Can't distinguish connections on multi-homed hosts (multiple local IPs) |
| Destination Port | Can't distinguish connections to different services |
Mathematical Perspective:
The 4-tuple creates an enormous address space. For IPv4, the theoretical combination count is 2^32 × 2^16 × 2^32 × 2^16 = 2^96, roughly 7.9 × 10^28 possible connections.
In practice the usable space is far smaller (only assigned IP addresses, restricted port ranges), but there is still room for billions of simultaneous connections.
Understanding the distinction between port-based and connection-based identification clarifies why TCP and UDP demultiplex differently.
Port-Based Identification (UDP):
UDP uses a 2-tuple: (Destination IP, Destination Port)
UDP Socket A listens on port 53
↓
All datagrams to port 53 go to Socket A
↓
Socket A receives from any source
Connection-Based Identification (TCP):
TCP uses a 4-tuple: (Source IP, Source Port, Dest IP, Dest Port)
TCP Listening Socket on port 80
↓
Client A connects → New Socket (A's 4-tuple)
Client B connects → New Socket (B's 4-tuple)
Client A opens 2nd connection → New Socket (A's new 4-tuple)
↓
Each socket is completely independent
Practical Implication:
A UDP server on port 53 (DNS) with 10,000 clients needs exactly one socket: every datagram, from any source, is delivered to it.
A TCP server on port 443 (HTTPS) with 10,000 clients needs 10,001 sockets: one listening socket plus one connection socket per client.
This is why high-connection-count services (like WebSocket servers or long-polling) require careful resource planning—each TCP connection consumes kernel resources.
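To make the contrast concrete, here is a sketch in Python (ports 5353 and 8443 are arbitrary stand-ins for DNS and HTTPS): the UDP server needs a single socket regardless of client count, while the TCP server gains one socket per accepted connection.

import socket

# UDP: one socket receives datagrams from every client.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.bind(("0.0.0.0", 5353))
# data, client_addr = udp.recvfrom(2048)   # any sender arrives here

# TCP: the listener only accepts; each accept() yields a NEW socket
# dedicated to one 4-tuple.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.bind(("0.0.0.0", 8443))
tcp.listen(128)
# while True:
#     conn, client_addr = tcp.accept()     # one descriptor per connection
#     ...                                  # 10,000 clients -> 10,001 sockets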
Let's trace how a connection's 4-tuple is established, used, and released throughout its lifecycle.
Phase 1: Connection Initiation (Client Side)
1. Client application calls connect(server_ip, server_port)
2. OS selects ephemeral source port (e.g., 52341)
3. 4-tuple is now known:
(client_ip=192.168.1.100, src_port=52341,
server_ip=203.0.113.50, dst_port=443)
4. Client sends SYN with this 4-tuple
Phase 2: Connection Acceptance (Server Side)
1. SYN arrives at server's listening socket on port 443
2. Server extracts 4-tuple from SYN packet
3. Server creates embryonic connection (SYN_RCVD state)
4. Server sends SYN-ACK
5. Client sends ACK
6. Server creates new connection socket for this 4-tuple
7. accept() returns the new socket to application
Phase 3: Data Transfer
Every segment includes source and destination ports
→ receiver matches against known 4-tuples
→ data delivered to correct connection socket
Phase 4: Connection Termination
1. One side initiates close (sends FIN)
2. Four-way handshake completes
3. Socket moves to TIME_WAIT (typically 60-120 seconds)
4. During TIME_WAIT, 4-tuple cannot be reused
5. After TIME_WAIT expires, 4-tuple is released
Connection Lifecycle                        4-Tuple State
═══════════════════                         ════════════════
Client: connect()                           4-tuple selected
  │                                         (192.168.1.100:52341, 203.0.113.50:443)
  ▼
[SYN]  ──────────────────────────────►
       ◄────────────────────────────── [SYN-ACK]
[ACK]  ──────────────────────────────►
  │                          │
  ▼                          ▼
ESTABLISHED ◄──────────────► ESTABLISHED
  │                          │
  │  ═════ Data Transfer ═════            Every segment carries:
  │                          │            src=52341, dst=443
  │                          │            (or reversed for responses)
  ▼                          ▼
[FIN]  ──────────────────────────────►
       ◄────────────────────────────── [ACK]
       ◄────────────────────────────── [FIN]
[ACK]  ──────────────────────────────►
  │                          │
  ▼                          ▼
TIME_WAIT (2*MSL)          CLOSED
  │                                       4-tuple locked during this time
  │                                       (prevents delayed segment confusion)
  ▼
CLOSED                                    4-tuple released
  └─► Port 52341 available for reuse

During TIME_WAIT, the 4-tuple is locked because old delayed segments might still arrive. If the same 4-tuple were immediately reused for a new connection, these old segments could be mistakenly delivered to the new connection. The TIME_WAIT period (2× Maximum Segment Lifetime, typically 60-120 seconds) ensures all old segments have expired.
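You can feel this lock in practice when restarting a server: rebinding a port whose old connections linger in TIME_WAIT may fail with "Address already in use" until 2×MSL elapses. A short Python sketch of the conventional mitigation (SO_REUSEADDR; port 8443 is an arbitrary example):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow rebinding while old connections are still in TIME_WAIT.
# This does NOT allow two sockets to share a live 4-tuple.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("0.0.0.0", 8443))
s.listen(128)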
The power of 4-tuple identification is most evident when we examine how a server handles multiple connections to the same port.
Scenario: Web Server Port 443
Server IP: 203.0.113.50
Server Port: 443 (HTTPS)
Active connections:
Connection 1: (10.0.0.1, 52341, 203.0.113.50, 443)
└─► Client 10.0.0.1, first browser tab
Connection 2: (10.0.0.1, 52342, 203.0.113.50, 443)
└─► Client 10.0.0.1, second browser tab (same client!)
Connection 3: (10.0.0.2, 52341, 203.0.113.50, 443)
└─► Client 10.0.0.2 (same source port as Connection 1, different IP)
Connection 4: (192.168.5.100, 48001, 203.0.113.50, 443)
└─► Different network entirely
All four connections use destination port 443, but they're completely separate because their 4-tuples differ.
| Connection | Source IP | Source Port | Dest IP | Dest Port | Unique? |
|---|---|---|---|---|---|
| 1 | 10.0.0.1 | 52341 | 203.0.113.50 | 443 | Yes |
| 2 | 10.0.0.1 | 52342 | 203.0.113.50 | 443 | Yes (diff src port) |
| 3 | 10.0.0.2 | 52341 | 203.0.113.50 | 443 | Yes (diff src IP) |
| 4 | 192.168.5.100 | 48001 | 203.0.113.50 | 443 | Yes (diff src IP) |
| 5 | 10.0.0.1 | 52341 | 203.0.113.50 | 443 | NO - same as #1 |
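The uniqueness rule is easy to model: treat the 4-tuple as a key in a set. This small Python sketch (try_open is a hypothetical helper, not a real API) reproduces the table above:

active = set()

def try_open(src_ip, src_port, dst_ip, dst_port):
    tup = (src_ip, src_port, dst_ip, dst_port)
    if tup in active:
        return f"REJECT {tup}: connection already exists"
    active.add(tup)
    return f"OPEN   {tup}"

print(try_open("10.0.0.1", 52341, "203.0.113.50", 443))   # Connection 1
print(try_open("10.0.0.1", 52342, "203.0.113.50", 443))   # OK: new src port
print(try_open("10.0.0.2", 52341, "203.0.113.50", 443))   # OK: new src IP
print(try_open("10.0.0.1", 52341, "203.0.113.50", 443))   # REJECT: same as #1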
Client Opens Multiple Connections:
A single web page might require 6-8 connections to load all resources (HTML, CSS, images, scripts). The browser uses different source ports:
Browser loading google.com from 192.168.1.100:
├── Connection for main HTML: 192.168.1.100:52341 → google:443
├── Connection for CSS: 192.168.1.100:52342 → google:443
├── Connection for logo.png: 192.168.1.100:52343 → google:443
├── Connection for app.js: 192.168.1.100:52344 → google:443
├── Connection for analytics: 192.168.1.100:52345 → google:443
└── Connection for fonts: 192.168.1.100:52346 → google:443
Each connection tracks its own state: sequence numbers, send and receive buffers, window sizes, timers, and congestion-control variables.
Server Handling Thousands of Connections:
Server socket table:
┌──────────────┬──────────┬───────────────┬──────────┬───────────┬──────┐
│ Source IP    │ Src Port │ Dest IP       │ Dst Port │ Socket FD │ Type │
├──────────────┼──────────┼───────────────┼──────────┼───────────┼──────┤
│ *            │ *        │ 0.0.0.0       │ 443      │ 3         │ L    │
│ 10.0.0.1     │ 52341    │ 203.0.113.50  │ 443      │ 5         │ E    │
│ 10.0.0.1     │ 52342    │ 203.0.113.50  │ 443      │ 6         │ E    │
│ 10.0.0.2     │ 52341    │ 203.0.113.50  │ 443      │ 7         │ E    │
│ 192.168.5.100│ 48001    │ 203.0.113.50  │ 443      │ 8         │ E    │
│ ... potentially 100,000+ more ...                                     │
└──────────────┴──────────┴───────────────┴──────────┴───────────┴──────┘
L = Listening socket, E = Established connection
Notice the listening socket has wildcards (*) for source IP and port. It matches any incoming SYN. Once a connection is established, a new socket with the specific 4-tuple handles that connection. The listening socket continues waiting for new connections, never transferring data itself.
The operating system maintains a connection table (also called the socket table or TCP control block table) that maps 4-tuples to connection sockets. Efficient implementation is critical for high-performance networking.
Data Structure Requirements:
On every incoming segment, the kernel must map a 4-tuple to its connection. Lookups must therefore be O(1) on average, insertion and deletion must be cheap (connections open and close constantly), and the structure must remain memory-efficient at millions of entries.
Hash Table Implementation:
Most systems use hash tables indexed by 4-tuple:
Hash Function:
hash = hash(src_ip, src_port, dst_ip, dst_port)
bucket = hash % NUM_BUCKETS
Stored in bucket:
- Linked list of connections with same hash
- Each entry: 4-tuple → socket pointer
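As a runnable stand-in for this lookup path, the Python sketch below keys a dict by the 4-tuple. Python's dict plays the role of the kernel's bucketed hash table, and the stored values are placeholder control blocks:

table = {}

def insert(src_ip, src_port, dst_ip, dst_port, tcb):
    table[(src_ip, src_port, dst_ip, dst_port)] = tcb

def lookup(src_ip, src_port, dst_ip, dst_port):
    # Average O(1) -- the cost that matters when every incoming
    # segment triggers a lookup.
    return table.get((src_ip, src_port, dst_ip, dst_port))

insert("10.0.0.1", 52341, "203.0.113.50", 443, {"state": "ESTABLISHED"})
print(lookup("10.0.0.1", 52341, "203.0.113.50", 443))   # -> the control block
print(lookup("10.0.0.9", 11111, "203.0.113.50", 443))   # -> None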
# TCP Connection Table Structure (simplified)

struct tcp_connection {
    # Identification (the 4-tuple)
    uint32_t local_ip;
    uint16_t local_port;
    uint32_t remote_ip;
    uint16_t remote_port;

    # State
    enum tcp_state state;       # ESTABLISHED, TIME_WAIT, etc.

    # Sequence number tracking
    uint32_t snd_una;           # Oldest unacknowledged byte
    uint32_t snd_nxt;           # Next byte to send
    uint32_t rcv_nxt;           # Next byte expected

    # Buffers
    struct send_buffer *send_buf;
    struct recv_buffer *recv_buf;

    # Timers
    struct timer retransmit_timer;
    struct timer keepalive_timer;
    struct timer time_wait_timer;

    # Window management
    uint16_t snd_wnd;           # Send window
    uint16_t rcv_wnd;           # Receive window

    # Congestion control
    uint32_t cwnd;              # Congestion window
    uint32_t ssthresh;          # Slow start threshold

    # Hash chain for lookup
    struct tcp_connection *next_in_bucket;
}

# Hash table structure
struct connection_table {
    struct tcp_connection *buckets[HASH_SIZE];
    spinlock_t bucket_locks[HASH_SIZE];   # Per-bucket locking
    atomic_t connection_count;
}

# Jenkins hash for 4-tuple (example)
function connection_hash(local_ip, local_port, remote_ip, remote_port):
    h = local_ip
    h ^= (local_port << 16)
    h ^= remote_ip
    h ^= (remote_port << 16)
    h = jenkins_mix(h)          # Additional mixing
    return h % HASH_SIZE

Memory Consumption:
Each TCP connection requires significant kernel memory:
| Component | Typical Size |
|---|---|
| Connection structure | 200-400 bytes |
| Send buffer | 16 KB - 4 MB (tunable) |
| Receive buffer | 16 KB - 4 MB (tunable) |
| Socket structure | 200-600 bytes |
| Total per connection | ~50 KB - 8 MB |
With default settings (~50 KB per connection): 10,000 connections consume roughly 500 MB of kernel memory, 100,000 consume roughly 5 GB, and 1 million consume roughly 50 GB.
Scaling Strategies:
The 'C10K problem' (handling 10,000 concurrent connections) was a significant challenge in the early 2000s. Modern systems have advanced to the 'C10M problem' (10 million connections). Solutions include kernel bypass (DPDK), zero-copy networking, and optimized socket APIs (epoll, kqueue, io_uring).
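As one concrete instance of those "optimized socket APIs," here is a minimal event-driven server sketch using Python's selectors module (which wraps epoll or kqueue where available; port 8443 and the echo behavior are arbitrary choices). One thread can monitor many thousands of connection sockets this way:

import selectors
import socket

sel = selectors.DefaultSelector()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("0.0.0.0", 8443))
listener.listen(1024)
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ)

def serve_forever():
    while True:
        for key, _ in sel.select():
            sock = key.fileobj
            if sock is listener:
                conn, addr = listener.accept()   # new 4-tuple, new socket
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)           # echo back
                else:                            # peer closed
                    sel.unregister(sock)
                    sock.close()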
Let's trace exactly how an incoming TCP segment is demultiplexed using connection identification.
Incoming Segment:
IP Header:
Source IP: 10.0.0.1
Destination IP: 203.0.113.50
Protocol: 6 (TCP)
TCP Header:
Source Port: 52341
Destination Port: 443
Sequence Number: 1000
ACK Number: 5000
Flags: ACK, PSH
Payload: "GET /index.html HTTP/1.1..."
Demultiplexing Process:
1. IP layer receives datagram
→ Destination 203.0.113.50 matches local IP ✓
→ Protocol 6 = TCP, pass to TCP handler
2. TCP handler extracts 4-tuple:
→ (10.0.0.1, 52341, 203.0.113.50, 443)
3. Hash and lookup:
→ hash(10.0.0.1, 52341, 203.0.113.50, 443) = 0x4F2A1
→ bucket = 0x4F2A1 % 65536 = 62113
→ Search bucket 62113 for exact 4-tuple match
4. Match found:
→ Socket FD 5 owns this connection
→ Connection state: ESTABLISHED ✓
5. Sequence number validation:
→ Segment seq 1000 within receive window? ✓
6. Deliver payload:
→ Add "GET /index.html..." to FD 5's receive buffer
→ Wake application if blocked on recv()
7. Send ACK (piggy-backed or delayed)
function tcp_receive(ip_header, tcp_segment):
    # Step 1: Build the 4-tuple
    four_tuple = {
        local_ip:    ip_header.dest_ip,
        local_port:  tcp_segment.dest_port,
        remote_ip:   ip_header.src_ip,
        remote_port: tcp_segment.src_port
    }

    # Step 2: Look up established connection
    connection = connection_table.lookup(four_tuple)
    if connection is not None:
        # Existing connection - process segment
        return process_established_segment(connection, tcp_segment)

    # Step 3: Check for listening socket
    listen_socket = find_listening_socket(four_tuple.local_ip,
                                          four_tuple.local_port)
    if listen_socket is not None:
        if tcp_segment.flags.SYN and not tcp_segment.flags.ACK:
            # New connection request
            return handle_syn(listen_socket, four_tuple, tcp_segment)
        else:
            # Non-SYN to listening socket
            send_rst(four_tuple)
            return

    # Step 4: No matching socket
    if not tcp_segment.flags.RST:
        send_rst(four_tuple)
    drop_segment()

function process_established_segment(conn, segment):
    # Validate sequence numbers
    if not is_valid_sequence(conn, segment.seq_num):
        # Out of window - send duplicate ACK
        send_ack(conn)
        return

    # Process based on segment type
    if segment.flags.RST:
        close_connection(conn, "RST received")
    elif segment.flags.FIN:
        handle_fin(conn, segment)
    elif segment.flags.ACK:
        handle_ack(conn, segment)

    # Deliver payload to receive buffer
    if segment.payload_length > 0:
        conn.recv_buffer.add(segment.payload, segment.seq_num)
        notify_application(conn)

    # Send ACK for received data
    schedule_ack(conn)

Notice the two-step lookup: first check established connections (with full 4-tuple match), then fall back to listening sockets (wildcards allowed). This ordering is important—we want to deliver segments to existing connections first, only creating new connections when no match exists and the segment is a valid SYN.
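The same two-step lookup can be modeled in a few lines of runnable Python (the dicts stand in for kernel tables; the values "L" and "fd5" are placeholders):

established = {}   # exact 4-tuple -> connection
listening = {}     # (local_ip, local_port) -> listener

def demux(local_ip, local_port, remote_ip, remote_port, is_syn):
    # Step 1: an exact 4-tuple match wins.
    conn = established.get((local_ip, local_port, remote_ip, remote_port))
    if conn is not None:
        return f"deliver to established connection {conn}"
    # Step 2: wildcard fallback -- any remote may reach a listener via SYN.
    for key in ((local_ip, local_port), ("0.0.0.0", local_port)):
        if key in listening and is_syn:
            return f"new connection via listener {key}"
    return "no match: send RST"

listening[("0.0.0.0", 443)] = "L"
established[("203.0.113.50", 443, "10.0.0.1", 52341)] = "fd5"
print(demux("203.0.113.50", 443, "10.0.0.1", 52341, is_syn=False))  # established
print(demux("203.0.113.50", 443, "10.0.0.7", 40000, is_syn=True))   # listener
print(demux("203.0.113.50", 443, "10.0.0.7", 40000, is_syn=False))  # RST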
Several edge cases and special scenarios complicate connection identification. Understanding these is essential for debugging network issues.
Edge Case 1: TIME_WAIT Connections
A connection in TIME_WAIT still occupies its 4-tuple:
Connection closes:
(192.168.1.100, 52341, 203.0.113.50, 443) → TIME_WAIT
Another SYN arrives with same 4-tuple:
Option A: Reject (strict) - "connection exists"
Option B: Accept (lenient) - check if valid new connection
Modern implementations use sequence numbers to distinguish
delayed old segments from legitimate new connections.
Edge Case 2: Simultaneous Open
Both sides send SYN simultaneously (rare):
Client: SYN (seq=100) ──►   ◄── SYN (seq=200) :Server   (the two SYNs cross in transit)
Both enter SYN_RCVD state
Both respond with SYN-ACK
Both enter ESTABLISHED
Same 4-tuple, both initiated - handled correctly by TCP state machine
| Scenario | Problem | Resolution |
|---|---|---|
| TIME_WAIT collision | New connection has same 4-tuple as TIME_WAIT | Check sequence numbers; accept if clearly new |
| Duplicate SYN | Retransmitted SYN during handshake | Re-send SYN-ACK; idempotent response |
| Old duplicate SYN | Ancient SYN from previous connection | Sequence number check; RST if invalid |
| NAT port reuse | NAT reassigns same external port | May cause brief TIME_WAIT conflicts |
| Multi-homed host | Same connection via different interfaces | Include interface in lookup (zone ID) |
| Load balancer failover | Connection moves between servers | Connection state synchronization needed |
Edge Case 3: NAT and Connection Tracking
Network Address Translation creates challenges:
Original connection:
192.168.1.100:52341 → 8.8.8.8:443
After NAT:
203.0.113.1:61234 → 8.8.8.8:443
NAT device must:
1. Translate outgoing (192.168.1.100:52341 → 203.0.113.1:61234)
2. Track the mapping for incoming responses
3. Reverse-translate incoming (203.0.113.1:61234 → 192.168.1.100:52341)
The NAT device maintains its own connection table, translating 4-tuples in both directions.
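A toy model of that NAT table in Python (PUBLIC_IP and the starting port 61234 mirror the example above; a real NAT also tracks TCP state and expires idle mappings):

nat_out = {}   # (private_ip, private_port) -> public_port
nat_in = {}    # public_port -> (private_ip, private_port)
PUBLIC_IP = "203.0.113.1"
next_port = 61234

def translate_outgoing(src_ip, src_port, dst_ip, dst_port):
    global next_port
    key = (src_ip, src_port)
    if key not in nat_out:            # allocate a public port once per flow
        nat_out[key] = next_port
        nat_in[next_port] = key
        next_port += 1
    return (PUBLIC_IP, nat_out[key], dst_ip, dst_port)

def translate_incoming(src_ip, src_port, dst_ip, dst_port):
    private_ip, private_port = nat_in[dst_port]   # reverse the mapping
    return (src_ip, src_port, private_ip, private_port)

print(translate_outgoing("192.168.1.100", 52341, "8.8.8.8", 443))
# -> ('203.0.113.1', 61234, '8.8.8.8', 443)
print(translate_incoming("8.8.8.8", 443, "203.0.113.1", 61234))
# -> ('8.8.8.8', 443, '192.168.1.100', 52341)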
Edge Case 4: Connection Hijacking Attempts
Attackers may try to inject segments into existing connections:
Legitimate: (10.0.0.1, 52341, 203.0.113.50, 443)
Attacker sends: Spoofed src=10.0.0.1:52341, guessed seq number
Protection:
1. 4-tuple must match exactly
2. Sequence number must be in receive window
3. (Optional) TCP MD5 or AO authentication
The combination of 4-tuple matching and sequence number validation provides reasonable protection against blind injection attacks.
If an attacker can guess the 4-tuple and current sequence numbers, they can inject malicious data into a connection. Initial Sequence Numbers (ISNs) should be random and unpredictable. Modern systems use cryptographically secure ISN generation to prevent these attacks.
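The sequence-number half of that defense reduces to a modular in-window test, because sequence numbers wrap at 2^32. A simplified Python sketch (real TCP, per RFC 793, checks both edges of the segment against the window):

MOD = 2**32

def in_receive_window(seq, rcv_nxt, rcv_wnd):
    # Compare using modular distance so wraparound is handled correctly.
    return (seq - rcv_nxt) % MOD < rcv_wnd

print(in_receive_window(seq=5000, rcv_nxt=5000, rcv_wnd=65535))       # True
print(in_receive_window(seq=4999, rcv_nxt=5000, rcv_wnd=65535))       # False (old)
print(in_receive_window(seq=100, rcv_nxt=2**32 - 50, rcv_wnd=65535))  # True (wrapped)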
Let's examine real-world tools and techniques for viewing and understanding connection identification.
Viewing Connection Tables:
# Linux - modern tool (ss = socket statistics)
ss -tnp
# State Recv-Q Send-Q Local Address:Port Peer Address:Port
# ESTAB 0 0 192.168.1.100:52341 93.184.216.34:443
# Linux - traditional tool (netstat)
netstat -tn
# Proto Local Address Foreign Address State
# tcp 192.168.1.100:52341 93.184.216.34:443 ESTABLISHED
# Windows
netstat -n
# Proto Local Address Foreign Address State
# TCP 192.168.1.100:52341 93.184.216.34:443 ESTABLISHED
# macOS
netstat -n | grep ESTABLISHED
Reading Connection Information:
Each line shows a unique 4-tuple:
# Count established connections
ss -tan state established | wc -l

# Show connections to a specific remote port
ss -tn 'dport = :443'

# Show connections from a specific source
ss -tn 'src 192.168.1.100'

# Watch connection table in real-time
watch -n 1 'ss -tn | head -20'

# Show detailed connection info (Linux)
ss -tnei
# Includes: timer info, congestion control, RTT, etc.

# Show connections with process info
ss -tnp
# Will show: users:(("chrome",pid=1234,fd=56))

# Count connections per remote IP
ss -tn | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn

# Find connections in TIME_WAIT
ss -tan state time-wait

Packet Capture and 4-Tuple:
Using tcpdump or Wireshark, you can observe 4-tuples in action:
# Capture packets showing 4-tuple
tcpdump -n -i eth0 'tcp port 443'
# Output:
# 192.168.1.100.52341 > 93.184.216.34.443: Flags [P.], seq 1:500, ack 1
# 93.184.216.34.443 > 192.168.1.100.52341: Flags [.], ack 500
# Format: src_ip.src_port > dst_ip.dst_port: [flags]
Filtering by Connection:
# Capture specific connection only
tcpdump -n 'host 192.168.1.100 and port 52341 and host 93.184.216.34 and port 443'
# This captures only the specific 4-tuple
Connection Statistics:
# /proc/net/tcp on Linux shows raw connection data
cat /proc/net/tcp
# sl local_address rem_address st tx_queue rx_queue ...
# 0: 6400A8C0:CC55 22D8B85D:01BB 01 00000000:00000000 ...
# (hex IP:port) (hex IP:port) (state)
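Those hex addresses are written in host byte order, so on little-endian machines (e.g., x86) they read "backwards": 6400A8C0 decodes to 192.168.0.100. A short Python sketch that decodes the file (Linux-only; handles only the IPv4 table):

import socket
import struct

def decode(hex_addr):
    ip_hex, port_hex = hex_addr.split(":")
    # Repack the little-endian 32-bit value into network byte order.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)

with open("/proc/net/tcp") as f:
    next(f)                              # skip the header line
    for line in f:
        fields = line.split()
        local, remote, state = fields[1], fields[2], fields[3]
        print(decode(local), "->", decode(remote), "state", state)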
When troubleshooting connection problems, start with 'ss -tan' to see all connections and their states. Look for connections stuck in SYN_SENT (connection not being accepted), TIME_WAIT (many recent closes), or CLOSE_WAIT (application not calling close()). Each state tells a story about connection lifecycle issues.
We've thoroughly explored TCP connection identification—the mechanism that enables millions of simultaneous connections to the same server port. Let's consolidate the key concepts:
- The 4-tuple (source IP, source port, destination IP, destination port) uniquely names every TCP connection; UDP demultiplexes on the destination 2-tuple alone.
- A listening socket matches any source via wildcards; each accepted connection gets its own socket bound to one specific 4-tuple.
- The OS maps 4-tuples to connections with a hash table, keeping demultiplexing O(1) even at enormous connection counts.
- TIME_WAIT holds a 4-tuple for 2×MSL so delayed segments from an old connection cannot corrupt a new one.
- Tools such as ss, netstat, and tcpdump let you inspect live 4-tuples and connection states.
What's Next:
We've covered how ports and connections identify endpoints. The final piece of the multiplexing/demultiplexing puzzle is understanding how data ultimately reaches application processes. The next page explores process mapping—the relationship between network sockets and operating system processes.
You now understand TCP connection identification in depth. You can explain why the 4-tuple is necessary, how servers handle thousands of connections to the same port, how connection tables are implemented, and how to use practical tools to inspect connection state.