Consider a popular web server like Google's. At any given moment, millions of users are connected to google.com on port 443. If the server only used the destination port (443) to identify connections, all these users would be indistinguishable—their data would mix chaotically, and reliable communication would be impossible.
Clearly, port numbers alone are insufficient for TCP's connection-oriented model. TCP needs a way to uniquely identify each individual connection among millions of simultaneous connections to the same server port.
The solution is the 4-tuple: a combination of four values that uniquely identifies every TCP connection on the entire Internet:
(Source IP, Source Port, Destination IP, Destination Port)
This 4-tuple is the complete "name" of a TCP connection—no two active connections anywhere can have the same 4-tuple.
By the end of this page, you will understand TCP connection identification in depth—why the 4-tuple is necessary, how it enables massive concurrency, how connections are created and tracked, and how the operating system manages millions of connection identities efficiently. You'll also understand the contrast with UDP's simpler identification model.
What is a 4-Tuple?
A 4-tuple (also called a quad or socket pair) is the combination of four network identifiers that together uniquely specify a TCP connection:
Source IP Address (32 bits for IPv4, 128 bits for IPv6)
Source Port (16 bits)
Destination IP Address (32 bits for IPv4, 128 bits for IPv6)
Destination Port (16 bits)
Formal Notation:
4-tuple notation:      (192.168.1.100, 52341, 203.0.113.50, 443)
                       (src_ip,        src_port, dst_ip,     dst_port)
Alternative notation:  192.168.1.100:52341 <-> 203.0.113.50:443
From the server's perspective, the 4-tuple appears reversed: what was the source becomes the destination, and vice versa. The same connection has 4-tuple (A, a, B, b) from A's view and (B, b, A, a) from B's view. Both representations identify the same connection.
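You can observe both views directly: open a loopback connection and ask each end for its local and peer addresses. Here is a minimal sketch using Python's standard socket module (the OS chooses all ports at runtime):

import socket

# Server side: listen on an OS-assigned port on localhost.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0 = let the OS choose
listener.listen(1)

# Client side: connect; the OS also picks the ephemeral source port.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
conn, _ = listener.accept()          # per-connection socket on the server

# Same connection, two views of the one 4-tuple:
print("client view:", client.getsockname(), "->", client.getpeername())
print("server view:", conn.getsockname(), "<-", conn.getpeername())

conn.close(); client.close(); listener.close()

The client's (local, peer) pair is exactly the server's (peer, local) pair: one connection, named from opposite ends.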
Why All Four Components Are Necessary:
| If We Omit... | Problem |
|---|---|
| Source IP | Can't distinguish connections from different clients |
| Source Port | Can't distinguish multiple connections from same client |
| Destination IP | Can't distinguish connections on multi-homed hosts (multiple local IPs) |
| Destination Port | Can't distinguish connections to different services |
Mathematical Perspective:
The 4-tuple creates an enormous address space. For IPv4, the theoretical combination count is 2^32 × 2^16 × 2^32 × 2^16 = 2^96, roughly 7.9 × 10^28 possible connections.
In practice the usable space is far smaller (only assigned IP addresses, restricted port ranges), but there is still room for billions of simultaneous connections.
Understanding the distinction between port-based and connection-based identification clarifies why TCP and UDP demultiplex differently.
Port-Based Identification (UDP):
UDP uses a 2-tuple: (Destination IP, Destination Port)
UDP Socket A listens on port 53
↓
All datagrams to port 53 go to Socket A
↓
Socket A receives from any source
Connection-Based Identification (TCP):
TCP uses a 4-tuple: (Source IP, Source Port, Dest IP, Dest Port)
TCP Listening Socket on port 80
↓
Client A connects → New Socket (A's 4-tuple)
Client B connects → New Socket (B's 4-tuple)
Client A opens 2nd connection → New Socket (A's new 4-tuple)
↓
Each socket is completely independent
Practical Implication:
A UDP server on port 53 (DNS) with 10,000 clients needs exactly one socket: every datagram, from any source, is delivered to it.
A TCP server on port 443 (HTTPS) with 10,000 clients needs 10,001 sockets: one listening socket plus one connection socket per client.
This is why high-connection-count services (like WebSocket servers or long-polling) require careful resource planning—each TCP connection consumes kernel resources.
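To make the contrast concrete, here is a sketch in Python (ports 5353 and 8443 are arbitrary stand-ins for DNS and HTTPS): the UDP server needs a single socket regardless of client count, while the TCP server gains one socket per accepted connection.

import socket

# UDP: one socket receives datagrams from every client.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.bind(("0.0.0.0", 5353))
# data, client_addr = udp.recvfrom(2048)   # any sender arrives here

# TCP: the listener only accepts; each accept() yields a NEW socket
# dedicated to one 4-tuple.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.bind(("0.0.0.0", 8443))
tcp.listen(128)
# while True:
#     conn, client_addr = tcp.accept()     # one descriptor per connection
#     ...                                  # 10,000 clients -> 10,001 sockets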
Let's trace how a connection's 4-tuple is established, used, and released throughout its lifecycle.
Phase 1: Connection Initiation (Client Side)
1. Client application calls connect(server_ip, server_port)
2. OS selects ephemeral source port (e.g., 52341)
3. 4-tuple is now known:
(client_ip=192.168.1.100, src_port=52341,
server_ip=203.0.113.50, dst_port=443)
4. Client sends SYN with this 4-tuple
Phase 2: Connection Acceptance (Server Side)
1. SYN arrives at server's listening socket on port 443
2. Server extracts 4-tuple from SYN packet
3. Server creates embryonic connection (SYN_RCVD state)
4. Server sends SYN-ACK
5. Client sends ACK
6. Server creates new connection socket for this 4-tuple
7. accept() returns the new socket to application
Phase 3: Data Transfer
Every segment includes source and destination ports
→ receiver matches against known 4-tuples
→ data delivered to correct connection socket
Phase 4: Connection Termination
1. One side initiates close (sends FIN)
2. Four-way handshake completes
3. Socket moves to TIME_WAIT (typically 60-120 seconds)
4. During TIME_WAIT, 4-tuple cannot be reused
5. After TIME_WAIT expires, 4-tuple is released
Connection Lifecycle                        4-Tuple State
═══════════════════                         ════════════════
Client: connect()                           4-tuple selected
  │                                         (192.168.1.100:52341, 203.0.113.50:443)
  ▼
[SYN]  ──────────────────────────────►
       ◄────────────────────────────── [SYN-ACK]
[ACK]  ──────────────────────────────►
  │                          │
  ▼                          ▼
ESTABLISHED ◄──────────────► ESTABLISHED
  │                          │
  │  ═════ Data Transfer ═════            Every segment carries:
  │                          │            src=52341, dst=443
  │                          │            (or reversed for responses)
  ▼                          ▼
[FIN]  ──────────────────────────────►
       ◄────────────────────────────── [ACK]
       ◄────────────────────────────── [FIN]
[ACK]  ──────────────────────────────►
  │                          │
  ▼                          ▼
TIME_WAIT (2*MSL)          CLOSED
  │                                       4-tuple locked during this time
  │                                       (prevents delayed segment confusion)
  ▼
CLOSED                                    4-tuple released
  └─► Port 52341 available for reuse

During TIME_WAIT, the 4-tuple is locked because old delayed segments might still arrive. If the same 4-tuple were immediately reused for a new connection, these old segments could be mistakenly delivered to the new connection. The TIME_WAIT period (2× Maximum Segment Lifetime, typically 60-120 seconds) ensures all old segments have expired.
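You can feel this lock in practice when restarting a server: rebinding a port whose old connections linger in TIME_WAIT may fail with "Address already in use" until 2×MSL elapses. A short Python sketch of the conventional mitigation (SO_REUSEADDR; port 8443 is an arbitrary example):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow rebinding while old connections are still in TIME_WAIT.
# This does NOT allow two sockets to share a live 4-tuple.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("0.0.0.0", 8443))
s.listen(128)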
The power of 4-tuple identification is most evident when we examine how a server handles multiple connections to the same port.
Scenario: Web Server Port 443
Server IP: 203.0.113.50
Server Port: 443 (HTTPS)
Active connections:
Connection 1: (10.0.0.1, 52341, 203.0.113.50, 443)
└─► Client 10.0.0.1, first browser tab
Connection 2: (10.0.0.1, 52342, 203.0.113.50, 443)
└─► Client 10.0.0.1, second browser tab (same client!)
Connection 3: (10.0.0.2, 52341, 203.0.113.50, 443)
└─► Client 10.0.0.2 (same source port as Connection 1, different IP)
Connection 4: (192.168.5.100, 48001, 203.0.113.50, 443)
└─► Different network entirely
All four connections use destination port 443, but they're completely separate because their 4-tuples differ.
| Connection | Source IP | Source Port | Dest IP | Dest Port | Unique? |
|---|---|---|---|---|---|
| 1 | 10.0.0.1 | 52341 | 203.0.113.50 | 443 | Yes |
| 2 | 10.0.0.1 | 52342 | 203.0.113.50 | 443 | Yes (diff src port) |
| 3 | 10.0.0.2 | 52341 | 203.0.113.50 | 443 | Yes (diff src IP) |
| 4 | 192.168.5.100 | 48001 | 203.0.113.50 | 443 | Yes (diff src IP) |
| 5 | 10.0.0.1 | 52341 | 203.0.113.50 | 443 | NO - same as #1 |
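The uniqueness rule is easy to model: treat the 4-tuple as a key in a set. This small Python sketch (try_open is a hypothetical helper, not a real API) reproduces the table above:

active = set()

def try_open(src_ip, src_port, dst_ip, dst_port):
    tup = (src_ip, src_port, dst_ip, dst_port)
    if tup in active:
        return f"REJECT {tup}: connection already exists"
    active.add(tup)
    return f"OPEN   {tup}"

print(try_open("10.0.0.1", 52341, "203.0.113.50", 443))   # Connection 1
print(try_open("10.0.0.1", 52342, "203.0.113.50", 443))   # OK: new src port
print(try_open("10.0.0.2", 52341, "203.0.113.50", 443))   # OK: new src IP
print(try_open("10.0.0.1", 52341, "203.0.113.50", 443))   # REJECT: same as #1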
Client Opens Multiple Connections:
A single web page might require 6-8 connections to load all resources (HTML, CSS, images, scripts). The browser uses different source ports:
Browser loading google.com from 192.168.1.100:
├── Connection for main HTML: 192.168.1.100:52341 → google:443
├── Connection for CSS: 192.168.1.100:52342 → google:443
├── Connection for logo.png: 192.168.1.100:52343 → google:443
├── Connection for app.js: 192.168.1.100:52344 → google:443
├── Connection for analytics: 192.168.1.100:52345 → google:443
└── Connection for fonts: 192.168.1.100:52346 → google:443
Each connection tracks its own state: sequence numbers, send and receive buffers, window sizes, timers, and congestion-control variables.
Server Handling Thousands of Connections:
Server socket table:
┌──────────────┬──────────┬───────────────┬──────────┬───────────┬──────┐
│ Source IP    │ Src Port │ Dest IP       │ Dst Port │ Socket FD │ Type │
├──────────────┼──────────┼───────────────┼──────────┼───────────┼──────┤
│ *            │ *        │ 0.0.0.0       │ 443      │ 3         │ L    │
│ 10.0.0.1     │ 52341    │ 203.0.113.50  │ 443      │ 5         │ E    │
│ 10.0.0.1     │ 52342    │ 203.0.113.50  │ 443      │ 6         │ E    │
│ 10.0.0.2     │ 52341    │ 203.0.113.50  │ 443      │ 7         │ E    │
│ 192.168.5.100│ 48001    │ 203.0.113.50  │ 443      │ 8         │ E    │
│ ... potentially 100,000+ more ...                                     │
└──────────────┴──────────┴───────────────┴──────────┴───────────┴──────┘
L = Listening socket, E = Established connection
Notice the listening socket has wildcards (*) for source IP and port. It matches any incoming SYN. Once a connection is established, a new socket with the specific 4-tuple handles that connection. The listening socket continues waiting for new connections, never transferring data itself.
The operating system maintains a connection table (also called the socket table or TCP control block table) that maps 4-tuples to connection sockets. Efficient implementation is critical for high-performance networking.
Data Structure Requirements:
On every incoming segment, the kernel must map a 4-tuple to its connection. Lookups must therefore be O(1) on average, insertion and deletion must be cheap (connections open and close constantly), and the structure must remain memory-efficient at millions of entries.
Hash Table Implementation:
Most systems use hash tables indexed by 4-tuple:
Hash Function:
hash = hash(src_ip, src_port, dst_ip, dst_port)
bucket = hash % NUM_BUCKETS
Stored in bucket:
- Linked list of connections with same hash
- Each entry: 4-tuple → socket pointer
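As a runnable stand-in for this lookup path, the Python sketch below keys a dict by the 4-tuple. Python's dict plays the role of the kernel's bucketed hash table, and the stored values are placeholder control blocks:

table = {}

def insert(src_ip, src_port, dst_ip, dst_port, tcb):
    table[(src_ip, src_port, dst_ip, dst_port)] = tcb

def lookup(src_ip, src_port, dst_ip, dst_port):
    # Average O(1) -- the cost that matters when every incoming
    # segment triggers a lookup.
    return table.get((src_ip, src_port, dst_ip, dst_port))

insert("10.0.0.1", 52341, "203.0.113.50", 443, {"state": "ESTABLISHED"})
print(lookup("10.0.0.1", 52341, "203.0.113.50", 443))   # -> the control block
print(lookup("10.0.0.9", 11111, "203.0.113.50", 443))   # -> None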
# TCP Connection Table Structure (simplified)

struct tcp_connection {
    # Identification (the 4-tuple)
    uint32_t local_ip;
    uint16_t local_port;
    uint32_t remote_ip;
    uint16_t remote_port;

    # State
    enum tcp_state state;       # ESTABLISHED, TIME_WAIT, etc.

    # Sequence number tracking
    uint32_t snd_una;           # Oldest unacknowledged byte
    uint32_t snd_nxt;           # Next byte to send
    uint32_t rcv_nxt;           # Next byte expected

    # Buffers
    struct send_buffer *send_buf;
    struct recv_buffer *recv_buf;

    # Timers
    struct timer retransmit_timer;
    struct timer keepalive_timer;
    struct timer time_wait_timer;

    # Window management
    uint16_t snd_wnd;           # Send window
    uint16_t rcv_wnd;           # Receive window

    # Congestion control
    uint32_t cwnd;              # Congestion window
    uint32_t ssthresh;          # Slow start threshold

    # Hash chain for lookup
    struct tcp_connection *next_in_bucket;
}

# Hash table structure
struct connection_table {
    struct tcp_connection *buckets[HASH_SIZE];
    spinlock_t bucket_locks[HASH_SIZE];   # Per-bucket locking
    atomic_t connection_count;
}

# Jenkins hash for 4-tuple (example)
function connection_hash(local_ip, local_port, remote_ip, remote_port):
    h = local_ip
    h ^= (local_port << 16)
    h ^= remote_ip
    h ^= (remote_port << 16)
    h = jenkins_mix(h)          # Additional mixing
    return h % HASH_SIZE

Memory Consumption:
Each TCP connection requires significant kernel memory:
| Component | Typical Size |
|---|---|
| Connection structure | 200-400 bytes |
| Send buffer | 16 KB - 4 MB (tunable) |
| Receive buffer | 16 KB - 4 MB (tunable) |
| Socket structure | 200-600 bytes |
| Total per connection | ~50 KB - 8 MB |
With default settings (~50 KB per connection): 10,000 connections consume roughly 500 MB of kernel memory, 100,000 consume roughly 5 GB, and 1 million consume roughly 50 GB.
Scaling Strategies:
The 'C10K problem' (handling 10,000 concurrent connections) was a significant challenge in the early 2000s. Modern systems have advanced to the 'C10M problem' (10 million connections). Solutions include kernel bypass (DPDK), zero-copy networking, and optimized socket APIs (epoll, kqueue, io_uring).
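As one concrete instance of those "optimized socket APIs," here is a minimal event-driven server sketch using Python's selectors module (which wraps epoll or kqueue where available; port 8443 and the echo behavior are arbitrary choices). One thread can monitor many thousands of connection sockets this way:

import selectors
import socket

sel = selectors.DefaultSelector()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("0.0.0.0", 8443))
listener.listen(1024)
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ)

def serve_forever():
    while True:
        for key, _ in sel.select():
            sock = key.fileobj
            if sock is listener:
                conn, addr = listener.accept()   # new 4-tuple, new socket
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)           # echo back
                else:                            # peer closed
                    sel.unregister(sock)
                    sock.close()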
Let's trace exactly how an incoming TCP segment is demultiplexed using connection identification.
Incoming Segment:
IP Header:
Source IP: 10.0.0.1
Destination IP: 203.0.113.50
Protocol: 6 (TCP)
TCP Header:
Source Port: 52341
Destination Port: 443
Sequence Number: 1000
ACK Number: 5000
Flags: ACK, PSH
Payload: "GET /index.html HTTP/1.1..."
Demultiplexing Process:
1. IP layer receives datagram
→ Destination 203.0.113.50 matches local IP ✓
→ Protocol 6 = TCP, pass to TCP handler
2. TCP handler extracts 4-tuple:
→ (10.0.0.1, 52341, 203.0.113.50, 443)
3. Hash and lookup:
→ hash(10.0.0.1, 52341, 203.0.113.50, 443) = 0x4F2A1
→ bucket = 0x4F2A1 % 65536 = 62113
→ Search bucket 62113 for exact 4-tuple match
4. Match found:
→ Socket FD 5 owns this connection
→ Connection state: ESTABLISHED ✓
5. Sequence number validation:
→ Segment seq 1000 within receive window? ✓
6. Deliver payload:
→ Add "GET /index.html..." to FD 5's receive buffer
→ Wake application if blocked on recv()
7. Send ACK (piggy-backed or delayed)
function tcp_receive(ip_header, tcp_segment):
    # Step 1: Build the 4-tuple
    four_tuple = {
        local_ip:    ip_header.dest_ip,
        local_port:  tcp_segment.dest_port,
        remote_ip:   ip_header.src_ip,
        remote_port: tcp_segment.src_port
    }

    # Step 2: Look up established connection
    connection = connection_table.lookup(four_tuple)
    if connection is not None:
        # Existing connection - process segment
        return process_established_segment(connection, tcp_segment)

    # Step 3: Check for listening socket
    listen_socket = find_listening_socket(four_tuple.local_ip,
                                          four_tuple.local_port)
    if listen_socket is not None:
        if tcp_segment.flags.SYN and not tcp_segment.flags.ACK:
            # New connection request
            return handle_syn(listen_socket, four_tuple, tcp_segment)
        else:
            # Non-SYN to listening socket
            send_rst(four_tuple)
            return

    # Step 4: No matching socket
    if not tcp_segment.flags.RST:
        send_rst(four_tuple)
    drop_segment()

function process_established_segment(conn, segment):
    # Validate sequence numbers
    if not is_valid_sequence(conn, segment.seq_num):
        # Out of window - send duplicate ACK
        send_ack(conn)
        return

    # Process based on segment type
    if segment.flags.RST:
        close_connection(conn, "RST received")
    elif segment.flags.FIN:
        handle_fin(conn, segment)
    elif segment.flags.ACK:
        handle_ack(conn, segment)

    # Deliver payload to receive buffer
    if segment.payload_length > 0:
        conn.recv_buffer.add(segment.payload, segment.seq_num)
        notify_application(conn)

    # Send ACK for received data
    schedule_ack(conn)

Notice the two-step lookup: first check established connections (with full 4-tuple match), then fall back to listening sockets (wildcards allowed). This ordering is important—we want to deliver segments to existing connections first, only creating new connections when no match exists and the segment is a valid SYN.
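The same two-step lookup can be modeled in a few lines of runnable Python (the dicts stand in for kernel tables; the values "L" and "fd5" are placeholders):

established = {}   # exact 4-tuple -> connection
listening = {}     # (local_ip, local_port) -> listener

def demux(local_ip, local_port, remote_ip, remote_port, is_syn):
    # Step 1: an exact 4-tuple match wins.
    conn = established.get((local_ip, local_port, remote_ip, remote_port))
    if conn is not None:
        return f"deliver to established connection {conn}"
    # Step 2: wildcard fallback -- any remote may reach a listener via SYN.
    for key in ((local_ip, local_port), ("0.0.0.0", local_port)):
        if key in listening and is_syn:
            return f"new connection via listener {key}"
    return "no match: send RST"

listening[("0.0.0.0", 443)] = "L"
established[("203.0.113.50", 443, "10.0.0.1", 52341)] = "fd5"
print(demux("203.0.113.50", 443, "10.0.0.1", 52341, is_syn=False))  # established
print(demux("203.0.113.50", 443, "10.0.0.7", 40000, is_syn=True))   # listener
print(demux("203.0.113.50", 443, "10.0.0.7", 40000, is_syn=False))  # RST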
Several edge cases and special scenarios complicate connection identification. Understanding these is essential for debugging network issues.
Edge Case 1: TIME_WAIT Connections
A connection in TIME_WAIT still occupies its 4-tuple:
Connection closes:
(192.168.1.100, 52341, 203.0.113.50, 443) → TIME_WAIT
Another SYN arrives with same 4-tuple:
Option A: Reject (strict) - "connection exists"
Option B: Accept (lenient) - check if valid new connection
Modern implementations use sequence numbers to distinguish
delayed old segments from legitimate new connections.
Edge Case 2: Simultaneous Open
Both sides send SYN simultaneously (rare):
Client: SYN (seq=100) ──►   ◄── SYN (seq=200) :Server   (the two SYNs cross in transit)
Both enter SYN_RCVD state
Both respond with SYN-ACK
Both enter ESTABLISHED
Same 4-tuple, both initiated - handled correctly by TCP state machine
| Scenario | Problem | Resolution |
|---|---|---|
| TIME_WAIT collision | New connection has same 4-tuple as TIME_WAIT | Check sequence numbers; accept if clearly new |
| Duplicate SYN | Retransmitted SYN during handshake | Re-send SYN-ACK; idempotent response |
| Old duplicate SYN | Ancient SYN from previous connection | Sequence number check; RST if invalid |
| NAT port reuse | NAT reassigns same external port | May cause brief TIME_WAIT conflicts |
| Multi-homed host | Same connection via different interfaces | Include interface in lookup (zone ID) |
| Load balancer failover | Connection moves between servers | Connection state synchronization needed |
Edge Case 3: NAT and Connection Tracking
Network Address Translation creates challenges:
Original connection:
192.168.1.100:52341 → 8.8.8.8:443
After NAT:
203.0.113.1:61234 → 8.8.8.8:443
NAT device must:
1. Translate outgoing (192.168.1.100:52341 → 203.0.113.1:61234)
2. Track the mapping for incoming responses
3. Reverse-translate incoming (203.0.113.1:61234 → 192.168.1.100:52341)
The NAT device maintains its own connection table, translating 4-tuples in both directions.
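A toy model of that NAT table in Python (PUBLIC_IP and the starting port 61234 mirror the example above; a real NAT also tracks TCP state and expires idle mappings):

nat_out = {}   # (private_ip, private_port) -> public_port
nat_in = {}    # public_port -> (private_ip, private_port)
PUBLIC_IP = "203.0.113.1"
next_port = 61234

def translate_outgoing(src_ip, src_port, dst_ip, dst_port):
    global next_port
    key = (src_ip, src_port)
    if key not in nat_out:            # allocate a public port once per flow
        nat_out[key] = next_port
        nat_in[next_port] = key
        next_port += 1
    return (PUBLIC_IP, nat_out[key], dst_ip, dst_port)

def translate_incoming(src_ip, src_port, dst_ip, dst_port):
    private_ip, private_port = nat_in[dst_port]   # reverse the mapping
    return (src_ip, src_port, private_ip, private_port)

print(translate_outgoing("192.168.1.100", 52341, "8.8.8.8", 443))
# -> ('203.0.113.1', 61234, '8.8.8.8', 443)
print(translate_incoming("8.8.8.8", 443, "203.0.113.1", 61234))
# -> ('8.8.8.8', 443, '192.168.1.100', 52341)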
Edge Case 4: Connection Hijacking Attempts
Attackers may try to inject segments into existing connections:
Legitimate: (10.0.0.1, 52341, 203.0.113.50, 443)
Attacker sends: Spoofed src=10.0.0.1:52341, guessed seq number
Protection:
1. 4-tuple must match exactly
2. Sequence number must be in receive window
3. (Optional) TCP MD5 or AO authentication
The combination of 4-tuple matching and sequence number validation provides reasonable protection against blind injection attacks.
If an attacker can guess the 4-tuple and current sequence numbers, they can inject malicious data into a connection. Initial Sequence Numbers (ISNs) should be random and unpredictable. Modern systems use cryptographically secure ISN generation to prevent these attacks.
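The sequence-number half of that defense reduces to a modular in-window test, because sequence numbers wrap at 2^32. A simplified Python sketch (real TCP, per RFC 793, checks both edges of the segment against the window):

MOD = 2**32

def in_receive_window(seq, rcv_nxt, rcv_wnd):
    # Compare using modular distance so wraparound is handled correctly.
    return (seq - rcv_nxt) % MOD < rcv_wnd

print(in_receive_window(seq=5000, rcv_nxt=5000, rcv_wnd=65535))       # True
print(in_receive_window(seq=4999, rcv_nxt=5000, rcv_wnd=65535))       # False (old)
print(in_receive_window(seq=100, rcv_nxt=2**32 - 50, rcv_wnd=65535))  # True (wrapped)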
Let's examine real-world tools and techniques for viewing and understanding connection identification.
Viewing Connection Tables:
# Linux - modern tool (ss = socket statistics)
ss -tnp
# State Recv-Q Send-Q Local Address:Port Peer Address:Port
# ESTAB 0 0 192.168.1.100:52341 93.184.216.34:443
# Linux - traditional tool (netstat)
netstat -tn
# Proto Local Address Foreign Address State
# tcp 192.168.1.100:52341 93.184.216.34:443 ESTABLISHED
# Windows
netstat -n
# Proto Local Address Foreign Address State
# TCP 192.168.1.100:52341 93.184.216.34:443 ESTABLISHED
# macOS
netstat -n | grep ESTABLISHED
Reading Connection Information:
Each line shows a unique 4-tuple:
# Count established connections
ss -tan state established | wc -l

# Show connections to a specific remote port
ss -tn 'dport = :443'

# Show connections from a specific source
ss -tn 'src 192.168.1.100'

# Watch connection table in real-time
watch -n 1 'ss -tn | head -20'

# Show detailed connection info (Linux)
ss -tnei
# Includes: timer info, congestion control, RTT, etc.

# Show connections with process info
ss -tnp
# Will show: users:(("chrome",pid=1234,fd=56))

# Count connections per remote IP
ss -tn | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn

# Find connections in TIME_WAIT
ss -tan state time-wait

Packet Capture and 4-Tuple:
Using tcpdump or Wireshark, you can observe 4-tuples in action:
# Capture packets showing 4-tuple
tcpdump -n -i eth0 'tcp port 443'
# Output:
# 192.168.1.100.52341 > 93.184.216.34.443: Flags [P.], seq 1:500, ack 1
# 93.184.216.34.443 > 192.168.1.100.52341: Flags [.], ack 500
# Format: src_ip.src_port > dst_ip.dst_port: [flags]
Filtering by Connection:
# Capture specific connection only
tcpdump -n 'host 192.168.1.100 and port 52341 and host 93.184.216.34 and port 443'
# This captures only the specific 4-tuple
Connection Statistics:
# /proc/net/tcp on Linux shows raw connection data
cat /proc/net/tcp
# sl local_address rem_address st tx_queue rx_queue ...
# 0: 6400A8C0:CC55 22D8B85D:01BB 01 00000000:00000000 ...
# (hex IP:port) (hex IP:port) (state)
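Those hex addresses are written in host byte order, so on little-endian machines (e.g., x86) they read "backwards": 6400A8C0 decodes to 192.168.0.100. A short Python sketch that decodes the file (Linux-only; handles only the IPv4 table):

import socket
import struct

def decode(hex_addr):
    ip_hex, port_hex = hex_addr.split(":")
    # Repack the little-endian 32-bit value into network byte order.
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)

with open("/proc/net/tcp") as f:
    next(f)                              # skip the header line
    for line in f:
        fields = line.split()
        local, remote, state = fields[1], fields[2], fields[3]
        print(decode(local), "->", decode(remote), "state", state)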
When troubleshooting connection problems, start with 'ss -tan' to see all connections and their states. Look for connections stuck in SYN_SENT (connection not being accepted), TIME_WAIT (many recent closes), or CLOSE_WAIT (application not calling close()). Each state tells a story about connection lifecycle issues.
We've thoroughly explored TCP connection identification—the mechanism that enables millions of simultaneous connections to the same server port. Let's consolidate the key concepts:
- The 4-tuple (source IP, source port, destination IP, destination port) uniquely names every TCP connection; UDP demultiplexes on the destination 2-tuple alone.
- A listening socket matches any source via wildcards; each accepted connection gets its own socket bound to one specific 4-tuple.
- The OS maps 4-tuples to connections with a hash table, keeping demultiplexing O(1) even at enormous connection counts.
- TIME_WAIT holds a 4-tuple for 2×MSL so delayed segments from an old connection cannot corrupt a new one.
- Tools such as ss, netstat, and tcpdump let you inspect live 4-tuples and connection states.
What's Next:
We've covered how ports and connections identify endpoints. The final piece of the multiplexing/demultiplexing puzzle is understanding how data ultimately reaches application processes. The next page explores process mapping—the relationship between network sockets and operating system processes.
You now understand TCP connection identification in depth. You can explain why the 4-tuple is necessary, how servers handle thousands of connections to the same port, how connection tables are implemented, and how to use practical tools to inspect connection state.