After the careful choreography of the three-way handshake, the TCP connection enters its most important and longest-lasting state: ESTABLISHED. This is where TCP delivers on its promise—reliable, ordered, bidirectional byte stream delivery between two endpoints. The transient nature of handshake states gives way to a stable platform for data exchange that may persist for milliseconds or months.
The ESTABLISHED state is where applications actually use TCP. Every HTTP request you make, every database query you send, every real-time message you receive—all travel through connections in the ESTABLISHED state. Understanding this state means understanding how TCP maintains reliability during data transfer, how it detects and handles failures, and how applications interact with established connections.
This page explores the ESTABLISHED state comprehensively—from the conditions that define it, through the mechanics of data transfer it enables, to the keep-alive mechanisms that maintain it, and the socket operations available during this phase.
By the end of this page, you will understand: (1) What defines the ESTABLISHED state and how it's entered, (2) Full-duplex data transfer and the send/receive buffers, (3) Sequence numbers and acknowledgments during data transfer, (4) TCP keep-alive mechanisms and their configuration, (5) Detecting and handling connection failures, and (6) Socket operations available in ESTABLISHED state.
The ESTABLISHED state represents a fully synchronized, bidirectional communication channel. Both endpoints have completed the handshake and agreed on the parameters that will govern their data exchange.
A socket enters ESTABLISHED from different prior states:
| From State | Trigger | Context |
|---|---|---|
| SYN_SENT | Receive valid SYN+ACK, send ACK | Normal client connection |
| SYN_RECEIVED | Receive valid ACK | Normal server accept |
| SYN_RECEIVED | Receive ACK of our SYN (reached SYN_RECEIVED from SYN_SENT via simultaneous open) | Rare peer-to-peer case |
By the time ESTABLISHED is reached, both sides have agreed on:
Sequence Number Spaces: each side's initial sequence number (ISN), from which every byte in that direction is numbered.
Communication Parameters: negotiated options such as the MSS, window scaling factor, SACK permission, and timestamps.
Connection Identity: the 4-tuple (source IP, source port, destination IP, destination port) that uniquely identifies the connection.
The kernel maintains substantial state for established connections:
// Simplified view of socket state in ESTABLISHED
struct tcp_sock {
    // Sequence number tracking
    u32 snd_una;        // Oldest unacknowledged sequence number
    u32 snd_nxt;        // Next sequence number to send
    u32 snd_wnd;        // Send window (from receiver's advertisements)
    u32 rcv_nxt;        // Next expected sequence from peer
    u32 rcv_wnd;        // Our receive window

    // Buffer management
    struct sk_buff_head write_queue;    // Data waiting to send
    struct sk_buff_head receive_queue;  // Data received, awaiting read
    struct sk_buff_head out_of_order;   // Out-of-order segments

    // Timing and RTT
    u32 srtt_us;        // Smoothed RTT estimate
    u32 rttvar_us;      // RTT variance
    u32 rto;            // Retransmission timeout

    // Congestion control
    u32 cwnd;           // Congestion window
    u32 ssthresh;       // Slow start threshold

    // Keep-alive state
    u8 keepalive_probes;            // Number of keepalive probes sent
    unsigned long keepalive_time;   // Last data/ACK time
};
Once established, both endpoints are peers. There's no longer a distinction between 'client' and 'server' at the TCP level—both can send and receive data freely. The asymmetry of connection establishment gives way to the symmetry of data transfer.
TCP in ESTABLISHED state provides full-duplex communication—both endpoints can send data to each other simultaneously, independently. This is not alternating half-duplex but truly concurrent bidirectional data flow.
TCP presents a byte stream abstraction to applications:
Application writes: [Hello Wo][rld! How are ][you?]
│ │ │
TCP segments: [Hello World! Ho][w are you?]
│ │
Receiver sees: [Hello World! How are you?]
(same bytes, same order, different grouping)
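This boundary-blindness is easy to demonstrate locally. The sketch below uses socket.socketpair() so no network setup is needed; the receiver gets the same bytes in the same order, regardless of how the writes were grouped.

```python
import socket

# A connected pair of local stream sockets (no network needed)
a, b = socket.socketpair()

# Application writes three arbitrary chunks...
for chunk in (b"Hello Wo", b"rld! How are ", b"you?"):
    a.sendall(chunk)
a.shutdown(socket.SHUT_WR)  # signal end of stream

# ...the receiver just sees one ordered byte stream
received = b""
while True:
    data = b.recv(4096)
    if not data:  # b"" means the sender is done
        break
    received += data

print(received.decode())  # Hello World! How are you?
a.close(); b.close()
```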
Each direction of data flow uses a buffer managed by the kernel:
┌─────────────────────────┐
Application │ TCP Socket │ Network
│ │
write() ─────────▶│ ┌──────────────────┐ │───▶ Segments
│ │ Send Buffer │ │
│ │ (SO_SNDBUF) │ │
│ └──────────────────┘ │
│ │
read() ◀─────────│ ┌──────────────────┐ │◀─── Segments
│ │ Receive Buffer │ │
│ │ (SO_RCVBUF) │ │
│ └──────────────────┘ │
└─────────────────────────┘
Send Buffer (SO_SNDBUF): holds bytes the application has written that are not yet acknowledged by the peer; when it fills, write() blocks (or fails with EAGAIN in non-blocking mode).
Receive Buffer (SO_RCVBUF): holds bytes that have arrived but that the application has not yet read; its remaining free space determines the advertised receive window.
During data transfer, sequence numbers track every byte:
Sender's View:
snd_una snd_nxt snd_una + snd_wnd
│ │ │
▼ ▼ ▼
├─────────────────────┼────────────────────────────┤
│ Unacked data │ Available window │
│ (awaiting ACKs) │ (can send more) │
└─────────────────────┴────────────────────────────┘
Receiver's View:
rcv_nxt rcv_nxt + rcv_wnd
│ │
▼ ▼
├──────────────────────────────────────────────┤
│ Receive Window │
│ (will accept these sequence numbers) │
└──────────────────────────────────────────────┘
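The sender-side arithmetic in the diagram reduces to a single expression. A small sketch (variable names borrowed from the struct shown earlier; real TCP does this with wraparound-safe 32-bit arithmetic, omitted here):

```python
def usable_window(snd_una: int, snd_nxt: int, snd_wnd: int) -> int:
    """Bytes the sender may still transmit without overrunning the
    receiver's advertised window: (snd_una + snd_wnd) - snd_nxt."""
    return max(0, (snd_una + snd_wnd) - snd_nxt)

# 500 bytes in flight (snd_nxt - snd_una), window of 2000 bytes:
print(usable_window(snd_una=1000, snd_nxt=1500, snd_wnd=2000))  # 1500
```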
Every data segment contains: a sequence number (the position of its first payload byte in the stream), an acknowledgment number (the next byte expected from the peer), and a window advertisement (how many more bytes the segment's sender can accept).
Example Exchange:
Time │ Direction │ Seq │ Ack │ Data
──────┼─────────────┼────────────┼──────────┼────────────
1 │ A → B │ 1000 │ 5000 │ 100 bytes
2 │ B → A │ 5000 │ 1100 │ 200 bytes
3 │ A → B │ 1100 │ 5200 │ 150 bytes
4 │ B → A │ 5200 │ 1250 │ ACK only
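The exchange can be replayed with simple arithmetic: each side's next sequence number is its previous one plus the bytes it just sent, and each acknowledgment echoes the peer's sequence number plus the payload length received. A sketch using the numbers from the table:

```python
def advance(seq: int, payload_len: int) -> int:
    """Next sequence number after sending payload_len bytes."""
    return seq + payload_len

a_seq, b_seq = 1000, 5000      # starting points from the table
a_seq = advance(a_seq, 100)    # step 1: A sends 100 bytes
b_ack = a_seq                  # B's next ACK acknowledges 1100
b_seq = advance(b_seq, 200)    # step 2: B sends 200 bytes
a_ack = b_seq                  # A's next ACK acknowledges 5200
print(b_ack, a_ack)  # 1100 5200
```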
"""Demonstrating full-duplex TCP communicationBoth threads send and receive simultaneously"""import socketimport threadingimport time def client_handler(conn, name): """Handle one direction of communication""" def sender(): for i in range(5): msg = f"Message {i} from {name}" conn.send(msg.encode()) print(f"[{name}] Sent: {msg}") time.sleep(0.5) def receiver(): conn.settimeout(5) try: while True: data = conn.recv(1024) if not data: break print(f"[{name}] Received: {data.decode()}") except socket.timeout: pass # Run sender and receiver concurrently t1 = threading.Thread(target=sender) t2 = threading.Thread(target=receiver) t1.start() t2.start() t1.join() t2.join() def run_server(): server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) server.bind(('127.0.0.1', 9999)) server.listen(1) print("[Server] Listening...") conn, addr = server.accept() print(f"[Server] Connection from {addr}") # Server sends and receives simultaneously client_handler(conn, "Server") conn.close() server.close() def run_client(): time.sleep(0.5) # Let server start client = socket.socket(socket.AF_INET, socket.SOCK_STREAM) client.connect(('127.0.0.1', 9999)) print("[Client] Connected") # Client sends and receives simultaneously client_handler(client, "Client") client.close() if __name__ == "__main__": server_thread = threading.Thread(target=run_server) client_thread = threading.Thread(target=run_client) server_thread.start() client_thread.start() server_thread.join() client_thread.join() print("Full-duplex communication complete!")TCP often 'piggybacks' acknowledgments on data segments. Instead of sending a separate ACK, it includes the acknowledgment in the next outgoing data segment. This reduces overhead and is one reason why full-duplex communication is efficient. Empty ACKs are only sent when there's no data to piggyback on (using delayed ACKs).
The ESTABLISHED state is where TCP's reliability guarantees are continuously enforced. Every segment must be acknowledged, lost segments must be retransmitted, and order must be preserved.
TCP guarantees that data written to a socket will be:
Cumulative ACKs: the acknowledgment number covers every byte before it, so a single ACK can confirm many segments at once.
Selective ACKs (SACK): a TCP option that additionally reports discontiguous blocks received beyond the cumulative point:
Standard ACK: "I've received up to byte 1000"
With SACK: "I've received up to byte 1000,
plus bytes 2001-3000 and 4001-5000"
Sender knows: Need to retransmit bytes 1001-2000 and 3001-4000
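Computing what to retransmit from a SACK report amounts to finding the holes between the cumulative point and the SACKed blocks. A sketch using the example's byte ranges (inclusive ranges for readability; real SACK blocks carry left/right sequence edges):

```python
def sack_holes(highest_contig, sack_blocks):
    """highest_contig: highest byte received in order.
    sack_blocks: (first, last) inclusive ranges received out of order.
    Returns the inclusive ranges the sender must retransmit."""
    holes = []
    edge = highest_contig  # everything up to edge has arrived
    for first, last in sorted(sack_blocks):
        if first > edge + 1:
            holes.append((edge + 1, first - 1))
        edge = max(edge, last)
    return holes

print(sack_holes(1000, [(2001, 3000), (4001, 5000)]))
# [(1001, 2000), (3001, 4000)]
```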
| Trigger | Mechanism | Speed | Use Case |
|---|---|---|---|
| Timeout (RTO) | Timer expires, no ACK received | Slow (RTT + variance) | Fallback for any loss |
| Triple Duplicate ACK | 3 identical ACKs received | Fast (immediate) | Packet loss detection |
| SACK-based | Selective ACK reveals gap | Fast (targeted) | Efficient recovery |
| Tail Loss Probe | Proactive probe after delay | Medium | Detecting final segment loss |
TCP maintains a retransmission timer that adapts to network conditions:
RTO = SRTT + 4 × RTTVAR
Where:
- SRTT = Smoothed Round-Trip Time (exponentially weighted average)
- RTTVAR = RTT Variance estimate
On timeout:
- RTO doubles (exponential backoff)
- Maximum typically 60-120 seconds
- Minimum typically 200ms-1s
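The exponential backoff is easy to tabulate. A sketch (the 1-second initial RTO and 60-second cap are assumed example values from the ranges above):

```python
def backoff_schedule(initial_rto, retries, max_rto=60.0):
    """Successive RTO values (seconds) as consecutive timeouts fire:
    each expiry doubles the timeout, clamped at max_rto."""
    rto, schedule = initial_rto, []
    for _ in range(retries):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)
    return schedule

print(backoff_schedule(1.0, 8))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```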
Calculating RTT:
// Jacobson's algorithm (simplified)
void update_rtt(long measured_rtt) {
    // First measurement
    if (srtt == 0) {
        srtt = measured_rtt;
        rttvar = measured_rtt / 2;
        rto = srtt + 4 * rttvar;
        return;
    }

    // Subsequent measurements
    long err = measured_rtt - srtt;

    // SRTT = SRTT + (1/8) * (RTT - SRTT)
    srtt = srtt + (err >> 3);

    // RTTVAR = RTTVAR + (1/4) * (|RTT - SRTT| - RTTVAR)
    rttvar = rttvar + ((labs(err) - rttvar) >> 2);

    // RTO = SRTT + 4 * RTTVAR (with minimum)
    rto = max(MIN_RTO, srtt + 4 * rttvar);
}
When segments arrive out of order, TCP must handle them carefully:
Segments arrive:  1000    3000    2000    4000    (each carries 1000 bytes)
                   ↓       ↓       ↓       ↓
Receive buffer:  [1000]                           (deliver bytes 1000-1999)
OOO buffer:              [3000]                   (buffer bytes 3000-3999)
ACK sent:         2000    2000                    (duplicate: still expecting 2000, not 4000)
After 2000:      [1000][2000][3000]               (reassembled, ACK 4000)
After 4000:      [1000][2000][3000][4000]         (ACK 5000: all received)
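The receiver's bookkeeping can be modeled in a few lines. This is a deliberately minimal sketch (1000-byte segments assumed; real kernels track byte ranges and duplicate-ACK counts, but the ACK progression follows the same rule):

```python
def receive(segments, start_seq):
    """Feed (seq, data) segments in arrival order; return the in-order
    byte stream delivered to the application and the ACK sent after
    each arrival (always the next expected sequence number)."""
    rcv_nxt, ooo, delivered, acks = start_seq, {}, b"", []
    for seq, data in segments:
        ooo[seq] = data
        while rcv_nxt in ooo:          # drain everything now contiguous
            chunk = ooo.pop(rcv_nxt)
            delivered += chunk
            rcv_nxt += len(chunk)
        acks.append(rcv_nxt)
    return delivered, acks

segs = [(1000, b"A" * 1000), (3000, b"C" * 1000),
        (2000, b"B" * 1000), (4000, b"D" * 1000)]
delivered, acks = receive(segs, 1000)
print(acks)  # [2000, 2000, 4000, 5000]
```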
All this complexity—retransmissions, buffering, reordering—is invisible to the application. The application's read() call returns data in order, reliably. If TCP cannot deliver data (connection broken, remote crashed), it reports an error. The complexity is TCP's burden, not the application's.
An established TCP connection can exist indefinitely without any data transfer. The connection is purely a state maintained by both kernels—there's no "active" component that requires constant communication. But this creates a problem: how do you detect a dead peer?
Consider these scenarios:
Remote host crashes (power failure, kernel panic)
Network path fails
Remote application hangs
Without keep-alive, a silent connection could remain "ESTABLISHED" in the kernel forever, consuming resources.
TCP keep-alive sends probe packets when the connection is idle:
┌─────────────────────────────────────────────────────────────────┐
│ Keep-Alive Timeline │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Last data ──────────── tcp_keepalive_time ─────────────▶ Probe │
│ (2 hours default) │
│ │
│ Probe ──────────────── tcp_keepalive_intvl ─────────────▶ Probe │
│ (75 seconds default) │
│ │
│ After tcp_keepalive_probes failures (default 9): Connection dead │
│ │
└─────────────────────────────────────────────────────────────────┘
Keep-alive probe packet: a segment with no payload (or a single garbage byte, for compatibility with older stacks) carrying sequence number snd_nxt - 1, which forces a live peer to reply with an ACK.
Keep-alive response interpretation:
| Response | Meaning | Action |
|---|---|---|
| ACK received | Peer is alive | Reset timer |
| RST received | Peer rebooted (lost connection state) | Close connection |
| No response | Peer dead or unreachable | Continue probing |
| ICMP error | Network path broken | Close connection |
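Worst-case detection time follows directly from the three tunables: the idle threshold plus one probe interval per unanswered probe. A quick check of the arithmetic:

```python
def detection_time(keepalive_time, intvl, probes):
    """Seconds from last traffic until the connection is declared dead,
    assuming the peer never answers a probe."""
    return keepalive_time + intvl * probes

print(detection_time(7200, 75, 9))  # Linux defaults: 7875 s (over 2 hours)
print(detection_time(60, 10, 6))    # aggressive tuning: 120 s
```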
"""Configuring TCP keep-alive for faster dead peer detection Default Linux values:- tcp_keepalive_time: 7200 seconds (2 hours!)- tcp_keepalive_intvl: 75 seconds- tcp_keepalive_probes: 9 Time to detect dead peer with defaults: 7200 + (75 × 9) = 7875 seconds = 2+ hours With aggressive settings below: 60 + (10 × 6) = 120 seconds = 2 minutes""" import socket def create_socket_with_keepalive(): sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # Enable keep-alive sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1) # Platform-specific options (Linux) # TCP_KEEPIDLE: seconds before first probe # TCP_KEEPINTVL: seconds between probes # TCP_KEEPCNT: number of failed probes before giving up import platform if platform.system() == 'Linux': # Wait 60 seconds before first probe sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60) # Send probes every 10 seconds sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10) # Give up after 6 failed probes sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 6) elif platform.system() == 'Darwin': # macOS # macOS uses TCP_KEEPALIVE for idle time sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, 60) elif platform.system() == 'Windows': # Windows uses SIO_KEEPALIVE_VALS ioctl # onoff, keepalivetime (ms), keepaliveinterval (ms) sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, 60000, 10000)) return sock # Example usageif __name__ == "__main__": sock = create_socket_with_keepalive() sock.connect(('example.com', 80)) # Connection now monitored with 2-minute dead peer detection print("Connected with keep-alive enabled") # ... use connection ... sock.close()Good Use Cases:
Caution Needed:
TCP keep-alive is somewhat controversial in the networking community:
Arguments Against: probes consume bandwidth without carrying application data, transient network outages can kill otherwise-healthy idle connections, and liveness checking arguably belongs at the application layer (RFC 1122 makes keep-alive optional and off by default).
Arguments For: it detects dead peers without any application changes, lets servers reclaim resources held by silently vanished clients, and keeps NAT and firewall state entries from expiring.
Many protocols implement their own heartbeat mechanisms (WebSocket pings, AMQP heartbeats, gRPC keepalives). These operate at the application layer and can convey more information (e.g., 'I'm alive but overloaded'). Consider TCP keep-alive as a fallback that catches edge cases, not a replacement for application-level health monitoring.
The ESTABLISHED state is where the primary socket I/O operations are used. Understanding these operations and their behaviors is essential for network programming.
ssize_t recv(int sockfd, void *buf, size_t len, int flags);
Behavior: blocks until at least one byte is available (unless the socket is non-blocking), returns up to len bytes (possibly fewer than requested), and returns 0 once the peer has closed its sending side.
Important Flags:
| Flag | Effect |
|---|---|
| MSG_PEEK | Read data without removing from buffer |
| MSG_WAITALL | Block until full len bytes received |
| MSG_DONTWAIT | Non-blocking for this call only |
| MSG_OOB | Receive out-of-band (urgent) data |
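MSG_PEEK is easy to see in action with a local socketpair: peeking leaves the bytes in the receive buffer, so the next plain recv() returns them again. A sketch:

```python
import socket

a, b = socket.socketpair()
a.sendall(b"hello")

peeked = b.recv(5, socket.MSG_PEEK)  # look without consuming
taken = b.recv(5)                    # now actually consume
print(peeked, taken)  # b'hello' b'hello'
a.close(); b.close()
```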
Edge Cases:
int n = recv(sockfd, buf, 1024, 0);
if (n > 0) {
    // Data received (process n bytes)
} else if (n == 0) {
    // Connection closed by peer (FIN received)
    // Socket transitioning to CLOSE_WAIT
} else {  // n < 0
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        // Non-blocking: no data available yet
    } else if (errno == ECONNRESET) {
        // Connection reset by peer (RST received)
    } else {
        // Other error
        perror("recv");
    }
}
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
Behavior: copies data into the kernel send buffer and returns; it does not wait for the peer to receive anything. If the buffer is full, it blocks (or fails with EAGAIN in non-blocking mode), and it may accept fewer than len bytes (a partial write).
Important Flags:
| Flag | Effect |
|---|---|
| MSG_NOSIGNAL | Don't send SIGPIPE on broken connection |
| MSG_DONTWAIT | Non-blocking for this call only |
| MSG_MORE | More data coming (delay transmission) |
| MSG_OOB | Send as out-of-band (urgent) data |
/**
 * Common socket I/O patterns for ESTABLISHED connections
 */
#include <sys/socket.h>
#include <errno.h>
#include <string.h>

/**
 * Pattern 1: Read exactly N bytes
 * (Handles partial reads)
 */
int read_exactly(int sockfd, void *buf, size_t n) {
    size_t total = 0;
    char *ptr = buf;
    while (total < n) {
        ssize_t bytes = recv(sockfd, ptr + total, n - total, 0);
        if (bytes < 0) {
            if (errno == EINTR) continue;  // Interrupted, retry
            return -1;                     // Error
        } else if (bytes == 0) {
            return 0;                      // Connection closed
        }
        total += bytes;
    }
    return total;  // Success: read exactly n bytes
}

/**
 * Pattern 2: Write all data
 * (Handles partial writes)
 */
int write_all(int sockfd, const void *buf, size_t n) {
    size_t total = 0;
    const char *ptr = buf;
    while (total < n) {
        ssize_t bytes = send(sockfd, ptr + total, n - total, MSG_NOSIGNAL);
        if (bytes < 0) {
            if (errno == EINTR) continue;  // Interrupted, retry
            return -1;                     // Error (EPIPE = broken connection)
        }
        total += bytes;
    }
    return total;  // Success: wrote all n bytes
}

/**
 * Pattern 3: Non-blocking check for data
 */
int data_available(int sockfd) {
    char buf;
    ssize_t n = recv(sockfd, &buf, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0) return 1;         // Data available
    else if (n == 0) return -1;  // Connection closed
    else if (errno == EAGAIN || errno == EWOULDBLOCK) return 0;  // No data yet
    else return -1;              // Error
}

While the connection is ESTABLISHED, you can partially close it:
int shutdown(int sockfd, int how);
| how | Effect | Use Case |
|---|---|---|
| SHUT_RD | Stop reading (future reads return 0) | Rare |
| SHUT_WR | Send FIN, signal no more writes | Clean close |
| SHUT_RDWR | Both (like close but keeps fd) | Rare |
shutdown(SHUT_WR) is the clean way to signal "I'm done sending": the peer's recv() returns 0, yet the peer can continue sending data back to us.
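The half-close behavior can be demonstrated with a local socketpair: after SHUT_WR the peer drains any buffered data, then sees end-of-stream, yet the reverse direction keeps working. A sketch:

```python
import socket

a, b = socket.socketpair()
a.sendall(b"last words")
a.shutdown(socket.SHUT_WR)     # send FIN: "I'm done sending"

first = b.recv(1024)           # buffered data still arrives
eof = b.recv(1024)             # b"" signals our FIN (end of stream)
b.sendall(b"still talking")    # the other direction stays open
back = a.recv(1024)
print(first, eof, back)
a.close(); b.close()
```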
Several socket calls let you inspect an established connection:
// Check socket error status
int error;
socklen_t len = sizeof(error);
getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &len);
// error non-zero means connection broken
// Get buffer sizes
int sndbuf, rcvbuf;
getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
// Get connection info
struct sockaddr_in peer;
getpeername(sockfd, (struct sockaddr*)&peer, &len);
// peer now contains remote IP and port
A successful send() only means the kernel accepted the data into its buffer. The data may be lost to network failure, the peer may crash before reading it, or the connection may reset. If you need confirmation of receipt, you must implement application-level acknowledgments. TCP's ACKs only guarantee the kernel received the data, not that the application processed it.
Even in ESTABLISHED state, connections can fail. Applications must detect and handle these failures appropriately.
1. Graceful Close (recv returns 0)
The peer called close() or shutdown()—a normal, expected closure.
int n = recv(sockfd, buf, sizeof(buf), 0);
if (n == 0) {
    // Peer closed connection gracefully
    // This is normal, not an error
    close(sockfd);
}
2. Connection Reset (ECONNRESET)
The peer sent RST—connection is forcibly terminated.
int n = recv(sockfd, buf, sizeof(buf), 0);
if (n < 0 && errno == ECONNRESET) {
    // Peer reset connection (RST received)
    // Possible causes:
    //  - Peer crashed and rebooted
    //  - Peer process died abnormally
    //  - Firewall injection attack
    close(sockfd);
}
3. Broken Pipe (EPIPE / SIGPIPE)
Write to a connection that peer has closed.
// Peer has closed their side (FIN received)
// We try to send data
int n = send(sockfd, data, len, MSG_NOSIGNAL);
if (n < 0 && errno == EPIPE) {
    // Cannot write: connection broken
    close(sockfd);
}
// Without MSG_NOSIGNAL, SIGPIPE is raised instead!
// Default SIGPIPE action: terminate process
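In Python the same failure surfaces as BrokenPipeError, because the interpreter ignores SIGPIPE at startup. A sketch using a local socketpair (here the error appears on the first write; on a real TCP socket the first write after the peer's close may still succeed, with EPIPE arriving on a later one):

```python
import socket

a, b = socket.socketpair()
b.close()                        # the peer is gone

broken = False
try:
    a.sendall(b"anyone there?")
except BrokenPipeError:          # EPIPE, raised instead of SIGPIPE
    broken = True
print("EPIPE: connection broken" if broken else "send succeeded")
a.close()
```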
4. Timeout (ETIMEDOUT)
Retransmission limit exceeded—no response from peer.
int n = recv(sockfd, buf, sizeof(buf), 0);
if (n < 0 && errno == ETIMEDOUT) {
    // Network path is dead (no response to probes)
    // Takes a long time to detect with defaults
    close(sockfd);
}
| Scenario | recv() behavior | send() behavior | Detection speed |
|---|---|---|---|
| Peer graceful close | Returns 0 | First send may succeed; later sends fail with EPIPE | Immediate |
| Peer crash (RST) | ECONNRESET | ECONNRESET | Immediate |
| Network failure | ETIMEDOUT (eventually) | Blocks then ETIMEDOUT | Minutes |
| Peer power loss | Blocks forever (no keepalive) | Blocks then ETIMEDOUT | Minutes to hours |
| Firewall kills state | Hangs or RST (varies) | RST on next write | On next I/O |
Don't wait for failures to manifest—detect them early:
1. Enable TCP Keep-Alive (as discussed earlier)
2. Implement Application Heartbeats
import socket
import threading
import time

def heartbeat_monitor(sock, interval=30, timeout=90):
    """Send periodic heartbeats, detect missing responses"""
    last_response = time.time()
    while True:
        # Send heartbeat
        try:
            sock.send(b"PING")
        except OSError:
            raise ConnectionError("Failed to send heartbeat")
        # Check for response (with timeout)
        sock.settimeout(timeout)
        try:
            response = sock.recv(4)
            if response == b"PONG":
                last_response = time.time()
        except socket.timeout:
            if time.time() - last_response > timeout:
                raise ConnectionError("Heartbeat timeout")
        time.sleep(interval)
3. Use Proper Timeouts
// Set socket receive timeout
struct timeval tv;
tv.tv_sec = 30; // 30 seconds
tv.tv_usec = 0;
setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
// recv() will now return EAGAIN/EWOULDBLOCK after 30 seconds
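Python exposes the same idea through settimeout(): a blocked recv() gives up after the deadline instead of waiting forever. A sketch (a short 0.2-second timeout is used so it finishes quickly):

```python
import socket

a, b = socket.socketpair()
b.settimeout(0.2)              # receive deadline of 200 ms

timed_out = False
try:
    b.recv(1024)               # nothing was sent; this will time out
except socket.timeout:         # alias of TimeoutError in recent Python
    timed_out = True
print("recv timed out:", timed_out)
a.close(); b.close()
```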
Always assume connections can fail at any time:
- Use non-blocking I/O or timeouts to prevent indefinite blocking
- Handle all error cases explicitly
- Use MSG_NOSIGNAL to prevent SIGPIPE crashes
- Log connection failures for debugging
- Implement reconnection logic where appropriate
We've explored the ESTABLISHED state comprehensively—the stable, productive phase of TCP connections where reliable data transfer occurs.
All good things must end. When one or both sides want to terminate the connection, TCP enters its closing states: FIN_WAIT and TIME_WAIT. These states handle the orderly teardown of the connection, ensuring all data is delivered and no orphaned packets can interfere with future connections. In the next page, we'll explore the connection termination states and the critical role of TIME_WAIT.
You now understand the TCP ESTABLISHED state in depth—from entry conditions and data transfer mechanics, through reliability enforcement and keep-alive, to socket operations and failure detection. This knowledge is essential for writing robust networked applications that handle the realities of unreliable networks and uncooperative peers.