After the careful choreography of the three-way handshake, the TCP connection enters its most important and longest-lasting state: ESTABLISHED. This is where TCP delivers on its promise—reliable, ordered, bidirectional byte stream delivery between two endpoints. The transient nature of handshake states gives way to a stable platform for data exchange that may persist for milliseconds or months.
The ESTABLISHED state is where applications actually use TCP. Every HTTP request you make, every database query you send, every real-time message you receive—all travel through connections in the ESTABLISHED state. Understanding this state means understanding how TCP maintains reliability during data transfer, how it detects and handles failures, and how applications interact with established connections.
This page explores the ESTABLISHED state comprehensively—from the conditions that define it, through the mechanics of data transfer it enables, to the keep-alive mechanisms that maintain it, and the socket operations available during this phase.
By the end of this page, you will understand: (1) What defines the ESTABLISHED state and how it's entered, (2) Full-duplex data transfer and the send/receive buffers, (3) Sequence numbers and acknowledgments during data transfer, (4) TCP keep-alive mechanisms and their configuration, (5) Detecting and handling connection failures, and (6) Socket operations available in ESTABLISHED state.
The ESTABLISHED state represents a fully synchronized, bidirectional communication channel. Both endpoints have completed the handshake and agreed on the parameters that will govern their data exchange.
A socket enters ESTABLISHED from different prior states:
| From State | Trigger | Context |
|---|---|---|
| SYN_SENT | Receive valid SYN+ACK, send ACK | Normal client connection |
| SYN_RECEIVED | Receive valid ACK | Normal server accept |
| SYN_RECEIVED | Receive ACK of our SYN (reached SYN_RECEIVED from SYN_SENT via simultaneous open) | Rare peer-to-peer case |
By the time ESTABLISHED is reached, both sides have agreed on:
Sequence Number Spaces: each side's initial sequence number (ISN), from which every byte in that direction is numbered.
Communication Parameters: negotiated options such as the MSS, window scaling factor, SACK permission, and timestamps.
Connection Identity: the 4-tuple (source IP, source port, destination IP, destination port) that uniquely identifies the connection.
The kernel maintains substantial state for established connections:
// Simplified view of socket state in ESTABLISHED
struct tcp_sock {
    // Sequence number tracking
    u32 snd_una;        // Oldest unacknowledged sequence number
    u32 snd_nxt;        // Next sequence number to send
    u32 snd_wnd;        // Send window (from receiver's advertisements)
    u32 rcv_nxt;        // Next expected sequence from peer
    u32 rcv_wnd;        // Our receive window

    // Buffer management
    struct sk_buff_head write_queue;    // Data waiting to send
    struct sk_buff_head receive_queue;  // Data received, awaiting read
    struct sk_buff_head out_of_order;   // Out-of-order segments

    // Timing and RTT
    u32 srtt_us;        // Smoothed RTT estimate
    u32 rttvar_us;      // RTT variance
    u32 rto;            // Retransmission timeout

    // Congestion control
    u32 cwnd;           // Congestion window
    u32 ssthresh;       // Slow start threshold

    // Keep-alive state
    u8 keepalive_probes;            // Number of keepalive probes sent
    unsigned long keepalive_time;   // Last data/ACK time
};
Once established, both endpoints are peers. There's no longer a distinction between 'client' and 'server' at the TCP level—both can send and receive data freely. The asymmetry of connection establishment gives way to the symmetry of data transfer.
TCP in ESTABLISHED state provides full-duplex communication—both endpoints can send data to each other simultaneously, independently. This is not alternating half-duplex but truly concurrent bidirectional data flow.
TCP presents a byte stream abstraction to applications:
Application writes: [Hello Wo][rld! How are ][you?]
│ │ │
TCP segments: [Hello World! Ho][w are you?]
│ │
Receiver sees: [Hello World! How are you?]
(same bytes, same order, different grouping)
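This boundary-blindness is easy to demonstrate locally. The sketch below uses socket.socketpair() so no network setup is needed; the receiver gets the same bytes in the same order, regardless of how the writes were grouped.

```python
import socket

# A connected pair of local stream sockets (no network needed)
a, b = socket.socketpair()

# Application writes three arbitrary chunks...
for chunk in (b"Hello Wo", b"rld! How are ", b"you?"):
    a.sendall(chunk)
a.shutdown(socket.SHUT_WR)  # signal end of stream

# ...the receiver just sees one ordered byte stream
received = b""
while True:
    data = b.recv(4096)
    if not data:  # b"" means the sender is done
        break
    received += data

print(received.decode())  # Hello World! How are you?
a.close(); b.close()
```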
Each direction of data flow uses a buffer managed by the kernel:
┌─────────────────────────┐
Application │ TCP Socket │ Network
│ │
write() ─────────▶│ ┌──────────────────┐ │───▶ Segments
│ │ Send Buffer │ │
│ │ (SO_SNDBUF) │ │
│ └──────────────────┘ │
│ │
read() ◀─────────│ ┌──────────────────┐ │◀─── Segments
│ │ Receive Buffer │ │
│ │ (SO_RCVBUF) │ │
│ └──────────────────┘ │
└─────────────────────────┘
Send Buffer (SO_SNDBUF): holds bytes the application has written that are not yet acknowledged by the peer; when it fills, write() blocks (or fails with EAGAIN in non-blocking mode).
Receive Buffer (SO_RCVBUF): holds bytes that have arrived but that the application has not yet read; its remaining free space determines the advertised receive window.
During data transfer, sequence numbers track every byte:
Sender's View:
snd_una snd_nxt snd_una + snd_wnd
│ │ │
▼ ▼ ▼
├─────────────────────┼────────────────────────────┤
│ Unacked data │ Available window │
│ (awaiting ACKs) │ (can send more) │
└─────────────────────┴────────────────────────────┘
Receiver's View:
rcv_nxt rcv_nxt + rcv_wnd
│ │
▼ ▼
├──────────────────────────────────────────────┤
│ Receive Window │
│ (will accept these sequence numbers) │
└──────────────────────────────────────────────┘
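The sender-side arithmetic in the diagram reduces to a single expression. A small sketch (variable names borrowed from the struct shown earlier; real TCP does this with wraparound-safe 32-bit arithmetic, omitted here):

```python
def usable_window(snd_una: int, snd_nxt: int, snd_wnd: int) -> int:
    """Bytes the sender may still transmit without overrunning the
    receiver's advertised window: (snd_una + snd_wnd) - snd_nxt."""
    return max(0, (snd_una + snd_wnd) - snd_nxt)

# 500 bytes in flight (snd_nxt - snd_una), window of 2000 bytes:
print(usable_window(snd_una=1000, snd_nxt=1500, snd_wnd=2000))  # 1500
```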
Every data segment contains: a sequence number (the position of its first payload byte in the stream), an acknowledgment number (the next byte expected from the peer), and a window advertisement (how many more bytes the segment's sender can accept).
Example Exchange:
Time │ Direction │ Seq │ Ack │ Data
──────┼─────────────┼────────────┼──────────┼────────────
1 │ A → B │ 1000 │ 5000 │ 100 bytes
2 │ B → A │ 5000 │ 1100 │ 200 bytes
3 │ A → B │ 1100 │ 5200 │ 150 bytes
4 │ B → A │ 5200 │ 1250 │ ACK only
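The exchange can be replayed with simple arithmetic: each side's next sequence number is its previous one plus the bytes it just sent, and each acknowledgment echoes the peer's sequence number plus the payload length received. A sketch using the numbers from the table:

```python
def advance(seq: int, payload_len: int) -> int:
    """Next sequence number after sending payload_len bytes."""
    return seq + payload_len

a_seq, b_seq = 1000, 5000      # starting points from the table
a_seq = advance(a_seq, 100)    # step 1: A sends 100 bytes
b_ack = a_seq                  # B's next ACK acknowledges 1100
b_seq = advance(b_seq, 200)    # step 2: B sends 200 bytes
a_ack = b_seq                  # A's next ACK acknowledges 5200
print(b_ack, a_ack)  # 1100 5200
```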
"""Demonstrating full-duplex TCP communicationBoth threads send and receive simultaneously"""import socketimport threadingimport time def client_handler(conn, name): """Handle one direction of communication""" def sender(): for i in range(5): msg = f"Message {i} from {name}" conn.send(msg.encode()) print(f"[{name}] Sent: {msg}") time.sleep(0.5) def receiver(): conn.settimeout(5) try: while True: data = conn.recv(1024) if not data: break print(f"[{name}] Received: {data.decode()}") except socket.timeout: pass # Run sender and receiver concurrently t1 = threading.Thread(target=sender) t2 = threading.Thread(target=receiver) t1.start() t2.start() t1.join() t2.join() def run_server(): server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) server.bind(('127.0.0.1', 9999)) server.listen(1) print("[Server] Listening...") conn, addr = server.accept() print(f"[Server] Connection from {addr}") # Server sends and receives simultaneously client_handler(conn, "Server") conn.close() server.close() def run_client(): time.sleep(0.5) # Let server start client = socket.socket(socket.AF_INET, socket.SOCK_STREAM) client.connect(('127.0.0.1', 9999)) print("[Client] Connected") # Client sends and receives simultaneously client_handler(client, "Client") client.close() if __name__ == "__main__": server_thread = threading.Thread(target=run_server) client_thread = threading.Thread(target=run_client) server_thread.start() client_thread.start() server_thread.join() client_thread.join() print("Full-duplex communication complete!")TCP often 'piggybacks' acknowledgments on data segments. Instead of sending a separate ACK, it includes the acknowledgment in the next outgoing data segment. This reduces overhead and is one reason why full-duplex communication is efficient. Empty ACKs are only sent when there's no data to piggyback on (using delayed ACKs).
The ESTABLISHED state is where TCP's reliability guarantees are continuously enforced. Every segment must be acknowledged, lost segments must be retransmitted, and order must be preserved.
TCP guarantees that data written to a socket will be:
Cumulative ACKs: the acknowledgment number covers every byte before it, so a single ACK can confirm many segments at once.
Selective ACKs (SACK): a TCP option that additionally reports discontiguous blocks received beyond the cumulative point:
Standard ACK: "I've received up to byte 1000"
With SACK: "I've received up to byte 1000,
plus bytes 2001-3000 and 4001-5000"
Sender knows: Need to retransmit bytes 1001-2000 and 3001-4000
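Computing what to retransmit from a SACK report amounts to finding the holes between the cumulative point and the SACKed blocks. A sketch using the example's byte ranges (inclusive ranges for readability; real SACK blocks carry left/right sequence edges):

```python
def sack_holes(highest_contig, sack_blocks):
    """highest_contig: highest byte received in order.
    sack_blocks: (first, last) inclusive ranges received out of order.
    Returns the inclusive ranges the sender must retransmit."""
    holes = []
    edge = highest_contig  # everything up to edge has arrived
    for first, last in sorted(sack_blocks):
        if first > edge + 1:
            holes.append((edge + 1, first - 1))
        edge = max(edge, last)
    return holes

print(sack_holes(1000, [(2001, 3000), (4001, 5000)]))
# [(1001, 2000), (3001, 4000)]
```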
| Trigger | Mechanism | Speed | Use Case |
|---|---|---|---|
| Timeout (RTO) | Timer expires, no ACK received | Slow (RTT + variance) | Fallback for any loss |
| Triple Duplicate ACK | 3 identical ACKs received | Fast (immediate) | Packet loss detection |
| SACK-based | Selective ACK reveals gap | Fast (targeted) | Efficient recovery |
| Tail Loss Probe | Proactive probe after delay | Medium | Detecting final segment loss |
TCP maintains a retransmission timer that adapts to network conditions:
RTO = SRTT + 4 × RTTVAR
Where:
- SRTT = Smoothed Round-Trip Time (exponentially weighted average)
- RTTVAR = RTT Variance estimate
On timeout:
- RTO doubles (exponential backoff)
- Maximum typically 60-120 seconds
- Minimum typically 200ms-1s
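The exponential backoff is easy to tabulate. A sketch (the 1-second initial RTO and 60-second cap are assumed example values from the ranges above):

```python
def backoff_schedule(initial_rto, retries, max_rto=60.0):
    """Successive RTO values (seconds) as consecutive timeouts fire:
    each expiry doubles the timeout, clamped at max_rto."""
    rto, schedule = initial_rto, []
    for _ in range(retries):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)
    return schedule

print(backoff_schedule(1.0, 8))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```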
Calculating RTT:
// Jacobson's algorithm (simplified)
void update_rtt(long measured_rtt) {
    // First measurement
    if (srtt == 0) {
        srtt = measured_rtt;
        rttvar = measured_rtt / 2;
        rto = srtt + 4 * rttvar;
        return;
    }

    // Subsequent measurements
    long err = measured_rtt - srtt;

    // SRTT = SRTT + (1/8) * (RTT - SRTT)
    srtt = srtt + (err >> 3);

    // RTTVAR = RTTVAR + (1/4) * (|RTT - SRTT| - RTTVAR)
    rttvar = rttvar + ((labs(err) - rttvar) >> 2);

    // RTO = SRTT + 4 * RTTVAR (with minimum)
    rto = max(MIN_RTO, srtt + 4 * rttvar);
}
When segments arrive out of order, TCP must handle them carefully:
Segments arrive:  1000    3000    2000    4000    (each carries 1000 bytes)
                   ↓       ↓       ↓       ↓
Receive buffer:  [1000]                           (deliver bytes 1000-1999)
OOO buffer:              [3000]                   (buffer bytes 3000-3999)
ACK sent:         2000    2000                    (duplicate: still expecting 2000, not 4000)
After 2000:      [1000][2000][3000]               (reassembled, ACK 4000)
After 4000:      [1000][2000][3000][4000]         (ACK 5000: all received)
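The receiver's bookkeeping can be modeled in a few lines. This is a deliberately minimal sketch (1000-byte segments assumed; real kernels track byte ranges and duplicate-ACK counts, but the ACK progression follows the same rule):

```python
def receive(segments, start_seq):
    """Feed (seq, data) segments in arrival order; return the in-order
    byte stream delivered to the application and the ACK sent after
    each arrival (always the next expected sequence number)."""
    rcv_nxt, ooo, delivered, acks = start_seq, {}, b"", []
    for seq, data in segments:
        ooo[seq] = data
        while rcv_nxt in ooo:          # drain everything now contiguous
            chunk = ooo.pop(rcv_nxt)
            delivered += chunk
            rcv_nxt += len(chunk)
        acks.append(rcv_nxt)
    return delivered, acks

segs = [(1000, b"A" * 1000), (3000, b"C" * 1000),
        (2000, b"B" * 1000), (4000, b"D" * 1000)]
delivered, acks = receive(segs, 1000)
print(acks)  # [2000, 2000, 4000, 5000]
```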
All this complexity—retransmissions, buffering, reordering—is invisible to the application. The application's read() call returns data in order, reliably. If TCP cannot deliver data (connection broken, remote crashed), it reports an error. The complexity is TCP's burden, not the application's.
An established TCP connection can exist indefinitely without any data transfer. The connection is purely a state maintained by both kernels—there's no "active" component that requires constant communication. But this creates a problem: how do you detect a dead peer?
Consider these scenarios:
Remote host crashes (power failure, kernel panic)
Network path fails
Remote application hangs
Without keep-alive, a silent connection could remain "ESTABLISHED" in the kernel forever, consuming resources.
TCP keep-alive sends probe packets when the connection is idle:
┌─────────────────────────────────────────────────────────────────┐
│ Keep-Alive Timeline │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Last data ──────────── tcp_keepalive_time ─────────────▶ Probe │
│ (2 hours default) │
│ │
│ Probe ──────────────── tcp_keepalive_intvl ─────────────▶ Probe │
│ (75 seconds default) │
│ │
│ After tcp_keepalive_probes failures (default 9): Connection dead │
│ │
└─────────────────────────────────────────────────────────────────┘
Keep-alive probe packet: a segment with no payload (or a single garbage byte, for compatibility with older stacks) carrying sequence number snd_nxt - 1, which forces a live peer to reply with an ACK.
Keep-alive response interpretation:
| Response | Meaning | Action |
|---|---|---|
| ACK received | Peer is alive | Reset timer |
| RST received | Peer rebooted (lost connection state) | Close connection |
| No response | Peer dead or unreachable | Continue probing |
| ICMP error | Network path broken | Close connection |
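Worst-case detection time follows directly from the three tunables: the idle threshold plus one probe interval per unanswered probe. A quick check of the arithmetic:

```python
def detection_time(keepalive_time, intvl, probes):
    """Seconds from last traffic until the connection is declared dead,
    assuming the peer never answers a probe."""
    return keepalive_time + intvl * probes

print(detection_time(7200, 75, 9))  # Linux defaults: 7875 s (over 2 hours)
print(detection_time(60, 10, 6))    # aggressive tuning: 120 s
```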
"""Configuring TCP keep-alive for faster dead peer detection Default Linux values:- tcp_keepalive_time: 7200 seconds (2 hours!)- tcp_keepalive_intvl: 75 seconds- tcp_keepalive_probes: 9 Time to detect dead peer with defaults: 7200 + (75 × 9) = 7875 seconds = 2+ hours With aggressive settings below: 60 + (10 × 6) = 120 seconds = 2 minutes""" import socket def create_socket_with_keepalive(): sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # Enable keep-alive sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1) # Platform-specific options (Linux) # TCP_KEEPIDLE: seconds before first probe # TCP_KEEPINTVL: seconds between probes # TCP_KEEPCNT: number of failed probes before giving up import platform if platform.system() == 'Linux': # Wait 60 seconds before first probe sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60) # Send probes every 10 seconds sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10) # Give up after 6 failed probes sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 6) elif platform.system() == 'Darwin': # macOS # macOS uses TCP_KEEPALIVE for idle time sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, 60) elif platform.system() == 'Windows': # Windows uses SIO_KEEPALIVE_VALS ioctl # onoff, keepalivetime (ms), keepaliveinterval (ms) sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, 60000, 10000)) return sock # Example usageif __name__ == "__main__": sock = create_socket_with_keepalive() sock.connect(('example.com', 80)) # Connection now monitored with 2-minute dead peer detection print("Connected with keep-alive enabled") # ... use connection ... sock.close()Good Use Cases:
Caution Needed:
TCP keep-alive is somewhat controversial in the networking community:
Arguments Against: probes consume bandwidth without carrying application data, transient network outages can kill otherwise-healthy idle connections, and liveness checking arguably belongs at the application layer (RFC 1122 makes keep-alive optional and off by default).
Arguments For: it detects dead peers without any application changes, lets servers reclaim resources held by silently vanished clients, and keeps NAT and firewall state entries from expiring.
Many protocols implement their own heartbeat mechanisms (WebSocket pings, AMQP heartbeats, gRPC keepalives). These operate at the application layer and can convey more information (e.g., 'I'm alive but overloaded'). Consider TCP keep-alive as a fallback that catches edge cases, not a replacement for application-level health monitoring.
The ESTABLISHED state is where the primary socket I/O operations are used. Understanding these operations and their behaviors is essential for network programming.
ssize_t recv(int sockfd, void *buf, size_t len, int flags);
Behavior: blocks until at least one byte is available (unless the socket is non-blocking), returns up to len bytes (possibly fewer than requested), and returns 0 once the peer has closed its sending side.
Important Flags:
| Flag | Effect |
|---|---|
| MSG_PEEK | Read data without removing from buffer |
| MSG_WAITALL | Block until full len bytes received |
| MSG_DONTWAIT | Non-blocking for this call only |
| MSG_OOB | Receive out-of-band (urgent) data |
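MSG_PEEK is easy to see in action with a local socketpair: peeking leaves the bytes in the receive buffer, so the next plain recv() returns them again. A sketch:

```python
import socket

a, b = socket.socketpair()
a.sendall(b"hello")

peeked = b.recv(5, socket.MSG_PEEK)  # look without consuming
taken = b.recv(5)                    # now actually consume
print(peeked, taken)  # b'hello' b'hello'
a.close(); b.close()
```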
Edge Cases:
int n = recv(sockfd, buf, 1024, 0);
if (n > 0) {
    // Data received (process n bytes)
} else if (n == 0) {
    // Connection closed by peer (FIN received)
    // Socket transitioning to CLOSE_WAIT
} else {  // n < 0
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        // Non-blocking: no data available yet
    } else if (errno == ECONNRESET) {
        // Connection reset by peer (RST received)
    } else {
        // Other error
        perror("recv");
    }
}
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
Behavior: copies data into the kernel send buffer and returns; it does not wait for the peer to receive anything. If the buffer is full, it blocks (or fails with EAGAIN in non-blocking mode), and it may accept fewer than len bytes (a partial write).
Important Flags:
| Flag | Effect |
|---|---|
| MSG_NOSIGNAL | Don't send SIGPIPE on broken connection |
| MSG_DONTWAIT | Non-blocking for this call only |
| MSG_MORE | More data coming (delay transmission) |
| MSG_OOB | Send as out-of-band (urgent) data |
/**
 * Common socket I/O patterns for ESTABLISHED connections
 */
#include <sys/socket.h>
#include <errno.h>
#include <string.h>

/**
 * Pattern 1: Read exactly N bytes
 * (Handles partial reads)
 */
int read_exactly(int sockfd, void *buf, size_t n) {
    size_t total = 0;
    char *ptr = buf;
    while (total < n) {
        ssize_t bytes = recv(sockfd, ptr + total, n - total, 0);
        if (bytes < 0) {
            if (errno == EINTR) continue;  // Interrupted, retry
            return -1;                     // Error
        } else if (bytes == 0) {
            return 0;                      // Connection closed
        }
        total += bytes;
    }
    return total;  // Success: read exactly n bytes
}

/**
 * Pattern 2: Write all data
 * (Handles partial writes)
 */
int write_all(int sockfd, const void *buf, size_t n) {
    size_t total = 0;
    const char *ptr = buf;
    while (total < n) {
        ssize_t bytes = send(sockfd, ptr + total, n - total, MSG_NOSIGNAL);
        if (bytes < 0) {
            if (errno == EINTR) continue;  // Interrupted, retry
            return -1;                     // Error (EPIPE = broken connection)
        }
        total += bytes;
    }
    return total;  // Success: wrote all n bytes
}

/**
 * Pattern 3: Non-blocking check for data
 */
int data_available(int sockfd) {
    char buf;
    ssize_t n = recv(sockfd, &buf, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0) return 1;         // Data available
    else if (n == 0) return -1;  // Connection closed
    else if (errno == EAGAIN || errno == EWOULDBLOCK) return 0;  // No data yet
    else return -1;              // Error
}

While the connection is ESTABLISHED, you can partially close it:
int shutdown(int sockfd, int how);
| how | Effect | Use Case |
|---|---|---|
| SHUT_RD | Stop reading (future reads return 0) | Rare |
| SHUT_WR | Send FIN, signal no more writes | Clean close |
| SHUT_RDWR | Both (like close but keeps fd) | Rare |
shutdown(SHUT_WR) is the clean way to signal "I'm done sending": the peer's recv() returns 0, yet the peer can continue sending data back to us.
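The half-close behavior can be demonstrated with a local socketpair: after SHUT_WR the peer drains any buffered data, then sees end-of-stream, yet the reverse direction keeps working. A sketch:

```python
import socket

a, b = socket.socketpair()
a.sendall(b"last words")
a.shutdown(socket.SHUT_WR)     # send FIN: "I'm done sending"

first = b.recv(1024)           # buffered data still arrives
eof = b.recv(1024)             # b"" signals our FIN (end of stream)
b.sendall(b"still talking")    # the other direction stays open
back = a.recv(1024)
print(first, eof, back)
a.close(); b.close()
```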
Several socket calls let you inspect an established connection:
// Check socket error status
int error;
socklen_t len = sizeof(error);
getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &len);
// error non-zero means connection broken
// Get buffer sizes
int sndbuf, rcvbuf;
getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
// Get connection info
struct sockaddr_in peer;
getpeername(sockfd, (struct sockaddr*)&peer, &len);
// peer now contains remote IP and port
A successful send() only means the kernel accepted the data into its buffer. The data may be lost to network failure, the peer may crash before reading it, or the connection may reset. If you need confirmation of receipt, you must implement application-level acknowledgments. TCP's ACKs only guarantee the kernel received the data, not that the application processed it.
Even in ESTABLISHED state, connections can fail. Applications must detect and handle these failures appropriately.
1. Graceful Close (recv returns 0)
The peer called close() or shutdown()—a normal, expected closure.
int n = recv(sockfd, buf, sizeof(buf), 0);
if (n == 0) {
    // Peer closed connection gracefully
    // This is normal, not an error
    close(sockfd);
}
2. Connection Reset (ECONNRESET)
The peer sent RST—connection is forcibly terminated.
int n = recv(sockfd, buf, sizeof(buf), 0);
if (n < 0 && errno == ECONNRESET) {
    // Peer reset connection (RST received)
    // Possible causes:
    //  - Peer crashed and rebooted
    //  - Peer process died abnormally
    //  - Firewall injection attack
    close(sockfd);
}
3. Broken Pipe (EPIPE / SIGPIPE)
Write to a connection that peer has closed.
// Peer has closed their side (FIN received)
// We try to send data
int n = send(sockfd, data, len, MSG_NOSIGNAL);
if (n < 0 && errno == EPIPE) {
    // Cannot write: connection broken
    close(sockfd);
}
// Without MSG_NOSIGNAL, SIGPIPE is raised instead!
// Default SIGPIPE action: terminate process
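In Python the same failure surfaces as BrokenPipeError, because the interpreter ignores SIGPIPE at startup. A sketch using a local socketpair (here the error appears on the first write; on a real TCP socket the first write after the peer's close may still succeed, with EPIPE arriving on a later one):

```python
import socket

a, b = socket.socketpair()
b.close()                        # the peer is gone

broken = False
try:
    a.sendall(b"anyone there?")
except BrokenPipeError:          # EPIPE, raised instead of SIGPIPE
    broken = True
print("EPIPE: connection broken" if broken else "send succeeded")
a.close()
```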
4. Timeout (ETIMEDOUT)
Retransmission limit exceeded—no response from peer.
int n = recv(sockfd, buf, sizeof(buf), 0);
if (n < 0 && errno == ETIMEDOUT) {
    // Network path is dead (no response to probes)
    // Takes a long time to detect with defaults
    close(sockfd);
}
| Scenario | recv() behavior | send() behavior | Detection speed |
|---|---|---|---|
| Peer graceful close | Returns 0 | First send may succeed; later sends fail with EPIPE | Immediate |
| Peer crash (RST) | ECONNRESET | ECONNRESET | Immediate |
| Network failure | ETIMEDOUT (eventually) | Blocks then ETIMEDOUT | Minutes |
| Peer power loss | Blocks forever (no keepalive) | Blocks then ETIMEDOUT | Minutes to hours |
| Firewall kills state | Hangs or RST (varies) | RST on next write | On next I/O |
Don't wait for failures to manifest—detect them early:
1. Enable TCP Keep-Alive (as discussed earlier)
2. Implement Application Heartbeats
import socket
import threading
import time

def heartbeat_monitor(sock, interval=30, timeout=90):
    """Send periodic heartbeats, detect missing responses"""
    last_response = time.time()
    while True:
        # Send heartbeat
        try:
            sock.send(b"PING")
        except OSError:
            raise ConnectionError("Failed to send heartbeat")
        # Check for response (with timeout)
        sock.settimeout(timeout)
        try:
            response = sock.recv(4)
            if response == b"PONG":
                last_response = time.time()
        except socket.timeout:
            if time.time() - last_response > timeout:
                raise ConnectionError("Heartbeat timeout")
        time.sleep(interval)
3. Use Proper Timeouts
// Set socket receive timeout
struct timeval tv;
tv.tv_sec = 30; // 30 seconds
tv.tv_usec = 0;
setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
// recv() will now return EAGAIN/EWOULDBLOCK after 30 seconds
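Python exposes the same idea through settimeout(): a blocked recv() gives up after the deadline instead of waiting forever. A sketch (a short 0.2-second timeout is used so it finishes quickly):

```python
import socket

a, b = socket.socketpair()
b.settimeout(0.2)              # receive deadline of 200 ms

timed_out = False
try:
    b.recv(1024)               # nothing was sent; this will time out
except socket.timeout:         # alias of TimeoutError in recent Python
    timed_out = True
print("recv timed out:", timed_out)
a.close(); b.close()
```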
Always assume connections can fail at any time:
- Use non-blocking I/O or timeouts to prevent indefinite blocking
- Handle all error cases explicitly
- Use MSG_NOSIGNAL to prevent SIGPIPE crashes
- Log connection failures for debugging
- Implement reconnection logic where appropriate
We've explored the ESTABLISHED state comprehensively—the stable, productive phase of TCP connections where reliable data transfer occurs.
All good things must end. When one or both sides want to terminate the connection, TCP enters its closing states: FIN_WAIT and TIME_WAIT. These states handle the orderly teardown of the connection, ensuring all data is delivered and no orphaned packets can interfere with future connections. In the next page, we'll explore the connection termination states and the critical role of TIME_WAIT.
You now understand the TCP ESTABLISHED state in depth—from entry conditions and data transfer mechanics, through reliability enforcement and keep-alive, to socket operations and failure detection. This knowledge is essential for writing robust networked applications that handle the realities of unreliable networks and uncooperative peers.