Every TCP segment containing data expects an acknowledgment. In a naive implementation, each received segment triggers an immediate ACK segment in response. While correct, this generates significant overhead—pure ACK segments carry no data but consume bandwidth, generate interrupts, and add load to network devices.
Delayed Acknowledgments address this inefficiency by allowing the receiver to wait briefly before sending an ACK, hoping either to:

- Piggyback the ACK on outgoing data (such as a response), eliminating the separate ACK packet entirely, or
- Acknowledge multiple received segments with a single ACK.
This optimization, specified in RFC 1122, can halve the number of ACK segments on a connection. However, it introduces a timing element that interacts—sometimes problematically—with Nagle's Algorithm.
By the end of this page, you will understand how delayed ACKs work, their RFC specifications, the infamous Nagle/delayed ACK interaction, configuration options across operating systems, and how to diagnose and resolve delayed ACK-related performance issues.
To understand why delayed ACKs exist, consider the overhead of immediate acknowledgments:
Immediate ACK Overhead Analysis:
```text
Scenario: Bulk data transfer, 1 MB file, MSS = 1460 bytes
──────────────────────────────────────────────────────────
Data segments: 1,000,000 / 1,460 ≈ 685 segments

With Immediate ACKs:
  ACK segments generated: 685
  ACK wire bytes:         685 × 40 bytes = 27,400 bytes
  ACK ratio:              27,400 / 1,000,000 = 2.74% overhead

With Delayed ACKs (ACK every 2nd segment):
  ACK segments generated: 343
  ACK wire bytes:         343 × 40 bytes = 13,720 bytes
  ACK ratio:              13,720 / 1,000,000 = 1.37% overhead

Reduction: 50% fewer ACKs
```
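The arithmetic above can be checked with a short script (a sketch assuming 40-byte pure ACK packets and one ACK per segment versus one per two segments):

```python
import math

FILE_BYTES = 1_000_000
MSS = 1460
ACK_BYTES = 40  # minimal TCP/IP headers, no data

segments = math.ceil(FILE_BYTES / MSS)   # 685
immediate_ack_bytes = segments * ACK_BYTES
delayed_ack_bytes = math.ceil(segments / 2) * ACK_BYTES

print(segments, immediate_ack_bytes, delayed_ack_bytes)
# 685 27400 13720
print(f"{immediate_ack_bytes / FILE_BYTES:.2%}")  # 2.74%
print(f"{delayed_ack_bytes / FILE_BYTES:.2%}")    # 1.37%
```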
Beyond Bandwidth:
The savings extend beyond raw bandwidth:
- Reduced interrupts: Each ACK triggers an interrupt on the sender. 50% fewer ACKs means 50% fewer interrupts.
- Lower router/switch load: Network devices must process each packet; fewer ACKs mean less processing.
- Piggybacking opportunity: Waiting allows the ACK to be combined with outgoing data, eliminating a separate ACK packet entirely.
| Metric | Immediate ACKs | Delayed ACKs | Improvement |
|---|---|---|---|
| ACKs per 1MB transfer | 685 | ~343 | 50% reduction |
| ACK bandwidth (1MB) | 27.4 KB | 13.7 KB | 50% reduction |
| Sender interrupts | 685 | ~343 | 50% reduction |
| With piggybacking | N/A | Potentially 0 | Up to 100% reduction |
In request-response protocols, piggybacking is the primary benefit. When the receiver sends a response shortly after receiving a request, the ACK rides along with the response data—no separate ACK packet needed. This is why HTTP, gRPC, and similar protocols benefit enormously from delayed ACKs when combined with properly-configured Nagle behavior.
Delayed acknowledgments are specified in RFC 1122 (Requirements for Internet Hosts) and further refined in RFC 5681 (TCP Congestion Control).
RFC 1122 Section 4.2.3.2 — When to Send an ACK Segment:
> A host that is receiving a stream of TCP data segments can increase efficiency in both the Internet and the hosts by sending fewer than one ACK (acknowledgment) segment per data segment received; this is known as a 'delayed ACK.'
>
> A TCP SHOULD implement a delayed ACK, but an ACK should not be excessively delayed; in particular, the delay MUST be less than 0.5 seconds, and in a stream of full-sized segments there SHOULD be an ACK for at least every second segment.
Key Requirements:

- A TCP SHOULD implement delayed ACKs.
- The delay MUST be less than 0.5 seconds (500 ms).
- In a stream of full-sized segments, there SHOULD be an ACK for at least every second segment.

RFC 5681 Refinement:

> An ACK SHOULD be generated for at least every second full-sized segment, and MUST be generated within 500 ms of the arrival of the first unacknowledged packet.
The 'Two Segment' Rule:
```text
Received Segments     Action
─────────────────     ──────
Segment 1 arrives     Start 500ms timer
Segment 2 arrives     ACK immediately (ACK every 2nd)
        OR
Timer expires         ACK now (max 500ms delay)
```
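The two-segment rule can be modeled as a small function. This is a sketch with times in integer milliseconds; real stacks also ACK immediately on out-of-order data, PSH, and piggybacking opportunities:

```python
def ack_times(arrivals_ms, delay_ms=500):
    """Return ACK emission times (ms) under the two-segment rule.

    An ACK fires on every 2nd unacknowledged segment, or when the
    delayed-ACK timer (armed at the 1st unacked segment) expires.
    """
    acks = []
    pending_since = None  # arrival time of the 1st unacked segment
    for t in arrivals_ms:
        if pending_since is not None and t >= pending_since + delay_ms:
            acks.append(pending_since + delay_ms)  # timer fired earlier
            pending_since = None
        if pending_since is None:
            pending_since = t      # 1st unacked segment: arm the timer
        else:
            acks.append(t)         # 2nd segment: ACK immediately
            pending_since = None
    if pending_since is not None:
        acks.append(pending_since + delay_ms)      # drain the final timer
    return acks

# Two back-to-back segments → one immediate ACK at the 2nd arrival:
print(ack_times([0, 1]))   # [1]
# A lone segment → ACK only when the 500 ms timer expires:
print(ack_times([0]))      # [500]
```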
Typical Implementation Timings:
While the RFC specifies a 500ms maximum, most implementations use shorter delays:
| Operating System | Default Delay | Notes |
|---|---|---|
| Linux | 40ms | Dynamic, based on RTT |
| Windows | 200ms | Configurable via registry |
| macOS/BSD | 100ms | Configurable |
| Embedded | 50-200ms | Varies by stack |
The 500ms limit dates from the early Internet when RTTs could exceed 1 second on intercontinental paths. Modern networks rarely see RTTs above 300ms. The practical delay timers (40-200ms) reflect this reality. Linux dynamically adjusts based on observed RTT, making the delay proportional to the connection's latency.
Delayed ACKs are implemented in the TCP receive path. When a segment arrives, the TCP stack decides whether to ACK immediately or schedule a delayed ACK.
Decision Flowchart:
```text
          ┌─────────────────────────────┐
          │      Segment Received       │
          └─────────────────────────────┘
                        │
                        ▼
          ┌─────────────────────────────┐
          │   Out-of-order segment?     │
          └─────────────────────────────┘
                /               \
              Yes                No
               │                  │
               ▼                  ▼
     ┌──────────────┐   ┌─────────────────────────┐
     │ ACK IMMEDIATE│   │ Is this the 2nd segment │
     │ (for fast    │   │ since last ACK?         │
     │ retransmit)  │   └─────────────────────────┘
     └──────────────┘         /            \
                            Yes             No
                             │               │
                             ▼               ▼
                  ┌──────────────┐   ┌───────────────────┐
                  │ ACK IMMEDIATE│   │ Is delayed ACK    │
                  │ (2-segment   │   │ timer running?    │
                  │ rule)        │   └───────────────────┘
                  └──────────────┘        /         \
                                        Yes          No
                                         │            │
                                         ▼            ▼
                                 ┌────────────┐  ┌──────────────┐
                                 │ Wait for   │  │ Start timer  │
                                 │ timer      │  │ (40-200ms)   │
                                 └────────────┘  └──────────────┘
```
Pseudocode Implementation:
```python
class TCPReceiver:
    def __init__(self):
        self.delayed_ack_timer = None
        self.segments_since_last_ack = 0
        self.delay_timeout_ms = 40  # Linux default minimum

    def on_segment_received(self, segment):
        # Process segment data...

        # Check if we should ACK immediately
        if self.should_ack_immediately(segment):
            self.send_ack()
            self.cancel_delayed_ack_timer()
            self.segments_since_last_ack = 0
        else:
            self.segments_since_last_ack += 1
            # Start the delayed ACK timer if it is not already running
            if not self.delayed_ack_timer:
                self.delayed_ack_timer = Timer(
                    self.delay_timeout_ms,
                    self.delayed_ack_timeout
                )

    def should_ack_immediately(self, segment):
        # Out-of-order segment: ACK immediately to enable fast retransmit
        if segment.seq != self.expected_seq:
            return True
        # PSH flag set: application wants low latency
        if segment.flags.PSH:
            return True
        # One segment already unacked, so this is the 2nd: ACK now
        if self.segments_since_last_ack >= 1:
            return True
        # Outgoing data is queued: the ACK can piggyback on it
        if self.has_data_to_send():
            return True
        return False

    def delayed_ack_timeout(self):
        # Timer fired: must ACK now
        self.send_ack()
        self.delayed_ack_timer = None
        self.segments_since_last_ack = 0
```
The PSH (Push) flag in TCP headers tells the receiver to deliver data to the application immediately. Many TCP stacks also use PSH as a hint to ACK immediately, reducing latency for applications that explicitly request it. This is why some libraries set PSH on every segment—to defeat delayed ACKs.
We touched on this in the Nagle's Algorithm page, but the interaction deserves detailed analysis as it causes significant performance issues in production systems.
The Problem Setup:

- Client: Nagle's Algorithm enabled (the default); sends a small request, with more data possibly queued behind it.
- Server: Delayed ACKs enabled (the default); has no response data ready yet to piggyback the ACK on.
The Deadlock Sequence:
```text
Time    Client                            Server
────    ──────                            ──────
0ms     Send Request (500 bytes)
        Outstanding: 500 bytes
10ms    App writes more data (maybe)      Request received
        Nagle: can't send yet             Start delayed ACK timer
        (outstanding data exists)         (40-200ms)
20ms    Waiting for ACK...                Timer running...
                                          No data to piggyback
50ms    Still waiting...                  Timer still running...
                                          Processing request...

        ┌───────────────────────────────────────────────────────┐
        │         BOTH SIDES ARE WAITING FOR THE OTHER          │
        │  Client: Waiting for ACK (Nagle blocking)             │
        │  Server: Waiting for timer or outgoing data           │
        └───────────────────────────────────────────────────────┘

200ms                                     Delayed ACK timer fires!
        ◀──────────────────────────────── ACK sent
        Outstanding: 0 bytes
        Can now send buffered data ─────▶
400ms                                     Response ready
        ◀──────────────────────────────── Response sent
```
Every request-response cycle pays a 40-200ms 'tax' due to this interaction. For applications making multiple sequential requests (database queries, API calls), this adds up to seconds of unnecessary latency. A 10-query operation that should take 100ms takes 2+ seconds.
Visualizing the Waste:
```text
Optimal timeline (no interaction):
├─Request─┤─Process─┤─Response─┤
0────────10────────20─────────30ms

Actual timeline (Nagle + Delayed ACK):
├─Request─┤──────────────────────────────────────────┤─Process─┤─Response─┤
0────────10──────────────────────────────────────────210──────220────────230ms
          ▲
          └── 200ms wasted waiting for delayed ACK
```
The Mathematics:
```text
Latency increase = Delayed ACK timeout (40-200ms)

For a sequence of N request-response cycles:
  Optimal:  N × (RTT + processing time)
  Actual:   N × (RTT + processing time + ACK delay)
  Overhead: N × ACK delay

Example: 10 database queries, 10ms RTT, 10ms processing, 200ms ACK delay:
  Optimal:  10 × (10 + 10) = 200ms
  Actual:   10 × (10 + 10 + 200) = 2200ms
  Overhead: 2000ms (10x slower!)
```
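The numbers above follow from simple per-cycle arithmetic; a quick sanity check (each cycle pays RTT plus processing, plus any delayed-ACK stall):

```python
def total_latency_ms(cycles, rtt_ms, proc_ms, ack_delay_ms=0):
    # Each request-response cycle pays RTT + processing time, plus the
    # delayed-ACK stall when Nagle holds back part of the request.
    return cycles * (rtt_ms + proc_ms + ack_delay_ms)

print(total_latency_ms(10, 10, 10))        # 200  (optimal)
print(total_latency_ms(10, 10, 10, 200))   # 2200 (Nagle + delayed ACK)
```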
| Scenario | 40ms Delay | 100ms Delay | 200ms Delay |
|---|---|---|---|
| Single request | +40ms | +100ms | +200ms |
| 10 sequential requests | +400ms | +1000ms | +2000ms |
| 100 sequential requests | +4s | +10s | +20s |
| Database-heavy page load | Noticeable | Slow | Unusable |
Several approaches mitigate the Nagle/delayed ACK interaction:
Solution 1: Disable Nagle's Algorithm (TCP_NODELAY)
The most common solution for request-response protocols:
```c
// Client-side: Disable Nagle
int flag = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
```
Pros: Immediate sends, no interaction problem
Cons: Many small writes become many small packets, which is inefficient for streaming workloads
Use when: Request-response patterns, latency-sensitive applications
Modern database drivers, HTTP clients, and RPC frameworks typically set TCP_NODELAY by default. Check your library's documentation—you may not need to configure anything. Redis, MySQL connectors, gRPC, and most HTTP/2 implementations disable Nagle.
Solution 2: Disable Delayed ACKs (TCP_QUICKACK)
Linux provides the TCP_QUICKACK option:
```c
// Server-side: Disable delayed ACKs
int flag = 1;
setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &flag, sizeof(flag));
```
Important: TCP_QUICKACK is not persistent! It must be set after each read() to remain effective. The kernel resets it after sending a quick ACK.
```c
while ((n = read(sock, buf, sizeof(buf))) > 0) {
    // Re-enable quick ACK after each read
    int flag = 1;
    setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &flag, sizeof(flag));
    // Process data...
}
```
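The same loop can be written in Python. This is a sketch (the function name `read_with_quickack` is ours): `socket.TCP_QUICKACK` only exists on Linux, so we look it up with `getattr` and degrade to a plain read loop elsewhere:

```python
import socket

def read_with_quickack(sock: socket.socket) -> bytes:
    """Drain a socket to EOF, re-arming TCP_QUICKACK before every read."""
    quickack = getattr(socket, "TCP_QUICKACK", None)  # Linux-only option
    chunks = []
    while True:
        if quickack is not None:
            # The kernel clears quick-ACK mode after use, so it must be
            # re-enabled on every iteration, not just once.
            sock.setsockopt(socket.IPPROTO_TCP, quickack, 1)
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)
```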
Solution 3: Configure Delayed ACK Timeout (System-wide)
```shell
# Linux: mainline kernels hard-code the minimum delay at compile time
# (TCP_DELACK_MIN = 40ms); some vendor kernels expose it as a sysctl:
echo 20 > /proc/sys/net/ipv4/tcp_delack_min   # 20ms minimum (if available)

# Linux: disable delayed ACKs per route (kernel 3.11+)
ip route change 10.0.0.0/24 dev eth0 quickack 1

# Windows: per-interface registry setting (requires reboot)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{interface-GUID}" \
    /v TcpAckFrequency /t REG_DWORD /d 1 /f
# Note: TcpAckFrequency=1 means ACK every segment (no delay)
```
Solution 4: Application-Level Batching
Write complete messages in a single send() call:
```python
# Instead of:
sock.sendall(header)  # first small write is sent immediately
sock.sendall(body)    # Nagle holds this until the header is ACKed

# Do:
sock.sendall(header + body)  # one write, sent immediately
```
Solution 5: Use TCP_CORK for Explicit Batching
```c
// Cork, write pieces, uncork
int cork = 1;
setsockopt(sock, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));
write(sock, header, header_len);
write(sock, body, body_len);
cork = 0;
setsockopt(sock, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork)); // Sends all
```
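A Python version of the same cork/uncork pattern, as a sketch (the helper name `send_corked` is ours): `socket.TCP_CORK` is Linux-only, so we fall back to application-level batching elsewhere:

```python
import socket

def send_corked(sock: socket.socket, header: bytes, body: bytes) -> None:
    """Send header and body as separate writes, flushed as one wire write."""
    cork = getattr(socket, "TCP_CORK", None)  # Linux-only option
    if cork is None:
        sock.sendall(header + body)   # fallback: batch in the application
        return
    sock.setsockopt(socket.IPPROTO_TCP, cork, 1)      # cork: hold partial frames
    try:
        sock.sendall(header)
        sock.sendall(body)
    finally:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 0)  # uncork: flush everything
```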
| Solution | Where Applied | Persistence | Best For |
|---|---|---|---|
| TCP_NODELAY | Sender | Per-socket | Request-response clients |
| TCP_QUICKACK | Receiver (Linux) | Per-read() | Servers, requires repetition |
| System config | System-wide | Permanent | All applications on host |
| App batching | Application | Per-write | When you control the code |
| TCP_CORK | Sender (Linux) | Per-cork/uncork | Known message boundaries |
Identifying delayed ACK issues requires careful timing analysis in packet captures.
Signature Pattern:
The classic signature is a ~40-200ms gap between receiving a small segment and sending the ACK:
```text
Packet #   Time        Source   Dest     Info
────────   ────        ──────   ────     ────
1          0.000000    Client   Server   [PSH, ACK] Len=200
2          0.201015    Server   Client   [ACK] ACK=201
              ↑
              ~200ms gap = delayed ACK!
```
Wireshark Analysis:
```text
# Display filter for ACK-only packets (potential delayed ACKs):
tcp.len == 0 && tcp.flags.ack == 1

# Add a timing column:
Edit → Preferences → Columns → Add "Delta Time Displayed"

# Look for the pattern:
#   - Small data segment
#   - 40-200ms gap
#   - Pure ACK (no data)
```
On Linux systems, 40ms is the default delayed ACK timeout. If you see consistent ~40ms gaps before ACKs, delayed ACKs are likely the cause. Windows typically shows ~200ms gaps. These specific timings are diagnostic signatures.
Command-Line Diagnostics:
```shell
# Capture and analyze ACK timing
tcpdump -i eth0 -nn 'tcp and host 10.0.0.1' -w capture.pcap

# Analyze inter-packet times with tshark
tshark -r capture.pcap -T fields \
    -e frame.time_delta_displayed \
    -e tcp.len \
    -e tcp.flags.ack | \
    awk '$1 > 0.035 && $2 == 0 {print "Delayed ACK: " $1 "s"}'

# Count potential delayed ACKs
tshark -r capture.pcap -T fields -e frame.time_delta_displayed -e tcp.len | \
    awk '$1 > 0.035 && $1 < 0.25 && $2 == 0 {count++} END {print count}'
```
Application-Level Timing:
```python
import time

# Measure round-trip time for one request-response exchange
start = time.time()
sock.send(request)
response = sock.recv(4096)
end = time.time()

rtt = (end - start) * 1000  # milliseconds

# If RTT is consistently 40-200ms higher than expected,
# the Nagle/delayed ACK interaction is the likely cause.
print(f"RTT: {rtt:.1f}ms")

# Compare with Nagle disabled:
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```
Benchmark Script:
```python
import socket
import time

def benchmark_rtt(host, port, with_nodelay=False):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if with_nodelay:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.connect((host, port))
    times = []
    for _ in range(100):
        start = time.time()
        sock.send(b"PING\n")
        sock.recv(1024)
        times.append((time.time() - start) * 1000)
    sock.close()
    return sum(times) / len(times)

rtt_nagle = benchmark_rtt("server", 8080, with_nodelay=False)
rtt_nodelay = benchmark_rtt("server", 8080, with_nodelay=True)
print(f"With Nagle:    {rtt_nagle:.1f}ms")
print(f"Without Nagle: {rtt_nodelay:.1f}ms")
print(f"Difference:    {rtt_nagle - rtt_nodelay:.1f}ms")
```
TCP implementations continue to evolve, with several developments improving the delayed ACK situation.
Linux Improvements:
Dynamic ACK Timeout: Linux adjusts the delayed ACK timer based on observed RTT. Faster connections get shorter delays.
Quick ACK Mode: After connection establishment or idle periods, Linux sends immediate ACKs temporarily to help congestion control converge.
TCP_NOTSENT_LOWAT: Reduces buffering latency by controlling when the socket becomes writable.
QUIC Protocol:
QUIC (used by HTTP/3) learns from TCP's mistakes:
```text
QUIC Improvements:
├── ACK frames are cheap (bundled into packets carrying other frames)
├── ACK frequency is adaptive and negotiable
├── No Nagle algorithm (application controls batching)
└── ACK timing can be explicitly coordinated
```
Modern HTTP protocols largely avoid the Nagle/delayed ACK problem. HTTP/2 multiplexes streams over a single connection, naturally batching data. HTTP/3 (QUIC) reimplements reliability with explicit ACK handling. If you can use HTTP/2 or HTTP/3, many SWS-related issues disappear.
Best Practices Checklist:
```text
□ Identify your traffic pattern (streaming, request-response, interactive)
□ Review library defaults (many already optimize for you)
□ Measure baseline latency with and without TCP_NODELAY
□ If using Linux, TCP_QUICKACK may help on the receiver side
□ For complex protocols, consider application-level batching
□ Monitor production for the 40-200ms latency signature
□ Document your socket options for future maintainers
```
Delayed ACKs are a valuable optimization on their own, but combined with Nagle's Algorithm they can introduce unexpected latency.
What's Next:
We've now covered all three components of SWS prevention: Nagle's Algorithm (sender), Clark's Algorithm (receiver), and Delayed ACKs (timing). The final page synthesizes these into a comprehensive Performance Impact analysis, quantifying the effects and providing production guidance.
You now understand delayed acknowledgments comprehensively—their purpose, RFC requirements, implementation, the Nagle interaction problem, solutions, and diagnostics. Next, we'll analyze the overall performance impact of Silly Window Syndrome and its solutions.