Every TCP segment containing data expects an acknowledgment. In a naive implementation, each received segment triggers an immediate ACK segment in response. While correct, this generates significant overhead—pure ACK segments carry no data but consume bandwidth, generate interrupts, and add load to network devices.
Delayed Acknowledgments address this inefficiency by allowing the receiver to wait briefly before sending an ACK, hoping either to:

- Piggyback the ACK on outgoing data (such as a response), eliminating the separate ACK packet entirely, or
- Acknowledge multiple received segments with a single ACK.
This optimization, specified in RFC 1122, can halve the number of ACK segments on a connection. However, it introduces a timing element that interacts—sometimes problematically—with Nagle's Algorithm.
By the end of this page, you will understand how delayed ACKs work, their RFC specifications, the infamous Nagle/delayed ACK interaction, configuration options across operating systems, and how to diagnose and resolve delayed ACK-related performance issues.
To understand why delayed ACKs exist, consider the overhead of immediate acknowledgments:
Immediate ACK Overhead Analysis:
```text
Scenario: Bulk data transfer, 1 MB file, MSS = 1460 bytes
──────────────────────────────────────────────────────────
Data segments: 1,000,000 / 1,460 ≈ 685 segments

With Immediate ACKs:
  ACK segments generated: 685
  ACK wire bytes:         685 × 40 bytes = 27,400 bytes
  ACK ratio:              27,400 / 1,000,000 = 2.74% overhead

With Delayed ACKs (ACK every 2nd segment):
  ACK segments generated: 343
  ACK wire bytes:         343 × 40 bytes = 13,720 bytes
  ACK ratio:              13,720 / 1,000,000 = 1.37% overhead

Reduction: 50% fewer ACKs
```
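The arithmetic above can be checked with a short script (a sketch assuming 40-byte pure ACK packets and one ACK per segment versus one per two segments):

```python
import math

FILE_BYTES = 1_000_000
MSS = 1460
ACK_BYTES = 40  # minimal TCP/IP headers, no data

segments = math.ceil(FILE_BYTES / MSS)   # 685
immediate_ack_bytes = segments * ACK_BYTES
delayed_ack_bytes = math.ceil(segments / 2) * ACK_BYTES

print(segments, immediate_ack_bytes, delayed_ack_bytes)
# 685 27400 13720
print(f"{immediate_ack_bytes / FILE_BYTES:.2%}")  # 2.74%
print(f"{delayed_ack_bytes / FILE_BYTES:.2%}")    # 1.37%
```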
Beyond Bandwidth:
The savings extend beyond raw bandwidth:
- Reduced interrupts: Each ACK triggers an interrupt on the sender. 50% fewer ACKs means 50% fewer interrupts.
- Lower router/switch load: Network devices must process each packet; fewer ACKs mean less processing.
- Piggybacking opportunity: Waiting allows the ACK to be combined with outgoing data, eliminating a separate ACK packet entirely.
| Metric | Immediate ACKs | Delayed ACKs | Improvement |
|---|---|---|---|
| ACKs per 1MB transfer | 685 | ~343 | 50% reduction |
| ACK bandwidth (1MB) | 27.4 KB | 13.7 KB | 50% reduction |
| Sender interrupts | 685 | ~343 | 50% reduction |
| With piggybacking | N/A | Potentially 0 | Up to 100% reduction |
In request-response protocols, piggybacking is the primary benefit. When the receiver sends a response shortly after receiving a request, the ACK rides along with the response data—no separate ACK packet needed. This is why HTTP, gRPC, and similar protocols benefit enormously from delayed ACKs when combined with properly-configured Nagle behavior.
Delayed acknowledgments are specified in RFC 1122 (Requirements for Internet Hosts) and further refined in RFC 5681 (TCP Congestion Control).
RFC 1122 Section 4.2.3.2 — When to Send an ACK Segment:
> A host that is receiving a stream of TCP data segments can increase efficiency in both the Internet and the hosts by sending fewer than one ACK (acknowledgment) segment per data segment received; this is known as a 'delayed ACK.'
>
> A TCP SHOULD implement a delayed ACK, but an ACK should not be excessively delayed; in particular, the delay MUST be less than 0.5 seconds, and in a stream of full-sized segments there SHOULD be an ACK for at least every second segment.
Key Requirements:

- A TCP SHOULD implement delayed ACKs.
- The delay MUST be less than 0.5 seconds (500 ms).
- In a stream of full-sized segments, there SHOULD be an ACK for at least every second segment.

RFC 5681 Refinement:

> An ACK SHOULD be generated for at least every second full-sized segment, and MUST be generated within 500 ms of the arrival of the first unacknowledged packet.
The 'Two Segment' Rule:
```text
Received Segments     Action
─────────────────     ──────
Segment 1 arrives     Start 500ms timer
Segment 2 arrives     ACK immediately (ACK every 2nd)
        OR
Timer expires         ACK now (max 500ms delay)
```
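The two-segment rule can be modeled as a small function. This is a sketch with times in integer milliseconds; real stacks also ACK immediately on out-of-order data, PSH, and piggybacking opportunities:

```python
def ack_times(arrivals_ms, delay_ms=500):
    """Return ACK emission times (ms) under the two-segment rule.

    An ACK fires on every 2nd unacknowledged segment, or when the
    delayed-ACK timer (armed at the 1st unacked segment) expires.
    """
    acks = []
    pending_since = None  # arrival time of the 1st unacked segment
    for t in arrivals_ms:
        if pending_since is not None and t >= pending_since + delay_ms:
            acks.append(pending_since + delay_ms)  # timer fired earlier
            pending_since = None
        if pending_since is None:
            pending_since = t      # 1st unacked segment: arm the timer
        else:
            acks.append(t)         # 2nd segment: ACK immediately
            pending_since = None
    if pending_since is not None:
        acks.append(pending_since + delay_ms)      # drain the final timer
    return acks

# Two back-to-back segments → one immediate ACK at the 2nd arrival:
print(ack_times([0, 1]))   # [1]
# A lone segment → ACK only when the 500 ms timer expires:
print(ack_times([0]))      # [500]
```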
Typical Implementation Timings:
While the RFC specifies a 500ms maximum, most implementations use shorter delays:
| Operating System | Default Delay | Notes |
|---|---|---|
| Linux | 40ms | Dynamic, based on RTT |
| Windows | 200ms | Configurable via registry |
| macOS/BSD | 100ms | Configurable |
| Embedded | 50-200ms | Varies by stack |
The 500ms limit dates from the early Internet when RTTs could exceed 1 second on intercontinental paths. Modern networks rarely see RTTs above 300ms. The practical delay timers (40-200ms) reflect this reality. Linux dynamically adjusts based on observed RTT, making the delay proportional to the connection's latency.
Delayed ACKs are implemented in the TCP receive path. When a segment arrives, the TCP stack decides whether to ACK immediately or schedule a delayed ACK.
Decision Flowchart:
```text
          ┌─────────────────────────────┐
          │      Segment Received       │
          └─────────────────────────────┘
                        │
                        ▼
          ┌─────────────────────────────┐
          │   Out-of-order segment?     │
          └─────────────────────────────┘
                /               \
              Yes                No
               │                  │
               ▼                  ▼
     ┌──────────────┐   ┌─────────────────────────┐
     │ ACK IMMEDIATE│   │ Is this the 2nd segment │
     │ (for fast    │   │ since last ACK?         │
     │ retransmit)  │   └─────────────────────────┘
     └──────────────┘         /            \
                            Yes             No
                             │               │
                             ▼               ▼
                  ┌──────────────┐   ┌───────────────────┐
                  │ ACK IMMEDIATE│   │ Is delayed ACK    │
                  │ (2-segment   │   │ timer running?    │
                  │ rule)        │   └───────────────────┘
                  └──────────────┘        /         \
                                        Yes          No
                                         │            │
                                         ▼            ▼
                                 ┌────────────┐  ┌──────────────┐
                                 │ Wait for   │  │ Start timer  │
                                 │ timer      │  │ (40-200ms)   │
                                 └────────────┘  └──────────────┘
```
Pseudocode Implementation:
```python
class TCPReceiver:
    def __init__(self):
        self.delayed_ack_timer = None
        self.segments_since_last_ack = 0
        self.delay_timeout_ms = 40  # Linux default minimum

    def on_segment_received(self, segment):
        # Process segment data...

        # Check if we should ACK immediately
        if self.should_ack_immediately(segment):
            self.send_ack()
            self.cancel_delayed_ack_timer()
            self.segments_since_last_ack = 0
        else:
            self.segments_since_last_ack += 1
            # Start the delayed ACK timer if it is not already running
            if not self.delayed_ack_timer:
                self.delayed_ack_timer = Timer(
                    self.delay_timeout_ms,
                    self.delayed_ack_timeout
                )

    def should_ack_immediately(self, segment):
        # Out-of-order segment: ACK immediately to enable fast retransmit
        if segment.seq != self.expected_seq:
            return True
        # PSH flag set: application wants low latency
        if segment.flags.PSH:
            return True
        # One segment already unacked, so this is the 2nd: ACK now
        if self.segments_since_last_ack >= 1:
            return True
        # Outgoing data is queued: the ACK can piggyback on it
        if self.has_data_to_send():
            return True
        return False

    def delayed_ack_timeout(self):
        # Timer fired: must ACK now
        self.send_ack()
        self.delayed_ack_timer = None
        self.segments_since_last_ack = 0
```
The PSH (Push) flag in TCP headers tells the receiver to deliver data to the application immediately. Many TCP stacks also use PSH as a hint to ACK immediately, reducing latency for applications that explicitly request it. This is why some libraries set PSH on every segment—to defeat delayed ACKs.
We touched on this in the Nagle's Algorithm page, but the interaction deserves detailed analysis as it causes significant performance issues in production systems.
The Problem Setup:

- Client: Nagle's Algorithm enabled (the default); sends a small request, with more data possibly queued behind it.
- Server: Delayed ACKs enabled (the default); has no response data ready yet to piggyback the ACK on.
The Deadlock Sequence:
```text
Time    Client                            Server
────    ──────                            ──────
0ms     Send Request (500 bytes)
        Outstanding: 500 bytes
10ms    App writes more data (maybe)      Request received
        Nagle: can't send yet             Start delayed ACK timer
        (outstanding data exists)         (40-200ms)
20ms    Waiting for ACK...                Timer running...
                                          No data to piggyback
50ms    Still waiting...                  Timer still running...
                                          Processing request...

        ┌───────────────────────────────────────────────────────┐
        │         BOTH SIDES ARE WAITING FOR THE OTHER          │
        │  Client: Waiting for ACK (Nagle blocking)             │
        │  Server: Waiting for timer or outgoing data           │
        └───────────────────────────────────────────────────────┘

200ms                                     Delayed ACK timer fires!
        ◀──────────────────────────────── ACK sent
        Outstanding: 0 bytes
        Can now send buffered data ─────▶
400ms                                     Response ready
        ◀──────────────────────────────── Response sent
```
Every request-response cycle pays a 40-200ms 'tax' due to this interaction. For applications making multiple sequential requests (database queries, API calls), this adds up to seconds of unnecessary latency. A 10-query operation that should take 100ms takes 2+ seconds.
Visualizing the Waste:
```text
Optimal timeline (no interaction):
├─Request─┤─Process─┤─Response─┤
0────────10────────20─────────30ms

Actual timeline (Nagle + Delayed ACK):
├─Request─┤──────────────────────────────────────────┤─Process─┤─Response─┤
0────────10──────────────────────────────────────────210──────220────────230ms
          ▲
          └── 200ms wasted waiting for delayed ACK
```
The Mathematics:
```text
Latency increase = Delayed ACK timeout (40-200ms)

For a sequence of N request-response cycles:
  Optimal:  N × (RTT + processing time)
  Actual:   N × (RTT + processing time + ACK delay)
  Overhead: N × ACK delay

Example: 10 database queries, 10ms RTT, 10ms processing, 200ms ACK delay:
  Optimal:  10 × (10 + 10) = 200ms
  Actual:   10 × (10 + 10 + 200) = 2200ms
  Overhead: 2000ms (10x slower!)
```
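The numbers above follow from simple per-cycle arithmetic; a quick sanity check (each cycle pays RTT plus processing, plus any delayed-ACK stall):

```python
def total_latency_ms(cycles, rtt_ms, proc_ms, ack_delay_ms=0):
    # Each request-response cycle pays RTT + processing time, plus the
    # delayed-ACK stall when Nagle holds back part of the request.
    return cycles * (rtt_ms + proc_ms + ack_delay_ms)

print(total_latency_ms(10, 10, 10))        # 200  (optimal)
print(total_latency_ms(10, 10, 10, 200))   # 2200 (Nagle + delayed ACK)
```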
| Scenario | 40ms Delay | 100ms Delay | 200ms Delay |
|---|---|---|---|
| Single request | +40ms | +100ms | +200ms |
| 10 sequential requests | +400ms | +1000ms | +2000ms |
| 100 sequential requests | +4s | +10s | +20s |
| Database-heavy page load | Noticeable | Slow | Unusable |
Several approaches mitigate the Nagle/delayed ACK interaction:
Solution 1: Disable Nagle's Algorithm (TCP_NODELAY)
The most common solution for request-response protocols:
```c
// Client-side: Disable Nagle
int flag = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
```
Pros: Immediate sends, no interaction problem
Cons: Many small writes become many small packets, which is inefficient for streaming workloads
Use when: Request-response patterns, latency-sensitive applications
Modern database drivers, HTTP clients, and RPC frameworks typically set TCP_NODELAY by default. Check your library's documentation—you may not need to configure anything. Redis, MySQL connectors, gRPC, and most HTTP/2 implementations disable Nagle.
Solution 2: Disable Delayed ACKs (TCP_QUICKACK)
Linux provides the TCP_QUICKACK option:
```c
// Server-side: Disable delayed ACKs
int flag = 1;
setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &flag, sizeof(flag));
```
Important: TCP_QUICKACK is not persistent! It must be set after each read() to remain effective. The kernel resets it after sending a quick ACK.
```c
while ((n = read(sock, buf, sizeof(buf))) > 0) {
    // Re-enable quick ACK after each read
    int flag = 1;
    setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &flag, sizeof(flag));
    // Process data...
}
```
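The same loop can be written in Python. This is a sketch (the function name `read_with_quickack` is ours): `socket.TCP_QUICKACK` only exists on Linux, so we look it up with `getattr` and degrade to a plain read loop elsewhere:

```python
import socket

def read_with_quickack(sock: socket.socket) -> bytes:
    """Drain a socket to EOF, re-arming TCP_QUICKACK before every read."""
    quickack = getattr(socket, "TCP_QUICKACK", None)  # Linux-only option
    chunks = []
    while True:
        if quickack is not None:
            # The kernel clears quick-ACK mode after use, so it must be
            # re-enabled on every iteration, not just once.
            sock.setsockopt(socket.IPPROTO_TCP, quickack, 1)
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)
```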
Solution 3: Configure Delayed ACK Timeout (System-wide)
```shell
# Linux: mainline kernels hard-code the minimum delay at compile time
# (TCP_DELACK_MIN = 40ms); some vendor kernels expose it as a sysctl:
echo 20 > /proc/sys/net/ipv4/tcp_delack_min   # 20ms minimum (if available)

# Linux: disable delayed ACKs per route (kernel 3.11+)
ip route change 10.0.0.0/24 dev eth0 quickack 1

# Windows: per-interface registry setting (requires reboot)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{interface-GUID}" \
    /v TcpAckFrequency /t REG_DWORD /d 1 /f
# Note: TcpAckFrequency=1 means ACK every segment (no delay)
```
Solution 4: Application-Level Batching
Write complete messages in a single send() call:
```python
# Instead of:
sock.sendall(header)  # first small write is sent immediately
sock.sendall(body)    # Nagle holds this until the header is ACKed

# Do:
sock.sendall(header + body)  # one write, sent immediately
```
Solution 5: Use TCP_CORK for Explicit Batching
```c
// Cork, write pieces, uncork
int cork = 1;
setsockopt(sock, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));
write(sock, header, header_len);
write(sock, body, body_len);
cork = 0;
setsockopt(sock, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork)); // Sends all
```
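A Python version of the same cork/uncork pattern, as a sketch (the helper name `send_corked` is ours): `socket.TCP_CORK` is Linux-only, so we fall back to application-level batching elsewhere:

```python
import socket

def send_corked(sock: socket.socket, header: bytes, body: bytes) -> None:
    """Send header and body as separate writes, flushed as one wire write."""
    cork = getattr(socket, "TCP_CORK", None)  # Linux-only option
    if cork is None:
        sock.sendall(header + body)   # fallback: batch in the application
        return
    sock.setsockopt(socket.IPPROTO_TCP, cork, 1)      # cork: hold partial frames
    try:
        sock.sendall(header)
        sock.sendall(body)
    finally:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 0)  # uncork: flush everything
```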
| Solution | Where Applied | Persistence | Best For |
|---|---|---|---|
| TCP_NODELAY | Sender | Per-socket | Request-response clients |
| TCP_QUICKACK | Receiver (Linux) | Per-read() | Servers, requires repetition |
| System config | System-wide | Permanent | All applications on host |
| App batching | Application | Per-write | When you control the code |
| TCP_CORK | Sender (Linux) | Per-cork/uncork | Known message boundaries |
Identifying delayed ACK issues requires careful timing analysis in packet captures.
Signature Pattern:
The classic signature is a ~40-200ms gap between receiving a small segment and sending the ACK:
```text
Packet #   Time        Source   Dest     Info
────────   ────        ──────   ────     ────
1          0.000000    Client   Server   [PSH, ACK] Len=200
2          0.201015    Server   Client   [ACK] ACK=201
              ↑
              ~200ms gap = delayed ACK!
```
Wireshark Analysis:
```text
# Display filter for ACK-only packets (potential delayed ACKs):
tcp.len == 0 && tcp.flags.ack == 1

# Add a timing column:
Edit → Preferences → Columns → Add "Delta Time Displayed"

# Look for the pattern:
#   - Small data segment
#   - 40-200ms gap
#   - Pure ACK (no data)
```
On Linux systems, 40ms is the default delayed ACK timeout. If you see consistent ~40ms gaps before ACKs, delayed ACKs are likely the cause. Windows typically shows ~200ms gaps. These specific timings are diagnostic signatures.
Command-Line Diagnostics:
```shell
# Capture and analyze ACK timing
tcpdump -i eth0 -nn 'tcp and host 10.0.0.1' -w capture.pcap

# Analyze inter-packet times with tshark
tshark -r capture.pcap -T fields \
    -e frame.time_delta_displayed \
    -e tcp.len \
    -e tcp.flags.ack | \
    awk '$1 > 0.035 && $2 == 0 {print "Delayed ACK: " $1 "s"}'

# Count potential delayed ACKs
tshark -r capture.pcap -T fields -e frame.time_delta_displayed -e tcp.len | \
    awk '$1 > 0.035 && $1 < 0.25 && $2 == 0 {count++} END {print count}'
```
Application-Level Timing:
```python
import time

# Measure round-trip time for one request-response exchange
start = time.time()
sock.send(request)
response = sock.recv(4096)
end = time.time()

rtt = (end - start) * 1000  # milliseconds

# If RTT is consistently 40-200ms higher than expected,
# the Nagle/delayed ACK interaction is the likely cause.
print(f"RTT: {rtt:.1f}ms")

# Compare with Nagle disabled:
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```
Benchmark Script:
```python
import socket
import time

def benchmark_rtt(host, port, with_nodelay=False):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if with_nodelay:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.connect((host, port))
    times = []
    for _ in range(100):
        start = time.time()
        sock.send(b"PING\n")
        sock.recv(1024)
        times.append((time.time() - start) * 1000)
    sock.close()
    return sum(times) / len(times)

rtt_nagle = benchmark_rtt("server", 8080, with_nodelay=False)
rtt_nodelay = benchmark_rtt("server", 8080, with_nodelay=True)
print(f"With Nagle:    {rtt_nagle:.1f}ms")
print(f"Without Nagle: {rtt_nodelay:.1f}ms")
print(f"Difference:    {rtt_nagle - rtt_nodelay:.1f}ms")
```
TCP implementations continue to evolve, with several developments improving the delayed ACK situation.
Linux Improvements:
Dynamic ACK Timeout: Linux adjusts the delayed ACK timer based on observed RTT. Faster connections get shorter delays.
Quick ACK Mode: After connection establishment or idle periods, Linux sends immediate ACKs temporarily to help congestion control converge.
TCP_NOTSENT_LOWAT: Reduces buffering latency by controlling when the socket becomes writable.
QUIC Protocol:
QUIC (used by HTTP/3) learns from TCP's mistakes:
```text
QUIC Improvements:
├── ACK frames are cheap (bundled into packets carrying other frames)
├── ACK frequency is adaptive and negotiable
├── No Nagle algorithm (application controls batching)
└── ACK timing can be explicitly coordinated
```
Modern HTTP protocols largely avoid the Nagle/delayed ACK problem. HTTP/2 multiplexes streams over a single connection, naturally batching data. HTTP/3 (QUIC) reimplements reliability with explicit ACK handling. If you can use HTTP/2 or HTTP/3, many SWS-related issues disappear.
Best Practices Checklist:
```text
□ Identify your traffic pattern (streaming, request-response, interactive)
□ Review library defaults (many already optimize for you)
□ Measure baseline latency with and without TCP_NODELAY
□ If using Linux, TCP_QUICKACK may help on the receiver side
□ For complex protocols, consider application-level batching
□ Monitor production for the 40-200ms latency signature
□ Document your socket options for future maintainers
```
Delayed ACKs are a valuable optimization on their own, but combined with Nagle's Algorithm they can introduce unexpected latency.
What's Next:
We've now covered all three components of SWS prevention: Nagle's Algorithm (sender), Clark's Algorithm (receiver), and Delayed ACKs (timing). The final page synthesizes these into a comprehensive Performance Impact analysis, quantifying the effects and providing production guidance.
You now understand delayed acknowledgments comprehensively—their purpose, RFC requirements, implementation, the Nagle interaction problem, solutions, and diagnostics. Next, we'll analyze the overall performance impact of Silly Window Syndrome and its solutions.