In 1984, John Nagle, working at Ford Aerospace and Communications Corporation, observed that the network connecting their systems was being clogged by what he termed 'tinygrams'—small packets that wasted bandwidth and degraded performance for everyone. His solution, documented in RFC 896, became one of the most elegant and widely-deployed algorithms in the TCP/IP stack.
Nagle's Algorithm addresses sender-induced Silly Window Syndrome by implementing a simple but powerful rule: if there is already unacknowledged data outstanding, buffer small writes until the pending data is acknowledged. This single heuristic dramatically improves network efficiency for bulk data transfers while preserving responsiveness for interactive applications.
By the end of this page, you will understand the mechanics of Nagle's Algorithm, its implementation in TCP stacks, how it interacts with other TCP features, its performance characteristics, and the specific scenarios where disabling it becomes necessary.
In the early 1980s, Ford Aerospace operated a network of Unix workstations connected by ARPANET links. Engineers began noticing severe congestion that couldn't be explained by the traffic volume alone. Investigation revealed that the network was saturated with tiny packets—often containing just a single character from terminal sessions.
The Original Problem Report (RFC 896):
John Nagle's RFC 896, titled 'Congestion Control in IP/TCP Internetworks,' described the situation:
'A simple telnet connection from a TIP to a host uses a packet for each character typed. A 1-character packet requires a 40-byte header, yielding an efficiency of 2%. Worse, the network is often asked to carry packets at least 40 times the size that would really be needed.'
The solution couldn't simply be 'send larger packets,' because interactive applications genuinely need character-by-character responsiveness. The insight was to distinguish between interactive traffic, where small packets are unavoidable, and bulk traffic, where they are pure waste.
Nagle coined the term 'tinygram' to describe these inefficient small packets. The name stuck in the networking community and is still used today. The core insight was that tinygrams weren't inherently bad—they were a symptom of inappropriate defaults. Interactive applications need tinygrams; bulk transfers don't.
The Key Insight:
Nagle observed that when an application is writing data faster than the network can deliver it, buffering makes sense. But when the network is keeping up with application writes (no outstanding unacknowledged data), there's no need to buffer—send immediately.
Scenario A: Network keeping up (interactive)
─────────────────────────────────────────────
App writes 'a' → TCP sends 'a' → ACK received
App writes 'b' → TCP sends 'b' → ACK received
App writes 'c' → TCP sends 'c' → ACK received
→ Responsive! Each character delivered immediately.
Scenario B: App writes faster than network (bulk)
─────────────────────────────────────────────────
App writes 'a' → TCP sends 'a' → (waiting for ACK)
App writes 'b' → TCP buffers 'b' (has outstanding data)
App writes 'c' → TCP buffers 'c' (still waiting)
App writes 'd' → TCP buffers 'd' (still waiting)
ACK for 'a' arrives
→ TCP sends 'bcd' in one segment
→ Efficient! Multiple bytes coalesced into one segment.
This distinction—using the presence of outstanding unacknowledged data as a proxy for network utilization—is the genius of Nagle's Algorithm.
Nagle's Algorithm can be stated concisely. This simplicity is part of its elegance—a few lines of logic provide enormous efficiency benefits.
The Original Specification (RFC 896):
Inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged.
A More Precise Formulation:
When the application writes data to TCP:
IF (there is unacknowledged data outstanding)
AND (the amount to send is less than MSS)
THEN
buffer the data
wait for ACK (or until we accumulate MSS bytes)
ELSE
send the data immediately
Note the MSS (Maximum Segment Size) exception: if we have enough data to fill a maximum-sized segment, we send it regardless of outstanding ACKs. This ensures that bulk transfers aren't artificially slowed—we're only coalescing smaller-than-MSS writes.
Pseudocode Implementation:
def nagle_send_policy(segment_size, outstanding_data, mss):
"""
Determines whether to send data immediately or buffer.
Args:
segment_size: bytes of data ready to send
outstanding_data: bytes sent but not yet acknowledged
mss: Maximum Segment Size for this connection
Returns:
'SEND' or 'BUFFER'
"""
# Always send if we have a full segment
if segment_size >= mss:
return 'SEND'
# Send if no data is outstanding (interactive mode)
if outstanding_data == 0:
return 'SEND'
# Otherwise, buffer and wait for ACK
return 'BUFFER'
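A few sample calls (assuming a 1460-byte MSS; the values are illustrative only) exercise all three branches of the policy above:
# Illustrative calls to nagle_send_policy (defined above); a 1460-byte MSS is assumed.
print(nagle_send_policy(1460, 5000, 1460))  # 'SEND'   - full segment, sent regardless of outstanding data
print(nagle_send_policy(10, 0, 1460))       # 'SEND'   - nothing outstanding (interactive case)
print(nagle_send_policy(10, 500, 1460))     # 'BUFFER' - small write while data is in flight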
State Machine Representation:
┌──────────────────────┐
│ IDLE (no pending) │
└──────────────────────┘
│
Application writes data
│
▼
┌──────────────────────┐
│ outstanding == 0? │
└──────────────────────┘
/ \
Yes No
│ │
▼ ▼
┌─────────────┐ ┌─────────────────────┐
│ SEND NOW │ │ segment >= MSS? │
└─────────────┘ └─────────────────────┘
/ \
Yes No
│ │
▼ ▼
┌─────────────┐ ┌─────────────┐
│ SEND NOW │ │ BUFFER │
└─────────────┘ │ wait for ACK│
└─────────────┘
Let's trace through detailed examples to see Nagle's Algorithm in action.
Example 1: Interactive Terminal Session
The user types 'ls' with a 50ms gap between keystrokes. RTT is 20ms.
Time(ms) Event Outstanding Data Action
──────── ───── ──────────────── ──────
0 User types 'l' 0 bytes SEND 'l'
10 Segment 'l' reaches receiver
20 ACK received 0 bytes
50 User types 's' 0 bytes SEND 's'
60 Segment 's' reaches receiver
70 ACK received 0 bytes
100 User types '\n' 0 bytes SEND '\n'
110 Segment '\n' reaches receiver
120 ACK received 0 bytes
Result: All characters sent immediately! The human typing speed is slower than the RTT, so each keystroke finds no outstanding data and is sent immediately. Nagle's Algorithm preserves full interactivity.
This example demonstrates why Nagle's Algorithm doesn't hurt interactive applications. As long as the application's data rate is lower than the network's ACK rate, every write is sent immediately because there's no outstanding data when the write occurs.
Example 2: Bulk Data Transfer with Small Writes
An application writes data 10 bytes at a time, one write per millisecond (perhaps due to poorly buffered I/O). RTT is 100ms. MSS is 1460 bytes.
Time(ms)  Event                  Outstanding   Buffer   Action
────────  ─────                  ───────────   ──────   ──────
0         App writes 10B         0             0        SEND 10B (first write)
1         App writes 10B         10            10       BUFFER
2         App writes 10B         10            20       BUFFER
...       (continues)            10            ...      BUFFER
50        App writes 10B         10            500      BUFFER (50 writes buffered)
...       (continues)            10            ...      BUFFER
100       ACK received           0             990      SEND 990B buffered data
101       App writes 10B         990           10       BUFFER
102       App writes 10B         990           20       BUFFER
...       (continues)            990           ...      BUFFER
200       ACK received           0             990      SEND buffered data
...       (pattern repeats; a full 1460-byte segment would be sent
           immediately if the buffer ever reached MSS, even with
           data still outstanding)
Result: Instead of 146 tiny segments (146 × 50 bytes = 7,300 bytes on the wire for 1,460 bytes of data), Nagle's Algorithm coalesces the same data into roughly three segments, about 1,600 bytes on the wire.
Efficiency improvement: from 10/50 = 20% to roughly 90% (a single full-MSS segment would be about 97% efficient).
| Metric | Without Nagle | With Nagle | Improvement |
|---|---|---|---|
| Segments for 1460B data | 146 | ~3-5 | ~30-50x fewer |
| Wire bytes for 1460B data | 7,300 | ~1,600 | ~4.5x reduction |
| Efficiency (1460B) | 20% | ~90% | ~4.5x improvement |
| ACKs generated | 146 | ~3-5 | ~30-50x fewer |
| CPU interrupts | 146 | ~3-5 | ~30-50x fewer |
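These ballpark figures can be sanity-checked with a few lines of Python. This is a rough back-of-the-envelope sketch assuming 40 bytes of TCP/IP header per segment and the coalesced segments from the trace above (10B, 990B, then 460B of the next flush); real traces vary with timing and RTT:
# Back-of-the-envelope check of the table above.
# Assumptions: 40-byte TCP/IP headers, 10-byte application writes,
# and segments coalesced roughly as in the trace (10B, 990B, 460B).
HEADER = 40          # bytes of TCP + IP header per segment
DATA = 1460          # application bytes to deliver

# Without Nagle: one segment per 10-byte write
segments_without = DATA // 10                      # 146 segments
wire_without = segments_without * (10 + HEADER)    # 7,300 bytes on the wire
eff_without = DATA / wire_without                  # ~0.20

# With Nagle: a handful of coalesced segments
coalesced = [10, 990, 460]
wire_with = DATA + len(coalesced) * HEADER         # ~1,580 bytes on the wire
eff_with = DATA / wire_with                        # ~0.92

print(f"without Nagle: {segments_without} segments, {eff_without:.0%} efficient")
print(f"with Nagle:    {len(coalesced)} segments, {eff_with:.0%} efficient")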
Nagle's Algorithm is enabled by default in virtually all modern TCP implementations. Understanding how to configure it is essential for troubleshooting and optimization.
Checking and Configuring Nagle's Algorithm:
// ===== C/C++ (POSIX sockets) =====
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
int sock = socket(AF_INET, SOCK_STREAM, 0);
// Disable Nagle's Algorithm (enable TCP_NODELAY)
int flag = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
// Re-enable Nagle's Algorithm
flag = 0;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
// Check current setting
socklen_t len = sizeof(flag);
getsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, &len);
printf("TCP_NODELAY: %s\n", flag ? "enabled (Nagle off)" : "disabled (Nagle on)");
The socket option is named TCP_NODELAY, not TCP_NAGLE. Setting TCP_NODELAY=1 DISABLES Nagle's Algorithm (no delay). This naming is counterintuitive but historically entrenched. Remember: NODELAY=1 means 'send immediately, don't apply Nagle buffering.'
Implementation Across Languages:
# ===== Python =====
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle (enable TCP_NODELAY)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
# Re-enable Nagle
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
// ===== Java =====
Socket socket = new Socket(host, port);
// Disable Nagle's Algorithm
socket.setTcpNoDelay(true);
// Check current setting
boolean noDelay = socket.getTcpNoDelay();
// ===== Node.js =====
const net = require('net');
const socket = net.createConnection({ port: 8080 });
// Disable Nagle's Algorithm
socket.setNoDelay(true);
// ===== Go =====
import "net"
conn, _ := net.Dial("tcp", "example.com:80")
tcpConn := conn.(*net.TCPConn)
// Disable Nagle's Algorithm
tcpConn.SetNoDelay(true)
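Because the option name is inverted (see the note above), a tiny helper that reports whether Nagle is actually in effect can prevent confusion. A minimal Python sketch; the helper name is ours, not a standard API:
import socket

def nagle_enabled(sock: socket.socket) -> bool:
    """True if Nagle's Algorithm is active (i.e., TCP_NODELAY is NOT set)."""
    nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
    return nodelay == 0

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(nagle_enabled(sock))   # True by default on most platforms
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(nagle_enabled(sock))   # False once TCP_NODELAY is set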
System-Wide Settings (Linux):
# View current system default
cat /proc/sys/net/ipv4/tcp_low_latency
# Note: There's no system-wide Nagle disable.
# TCP_NODELAY must be set per-socket by applications.
# The tcp_low_latency setting affects queue behavior, not Nagle.
One of the most notorious TCP performance issues arises from the interaction between Nagle's Algorithm (sender-side) and Delayed ACKs (receiver-side). Each mechanism is beneficial on its own, but together they can create unexpected latency.
Delayed ACK Overview:
Delayed ACKs (RFC 1122) allow receivers to wait up to 200-500ms before sending an ACK, in the hope of piggybacking the ACK on a data segment flowing back the other way, or of acknowledging several received segments with a single ACK.
The Problematic Scenario: Request-Response Protocols
Consider a client that sends each request as two small writes (for example, a header write followed by a body write), then reads the response:
Time     Client                                 Server                              Issue
────     ──────                                 ──────                              ─────
0ms      Send request part 1 (header, 200B)
                                                Receives part 1
                                                Request incomplete - can't respond
                                                Starts delayed ACK timer
                                                (waiting to piggyback)
1ms      App writes part 2 (body)
         (Nagle: part 1 unacked and
          part 2 < MSS, so buffer it)
...      Waiting for ACK of part 1              Waiting for rest of request         pseudo-deadlock
200ms                                           Delayed ACK timer fires
                                                Sends a bare ACK
201ms    ACK received - send part 2
                                                Request complete - send response
The 200ms Penalty:
Each request-response cycle incurs up to 200ms of artificial latency. The client's Nagle is waiting for an ACK; the server's delayed ACK is waiting to piggyback. Neither knows the other is waiting.
This is sometimes called a 'pseudo-deadlock'—both sides are waiting on the other. Nagle's Algorithm on the client waits for an ACK before sending more data. Delayed ACK on the server waits for outgoing data to piggyback. The deadlock breaks only when the delayed ACK timer fires (up to 200ms).
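The interaction is easiest to see experimentally with the split-write pattern traced above. The sketch below (our own toy client and server, not from the RFC) sends a request in two small writes, waits for the reply, and times the exchange with and without TCP_NODELAY. Whether the extra delay actually appears depends on the OS's delayed-ACK and Nagle heuristics (Linux, for instance, ACKs quickly early in a connection), so treat it as an experiment rather than a guaranteed reproduction:
import socket
import threading
import time

# Demo parameters (ours, for illustration): a local port and a two-part request.
HOST, PORT = "127.0.0.1", 5055
PART1, PART2 = b"HEAD", b"BODY"
REQUEST_LEN = len(PART1) + len(PART2)

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    while True:
        conn, _ = srv.accept()
        with conn:
            data = b""
            while len(data) < REQUEST_LEN:   # respond only once the full request arrived
                chunk = conn.recv(1024)
                if not chunk:
                    break
                data += chunk
            conn.sendall(b"OK")

def timed_request(nodelay):
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, int(nodelay))
    cli.connect((HOST, PORT))
    start = time.monotonic()
    cli.sendall(PART1)        # first small write: sent immediately
    cli.sendall(PART2)        # second small write: Nagle may hold it until PART1 is ACKed
    cli.recv(16)              # wait for the server's reply
    elapsed = time.monotonic() - start
    cli.close()
    return elapsed

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)               # give the listener time to start

for nodelay in (False, True):
    samples = sorted(timed_request(nodelay) for _ in range(5))
    label = "TCP_NODELAY on " if nodelay else "Nagle (default)"
    print(f"{label}: median request-response = {samples[2] * 1000:.1f} ms")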
Solutions to the Nagle/Delayed ACK Problem:
Option 1: Disable Nagle (TCP_NODELAY) on client
───────────────────────────────────────────────
+ Eliminates the wait-for-ACK blocking
+ Immediate sends regardless of outstanding data
- May reduce efficiency for bulk transfers
- Must be done per-socket by application
Option 2: Use TCP_CORK (Linux) or TCP_NOPUSH (BSD)
──────────────────────────────────────────────────
+ Explicitly cork the socket, write multiple pieces, then uncork
+ Combines benefits: batching when you want, immediate when you don't
- More complex application logic required
- Platform-specific
Option 3: Use larger writes (application-level batching)
────────────────────────────────────────────────────────
+ Write entire request in one call (triggers immediate send)
+ No socket options needed
- Requires application restructuring
- May not always be possible
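As a concrete sketch of Option 3: assemble the full message in user space and hand it to the kernel in a single call, so Nagle never sees a trailing small write. The header and body values below are made up for illustration:
import socket

def send_request(sock, header, body):
    # One sendall() instead of two: the complete request leaves in as few
    # segments as possible, with no small tail held back by Nagle.
    sock.sendall(header + body)

# Hypothetical usage:
# sock = socket.create_connection(("example.com", 80))
# send_request(sock, b"POST /api HTTP/1.1\r\nHost: example.com\r\n\r\n", b'{"id": 1}')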
Modern Best Practice:
For request-response protocols (HTTP clients, database drivers, RPC systems), most libraries disable Nagle by default:
# Common libraries with Nagle disabled by default:
- Most HTTP client libraries
- Redis client libraries
- gRPC
- Most game networking libraries
| Protocol Pattern | With Nagle | Without Nagle | Recommendation |
|---|---|---|---|
| Streaming (one direction) | ✓ Good | = OK | Keep Nagle enabled |
| Request-response (small) | ✗ 200ms penalty | ✓ Good | Disable Nagle |
| Interactive (typing) | ✓ Good | ✓ Good | Either works |
| Bulk file transfer | ✓ Good | ✓ Good | Keep Nagle enabled |
| Real-time gaming | ✗ Latency issues | ✓ Required | Disable Nagle |
While Nagle's Algorithm is beneficial by default, certain applications require its disabling for optimal performance.
Applications Requiring TCP_NODELAY:
Typical examples are request-response protocols with small messages (HTTP clients, database drivers, RPC frameworks such as gRPC), real-time multiplayer games, and other latency-critical traffic where a ~200ms stall per exchange is unacceptable.
Decision Framework:
┌─────────────────────────────────────────┐
│ What is your traffic pattern? │
└─────────────────────────────────────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
Streaming (bulk) Request-Response Interactive
│ │ │
▼ ▼ ▼
Keep Nagle ON Disable Nagle Depends on RTT
│ │ │
│ │ RTT < human speed?
│ │ / \
│ │ Yes No
│ │ │ │
▼ ▼ ▼ ▼
High efficiency Low latency Keep ON Disable
Measuring the Impact:
# Testing without Nagle (TCP_NODELAY enabled)
for i in {1..100}; do
curl -w "%{time_total}\n" -o /dev/null -s http://example.com/api
done | awk '{sum+=$1} END {print "Avg: " sum/NR " seconds"}'
# Compare with Nagle enabled (application-specific)
# Look for ~200ms differences in request-response patterns
On Linux, TCP_CORK provides finer control. When set, TCP buffers all writes until either TCP_CORK is cleared or the buffer reaches MSS. This lets you batch writes explicitly: set TCP_CORK, write headers and body, clear TCP_CORK. The result is optimal segment sizing without losing Nagle's benefits elsewhere.
Modern TCP stacks provide additional options beyond the binary Nagle on/off choice. TCP_CORK (Linux) and TCP_NOPUSH (BSD/macOS) give applications explicit control over segment timing.
TCP_CORK (Linux):
When TCP_CORK is set, TCP accumulates data and does not send until the cork is cleared, a full MSS worth of data has accumulated, or (on Linux) the 200ms ceiling on corking expires:
// Example: HTTP server sending response
int cork = 1;
setsockopt(sock, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));
// Write HTTP headers (50 bytes)
write(sock, headers, 50);
// Write HTTP body (2000 bytes)
write(sock, body, 2000);
// Uncork - sends everything in optimal segments
cork = 0;
setsockopt(sock, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));
Result: One or two properly-sized segments instead of many small ones.
| Option | Behavior | Best For | Drawback |
|---|---|---|---|
| Nagle (default) | Buffer small writes if ACK pending | General use | Latency with delayed ACK |
| TCP_NODELAY | Send immediately always | Request-response patterns | Many small segments |
| TCP_CORK | Buffer until uncork or MSS | Known multi-write messages | Requires explicit uncork |
| TCP_CORK + TCP_NODELAY | Undefined; don't combine | N/A | Behavior is inconsistent |
TCP_NOPUSH (BSD, macOS):
BSD systems provide TCP_NOPUSH, which is similar to TCP_CORK but with subtle differences:
// BSD/macOS
int nopush = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NOPUSH, &nopush, sizeof(nopush));
// Write multiple pieces...
write(sock, part1, len1);
write(sock, part2, len2);
// Clear to flush
nopush = 0;
setsockopt(sock, IPPROTO_TCP, TCP_NOPUSH, &nopush, sizeof(nopush));
Key Differences:
| Aspect | TCP_CORK (Linux) | TCP_NOPUSH (BSD) |
|---|---|---|
| Auto-send timeout | 200ms | No timeout |
| Interaction with close() | Flushes | Flushes |
| Availability | Linux only | BSD, macOS, iOS |
Platform-Portable Code:
#ifdef TCP_CORK
setsockopt(sock, IPPROTO_TCP, TCP_CORK, &flag, sizeof(flag));
#elif defined(TCP_NOPUSH)
setsockopt(sock, IPPROTO_TCP, TCP_NOPUSH, &flag, sizeof(flag));
#else
// Fallback: just use TCP_NODELAY or buffer in application
#endif
For optimal network I/O, combine TCP_CORK with scatter-gather I/O (writev()) or zero-copy (sendfile()). This allows sending headers and file content in optimally-sized segments without extra memory copies. Many high-performance web servers (nginx, Apache) use this pattern.
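A sketch of that pattern in Python on Linux, using a send_file_response() helper of our own invention; socket.TCP_CORK and os.sendfile are Linux-specific, so this will not run unchanged on BSD/macOS (where TCP_NOPUSH and a different sendfile interface apply):
import os
import socket

def send_file_response(sock, headers, path):
    """Cork the socket, write headers plus file contents, then uncork (Linux only)."""
    size = os.path.getsize(path)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)      # hold partial segments
    try:
        sock.sendall(headers)                                    # e.g. HTTP response headers
        with open(path, "rb") as f:
            offset = 0
            while offset < size:
                # zero-copy transfer from the file descriptor into the socket
                sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
                if sent == 0:
                    break
                offset += sent
    finally:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)  # uncork: flush in MSS-sized segments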
Nagle's Algorithm remains one of the most impactful optimizations in TCP's history—a simple heuristic that dramatically improved network efficiency without breaking existing applications.
What's Next:
Nagle's Algorithm addresses sender-induced SWS. But what about receiver-induced SWS, where the receiver advertises tiny windows? The next page covers Clark's Algorithm, which prevents the receiver from advertising small window openings, complementing Nagle's sender-side solution.
You now understand Nagle's Algorithm in depth—its mechanics, implementation, interaction with delayed ACKs, and when to disable it. Next, we'll examine Clark's Algorithm, the receiver-side complement that prevents small window advertisements.