In 1984, John Nagle, working at Ford Aerospace and Communications Corporation, observed that the network connecting their systems was being clogged by what he termed 'tinygrams'—small packets that wasted bandwidth and degraded performance for everyone. His solution, documented in RFC 896, became one of the most elegant and widely-deployed algorithms in the TCP/IP stack.
Nagle's Algorithm addresses sender-induced Silly Window Syndrome by implementing a simple but powerful rule: if there is already unacknowledged data outstanding, buffer small writes until the pending data is acknowledged. This single heuristic dramatically improves network efficiency for bulk data transfers while preserving responsiveness for interactive applications.
By the end of this page, you will understand the mechanics of Nagle's Algorithm, its implementation in TCP stacks, how it interacts with other TCP features, its performance characteristics, and the specific scenarios where disabling it becomes necessary.
In the early 1980s, Ford Aerospace operated a network of Unix workstations connected by ARPANET links. Engineers began noticing severe congestion that couldn't be explained by the traffic volume alone. Investigation revealed that the network was saturated with tiny packets—often containing just a single character from terminal sessions.
The Original Problem Report (RFC 896):
John Nagle's RFC 896, titled 'Congestion Control in IP/TCP Internetworks,' described the situation:
'A simple telnet connection from a TIP to a host uses a packet for each character typed. A 1-character packet requires a 40-byte header, yielding an efficiency of 2%. Worse, the network is often asked to carry packets at least 40 times the size that would really be needed.'
The solution couldn't simply be 'send larger packets,' because interactive applications genuinely need character-by-character responsiveness. The insight was to distinguish between interactive traffic, where small packets are unavoidable, and bulk traffic, where they are pure waste.
Nagle coined the term 'tinygram' to describe these inefficient small packets. The name stuck in the networking community and is still used today. The core insight was that tinygrams weren't inherently bad—they were a symptom of inappropriate defaults. Interactive applications need tinygrams; bulk transfers don't.
The Key Insight:
Nagle observed that when an application is writing data faster than the network can deliver it, buffering makes sense. But when the network is keeping up with application writes (no outstanding unacknowledged data), there's no need to buffer—send immediately.
Scenario A: Network keeping up (interactive)
─────────────────────────────────────────────
App writes 'a' → TCP sends 'a' → ACK received
App writes 'b' → TCP sends 'b' → ACK received
App writes 'c' → TCP sends 'c' → ACK received
→ Responsive! Each character delivered immediately.
Scenario B: App writes faster than network (bulk)
─────────────────────────────────────────────────
App writes 'a' → TCP sends 'a' → (waiting for ACK)
App writes 'b' → TCP buffers 'b' (has outstanding data)
App writes 'c' → TCP buffers 'c' (still waiting)
App writes 'd' → TCP buffers 'd' (still waiting)
ACK for 'a' arrives
→ TCP sends 'bcd' in one segment
→ Efficient! Multiple bytes coalesced into one segment.
This distinction—using the presence of outstanding unacknowledged data as a proxy for network utilization—is the genius of Nagle's Algorithm.
Nagle's Algorithm can be stated concisely. This simplicity is part of its elegance—a few lines of logic provide enormous efficiency benefits.
The Original Specification (RFC 896):
Inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged.
A More Precise Formulation:
When the application writes data to TCP:
IF (there is unacknowledged data outstanding)
AND (the amount to send is less than MSS)
THEN
buffer the data
wait for ACK (or until we accumulate MSS bytes)
ELSE
send the data immediately
Note the MSS (Maximum Segment Size) exception: if we have enough data to fill a maximum-sized segment, we send it regardless of outstanding ACKs. This ensures that bulk transfers aren't artificially slowed—we're only coalescing smaller-than-MSS writes.
Pseudocode Implementation:
def nagle_send_policy(segment_size, outstanding_data, mss):
"""
Determines whether to send data immediately or buffer.
Args:
segment_size: bytes of data ready to send
outstanding_data: bytes sent but not yet acknowledged
mss: Maximum Segment Size for this connection
Returns:
'SEND' or 'BUFFER'
"""
# Always send if we have a full segment
if segment_size >= mss:
return 'SEND'
# Send if no data is outstanding (interactive mode)
if outstanding_data == 0:
return 'SEND'
# Otherwise, buffer and wait for ACK
return 'BUFFER'
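A few sample calls (assuming a 1460-byte MSS; the values are illustrative only) exercise all three branches of the policy above:
# Illustrative calls to nagle_send_policy (defined above); a 1460-byte MSS is assumed.
print(nagle_send_policy(1460, 5000, 1460))  # 'SEND'   - full segment, sent regardless of outstanding data
print(nagle_send_policy(10, 0, 1460))       # 'SEND'   - nothing outstanding (interactive case)
print(nagle_send_policy(10, 500, 1460))     # 'BUFFER' - small write while data is in flight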
State Machine Representation:
┌──────────────────────┐
│ IDLE (no pending) │
└──────────────────────┘
│
Application writes data
│
▼
┌──────────────────────┐
│ outstanding == 0? │
└──────────────────────┘
/ \
Yes No
│ │
▼ ▼
┌─────────────┐ ┌─────────────────────┐
│ SEND NOW │ │ segment >= MSS? │
└─────────────┘ └─────────────────────┘
/ \
Yes No
│ │
▼ ▼
┌─────────────┐ ┌─────────────┐
│ SEND NOW │ │ BUFFER │
└─────────────┘ │ wait for ACK│
└─────────────┘
Let's trace through detailed examples to see Nagle's Algorithm in action.
Example 1: Interactive Terminal Session
The user types 'ls' with a 50ms gap between keystrokes. RTT is 20ms.
Time(ms) Event Outstanding Data Action
──────── ───── ──────────────── ──────
0 User types 'l' 0 bytes SEND 'l'
10 Segment 'l' reaches receiver
20 ACK received 0 bytes
50 User types 's' 0 bytes SEND 's'
60 Segment 's' reaches receiver
70 ACK received 0 bytes
100 User types '\n' 0 bytes SEND '\n'
110 Segment '\n' reaches receiver
120 ACK received 0 bytes
Result: All characters sent immediately! The human typing speed is slower than the RTT, so each keystroke finds no outstanding data and is sent immediately. Nagle's Algorithm preserves full interactivity.
This example demonstrates why Nagle's Algorithm doesn't hurt interactive applications. As long as the application's data rate is lower than the network's ACK rate, every write is sent immediately because there's no outstanding data when the write occurs.
Example 2: Bulk Data Transfer with Small Writes
An application writes data 10 bytes at a time, one write per millisecond (perhaps due to poorly buffered I/O). RTT is 100ms. MSS is 1460 bytes.
Time(ms)  Event                  Outstanding   Buffer   Action
────────  ─────                  ───────────   ──────   ──────
0         App writes 10B         0             0        SEND 10B (first write)
1         App writes 10B         10            10       BUFFER
2         App writes 10B         10            20       BUFFER
...       (continues)            10            ...      BUFFER
50        App writes 10B         10            500      BUFFER (50 writes buffered)
...       (continues)            10            ...      BUFFER
100       ACK received           0             990      SEND 990B buffered data
101       App writes 10B         990           10       BUFFER
102       App writes 10B         990           20       BUFFER
...       (continues)            990           ...      BUFFER
200       ACK received           0             990      SEND buffered data
...       (pattern repeats; a full 1460-byte segment would be sent
           immediately if the buffer ever reached MSS, even with
           data still outstanding)
Result: Instead of 146 tiny segments (146 × 50 bytes = 7,300 bytes on the wire for 1,460 bytes of data), Nagle's Algorithm coalesces the same data into roughly three segments, about 1,600 bytes on the wire.
Efficiency improvement: from 10/50 = 20% to roughly 90% (a single full-MSS segment would be about 97% efficient).
| Metric | Without Nagle | With Nagle | Improvement |
|---|---|---|---|
| Segments for 1460B data | 146 | ~3-5 | ~30-50x fewer |
| Wire bytes for 1460B data | 7,300 | ~1,600 | ~4.5x reduction |
| Efficiency (1460B) | 20% | ~90% | ~4.5x improvement |
| ACKs generated | 146 | ~3-5 | ~30-50x fewer |
| CPU interrupts | 146 | ~3-5 | ~30-50x fewer |
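These ballpark figures can be sanity-checked with a few lines of Python. This is a rough back-of-the-envelope sketch assuming 40 bytes of TCP/IP header per segment and the coalesced segments from the trace above (10B, 990B, then 460B of the next flush); real traces vary with timing and RTT:
# Back-of-the-envelope check of the table above.
# Assumptions: 40-byte TCP/IP headers, 10-byte application writes,
# and segments coalesced roughly as in the trace (10B, 990B, 460B).
HEADER = 40          # bytes of TCP + IP header per segment
DATA = 1460          # application bytes to deliver

# Without Nagle: one segment per 10-byte write
segments_without = DATA // 10                      # 146 segments
wire_without = segments_without * (10 + HEADER)    # 7,300 bytes on the wire
eff_without = DATA / wire_without                  # ~0.20

# With Nagle: a handful of coalesced segments
coalesced = [10, 990, 460]
wire_with = DATA + len(coalesced) * HEADER         # ~1,580 bytes on the wire
eff_with = DATA / wire_with                        # ~0.92

print(f"without Nagle: {segments_without} segments, {eff_without:.0%} efficient")
print(f"with Nagle:    {len(coalesced)} segments, {eff_with:.0%} efficient")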
Nagle's Algorithm is enabled by default in virtually all modern TCP implementations. Understanding how to configure it is essential for troubleshooting and optimization.
Checking and Configuring Nagle's Algorithm:
// ===== C/C++ (POSIX sockets) =====
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
int sock = socket(AF_INET, SOCK_STREAM, 0);
// Disable Nagle's Algorithm (enable TCP_NODELAY)
int flag = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
// Re-enable Nagle's Algorithm
flag = 0;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
// Check current setting
socklen_t len = sizeof(flag);
getsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, &len);
printf("TCP_NODELAY: %s\n", flag ? "enabled (Nagle off)" : "disabled (Nagle on)");
The socket option is named TCP_NODELAY, not TCP_NAGLE. Setting TCP_NODELAY=1 DISABLES Nagle's Algorithm (no delay). This naming is counterintuitive but historically entrenched. Remember: NODELAY=1 means 'send immediately, don't apply Nagle buffering.'
Implementation Across Languages:
# ===== Python =====
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle (enable TCP_NODELAY)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
# Re-enable Nagle
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
// ===== Java =====
Socket socket = new Socket(host, port);
// Disable Nagle's Algorithm
socket.setTcpNoDelay(true);
// Check current setting
boolean noDelay = socket.getTcpNoDelay();
// ===== Node.js =====
const net = require('net');
const socket = net.createConnection({ port: 8080 });
// Disable Nagle's Algorithm
socket.setNoDelay(true);
// ===== Go =====
import "net"
conn, _ := net.Dial("tcp", "example.com:80")
tcpConn := conn.(*net.TCPConn)
// Disable Nagle's Algorithm
tcpConn.SetNoDelay(true)
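Because the option name is inverted (see the note above), a tiny helper that reports whether Nagle is actually in effect can prevent confusion. A minimal Python sketch; the helper name is ours, not a standard API:
import socket

def nagle_enabled(sock: socket.socket) -> bool:
    """True if Nagle's Algorithm is active (i.e., TCP_NODELAY is NOT set)."""
    nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
    return nodelay == 0

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(nagle_enabled(sock))   # True by default on most platforms
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(nagle_enabled(sock))   # False once TCP_NODELAY is set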
System-Wide Settings (Linux):
# View current system default
cat /proc/sys/net/ipv4/tcp_low_latency
# Note: There's no system-wide Nagle disable.
# TCP_NODELAY must be set per-socket by applications.
# The tcp_low_latency setting affects queue behavior, not Nagle.
One of the most notorious TCP performance issues arises from the interaction between Nagle's Algorithm (sender-side) and Delayed ACKs (receiver-side). Each mechanism is beneficial on its own, but together they can create unexpected latency.
Delayed ACK Overview:
Delayed ACKs (RFC 1122) allow receivers to wait up to 200-500ms before sending an ACK, in the hope of piggybacking the ACK on a data segment flowing back the other way, or of acknowledging several received segments with a single ACK.
The Problematic Scenario: Request-Response Protocols
Consider a client that sends each request as two small writes (for example, a header write followed by a body write), then reads the response:
Time     Client                                 Server                              Issue
────     ──────                                 ──────                              ─────
0ms      Send request part 1 (header, 200B)
                                                Receives part 1
                                                Request incomplete - can't respond
                                                Starts delayed ACK timer
                                                (waiting to piggyback)
1ms      App writes part 2 (body)
         (Nagle: part 1 unacked and
          part 2 < MSS, so buffer it)
...      Waiting for ACK of part 1              Waiting for rest of request         pseudo-deadlock
200ms                                           Delayed ACK timer fires
                                                Sends a bare ACK
201ms    ACK received - send part 2
                                                Request complete - send response
The 200ms Penalty:
Each request-response cycle incurs up to 200ms of artificial latency. The client's Nagle is waiting for an ACK; the server's delayed ACK is waiting to piggyback. Neither knows the other is waiting.
This is sometimes called a 'pseudo-deadlock'—both sides are waiting on the other. Nagle's Algorithm on the client waits for an ACK before sending more data. Delayed ACK on the server waits for outgoing data to piggyback. The deadlock breaks only when the delayed ACK timer fires (up to 200ms).
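The interaction is easiest to see experimentally with the split-write pattern traced above. The sketch below (our own toy client and server, not from the RFC) sends a request in two small writes, waits for the reply, and times the exchange with and without TCP_NODELAY. Whether the extra delay actually appears depends on the OS's delayed-ACK and Nagle heuristics (Linux, for instance, ACKs quickly early in a connection), so treat it as an experiment rather than a guaranteed reproduction:
import socket
import threading
import time

# Demo parameters (ours, for illustration): a local port and a two-part request.
HOST, PORT = "127.0.0.1", 5055
PART1, PART2 = b"HEAD", b"BODY"
REQUEST_LEN = len(PART1) + len(PART2)

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    while True:
        conn, _ = srv.accept()
        with conn:
            data = b""
            while len(data) < REQUEST_LEN:   # respond only once the full request arrived
                chunk = conn.recv(1024)
                if not chunk:
                    break
                data += chunk
            conn.sendall(b"OK")

def timed_request(nodelay):
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, int(nodelay))
    cli.connect((HOST, PORT))
    start = time.monotonic()
    cli.sendall(PART1)        # first small write: sent immediately
    cli.sendall(PART2)        # second small write: Nagle may hold it until PART1 is ACKed
    cli.recv(16)              # wait for the server's reply
    elapsed = time.monotonic() - start
    cli.close()
    return elapsed

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)               # give the listener time to start

for nodelay in (False, True):
    samples = sorted(timed_request(nodelay) for _ in range(5))
    label = "TCP_NODELAY on " if nodelay else "Nagle (default)"
    print(f"{label}: median request-response = {samples[2] * 1000:.1f} ms")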
Solutions to the Nagle/Delayed ACK Problem:
Option 1: Disable Nagle (TCP_NODELAY) on client
───────────────────────────────────────────────
+ Eliminates the wait-for-ACK blocking
+ Immediate sends regardless of outstanding data
- May reduce efficiency for bulk transfers
- Must be done per-socket by application
Option 2: Use TCP_CORK (Linux) or TCP_NOPUSH (BSD)
──────────────────────────────────────────────────
+ Explicitly cork the socket, write multiple pieces, then uncork
+ Combines benefits: batching when you want, immediate when you don't
- More complex application logic required
- Platform-specific
Option 3: Use larger writes (application-level batching)
────────────────────────────────────────────────────────
+ Write entire request in one call (triggers immediate send)
+ No socket options needed
- Requires application restructuring
- May not always be possible
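As a concrete sketch of Option 3: assemble the full message in user space and hand it to the kernel in a single call, so Nagle never sees a trailing small write. The header and body values below are made up for illustration:
import socket

def send_request(sock, header, body):
    # One sendall() instead of two: the complete request leaves in as few
    # segments as possible, with no small tail held back by Nagle.
    sock.sendall(header + body)

# Hypothetical usage:
# sock = socket.create_connection(("example.com", 80))
# send_request(sock, b"POST /api HTTP/1.1\r\nHost: example.com\r\n\r\n", b'{"id": 1}')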
Modern Best Practice:
For request-response protocols (HTTP clients, database drivers, RPC systems), most libraries disable Nagle by default:
# Common libraries with Nagle disabled by default:
- Most HTTP client libraries
- Redis client libraries
- gRPC
- Most game networking libraries
| Protocol Pattern | With Nagle | Without Nagle | Recommendation |
|---|---|---|---|
| Streaming (one direction) | ✓ Good | = OK | Keep Nagle enabled |
| Request-response (small) | ✗ 200ms penalty | ✓ Good | Disable Nagle |
| Interactive (typing) | ✓ Good | ✓ Good | Either works |
| Bulk file transfer | ✓ Good | ✓ Good | Keep Nagle enabled |
| Real-time gaming | ✗ Latency issues | ✓ Required | Disable Nagle |
While Nagle's Algorithm is beneficial by default, certain applications require its disabling for optimal performance.
Applications Requiring TCP_NODELAY:
Typical examples are request-response protocols with small messages (HTTP clients, database drivers, RPC frameworks such as gRPC), real-time multiplayer games, and other latency-critical traffic where a ~200ms stall per exchange is unacceptable.
Decision Framework:
┌─────────────────────────────────────────┐
│ What is your traffic pattern? │
└─────────────────────────────────────────┘
│
┌────────────────────┼────────────────────┐
│ │ │
Streaming (bulk) Request-Response Interactive
│ │ │
▼ ▼ ▼
Keep Nagle ON Disable Nagle Depends on RTT
│ │ │
│ │ RTT < human speed?
│ │ / \
│ │ Yes No
│ │ │ │
▼ ▼ ▼ ▼
High efficiency Low latency Keep ON Disable
Measuring the Impact:
# Testing without Nagle (TCP_NODELAY enabled)
for i in {1..100}; do
curl -w "%{time_total}\n" -o /dev/null -s http://example.com/api
done | awk '{sum+=$1} END {print "Avg: " sum/NR " seconds"}'
# Compare with Nagle enabled (application-specific)
# Look for ~200ms differences in request-response patterns
On Linux, TCP_CORK provides finer control. When set, TCP buffers all writes until either TCP_CORK is cleared or the buffer reaches MSS. This lets you batch writes explicitly: set TCP_CORK, write headers and body, clear TCP_CORK. The result is optimal segment sizing without losing Nagle's benefits elsewhere.
Modern TCP stacks provide additional options beyond the binary Nagle on/off choice. TCP_CORK (Linux) and TCP_NOPUSH (BSD/macOS) give applications explicit control over segment timing.
TCP_CORK (Linux):
When TCP_CORK is set, TCP accumulates data and does not send until the cork is cleared, a full MSS worth of data has accumulated, or (on Linux) the 200ms ceiling on corking expires:
// Example: HTTP server sending response
int cork = 1;
setsockopt(sock, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));
// Write HTTP headers (50 bytes)
write(sock, headers, 50);
// Write HTTP body (2000 bytes)
write(sock, body, 2000);
// Uncork - sends everything in optimal segments
cork = 0;
setsockopt(sock, IPPROTO_TCP, TCP_CORK, &cork, sizeof(cork));
Result: One or two properly-sized segments instead of many small ones.
| Option | Behavior | Best For | Drawback |
|---|---|---|---|
| Nagle (default) | Buffer small writes if ACK pending | General use | Latency with delayed ACK |
| TCP_NODELAY | Send immediately always | Request-response patterns | Many small segments |
| TCP_CORK | Buffer until uncork or MSS | Known multi-write messages | Requires explicit uncork |
| TCP_CORK + TCP_NODELAY | Undefined; don't combine | N/A | Behavior is inconsistent |
TCP_NOPUSH (BSD, macOS):
BSD systems provide TCP_NOPUSH, which is similar to TCP_CORK but with subtle differences:
// BSD/macOS
int nopush = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NOPUSH, &nopush, sizeof(nopush));
// Write multiple pieces...
write(sock, part1, len1);
write(sock, part2, len2);
// Clear to flush
nopush = 0;
setsockopt(sock, IPPROTO_TCP, TCP_NOPUSH, &nopush, sizeof(nopush));
Key Differences:
| Aspect | TCP_CORK (Linux) | TCP_NOPUSH (BSD) |
|---|---|---|
| Auto-send timeout | 200ms | No timeout |
| Interaction with close() | Flushes | Flushes |
| Availability | Linux only | BSD, macOS, iOS |
Platform-Portable Code:
#ifdef TCP_CORK
setsockopt(sock, IPPROTO_TCP, TCP_CORK, &flag, sizeof(flag));
#elif defined(TCP_NOPUSH)
setsockopt(sock, IPPROTO_TCP, TCP_NOPUSH, &flag, sizeof(flag));
#else
// Fallback: just use TCP_NODELAY or buffer in application
#endif
For optimal network I/O, combine TCP_CORK with scatter-gather I/O (writev()) or zero-copy (sendfile()). This allows sending headers and file content in optimally-sized segments without extra memory copies. Many high-performance web servers (nginx, Apache) use this pattern.
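A sketch of that pattern in Python on Linux, using a send_file_response() helper of our own invention; socket.TCP_CORK and os.sendfile are Linux-specific, so this will not run unchanged on BSD/macOS (where TCP_NOPUSH and a different sendfile interface apply):
import os
import socket

def send_file_response(sock, headers, path):
    """Cork the socket, write headers plus file contents, then uncork (Linux only)."""
    size = os.path.getsize(path)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)      # hold partial segments
    try:
        sock.sendall(headers)                                    # e.g. HTTP response headers
        with open(path, "rb") as f:
            offset = 0
            while offset < size:
                # zero-copy transfer from the file descriptor into the socket
                sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
                if sent == 0:
                    break
                offset += sent
    finally:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)  # uncork: flush in MSS-sized segments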
Nagle's Algorithm remains one of the most impactful optimizations in TCP's history—a simple heuristic that dramatically improved network efficiency without breaking existing applications.
What's Next:
Nagle's Algorithm addresses sender-induced SWS. But what about receiver-induced SWS, where the receiver advertises tiny windows? The next page covers Clark's Algorithm, which prevents the receiver from advertising small window openings, complementing Nagle's sender-side solution.
You now understand Nagle's Algorithm in depth—its mechanics, implementation, interaction with delayed ACKs, and when to disable it. Next, we'll examine Clark's Algorithm, the receiver-side complement that prevents small window advertisements.