Reliable transport requires more than just numbering bytes and acknowledging receipt—it requires flow control: a mechanism for the receiver to pace the sender according to its processing capacity. The 16-bit window size field in the TCP header is the primary instrument of this control.
Without flow control, a fast sender could overwhelm a slow receiver, causing buffer overflow, data loss, and degraded performance. The window size field allows the receiver to advertise how much data it can accept, effectively giving it veto power over the sender's transmission rate.
This deceptively simple 16-bit integer—backed by sophisticated algorithms and modern extensions—enables TCP to function efficiently across links ranging from dial-up modems to 100 Gbps datacenter connections.
By the end of this page, you will understand:
- The window field's position and semantics in the TCP header
- How receivers calculate and advertise window size
- The sliding window mechanism for efficient data transfer
- The window scaling option for high-bandwidth networks
- Zero window situations and the persist timer
- The distinction between flow control and congestion control
The window size field occupies bytes 14–15 of the TCP header (bits 112–127), immediately following the flag bits. Like all multi-byte fields in TCP, it's transmitted in network byte order (big-endian).
| Byte Position | Bit Range | Field | Size |
|---|---|---|---|
| Bytes 0–1 | 0–15 | Source Port | 16 bits |
| Bytes 2–3 | 16–31 | Destination Port | 16 bits |
| Bytes 4–7 | 32–63 | Sequence Number | 32 bits |
| Bytes 8–11 | 64–95 | Acknowledgment Number | 32 bits |
| Byte 12 | 96–102 | Data Offset + Reserved | 4+3 bits |
| Bytes 12–13 | 103–111 | Flags (NS through FIN) | 9 bits |
| Bytes 14–15 | 112–127 | Window Size | 16 bits |
| Bytes 16–17 | 128–143 | Checksum | 16 bits |
| Bytes 18–19 | 144–159 | Urgent Pointer | 16 bits |
The 16-Bit Limitation:
With 16 bits, the window size can express values from 0 to 65,535 bytes (64 KB - 1 byte). This seemed generous in 1981 when TCP was designed, but it became a significant bottleneck as networks evolved:
| Link Type | Window Needed (BDP) | 16-bit Max | Limitation |
|---|---|---|---|
| 10 Mbps, 10ms RTT | 12.5 KB | 64 KB | No issue |
| 100 Mbps, 100ms RTT | 1.25 MB | 64 KB | 19x too small |
| 1 Gbps, 100ms RTT | 12.5 MB | 64 KB | 195x too small |
| 10 Gbps, 100ms RTT | 125 MB | 64 KB | 1953x too small |
The solution—window scaling (RFC 7323)—multiplies the 16-bit value by a power of 2, extending effective maximum to ~1 GB.
BDP = Bandwidth × Round-Trip Time. It represents the amount of data 'in flight' when the pipe is full. For maximum throughput, the window must be at least as large as the BDP. A 100 Mbps link with 100ms RTT needs a 1.25 MB window—far exceeding the basic 64 KB limit.
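A quick sanity check of that arithmetic, as a Python sketch (the function name is ours, chosen for illustration):

```python
def bandwidth_delay_product(bandwidth_bps: float, rtt_s: float) -> float:
    """BDP in bytes: bandwidth (bits/s) x RTT (s), converted to bytes."""
    return bandwidth_bps * rtt_s / 8

# 100 Mbps link, 100 ms RTT -> 1.25 MB, roughly 19x the 64 KB ceiling
bdp = bandwidth_delay_product(100e6, 0.100)
print(f"Required window: {bdp / 1e6:.2f} MB")   # 1.25 MB
print(f"Ratio to 64 KB:  {bdp / 64_000:.1f}x")  # 19.5x
```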
The window size represents the number of bytes the receiver is willing to accept, starting from the acknowledgment number. It's a dynamic advertisement that changes with each segment based on available buffer space.
Precise Definition:
Receive Window = Buffer Size - (Data Received but Not Read by Application)
               = Buffer Size - (RCV.NXT - RCV.USER)
Where:
- Buffer Size = allocated receive buffer (SO_RCVBUF)
- RCV.NXT = next expected sequence number
- RCV.USER = last byte read by the application
Sender's Interpretation:
When the sender receives an ACK with:
- Acknowledgment Number = 1000
- Window = 4000

It calculates:
Right edge of send window = Ack + Window = 1000 + 4000 = 5000
Allowed to send: Bytes 1000 through 4999
The sender must not transmit beyond sequence number 4999 until a new window advertisement arrives.
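The same calculation in a couple of lines of Python (values taken from the example above):

```python
ack, window = 1000, 4000   # from the received ACK segment
right_edge = ack + window  # 5000: first byte beyond the window
print(f"Allowed to send: bytes {ack} through {right_edge - 1}")  # 1000-4999
```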
Flow control (window size) prevents overwhelming the RECEIVER. Congestion control (CWND) prevents overwhelming the NETWORK. The sender respects whichever is smaller. A receiver with a 1 MB buffer on a congested network with CWND=10 KB will only receive data at the CWND rate.
TCP's sliding window is the conceptual framework for understanding flow control. The window "slides" along the sequence number space as data is sent, acknowledged, and the application consumes it.
Sender's Window:
The sender maintains a window divided into four regions:
|-----------|-----------|-----------|-----------|
|   Sent    |   Sent,   |  Usable   |  Not Yet  |
|  & Acked  | Not Acked |  Window   |  Allowed  |
|-----------|-----------|-----------|-----------|
            ^           ^           ^
         SND.UNA     SND.NXT    SND.UNA + SND.WND
How the Window Slides: as data is acknowledged, the left edge (SND.UNA) moves right; as the receiver frees buffer space, the right edge (SND.UNA + SND.WND) moves right. The example below traces both movements.
Example Sequence:
Initial: ACK=1000, Window=4000 → Allowed: 1000-4999
Step 1: Sender transmits 1000 bytes (1000-1999)
In flight: 1000 bytes, Usable: 3000 bytes
Step 2: Receive ACK=2000, Window=4000 → Allowed: 2000-5999
Window slid right by 1000 bytes
Step 3: Application reads data at receiver
Receive ACK=2000, Window=5000 → Allowed: 2000-6999
Window expanded by 1000 bytes (more buffer available)
| Event | Left Edge (SND.UNA) | Right Edge (SND.UNA+WND) | Effect |
|---|---|---|---|
| ACK received (data confirmed) | Moves right | Moves right | Window slides forward |
| Data sent | No change | No change | Usable window decreases |
| Window advertisement increase | No change | Moves right | Window expands |
| Window advertisement decrease | No change | Moves left | Window shrinks (rare) |
RFC 793 discourages shrinking the right edge of the window (reducing window while left edge is fixed). Data already sent but now 'outside' the new window creates ambiguity. Modern TCPs avoid window shrinking, but it can occur in buggy implementations.
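To make the bookkeeping concrete, here is a minimal Python sketch (a hypothetical helper class, not a real TCP stack) that replays the example sequence above:

```python
class SenderWindow:
    """Track the sender-side sliding window; names mirror the RFC variables."""

    def __init__(self):
        self.snd_una = 0  # oldest unacknowledged sequence number (left edge)
        self.snd_nxt = 0  # next sequence number to send
        self.snd_wnd = 0  # most recently advertised receive window

    def on_ack(self, ack: int, window: int) -> None:
        """An ACK slides the left edge; the advertisement sets the right edge."""
        self.snd_una = max(self.snd_una, ack)
        self.snd_wnd = window

    def usable(self) -> int:
        """Bytes we may still transmit: (SND.UNA + SND.WND) - SND.NXT."""
        return max(0, self.snd_una + self.snd_wnd - self.snd_nxt)

    def send(self, nbytes: int) -> int:
        """Consume usable window; return how many bytes were actually sent."""
        n = min(nbytes, self.usable())
        self.snd_nxt += n
        return n

w = SenderWindow()
w.snd_una = w.snd_nxt = 1000
w.on_ack(1000, 4000)   # Initial: allowed 1000-4999
w.send(1000)           # Step 1: transmit bytes 1000-1999
print(w.usable())      # 3000
w.on_ack(2000, 4000)   # Step 2: window slides right -> allowed 2000-5999
print(w.usable())      # 4000
w.on_ack(2000, 5000)   # Step 3: receiver freed buffer -> allowed 2000-6999
print(w.usable())      # 5000
```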
The 16-bit window field's 64 KB maximum is insufficient for high-bandwidth, high-latency networks ("long fat networks" or LFNs). Window Scaling (RFC 7323, originally RFC 1323) extends the effective window by using a negotiated scale factor.
How Window Scaling Works:
- Each side may include a Window Scale option in its SYN, carrying a shift count from 0 to 14.
- The peer left-shifts every subsequent window advertisement by that count (multiplying it by 2^shift).
- The window field in SYN and SYN-ACK segments themselves is never scaled.
- The scale factor is fixed for the lifetime of the connection.
Maximum Window with Scaling:
Maximum Scale Factor: 14
Maximum Multiplier: 2^14 = 16,384
Maximum Window: 65,535 × 16,384 ≈ 1.07 GB
| Scale Factor | Multiplier | Maximum Window | Suitable For |
|---|---|---|---|
| 0 | 1 | 64 KB | Low-speed links |
| 2 | 4 | 256 KB | Moderate connections |
| 4 | 16 | 1 MB | Fast Ethernet, moderate RTT |
| 7 | 128 | 8 MB | Gigabit, low latency |
| 10 | 1,024 | 64 MB | 10 Gbps datacenter |
| 14 | 16,384 | ~1 GB | Maximum (extreme conditions) |
Window Scale Option Format:
Kind: 3 (Window Scale)
Length: 3 bytes
Shift: Scale factor (0-14)
SYN segment options:
[Kind=3][Length=3][Shift Count=7]
Meaning: Multiply my window advertisements by 128 (2^7)
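A minimal sketch of how those option bytes look on the wire (the function name is ours for illustration):

```python
import struct

def window_scale_option(shift: int) -> bytes:
    """Build the 3-byte Window Scale option: Kind=3, Length=3, shift count."""
    if not 0 <= shift <= 14:
        raise ValueError("shift count must be 0-14")
    return struct.pack("!BBB", 3, 3, shift)

print(window_scale_option(7).hex())  # '030307'
```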
Negotiation Rules:
- Scaling is enabled only if both sides send the option in their SYN segments; if either side omits it, both directions use a scale factor of 0.
- Each direction negotiates its own shift count; the two need not match.
- The option may appear only in segments with the SYN flag set.
```python
import socket

# Create socket with custom receive buffer (implies window scale)
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Set receive buffer to 1 MB (will enable window scaling)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1_000_000)

# Get actual buffer size (kernel may double it for overhead)
actual_buffer = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"Receive buffer: {actual_buffer} bytes")

# Linux auto-tuning (net.ipv4.tcp_rmem = min default max)
# Example: 4096 131072 6291456
# This allows automatic window scaling up to 6 MB

# View current TCP window scaling settings
# cat /proc/sys/net/ipv4/tcp_window_scaling
# 1 = enabled (default on modern systems)

# Wireshark shows calculated window:
# "Window size value: 8192"
# "Calculated window size: 1048576" (8192 * 128, scale=7)

def calculate_window(advertised_window: int, scale_factor: int) -> int:
    """Calculate actual window size from header value and scale."""
    return advertised_window << scale_factor  # Left shift = multiply by 2^scale
```

In Wireshark, if you don't capture the SYN handshake, you won't know the scale factor, and the window values will appear small. Always capture from connection start when analyzing window behavior, or use Wireshark's 'Preferences → Protocols → TCP → Relative sequence numbers' setting.
When the receiver's buffer fills completely—perhaps because the application isn't reading data fast enough—it advertises a zero window: Window = 0. This tells the sender to stop transmitting entirely.
Zero Window Situation:
Receiver state:
Buffer size: 4096 bytes
Unread data: 4096 bytes
Available: 0 bytes
Receiver sends: ACK=5000, Window=0
"I acknowledge byte 4999, but STOP! I have no room."
Sender must pause all data transmission.
The Zero Window Deadlock Problem:
If the receiver's buffer clears but the window update ACK is lost, deadlock occurs:
1. The receiver sends a window update, which is a pure ACK and is not retransmitted if lost.
2. The sender, still believing the window is zero, waits for an update before sending.
3. The receiver, having announced its open window, waits for data.
4. Both sides wait forever; neither has a pending retransmission to break the stalemate.
The Persist Timer Solution:
To prevent this deadlock, TCP implements the persist timer:
- When a zero window is advertised, the sender starts the persist timer.
- On expiry, the sender transmits a window probe: a tiny segment (typically one byte) that forces the receiver to respond.
- The receiver's ACK carries its current window; if it is still zero, the timer restarts with exponential backoff.
- Probing continues indefinitely; the persist timer never gives up, so the connection cannot deadlock permanently.
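Probe timing varies by implementation; as an illustration, here is the classic backoff shape (the 5-second base and 60-second cap are typical BSD-derived values, not mandated by any standard):

```python
def persist_probe_intervals(base: float = 5.0, cap: float = 60.0, n: int = 6):
    """Yield successive zero-window probe intervals with exponential backoff."""
    interval = base
    for _ in range(n):
        yield interval
        interval = min(interval * 2, cap)

print(list(persist_probe_intervals()))
# [5.0, 10.0, 20.0, 40.0, 60.0, 60.0]
```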
Constantly advertising tiny windows (e.g., 10 bytes) wastes bandwidth on header overhead. Silly Window Syndrome (SWS) avoidance algorithms ensure receivers don't advertise small windows and senders don't transmit tiny segments. Clark's algorithm (receiver) and Nagle's algorithm (sender) address this.
The receiver dynamically calculates the window to advertise based on its current buffer state. This calculation must be performed carefully to prevent overflow and optimize throughput.
Basic Calculation:
Available Window = Receive Buffer Size - Data Pending
= SO_RCVBUF - (RCV.NXT - RCV.USER)
Where:
SO_RCVBUF = Allocated receive buffer
RCV.NXT = Next expected sequence number
RCV.USER = Last byte read by application
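As a sketch (variable names follow the definitions above; the function is illustrative, not a kernel API), reproducing the zero window scenario from the previous section:

```python
def advertised_window(so_rcvbuf: int, rcv_nxt: int, rcv_user: int) -> int:
    """Available Window = SO_RCVBUF - (RCV.NXT - RCV.USER), floored at zero."""
    return max(0, so_rcvbuf - (rcv_nxt - rcv_user))

# 4096-byte buffer, all of it holding unread data -> Window = 0
print(advertised_window(so_rcvbuf=4096, rcv_nxt=5000, rcv_user=904))  # 0
```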
Practical Considerations:
Out-of-Order Data: Buffer may contain non-contiguous data waiting for gaps to fill. This consumes space but isn't reflected in the basic calculation.
Memory Pressure: Under low memory, the kernel may reduce advertised windows even if buffer has space.
Auto-Tuning: Modern systems dynamically adjust receive buffer size based on connection characteristics.
Application Latency: If application reads slowly, window drops; fast readers maintain large windows.
Linux Auto-Tuning (tcp_rmem):
net.ipv4.tcp_rmem = 4096 131072 6291456
min default max
- min: Minimum buffer, used under memory pressure
- default: Initial buffer for new connections
- max: Maximum buffer, auto-tuning ceiling
```bash
# View current TCP receive buffer settings
sysctl net.ipv4.tcp_rmem
# Output: net.ipv4.tcp_rmem = 4096 131072 6291456

# View send buffer settings (affects sender's view of the window)
sysctl net.ipv4.tcp_wmem
# Output: net.ipv4.tcp_wmem = 4096 16384 4194304

# Monitor window sizes for active connections
ss -ti | grep -A1 "ESTAB"
# Shows: wscale:7 rcv_space:14480 snd_wscale:7 rcv_wscale:7

# Detailed view of a specific connection
ss -ti dst 93.184.216.34:443
# Output includes:
#   snd_wnd:123456   (current send window)
#   rcv_wnd:234567   (current receive window)
#   cwnd:10          (congestion window in segments)
#   ssthresh:65535   (slow start threshold)

# Real-time window tracking with tcpdump
tcpdump -i any -nn 'tcp port 443' -v 2>&1 | grep "win "
# Shows window values in each segment

# Wireshark filters for window analysis
#   tcp.analysis.window_update   (window updates)
#   tcp.analysis.zero_window     (zero window conditions)
#   tcp.window_size_value < 100  (small windows)
```

If transfers are slow, check window sizes. A connection limited by small receive windows shows low throughput despite available bandwidth. Use 'ss -ti' to see current windows, or analyze captures for window patterns. Low receive windows often indicate a slow application or insufficient buffer configuration.
Window size directly impacts achievable throughput. Understanding this relationship is essential for performance tuning and troubleshooting.
The Fundamental Relationship:
Maximum Throughput = Window Size / Round-Trip Time
Example:
Window = 64 KB = 65,536 bytes = 524,288 bits
RTT = 100 ms = 0.1 seconds
Max Throughput = 524,288 bits / 0.1 s = 5,242,880 bps ≈ 5.24 Mbps
No matter how fast your link is, you cannot exceed this throughput with this window and RTT.
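The calculation is easy to script; this sketch (helper name ours) reproduces the example above and one row of the table below:

```python
def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Window-limited throughput ceiling: (window in bits) / RTT, in Mbps."""
    return window_bytes * 8 / rtt_seconds / 1e6

print(f"{max_throughput_mbps(65_536, 0.100):.2f} Mbps")   # 5.24  (64 KB, 100 ms)
print(f"{max_throughput_mbps(1 << 20, 0.010):.0f} Mbps")  # 839   (1 MB, 10 ms)
```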
| Window Size | 10ms RTT | 50ms RTT | 100ms RTT | 200ms RTT |
|---|---|---|---|---|
| 64 KB | 52 Mbps | 10 Mbps | 5.2 Mbps | 2.6 Mbps |
| 256 KB | 210 Mbps | 42 Mbps | 21 Mbps | 10 Mbps |
| 1 MB | 840 Mbps | 168 Mbps | 84 Mbps | 42 Mbps |
| 4 MB | 3.4 Gbps | 672 Mbps | 336 Mbps | 168 Mbps |
| 16 MB | 13.4 Gbps | 2.7 Gbps | 1.3 Gbps | 672 Mbps |
Performance Bottleneck Analysis:
Window-Limited: If Window/RTT < Link Speed, you're window-limited. Increase buffer sizes.
Bandwidth-Limited: If Link Speed < Window/RTT, you're using available capacity. Window increase won't help.
Congestion-Limited: If CWND < RWND, network congestion limits throughput. Investigate packet loss.
Application-Limited: If application reads slowly, RWND drops, limiting throughput. Optimize application.
Increasing buffer sizes improves potential throughput but increases per-connection memory usage. A server with 10,000 connections and 1 MB buffers each needs 10+ GB just for TCP buffers. Balance throughput against memory constraints in high-concurrency scenarios.
A critical distinction often confused: flow control (window size) and congestion control (CWND) are separate mechanisms solving different problems.
Flow Control via Window Size: imposed by the receiver to protect its own buffer, and communicated explicitly in the window field of every segment.
Congestion Control via CWND: maintained privately by the sender to protect the network, inferred from loss and delay signals rather than carried in the header.
| Aspect | Flow Control (RWND) | Congestion Control (CWND) |
|---|---|---|
| What it protects | Receiver's buffer | Network's capacity |
| Who sets it | Receiver | Sender |
| How it's communicated | Window field in header | Not in header; sender-local |
| What triggers reduction | Buffer fills up | Packet loss or delay signals |
| Goal | Match receiver's consumption rate | Stay below network's capacity |
Effective Window:
The effective window—the actual amount the sender can transmit—is the minimum of both:
Effective Window = min(RWND, CWND) - Bytes In Flight
Whichever is smaller acts as the bottleneck, so diagnosing slow transfers requires identifying which factor is limiting.
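A one-function sketch of the formula, using the numbers from the earlier callout (1 MB receive buffer, 10 KB CWND):

```python
def effective_window(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still transmit: min(RWND, CWND) minus unacked data."""
    return max(0, min(rwnd, cwnd) - bytes_in_flight)

# Congestion-limited: the 10 KB CWND, not the 1 MB receive window, governs
print(effective_window(rwnd=1_048_576, cwnd=10_240, bytes_in_flight=8_192))  # 2048
```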
Use 'ss -ti' on Linux to compare rcv_wnd and cwnd. If cwnd is consistently much smaller than rcv_wnd, the network (congestion window) is the limit. If rcv_wnd is small or frequently hitting zero, the receiver (flow control) is the limit.
We've comprehensively explored the TCP window size field—the receiver's voice in controlling data flow and a critical determinant of network performance.
Module Complete:
You have now mastered the five core fields of the TCP header that enable reliable, ordered, flow-controlled communication: the source and destination ports (identifying processes), the sequence number (ordering the byte stream), the acknowledgment number (confirming receipt), and the window size (pacing the sender).
These fields work together to transform unreliable IP datagrams into the reliable byte stream that powers virtually all Internet applications.
You now possess comprehensive understanding of the TCP header structure and its key fields. From port numbers identifying processes to window size enabling flow control, you understand how TCP's header design enables reliable, ordered, full-duplex byte stream communication. This foundation prepares you for advanced topics in TCP connection management and performance optimization.