Level: Intermediate

Duration: 60 mins

Topic: TCP Flow & Congestion Control

Receive Window

The Receiver's Responsibility

While the sender pushes data into the network, the receiver faces a different challenge: accepting that data reliably, buffering it appropriately, and delivering it to the application in order—all while providing feedback that prevents the sender from overwhelming its capacity.

The receive window is the receiver's tool for this task. It represents the receiver's current capacity to accept data and is advertised to the sender in every ACK. When the application reads slowly, the window shrinks. When it catches up, the window expands. This dynamic feedback creates a self-regulating system that adapts to application behavior.

This page examines receive window mechanics in exhaustive detail: how the receiver manages its buffer, calculates its available capacity, advertises the window, and handles edge cases like out-of-order arrival and zero window conditions.

What You Will Learn

By the end of this page, you will understand: how the receiver maintains its buffer and window state, the calculation of the advertised window value, how out-of-order segments are handled, the generation of acknowledgments, and how receiver behavior affects sender throughput.

Receive Window State Variables

Just as the sender maintains state for transmission, the receiver maintains state for reception and window advertisement. RFC 793 specifies these variables:

The Receive Sequence Space:

RFC 793 Receive Sequence Variables
Variable	Full Name	Description
`RCV.NXT`	Receive Next	The next expected sequence number. Bytes before this have been received and acknowledged.
`RCV.WND`	Receive Window	The window size the receiver is willing to accept. Advertised to the sender.
`RCV.UP`	Receive Urgent Pointer	Points to urgent data if URG flag is set.
`IRS`	Initial Receive Sequence Number	The initial sequence number chosen by the peer (its ISS), learned from the peer's SYN. The first data byte received carries sequence number IRS + 1.

Understanding RCV.NXT:

RCV.NXT is the cornerstone of receive-side TCP. It represents:

Next expected byte: The receiver expects a segment with this sequence number next
Cumulative acknowledgment: All bytes before RCV.NXT have been received without gaps
In-order delivery point: Bytes up to RCV.NXT - 1 can be delivered to the application
ACK value: The ACK number in outgoing segments equals RCV.NXT

The receive window boundaries:

Acceptable sequence numbers: [RCV.NXT, RCV.NXT + RCV.WND)

Bytes with sequence numbers < RCV.NXT are duplicates (already received)
Bytes with sequence numbers ≥ RCV.NXT + RCV.WND are beyond the window (rejected)
Bytes in the acceptable range are buffered, even if out of order
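
The three cases above can be sketched as a small classifier. This is a minimal illustration, not code from any TCP stack: the function name `classify_byte` is hypothetical, and the "half the sequence space" rule for telling old bytes from future ones is a simplifying assumption to make the comparison safe across 32-bit wraparound.

```python
MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around


def classify_byte(seq: int, rcv_nxt: int, rcv_wnd: int) -> str:
    """Classify one byte's sequence number relative to the receive window."""
    offset = (seq - rcv_nxt) % MOD  # distance ahead of RCV.NXT, wrap-safe
    if offset < rcv_wnd:
        return "acceptable"
    # Assumption: anything in the half of the space "behind" RCV.NXT is old data.
    if offset >= MOD // 2:
        return "duplicate"
    return "beyond window"


print(classify_byte(999, 1000, 5000))   # duplicate
print(classify_byte(1000, 1000, 5000))  # acceptable
print(classify_byte(5999, 1000, 5000))  # acceptable
print(classify_byte(6000, 1000, 5000))  # beyond window
```

With RCV.NXT = 1000 and RCV.WND = 5000, the acceptable range is bytes 1000 through 5999, matching the half-open interval above.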

RCV.NXT Only Advances on Contiguous Data

A critical point: RCV.NXT advances only when contiguous data is received. If bytes 0-999 are received, then bytes 2000-2999 arrive (out of order), RCV.NXT stays at 1000. Only when bytes 1000-1999 arrive (filling the gap) does RCV.NXT jump to 3000. This is why cumulative ACKs indicate the highest contiguously received byte.

Receive Sequence Space Visualization
Receive Buffer State:
                                
Byte positions:  0    1000   2000   3000   4000   5000   6000   7000   8000
                 |------|------|------|------|------|------|------|------|
 
Scenario: Window = 7000, received bytes 0-999 and 2000-3999 (out of order)
 
     [==DELIVERED==][===GAP===][=OUT OF ORDER=][=======WINDOW=======][BEYOND]
     |<-- 0-999 -->|<1000-1999>|<- 2000-3999 ->|<-- 4000-7999 ----->| 8000+
                   ^           ^               ^                    ^
                   │           │               │                    │
               RCV.NXT=1000   Hole          Buffered           RCV.NXT + RCV.WND = 8000
               
ACK value = RCV.NXT = 1000 (indicating "give me byte 1000")
Window advertised = RCV.WND (7000 in this example)
 
When bytes 1000-1999 arrive:
- Gap filled, RCV.NXT advances to 4000 (absorbing buffered 2000-3999)
- Bytes 1000-3999 become deliverable to the application (0-999 were deliverable already)
- ACK value becomes 4000

The Receive Buffer

The receive buffer is the memory region that holds incoming data between network arrival and application consumption. Its management is central to receive window behavior.

Buffer organization:

The receive buffer is logically organized into regions:

Receive Buffer Organization
Receive Buffer Memory Layout:
 
┌─────────────────────────────────────────────────────────────────────────────┐
│                            RECEIVE BUFFER                                   │
├──────────────────┬────────────────────────────────────────┬─────────────────┤
│   [FREE SPACE]   │     [PENDING DATA]                     │  [AVAILABLE]    │
│  (recycled from  │   (received, not yet read by app)      │   (window)      │
│   app reads)     │                                        │                 │
├──────────────────┼────────────────────────────────────────┼─────────────────┤
│  Can receive     │  Holding data for                      │  Can accept     │
│  more data       │  application                           │  new data       │
│  indirectly      │                                        │                 │
└──────────────────┴────────────────────────────────────────┴─────────────────┘
 
The PENDING DATA region may contain:
┌─────────────────────────────┬──────────┬──────────────────┐
│   IN-ORDER (deliverable)    │   GAP    │   OUT-OF-ORDER   │
│   (bytes up to RCV.NXT-1)   │  (hole)  │   (future bytes) │
├─────────────────────────────┼──────────┼──────────────────┤
│   Ready for app read()      │  Empty   │   Waiting for    │
│                             │          │   gap to fill    │
└─────────────────────────────┴──────────┴──────────────────┘

Window calculation:

The receive window advertised to the sender is calculated based on available buffer space:

RCV.WND = Buffer_Size - (Bytes_Buffered_Not_Yet_Read_By_App)

More precisely:

RCV.WND = Buffer_Size - (Highest_Sequence_Received - Last_Sequence_Read_By_App)

This includes both in-order and out-of-order data. If the application reads slowly, buffered data accumulates, and RCV.WND shrinks. If the application reads quickly, buffer space frees, and RCV.WND grows.

Out-of-Order Data Consumes Buffer Space

A subtle but important point: out-of-order data occupies buffer space even though it can't be delivered yet. If the receiver holds bytes 0-999, 2000-2999, and 4000-4999 (with 1000-1999 and 3000-3999 still missing), all 3000 received bytes consume buffer space, not just the 1000 in-order bytes. The receiver must also keep room for the missing ranges, which is why the formula above measures the span up to the highest sequence received. This reduces the window the receiver can advertise.
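
A short sketch of the window formula may help. The function and parameter names are hypothetical, and the 10000-byte buffer is an illustrative value. To avoid off-by-one ambiguity, the sketch assumes `rcv_high` is one past the highest byte received and `read_next` is the next byte the application will read.

```python
def receive_window(buffer_size: int, rcv_high: int, read_next: int) -> int:
    """RCV.WND = buffer size minus the span from the app's read point to the
    highest byte received (holes inside that span are counted too)."""
    span_in_use = rcv_high - read_next
    return max(0, buffer_size - span_in_use)


# Received 0-999, 2000-2999, 4000-4999; the application has read nothing yet.
print(receive_window(10000, rcv_high=5000, read_next=0))     # 5000
# The application reads the 1000 in-order bytes: the window grows.
print(receive_window(10000, rcv_high=5000, read_next=1000))  # 6000
```

Although only 3000 bytes have arrived in the first call, 5000 bytes of the buffer are committed, because the two 1000-byte holes must stay reserved for the missing data.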

Buffer size configuration:

The receive buffer size is configured at the socket level, often with OS defaults that may need adjustment for high-performance scenarios:

Platform	Configuration Method	Typical Default	Maximum
Linux	`setsockopt(SO_RCVBUF)` or sysctl	87 KB - 6 MB (auto-tuned)	16+ MB configurable
Windows	`setsockopt(SO_RCVBUF)` or registry	64 KB - 1 MB	16+ MB configurable
macOS	`setsockopt(SO_RCVBUF)` or sysctl	128 KB - 4 MB	Limited by sysctl
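
As a hedged illustration of the socket-level method in the table, the sketch below requests a receive buffer size using Python's standard `socket` module. The 256 KB request is an arbitrary example value. The operating system may round, double, or cap the request, so the value read back can differ from the one asked for. On Linux, setting `SO_RCVBUF` explicitly is also documented to disable receive buffer auto-tuning for that socket.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Read the OS default before changing anything.
default_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

# Request a larger buffer (example value); the kernel decides what to grant.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
effective_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

print("default:", default_size, "after request:", effective_size)
sock.close()
```

Set the option before connecting or listening when window scaling matters, since the scale factor is negotiated during the handshake.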

Auto-tuning:

Modern operating systems implement TCP receive buffer auto-tuning:

Start with a small buffer
Observe actual usage patterns (RTT, throughput)
Grow buffer as needed up to a maximum
Shrink if memory pressure requires

This balances performance with memory efficiency across diverse connections.

Out-of-Order Segment Handling

The Internet provides no ordering guarantees. Packets can take different paths, experience different delays, and arrive out of order. The receiver must handle this gracefully.

RFC 793's original guidance:

The original TCP specification suggested that receivers could discard out-of-order segments. The sender would eventually retransmit, and hopefully the retransmission would arrive in order. This is correct but inefficient.

Modern receiver behavior:

Modern TCP implementations buffer out-of-order segments:

Accept if within window: Any segment with sequence number in [RCV.NXT, RCV.NXT + RCV.WND) is accepted
Buffer in reassembly queue: Out-of-order segments are stored in a data structure (often a list or tree of sequence ranges)
Coalesce when possible: When the missing gap arrives, merge contiguous ranges
Deliver to application: Only contiguous data from RCV.NXT can be delivered

Reassembly data structures:

Implementations typically use one of these data structures for out-of-order segment management:

Data Structure	Insertion	Gap Query	Coalescing	Memory
Linked list of ranges	O(n)	O(n)	O(1) merge	Minimal per-range
Red-black tree	O(log n)	O(log n)	O(log n) merge	Nodes + data
Segment bitmap	O(1)	O(1)	O(byte range)	High for large windows

The linked list approach is common for connections with infrequent reordering. Trees scale better for connections with persistent reordering.

Out-of-Order Reassembly Example
Initial state: RCV.NXT = 1000, no buffered data
 
Segment arrives: seq=2000, len=1000
  - Out of order (expected 1000, got 2000)
  - Buffer: [(2000, 2999)]
  - RCV.NXT still 1000
  
Segment arrives: seq=4000, len=500
  - Out of order
  - Buffer: [(2000, 2999), (4000, 4499)]
  - RCV.NXT still 1000
  
Segment arrives: seq=1000, len=1000
  - In order! Fills first gap
  - Merges with buffered (2000, 2999)
  - Delivers bytes 1000-2999 to app (including buffered 2000-2999)
  - RCV.NXT advances to 3000
  - Buffer: [(4000, 4499)]
  
Segment arrives: seq=3000, len=1500
  - In order (starts at RCV.NXT)
  - Covers bytes 3000-4499, overlapping buffered (4000, 4499)
  - Delivers bytes 3000-4499 to app
  - RCV.NXT advances to 4500
  - Buffer: [] (empty)
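
The walkthrough above can be reproduced with a minimal reassembly sketch based on the sorted-list-of-ranges approach. The class name `Reassembler` and its method are hypothetical, and the sketch makes simplifying assumptions: it tracks only sequence ranges (no payload), ignores sequence wraparound, and does not check the window's right edge.

```python
class Reassembler:
    """Tracks RCV.NXT and a sorted list of buffered out-of-order ranges."""

    def __init__(self, rcv_nxt: int):
        self.rcv_nxt = rcv_nxt
        self.ranges = []  # sorted, non-touching (first, last) pairs, inclusive

    def on_segment(self, seq: int, length: int) -> int:
        """Record a segment; return how many bytes became deliverable."""
        start, end = seq, seq + length - 1
        if end < self.rcv_nxt:
            return 0  # entirely duplicate data
        start = max(start, self.rcv_nxt)  # trim an already-received prefix

        # Merge with every buffered range that overlaps or touches the new one.
        merged = []
        for first, last in self.ranges:
            if last + 1 < start or end + 1 < first:
                merged.append((first, last))
            else:
                start, end = min(start, first), max(end, last)
        merged.append((start, end))
        merged.sort()
        self.ranges = merged

        # If the leading range now begins at RCV.NXT, it is contiguous: deliver it.
        if self.ranges[0][0] == self.rcv_nxt:
            first, last = self.ranges.pop(0)
            self.rcv_nxt = last + 1
            return last - first + 1
        return 0


r = Reassembler(rcv_nxt=1000)
for seq, length in [(2000, 1000), (4000, 500), (1000, 1000), (3000, 1500)]:
    delivered = r.on_segment(seq, length)
    print(seq, "delivered:", delivered, "RCV.NXT:", r.rcv_nxt, "buffer:", r.ranges)
```

The list-based merge is O(n) per segment, which fits the earlier observation that lists suit connections with infrequent reordering.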

SACK Reports Out-of-Order Blocks

The Selective Acknowledgment (SACK) option allows the receiver to report which out-of-order blocks it has received. This helps the sender retransmit only the missing data. Each SACK block is a pair of sequence numbers: the left edge (first byte of the block) and the right edge (the sequence number just past its last byte). TCP's 40-byte options limit allows at most 4 SACK blocks per ACK, and only 3 when the timestamps option is also in use.
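
The option-space limit can be worked out with a little arithmetic. In this sketch the function names are hypothetical. It assumes a SACK option costs 2 bytes of header plus 8 bytes per block, and that the caller passes buffered ranges ordered most recently updated first.

```python
TCP_MAX_OPTION_BYTES = 40  # maximum space for all TCP options in one header


def max_sack_blocks(other_option_bytes: int) -> int:
    """How many 8-byte SACK blocks fit after the 2-byte SACK option header."""
    return max(0, (TCP_MAX_OPTION_BYTES - other_option_bytes - 2) // 8)


def sack_blocks(ooo_ranges, limit):
    """Turn inclusive (first, last) byte ranges into (left, right) edges,
    where the right edge is one past the last byte of the block."""
    return [(first, last + 1) for first, last in ooo_ranges[:limit]]


print(max_sack_blocks(0))   # 4
print(max_sack_blocks(12))  # 3 (timestamps padded to 12 bytes)
print(sack_blocks([(4000, 4499), (2000, 2999)], limit=3))
# [(4000, 4500), (2000, 3000)]
```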

Window Advertisement

The receiver communicates its capacity to the sender through the Window field in the TCP header. This 16-bit field (extended by window scaling) indicates how many bytes beyond the acknowledged byte the receiver can accept.

When to advertise:

The receiver includes a window value in every segment it sends, particularly:

ACK segments: Every acknowledgment carries a window update
Data segments: If the receiver is also sending data, it includes window information
Window updates: Pure window update segments (ACK with no new data, just updated window)

Computing the advertised value:

Advertised_Window = min(RCV.WND, Maximum_Representable_Value)

With window scaling, Maximum_Representable_Value = 65535 × 2^scale_factor. Without scaling, it's just 65535.
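
To make the arithmetic concrete, the sketch below computes the value placed in the 16-bit Window field and the window the sender reconstructs from it. The function names are hypothetical, and `scale` stands for the shift count negotiated with the window scale option (0 when scaling is not in use).

```python
def window_field(rcv_wnd: int, scale: int) -> int:
    """Value written to the 16-bit header field: RCV.WND shifted right, capped."""
    return min(rcv_wnd >> scale, 0xFFFF)


def effective_window(field: int, scale: int) -> int:
    """Window the sender derives by shifting the field back left."""
    return field << scale


field = window_field(1_000_000, scale=7)
print(field)                        # 7812
print(effective_window(field, 7))   # 999936
print(window_field(1_000_000, 0))   # 65535 (capped without scaling)
```

The right shift rounds down, so the sender sees a window slightly smaller than RCV.WND (999936 rather than 1000000 here). Granularity is 2^scale bytes.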

Window shrinking considerations:

A subtle issue: should the receiver ever shrink the window's right edge? Consider:

Receiver advertises window = 10000, starting at ACK = 5000
Sender's window covers bytes 5000-14999
Receiver's buffer fills; available space drops to 3000
Should receiver advertise window = 3000?

This would move the window right edge from 15000 to 8000—a dangerous backward movement. The sender may have already transmitted bytes 8000-14999, which the receiver now says it can't accept.

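RFC 793 strongly discourages shrinking:

The standard strongly discourages moving the right edge of the window backward. Instead, the receiver can:

Let the window close from the left: as RCV.NXT advances, advertise a correspondingly smaller window so the right edge stays where it was
Or keep honoring the previously advertised right edge even if the buffer is tighter than expected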

However, implementations vary, and robust senders must handle window shrinkage gracefully.
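
One way to keep the right edge from retreating is to never advertise less than what remains of the previous promise. The sketch below is an assumption-laden illustration, not a description of any particular stack. The function name is hypothetical, and it presumes the receiver can still find memory for data it previously agreed to accept.

```python
def next_advertised_window(rcv_nxt: int, free_space: int,
                           prev_ack: int, prev_wnd: int) -> int:
    """Pick a window whose right edge is never left of the one already advertised."""
    prev_right_edge = prev_ack + prev_wnd
    still_promised = prev_right_edge - rcv_nxt
    return max(free_space, still_promised)


# The scenario above: window 10000 advertised at ACK 5000, free space drops to 3000.
print(next_advertised_window(5000, 3000, prev_ack=5000, prev_wnd=10000))   # 10000
# After 7000 more bytes arrive, the window has closed from the left.
print(next_advertised_window(12000, 3000, prev_ack=5000, prev_wnd=10000))  # 3000
# Once the application frees space, the right edge may move forward again.
print(next_advertised_window(12000, 5000, prev_ack=5000, prev_wnd=10000))  # 5000
```

In all three calls the right edge (ACK + window) is 15000 or greater, so the sender is never told to take back bytes it was allowed to send.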

Silly Window Syndrome Avoidance

If the receiver advertised every tiny increase in buffer space, the sender might send small segments, leading to poor efficiency. Clark's solution: only advertise a window increase when either (a) the window has grown by at least one MSS, or (b) half the buffer is free. This prevents advertising small windows that would trigger inefficient small segments.
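
The receiver-side check can be sketched in a few lines. This version follows the formulation in RFC 1122, which compares the possible increase against the smaller of one MSS and half the buffer. The function and parameter names are hypothetical.

```python
def should_advertise_increase(advertised: int, available: int,
                              mss: int, buffer_size: int) -> bool:
    """Advertise a larger window only when the increase is worth a full segment
    (or half the buffer, whichever is smaller)."""
    increase = available - advertised
    return increase >= min(mss, buffer_size // 2)


# Window could grow from 2000 to 6000 with MSS 1000: worth announcing.
print(should_advertise_increase(2000, 6000, mss=1000, buffer_size=10000))  # True
# Window could grow by only 500 bytes: keep quiet for now.
print(should_advertise_increase(2000, 2500, mss=1000, buffer_size=10000))  # False
```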

Window Advertisement Best Practices

•Never shrink the right edge: Maintain RCV.NXT + RCV.WND monotonically increasing or stable.
•Apply Silly Window Syndrome prevention: Don't advertise window until it's at least MSS or half-buffer.
•Use window scaling for high-BDP paths: A 16-bit window (max 64KB) is insufficient for modern networks.
•Consider delayed ACKs: Waiting briefly may allow multiple packets to be ACKed together with an updated window.
•Handle memory pressure gracefully: If system memory is low, let the window close as data arrives rather than pulling back a right edge that was already advertised.

Generating Acknowledgments

The receiver generates ACKs to inform the sender of successful reception. ACK generation strategy significantly impacts performance.

What the ACK communicates:

ACK number: "I have received all bytes up to (ACK-1); send me byte ACK next"
Window: "You may send up to [Window] bytes beyond byte [ACK-1]"
SACK blocks (if enabled): "I've also received these out-of-order ranges"

Immediate vs. Delayed ACKs:

RFC 1122 allows ACKs to be delayed to improve efficiency:

Immediate ACKs

•ACK sent for every segment received
•Maximum responsiveness
•Essential for loss detection (duplicate ACKs)
•Higher overhead (more packets)
•Required for out-of-order segments

Delayed ACKs

•ACK after a delay (max 500ms, typically ~40ms)
•Can ACK multiple segments together
•Lower overhead (fewer packets)
•May slightly increase latency
•Combines with data if receiver is sending

RFC 1122 and RFC 5681 guidelines:

Delay at most 500ms: The delay should not exceed 500 milliseconds (many implementations use 40-200ms).
ACK at least every other segment: If two full-sized segments arrive, ACK immediately.
ACK out-of-order segments immediately (RFC 5681): The resulting duplicate ACKs let the sender detect the gap and trigger fast retransmit.
ACK after timeout: Even if no new data arrives, ACK when the delay timer expires.
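
These guidelines amount to a small decision procedure, sketched below. The function name and its three inputs are hypothetical simplifications. A real stack tracks more state, such as whether it has data of its own to send that the ACK can ride on.

```python
def ack_decision(in_order: bool, full_segments_pending: int,
                 delay_timer_expired: bool) -> str:
    """Decide whether to acknowledge now or keep delaying."""
    if not in_order:
        return "ack now"   # out-of-order arrival: send a duplicate ACK at once
    if full_segments_pending >= 2:
        return "ack now"   # at least every second full-sized segment
    if delay_timer_expired:
        return "ack now"   # never hold an ACK past the delay limit
    return "delay"         # wait for more data or for the timer


print(ack_decision(True, 1, False))   # delay
print(ack_decision(True, 2, False))   # ack now
print(ack_decision(False, 0, False))  # ack now
print(ack_decision(True, 1, True))    # ack now
```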

The duplicate ACK mechanism:

When an out-of-order segment arrives, the receiver:

Sends an ACK with the same ACK number as before (RCV.NXT hasn't changed)
This is a "duplicate ACK" from the sender's perspective
Multiple duplicate ACKs signal loss to the sender
Three duplicate ACKs trigger fast retransmit
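
Seen from the sender, the mechanism is a simple counter. The sketch below is deliberately simplified and its class name is hypothetical. It treats any ACK that repeats the previous ACK number as a duplicate, whereas real senders also require that the ACK carries no data, leaves the window unchanged, and arrives while data is outstanding.

```python
class DupAckCounter:
    """Counts repeated ACK numbers and signals when the threshold is reached."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.last_ack = None
        self.duplicates = 0

    def on_ack(self, ack: int) -> bool:
        """Return True exactly when this ACK should trigger fast retransmit."""
        if ack == self.last_ack:
            self.duplicates += 1
            return self.duplicates == self.threshold
        self.last_ack = ack   # new cumulative ACK: reset the count
        self.duplicates = 0
        return False


counter = DupAckCounter()
print([counter.on_ack(a) for a in [1000, 1000, 1000, 1000]])
# [False, False, False, True]
```

The first ACK for 1000 is the original. The next three are duplicates, and the third duplicate triggers the retransmission of the segment starting at byte 1000.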

Zero Window Condition

When the receive buffer is completely full—typically because the application isn't reading data—the receiver advertises a zero window. This is a normal (if undesirable) condition that TCP handles gracefully.

How zero window arises:

Data arrives faster than the application reads
Receive buffer fills up
RCV.WND drops to 0
Receiver advertises window = 0
Sender stops transmitting (must respect the window)

Recovery from zero window:

Application reads data from buffer
Buffer space frees up
Receiver can now advertise window > 0
Problem: How does the receiver tell the sender?

The receiver's options for window reopening:

Window Reopening Strategies

•Wait for sender's probe: Sender's persist timer will trigger a window probe; respond with updated window
•Send unsolicited ACK: Proactively send an ACK with the new window value (window update)
•Combine with outgoing data: If the receiver is also sending, include the window update

Window update segments:

A window update is an ACK segment with:

ACK number = RCV.NXT (no new data acknowledged)
Window = new, larger value
No data payload

This segment serves only to inform the sender that the window has reopened.

Why window updates can fail:

Window updates are "pure ACKs" with no data. They are not retransmitted if lost:

The sender never knows it was sent
Without the update, sender waits for persist timer
This adds latency but doesn't cause data loss

This is why the sender's persist timer is essential—it recovers from lost window updates.

Zero Window vs. Connection Problems

A zero window is not a connection failure; it's normal flow control. The connection remains open, and transmission will resume when the receiver advertises space. However, zero windows that last minutes or longer often indicate application problems: the application isn't reading data, perhaps due to a hung process or a stuck loop. Monitoring zero window duration is valuable for detecting application issues.

Receiver behavior during zero window:

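Respond to window probes (typically 1 byte) with an ACK that carries the current RCV.NXT and window; the probe byte itself is accepted only if space has opened up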
Continue processing data already in buffer
Deliver in-order data to application when possible
Update window when buffer space frees

The receiver should not simply ignore the sender; it must respond to probes to prevent permanent deadlock.
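
A receiver's reaction to a window probe might be sketched as follows. The function name and the dictionary it returns are hypothetical, and the sketch assumes the probe carries new in-order data starting at `probe_seq`.

```python
def handle_probe(rcv_nxt: int, free_space: int,
                 probe_seq: int, probe_len: int = 1) -> dict:
    """Always answer a probe; accept its byte only if it is in order and fits."""
    if probe_seq == rcv_nxt and free_space >= probe_len:
        rcv_nxt += probe_len
        free_space -= probe_len
    # The reply reports the current state, so the sender learns the window either way.
    return {"ack": rcv_nxt, "window": free_space}


# Buffer still full: the probe byte is refused, but the reply confirms window = 0.
print(handle_probe(rcv_nxt=9000, free_space=0, probe_seq=9000))
# {'ack': 9000, 'window': 0}

# Space has opened: the byte is accepted and the new window is announced.
print(handle_probe(rcv_nxt=9000, free_space=4000, probe_seq=9000))
# {'ack': 9001, 'window': 3999}
```

Either reply keeps the exchange alive, which is what prevents the deadlock described above.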

Application Delivery

The ultimate purpose of the receive buffer is to deliver data to the application. This process affects window behavior and overall throughput.

Delivery semantics:

TCP delivers a stream of bytes to the application:

No message boundaries (application must implement framing if needed)
Always in order (out-of-order bytes wait for gaps to fill)
Reliable (every byte delivered exactly once, or connection fails)
No duplicates (even if network duplicates occurred)
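
Because TCP preserves no message boundaries, applications that exchange discrete messages need their own framing. The sketch below shows one common approach: a 4-byte big-endian length prefix. The names `encode_frame` and `FrameDecoder` are hypothetical, and the prefix format is an illustrative choice, not something TCP prescribes.

```python
import struct


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(payload)) + payload


class FrameDecoder:
    """Reassembles frames from arbitrarily split chunks of the byte stream."""

    def __init__(self):
        self.pending = bytearray()

    def feed(self, chunk: bytes) -> list:
        self.pending += chunk
        frames = []
        while len(self.pending) >= 4:
            (length,) = struct.unpack_from("!I", self.pending, 0)
            if len(self.pending) < 4 + length:
                break  # frame incomplete: wait for more bytes
            frames.append(bytes(self.pending[4:4 + length]))
            del self.pending[:4 + length]
        return frames


stream = encode_frame(b"hello") + encode_frame(b"world!")
decoder = FrameDecoder()
received = []
for i in range(0, len(stream), 3):  # simulate reads that return 3 bytes at a time
    received.extend(decoder.feed(stream[i:i + 3]))
print(received)  # [b'hello', b'world!']
```

The decoder yields the same messages no matter how the stream is chopped up by the network or by the sizes passed to read().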

The read system call:

When the application calls read() (or recv()):

TCP copies bytes from receive buffer to application buffer
Copied bytes are removed from receive buffer
Buffer space becomes available for new data
If enough space freed, receiver may advertise larger window

Application Read Impact on Window
Scenario: Receive buffer = 10000 bytes, MSS = 1000 bytes
 
Initial state:
  - Buffer: 8000 bytes used (data waiting for app)
  - RCV.WND = 2000 bytes (available space)
  - Sender is constrained by small window
 
Application calls read(4000):
  - 4000 bytes moved from buffer to application
  - Buffer: 4000 bytes used
  - RCV.WND = 6000 bytes (more space!)
  
Should receiver send window update?
  - Old window = 2000, New window = 6000
  - Increase = 4000 bytes (4 MSS)
  - Clark's SWS solution: Update if >= 1 MSS or >= half buffer
  - 4000 >= 1000 (MSS), so yes, send update
  
Receiver sends: ACK=X, Window=6000
Sender receives update, can now send more data

Push (PSH) flag handling:

The PSH flag tells the receiver to deliver buffered data to the application promptly rather than waiting for more. When TCP receives a segment with PSH set:

Add the data to the receive buffer (in order)
If contiguous data available, wake up the application
Deliver data immediately (don't wait for buffer to fill)

Blocking vs. non-blocking read:

Mode	Buffer Empty Behavior	Buffer Partially Full Behavior
Blocking	Block until data arrives	Return whatever is available
Non-blocking	Return immediately with error (EAGAIN)	Return whatever is available

In both cases, TCP delivers whatever contiguous data is available—it never delivers out-of-order or waits for more data when some is ready.
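
The behaviors in the table can be observed with Python's standard `socket` module. This sketch uses `socket.socketpair()` as a stand-in for a connected TCP socket, an assumption made so the example runs without a network. The `recv()` semantics shown are the same for a stream socket. Python reports EAGAIN as `BlockingIOError`.

```python
import select
import socket

sender, receiver = socket.socketpair()
receiver.setblocking(False)

# Buffer empty, non-blocking mode: recv() fails immediately instead of waiting.
try:
    receiver.recv(4096)
    empty_result = "data"
except BlockingIOError:
    empty_result = "would block"
print(empty_result)  # would block

# Buffer partially full: recv() returns what is there, not the 4096 bytes requested.
sender.sendall(b"hello")
select.select([receiver], [], [], 1.0)  # wait until the data is readable
data = receiver.recv(4096)
print(data)  # b'hello'

sender.close()
receiver.close()
```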

Application Read Performance Impact

Applications that read data slowly constrain TCP throughput. The window shrinks, the sender throttles, and throughput drops. For maximum performance: (1) read data promptly, (2) use large read buffers to reduce syscall overhead, and (3) consider using async I/O to overlap reading with processing.

Summary: The Receive Window

We've explored the receive window in depth—the receiver's mechanism for managing incoming data and advertising capacity. Let's consolidate the key concepts:

Key Takeaways

•RCV.NXT and RCV.WND: The next expected byte and advertised window define the receiver's acceptable sequence range.
•Receive buffer: Holds data from network arrival to application consumption. Window = available buffer space.
•Out-of-order handling: Modern receivers buffer out-of-order segments for efficiency; RCV.NXT only advances on contiguous data.
•Window advertisement: Included in every ACK; should avoid shrinking the right edge; subject to Silly Window Syndrome prevention.
•ACK generation: Delayed ACKs improve efficiency; immediate ACKs needed for out-of-order to trigger fast retransmit.
•Zero window: Normal flow control when buffer is full; persist timer at sender prevents deadlock.
•Application delivery: Only in-order bytes are delivered; slow applications constrain throughput via window shrinkage.

What's next:

We've examined both send and receive windows. However, there's a critical limitation: the 16-bit window field in the TCP header limits windows to 65,535 bytes. For modern high-bandwidth, high-latency networks, this is woefully inadequate. Next, we'll explore Window Scaling—the TCP option that extends window sizes to match contemporary network demands.

Page Complete

You now understand the receive window mechanism: how the receiver manages its buffer, advertises capacity, handles out-of-order data, and delivers to applications. This knowledge is essential for understanding TCP performance characteristics and diagnosing throughput issues. Next, we'll explore window scaling for high-performance networks.

3 / 5

Loading learning content...

Computer NetworksTCP Flow & Congestion Control

Sliding Window

LevelIntermediate

Duration60 mins

TopicTCP Flow & Congestion Control

3 / 5

Receive Window

The Receiver's Responsibility

What You Will Learn

Receive Window State Variables

Just as the sender maintains state for transmission, the receiver maintains state for reception and window advertisement. RFC 793 specifies these variables:

The Receive Sequence Space:

RFC 793 Receive Sequence Variables
Variable	Full Name	Description
`RCV.NXT`	Receive Next	The next expected sequence number. Bytes before this have been received and acknowledged.
`RCV.WND`	Receive Window	The window size the receiver is willing to accept. Advertised to the sender.
`RCV.UP`	Receive Urgent Pointer	Points to urgent data if URG flag is set.
`IRS`	Initial Receive Sequence Number	The sequence number of the first byte received (from the sender's ISS).

Understanding RCV.NXT:

RCV.NXT is the cornerstone of receive-side TCP. It represents:

Next expected byte: The receiver expects a segment with this sequence number next
Cumulative acknowledgment: All bytes before RCV.NXT have been received without gaps
In-order delivery point: Bytes up to RCV.NXT - 1 can be delivered to the application
ACK value: The ACK number in outgoing segments equals RCV.NXT

The receive window boundaries:

Acceptable sequence numbers: [RCV.NXT, RCV.NXT + RCV.WND)

Bytes with sequence numbers < RCV.NXT are duplicates (already received)
Bytes with sequence numbers ≥ RCV.NXT + RCV.WND are beyond the window (rejected)
Bytes in the acceptable range are buffered, even if out of order

RCV.NXT Only Advances on Contiguous Data

Receive Sequence Space Visualization
Receive Buffer State:
                                
Byte positions:  0    1000   2000   3000   4000   5000   6000   7000   8000
                 |------|------|------|------|------|------|------|------|
 
Scenario: Window = 5000, received bytes 0-999 and 2000-3999 (out of order)
 
     [==DELIVERED==][===GAP===][=OUT OF ORDER=][=======WINDOW=======][BEYOND]
     |<-- 0-999 -->|<1000-1999>|<- 2000-3999 ->|<-- 4000-7999 ----->| 8000+
                   ^           ^               ^                    ^
                   │           │               │                    │
               RCV.NXT=1000   Hole          Buffered           RCV.NXT + RCV.WND
               
ACK value = RCV.NXT = 1000 (indicating "give me byte 1000")
Window advertised = RCV.WND (5000 in this example)
 
When bytes 1000-1999 arrive:
- Gap filled, RCV.NXT advances to 4000 (skipping buffered 2000-3999)
- All 4000 bytes (0-3999) can be delivered to application
- ACK value becomes 4000

The Receive Buffer

The receive buffer is the memory region that holds incoming data between network arrival and application consumption. Its management is central to receive window behavior.

Buffer organization:

The receive buffer is logically organized into regions:

Receive Buffer Organization
Receive Buffer Memory Layout:
 
┌─────────────────────────────────────────────────────────────────────────────┐
│                            RECEIVE BUFFER                                   │
├──────────────────┬────────────────────────────────────────┬─────────────────┤
│   [FREE SPACE]   │     [PENDING DATA]                     │  [AVAILABLE]    │
│  (recycled from  │   (received, not yet read by app)      │   (window)      │
│   app reads)     │                                        │                 │
├──────────────────┼────────────────────────────────────────┼─────────────────┤
│  Can receive     │  Holding data for                      │  Can accept     │
│  more data       │  application                           │  new data       │
│  indirectly      │                                        │                 │
└──────────────────┴────────────────────────────────────────┴─────────────────┘
 
The PENDING DATA region may contain:
┌─────────────────────────────┬──────────┬──────────────────┐
│   IN-ORDER (deliverable)    │   GAP    │   OUT-OF-ORDER   │
│   (bytes up to RCV.NXT-1)   │  (hole)  │   (future bytes) │
├─────────────────────────────┼──────────┼──────────────────┤
│   Ready for app read()      │  Empty   │   Waiting for    │
│                             │          │   gap to fill    │
└─────────────────────────────┴──────────┴──────────────────┘

Window calculation:

The receive window advertised to the sender is calculated based on available buffer space:

RCV.WND = Buffer_Size - (Bytes_Buffered_Not_Yet_Read_By_App)

More precisely:

RCV.WND = Buffer_Size - (Highest_Sequence_Received - Last_Sequence_Read_By_App)

Out-of-Order Data Consumes Buffer Space

Buffer size configuration:

The receive buffer size is configured at the socket level, often with OS defaults that may need adjustment for high-performance scenarios:

Platform	Configuration Method	Typical Default	Maximum
Linux	`setsockopt(SO_RCVBUF)` or sysctl	87 KB - 6 MB (auto-tuned)	16+ MB configurable
Windows	`setsockopt(SO_RCVBUF)` or registry	64 KB - 1 MB	16+ MB configurable
macOS	`setsockopt(SO_RCVBUF)` or sysctl	128 KB - 4 MB	Limited by sysctl

Auto-tuning:

Modern operating systems implement TCP receive buffer auto-tuning:

Start with a small buffer
Observe actual usage patterns (RTT, throughput)
Grow buffer as needed up to a maximum
Shrink if memory pressure requires

This balances performance with memory efficiency across diverse connections.

Out-of-Order Segment Handling

The Internet provides no ordering guarantees. Packets can take different paths, experience different delays, and arrive out of order. The receiver must handle this gracefully.

RFC 793's original guidance:

Modern receiver behavior:

Modern TCP implementations buffer out-of-order segments:

Accept if within window: Any segment with sequence number in [RCV.NXT, RCV.NXT + RCV.WND) is accepted
Buffer in reassembly queue: Out-of-order segments are stored in a data structure (often a list or tree of sequence ranges)
Coalesce when possible: When the missing gap arrives, merge contiguous ranges
Deliver to application: Only contiguous data from RCV.NXT can be delivered

Reassembly data structures:

Implementations typically use one of these data structures for out-of-order segment management:

Data Structure	Insertion	Gap Query	Coalescing	Memory
Linked list of ranges	O(n)	O(n)	O(1) merge	Minimal per-range
Red-black tree	O(log n)	O(log n)	O(log n) merge	Nodes + data
Segment bitmap	O(1)	O(1)	O(byte range)	High for large windows

The linked list approach is common for connections with infrequent reordering. Trees scale better for connections with persistent reordering.

Out-of-Order Reassembly Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Initial state: RCV.NXT = 1000, no buffered data
 
Segment arrives: seq=2000, len=1000
  - Out of order (expected 1000, got 2000)
  - Buffer: [(2000, 2999)]
  - RCV.NXT still 1000
  
Segment arrives: seq=4000, len=500
  - Out of order
  - Buffer: [(2000, 2999), (4000, 4499)]
  - RCV.NXT still 1000
  
Segment arrives: seq=1000, len=1000
  - In order! Fills first gap
  - Buffer: [(2000, 2999), (4000, 4499)]
  - Delivers bytes 1000-2999 to app (including buffered 2000-2999)
  - RCV.NXT advances to 3000
  
Segment arrives: seq=3000, len=1500
  - In order (starts at RCV.NXT)
  - Overlaps and extends past buffered (4000, 4499)
  - Delivers bytes 3000-4499 to app
  - RCV.NXT advances to 4500
  - Buffer: [] (empty)

SACK Reports Out-of-Order Blocks

Window Advertisement

When to advertise:

The receiver includes a window value in every segment it sends, particularly:

ACK segments: Every acknowledgment carries a window update
Data segments: If the receiver is also sending data, it includes window information
Window updates: Pure window update segments (ACK with no new data, just updated window)

Computing the advertised value:

Advertised_Window = min(RCV.WND, Maximum_Representable_Value)

With window scaling, Maximum_Representable_Value = 65535 × 2^scale_factor. Without scaling, it's just 65535.

Window shrinking considerations:

A subtle issue: should the receiver ever shrink the window's right edge? Consider:

Receiver advertises window = 10000, starting at ACK = 5000
Sender's window covers bytes 5000-14999
Receiver's buffer fills; available space drops to 3000
Should receiver advertise window = 3000?

This would move the window right edge from 15000 to 8000—a dangerous backward movement. The sender may have already transmitted bytes 8000-14999, which the receiver now says it can't accept.

RFC 793 advises against shrinking:

The standard recommends the receiver not shrink the right edge of the window. Instead:

Delay sending ACKs until buffer space frees
Or advertise the same window even if buffer is tighter

However, implementations vary, and robust senders must handle window shrinkage gracefully.

Silly Window Syndrome Avoidance

Window Advertisement Best Practices

•Never shrink the right edge: Maintain RCV.NXT + RCV.WND monotonically increasing or stable.
•Apply Silly Window Syndrome prevention: Don't advertise window until it's at least MSS or half-buffer.
•Use window scaling for high-BDP paths: A 16-bit window (max 64KB) is insufficient for modern networks.
•Consider delayed ACKs: Waiting briefly may allow multiple packets to be ACKed together with an updated window.
•Handle memory pressure gracefully: If system memory is low, advertise a smaller (but stable) window.

Generating Acknowledgments

The receiver generates ACKs to inform the sender of successful reception. ACK generation strategy significantly impacts performance.

What the ACK communicates:

ACK number: "I have received all bytes up to (ACK-1); send me byte ACK next"
Window: "You may send up to [Window] bytes beyond byte [ACK-1]"
SACK blocks (if enabled): "I've also received these out-of-order ranges"

Immediate vs. Delayed ACKs:

RFC 1122 allows ACKs to be delayed to improve efficiency:

Immediate ACKs

•ACK sent for every segment received
•Maximum responsiveness
•Essential for loss detection (duplicate ACKs)
•Higher overhead (more packets)
•Required for out-of-order segments

Delayed ACKs

•ACK after a delay (max 500ms, typically ~40ms)
•Can ACK multiple segments together
•Lower overhead (fewer packets)
•May slightly increase latency
•Combines with data if receiver is sending

RFC 1122 guidelines:

Delay at most 500ms: The delay should not exceed 500 milliseconds (many implementations use 40-200ms).
ACK at least every other segment: If two full-sized segments arrive, ACK immediately.
ACK out-of-order segments immediately: This triggers fast retransmit if a gap is detected.
ACK after timeout: Even if no new data arrives, ACK when the delay timer expires.

The duplicate ACK mechanism:

When an out-of-order segment arrives, the receiver:

Sends an ACK with the same ACK number as before (RCV.NXT hasn't changed)
This is a "duplicate ACK" from the sender's perspective
Multiple duplicate ACKs signal loss to the sender
Three duplicate ACKs trigger fast retransmit

Converting Mermaid diagram...

Zero Window Condition

How zero window arises:

Data arrives faster than the application reads
Receive buffer fills up
RCV.WND drops to 0
Receiver advertises window = 0
Sender stops transmitting (must respect the window)

Recovery from zero window:

Application reads data from buffer
Buffer space frees up
Receiver can now advertise window > 0
Problem: How does the receiver tell the sender?

The receiver's options for window reopening:

Window Reopening Strategies

•Wait for sender's probe: Sender's persist timer will trigger a window probe; respond with updated window
•Send unsolicited ACK: Proactively send an ACK with the new window value (window update)
•Combine with outgoing data: If the receiver is also sending, include the window update

Window update segments:

A window update is an ACK segment with:

ACK number = RCV.NXT (no new data acknowledged)
Window = new, larger value
No data payload

This segment serves only to inform the sender that the window has reopened.

Why window updates can fail:

Window updates are "pure ACKs" with no data. They occupy no sequence space, so they are never acknowledged or retransmitted. If one is lost:

The receiver has no way to detect the loss
The sender still believes the window is zero and waits for its persist timer
This adds latency but doesn't cause data loss

This is why the sender's persist timer is essential—it recovers from lost window updates.
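The zero window cycle, including a lost window update, can be walked through with a toy receiver model. This is a sketch under simplifying assumptions: all data arrives in order, the window equals the free buffer space, and the class and method names are hypothetical.

```python
class Receiver:
    """Toy receiver: the window equals free buffer space, data is in order."""

    def __init__(self, capacity: int, rcv_nxt: int = 0) -> None:
        self.capacity = capacity
        self.buffered = 0
        self.rcv_nxt = rcv_nxt

    @property
    def window(self) -> int:
        return self.capacity - self.buffered

    def on_data(self, length: int) -> tuple[int, int]:
        """Accept what fits; return (ACK number, advertised window)."""
        accepted = min(length, self.window)
        self.buffered += accepted
        self.rcv_nxt += accepted
        return self.rcv_nxt, self.window

    def app_read(self, length: int) -> tuple[int, int]:
        """Application drains the buffer; return the window update."""
        self.buffered -= min(length, self.buffered)
        return self.rcv_nxt, self.window  # same ACK number, larger window

    def on_probe(self) -> tuple[int, int]:
        """A one-byte probe always gets an ACK with the current window."""
        return self.on_data(1)


r = Receiver(capacity=4000)
print(r.on_data(4000))   # buffer full: window = 0
print(r.on_probe())      # probe byte rejected, window still 0
update = r.app_read(3000)  # suppose this window update is lost in transit
print(r.on_probe())      # the next probe learns the reopened window
```

Even though the unsolicited update never arrives, the reply to the next persist-timer probe carries the reopened window.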


Receiver behavior during zero window:

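Respond to every window probe (1 byte) with an ACK carrying the current RCV.NXT and window; the probe byte itself is accepted only once buffer space has opened up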
Continue processing data already in buffer
Deliver in-order data to application when possible
Update window when buffer space frees

The receiver should not simply ignore the sender; it must respond to probes to prevent permanent deadlock.

Application Delivery

The ultimate purpose of the receive buffer is to deliver data to the application. This process affects window behavior and overall throughput.

Delivery semantics:

TCP delivers a stream of bytes to the application:

No message boundaries (application must implement framing if needed)
Always in order (out-of-order bytes wait for gaps to fill)
Reliable (every byte delivered exactly once, or connection fails)
No duplicates (even if network duplicates occurred)
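Because the stream carries no message boundaries, applications typically add their own framing. One common approach, shown here as an illustrative sketch, is a 4-byte big-endian length prefix. The function names are hypothetical.

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def frame(message: bytes) -> bytes:
    """Prefix a message with its length."""
    return HEADER.pack(len(message)) + message


def extract_messages(buffer: bytearray) -> list[bytes]:
    """Remove and return every complete message at the front of buffer.

    Incomplete trailing bytes stay in buffer until more data arrives.
    """
    messages = []
    while len(buffer) >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, 0)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # the rest of this message has not arrived yet
        messages.append(bytes(buffer[HEADER.size:end]))
        del buffer[:end]
    return messages


# Simulate recv() returning arbitrary 7-byte chunks of the stream.
stream = frame(b"hello") + frame(b"world!")
pending = bytearray()
for i in range(0, len(stream), 7):
    pending += stream[i:i + 7]
    print(extract_messages(pending))
```

The chunk boundaries deliberately do not line up with the messages, which mirrors how `recv()` may return any prefix of the bytes currently available.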

The read system call:

When the application calls read() (or recv()):

TCP copies bytes from receive buffer to application buffer
Copied bytes are removed from receive buffer
Buffer space becomes available for new data
If enough space freed, receiver may advertise larger window

Application Read Impact on Window
Scenario: Receive buffer = 10000 bytes, MSS = 1000 bytes
 
Initial state:
  - Buffer: 8000 bytes used (data waiting for app)
  - RCV.WND = 2000 bytes (available space)
  - Sender is constrained by small window
 
Application calls read(4000):
  - 4000 bytes moved from buffer to application
  - Buffer: 4000 bytes used
  - RCV.WND = 6000 bytes (more space!)
  
Should receiver send window update?
  - Old window = 2000, New window = 6000
  - Increase = 4000 bytes (4 MSS)
  - Clark's SWS solution: Update if >= 1 MSS or >= half buffer
  - 4000 >= 1000 (MSS), so yes, send update
  
Receiver sends: ACK=X, Window=6000
Sender receives update, can now send more data
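The update decision in this scenario can be expressed as a one-line check. This is a sketch of the rule as stated in the scenario: "at least 1 MSS or at least half the buffer" is equivalent to comparing the increase against the smaller of the two. The function name is illustrative.

```python
def should_send_window_update(old_window: int, new_window: int,
                              mss: int, buffer_size: int) -> bool:
    """Receiver-side SWS avoidance: only announce a meaningful increase."""
    increase = new_window - old_window
    return increase >= min(mss, buffer_size // 2)


# Scenario values: window grows from 2000 to 6000, MSS 1000, buffer 10000
print(should_send_window_update(2000, 6000, mss=1000, buffer_size=10000))
# A small read that frees only 500 bytes would not justify an update
print(should_send_window_update(2000, 2500, mss=1000, buffer_size=10000))
```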

Push (PSH) flag handling:

The PSH flag tells the receiver to deliver buffered data to the application promptly rather than waiting for more. When TCP receives a segment with PSH set:

Add the data to the receive buffer (in order)
If contiguous data available, wake up the application
Deliver data immediately (don't wait for buffer to fill)

Blocking vs. non-blocking read:

Mode	Buffer Empty Behavior	Buffer Partially Full Behavior
Blocking	Block until data arrives	Return whatever is available
Non-blocking	Return immediately with error (EAGAIN)	Return whatever is available

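In both cases, TCP by default delivers whatever contiguous data is available. It never delivers out-of-order bytes, and it does not wait for more data when some is ready unless the application explicitly asks for that (for example with MSG_WAITALL or SO_RCVLOWAT).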
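The non-blocking row of the table can be observed with a local socket pair. This is a small sketch that uses `socket.socketpair()` as a stand-in for a real TCP connection; the function name is illustrative. Python surfaces EAGAIN as `BlockingIOError`.

```python
import select
import socket


def read_modes_demo() -> tuple[str, bytes]:
    """Show a non-blocking read on an empty buffer, then a partial read."""
    writer, reader = socket.socketpair()
    try:
        reader.setblocking(False)
        try:
            reader.recv(1024)       # nothing has been sent yet
            empty_result = "data"
        except BlockingIOError:     # EAGAIN / EWOULDBLOCK
            empty_result = "EAGAIN"

        writer.sendall(b"abc")
        # Wait (up to 1 s) until the bytes are readable, then ask for
        # far more than is available: recv returns only what is there.
        select.select([reader], [], [], 1.0)
        partial = reader.recv(1024)
        return empty_result, partial
    finally:
        writer.close()
        reader.close()


print(read_modes_demo())
```

Requesting 1024 bytes when only 3 are buffered returns those 3 bytes rather than waiting for the rest.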


Summary: The Receive Window

We've explored the receive window in depth—the receiver's mechanism for managing incoming data and advertising capacity. Let's consolidate the key concepts:

Key Takeaways

•RCV.NXT and RCV.WND: The next expected byte and advertised window define the receiver's acceptable sequence range.
•Receive buffer: Holds data from network arrival to application consumption. Window = available buffer space.
•Out-of-order handling: Modern receivers buffer out-of-order segments for efficiency; RCV.NXT only advances on contiguous data.
•Window advertisement: Included in every ACK; should avoid shrinking the right edge; subject to Silly Window Syndrome prevention.
•ACK generation: Delayed ACKs improve efficiency; immediate ACKs needed for out-of-order to trigger fast retransmit.
•Zero window: Normal flow control when buffer is full; persist timer at sender prevents deadlock.
•Application delivery: Only in-order bytes are delivered; slow applications constrain throughput via window shrinkage.
