TCP's flow control mechanism is elegantly simple: the receiver advertises how much data it can accept in its receive buffer using the window size field. When the receiver's buffer fills up (perhaps the application isn't reading data fast enough), it advertises a zero window, telling the sender to stop transmitting. The sender pauses and waits for a window update.
But what happens if that window update is lost?
The sender waits for a window update that will never arrive. The receiver waits for data that the sender is too polite to send. Neither side takes action. The connection is frozen—deadlocked. Without intervention, this impasse could last forever.
This is where the persistence timer enters the picture. It's TCP's insurance policy against the zero-window deadlock, ensuring that connections can always recover from lost window updates.
By the end of this page, you will understand:

- How the zero-window deadlock occurs
- How the persistence timer detects and breaks the deadlock using window probes
- The relationship between persistence timing and exponential backoff
- Silly Window Syndrome and its connection to zero-window situations
- Practical debugging of persistence timer behavior
To understand the persistence timer, we must first understand the problem it solves. Let's trace through the sequence of events that leads to deadlock:
Initial State:
The Sequence Leading to Deadlock:
```
Timeline: Zero-Window Deadlock Development

Sender (Host A)                                Receiver (Host B)
      |                                              |
      |------- Data segments (64KB total) --------->| ← Receiver buffer fills
      |                                              |   Application busy, not reading
      |<-------- ACK, Window = 0 -------------------| ← Buffer full!
      |                                              |
      | [Sender pauses, waiting for window > 0]      |
      |                                              |
      |               [Time passes...]               |
      |                                              |   Application reads 16KB
      |<-------- ACK, Window = 16384 ---------------| ← Window update sent!
      |        ╲                                     |
      |         ╲  THIS PACKET IS LOST               |
      |          ╲ (router congestion)               |
      |           💀                                  |
      |                                              |
      | [Sender still waiting for window > 0]        | [Receiver waiting for data]
      | [Sender thinks: "Buffer is still full"]      | [Receiver thinks: "I told them
      | [Will wait forever...]                       |  I have space..."]
      |                                              |
      |              🔒 DEADLOCK 🔒                   |
      |                                              |
      |        Neither side sends anything.          |
      |        Connection is stuck forever.          |
```

Why This Is a Genuine Problem:
You might wonder why TCP's retransmission timer doesn't solve this. The answer is crucial: the retransmission timer only applies to data segments. When the sender receives a zero-window ACK, it stops sending data. With no data in flight, there's nothing to time out. The sender is correctly following the flow control protocol by waiting for permission to send.
Similarly, the receiver has done everything correctly. It sent a window update when space became available. TCP provides reliable delivery for data, but pure ACK segments (which window updates often are) are not themselves reliably delivered—they're just best-effort.
This asymmetry creates the deadlock potential.
A common misconception is that everything in TCP is reliable. In fact, only data is reliably delivered. Acknowledgments themselves can be lost without triggering retransmission—the next ACK will implicitly cover the lost one. But a window update ACK is special: if it's lost and no data follows, there's no "next ACK" to fix the problem.
The solution is beautifully simple: periodically probe the receiver to check if the window is still zero. This is the role of the persistence timer.
When the sender receives a zero-window ACK, it starts the persistence timer. When this timer expires, the sender transmits a window probe—a special segment designed to elicit a response from the receiver.
The window probe forces the receiver to respond with its current window size. If the window is still zero, the sender knows to wait longer. If the window has opened (meaning a previous window update was lost), the sender discovers this and can resume transmission.
Window Probe Characteristics:
"""TCP Persistence Timer Implementation This module demonstrates the persistence timer mechanism that preventsdeadlock when the receiver advertises a zero window.""" from dataclasses import dataclassfrom typing import Callable, Optionalfrom enum import Enumimport time class WindowState(Enum): OPEN = "open" # Normal transmission allowed ZERO = "zero" # Window closed, probing active PROBING = "probing" # Currently sending probe @dataclassclass PersistenceTimerConfig: """Configuration for persistence timer behavior.""" initial_probe_interval: float = 1.0 # First probe after 1 second max_probe_interval: float = 60.0 # Maximum probe interval backoff_multiplier: float = 2.0 # Exponential backoff factor class TCPPersistenceTimer: """ Implements TCP persistence timer mechanism. The persistence timer activates when a zero-window is received, sending periodic probes to detect window reopening. """ def __init__(self, config: Optional[PersistenceTimerConfig] = None): self.config = config or PersistenceTimerConfig() self.window_state = WindowState.OPEN self.current_probe_interval = self.config.initial_probe_interval self.probe_count = 0 self.timer_started_at: Optional[float] = None self.last_probe_sent_at: Optional[float] = None def window_update_received(self, window_size: int): """ Called when an ACK with window size is received. Args: window_size: The advertised window size from the receiver """ if window_size > 0: # Window opened - stop persistence timer if self.window_state != WindowState.OPEN: print(f"✓ Window opened (size={window_size}). " f"Probes sent: {self.probe_count}") self._reset_timer() self.window_state = WindowState.OPEN else: # Zero window received if self.window_state == WindowState.OPEN: # Transition to zero-window state, start persistence timer print(f"⚠ Zero window received! 
Starting persistence timer.") print(f" First probe will be sent in {self.config.initial_probe_interval}s") self._start_timer() else: # Still in zero-window, continue probing self._schedule_next_probe() self.window_state = WindowState.ZERO def _start_timer(self): """Initialize persistence timer state.""" self.timer_started_at = time.time() self.current_probe_interval = self.config.initial_probe_interval self.probe_count = 0 def _reset_timer(self): """Reset persistence timer state.""" self.timer_started_at = None self.last_probe_sent_at = None self.current_probe_interval = self.config.initial_probe_interval self.probe_count = 0 def _schedule_next_probe(self): """Calculate and schedule the next probe with exponential backoff.""" # Apply exponential backoff self.current_probe_interval = min( self.current_probe_interval * self.config.backoff_multiplier, self.config.max_probe_interval ) print(f" Window still zero. Next probe in {self.current_probe_interval:.1f}s") def should_send_probe(self, current_time: float) -> bool: """ Check if a window probe should be sent. Args: current_time: Current time for comparison Returns: True if a probe should be sent """ if self.window_state != WindowState.ZERO: return False reference_time = self.last_probe_sent_at or self.timer_started_at if reference_time is None: return False return (current_time - reference_time) >= self.current_probe_interval def probe_sent(self, current_time: float): """Record that a window probe was sent.""" self.last_probe_sent_at = current_time self.probe_count += 1 print(f"→ Window probe #{self.probe_count} sent") def get_statistics(self) -> dict: """Return current timer statistics.""" return { "state": self.window_state.value, "probes_sent": self.probe_count, "current_interval": self.current_probe_interval, "max_interval": self.config.max_probe_interval, } def demonstrate_persistence_timer(): """ Simulate a zero-window scenario and persistence timer operation. 
""" print("=" * 70) print("TCP Persistence Timer Demonstration") print("=" * 70) print() timer = TCPPersistenceTimer(PersistenceTimerConfig( initial_probe_interval=1.0, max_probe_interval=60.0, backoff_multiplier=2.0, )) # Scenario: Normal operation, then zero window print("PHASE 1: Normal Operation") print("-" * 40) timer.window_update_received(65535) # Normal 64KB window print(f"State: {timer.get_statistics()['state']}") print() print("PHASE 2: Zero Window Received") print("-" * 40) timer.window_update_received(0) # Buffer full! print() print("PHASE 3: Probe Schedule (simulated)") print("-" * 40) # Simulate probe schedule intervals = [] interval = 1.0 for i in range(8): intervals.append(interval) print(f"Probe {i+1}: after {interval:.1f}s (cumulative: {sum(intervals):.1f}s)") interval = min(interval * 2, 60.0) print() print(f"After 8 probes: ~{sum(intervals)/60:.1f} minutes elapsed") print("Probes continue indefinitely until window opens") print() print("PHASE 4: Window Opens (after probe 4)") print("-" * 40) # Simulate receiving window update after some probes timer.probe_count = 4 # Simulate probes were sent timer.window_update_received(16384) # Window opened! print() print("Final Statistics:", timer.get_statistics()) if __name__ == "__main__": demonstrate_persistence_timer()The persistence timer uses exponential backoff similarly to the retransmission timer, but with important differences in purpose and behavior.
Initial Probe Interval:
When a zero window is first received, the sender waits a short time (typically the current RTO or a similar value) before sending the first probe. This allows for the common case where the window opens quickly.
Exponential Backoff:
If the window remains zero, subsequent probes are sent with increasing intervals:
The interval typically caps at 60 seconds to ensure the sender detects window opening within a reasonable time.
| Probe # | Interval | Cumulative Time | Purpose |
|---|---|---|---|
| 1 | 1s | 1s | Quick check for transient zero-window |
| 2 | 2s | 3s | Application may be briefly busy |
| 3 | 4s | 7s | Slightly longer wait |
| 4 | 8s | 15s | Application processing may take time |
| 5 | 16s | 31s | Backing off to reduce probe overhead |
| 6 | 32s | 63s | Major processing delay possible |
| 7 | 60s | 123s | Capped at maximum interval |
| 8+ | 60s | 183s+ | Continues indefinitely at max interval |
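The schedule in the table reduces to a one-line formula. A minimal sketch (the 1-second initial interval and 60-second cap mirror the table; real stacks typically seed the first interval from the current RTO):

```python
def probe_interval(n: int, initial: float = 1.0, cap: float = 60.0) -> float:
    """Interval before the nth window probe (n >= 1): doubles, then caps."""
    return min(initial * 2 ** (n - 1), cap)


def cumulative_time(n: int) -> float:
    """Total elapsed time when the nth probe fires."""
    return sum(probe_interval(i) for i in range(1, n + 1))
```

For example, `probe_interval(7)` returns 60.0 (the uncapped value would be 64s), and `cumulative_time(7)` returns 123.0, matching the table above.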
Unlike retransmission which gives up after some limit (indicating the path is dead), persistence probes continue forever. This is intentional: a zero-window condition might be legitimate. Perhaps the receiver is processing a complex transaction or waiting for user input. As long as the ACKs come back (even with zero window), the connection is alive and should be maintained.
Comparison: Retransmission Timer vs. Persistence Timer
| Aspect | Retransmission Timer | Persistence Timer |
|---|---|---|
| Trigger | Data sent, awaiting ACK | Zero window received |
| Purpose | Recover from segment loss | Detect window opening |
| Gives Up | Yes (after max retries) | No (continues forever) |
| Backoff | Exponential | Exponential |
| Segment Sent | Retransmit unACKed data | Window probe (1 byte) |
| Success Condition | ACK received | Window > 0 |
| Typical Max | 15 retries, ~15 min | No maximum |
These two timers handle completely different scenarios but use similar exponential backoff mechanisms to avoid overwhelming the network.
The window probe segment is carefully constructed to elicit a response without causing problems. Let's examine its structure and the considerations involved.
The One-Byte Probe:
The most common implementation sends a probe containing one byte of data from the next sequence number the sender would transmit:
```
┌─────────────────────────────────────────────────────────┐
│ TCP Header (20+ bytes)                                  │
├─────────────────────────────────────────────────────────┤
│ Source Port: 54321        │ Destination Port: 80        │
├─────────────────────────────────────────────────────────┤
│ Sequence Number: 5000 (next byte to send)               │
├─────────────────────────────────────────────────────────┤
│ ACK Number: 12500                                       │
├─────────────────────────────────────────────────────────┤
│ Flags: ACK                                              │
├─────────────────────────────────────────────────────────┤
│ Window: 65535 (sender's advertised window)              │
├─────────────────────────────────────────────────────────┤
│ Payload: 1 byte                                         │
└─────────────────────────────────────────────────────────┘
```
The probe byte isn't dummy data—it's the actual next byte the sender would transmit. If the receiver can accept it, normal transmission proceeds from that point. If not, the byte will be resent when the window opens. This design elegantly combines the probe with productive data transfer when possible.
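That sender-side choice can be sketched in a few lines. The names here (`snd_nxt`, `send_buffer`, `next_unsent`) are illustrative, not taken from any real stack, and `snd_nxt` is assumed to be the sequence number of `send_buffer[next_unsent]`:

```python
def build_window_probe(snd_nxt: int, send_buffer: bytes, next_unsent: int):
    """Choose the probe segment to send when the peer's window is zero.

    Returns (sequence_number, payload). If unsent data is queued, probe
    with its first real byte; otherwise fall back to a zero-length probe
    at SND.NXT - 1, which the receiver must re-ACK.
    """
    if next_unsent < len(send_buffer):
        # Probe with the actual next byte of the stream
        return snd_nxt, send_buffer[next_unsent:next_unsent + 1]
    # Nothing queued: send an empty segment at an already-ACKed offset
    return snd_nxt - 1, b""
```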
Zero-Length Probe Alternative:
Some implementations send probes with zero data bytes but a sequence number equal to SND.NXT - 1 (the last byte that was acknowledged). This is sometimes called a "keep-alive probe" or "window probe with no data."
```
Sequence Number: 4999 (already ACKed, receiver will re-ACK it)
Payload: 0 bytes
```
The receiver sees a sequence number it has already acknowledged. Per TCP rules, it must respond with an ACK containing its current window size. This achieves the same goal without potentially transmitting data that can't be accepted.
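The receiver-side obligation can be sketched as a simplified acceptability check. This is a reduced model of the RFC 793 rules, not a full implementation; the key point is that an unacceptable segment is dropped but still answered with an ACK carrying RCV.NXT and the current window:

```python
def respond_to_segment(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int):
    """Receiver's reaction to an incoming segment (simplified sketch)."""
    if seg_len == 0:
        # Empty segment: acceptable if it lands at/inside the window
        if rcv_wnd == 0:
            acceptable = seg_seq == rcv_nxt
        else:
            acceptable = rcv_nxt <= seg_seq < rcv_nxt + rcv_wnd
    else:
        # Data segment: needs room in the receive window
        acceptable = rcv_wnd > 0 and rcv_nxt <= seg_seq < rcv_nxt + rcv_wnd
    if acceptable:
        return ("accept", rcv_nxt + seg_len, rcv_wnd)
    # Old or out-of-window segment: discard, but re-ACK with current window
    return ("ack-only", rcv_nxt, rcv_wnd)
```

A zero-byte probe at sequence 4999 when RCV.NXT is 5000 falls into the `ack-only` branch, producing exactly the window-bearing ACK the sender needs; a one-byte probe against a still-zero window is dropped, but the zero window is re-advertised.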
Trade-offs between approaches:
| Probe Type | Advantage | Disadvantage |
|---|---|---|
| 1-byte data probe | Combines probe with data | May complicate accounting if window is still zero |
| Zero-byte probe | Clean separation of concerns | Pure overhead, no data transferred |
The persistence timer and zero-window situation are closely related to a TCP efficiency problem called Silly Window Syndrome (SWS). Understanding this connection helps clarify why certain TCP behaviors exist.
What Is Silly Window Syndrome?
SWS occurs when the sender transmits and the receiver accepts very small segments, leading to high overhead. Consider this scenario:
The result is terrible efficiency: 40+ bytes of TCP/IP headers to deliver 1 byte of actual data.
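The overhead is easy to quantify. Assuming 20 bytes of IP header plus 20 bytes of TCP header per segment (no options), goodput efficiency is just payload over total bytes on the wire:

```python
def efficiency(payload_bytes: int, header_bytes: int = 40) -> float:
    """Fraction of each segment that is actual application data."""
    return payload_bytes / (payload_bytes + header_bytes)

# One-byte SWS segments carry ~2.4% useful data (1 / 41),
# while full-MSS segments carry ~97% (1460 / 1500).
```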
Clark's Algorithm (Receiver-Side Prevention):
The receiver implements SWS avoidance by following this rule:
Do not advertise a small window. After advertising a zero window, do not advertise a non-zero window until either:
- The window is at least one Maximum Segment Size (MSS), OR
- The window is at least 50% of the receive buffer size
This means the receiver might have some buffer space but still advertise Window = 0, waiting until it has enough space to be useful.
Implications for Persistence Timer:
Because of Clark's algorithm, a zero-window condition might last longer than strictly necessary (receiver has some space, but not enough to advertise). The persistence timer's probing ensures that even when the receiver is waiting to accumulate buffer space, the sender eventually learns the window has opened.
"""Silly Window Syndrome Avoidance - Clark's Algorithm Demonstrates receiver-side SWS avoidance and its interactionwith the persistence timer mechanism.""" from dataclasses import dataclassfrom typing import Optional @dataclassclass ReceiverBuffer: """Simulates a TCP receiver buffer with SWS avoidance.""" total_size: int = 65536 # 64KB buffer mss: int = 1460 # Maximum Segment Size used: int = 0 # Bytes currently in buffer @property def available(self) -> int: """Actual available space in buffer.""" return self.total_size - self.used def advertised_window(self) -> int: """ Calculate the window to advertise, applying Clark's algorithm. Clark's Rule: After advertising window = 0, don't advertise a non-zero window until it reaches MSS or 50% of buffer. """ available = self.available # Threshold: larger of (MSS, 50% of buffer) threshold = max(self.mss, self.total_size // 2) if available >= threshold: # Window is useful, advertise it return available elif available == self.total_size: # Buffer completely empty, always advertise return available else: # Window too small, advertise zero (SWS avoidance) return 0 def receive_data(self, bytes_received: int): """Simulate receiving data into buffer.""" self.used = min(self.total_size, self.used + bytes_received) def application_read(self, bytes_read: int): """Simulate application reading from buffer.""" self.used = max(0, self.used - bytes_read) def demonstrate_sws_avoidance(): """Show how SWS avoidance affects window advertisements.""" print("=" * 70) print("Clark's Algorithm: SWS Avoidance Demonstration") print("=" * 70) print() buffer = ReceiverBuffer(total_size=65536, mss=1460) print(f"Buffer Configuration:") print(f" Total size: {buffer.total_size:,} bytes") print(f" MSS: {buffer.mss:,} bytes") print(f" SWS Threshold: max(MSS, 50% buffer) = " f"max({buffer.mss}, {buffer.total_size//2}) = " f"{max(buffer.mss, buffer.total_size//2):,} bytes") print() scenarios = [ ("Initial state (empty buffer)", lambda b: None), ("Buffer 
fills (60KB received)", lambda b: b.receive_data(60000)), ("Small read (100 bytes)", lambda b: b.application_read(100)), ("Another small read (500 bytes)", lambda b: b.application_read(500)), ("Larger read (1000 bytes)", lambda b: b.application_read(1000)), ("Read enough for MSS (1500 bytes more)", lambda b: b.application_read(1500)), ("Application processes batch (30KB)", lambda b: b.application_read(30000)), ] print(f"{'Scenario':<45} {'Used':>10} {'Available':>10} {'Advertised':>12}") print("-" * 77) for description, action in scenarios: action(buffer) advertised = buffer.advertised_window() status = "" if advertised > 0 else " ← Zero window (SWS)" print(f"{description:<45} {buffer.used:>10,} {buffer.available:>10,} " f"{advertised:>12,}{status}") print() print("Key Observations:") print("• Even with some space available, window stays zero (SWS avoidance)") print("• Window only opens when available space reaches threshold") print("• This is why persistence timer may probe even when receiver has space") print("• The probe eventually gets the window update when space is sufficient") if __name__ == "__main__": demonstrate_sws_avoidance()Let's examine how persistence timers behave in real TCP implementations and how to observe and debug them.
Linux Implementation Details:
Linux uses the persistence timer as part of its TCP stack. Key characteristics:
```shell
# There's no direct sysctl for the persist timer, but related settings:

# Socket option SO_RCVBUF affects when zero-window occurs
sysctl net.core.rmem_max
sysctl net.core.rmem_default

# Related: TCP window scaling
sysctl net.ipv4.tcp_window_scaling

# Zero-window probe timeout starts at the RTO value
# and backs off exponentially up to ~60 seconds
```
Observing Persistence Timer with ss:
```shell
# Show detailed TCP socket info including timer state
ss -ti state established

# Look for connections in persist mode.
# Example output might show:
#   timer:(persist,1min23sec,7)
# This means: persist timer active, 1m23s until next probe, 7 probes sent
```
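When automating diagnostics, that `timer:(persist,...)` field can be pulled out of `ss -ti` output with a small parser. This sketch assumes the `NminMsec` format shown in the example above; `ss` also emits other formats (such as millisecond values) that this simple regex does not attempt to cover:

```python
import re

# Matches e.g. "timer:(persist,1min23sec,7)" in a line of `ss -ti` output
PERSIST_RE = re.compile(
    r"timer:\(persist,(?:(\d+)min)?(?:(\d+(?:\.\d+)?)sec)?,(\d+)\)"
)


def parse_persist_timer(line: str):
    """Return (seconds_until_next_probe, probes_sent), or None if absent."""
    m = PERSIST_RE.search(line)
    if not m:
        return None
    minutes = int(m.group(1) or 0)
    seconds = float(m.group(2) or 0)
    return minutes * 60 + seconds, int(m.group(3))
```

For the example line above, `parse_persist_timer` yields `(83.0, 7)`: 83 seconds until the next probe, with 7 probes already sent.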
| Observation | Likely Cause | Investigation Steps |
|---|---|---|
| Connection stuck, no data flowing | Zero-window deadlock (lost update) | Check for persist timer activity; capture traffic to see probes |
| Periodic small packets with no progress | Persistence probes being sent | Normal behavior; investigate why receiver window stays zero |
| Receiver shows window=0 for long periods | Application not reading data | Check application; look for blocking calls, deadlocks, slow processing |
| Sender shows persist timer with high count | Receiver consistently has zero window | Receiver buffer too small or application too slow; tune SO_RCVBUF |
| Brief zero-windows but quick recovery | Normal operation under load | This is expected; buffer temporarily fills then drains |
In Wireshark, look for '[TCP ZeroWindow]' and '[TCP Window Update]' info column messages. You can also filter with 'tcp.window_size == 0' to find all zero-window advertisements. Following TCP streams helps visualize the probe-response pattern during persist conditions.
Common Causes of Persistent Zero-Window:
Slow Application Processing
Application Bug
Intentional Flow Control
Buffer Tuning Issues
Resolution Approaches:
- `setsockopt(SO_RCVBUF, larger_value)` to enlarge the receive buffer

We've explored TCP's persistence timer in depth. Let's consolidate the key concepts:
- Debug with `ss` and Wireshark; investigate slow applications when zero-window persists.

What's Next:
The persistence timer handles flow control scenarios, but what about connections that go idle for long periods? How does TCP know if a peer is still alive when no data is flowing? The next page explores the keepalive timer—TCP's mechanism for detecting dead connections and maintaining long-lived sessions.
You now understand TCP's persistence timer thoroughly—from the deadlock scenario it prevents to the window probing mechanism it employs. This knowledge is essential for debugging flow control issues and understanding TCP's resilience to packet loss in control messages.