You might think that once a TCP connection has exchanged FIN and ACK segments in both directions, it's done. The connection is closed, resources are freed, and both endpoints can move on. But TCP has one more safeguard: the TIME_WAIT state.
After the active closer (the side that initiates the FIN) sends its final ACK, it doesn't immediately release the connection. Instead, it enters TIME_WAIT and waits for a period called 2MSL (twice the Maximum Segment Lifetime). Only after this wait does the connection truly close.
This waiting period serves two critical purposes: it guarantees reliable delivery of the final ACK, and it prevents old duplicate segments from corrupting a later connection on the same port pair. We'll examine both in detail below.
Understanding TIME_WAIT is essential for network programming, especially for high-volume servers that may experience TIME_WAIT accumulation.
By the end of this page, you will understand: why TIME_WAIT exists and the problems it solves, the 2MSL timing and how it's calculated, the difference between active closer and passive closer, the TIME_WAIT accumulation problem on busy servers, practical solutions including SO_REUSEADDR and tcp_tw_reuse, and how to monitor and troubleshoot TIME_WAIT issues.
Before diving into TIME_WAIT, let's review how TCP connections terminate. Unlike connection establishment (which uses a three-way handshake), termination uses a four-way handshake (also called four-way FIN exchange):
The Four-Way Handshake:
1. The active closer sends a FIN and enters FIN_WAIT_1.
2. The passive closer acknowledges that FIN with an ACK and enters CLOSE_WAIT; the active closer moves to FIN_WAIT_2.
3. When the passive closer's application calls close(), it sends its own FIN and enters LAST_ACK.
4. The active closer acknowledges that FIN with the final ACK.
The passive closer, upon receiving the final ACK, closes immediately. But the active closer waits in TIME_WAIT.
TCP Connection Termination: Four-Way Handshake with TIME_WAIT

Active Closer (Client)                     Passive Closer (Server)
        |                                          |
   ESTABLISHED                                ESTABLISHED
        |                                          |
        |----------- FIN (Seq=X) ----------------->|
   FIN_WAIT_1                                 CLOSE_WAIT
        |                                          |
        |<---------- ACK (Ack=X+1) ----------------|
   FIN_WAIT_2                                      |
        |                         [Server app calls close()]
        |                                          |
        |<---------- FIN (Seq=Y) ------------------|
        |                                     LAST_ACK
        |                                          |
        |----------- ACK (Ack=Y+1) --------------->|
   TIME_WAIT                                  CLOSED ✓
        |                                          |
  [Waits 2MSL...]                                  |
        |                                          |
  [2MSL timer expires]                             |
   CLOSED ✓                                        |

Timeline: ─────────────────────────────────────────────────
          │<──── Active termination ────>│<─ 2MSL wait ─>│
                                         │               │
                                  Passive closer   Active closer
                                     is done      can finally close

The active closer is whichever side initiates termination by sending the first FIN. This could be the client or the server, depending on the application protocol. HTTP/1.0 servers typically close first (server is the active closer). HTTP/1.1 with keep-alive often has the client close first (client is the active closer).
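To see this behavior for yourself, here is a minimal loopback sketch (an illustration only; it assumes Linux so that the ss check works, and the ephemeral port and ss invocation are just for demonstration) showing that whichever side calls close() first is the one that ends up owning the TIME_WAIT entry:

import socket
import subprocess

# Set up a throwaway listener on an ephemeral loopback port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))
port = listener.getsockname()[1]
listener.listen(1)

client = socket.create_connection(("127.0.0.1", port))
server_side, _ = listener.accept()

client.close()        # client closes first -> client becomes the active closer
server_side.close()   # server replies with its own FIN; client ACKs and enters TIME_WAIT
listener.close()

# The TIME_WAIT entry belongs to the client's ephemeral port, even though
# the Python socket objects are already closed (the kernel holds the state).
print(subprocess.run(["ss", "-tan", "state", "time-wait"],
                     capture_output=True, text=True).stdout)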
TIME_WAIT might seem like an unnecessary delay, but it serves two critical purposes that maintain network integrity:
Purpose 1: Reliable Delivery of Final ACK
Consider what happens if the final ACK is lost:
Active Closer Network Passive Closer
| |
|<------------- FIN --------------------|
| | LAST_ACK
|------------ ACK ----X (LOST) |
| CLOSED |
| | [No ACK received]
| | [Retransmits FIN]
|<------------- FIN --------------------|
| |
| [If already CLOSED, would reply RST] |
| [Passive closer thinks connection is |
| broken, not gracefully terminated] |
By staying in TIME_WAIT, the active closer can retransmit the final ACK if the passive closer retransmits its FIN. This ensures both sides agree that the connection terminated gracefully.
Purpose 2: Preventing Old Duplicate Segments
This is the more subtle and critical purpose. Consider this scenario: a segment from an old connection is delayed somewhere in the network, the connection closes, and a new connection is opened on the same port pair. If the delayed segment finally arrives with a sequence number that happens to fall inside the new connection's receive window, the receiver accepts it as fresh data.
This "old duplicate" problem could corrupt the new connection's data stream. TIME_WAIT prevents it by keeping the port pair "reserved" long enough for any old segments to expire.
The Old Duplicate Problem (Without TIME_WAIT)

Time   Connection A                         Connection B (same port pair)
       Host1:5000 ↔ Host2:80                Host1:5000 ↔ Host2:80
────────────────────────────────────────────────────────────────────
t=0    Sends Data(Seq=1000)
       Segment takes slow path
       through congested router...
t=1    Connection closes
       (No TIME_WAIT in this hypothetical)
t=2                                         Opens (same ports)
                                            Seq numbers start fresh (e.g., ISN=950)
t=3                                         Sends Data(Seq=1000)
                                            Receives ACK(Ack=1500)
t=4    Old segment finally arrives! ──────> Arrives at Host2
                                            Seq=1000 (within window!)
                                            💥 Host2 ACCEPTS old data as valid!
                                            Data stream corrupted.
────────────────────────────────────────────────────────────────────
With TIME_WAIT: Host1 holds port 5000 for 2MSL after Connection A closes.
The old segment expires in transit. Connection B must wait or use a
different port. Problem prevented.

Modern high-speed networks can exhaust the 32-bit sequence number space in seconds. TCP Timestamps (RFC 7323) add protection by requiring that arriving segments carry reasonable timestamps, providing additional defense against old duplicate segments even after TIME_WAIT expires.
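A tiny numeric sketch of the receiver's acceptance check (the window size and sequence numbers are made-up values matching the diagram) shows why the old segment slips through:

# Simplified acceptance check: a segment is accepted if its sequence number
# falls inside [rcv_nxt, rcv_nxt + window), computed modulo 2^32.
WINDOW = 65535
MOD = 2 ** 32

def in_receive_window(seq: int, rcv_nxt: int, window: int = WINDOW) -> bool:
    return (seq - rcv_nxt) % MOD < window

old_segment_seq = 1000   # delayed segment from Connection A
rcv_nxt_new_conn = 950   # Connection B happens to expect bytes near 950

print(in_receive_window(old_segment_seq, rcv_nxt_new_conn))  # True -> accepted!

Because the check is purely positional, the receiver has no way to tell that Seq=1000 belongs to a dead connection; only holding the port pair in TIME_WAIT (or a timestamp check) can reject it.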
The TIME_WAIT duration is defined as 2MSL (twice the Maximum Segment Lifetime). Understanding MSL clarifies why TIME_WAIT lasts as long as it does.
What Is MSL?
MSL is the maximum time a TCP segment can exist in the network before being discarded. It's a conservative upper bound on how long any segment might survive in transit—accounting for slow paths, queue delays, and routing loops.
The IP header's TTL (Time To Live) field limits segment lifetime at the network layer. Each router decrements TTL; when it reaches zero, the segment is discarded. However, TTL is measured in hops, not time. MSL represents the time-based equivalent.
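As a small illustration of the hop-based limit, here is a sketch (standard IP_TTL socket option; the value 64 is a common OS default used here only as an example) that sets and reads a socket's TTL:

import socket

# TTL limits lifetime in *hops*; MSL is the corresponding *time*-based bound.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, 64)   # hop limit for outgoing segments
print("TTL:", sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL))
sock.close()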
Standard MSL Values:
| Specification | MSL Value | TIME_WAIT Duration |
|---|---|---|
| RFC 793 (original) | 2 minutes | 4 minutes |
| BSD implementations | 30 seconds | 60 seconds |
| Linux | 30 seconds | 60 seconds |
| Windows | 2 minutes | 4 minutes |
The 2-minute value from RFC 793 was chosen conservatively for 1980s networks. Most modern implementations use 30 seconds, making TIME_WAIT 60 seconds—still conservative for modern low-latency networks.
"""Maximum Segment Lifetime (MSL) and TIME_WAIT Duration Demonstrates the relationship between MSL and TIME_WAIT acrossdifferent operating systems and how connection closure timing works.""" from dataclasses import dataclassfrom typing import Dict @dataclassclass TCPTimingConfig: """TCP timing configuration for an operating system.""" name: str msl_seconds: int additional_notes: str = "" @property def time_wait_seconds(self) -> int: return 2 * self.msl_seconds @property def time_wait_formatted(self) -> str: tw = self.time_wait_seconds if tw >= 60: return f"{tw // 60}m {tw % 60}s" return f"{tw}s" # Common OS configurationsOS_CONFIGS = { "linux_default": TCPTimingConfig( name="Linux (default)", msl_seconds=30, additional_notes="Can be affected by tcp_fin_timeout (not MSL directly)" ), "linux_lowlatency": TCPTimingConfig( name="Linux (tuned)", msl_seconds=15, additional_notes="Sometimes reduced for high-frequency trading" ), "windows_default": TCPTimingConfig( name="Windows (default)", msl_seconds=120, additional_notes="Registry: TcpTimedWaitDelay" ), "macos_default": TCPTimingConfig( name="macOS (default)", msl_seconds=15, additional_notes="Shorter than traditional 30 seconds" ), "rfc793_spec": TCPTimingConfig( name="RFC 793 Specification", msl_seconds=120, additional_notes="Original conservative specification" ), "bsd_typical": TCPTimingConfig( name="BSD (typical)", msl_seconds=30, additional_notes="FreeBSD, OpenBSD" ),} def print_msl_comparison(): """Print MSL and TIME_WAIT comparison across operating systems.""" print("=" * 75) print("Maximum Segment Lifetime (MSL) and TIME_WAIT Duration") print("=" * 75) print() print(f"{'OS/Specification':<25} {'MSL':<10} {'TIME_WAIT':<12} {'Notes'}") print("-" * 75) for config in OS_CONFIGS.values(): print(f"{config.name:<25} {config.msl_seconds:>4}s " f"{config.time_wait_formatted:<12} {config.additional_notes}") print() print("Formula: TIME_WAIT = 2 × MSL") print() def calculate_time_wait_impact(connections_per_second: int, msl_seconds: int = 30) -> Dict[str, float]: """ Calculate the impact of TIME_WAIT on a busy server. 
Args: connections_per_second: Rate of new connections (active closer) msl_seconds: MSL value in seconds Returns: Dictionary with impact metrics """ time_wait_duration = 2 * msl_seconds # Connections in TIME_WAIT at steady state: # = connections_per_second × time_wait_duration connections_in_time_wait = connections_per_second * time_wait_duration # Each connection uses ~1KB memory for TCB memory_usage_kb = connections_in_time_wait * 1 memory_usage_mb = memory_usage_kb / 1024 return { "connections_in_time_wait": connections_in_time_wait, "time_wait_duration_seconds": time_wait_duration, "estimated_memory_mb": memory_usage_mb, "ephemeral_port_usage_percent": (connections_in_time_wait / 16384) * 100 # Assuming 16K port range } def demonstrate_time_wait_accumulation(): """Show how TIME_WAIT connections accumulate on busy servers.""" print("=" * 75) print("TIME_WAIT Accumulation on Busy Servers") print("=" * 75) print() scenarios = [ ("Low traffic website", 10), ("Moderate API server", 100), ("High traffic service", 1000), ("Heavy load balancer", 5000), ("Extreme: benchmarking", 10000), ] print(f"{'Scenario':<25} {'Conn/s':<10} {'TW Conns':<12} " f"{'Memory':<10} {'Port %'}") print("-" * 75) for name, cps in scenarios: impact = calculate_time_wait_impact(cps) warning = " ⚠️" if impact["ephemeral_port_usage_percent"] > 50 else "" print(f"{name:<25} {cps:<10} " f"{impact['connections_in_time_wait']:<12,.0f} " f"{impact['estimated_memory_mb']:<10.1f}MB " f"{impact['ephemeral_port_usage_percent']:.1f}%{warning}") print() print("⚠️ = High ephemeral port usage (may cause port exhaustion)") print() print("Note: Each TIME_WAIT connection holds a port for 60 seconds (default)") if __name__ == "__main__": print_msl_comparison() print() demonstrate_time_wait_accumulation()While TIME_WAIT serves important purposes, it creates practical problems for high-volume servers. Let's understand the issue and its implications.
The Accumulation Scenario:
Consider a web server handling 1000 requests/second, where the server closes each connection after sending the response (server is the active closer). With a 60-second TIME_WAIT, roughly 1000 × 60 = 60,000 connections sit in TIME_WAIT at steady state.
Each TIME_WAIT connection consumes kernel memory for the control block (roughly 1 KB), an entry in the kernel's connection tables, and, for outbound connections, an ephemeral port on the local side.
Symptoms of TIME_WAIT Problems:
netstat -an | grep TIME_WAIT | wc -l shows tens or hundreds of thousands of entries, and outbound connect() calls start failing with "Cannot assign requested address" as the ephemeral port range is exhausted.
Why Does This Happen?
Several application patterns exacerbate TIME_WAIT:
Short-lived connections to many backends: Microservices making many outbound connections, each completing quickly
Server closes first: HTTP/1.0 servers, load balancers terminating connections
High request rate: More connections = more TIME_WAIT accumulation
Same destination address: All TIME_WAIT entries to the same server:port compete for the same local ephemeral port range, since every connection in that four-tuple needs a unique local port
Example: Connecting to a Database
Application Server Database Server
| |
|===== Conn 1 =============>| (port 5432)
|<==== Data ================|
| (App closes) |
| TIME_WAIT (port 50001) |
| |
|===== Conn 2 =============>| (port 5432)
| TIME_WAIT (port 50002) |
| |
|===== Conn 3 =============>| (port 5432)
| TIME_WAIT (port 50003) |
... ...
| TIME_WAIT × 60,000 |
| |
| No more ports available! | 💥
The solution is connection pooling (reuse connections instead of opening/closing) or the socket options discussed next.
The best way to avoid TIME_WAIT accumulation is to avoid closing connections in the first place. Connection pools (database pools, HTTP keep-alive, gRPC streams) reuse connections, dramatically reducing the connection open/close rate. A 1000 req/s workload might need only 10 pooled connections instead of 60,000 TIME_WAIT sockets.
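To make the pooling idea concrete, here is a minimal illustrative pool (not a production implementation; the host, port, and request handling shown in the usage comment are placeholders):

import queue
import socket

class SimpleConnectionPool:
    """A minimal sketch of a connection pool: connections are created once
    and reused, so the client rarely calls close() and rarely enters TIME_WAIT."""

    def __init__(self, host: str, port: int, size: int = 10):
        self._conns: "queue.Queue[socket.socket]" = queue.Queue()
        for _ in range(size):
            self._conns.put(socket.create_connection((host, port)))

    def get(self) -> socket.socket:
        # Blocks until a connection is free, providing natural backpressure.
        return self._conns.get()

    def put(self, conn: socket.socket) -> None:
        # Return the connection open; no FIN is sent, so no TIME_WAIT is created.
        self._conns.put(conn)

# Usage sketch (placeholder address):
# pool = SimpleConnectionPool("10.0.0.50", 5432, size=10)
# conn = pool.get()
# conn.sendall(request_bytes)
# ...
# pool.put(conn)

In practice you would use a library pool (SQLAlchemy, urllib3, redis-py, gRPC channels) that also handles broken connections and timeouts.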
Several techniques address TIME_WAIT accumulation. Each has trade-offs; understanding them helps you choose appropriately.
1. SO_REUSEADDR (Safe and Common)
This socket option allows binding to an address that's in TIME_WAIT. It's safe and widely used:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('0.0.0.0', 8080))
sock.listen()
Without SO_REUSEADDR, restarting a server that has connections in TIME_WAIT would fail with 'Address already in use'. This option solves that specific problem.
Important: SO_REUSEADDR does NOT allow multiple processes to bind to the same port simultaneously (that's SO_REUSEPORT). It only relaxes the TIME_WAIT restriction.
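For contrast, here is a brief sketch of SO_REUSEPORT (Linux 3.9+ assumed; port 8080 is arbitrary), which does allow multiple listening sockets on the same port, with the kernel load-balancing incoming connections between them:

import socket

def reuseport_listener(port: int) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Every socket sharing the port must set SO_REUSEPORT (and, on Linux,
    # be created by the same user).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("0.0.0.0", port))
    sock.listen(128)
    return sock

# a = reuseport_listener(8080)
# b = reuseport_listener(8080)   # succeeds because both set SO_REUSEPORT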
| Technique | Effect | Safety | When to Use |
|---|---|---|---|
| SO_REUSEADDR | Allows bind() to TIME_WAIT address | Safe | Always use on server sockets |
| SO_LINGER (l_onoff=1, l_linger=0) | Sends RST instead of FIN; no TIME_WAIT | Dangerous | Only for abnormal termination |
| tcp_tw_reuse (Linux) | Allows outbound connections to reuse TIME_WAIT | Usually safe | High-rate outbound connections |
| tcp_tw_recycle (Linux, removed) | Aggressive TIME_WAIT recycling | Broken with NAT | NEVER USE (removed in Linux 4.12) |
| Connection pooling | Reuses existing connections | Best | Always prefer when possible |
| Let peer close first | Moves TIME_WAIT to peer | Safe | HTTP client closing after server |
"""TIME_WAIT Management Socket Options Demonstrates various socket options and techniques formanaging TIME_WAIT connections.""" import socketimport struct def create_server_socket_with_reuseaddr(port: int) -> socket.socket: """ Create a server socket with SO_REUSEADDR. This is the standard, safe approach that should be used on virtually all server sockets. """ sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # SO_REUSEADDR: Allow binding to an address in TIME_WAIT # This is safe and almost always desirable for servers sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) sock.bind(('0.0.0.0', port)) sock.listen(128) print(f"Server listening on port {port} with SO_REUSEADDR") return sock def close_with_rst_no_time_wait(sock: socket.socket): """ Close socket with RST, skipping TIME_WAIT entirely. ⚠️ WARNING: This is dangerous and should only be used for abnormal termination. It can cause data loss. Setting SO_LINGER with l_linger=0 causes close() to: 1. Discard any unsent data 2. Send RST instead of FIN 3. Close immediately without TIME_WAIT """ # struct linger { int l_onoff; int l_linger; } # l_onoff=1 (enable), l_linger=0 (zero timeout = RST) linger_struct = struct.pack('ii', 1, 0) sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger_struct) print("⚠️ Socket will send RST on close (no TIME_WAIT, but risky!)") sock.close() def configure_linux_tcp_tw_reuse(): """ Instructions for enabling tcp_tw_reuse on Linux. tcp_tw_reuse allows reusing TIME_WAIT sockets for NEW OUTBOUND connections if certain conditions are met: - TCP timestamps are enabled - New connection is from same local IP (different from SO_REUSEADDR) """ print("Linux tcp_tw_reuse configuration:") print("=" * 50) print() print("# Check current setting") print("sysctl net.ipv4.tcp_tw_reuse") print() print("# Enable (requires timestamps)") print("sysctl -w net.ipv4.tcp_tw_reuse=1") print() print("# Ensure timestamps are enabled (usually default)") print("sysctl net.ipv4.tcp_timestamps") print() print("Note: Only helps OUTBOUND connections, not inbound.") print("For servers (inbound), use SO_REUSEADDR instead.") def connection_pool_example(): """ Conceptual example of connection pooling to avoid TIME_WAIT. Instead of: for request in requests: conn = connect(server) send(conn, request) response = recv(conn) close(conn) # Creates TIME_WAIT! Use: pool = ConnectionPool(server, size=10) for request in requests: conn = pool.get() # Reuses existing connection send(conn, request) response = recv(conn) pool.put(conn) # Returns to pool, stays open """ print("Connection Pooling: The Best Solution") print("=" * 50) print() print("Without pooling:") print(" 1000 req/s × 60s TIME_WAIT = 60,000 sockets") print() print("With pooling (10 connections):") print(" 1000 req/s ÷ 100 req/s per conn = 10 active sockets") print(" No TIME_WAIT accumulation!") print() print("Common pooling libraries:") print(" - SQLAlchemy (database)") print(" - urllib3 (HTTP)") print(" - redis-py (Redis)") print(" - grpcio (gRPC channels)") def demonstrate_who_should_close_first(): """ Show the impact of which side closes first. The side that closes first enters TIME_WAIT. Choosing wisely can shift TIME_WAIT to clients or to servers. 
""" print("Who Should Close First?") print("=" * 50) print() print("Server closes first (HTTP/1.0 typical):") print(" → Server accumulates TIME_WAIT") print(" → Problem: Busy servers exhaust resources") print() print("Client closes first (HTTP/1.1 typical):") print(" → Client accumulates TIME_WAIT") print(" → Better: Distributed across many clients") print() print("Neither closes (Keep-Alive):") print(" → Connections reused") print(" → Best: Minimal TIME_WAIT overall") print() print("Strategy: Use Keep-Alive when possible.") print("When closing is needed, prefer client closes first.") if __name__ == "__main__": configure_linux_tcp_tw_reuse() print() connection_pool_example() print() demonstrate_who_should_close_first()tcp_tw_recycle was removed from Linux 4.12 because it's fundamentally broken with NAT. When multiple clients share a public IP (common with NAT), tcp_tw_recycle can cause connection failures as the server incorrectly rejects valid connections. If you see legacy documentation suggesting tcp_tw_recycle, ignore it.
Effective monitoring helps you detect TIME_WAIT issues before they cause problems. Here are the tools and techniques:
Linux Commands:
# Count TIME_WAIT connections
ss -s | grep timewait
# Output: timewait: 1234 (closed...)
# Or using netstat (older, slower)
netstat -an | grep TIME_WAIT | wc -l
# Detailed: group by destination
ss -tan state time-wait | awk '{print $5}' | sort | uniq -c | sort -rn | head
# Watch in real-time
watch -n 1 'ss -s | grep timewait'
# Check kernel parameters
sysctl -a | grep tcp_tw
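Where parsing ss output is awkward, here is a Linux-only sketch (an assumption: it reads only IPv4 sockets from /proc/net/tcp; IPv6 lives in /proc/net/tcp6) that counts TIME_WAIT sockets and groups them by remote peer:

from collections import Counter

# In /proc/net/tcp the state column is hex; 06 means TIME_WAIT.
TIME_WAIT_STATE = "06"

def hex_to_addr(hex_addr: str) -> str:
    """Convert '0100007F:1F90' (little-endian hex) to '127.0.0.1:8080'."""
    ip_hex, port_hex = hex_addr.split(":")
    octets = [str(int(ip_hex[i:i + 2], 16)) for i in range(6, -2, -2)]
    return ".".join(octets) + ":" + str(int(port_hex, 16))

def time_wait_by_peer(path: str = "/proc/net/tcp") -> Counter:
    peers: Counter = Counter()
    with open(path) as f:
        next(f)                              # skip the header line
        for line in f:
            fields = line.split()
            if fields[3] == TIME_WAIT_STATE:  # fields[2] is the remote address
                peers[hex_to_addr(fields[2])] += 1
    return peers

if __name__ == "__main__":
    counts = time_wait_by_peer()
    print(f"Total TIME_WAIT sockets: {sum(counts.values())}")
    for peer, n in counts.most_common(10):
        print(f"{n:>8}  {peer}")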
Interpreting the Numbers:
| Count | Assessment | Action |
|---|---|---|
| < 1,000 | Normal for most servers | No action needed |
| 1,000 - 10,000 | Elevated; review if growing | Consider connection pooling |
| 10,000 - 30,000 | High; investigate patterns | Implement pooling; check close patterns |
| 30,000 - 60,000 | Warning zone | Urgent: Pooling, SO_REUSEADDR, tcp_tw_reuse |
| > 60,000 | Critical | Imminent port exhaustion; emergency action needed |
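As a rough way to automate the table above, a minimal sketch (thresholds copied from the table; the assess_time_wait helper is a name invented for this example, and the count would come from your own collector such as the /proc/net/tcp parser above):

# Map a TIME_WAIT count to the assessments and actions from the table.
THRESHOLDS = [
    (1_000, "Normal for most servers", "No action needed"),
    (10_000, "Elevated; review if growing", "Consider connection pooling"),
    (30_000, "High; investigate patterns", "Implement pooling; check close patterns"),
    (60_000, "Warning zone", "Urgent: pooling, SO_REUSEADDR, tcp_tw_reuse"),
]

def assess_time_wait(count: int):
    for limit, assessment, action in THRESHOLDS:
        if count < limit:
            return assessment, action
    return "Critical", "Imminent port exhaustion; emergency action needed"

print(assess_time_wait(45_000))   # ('Warning zone', 'Urgent: ...')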
Identifying the Source:
# Which remote hosts are we in TIME_WAIT with?
ss -tan state time-wait | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn
# Example output:
# 45123 10.0.0.50 <- Database server (pool needed!)
# 12456 10.0.0.60 <- Service endpoint
# 2341 203.0.113.5 <- External API
# Which local process is creating these?
# Note: TIME_WAIT has no associated process, but you can
# correlate by observing ESTABLISHED→TIME_WAIT transitions
watch "ss -tanp state established | grep 10.0.0.50"
Grafana/Prometheus Metrics:
Expose TIME_WAIT counts as metrics for dashboards:
# Node exporter provides: node_sockstat_TCP_tw
# Or custom collection:
import subprocess
from prometheus_client import Gauge

time_wait_gauge = Gauge('tcp_time_wait_connections',
                        'Number of sockets in TIME_WAIT state')

def update_time_wait_metric():
    result = subprocess.run(
        ['ss', '-s'],
        capture_output=True, text=True
    )
    for line in result.stdout.split('\n'):
        if 'timewait:' in line:
            # Parse: "timewait: 1234 (closed..."
            count = int(line.split()[1])
            time_wait_gauge.set(count)
            break
We've explored TIME_WAIT comprehensively. Let's consolidate the essential knowledge:
Why TIME_WAIT exists: to guarantee delivery of the final ACK and to let old duplicate segments expire before the port pair is reused.
How long it lasts: 2MSL, typically 60 seconds on Linux and BSD, 4 minutes on Windows.
Who holds it: the active closer, the side that sends the first FIN.
How to manage it: prefer connection pooling and keep-alive; use SO_REUSEADDR on server sockets; consider tcp_tw_reuse for high-rate outbound connections; never use tcp_tw_recycle.
How to watch it: use ss -s to monitor TIME_WAIT count; alert on sustained high values.
What's Next:
We've now covered all four major TCP timers: retransmission (reliable delivery), persistence (zero-window deadlock), keepalive (dead connection detection), and TIME_WAIT (graceful termination). In the final page of this module, we'll bring these timers together to see how timer management works holistically in TCP implementations, how timers interact, and how to diagnose timer-related issues in production systems.
You now understand TIME_WAIT thoroughly—from its protective purposes to its operational challenges. This knowledge is essential for designing high-performance network applications and for diagnosing connection-related issues on busy servers.