TCP Sequence Numbers - Learning Module


Sequence Number Concept

The Architecture of Reliable Delivery

Imagine sending a 10-volume encyclopedia through a postal system that might lose packages, deliver them out of order, or duplicate them. How would you ensure the recipient can reconstruct the complete, correctly ordered text? You'd need a system to identify each piece, detect gaps, and request missing portions.

TCP sequence numbers solve exactly this problem for digital data. They are the fundamental mechanism that transforms the unreliable, best-effort service of IP into the reliable, ordered byte stream that applications depend upon. Every byte transmitted over a TCP connection is assigned a sequence number, creating an unambiguous identity for each piece of data.

This page explores the conceptual foundations of TCP sequence numbers—why they exist, how they work, and why their design is critical to TCP's reliability guarantees.

What You Will Learn

By the end of this page, you will understand: why sequence numbers are necessary for reliable transport, how TCP uses them to identify every byte of data, the mathematical space in which sequence numbers operate, how receivers use sequence numbers to detect loss and reordering, and the fundamental role sequence numbers play in all of TCP's reliability mechanisms.

The Need for Sequence Numbers

To understand why sequence numbers are essential, we must first understand what TCP is built upon. The Internet Protocol (IP) provides an unreliable, connectionless datagram service. This means:

No delivery guarantee: Packets may be dropped at any router due to congestion, queue overflow, or link failures
No ordering guarantee: Packets may arrive in any order, since different routes may be taken through the network
No duplicate detection: The same packet may be delivered multiple times due to link-layer retransmissions or routing loops
No corruption protection: IPv4 checksums only its own header (IPv6 has no checksum at all), so the data payload may still arrive corrupted

TCP's job is to provide reliable, ordered byte stream delivery on top of this unreliable substrate. Sequence numbers are the foundation that makes this possible.

Core Problems Sequence Numbers Solve

•Loss Detection: When data is lost in transit, the receiver notices gaps in sequence numbers and can request retransmission
•Reordering Detection: When packets arrive out of order, sequence numbers allow the receiver to place them in correct positions
•Duplicate Detection: When the same data arrives twice (perhaps due to spurious retransmissions), sequence numbers allow the receiver to discard duplicates
•Progress Tracking: The sender uses sequence numbers returned in acknowledgments to know which data has been successfully received
•Flow Control: Window-based flow control uses sequence numbers to define the range of data the receiver can accept

The Byte-Stream Abstraction

TCP presents a byte stream abstraction to applications—data flows in as a continuous stream of bytes, not as discrete packets. Sequence numbers identify positions within this byte stream, much like byte offsets in a file. This abstraction is preserved across packet boundaries, reordering, and retransmissions.

Sequence Number Mechanics

A TCP sequence number is a 32-bit unsigned integer that identifies the first byte of data in a segment. Let's examine the mechanics in detail.

Mathematical Space: With 32 bits, sequence numbers range from 0 to 4,294,967,295 (2³² - 1). This creates a circular sequence space—after reaching the maximum value, sequence numbers wrap around to 0. This wrapping behavior, known as sequence number wraparound, has important implications for TCP's operation.

TCP Sequence Number Properties
Property	Value	Significance
Bit Width	32 bits	Defines the range of possible sequence numbers
Minimum Value	0	Lowest possible sequence number
Maximum Value	4,294,967,295	Highest value before wraparound
Total Space	4 GiB (2³² bytes)	Maximum data identifiable without repetition
Arithmetic	Modulo 2³²	All comparisons use modular arithmetic
Wraparound	Every 2³² bytes (4 GiB) sent	At 1 Gbps, occurs roughly every 34 seconds
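
The 34-second figure in the table follows from dividing the size of the sequence space, in bits, by the link rate. A minimal sketch of that arithmetic, assuming the sender transmits continuously at the full link rate (the helper name `wrap_seconds` is made up for this illustration):

```c
/* Seconds until a sender transmitting continuously at `bits_per_second`
 * has consumed all 2^32 sequence numbers (one per payload byte).
 * Hypothetical helper for illustration only. */
static double wrap_seconds(double bits_per_second) {
    const double space_bytes = 4294967296.0;  /* 2^32 */
    return space_bytes * 8.0 / bits_per_second;
}
```

Calling `wrap_seconds(1e9)` gives about 34.4 seconds, and `wrap_seconds(1e10)` gives about 3.4 seconds, which is why wraparound becomes a practical concern on faster links.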

Segment-to-Sequence Mapping:

Each TCP segment's sequence number field contains the sequence number of the first byte of data in that segment. For a segment carrying N bytes of data:

SEQ = sequence number of first byte
SEQ + N - 1 = sequence number of last byte
Next segment should start with SEQ + N (if data is contiguous)

Example: If a segment has sequence number 1000 and carries 500 bytes, the bytes are numbered 1000, 1001, 1002, ..., 1499. The next contiguous segment would start with sequence number 1500.

sequence_number_example.txt
Segment 1:  SEQ=1000, LEN=500  → Bytes 1000-1499
Segment 2:  SEQ=1500, LEN=800  → Bytes 1500-2299
Segment 3:  SEQ=2300, LEN=200  → Bytes 2300-2499
 
Data Stream Position Mapping:
┌─────────────────────────────────────────────────────────┐
│ Position:   0   1   2   ...  499  500 501 ... 1299 1300│
│ SEQ Number: 1000 1001 1002 ... 1499 1500 1501 ... 2299 2300│
│ Segment:    ←── Segment 1 ──→ ←── Segment 2 ──→ ←Seg 3→│
└─────────────────────────────────────────────────────────┘
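
The mapping above can be expressed as two small helpers. This is a sketch under the assumption that a segment is described only by its sequence number and payload length; the function names are illustrative and not taken from any real TCP stack.

```c
#include <stdint.h>

/* Sequence number of the last data byte in a segment carrying len > 0 bytes.
 * Unsigned arithmetic wraps modulo 2^32, matching TCP's sequence space. */
static uint32_t seg_last_byte(uint32_t seq, uint32_t len) {
    return seq + len - 1u;
}

/* Sequence number that the next contiguous segment must carry. */
static uint32_t seg_next_seq(uint32_t seq, uint32_t len) {
    return seq + len;
}
```

For Segment 1 above, `seg_last_byte(1000, 500)` is 1499 and `seg_next_seq(1000, 500)` is 1500. Because the arithmetic is unsigned, a segment that straddles the top of the sequence space needs no special handling.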

Control Flags Consume Sequence Space

SYN and FIN flags each consume one sequence number, even though they carry no data. This is crucial: a SYN segment with SEQ=100 means the first data byte will have SEQ=101. Similarly, a FIN with SEQ=5000 occupies that sequence number, so the peer acknowledges it with ACK=5001. Because each flag occupies sequence space, it can be acknowledged and retransmitted just like data, which ensures these critical control events are reliably delivered.
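
One way to picture this rule is to compute how much sequence space a segment occupies. The sketch below assumes a segment is described by its payload length and two flags; the names `seg_seq_len` and `ack_for` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequence space occupied by a segment: its payload bytes,
 * plus one for SYN and one for FIN if those flags are set. */
static uint32_t seg_seq_len(uint32_t payload_len, bool syn, bool fin) {
    return payload_len + (syn ? 1u : 0u) + (fin ? 1u : 0u);
}

/* Acknowledgment number that covers the entire segment. */
static uint32_t ack_for(uint32_t seq, uint32_t payload_len, bool syn, bool fin) {
    return seq + seg_seq_len(payload_len, syn, fin);
}
```

A bare SYN with SEQ=100 is acknowledged with `ack_for(100, 0, true, false)`, which is 101, even though no data byte was carried.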

Sequence Number Comparisons in Circular Space

Because sequence numbers wrap around, comparing them requires modular arithmetic. A naive comparison (is 4,294,967,290 < 10?) would give the wrong answer for wrapped sequences.

TCP defines sequence number S1 as "less than" S2 if:

(S1 - S2) is negative when interpreted as a signed 32-bit integer

This means sequence numbers within 2³¹ (about 2 billion) of each other can be correctly compared across wraparound boundaries.

sequence_comparison.c
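/* TCP sequence number comparison functions */
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Returns true if s1 < s2 in modular sequence space.
 * Only meaningful while s1 and s2 are less than 2^31 apart. */
static inline bool before(uint32_t s1, uint32_t s2) {
    return (int32_t)(s1 - s2) < 0;
}

/* Returns true if s1 > s2 in modular sequence space */
static inline bool after(uint32_t s1, uint32_t s2) {
    return (int32_t)(s1 - s2) > 0;
}

int main(void) {
    /* Example: Is 4294967290 before 10 after wraparound? */
    uint32_t s1 = 4294967290u;  /* Near max value */
    uint32_t s2 = 10u;          /* After wraparound */

    /* Calculate: (4294967290 - 10) = 4294967280 */
    /* As signed int32: 4294967280 = -16 (negative!) */
    /* Therefore: 4294967290 is BEFORE 10 */
    bool result = before(s1, s2);  /* true */

    printf("before(%" PRIu32 ", %" PRIu32 ") = %s\n", s1, s2, result ? "true" : "false");
    printf("after(%" PRIu32 ", %" PRIu32 ") = %s\n", s2, s1, after(s2, s1) ? "true" : "false");
    return 0;
}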

Visual Representation of Circular Sequence Space:

Imagine sequence numbers arranged in a circle, like numbers on a clock but with 2³² positions. Any two sequence numbers divide the circle into two arcs. We consider S1 "before" S2 if the shorter path from S1 to S2 goes clockwise (in the direction of increasing sequence numbers).

This comparison works correctly as long as the two sequence numbers are within 2³¹ of each other—about 2 billion sequence numbers apart. For typical network speeds and TCP timer settings, this constraint is easily satisfied.

The 2³¹ Window Constraint

TCP can only correctly compare sequence numbers that are within 2³¹ of each other. This is why high-bandwidth connections use the TCP timestamps option (RFC 7323) and its PAWS mechanism (Protection Against Wrapped Sequences), which rejects old segments that would otherwise be mistaken for new ones after wraparound.
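
The constraint can be seen by probing the signed-difference comparison on either side of the 2³¹ boundary. The sketch below is self-contained and uses illustrative names (`seq_distance`, `seq_before`). It assumes the usual two's-complement conversion from `uint32_t` to `int32_t`, which mainstream compilers provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed distance travelled when moving from `from` to `to` in the
 * direction of increasing sequence numbers (negative means backwards). */
static int32_t seq_distance(uint32_t from, uint32_t to) {
    return (int32_t)(to - from);
}

/* True if a precedes b, provided the two are less than 2^31 apart. */
static bool seq_before(uint32_t a, uint32_t b) {
    return seq_distance(b, a) < 0;
}
```

With these helpers, `seq_before(0, 2147483647)` is true, as expected. One step past the halfway point the answer flips: `seq_before(0, 2147483649)` is false, because the shorter arc now runs the other way around the circle. Values exactly 2³¹ apart are ambiguous, so a TCP implementation has to keep live sequence numbers well inside that range.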

Sequence Numbers in Action

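Let's trace how sequence numbers enable reliable delivery through a concrete example. Consider a client sending 3000 bytes to a server, assuming the first data byte carries sequence number 1000 (the handshake has already completed) and a maximum segment size of 1000 bytes. The client sends three segments with SEQ=1000, SEQ=2000, and SEQ=3000, and once all three have arrived the server's cumulative acknowledgment number is 4000.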


Key Observations:

Each segment's position is unambiguous: SEQ=2000 always refers to the same byte position, regardless of when it arrives
Reordering is transparent: Even if segment 2 (SEQ=2000) arrives before segment 1 (SEQ=1000), the server can buffer segment 2 and deliver data in order
Gaps indicate loss: If SEQ=2000 arrives but SEQ=1000 never does, the server detects a gap and can report it via acknowledgments
Progress is measurable: The acknowledgment number tells the sender exactly how much data has been received contiguously
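
The segmentation in this example can be reproduced with a few lines of code. This is a sketch only: the `struct segment` type and `segment_stream` function are invented for illustration, and the first data byte is assumed to be numbered 1000 as in the example.

```c
#include <stddef.h>
#include <stdint.h>

struct segment {
    uint32_t seq;  /* sequence number of the first payload byte */
    uint32_t len;  /* payload length in bytes */
};

/* Split `total` bytes, starting at sequence number `first_seq`, into
 * segments of at most `mss` bytes. Writes up to `max` entries into `out`
 * and returns the number of segments written. */
static size_t segment_stream(uint32_t first_seq, uint32_t total, uint32_t mss,
                             struct segment *out, size_t max) {
    size_t n = 0;
    uint32_t sent = 0;
    while (mss > 0 && sent < total && n < max) {
        uint32_t remaining = total - sent;
        uint32_t chunk = remaining < mss ? remaining : mss;
        out[n].seq = first_seq + sent;  /* wraps modulo 2^32 if needed */
        out[n].len = chunk;
        sent += chunk;
        n++;
    }
    return n;
}
```

Calling `segment_stream(1000, 3000, 1000, segs, 8)` yields three segments starting at 1000, 2000, and 3000. The acknowledgment covering the last one is `segs[2].seq + segs[2].len`, which is 4000.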

What Sequence Numbers Enable

•Precise byte-level data identification
•Detection of lost segments
•Restoration of original ordering
•Elimination of duplicate data
•Retransmission of specific byte ranges
•Window-based flow control

Design Properties

•32-bit space handles high bandwidth
•Modular arithmetic handles wraparound
•Byte-granularity enables partial ACKs
•Works with variable-size segments
•Independent of IP fragmentation
•Supports out-of-order buffering

Sender and Receiver Perspectives

The sender and receiver maintain different views of sequence space. Understanding these perspectives is crucial for grasping TCP's reliability mechanisms.

Sender's View:

The sender tracks several key sequence numbers:

Sender's Sequence Number State
Variable	Description	Updates When
SND.UNA	Oldest unacknowledged byte	ACK received from receiver
SND.NXT	Next byte to be sent	Data transmitted
SND.WND	Receiver's advertised window	Window update received
ISS	Initial Send Sequence number	Connection established

The sender's data falls into four categories:

Acknowledged: SEQ < SND.UNA (can be discarded from buffer)
Sent, awaiting ACK: SND.UNA ≤ SEQ < SND.NXT (must be retained for retransmission)
Allowed to send: SND.NXT ≤ SEQ < SND.UNA + SND.WND (can send immediately)
Beyond window: SEQ ≥ SND.UNA + SND.WND (must wait for window to open)

sender_sequence_space.txt
Sender's Sequence Space View:
═══════════════════════════════════════════════════════════════════
│  Acknowledged  │  Sent/Unacked  │  Allowed to Send │  Not Allowed │
│  (discarded)   │  (in flight)   │  (can transmit)  │  (wait)      │
═══════════════════════════════════════════════════════════════════
                 ↑                 ↑                  ↑
              SND.UNA           SND.NXT        SND.UNA + SND.WND
 
Example State:
  ISS = 1000 (Initial Sequence Number)
  SND.UNA = 1500 (last ACK received was for 1500)
  SND.NXT = 2800 (next byte to send is 2800)
  SND.WND = 4000 (receiver window is 4000 bytes)
  
  → Sequence numbers 1000-1499: Acknowledged, freed from buffer (1000 itself was consumed by the SYN)
  → Bytes 1500-2799: In flight, awaiting acknowledgment
  → Bytes 2800-5499: Can be sent immediately
  → Bytes 5500+: Must wait for window to advance
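
The four sender-side categories can be written as a classification function. The sketch below uses hypothetical names (`enum snd_region`, `classify_send_seq`) and measures every position as an offset from SND.UNA, so it stays correct when the window straddles a wraparound.

```c
#include <stdint.h>

enum snd_region { SND_ACKED, SND_IN_FLIGHT, SND_SENDABLE, SND_BEYOND_WINDOW };

/* Classify one sequence number against the sender's state variables. */
static enum snd_region classify_send_seq(uint32_t seq, uint32_t snd_una,
                                         uint32_t snd_nxt, uint32_t snd_wnd) {
    if ((int32_t)(seq - snd_una) < 0)
        return SND_ACKED;                    /* SEQ < SND.UNA */

    uint32_t offset = seq - snd_una;         /* distance past SND.UNA */
    uint32_t in_flight = snd_nxt - snd_una;  /* bytes sent but not yet acked */

    if (offset < in_flight)
        return SND_IN_FLIGHT;                /* SND.UNA <= SEQ < SND.NXT */
    if (offset < snd_wnd)
        return SND_SENDABLE;                 /* SND.NXT <= SEQ < SND.UNA + SND.WND */
    return SND_BEYOND_WINDOW;                /* SEQ >= SND.UNA + SND.WND */
}
```

With the example state (SND.UNA = 1500, SND.NXT = 2800, SND.WND = 4000), sequence number 2799 is in flight, 2800 is sendable, and 5500 is beyond the window.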

Receiver's View:

The receiver tracks incoming sequence numbers to determine what has been received and what to expect next:

Receiver's Sequence Number State
Variable	Description	Updates When
RCV.NXT	Next expected sequence number	In-order data received
RCV.WND	Receive window size	Application reads data
IRS	Initial Receive Sequence number	Connection established

The receiver categorizes incoming segments into:

Old data: SEQ < RCV.NXT (duplicates, discarded)
Expected data: SEQ = RCV.NXT (delivered to application)
Future data: RCV.NXT < SEQ < RCV.NXT + RCV.WND (out-of-order, buffered)
Beyond window: SEQ ≥ RCV.NXT + RCV.WND (discarded)
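
The receiver's categories can be sketched the same way. This simplified version looks only at a segment's starting sequence number, whereas a real implementation also takes the segment length into account when deciding acceptability. The names `enum rcv_class` and `classify_recv_seq` are illustrative.

```c
#include <stdint.h>

enum rcv_class { RCV_OLD, RCV_EXPECTED, RCV_FUTURE, RCV_BEYOND_WINDOW };

/* Classify an arriving segment by its starting sequence number only. */
static enum rcv_class classify_recv_seq(uint32_t seq, uint32_t rcv_nxt,
                                        uint32_t rcv_wnd) {
    if ((int32_t)(seq - rcv_nxt) < 0)
        return RCV_OLD;            /* starts before RCV.NXT: already received */

    uint32_t offset = seq - rcv_nxt;
    if (offset == 0)
        return RCV_EXPECTED;       /* exactly the next byte we are waiting for */
    if (offset < rcv_wnd)
        return RCV_FUTURE;         /* out of order but inside the window */
    return RCV_BEYOND_WINDOW;      /* starts at or past RCV.NXT + RCV.WND */
}
```

For RCV.NXT = 1000 and RCV.WND = 4000, a segment starting at 2000 is classified as future data and one starting at 5000 falls beyond the window.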

The Power of Buffering Out-of-Order Data

Modern TCP implementations buffer out-of-order segments rather than discarding them. This optimization dramatically improves performance when packets are reordered in the network. When the missing segment arrives, all buffered segments can be delivered immediately. Without this buffering, every out-of-order segment would need retransmission.

Sequence Numbers and Reliability Guarantees

Sequence numbers are the foundation upon which TCP builds its reliability guarantees. Let's examine how they enable each guarantee:

Ordered Delivery:

Applications receive data in the exact order it was sent. The receiver uses sequence numbers to maintain ordering:

Data arriving in order (SEQ = RCV.NXT) is delivered immediately
Data arriving early (SEQ > RCV.NXT) is buffered until the gap is filled
The byte stream presented to the application is always contiguous

ordering_example.txt
Initial state: RCV.NXT = 1000
 
Arrival order:   SEQ=2000 → SEQ=3000 → SEQ=1000 → SEQ=4000
 
Processing:
  SEQ=2000 arrives: Future data (gap at 1000). Buffer it.
  SEQ=3000 arrives: Future data (still gap at 1000). Buffer it.
  SEQ=1000 arrives: Expected! Deliver 1000-1999, then 2000-2999, then 3000-3999
  SEQ=4000 arrives: Expected (RCV.NXT now 4000). Deliver immediately.
 
Application sees: Bytes 1000, 1001, 1002, ... (perfect order)
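
The processing steps above can be simulated with a small reassembly sketch. It tracks only sequence numbers and lengths, not payload bytes, and holds out-of-order segments in a fixed-size array. The types and the `MAX_HELD` capacity are assumptions made for this illustration, and partially overlapping segments are not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HELD 8  /* arbitrary capacity chosen for this sketch */

struct held { uint32_t seq; uint32_t len; bool used; };

struct receiver {
    uint32_t rcv_nxt;             /* next expected sequence number */
    struct held held[MAX_HELD];   /* out-of-order segments waiting for a gap to fill */
};

/* Remember an out-of-order segment unless it is already held or the buffer is full. */
static void hold_segment(struct receiver *r, uint32_t seq, uint32_t len) {
    size_t free_slot = MAX_HELD;
    for (size_t i = 0; i < MAX_HELD; i++) {
        if (r->held[i].used && r->held[i].seq == seq)
            return;  /* already buffered */
        if (!r->held[i].used && free_slot == MAX_HELD)
            free_slot = i;
    }
    if (free_slot < MAX_HELD)
        r->held[free_slot] = (struct held){ seq, len, true };
}

/* Process one arriving segment; return how many bytes become deliverable. */
static uint32_t receiver_on_segment(struct receiver *r, uint32_t seq, uint32_t len) {
    if ((int32_t)(seq - r->rcv_nxt) < 0)
        return 0;                     /* old data: treated as a duplicate */
    if (seq != r->rcv_nxt) {
        hold_segment(r, seq, len);    /* future data: buffer it */
        return 0;
    }
    uint32_t delivered = len;         /* expected data: deliver immediately */
    r->rcv_nxt += len;
    bool progressed = true;
    while (progressed) {              /* drain held segments that are now in order */
        progressed = false;
        for (size_t i = 0; i < MAX_HELD; i++) {
            if (r->held[i].used && r->held[i].seq == r->rcv_nxt) {
                r->rcv_nxt += r->held[i].len;
                delivered += r->held[i].len;
                r->held[i].used = false;
                progressed = true;
            }
        }
    }
    return delivered;
}
```

Feeding it the arrival order from the example (2000, 3000, 1000, 4000, each 1000 bytes long) returns 0, 0, 3000, and 1000 deliverable bytes respectively, and leaves `rcv_nxt` at 5000.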

No Duplicates:

The same data is never delivered twice. When a segment arrives:

If SEQ < RCV.NXT, it's a duplicate—already delivered
Check against buffered out-of-order segments
Only new data within the receive window is accepted
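
A retransmitted segment may overlap only partly with data that was already delivered. One way to express the "only new data" rule is to count how many bytes of a segment lie at or beyond RCV.NXT. This is a sketch: the function name `new_bytes` is hypothetical and the window check is left out.

```c
#include <stdint.h>

/* Number of bytes in segment [seq, seq + len) that have not been
 * delivered yet, given the receiver's next expected sequence number. */
static uint32_t new_bytes(uint32_t rcv_nxt, uint32_t seq, uint32_t len) {
    int32_t already = (int32_t)(rcv_nxt - seq);  /* leading bytes already delivered */
    if (already <= 0)
        return len;                 /* segment starts at or after RCV.NXT */
    if ((uint32_t)already >= len)
        return 0;                   /* complete duplicate */
    return len - (uint32_t)already; /* partial overlap: keep only the tail */
}
```

With RCV.NXT = 1500, a segment with SEQ=1000 and 500 bytes contributes nothing new, while one with SEQ=1000 and 800 bytes contributes its last 300 bytes.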

No Missing Data:

The acknowledgment mechanism ensures no data is lost. The receiver's ACK tells the sender how much contiguous data has been received. Any gaps indicate loss and trigger retransmission through timeout or fast retransmit.

The Complete Reliability Stack

Sequence numbers + Acknowledgments + Retransmission Timers = Reliable Delivery. The receiver uses sequence numbers to detect problems; acknowledgments communicate this detection back to the sender; timers ensure the sender eventually retransmits unacknowledged data. Together, they ensure that, for as long as the connection stays up, every byte is delivered to the application in order and exactly once.

Practical Implications and Considerations

Understanding sequence numbers has practical implications for network engineers, developers, and security professionals:

Performance Analysis:

When diagnosing TCP performance issues:

Gaps in sequence numbers indicate packet loss
Repeated sequence numbers indicate retransmissions
ACK advancement rate shows effective throughput
Window utilization (SND.NXT - SND.UNA vs SND.WND) shows efficiency
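
The window utilization figure in the last item can be computed directly from the three sender variables. A minimal sketch, with helper names chosen for this illustration:

```c
#include <stdint.h>

/* Bytes sent but not yet acknowledged (SND.NXT - SND.UNA, modulo 2^32). */
static uint32_t bytes_in_flight(uint32_t snd_una, uint32_t snd_nxt) {
    return snd_nxt - snd_una;
}

/* Bytes the sender may still transmit before filling the advertised window. */
static uint32_t usable_window(uint32_t snd_una, uint32_t snd_nxt, uint32_t snd_wnd) {
    uint32_t in_flight = bytes_in_flight(snd_una, snd_nxt);
    return in_flight >= snd_wnd ? 0u : snd_wnd - in_flight;
}

/* Fraction of the advertised window currently occupied by in-flight data. */
static double window_utilization(uint32_t snd_una, uint32_t snd_nxt, uint32_t snd_wnd) {
    if (snd_wnd == 0)
        return 0.0;  /* closed window: report zero rather than divide by zero */
    return (double)bytes_in_flight(snd_una, snd_nxt) / (double)snd_wnd;
}
```

For SND.UNA = 1500, SND.NXT = 2800, and SND.WND = 4000, there are 1300 bytes in flight, 2700 bytes of usable window, and a utilization of 0.325.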

Key Diagnostic Patterns

•Steady ACK growth: Healthy connection with smooth data flow
•ACK plateau: Receiver stopped acknowledging—check for loss or window full
•Repeated SEQ numbers: Sender is retransmitting—look for timeout or duplicate ACKs
•Large SEQ jumps: Segments are missing from the capture or were lost upstream; look for duplicate ACKs or SACK blocks from the receiver
•SEQ wraparound: Expected at high bandwidth; ensure PAWS is enabled

Security Considerations:

Initial sequence numbers were originally predictable, which enabled several attacks:

Session Hijacking: Attacker guesses SEQ numbers to inject forged segments
RST Attacks: Injecting RST segments with "acceptable" sequence numbers terminates connections
Blind Data Injection: Sending data with legitimate-looking sequence numbers

Modern TCP implementations use randomized Initial Sequence Numbers (ISN) to mitigate these attacks. RFC 6528 specifies secure ISN generation based on connection identifiers and a secret key.
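
RFC 6528 expresses the ISN as ISN = M + F(localip, localport, remoteip, remoteport, secretkey), where M is a timer that ticks every 4 microseconds and F is a keyed pseudorandom function. The sketch below shows only that structure. The mixing function is a placeholder (64-bit FNV-1a), which is NOT cryptographically secure and must not be used in a real stack. All function names, and the choice of IPv4 addresses passed as 32-bit integers, are assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* PLACEHOLDER mixing step (64-bit FNV-1a). Not cryptographically secure;
 * a real implementation would use a keyed PRF in its place. */
static uint64_t mix_bytes(uint64_t h, const void *data, size_t n) {
    const unsigned char *p = data;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;  /* FNV-1a 64-bit prime */
    }
    return h;
}

/* F(...): per-connection offset derived from the 4-tuple and a secret key. */
static uint32_t isn_offset(uint32_t local_ip, uint16_t local_port,
                           uint32_t remote_ip, uint16_t remote_port,
                           const unsigned char key[16]) {
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a 64-bit offset basis */
    h = mix_bytes(h, key, 16);
    h = mix_bytes(h, &local_ip, sizeof local_ip);
    h = mix_bytes(h, &local_port, sizeof local_port);
    h = mix_bytes(h, &remote_ip, sizeof remote_ip);
    h = mix_bytes(h, &remote_port, sizeof remote_port);
    return (uint32_t)(h ^ (h >> 32));
}

/* ISN = M + F(...), with M advancing once every 4 microseconds. */
static uint32_t generate_isn(uint64_t micros_now,
                             uint32_t local_ip, uint16_t local_port,
                             uint32_t remote_ip, uint16_t remote_port,
                             const unsigned char key[16]) {
    uint32_t m = (uint32_t)(micros_now / 4u);
    return m + isn_offset(local_ip, local_port, remote_ip, remote_port, key);
}
```

The point of this structure is that ISNs for one connection 4-tuple still increase over time, while an attacker who does not know the key cannot derive the offset used for a different 4-tuple.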

Sequence Number Prediction is a Security Threat

Never implement TCP with predictable sequence numbers. Attackers can exploit predictability to hijack connections or inject malicious data. Always use cryptographically strong randomization for ISN selection. We explore ISN generation in depth in a later section of this module.

Summary: The Foundation of Reliable Transport

We've explored the conceptual foundations of TCP sequence numbers—the mechanism that enables reliable, ordered delivery over an unreliable network. Let's consolidate the key insights:

Key Takeaways

•Sequence numbers identify every byte — Each byte in the TCP stream has a unique 32-bit sequence number, enabling precise tracking and manipulation
•32-bit circular space — With 4 billion possible values and modular arithmetic, TCP handles high-bandwidth wraparound correctly
•Enables loss detection — Gaps in sequence numbers immediately reveal lost segments that need retransmission
•Enables reordering correction — Out-of-order segments can be buffered and delivered correctly by matching sequence numbers
•Enables duplicate elimination — Previously received sequence numbers are detected and discarded
•Foundation for all reliability mechanisms — Flow control, congestion control, and retransmission all depend on sequence numbers
•Security-critical — Predictable sequence numbers enable attacks; modern TCP uses randomized ISNs

What's Next:

Now that we understand the concept of sequence numbers, we'll examine TCP's byte-oriented nature in detail. Unlike protocols that number packets or messages, TCP numbers individual bytes within the data stream. This design decision has profound implications for how TCP handles varying segment sizes, partial transmissions, and the byte-stream abstraction presented to applications.
