When you send an email, make an API call, or stream a video over TCP, you're not really sending "packets." From TCP's perspective, you're contributing to a continuous stream of bytes—and TCP assigns a sequence number to every single one of those bytes.
This design choice—numbering bytes rather than packets or messages—is one of TCP's most fundamental architectural decisions. It defines how TCP presents data to applications, how it handles varying network conditions, and why the same application data might be split across multiple segments or combined into one.
This page explores TCP's byte-oriented transmission in depth: why it exists, how it works, and what it means for reliable communication.
By the end of this page, you will understand: the byte-stream abstraction TCP provides to applications, why bytes are numbered instead of packets, how segmentation and reassembly work at the byte level, the relationship between application writes and TCP segments, and how byte-orientation enables TCP's flexibility in handling network dynamics.
TCP provides applications with a byte-stream abstraction: the appearance of a continuous, reliable, ordered stream of bytes flowing between endpoints. This abstraction hides the complexity of the underlying network, including packet loss, duplication, reordering, and segmentation.
Think of it like a pipe between two applications. One end puts bytes in; the other end receives the same bytes in the same order. The pipe handles all the messy details of the unreliable network in between.
Key Insight: The sender writes "Hello World!" as one call, but TCP might split it across multiple segments. The receiver might receive the segments out of order. Yet the receiving application sees "Hello World!" exactly as sent—the byte-stream abstraction is preserved.
No Message Boundaries:
Unlike UDP, TCP does not preserve application message boundaries. If an application makes three send() calls:
send("Hello");
send(" ");
send("World");
The receiver might see this data in any of these forms:
recv() returning "Hello W"
recv() returning "orld"
Or:
recv() returning "Hello World"
TCP guarantees the bytes arrive in order, but not that message boundaries are preserved. Applications must implement their own message framing if needed.
Since TCP doesn't preserve message boundaries, protocols built on TCP must define their own framing. HTTP uses Content-Length or chunked transfer encoding. Many protocols use length prefixes or delimiters. This is the application's responsibility: TCP guarantees only the order of the bytes, not any message structure.
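As a rough illustration of length-prefix framing, the sketch below prepends a 4-byte big-endian length to each message and decodes messages from arbitrarily sized chunks. The names `frame` and `FrameDecoder` and the 4-byte prefix are assumptions made for this example, not part of TCP or of any particular protocol.

```python
import struct
from typing import List


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message


class FrameDecoder:
    """Hypothetical helper: accumulates stream bytes and returns whole messages."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a chunk from recv() and return every message now complete."""
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from("!I", self._buf, 0)
            if len(self._buf) < 4 + length:
                break  # body not fully received yet
            messages.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]
        return messages


# Three framed messages, delivered in arbitrary 7-byte chunks
stream = frame(b"Hello") + frame(b" ") + frame(b"World")
decoder = FrameDecoder()
received = []
for i in range(0, len(stream), 7):
    received.extend(decoder.feed(stream[i:i + 7]))
print(received)
```

The decoder recovers the three original messages even though the chunk boundaries fall in the middle of prefixes and bodies.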
Other protocols number packets or messages. Why does TCP number individual bytes? This design was intentional and provides crucial flexibility.
Consider the alternative: Packet-numbered protocol
If TCP numbered packets instead of bytes, the receiver could acknowledge progress only by saying "I got packet #2"; it could not indicate that, say, 1000 bytes of data are ready. And if the sender retransmitted with a different segment size (perhaps 500 bytes each after a path MTU change), the receiver would have no way to correlate the new packets with the old ones.
Byte numbering solves this elegantly: an acknowledgment names a byte position, so it stays meaningful however the data is segmented.
Concrete Example: Dynamic Segmentation
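Consider a sender transmitting bytes 1000-3999 (3000 bytes). Initially, the path supports 1000-byte segments, so the data goes out as SEQ=1000 (bytes 1000-1999), SEQ=2000 (bytes 2000-2999), and SEQ=3000 (bytes 3000-3999).
Suppose the path MTU then changes so that only 500-byte segments work. Any bytes that must be retransmitted now go out in 500-byte segments, each still labeled with the sequence number of its first byte.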
Because sequence numbers identify bytes, the receiver seamlessly accepts the retransmission even though segmentation differs. ACK=4000 confirms all 3000 bytes received.
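The sketch below walks through one possible version of this scenario. It assumes, purely for illustration, that the middle 1000-byte segment is lost and later retransmitted as two 500-byte segments; `cumulative_ack` is a hypothetical helper, and sequence number wraparound is ignored.

```python
def cumulative_ack(start_seq, segments):
    """Return the next expected sequence number given (seq, length) segments."""
    received = set()
    for seq, length in segments:
        received.update(range(seq, seq + length))
    ack = start_seq
    while ack in received:
        ack += 1
    return ack


# First pass: 1000-byte segments; SEQ=2000 is assumed lost in transit.
first_pass = [(1000, 1000), (3000, 1000)]
ack_before = cumulative_ack(1000, first_pass)

# Retransmission of the missing bytes in 500-byte segments.
retransmission = [(2000, 500), (2500, 500)]
ack_after = cumulative_ack(1000, first_pass + retransmission)

print(f"ACK before retransmission: {ack_before}")
print(f"ACK after retransmission:  {ack_after}")
```

The receiver never needs to know that the segment sizes changed; it only tracks which byte positions have arrived.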
Byte-oriented numbering decouples the data stream from its physical segmentation. Segments are merely containers for byte ranges. The same data can be carried in different-sized segments, retransmitted with different segmentation, and acknowledged by byte position. This flexibility is essential for TCP's adaptability.
Each TCP segment's sequence number identifies the first byte of data in that segment. The relationship between sequence numbers and byte positions is fundamental to understanding TCP's operation.
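Formal Definition:
For a segment with sequence number SEQ carrying LEN bytes of data, the bytes in this segment occupy positions SEQ through SEQ + LEN - 1, and the next in-order segment starts at SEQ + LEN (arithmetic is modulo 2^32, since the sequence number field is 32 bits wide).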
```python
def calculate_byte_range(seq_number: int, data_length: int) -> tuple:
    """
    Calculate the byte range covered by a TCP segment.

    Args:
        seq_number: The sequence number from TCP header
        data_length: Number of data bytes in the segment

    Returns:
        Tuple of (first_byte, last_byte, next_expected)
    """
    first_byte = seq_number
    last_byte = seq_number + data_length - 1
    next_expected = seq_number + data_length
    return (first_byte, last_byte, next_expected)


# Example: Segment with SEQ=5000 carrying 1460 bytes
first, last, next_exp = calculate_byte_range(5000, 1460)
print(f"Bytes {first} to {last} in this segment")
print(f"Next segment should start at {next_exp}")
# Output:
# Bytes 5000 to 6459 in this segment
# Next segment should start at 6460

# Example: Multiple segments forming a stream
segments = [
    (1000, 500),   # SEQ=1000, 500 bytes
    (1500, 1000),  # SEQ=1500, 1000 bytes
    (2500, 750),   # SEQ=2500, 750 bytes
]

print("\nByte stream composition:")
for seq, length in segments:
    first, last, next_exp = calculate_byte_range(seq, length)
    print(f"  SEQ={seq}: bytes {first}-{last}")

# Output:
# Byte stream composition:
#   SEQ=1000: bytes 1000-1499
#   SEQ=1500: bytes 1500-2499
#   SEQ=2500: bytes 2500-3249
```

Segment Size Independence:
The same byte stream can be segmented in countless ways. All of the following represent the same 3000 bytes:
Segmentation A (3 segments of 1000 bytes each):
Segmentation B (6 segments of 500 bytes each):
Segmentation C (Mixed sizes):
All three segmentations describe the identical byte stream. The receiver treats them equivalently.
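One way to check this equivalence is to split the same 3000 bytes in three ways and confirm that each reassembles to the original data. In this sketch the starting sequence number (1000) and the mixed sizes used for the third case are assumptions chosen for illustration, and `segment` and `reassemble` are hypothetical helpers.

```python
def segment(data: bytes, start_seq: int, sizes):
    """Split data into (seq, payload) pairs using the given segment sizes."""
    if sum(sizes) != len(data):
        raise ValueError("segment sizes must add up to the data length")
    segments, offset = [], 0
    for size in sizes:
        segments.append((start_seq + offset, data[offset:offset + size]))
        offset += size
    return segments


def reassemble(segments, start_seq: int) -> bytes:
    """Rebuild the byte stream, checking that there are no gaps or overlaps."""
    out = bytearray()
    for seq, payload in sorted(segments):
        if seq != start_seq + len(out):
            raise ValueError(f"gap or overlap at SEQ={seq}")
        out.extend(payload)
    return bytes(out)


data = bytes(i % 256 for i in range(3000))
seg_a = segment(data, 1000, [1000] * 3)          # 3 x 1000 bytes
seg_b = segment(data, 1000, [500] * 6)           # 6 x 500 bytes
seg_c = segment(data, 1000, [1460, 1040, 500])   # mixed sizes (assumed)

same_stream = all(reassemble(s, 1000) == data for s in (seg_a, seg_b, seg_c))
print("Identical byte stream:", same_stream)
```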
In theory a segment can be almost as large as the biggest IPv4 packet (65,535 bytes including headers), but practical constraints keep segments much smaller. Each endpoint announces its MSS (Maximum Segment Size) during connection setup, and the value is typically derived from the link or path MTU minus the IP and TCP header sizes. Common values are 1460 bytes (for Ethernet: 1500 - 20 - 20) or 1360 bytes (accounting for additional headers in some networks).
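The arithmetic behind these values can be sketched as follows. The header sizes assume IPv4 and TCP without options, and the 100 bytes of extra overhead used to reach 1360 is only an assumed figure for tunneling or encapsulation; real networks vary.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int, extra_overhead: int = 0) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER - extra_overhead


def segments_needed(total_bytes: int, mss: int) -> int:
    """Number of full-or-partial segments needed (ceiling division)."""
    return -(-total_bytes // mss)


print(mss_for_mtu(1500))                      # plain Ethernet
print(mss_for_mtu(1500, extra_overhead=100))  # assumed encapsulation overhead
print(segments_needed(3000, mss_for_mtu(1500)))
```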
A critical implication of byte-oriented transmission is that application write boundaries do not correspond to segment boundaries. TCP's segmentation is independent of how the application writes data.
The TCP Send Buffer:
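When an application calls send() or write(), data enters TCP's send buffer, a contiguous byte queue. TCP extracts bytes from this buffer and packages them into segments based on factors such as the MSS, the receiver's advertised window, and the current congestion window, not on how the application grouped its writes.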
```
Application makes these writes:
  write(100 bytes)   → Send buffer: [100 bytes]
  write(200 bytes)   → Send buffer: [300 bytes total]
  write(1500 bytes)  → Send buffer: [1800 bytes total]

TCP with MSS=1000 might segment as:
  Segment 1: 1000 bytes (all of writes 1 and 2, plus the first 700 bytes of write 3)
  Segment 2: 800 bytes (remainder of write 3)

═════════════════════════════════════════════════════════════
Application Writes:  | 100 | 200 |          1500           |
                     └─────┴─────┴─────────────────────────┘
TCP Segmentation:    |        1000        |       800       |
                     └────────────────────┴─────────────────┘
                           Segment 1           Segment 2
═════════════════════════════════════════════════════════════

Note: Application boundaries (|) don't align with TCP boundaries!
```

Coalescing and Splitting:
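Coalescing combines several small writes into one segment; splitting divides one large write across several segments. Both happen transparently. The byte-stream abstraction is maintained regardless of segmentation.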
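To make the mapping between writes and segments concrete, the following sketch assigns bytes from each write to MSS-sized segments. It assumes all writes are already in the send buffer before TCP transmits anything, which real stacks do not guarantee; `map_writes_to_segments` is a hypothetical helper.

```python
def map_writes_to_segments(write_sizes, mss):
    """Return, per segment, a list of (write_number, byte_count) contributions."""
    segments = [[]]
    room = mss  # free space left in the current segment
    for number, size in enumerate(write_sizes, start=1):
        remaining = size
        while remaining > 0:
            if room == 0:
                segments.append([])  # current segment is full, start a new one
                room = mss
            take = min(remaining, room)
            segments[-1].append((number, take))
            remaining -= take
            room -= take
    return segments


layout = map_writes_to_segments([100, 200, 1500], mss=1000)
for index, parts in enumerate(layout, start=1):
    total = sum(count for _, count in parts)
    print(f"Segment {index}: {total} bytes from {parts}")
```

With writes of 100, 200, and 1500 bytes and an MSS of 1000, the first segment coalesces writes 1 and 2 with part of write 3, and the second carries the rest of write 3.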
A common mistake is assuming that if you send() a complete JSON object, it will arrive as one recv() call. It won't, necessarily. TCP may deliver partial objects or combined objects. Always design your application to handle arbitrary byte boundaries in received data.
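A minimal sketch of one way to handle this, assuming the sender terminates each JSON object with a newline and never embeds raw newlines inside an object. `JsonLineReader` is a hypothetical helper, and the chunks stand in for whatever recv() happens to return.

```python
import json


class JsonLineReader:
    """Buffers received bytes and returns each complete newline-terminated JSON object."""

    def __init__(self) -> None:
        self._buf = b""

    def feed(self, chunk: bytes) -> list:
        self._buf += chunk
        # Everything before the last newline is complete; keep the tail buffered.
        *lines, self._buf = self._buf.split(b"\n")
        return [json.loads(line) for line in lines if line]


reader = JsonLineReader()
chunks = [
    b'{"id": 1, "na',                      # partial object
    b'me": "a"}\n{"id": 2}\n{"id"',        # rest of 1, all of 2, start of 3
    b': 3}\n',                             # rest of 3
]
results = [reader.feed(chunk) for chunk in chunks]
print(results)
```

The first chunk yields nothing, the second yields two objects at once, and the third completes the last object.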
The receiver uses sequence numbers to reassemble the byte stream correctly, regardless of segment arrival order or size. This reassembly process is central to TCP's reliability.
Reassembly Buffer:
The receiver maintains a reassembly buffer indexed by sequence number. Each arriving segment places its bytes at the correct positions:
Buffer position = Segment SEQ - Initial Receive Sequence (IRS)
This means a segment's bytes land at the same buffer positions no matter when the segment arrives:
```python
class TCPReassemblyBuffer:
    """
    Simplified TCP reassembly buffer demonstrating
    byte-oriented receive processing.
    """

    def __init__(self, initial_receive_seq):
        self.irs = initial_receive_seq      # Initial Receive Sequence
        self.buffer = {}                    # SEQ -> byte mapping
        self.rcv_nxt = initial_receive_seq  # Next expected SEQ

    def receive_segment(self, seq: int, data: bytes) -> bytes:
        """
        Process an incoming segment.
        Returns bytes deliverable to the application
        (contiguous from rcv_nxt).
        """
        # Calculate buffer positions for this segment's bytes
        for i, byte in enumerate(data):
            byte_seq = seq + i

            # Skip bytes already delivered (window limits are not modeled here)
            if byte_seq < self.rcv_nxt:
                # Duplicate byte - already received
                continue

            # Store byte at its sequence number position
            if byte_seq not in self.buffer:
                self.buffer[byte_seq] = byte

        # Deliver contiguous bytes to application
        deliverable = bytearray()
        while self.rcv_nxt in self.buffer:
            deliverable.append(self.buffer.pop(self.rcv_nxt))
            self.rcv_nxt += 1

        return bytes(deliverable)


# Example: Out-of-order arrival
buffer = TCPReassemblyBuffer(initial_receive_seq=1000)

# Segments arrive out of order
result1 = buffer.receive_segment(1005, b"World")  # Future: buffered
print(f"After SEQ=1005: delivered {len(result1)} bytes")
# Output: After SEQ=1005: delivered 0 bytes

result2 = buffer.receive_segment(1000, b"Hello")  # Expected: fills gap!
print(f"After SEQ=1000: delivered {len(result2)} bytes")
print(f"Delivered: {result2.decode()}")
# Output: After SEQ=1000: delivered 10 bytes
# Output: Delivered: HelloWorld
```

Key Reassembly Properties:
Position Independence: Segments place bytes at positions determined by SEQ, not arrival order
Gap Tolerance: Missing bytes create gaps in the buffer; receipt of the missing segment fills the gap
Duplicate Handling: Bytes already in buffer or delivered are discarded
Contiguous Delivery: Only contiguous bytes from RCV.NXT are delivered to the application
Buffer Indexing: The buffer is logically indexed by sequence number, enabling O(1) placement
By buffering out-of-order segments, TCP avoids unnecessary retransmissions. If segments 1, 3, and 4 arrive but segment 2 is delayed, only segment 2 needs retransmission. Without buffering, segments 2, 3, and 4 would all need retransmitting. Modern TCP implementations always buffer out-of-order data.
Byte-oriented numbering profoundly impacts how TCP implements flow control. The receiver's window advertisement specifies exactly how many bytes can be accepted, not how many packets.
Receive Window (rwnd):
The receiver advertises its receive window in each ACK segment:
Window = RCV.BUFFER_SIZE - (bytes received in order but not yet read by the application)
This directly tells the sender: "You may send bytes with sequence numbers from RCV.NXT up to RCV.NXT + rwnd - 1."
| Concept | Byte-Oriented Behavior | Why It Matters |
|---|---|---|
| Window Size | Exact byte count | Receiver knows precisely how much buffer space to allocate |
| Window Update | Any byte count | Can open/close by exact bytes as app reads data |
| Zero Window | No bytes allowed | Sender stops until window opens (window probe continues) |
| Window Scaling | Multiplier for bytes | Allows larger windows (up to 1 GB) for high-BDP paths |
| Silly Window Prevention | Byte-level decisions | Clark's algorithm: don't advertise tiny windows |
Precise Control:
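Byte-oriented windows enable precise control: the receiver can shrink the window by exactly the number of bytes it has buffered and reopen it by exactly the number of bytes the application reads.
This fine-grained control prevents receiver buffer overflow while maximizing throughput.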
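The window bookkeeping can be sketched in a few lines. `ReceiveWindow` is a hypothetical class that models only in-order data and ignores sequence number wraparound; the buffer size and starting sequence number are example values.

```python
class ReceiveWindow:
    """Tracks the advertised window as buffer size minus unread bytes."""

    def __init__(self, buffer_size: int, rcv_nxt: int) -> None:
        self.buffer_size = buffer_size
        self.rcv_nxt = rcv_nxt
        self.unread = 0  # bytes received in order, not yet read by the app

    def window(self) -> int:
        return self.buffer_size - self.unread

    def allowed_range(self):
        """First and last sequence numbers the sender may currently use."""
        return (self.rcv_nxt, self.rcv_nxt + self.window() - 1)

    def on_data(self, length: int) -> None:
        if length > self.window():
            raise ValueError("data exceeds the advertised window")
        self.rcv_nxt += length
        self.unread += length

    def on_app_read(self, length: int) -> int:
        length = min(length, self.unread)
        self.unread -= length
        return length


rw = ReceiveWindow(buffer_size=65535, rcv_nxt=10000)
initial = (rw.window(), rw.allowed_range())
rw.on_data(20000)       # sender transmits SEQ 10000-29999
after_data = (rw.window(), rw.allowed_range())
rw.on_app_read(10000)   # application consumes 10000 bytes
after_read = (rw.window(), rw.allowed_range())
print(initial, after_data, after_read)
```

Reading 10000 bytes moves the right edge of the window forward by exactly 10000 sequence numbers while RCV.NXT stays put.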
```
Byte-Oriented Flow Control Example:
═══════════════════════════════════════════════════════════════════

Initial state:
  Receiver buffer: 64 KB
  RCV.NXT = 10000
  Advertised window: 65535 bytes
  Sender may send: bytes 10000 through 75534 (65535 bytes)

─────────────────────────────────────────────────────────────────
Event: Sender transmits 20000 bytes (SEQ 10000-29999)
─────────────────────────────────────────────────────────────────

Receiver state:
  RCV.NXT = 30000 (after receiving all 20000)
  Buffer used: 20000 bytes
  Advertised window: 65535 - 20000 = 45535 bytes
  Sender may now send: bytes 30000 through 75534 (45535 bytes)

─────────────────────────────────────────────────────────────────
Event: Application reads 10000 bytes from receive buffer
─────────────────────────────────────────────────────────────────

Receiver state:
  RCV.NXT = 30000 (unchanged - no new data)
  Buffer used: 10000 bytes (app consumed 10000)
  Advertised window: 65535 - 10000 = 55535 bytes
  Sender may now send: bytes 30000 through 85534 (55535 bytes)
                       ↑ Window "opened" by exactly the bytes the application read
```

The original 16-bit window field limits advertised windows to 64 KB. For high-bandwidth, high-latency paths (high bandwidth-delay product), this is insufficient. RFC 7323 defines window scaling, a multiplier negotiated during the handshake that allows effective windows up to 1 GB. The scaling factor applies to the byte count, maintaining byte-oriented semantics.
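As a rough worked example of window scaling, the sketch below treats the scale factor as a left shift of the 16-bit window field, with a maximum shift of 14 as specified in RFC 7323. The 1 Gbit/s, 100 ms path is an assumed example, and `effective_window` and `min_shift_for` are hypothetical helper names.

```python
MAX_SHIFT = 14          # largest shift count allowed by RFC 7323
MAX_FIELD = 0xFFFF      # largest value of the 16-bit window field


def effective_window(field: int, shift: int) -> int:
    """Window in bytes for a given window field value and shift count."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return field << shift


def min_shift_for(bdp_bytes: int):
    """Smallest shift whose maximum window covers the bandwidth-delay product."""
    for shift in range(MAX_SHIFT + 1):
        if effective_window(MAX_FIELD, shift) >= bdp_bytes:
            return shift
    return None  # path needs more than the largest scaled window


# Assumed path: 1 Gbit/s with a 100 ms round-trip time
bdp = 1_000_000_000 * 100 // 1000 // 8   # bits/s * ms -> bytes
print(f"BDP: {bdp} bytes")
print(f"Minimum shift: {min_shift_for(bdp)}")
print(f"Largest possible window: {effective_window(MAX_FIELD, MAX_SHIFT)} bytes")
```

The largest scaled window, 65535 shifted left by 14, is just under 2^30 bytes, which is where the "up to 1 GB" figure comes from.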
To fully appreciate TCP's byte-oriented design, let's contrast it with message-oriented protocols like UDP and SCTP.
UDP (User Datagram Protocol):
UDP is message-oriented. Each sendto() creates exactly one datagram; each recvfrom() returns exactly one datagram. Message boundaries are preserved. There are no sequence numbers—datagrams are independent.
| Aspect | TCP (Byte-Oriented) | UDP (Message-Oriented) |
|---|---|---|
| Data unit | Continuous byte stream | Discrete datagrams |
| Boundaries | Not preserved | Preserved exactly |
| Ordering | Guaranteed in-order delivery | No ordering guarantee |
| Partial delivery | Possible (any byte count) | All-or-nothing per datagram |
| Sequence tracking | Per-byte sequence numbers | None (or application-layer) |
| Reassembly | Automatic by TCP | Application responsibility |
| Best for | Continuous data streams | Independent messages/requests |
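The boundary difference in the table can be observed locally without any network. This sketch assumes a Unix-like system and uses `socket.socketpair` with a Unix datagram socket as a stand-in for UDP and a Unix stream socket as a stand-in for TCP; it illustrates the two socket semantics, not the protocols themselves.

```python
import socket


def send_three(sock: socket.socket) -> None:
    """Send the same three application messages on the given socket."""
    for part in (b"Hello", b" ", b"World"):
        sock.send(part)


# Stream semantics: the reader sees bytes, with no record of the three sends.
s_tx, s_rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
send_three(s_tx)
s_tx.close()  # closing lets the reader detect end of stream
stream_reads = []
while True:
    chunk = s_rx.recv(4096)
    if not chunk:
        break
    stream_reads.append(chunk)
s_rx.close()

# Datagram semantics: each recv() returns exactly one message.
d_tx, d_rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
send_three(d_tx)
datagram_reads = [d_rx.recv(4096) for _ in range(3)]
d_tx.close()
d_rx.close()

print("stream reads:  ", stream_reads)
print("datagram reads:", datagram_reads)
```

The number of stream reads is not guaranteed; only the concatenated bytes are. The datagram reads always come back as three separate messages.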
SCTP (Stream Control Transmission Protocol):
Interestingly, SCTP supports both orientations:
SCTP uses Transmission Sequence Numbers (TSN) that number chunks (containing messages), not bytes. This provides message-boundary preservation with reliability guarantees.
When Each Model Fits:
If your application naturally works with independent messages and needs reliability, consider whether TCP's byte stream adds complexity (requiring framing) or if SCTP's message mode or a UDP-based reliable protocol (like QUIC) might be more natural. TCP's byte stream is powerful but not always the best fit.
TCP's byte-oriented design is a fundamental architectural choice that shapes every aspect of the protocol: sequence numbers, segmentation, reassembly, and flow control all operate on byte positions rather than on packets or messages.
What's Next:
We've established that every byte has a sequence number—but where do those sequence numbers start? The next page explores the Initial Sequence Number (ISN)—how it's chosen, why randomization matters for security, and its role in TCP connection establishment.
You now understand TCP's byte-oriented nature: how every byte is numbered, how applications interact with the byte stream, how receivers reassemble data, and how this design enables TCP's flexibility. Next, we examine how the initial sequence number is selected and why it matters.