Every transport protocol imposes overhead—additional bytes, processing cycles, and memory that don't directly serve your application's data. This overhead might seem negligible for a single packet, but in systems processing billions of packets daily, even a few bytes or microseconds per packet translates to significant infrastructure costs.
Understanding overhead isn't just academic curiosity. It's the difference between a system that scales gracefully and one that hits unexpected walls. Between infrastructure that costs $10,000/month and infrastructure that costs $100,000/month. Between latency that users tolerate and latency that drives them away.
In this page, we'll dissect the overhead of UDP and TCP with surgical precision—examining every byte, every CPU cycle, and every memory allocation.
By the end of this page, you will understand exactly what overhead each protocol imposes, where that overhead comes from, and how to calculate the efficiency impact for any given workload. You'll be able to quantify the cost difference when choosing between UDP and TCP for your applications.
The most visible overhead is the protocol header—bytes prepended to every packet that serve protocol functions rather than carrying application data.
UDP Header: The Minimalist Approach
UDP's header is among the simplest in networking—just 8 bytes containing four 16-bit fields:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Length             |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

Field Breakdown:

| Field | Size | Purpose |
|---|---|---|
| Source Port | 16 bits | Sender's port for demultiplexing |
| Destination Port | 16 bits | Receiver's port for demultiplexing |
| Length | 16 bits | Total datagram size (header + data) |
| Checksum | 16 bits | Error detection (optional in IPv4) |
| **Total** | **64 bits** | **= 8 bytes** |

TCP Header: The Feature-Rich Approach
TCP's header is significantly larger—20 bytes minimum, up to 60 bytes with options:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |       |C|E|U|A|P|R|S|F|                               |
| Offset| Rsrvd |W|C|R|C|S|S|Y|I|            Window             |
|       |       |R|E|G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |        Urgent Pointer         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Options (if Data Offset > 5)                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

Field Breakdown:

| Field | Size | Purpose |
|---|---|---|
| Source Port | 16 bits | Sender's port for demultiplexing |
| Destination Port | 16 bits | Receiver's port for demultiplexing |
| Sequence Number | 32 bits | Position of first byte in this segment |
| Acknowledgment Number | 32 bits | Next expected byte from peer |
| Data Offset | 4 bits | Header length in 32-bit words |
| Reserved | 4 bits | Reserved for future use |
| Flags | 8 bits | Control flags (SYN, ACK, FIN, etc.) |
| Window | 16 bits | Receive window size for flow control |
| Checksum | 16 bits | Error detection (mandatory) |
| Urgent Pointer | 16 bits | Offset to urgent data |
| Options | 0-40 bytes | Variable: timestamps, window scaling, SACK |
| **Minimum total** | **160 bits** | **= 20 bytes** |
| **Maximum total** | **480 bits** | **= 60 bytes** |
| **Typical total** | **256 bits** | **= 32 bytes (with timestamps)** |

| Metric | UDP | TCP (minimum) | TCP (typical) | TCP (maximum) |
|---|---|---|---|---|
| Header size | 8 bytes | 20 bytes | 32 bytes | 60 bytes |
| vs UDP | — | +12 bytes (150%) | +24 bytes (300%) | +52 bytes (650%) |
| Field count | 4 | 10 | 10 + options | 10 + options |
| Variable size? | No | Yes (options) | Yes | Yes |
Modern TCP nearly always uses options: Timestamps (10 bytes) for RTT measurement, Window Scaling (3 bytes) for large windows, and SACK (variable) for selective acknowledgment. A realistic TCP header is 32 bytes, not 20. This means TCP's header is typically 4× larger than UDP's.
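A quick way to make the size difference concrete is to pack both headers yourself. This is a minimal Python sketch using the `struct` module; the port, sequence, and window values are arbitrary placeholders:

```python
import struct

# Pack a UDP header (RFC 768): four 16-bit fields, network byte order.
udp_header = struct.pack("!HHHH",
                         5353,    # source port
                         53,      # destination port
                         8 + 64,  # length: header + payload bytes
                         0)       # checksum (0 = disabled, IPv4 only)
assert len(udp_header) == 8

# Pack a minimal TCP header: 20 bytes, no options.
tcp_header = struct.pack("!HHIIBBHHH",
                         50000,    # source port
                         443,      # destination port
                         1000,     # sequence number
                         2000,     # acknowledgment number
                         5 << 4,   # data offset (5 words) + reserved bits
                         0x10,     # flags: ACK
                         65535,    # window
                         0,        # checksum (computed over the segment in practice)
                         0)        # urgent pointer
assert len(tcp_header) == 20
```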
Header overhead directly impacts goodput—the amount of useful application data transferred versus total bytes transmitted. The impact varies dramatically based on payload size.
Efficiency Formula:
Efficiency = Payload Size / (Payload Size + Header Size) × 100%
Adding IP and Ethernet overhead for the complete picture:
| Layer | Size |
|---|---|
| Ethernet Frame Header | 14 bytes |
| IP Header (IPv4, no options) | 20 bytes |
| Transport Header | 8-60 bytes |
| Ethernet CRC | 4 bytes |
| Total Overhead | 46-98 bytes |
| Payload Size | UDP Efficiency | TCP Efficiency | Efficiency Difference |
|---|---|---|---|
| 1 byte | 2.1% | 1.0% | UDP 2.1× better |
| 10 bytes | 17.2% | 9.6% | UDP 1.8× better |
| 64 bytes | 54.2% | 39.0% | UDP 1.4× better |
| 128 bytes | 70.3% | 56.6% | UDP 1.2× better |
| 256 bytes | 82.6% | 72.7% | UDP 1.1× better |
| 512 bytes | 90.4% | 84.2% | UDP 1.07× better |
| 1024 bytes | 94.9% | 91.4% | UDP 1.04× better |
| 1460 bytes (typical MSS at 1500-byte MTU) | 96.4% | 94.0% | UDP 1.03× better |
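These figures can be approximated with a few lines of Python. Note that the exact outputs depend on the assumed TCP header size and on which framing bytes (preamble, minimum-frame padding) are counted, so treat this sketch as tracking the table rather than reproducing it digit for digit:

```python
def goodput_efficiency(payload: int, transport_header: int) -> float:
    """Payload bytes as a percentage of total bytes on the wire."""
    # Fixed lower-layer overhead: 14 (Ethernet) + 20 (IPv4) + 4 (CRC).
    overhead = 14 + 20 + 4 + transport_header
    return payload / (payload + overhead) * 100

for payload in (1, 10, 64, 128, 256, 512, 1024, 1460):
    udp = goodput_efficiency(payload, 8)    # 8-byte UDP header
    tcp = goodput_efficiency(payload, 32)   # typical TCP header with options
    print(f"{payload:>5} B  UDP {udp:5.1f}%  TCP {tcp:5.1f}%  ratio {udp/tcp:.2f}x")
```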
Critical Insight: Small Payloads Magnify Overhead
For small payloads (under 64 bytes), the overhead difference is substantial. This pattern is common in real-world traffic such as game state updates, IoT sensor readings, and market data ticks.
When small payloads are unavoidable, batching multiple logical messages into a single packet amortizes header overhead. Nagle's algorithm (TCP) and application-layer coalescing can help, though both add latency—the familiar trade-off between efficiency and delay. A sketch of application-layer coalescing follows.
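The hypothetical `DatagramBatcher` below illustrates one way to coalesce: length-prefix each small message and flush the batch when it nears the payload limit or when the oldest message has waited too long. It's a minimal sketch, not a production implementation (a real one would also flush from a background timer):

```python
import time

class DatagramBatcher:
    """Coalesce small messages into one UDP datagram (hypothetical sketch)."""

    def __init__(self, sock, dest, max_bytes=1400, max_delay=0.005):
        self.sock = sock
        self.dest = dest
        self.max_bytes = max_bytes    # keep the batch under one MTU
        self.max_delay = max_delay    # latency budget in seconds
        self.buf = bytearray()
        self.oldest = None            # monotonic time of first queued message

    def send(self, msg: bytes):
        # Length-prefix each message so the receiver can split the batch.
        record = len(msg).to_bytes(2, "big") + msg
        if len(self.buf) + len(record) > self.max_bytes:
            self.flush()
        self.buf += record
        if self.oldest is None:
            self.oldest = time.monotonic()
        if time.monotonic() - self.oldest >= self.max_delay:
            self.flush()

    def flush(self):
        if self.buf:
            self.sock.sendto(bytes(self.buf), self.dest)
            self.buf.clear()
        self.oldest = None

# Usage (illustrative address):
#   import socket
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   batcher = DatagramBatcher(sock, ("192.0.2.1", 9000))
#   batcher.send(b"tick")
```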
Beyond per-packet header overhead, TCP requires maintaining connection state for every active connection. This state consumes memory and processing resources on both endpoints.
UDP Connection State: Essentially Zero
UDP is stateless from the protocol's perspective: each datagram is independent, and the only 'state' is the socket binding (port allocation), which consumes minimal resources.
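One consequence is that a single unconnected socket can exchange datagrams with any number of peers. A minimal sketch (the addresses come from the 192.0.2.0/24 documentation range and are purely illustrative):

```python
import socket

# One unconnected UDP socket can reach any number of peers;
# the kernel keeps no per-destination state.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9000))

peers = [("192.0.2.10", 9000), ("192.0.2.11", 9000), ("192.0.2.12", 9000)]
for peer in peers:
    sock.sendto(b"heartbeat", peer)  # no handshake, no connection table entry
```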
The per-socket state behind that binding:

| UDP Socket State (per socket, not per 'connection') | Size |
|---|---|
| Local IP Address | 4-16 bytes (IPv4/IPv6) |
| Local Port | 2 bytes |
| Receive Buffer | Configurable (typically 64 KB-2 MB) |
| Send Buffer | Configurable (typically 64 KB-2 MB) |

Total state per socket: ~6-18 bytes plus buffer memory. State per datagram: 0 bytes (no tracking).

Key Insight: One UDP socket can send to ANY number of destinations without additional state per destination.

TCP Connection State: Substantial Per-Connection
Every TCP connection requires tracking extensive state for reliability, flow control, and congestion control:
```
TCP Connection Block (TCB) - State Per Connection:

CONNECTION IDENTIFICATION
  Local IP Address                   4-16 bytes
  Local Port                         2 bytes
  Remote IP Address                  4-16 bytes
  Remote Port                        2 bytes
  Connection State                   1 byte (ESTABLISHED, TIME_WAIT, etc.)

SEQUENCE NUMBER MANAGEMENT
  Send Sequence Number (SND.NXT)     4 bytes
  Send Unacknowledged (SND.UNA)      4 bytes
  Send Window Size (SND.WND)         4 bytes
  Send Window Scale                  1 byte
  Initial Send Sequence (ISS)        4 bytes
  Receive Next (RCV.NXT)             4 bytes
  Receive Window (RCV.WND)           4 bytes
  Receive Window Scale               1 byte
  Initial Receive Sequence (IRS)     4 bytes

CONGESTION CONTROL
  Congestion Window (cwnd)           4 bytes
  Slow Start Threshold (ssthresh)    4 bytes
  RTT Estimate                       8 bytes
  RTT Variance                       8 bytes
  Retransmission Timeout (RTO)       4 bytes
  Duplicate ACK Count                2 bytes

TIMERS AND TIMESTAMPS
  Retransmission Timer               timer structure (~16 bytes)
  Delayed ACK Timer                  timer structure (~16 bytes)
  Keepalive Timer                    timer structure (~16 bytes)
  TIME_WAIT Timer                    timer structure (~16 bytes)
  Persist Timer                      timer structure (~16 bytes)
  Last ACK Sent Timestamp            8 bytes
  Last Data Received Timestamp       8 bytes

BUFFERS
  Send Buffer                        configurable (typically 16 KB-16 MB)
  Receive Buffer                     configurable (typically 16 KB-16 MB)
  Out-of-Order Queue                 variable (holds out-of-order segments)
  Retransmit Queue                   variable (holds unacknowledged data)

SACK INFORMATION
  SACK Blocks                        up to 4 blocks x 8 bytes = 32 bytes
  SACK Permitted Flag                1 bit
```

Estimated TCB size (without buffers): 200-500 bytes per connection. With typical buffers: 32 KB-32 MB per connection.

| Concurrent Connections | UDP Memory | TCP Memory (min) | TCP Memory (typical) |
|---|---|---|---|
| 100 | ~20 KB | ~50 KB | ~6.4 MB |
| 1,000 | ~20 KB | ~500 KB | ~64 MB |
| 10,000 | ~20 KB | ~5 MB | ~640 MB |
| 100,000 | ~20 KB | ~50 MB | ~6.4 GB |
| 1,000,000 | ~20 KB | ~500 MB | ~64 GB |
For servers handling millions of concurrent connections (the 'C10M' problem targets 10 million), TCP's per-connection state is a significant challenge. UDP's stateless nature handles this easily, but applications using UDP must implement their own application-layer state if reliability is needed—often ending up with similar memory requirements.
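A back-of-the-envelope estimator reproduces the 'typical' column above, using the assumed figures from this section (~500-byte TCB plus ~64 KB of combined buffer space per connection):

```python
def tcp_memory_bytes(connections: int, tcb: int = 500,
                     buffers: int = 64 * 1024) -> int:
    """Rough TCP memory footprint: per-connection TCB plus buffers."""
    return connections * (tcb + buffers)

for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} connections: ~{tcp_memory_bytes(n) / 2**20:,.0f} MiB")
```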
Beyond memory, each packet requires CPU cycles for processing. The complexity difference between UDP and TCP processing is substantial.
UDP Processing: Minimal Path
```
UDP Receive Path (approximate CPU operations):
  1. Receive interrupt from NIC                 ~500 cycles
  2. DMA packet data to memory                  ~200 cycles
  3. IP header validation                       ~100 cycles
  4. UDP header extraction (8 bytes)            ~50 cycles
  5. Checksum verification (if enabled)         ~500 cycles
  6. Port lookup (hash table)                   ~100 cycles
  7. Queue to socket receive buffer             ~200 cycles
  8. Wake waiting application                   ~300 cycles
  ------------------------------------------------------------
  TOTAL: ~2,000 CPU cycles per packet

UDP Send Path (approximate):
  1. Application system call                    ~500 cycles
  2. Socket lookup                              ~100 cycles
  3. Buffer allocation                          ~200 cycles
  4. UDP header construction (8 bytes)          ~50 cycles
  5. Checksum calculation (optional)            ~500 cycles
  6. IP header construction                     ~100 cycles
  7. Queue to NIC                               ~200 cycles
  ------------------------------------------------------------
  TOTAL: ~1,650 CPU cycles per packet
```

TCP Processing: Extended Path
```
TCP Receive Path (approximate CPU operations):
   1. Receive interrupt from NIC                     ~500 cycles
   2. DMA packet data to memory                      ~200 cycles
   3. IP header validation                           ~100 cycles
   4. TCP header extraction (20-60 bytes)            ~150 cycles
   5. Checksum verification (mandatory)              ~500 cycles
   6. Connection lookup (4-tuple hash)               ~200 cycles
   7. State machine validation                       ~300 cycles
   8. Sequence number validation                     ~200 cycles
   9. ACK processing                                 ~500 cycles
      - update send window
      - release acknowledged data from retransmit queue
      - update RTT estimate if timestamp present
  10. Window update                                  ~150 cycles
  11. Out-of-order handling                          ~300 cycles
      - check if segment fits in sequence
      - buffer if out of order
      - coalesce if it fills a gap
  12. Congestion control update                      ~200 cycles
  13. Queue to socket receive buffer                 ~200 cycles
  14. SACK block management                          ~200 cycles
  15. Delayed ACK timer management                   ~150 cycles
  16. Generate ACK (if needed)                       ~500 cycles
  17. Wake waiting application                       ~300 cycles
  --------------------------------------------------------------
  TOTAL: ~4,650+ CPU cycles per packet

TCP Send Path (per segment):
   1. Application write system call                  ~500 cycles
   2. Socket/connection lookup                       ~200 cycles
   3. Available window check                         ~150 cycles
   4. Segment size determination (MSS, cwnd, rwnd)   ~300 cycles
   5. Sequence number assignment                     ~100 cycles
   6. TCP header construction (20-60 bytes)          ~200 cycles
   7. Timestamp option insertion                     ~100 cycles
   8. Checksum calculation                           ~500 cycles
   9. Copy to retransmit queue                       ~300 cycles
  10. Start/reset retransmission timer               ~200 cycles
  11. Nagle algorithm check                          ~100 cycles
  12. IP header construction                         ~100 cycles
  13. Queue to NIC                                   ~200 cycles
  --------------------------------------------------------------
  TOTAL: ~2,950+ CPU cycles per segment
```

| Operation | UDP Cycles | TCP Cycles | TCP Overhead |
|---|---|---|---|
| Receive processing | ~2,000 | ~4,650 | +132% |
| Send processing | ~1,650 | ~2,950 | +79% |
| Round-trip (send+receive) | ~3,650 | ~7,600 | +108% |
| At 1M packets/sec | 3.65 Gcycles/s | 7.6 Gcycles/s | +3.95 Gcycles/s |
| CPU cores needed @ 3GHz | ~1.2 cores | ~2.5 cores | +1.3 cores |
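The cycle counts are rough figures, but the arithmetic is easy to redo for your own packet rates and clock speeds. A minimal sketch using the estimates above:

```python
def cores_needed(cycles_per_packet: float, packets_per_sec: float,
                 clock_hz: float = 3e9) -> float:
    """CPU cores consumed by protocol processing alone."""
    return cycles_per_packet * packets_per_sec / clock_hz

rate = 1_000_000  # packets per second
print(f"UDP: {cores_needed(3_650, rate):.1f} cores")  # send + receive, ~1.2
print(f"TCP: {cores_needed(7_600, rate):.1f} cores")  # send + receive, ~2.5
```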
Modern NICs provide TCP Offload Engine (TOE), Large Receive Offload (LRO), and TCP Segmentation Offload (TSO), reducing CPU overhead significantly. However, UDP also benefits from equivalent offloads (UDP Fragmentation Offload). The relative difference remains, though absolute numbers decrease.
One of TCP's most significant hidden costs is acknowledgment traffic. Every data segment (with some exceptions) generates a corresponding acknowledgment, consuming bandwidth in the reverse direction.
The ACK Amplification Problem
Scenario: download a 100 MB file.

```
UDP (no application ACKs):
  Download direction:  100 MB of data
  Upload direction:    0 bytes (no protocol acknowledgments)
  Total traffic:       100 MB
  Overhead:            0%

TCP with typical delayed ACKs (1 ACK per 2 segments):
  Segments:            ~68,500 (1460-byte payload per segment)
  ACKs generated:      ~34,250
  ACK packet size:     ~54 bytes (Ethernet + IP + bare TCP header, no data)

  Download direction:  100 MB + (68,500 x 32 bytes TCP overhead) = ~102.2 MB
  Upload direction:    34,250 x 54 bytes = ~1.85 MB
  Total traffic:       ~104 MB
  ACK overhead:        ~4%

TCP with immediate 1:1 ACKs (worst case):
  ACKs generated:      ~68,500 (one per segment)
  Upload direction:    68,500 x 54 bytes = ~3.7 MB
  Total traffic:       ~106 MB
  ACK overhead:        ~6%
```

The Asymmetric Connection Problem
ACK traffic is particularly problematic on asymmetric connections where upload bandwidth is limited:
| Connection Type | Download | Upload | ACK Traffic at Full Download | Share of Upload |
|---|---|---|---|---|
| ADSL 24/1 Mbps | 24 Mbps | 1 Mbps | ~0.43 Mbps | ~43% |
| Cable 100/10 Mbps | 100 Mbps | 10 Mbps | ~1.8 Mbps | ~18% |
| Cable 1000/35 Mbps | 1000 Mbps | 35 Mbps | ~18 Mbps | ~51% |
| Satellite 25/3 Mbps | 25 Mbps | 3 Mbps | ~0.45 Mbps | ~15% |
Figures assume delayed ACKs—one 54-byte ACK frame per two 1514-byte data frames, or roughly 1.8% of the download rate; immediate 1:1 ACKs double them.
On highly asymmetric connections, ACK traffic claims a substantial share of the limited uplink: half of it on a gigabit/35 Mbps link, and effectively all of it with 1:1 ACKs. Worse, as soon as the uplink also carries user traffic (an upload, a video call), ACKs queue behind that traffic, arrive late, and stall the sender, limiting effective download speed to a fraction of the theoretical maximum. UDP avoids this entirely.
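A sketch of the delayed-ACK arithmetic behind the table, under the stated assumptions (54-byte ACK frames, 1514-byte data frames, one ACK per two segments):

```python
def ack_share_of_upload(download_mbps: float, upload_mbps: float,
                        ack_bytes: int = 54, data_frame: int = 1514,
                        segments_per_ack: int = 2) -> float:
    """Fraction of the uplink consumed by ACKs at full download rate.

    Assumes delayed ACKs; immediate 1:1 ACKs double the result.
    """
    ack_mbps = download_mbps * ack_bytes / (segments_per_ack * data_frame)
    return ack_mbps / upload_mbps

for down, up in ((24, 1), (100, 10), (1000, 35), (25, 3)):
    share = ack_share_of_upload(down, up)
    print(f"{down}/{up} Mbps: ACKs use {share:.0%} of upload")
```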
Bidirectional Traffic: Where ACKs Hide
When traffic flows in both directions, TCP's delayed ACK mechanism 'piggybacks' acknowledgments on data packets that would be sent anyway, so the ACK overhead largely disappears into the application's own traffic.
TCP's three-way handshake adds both latency overhead (covered in Page 1) and bandwidth/processing overhead worth examining.
TCP Handshake Packet Analysis
```
TCP Connection Establishment Packets:

Packet 1: SYN (Client -> Server)
  Ethernet:    14 bytes
  IP header:   20 bytes
  TCP header:  40 bytes (20 base + 20 options: MSS, Window Scale,
               SACK Permitted, Timestamps, padding)
  TCP data:    0 bytes
  Total:       74 bytes on wire

Packet 2: SYN-ACK (Server -> Client)
  Ethernet:    14 bytes
  IP header:   20 bytes
  TCP header:  40 bytes (same options echoed)
  TCP data:    0 bytes
  Total:       74 bytes on wire

Packet 3: ACK (Client -> Server)
  Ethernet:    14 bytes
  IP header:   20 bytes
  TCP header:  32 bytes (only timestamps remain after the handshake)
  TCP data:    0 bytes (the first data segment is often combined: "ACK+DATA")
  Total:       66 bytes on wire

TOTAL HANDSHAKE OVERHEAD: ~214 bytes / 3 packets / 1.5 RTT

TCP Connection Termination (four-way handshake):
  FIN (A -> B):  66 bytes
  ACK (B -> A):  66 bytes
  FIN (B -> A):  66 bytes (often combined with the preceding ACK)
  ACK (A -> B):  66 bytes
  Total:         ~198-264 bytes / 3-4 packets / ~2 RTT
```

Impact on Short-Lived Connections
For long-lived connections transferring megabytes, 200 bytes of handshake overhead is negligible. For short request-response patterns, it dominates:
| Data Transferred | UDP Total | TCP Total | Connection Overhead (% of TCP) |
|---|---|---|---|
| 64 bytes (DNS query) | 64 B | ~460 B | 86% overhead |
| 256 bytes (small API) | 256 B | ~656 B | 61% overhead |
| 1 KB | 1 KB | ~1.4 KB | 29% overhead |
| 10 KB | 10 KB | ~10.4 KB | 4% overhead |
| 100 KB | 100 KB | ~100.4 KB | 0.4% overhead |
| 1 MB | 1 MB | ~1.0004 MB | 0.04% overhead |
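The table's percentages follow from a single assumption: roughly 400 bytes of setup-plus-teardown traffic per connection (the handshake and termination totals above land in the 400-480 byte range). A minimal sketch:

```python
def connection_overhead(data_bytes: int, handshake_bytes: int = 400) -> float:
    """Setup/teardown bytes as a share of total bytes transferred.

    ~400 bytes approximates the three-way handshake plus FIN/ACK
    teardown; per-segment header overhead is excluded here.
    """
    return handshake_bytes / (handshake_bytes + data_bytes)

for size in (64, 256, 1024, 10_240, 102_400, 1_048_576):
    print(f"{size:>8} B: {connection_overhead(size):.1%} connection overhead")
```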
HTTP/1.1 keep-alive, HTTP/2 multiplexing, and connection pooling exist specifically to amortize TCP handshake overhead across many requests. A single TCP connection can serve thousands of HTTP requests, making the initial handshake negligible. Without connection reuse, TCP overhead is devastating for small requests.
When packets are lost, TCP must retransmit them. This creates additional overhead that doesn't exist in UDP (where lost data simply stays lost).
Quantifying Retransmission Overhead
| Packet Loss Rate | Effective Retransmit Ratio | Bandwidth Overhead | Notes |
|---|---|---|---|
| 0% | 1.000× | 0% | Ideal network |
| 0.1% | 1.001× | 0.1% | Typical good connection |
| 1% | 1.010× | 1% | Acceptable quality |
| 2% | 1.020× | 2% | Noticeable quality issues |
| 5% | 1.053× | 5.3% | Poor connection |
| 10% | 1.111× | 11.1% | Very poor connection |
| 20% | 1.250× | 25% | Barely usable |
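The ratios in this table follow from a geometric distribution: with loss rate p, each packet needs on average 1/(1-p) transmissions to get through. A one-function sketch:

```python
def retransmit_ratio(loss: float) -> float:
    """Expected transmissions per delivered packet at loss rate p.

    Each attempt succeeds with probability 1 - p, so the expected
    number of attempts (geometric distribution) is 1 / (1 - p).
    """
    return 1 / (1 - loss)

for p in (0.001, 0.01, 0.02, 0.05, 0.10, 0.20):
    print(f"{p:.1%} loss -> {retransmit_ratio(p):.3f}x transmissions")
```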
Retransmission Interactions with Congestion Control
The true cost of retransmission isn't just the extra bytes sent—it's the cascade effect on TCP's congestion control:
Scenario: 100 Mbps link, 50 ms RTT, 2% random packet loss. Theoretical maximum: 100 Mbps.

```
TCP throughput estimate (simplified Mathis formula):

  Throughput ≈ (MSS / RTT) x (1 / sqrt(loss_rate))
             ≈ (1460 bytes / 0.050 s) x (1 / sqrt(0.02))
             ≈ 29,200 bytes/s x 7.07
             ≈ 206,500 bytes/s
             ≈ 1.65 Mbps

Achieved:    1.65 Mbps out of 100 Mbps available
Efficiency:  ~1.65%

This is NOT the retransmission bytes (2% extra). It is the
CONGESTION CONTROL RESPONSE to those losses. TCP's politeness
in the face of loss destroys throughput.

UDP comparison:
  Sent:        100 Mbps
  Received:    ~98 Mbps (2% lost)
  Efficiency:  98%

On this lossy link, TCP achieves roughly 1/60th of UDP's throughput.
```

Retransmission overhead is often described as 'sending the same data twice.' The reality is far worse: TCP's congestion control response to loss reduces throughput by 10-100× more than the actual bytes retransmitted. This is why lossy links (wireless, satellite) perform so much worse with TCP than the raw packet loss rate would suggest.
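For reference, here is the simplified Mathis estimate from the worked example as a function. The full formula includes a constant factor of roughly 1.22, which is omitted here, as in the calculation above:

```python
from math import sqrt

def mathis_throughput_mbps(mss_bytes: float, rtt_s: float,
                           loss: float) -> float:
    """Simplified Mathis estimate: throughput ~ MSS / (RTT * sqrt(p))."""
    return (mss_bytes / rtt_s) * (1 / sqrt(loss)) * 8 / 1e6

# Scenario from above: 1460-byte MSS, 50 ms RTT, 2% loss.
print(f"{mathis_throughput_mbps(1460, 0.050, 0.02):.2f} Mbps")  # ~1.65
```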
We've dissected the overhead of UDP and TCP across multiple dimensions. The key insights: TCP's typical header is 4× larger than UDP's (32 vs 8 bytes), which matters most for small payloads; TCP keeps hundreds of bytes of state plus buffers per connection while UDP keeps essentially none; TCP packet processing costs roughly twice the CPU cycles; ACK traffic consumes reverse-path bandwidth that UDP never generates; handshake overhead dominates short-lived connections; and TCP's congestion response to loss costs far more throughput than the retransmitted bytes themselves.
What's Next:
Now that we understand the overhead differences, we'll examine connection handling—how UDP's connectionless nature and TCP's connection-oriented service fundamentally shape application architecture and behavior.
You can now quantify the overhead costs of choosing UDP vs TCP for any workload. You understand where overhead comes from, how it scales, and which scenarios magnify or diminish the differences. This knowledge enables informed protocol selection and performance optimization.