If binary framing is HTTP/2's foundation, multiplexing is its crown jewel. Multiplexing solves the most fundamental performance problem in HTTP/1.1: the inability to efficiently use network connections for concurrent requests.
In HTTP/1.1, web browsers open 6-8 parallel connections per domain, each carrying one request at a time. Even with connection reuse, this creates artificial bottlenecks. HTTP/2 multiplexing eliminates this constraint entirely—a single connection can carry hundreds of concurrent requests and responses, interleaved at the frame level.
This isn't an incremental improvement. It's a paradigm shift in how browsers and servers communicate, enabling page load optimizations that were simply impossible before.
By the end of this page, you will understand: (1) Why HTTP/1.1's connection model limits performance, (2) How multiplexing interleaves multiple streams on one connection, (3) The mechanics of frame scheduling and stream management, (4) How multiplexing eliminates head-of-line blocking at the HTTP layer, (5) The practical performance implications for modern web applications, and (6) Why TCP limitations still matter and how HTTP/3 addresses them.
To appreciate multiplexing, we must first understand the problem it solves. HTTP/1.1's connection model creates fundamental constraints that limit web performance.
The Request-Response Serialization Problem:
HTTP/1.1 connections are inherently serial. Each connection can process one request-response pair at a time:
Connection 1: [Request A] → [Response A] → [Request B] → [Response B] → ...
While HTTP/1.1 introduced pipelining to send multiple requests without waiting for responses, it had a fatal flaw: responses must return in the same order as requests. If Response A takes 500ms and Response B takes 10ms, Response B waits behind Response A—even though it's ready immediately.
This is head-of-line (HOL) blocking at the HTTP layer.
| Problem | Description | Performance Impact |
|---|---|---|
| Serial Processing | One request-response pair per connection at a time | Connection utilization drops during request/response wait times |
| Head-of-Line Blocking | Pipelined responses must return in order | Fast responses blocked by slow ones; unpredictable latency |
| Connection Limits | Browsers limit 6-8 connections per domain | Limits parallelism; creates artificial bottlenecks |
| Connection Overhead | Each connection requires TCP handshake + TLS setup | 3+ RTTs overhead per new connection; expensive for many small resources |
| Header Redundancy | Every request repeats full headers | Kilobytes of duplicate data per page load |
The Workaround Era:
Web developers invented numerous workarounds to mitigate HTTP/1.1's limitations:
Domain Sharding: Split resources across multiple domains (cdn1.example.com, cdn2.example.com) to bypass per-domain connection limits. Adds DNS lookups and TLS handshakes.
Resource Bundling: Concatenate JavaScript and CSS files to reduce request count. Breaks caching granularity; change one line, re-download megabytes.
Image Sprites: Combine many images into one sprite sheet. Complex maintenance; downloads unused images.
Inlining: Embed CSS, JS, or images directly in HTML. Eliminates caching; bloats HTML.
Lazy Loading: Defer non-critical resources. Adds complexity; may delay user experience.
These workarounds added development complexity, hurt caching efficiency, and still couldn't fully solve the fundamental problem. HTTP/2 multiplexing makes most of them unnecessary.
A typical web page loads 70+ resources. With 6 connections and serial processing, those resources must be fetched in roughly 12 sequential waves (70 ÷ 6 ≈ 12), each costing at least one round trip. Each additional RTT adds latency proportional to network distance—50ms on a fast connection, 200ms+ on mobile networks. The cumulative effect: pages that could load in 500ms take 2-3 seconds, purely due to HTTP/1.1 protocol limitations.
HTTP/2 multiplexing allows multiple independent request-response exchanges—called streams—to share a single TCP connection. Frames from different streams interleave freely, with the Stream Identifier in each frame header enabling the receiver to reassemble the correct stream.
The Core Concept:
Instead of:
Connection 1: [Req A]────────[Resp A]────────[Req B]────────[Resp B]
Connection 2: [Req C]────────[Resp C]────────[Req D]────────[Resp D]
HTTP/2 does:
Connection: [A][C][B][A][D][C][B][A][D][C][B][A]...
│ │ │ │ │ │ │ │ │ │ │ └── Frames interleaved
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──── All streams on ONE connection
Each letter represents a frame. Frames from streams A, B, C, and D arrive interleaved, but the receiver reconstructs each logical stream independently.
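To make the demultiplexing concrete, here is a minimal sketch in Go. The `Frame` struct and the sample payloads are hypothetical stand-ins for the real wire format; the point is that a stream ID on every frame is all the receiver needs to reassemble interleaved data:

```go
package main

import "fmt"

// Frame is a simplified stand-in for an HTTP/2 frame: a stream ID plus payload.
type Frame struct {
	StreamID uint32
	Payload  []byte
}

func main() {
	// Interleaved frames from streams 1, 3, and 5, as they might arrive
	// on a single connection.
	arrived := []Frame{
		{1, []byte("<html>")},
		{5, []byte("console.")},
		{3, []byte("body{")},
		{1, []byte("</html>")},
		{5, []byte("log(1)")},
		{3, []byte("}")},
	}

	// Demultiplex: append each payload to its own stream's buffer.
	streams := map[uint32][]byte{}
	for _, f := range arrived {
		streams[f.StreamID] = append(streams[f.StreamID], f.Payload...)
	}

	for id, data := range streams {
		fmt.Printf("stream %d: %s\n", id, data)
	}
}
```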
Why This Works:
Multiplexing is possible because of HTTP/2's binary framing: every frame carries its stream identifier, so frames from different streams can interleave arbitrarily and still be reassembled correctly on the other side.
When Stream 3's response finishes before Stream 1, it's delivered immediately. The client processes each stream as it completes—no waiting for earlier requests.
A stream is a logical, bidirectional sequence of frames for a single request-response. Streams exist only at the HTTP layer—the underlying TCP connection sees only a sequence of bytes. This key insight explains why HTTP/2 can carry hundreds of concurrent 'conversations' over a single socket.
Understanding stream mechanics is essential for grasping how multiplexing achieves its performance benefits.
Stream Identification:
Streams use odd or even identifiers based on who initiated them:
| Initiator | Stream IDs | Purpose |
|---|---|---|
| Client | 1, 3, 5, 7, ... | Regular requests |
| Server | 2, 4, 6, 8, ... | Server push (covered later) |
| Connection | 0 | SETTINGS, PING, GOAWAY |
This prevents ID conflicts—client and server can create streams independently without coordination.
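A sketch of how an implementation might hand out IDs under this scheme (the `idAllocator` type is illustrative, not from any real library):

```go
package stream

// idAllocator hands out stream IDs of a fixed parity: clients start at 1
// (odd), servers at 2 (even), so both sides can open streams independently
// without ever colliding.
type idAllocator struct {
	next uint32
}

func newClientAllocator() *idAllocator { return &idAllocator{next: 1} }
func newServerAllocator() *idAllocator { return &idAllocator{next: 2} }

// allocate returns the next unused ID and advances by 2 to keep the parity.
func (a *idAllocator) allocate() uint32 {
	id := a.next
	a.next += 2
	return id
}
```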
Stream States:
Each stream progresses through states as frames are exchanged:
idle → open → half-closed (local) → closed
or
idle → open → half-closed (remote) → closed
State transitions are triggered by specific frame types and flags: HEADERS opens a stream, the END_STREAM flag half-closes it, and RST_STREAM aborts it immediately.
Stream Creation:
A client creates a stream by sending a HEADERS frame with a new, unused odd stream ID. The stream enters the 'open' state once that HEADERS frame is sent (or, on the receiving side, received). No explicit "create stream" message exists; sending HEADERS implicitly creates the stream.
| State | Legal Actions (Client) | Legal Actions (Server) | Typical Scenario |
|---|---|---|---|
| idle | Send HEADERS | Send PUSH_PROMISE | Stream not yet used |
| open | Send DATA, HEADERS continuation | Send DATA, HEADERS | Active request-response exchange |
| half-closed (local) | Receive DATA, HEADERS | Send DATA, HEADERS | Request sent, awaiting response |
| half-closed (remote) | Send DATA, HEADERS | Receive DATA | Response body being sent |
| closed | None | None | Stream complete; ID retired |
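The lifecycle can be modeled as a small state machine. The sketch below simplifies the diagram in RFC 9113 §5.1, ignoring RST_STREAM and the 'reserved' states used by server push:

```go
package stream

// State is a simplified model of the HTTP/2 stream lifecycle.
type State int

const (
	Idle State = iota
	Open
	HalfClosedLocal  // we sent END_STREAM; we can now only receive
	HalfClosedRemote // peer sent END_STREAM; we can now only send
	Closed
)

// onSendEndStream advances the state when we set the END_STREAM flag.
func (s State) onSendEndStream() State {
	switch s {
	case Open:
		return HalfClosedLocal
	case HalfClosedRemote:
		return Closed
	}
	return s
}

// onRecvEndStream advances the state when the peer sets END_STREAM.
func (s State) onRecvEndStream() State {
	switch s {
	case Open:
		return HalfClosedRemote
	case HalfClosedLocal:
		return Closed
	}
	return s
}
```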
Concurrent Stream Management:
The MAX_CONCURRENT_STREAMS setting limits how many streams can be in the 'open' or 'half-closed' state simultaneously. This prevents resource exhaustion:
Typical values: 100-1000 concurrent streams. Compare to HTTP/1.1's 6 connections—HTTP/2 provides 15-150x more parallelism on a single socket!
Once a stream closes, its ID is permanently retired. This prevents ambiguity when delayed frames arrive: a frame for stream 7 either belongs to an active stream or is an error. Stream IDs are 31-bit values, so a client has roughly 2^30 (about one billion) odd IDs available; after exhausting them, it must establish a new connection. At 1,000 streams per second, that takes 12+ days—not a practical concern.
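A sketch of how a client might enforce the advertised concurrency limit locally, using a buffered channel as a counting semaphore (the `gate` type is illustrative; requests beyond the limit simply queue until an active stream closes):

```go
package stream

// gate limits concurrent streams to the peer's advertised
// SETTINGS_MAX_CONCURRENT_STREAMS value.
type gate struct {
	slots chan struct{}
}

func newGate(maxConcurrentStreams int) *gate {
	return &gate{slots: make(chan struct{}, maxConcurrentStreams)}
}

// acquire blocks until a stream slot is free.
func (g *gate) acquire() { g.slots <- struct{}{} }

// release frees a slot when the stream reaches the closed state.
func (g *gate) release() { <-g.slots }
```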
Multiplexing enables frame interleaving, but how frames are actually scheduled determines real-world performance. Both senders (client and server) must decide which stream's frames to transmit next.
The Scheduling Challenge:
Consider a server with three responses ready to send: a small HTML document on Stream 1, a large image on Stream 3, and a render-blocking JavaScript file on Stream 5.
Naive round-robin scheduling sends frames in rotation:
[S1][S3][S5][S1][S3][S5][S1][S3][S5]...
But this delays critical resources. The JavaScript (Stream 5) and HTML (Stream 1) should arrive before the large image starts consuming bandwidth.
Priority-Based Scheduling:
HTTP/2 includes a prioritization mechanism (explored in detail later in this module) that guides frame scheduling.
With proper priorities:
[S1][S1][S5][S1][S5][S1][S5][S5][S5][S3][S3][S3]...
HTML first ─┘ └── JS next ───┘ └── Image last
The HTML and JavaScript complete while the image is just starting. The browser can begin rendering much earlier.
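The following toy scheduler illustrates the effect. The weights and frame counts are invented for the demo, and real servers use dependency trees or urgency levels rather than this simplified proportional burst, but the output shows the HTML and JavaScript draining before the image:

```go
package main

import "fmt"

// stream holds queued frames and a scheduling weight.
type stream struct {
	id     uint32
	weight int
	frames int // frames still queued
}

func main() {
	streams := []*stream{
		{id: 1, weight: 256, frames: 4}, // HTML: high priority, small
		{id: 5, weight: 128, frames: 5}, // JS: medium priority
		{id: 3, weight: 8, frames: 12},  // image: low priority, large
	}

	// Each round, every stream sends a burst proportional to its weight
	// (scaled down so the demo output stays short).
	for {
		sent := false
		for _, s := range streams {
			burst := s.weight/64 + 1
			for i := 0; i < burst && s.frames > 0; i++ {
				fmt.Printf("[S%d]", s.id)
				s.frames--
				sent = true
			}
		}
		if !sent {
			break
		}
	}
	fmt.Println()
}
```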
Practical Implications:
In practice, browsers signal priorities through stream dependencies:
Root (0)
│
┌───────┴───────┐
CSS (3) JS (5)
w=256 w=32
│
HTML (1)
w=256
This dependency tree tells the server: "HTML depends on CSS (CSS must arrive first for proper styling), and CSS is more important than JS (when bandwidth-constrained, prefer CSS frames)."
However, priority is advisory—servers may ignore it, and many implementations use simplified scheduling. Understanding your CDN's behavior is important for optimization.
HTTP/2's original priority mechanism proved complex and inconsistently implemented. HTTP/3 and the Extensible Priorities scheme (RFC 9218) simplify this with urgency levels and incremental hints. Many HTTP/2 implementations now support these newer semantics via the Priority header field.
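Signaling these newer semantics from a client is just a header. A sketch in Go, with a placeholder URL (the hint remains advisory, as before):

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// Request a render-blocking stylesheet with high urgency.
	req, err := http.NewRequest("GET", "https://example.com/app.css", nil)
	if err != nil {
		log.Fatal(err)
	}
	// RFC 9218: u is urgency 0 (highest) to 7 (lowest); the default is 3.
	// An "i" token would mark the response as processable incrementally.
	req.Header.Set("Priority", "u=1")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println(resp.Status)
}
```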
HTTP/2 multiplexing completely eliminates HTTP-layer head-of-line blocking. If Stream 1's response is slow, Streams 3, 5, and 7 can complete and be processed immediately. This is a massive improvement over HTTP/1.1 pipelining.
The Victory:
HTTP/1.1 Pipelining:
Request: [A][B][C] ────────────────→
Response: [A........................][B][C]
└── B and C blocked by A ──┘
HTTP/2 Multiplexing:
Request: [A][B][C] ────────────────→
Response: [A chunk][B chunk][C][B][C][A chunk][A]
└── All complete as ready, no blocking ──┘
For HTTP layer interactions, this is exactly what we wanted.
However, HTTP/2 still runs over TCP, and TCP has its own head-of-line blocking. If a single packet is lost, TCP holds all subsequent data until that packet is retransmitted—even if the lost packet contained data for Stream 1 and the waiting packets are for Stream 5. At the TCP layer, there's only one byte stream.
TCP Head-of-Line Blocking Explained:
Consider this scenario: a packet carrying Stream 3's data is lost in transit, while later packets carrying Stream 5's data arrive intact. TCP cannot deliver the Stream 5 bytes to the application without violating in-order delivery, so it buffers them until the Stream 3 packet is retransmitted and received.
From HTTP/2's perspective, everything stops. Stream 5's data was ready, but TCP won't release it because Stream 3's earlier packet was lost. The multiplexing benefit is negated by TCP's ordering guarantee.
The Impact:
On reliable networks (wired, data centers), TCP packet loss is rare (~0.01%), and this problem is minimal. On unreliable networks (cellular, weak WiFi), loss rates of 1-5% mean TCP HOL blocking occurs frequently.
Studies have shown that on high-loss networks, HTTP/1.1 with multiple connections can actually outperform HTTP/2 because lost packets affect only one connection's streams, not all of them.
This is the fundamental motivation for HTTP/3, which uses QUIC (UDP-based) to enable per-stream packet loss recovery. More on this in the HTTP/3 section.
Despite TCP limitations, HTTP/2 multiplexing provides substantial benefits for most users: reduced connection overhead, better server resource utilization, header compression sharing, and simplified infrastructure (no domain sharding). The TCP HOL blocking issue primarily affects high-latency, high-loss networks—a minority of traffic for most applications.
Multiplexing delivers measurable performance improvements across multiple dimensions. Let's quantify the benefits:
Connection Overhead Reduction:
HTTP/1.1 requires multiple connections, and each connection incurs a TCP handshake (1 RTT), a TLS handshake (1-2 RTTs depending on TLS version), and a slow-start period before reaching full throughput.
With HTTP/2, a page loading 70 resources over 6 domains pays this cost once per domain instead of 6+ times. Typical savings: 200-500ms on initial page load.
| Metric | HTTP/1.1 (6 connections/domain) | HTTP/2 (1 connection/domain) | Improvement |
|---|---|---|---|
| TCP Handshakes | ~36 (6 domains × 6 conn) | ~6 (1 per domain) | 6x reduction |
| TLS Handshakes | ~36 | ~6 | 6x reduction |
| Connection Overhead | ~7.2 seconds cumulative | ~1.2 seconds cumulative | ~6 seconds saved |
| Server Memory per Client | 36 TCP buffers | 6 TCP buffers | 6x reduction |
| Concurrent Request Limit | ~36 (6 domains × 6) | 600+ (100/domain × 6) | 15x+ increase |
Latency Improvements:
Multiplexing particularly shines for latency-sensitive metrics such as Time to First Byte, First Contentful Paint, and Time to Interactive.
Real-world studies show 10-40% improvements in these metrics depending on page structure and network conditions.
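You can observe single-connection concurrency directly from a Go client: the standard transport negotiates HTTP/2 over TLS and multiplexes concurrent requests to the same origin over one connection instead of opening six. The origin and paths below are placeholders:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

func main() {
	paths := []string{"/", "/a.css", "/b.js", "/c.png", "/d.svg", "/e.json"}

	var wg sync.WaitGroup
	for _, p := range paths {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			resp, err := http.Get("https://example.com" + p)
			if err != nil {
				fmt.Println(p, err)
				return
			}
			defer resp.Body.Close()
			io.Copy(io.Discard, resp.Body) // drain so the stream closes cleanly
			fmt.Println(p, resp.Proto)     // prints "HTTP/2.0" when multiplexed
		}(p)
	}
	wg.Wait()
}
```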
With HTTP/2, many HTTP/1.1 optimizations become anti-patterns. Bundling multiple scripts into one giant file hurts caching—change one module, invalidate entire bundle. HTTP/2's efficient multiplexing means shipping many small, independently cacheable modules is often superior. Tree-shaking and ESM imports align perfectly with this approach.
Multiplexing changes how servers allocate and manage resources. Understanding these changes is essential for operating HTTP/2 infrastructure at scale.
Connection Pooling:
HTTP/1.1 servers maintain connection pools—each pool connection handles one request at a time. With 10,000 concurrent users sending 6 parallel requests each, the server manages 60,000 connections.
HTTP/2 servers serve the same users with ~10,000 connections (one per user), each carrying multiple streams. This dramatically reduces memory footprint, file descriptor usage, and the amount of TCP and TLS state the server must track.
Stream Concurrency Control:
The MAX_CONCURRENT_STREAMS setting lets servers control resource allocation:
SETTINGS_MAX_CONCURRENT_STREAMS = 100
This tells clients: "Limit active streams to 100 per connection." Servers set this based on available memory, per-stream processing costs, and fairness across clients.
If a client exceeds this limit, the server responds with REFUSED_STREAM or closes the connection. Clients must queue excess requests locally.
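In Go, for example, this setting is exposed by the golang.org/x/net/http2 package (the certificate paths below are placeholders):

```go
package main

import (
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	srv := &http.Server{Addr: ":8443", Handler: http.DefaultServeMux}

	// Advertise SETTINGS_MAX_CONCURRENT_STREAMS = 100 to each client.
	if err := http2.ConfigureServer(srv, &http2.Server{
		MaxConcurrentStreams: 100,
	}); err != nil {
		log.Fatal(err)
	}

	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}
```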
| Resource | HTTP/1.1 Scaling | HTTP/2 Scaling | Notes |
|---|---|---|---|
| Memory | Per connection (~50KB TCP + TLS) | Per connection + per stream (~10KB/stream) | More memory efficient overall |
| CPU | Parallel request processing | Parallel stream processing + frame multiplexing | Similar; H2 adds framing overhead |
| Connections | 6+ per client | 1 per client (typically) | Major reduction |
| Backend Latency | Blocks connection | Blocks stream only | Better utilization |
| Timeouts | Per connection | Per connection + per stream | More granular control |
HTTP/2's long-lived, high-capacity connections change load balancing dynamics. A single connection might carry 100+ requests, so connection-based load balancing is coarser-grained. Some load balancers now offer request-level (L7) distribution for HTTP/2, but at the cost of terminating connections at the balancer.
Browsers implement HTTP/2 multiplexing transparently, but understanding their behavior helps optimize web applications.
Connection Coalescing:
Browsers attempt to reuse HTTP/2 connections across related origins. If both www.example.com and api.example.com resolve to the same IP and share a TLS certificate with matching SANs (Subject Alternative Names), the browser may use a single connection for both.
This connection coalescing reduces connection overhead further, but it has implications: all coalesced origins share a single connection's fate, and a server that cannot actually serve a coalesced origin must reject the request with a 421 (Misdirected Request) so the browser retries on a fresh connection.
Request Queuing:
Browsers maintain per-connection queues for streams awaiting transmission, prioritizing render-critical resources (HTML, CSS, blocking scripts) ahead of deferrable ones such as images (e.g., <img src="...">).
With HTTP/1.1, browsers managed up to 6 parallel queues (one per connection). HTTP/2 uses one queue per connection with much higher concurrency. This simplifies browser internals and reduces scheduling contention.
Hints such as fetchpriority also influence the priority signals the browser attaches to each stream.
With HTTP/2, <link rel='preconnect'> is more valuable than ever. A single preconnect establishes a connection that can immediately carry dozens of requests. Preconnect to third-party origins you know you'll need (analytics, fonts, CDNs) to eliminate connection latency from the critical path.
While multiplexing is transformative, it has limitations that practitioners must understand:
TCP Head-of-Line Blocking (Revisited):
As discussed, TCP's reliable ordering negates multiplexing benefits when packets are lost. On lossy networks, a single lost packet stalls every stream on the connection, and overall throughput can fall below that of HTTP/1.1 with its independent connections.
This is the primary motivation for QUIC/HTTP/3—per-stream reliability with no cross-stream blocking.
Single Point of Failure:
One connection carrying everything means connection failures are catastrophic. If the TCP connection drops, every in-flight stream fails at once, and the client must reconnect, redo the TLS handshake, and retry all outstanding requests.
HTTP/1.1's multiple connections provided implicit redundancy—losing one connection affected only a fraction of requests.
Mitigations include automatic client retry over a fresh connection and HTTP/2's GOAWAY frame, which lets a server shut down gracefully by telling the client the highest stream ID it processed, so unprocessed requests can be safely retried elsewhere.
HTTP/3 (QUIC-based) addresses TCP limitations with UDP transport and per-stream packet loss handling. Multiplexing becomes truly independent—a lost packet delays only the affected stream, not all streams. HTTP/3 represents the full realization of the multiplexing vision, free from TCP's constraints.
Multiplexing fundamentally changes how browsers and servers exchange data, eliminating HTTP/1.1's most significant performance limitations.
What's Next:
With multiplexing understood, we now examine how HTTP/2 reduces the overhead of each request. The next page explores Header Compression with HPACK—the algorithm that dramatically reduces repetitive header data across requests, further amplifying multiplexing's efficiency.
You now understand HTTP/2 multiplexing—how streams interleave on a single connection, eliminating HTTP-layer head-of-line blocking and enabling dramatically higher concurrency. This understanding is essential for optimizing modern web applications and appreciating why HTTP/3 was necessary.