When your code executes await fetch('https://api.example.com/users/123'), it appears instantaneous—a simple line that returns data. But beneath this simplicity lies a complex journey involving dozens of systems, protocols, and decisions.
A request must traverse DNS resolution, TCP connection establishment, TLS handshake, HTTP framing, network routing, server processing, and response transmission—each stage adding latency and potential failure modes.
Understanding the complete request lifecycle is essential for diagnosing latency, reasoning about failure modes at each stage, and choosing optimizations that actually move the needle.
This page dissects every stage of the request lifecycle, providing the knowledge to reason about network behavior like an expert.
By the end of this page, you will understand every stage of the HTTP request lifecycle from DNS lookup to response processing. You'll comprehend the timing of each stage, know what resources are consumed, understand failure modes at each step, and be able to optimize each phase for performance and reliability.
A complete HTTP request lifecycle consists of eight distinct phases. Understanding each phase—and their relative contributions to total latency—is fundamental to system optimization.
The Eight Phases:
┌──────────────────────────────────────────────────────────────────────────────┐
│ Request Lifecycle Timeline │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. DNS Resolution [█████ ] 20-120ms (uncached) │
│ [█ ] 0-1ms (cached) │
│ │
│ 2. TCP Handshake [███ ] 15-150ms (1 RTT) │
│ [ ] 0ms (connection reuse) │
│ │
│ 3. TLS Handshake [██████ ] 30-150ms (1-2 RTT) │
│ [█ ] 0ms (session resume) │
│ │
│ 4. Request Sending [█ ] 1-10ms (typical) │
│ │
│ 5. Server Processing [███████████ ] Variable (1-5000ms) │
│ │
│ 6. Response Sending [████ ] 5-500ms (depends on size) │
│ │
│ 7. Response Parsing [█ ] 1-20ms (depends on size/format) │
│ │
│ 8. Cleanup [ ] <1ms │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
Latency Breakdown for a Typical API Call:
For a new HTTPS connection to an API server with a 50ms round-trip time (RTT):
| Phase | First Request | Subsequent Request (same connection) |
|---|---|---|
| DNS Resolution | 50ms | 0ms (cached) |
| TCP Handshake | 50ms | 0ms (reused) |
| TLS Handshake | 100ms | 0ms (reused) |
| Request Send | 5ms | 5ms |
| Server Processing | 50ms | 50ms |
| Response Receive | 10ms | 10ms |
| Total | 265ms | 65ms |
In this example, connection reuse cuts total latency from 265ms to 65ms, roughly a 75% reduction. This is why persistent connections and connection pooling are so critical.
The first request to a new endpoint pays a 'tax' of DNS + TCP + TLS that subsequent requests avoid. For latency-sensitive applications, techniques like connection pre-warming (opening connections before they're needed) and DNS prefetching can eliminate this tax from the critical path.
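As a minimal sketch of pre-warming (assuming Node.js with the built-in https module; the api.example.com host and /healthz path are illustrative assumptions), a service can open keep-alive connections during startup so the first real request skips the DNS + TCP + TLS tax:

```typescript
import * as https from 'https';

// Shared keep-alive agent: sockets opened here stay in the pool for reuse
const agent = new https.Agent({ keepAlive: true, maxSockets: 10 });

// Issue a lightweight request to force DNS + TCP + TLS before real traffic.
// The host and /healthz path are placeholders for your own endpoints.
function prewarmConnection(host: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const req = https.request(
      { host, path: '/healthz', method: 'HEAD', agent },
      (res) => {
        res.resume(); // Drain the response so the socket returns to the pool
        resolve();
      }
    );
    req.on('error', reject);
    req.end();
  });
}

// During startup, warm connections to known dependencies in parallel
async function prewarmAll(): Promise<void> {
  await Promise.allSettled(['api.example.com'].map(prewarmConnection));
}
```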
Before any network communication can occur, the client must translate the hostname (e.g., api.example.com) into an IP address. This is DNS resolution.
The DNS Resolution Process:
┌────────────┐      ┌────────────────┐      ┌─────────────┐      ┌──────────────┐
│   Client   │─────▶│ Local Resolver │─────▶│  Root DNS   │─────▶│   TLD DNS    │
│ (Browser/  │      │  (ISP/Local)   │      │   Servers   │      │   Servers    │
│  Service)  │      └────────────────┘      └─────────────┘      │ (.com, etc)  │
└────────────┘                                                   └──────┬───────┘
                                                                        │
      ┌──────────────┐      ┌─────────────────┐                         │
      │  IP Address  │◀─────│  Authoritative  │◀───────────────────────┘
      │   Returned   │      │   DNS Server    │
      └──────────────┘      │  (example.com)  │
                            └─────────────────┘
Step by Step:
1. The client first checks its local caches (browser cache, then the OS cache).
2. On a miss, it asks the configured local resolver (typically the ISP's, or a local caching resolver).
3. If the resolver has no cached answer, it walks the hierarchy: root servers point to the TLD servers (.com, etc.), which point to the domain's authoritative server.
4. The authoritative server returns the record, and each layer caches it according to its TTL.
DNS Latency Factors:
| Scenario | Typical Latency | Notes |
|---|---|---|
| Browser cache hit | 0ms | Instant, no network |
| OS cache hit | <1ms | System call, no network |
| Local resolver cache hit | 1-5ms | Single network hop |
| Resolver recursive lookup | 20-150ms | Multiple network hops |
| Authoritative server far away | 50-200ms | Geographic latency |
DNS Resolution Optimizations:
1. DNS Prefetching: Hint to the browser/client that a hostname will be needed soon:
<link rel="dns-prefetch" href="//api.example.com">
For services, pre-resolve DNS during startup or idle periods.
2. Reduced TTLs for Flexibility (with Trade-offs): Short TTLs let you shift traffic quickly (failover, migrations), but they increase lookup frequency and make clients more dependent on resolver availability.
3. Multiple A Records (Round-Robin DNS): Return multiple IP addresses; client typically uses first, but has fallbacks.
4. Local DNS Caching: Run a local caching resolver (systemd-resolved, dnsmasq) to reduce latency.
5. DNS over HTTPS (DoH) / DNS over TLS (DoT): Encrypted DNS prevents eavesdropping but may add latency. Use with persistent connection pooling.
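As an illustration of DoH, here's a minimal sketch querying Cloudflare's public JSON endpoint (any resolver that supports application/dns-json works; the endpoint choice is an assumption, not a recommendation):

```typescript
// Resolving a hostname over DNS-over-HTTPS via a JSON API
interface DoHAnswer {
  name: string;
  TTL: number;
  data: string; // The IP address, for A records
}

async function resolveOverHttps(hostname: string): Promise<DoHAnswer[]> {
  const url = `https://cloudflare-dns.com/dns-query?name=${encodeURIComponent(hostname)}&type=A`;
  const response = await fetch(url, {
    headers: { accept: 'application/dns-json' },
  });
  if (!response.ok) {
    throw new Error(`DoH query failed: ${response.status}`);
  }
  const result = await response.json();
  return (result.Answer ?? []) as DoHAnswer[];
}

// Usage: the query itself rides an HTTPS connection, so pooling that
// connection amortizes its handshake cost across many lookups.
resolveOverHttps('example.com').then(answers =>
  answers.forEach(a => console.log(`${a.name} -> ${a.data} (TTL ${a.TTL}s)`))
);
```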
Failure Modes:
- NXDOMAIN (ENOTFOUND): the name doesn't resolve; a typo, expired registration, or missing record.
- Timeout (ETIMEDOUT): the resolver is unreachable or overloaded.
- SERVFAIL: the upstream DNS server failed to answer.
- Stale cache: a cached record outlives a server move, sending traffic to a dead IP until the TTL expires.
```typescript
import * as dns from 'dns';
import { performance } from 'perf_hooks';

interface DNSResolutionResult {
  hostname: string;
  addresses: Array<{ address: string; ttl: number }>;
  latencyMs: number;
  wasCached: boolean;
}

// DNS resolution with timing and caching insights
async function resolveDNSWithMetrics(hostname: string): Promise<DNSResolutionResult> {
  const startTime = performance.now();

  return new Promise((resolve, reject) => {
    // dns.resolve4 gets IPv4 addresses
    dns.resolve4(hostname, { ttl: true }, (err, addresses) => {
      const latencyMs = performance.now() - startTime;

      if (err) {
        reject({
          hostname,
          error: err.code,
          latencyMs,
          // Common errors:
          // ENOTFOUND - Domain doesn't exist
          // ETIMEDOUT - DNS server timeout
          // ESERVFAIL - DNS server error
        });
        return;
      }

      resolve({
        hostname,
        addresses: addresses.map(a => ({
          address: a.address,
          ttl: a.ttl, // How long to cache
        })),
        latencyMs,
        // Estimate if this was cached
        wasCached: latencyMs < 5, // <5ms suggests cache hit
      });
    });
  });
}

// DNS prefetching for known endpoints
class DNSPrefetcher {
  private cache = new Map<string, { addresses: string[]; expiry: number }>();

  async prefetch(hostnames: string[]): Promise<void> {
    console.log(`Prefetching DNS for ${hostnames.length} hostnames...`);

    const results = await Promise.allSettled(
      hostnames.map(h => this.resolveAndCache(h))
    );

    const successful = results.filter(r => r.status === 'fulfilled').length;
    console.log(`DNS prefetch complete: ${successful}/${hostnames.length} successful`);
  }

  private async resolveAndCache(hostname: string): Promise<void> {
    const existing = this.cache.get(hostname);
    if (existing && existing.expiry > Date.now()) {
      return; // Already cached and valid
    }

    const result = await resolveDNSWithMetrics(hostname);

    // Cache with TTL (minimum 60s to avoid hammering DNS)
    const minTTL = 60;
    const ttl = Math.max(
      minTTL,
      Math.min(...result.addresses.map(a => a.ttl))
    );

    this.cache.set(hostname, {
      addresses: result.addresses.map(a => a.address),
      expiry: Date.now() + ttl * 1000,
    });
  }

  getAddress(hostname: string): string | undefined {
    const cached = this.cache.get(hostname);
    if (cached && cached.expiry > Date.now()) {
      // Return the first cached address
      return cached.addresses[0];
    }
    return undefined;
  }
}

// Usage in service startup
const prefetcher = new DNSPrefetcher();

async function initializeService() {
  // Prefetch DNS for all dependent services during startup
  await prefetcher.prefetch([
    'database.internal.example.com',
    'cache.internal.example.com',
    'auth.internal.example.com',
    'metrics.internal.example.com',
  ]);

  console.log('Service initialized with pre-resolved DNS');
}
```

With an IP address obtained, the client must establish a TCP connection to the server. This is the famous three-way handshake.
The Three-Way Handshake:
Client Server
│ │
│ ─────────── SYN (seq=x) ───────────────▶ │ t=0
│ │
│ ◀──────── SYN-ACK (seq=y, ack=x+1) ────── │ t=RTT/2
│ │
│ ─────────── ACK (ack=y+1) ──────────────▶ │ t=RTT
│ │
│ [Connection Established] │
│ │
Total time: 1 RTT (one round-trip time)
Time before client data reaches the server: 1.5 RTT (the client can send data together with the final ACK at t=RTT; that data arrives half a round trip later)
Step by Step:
1. SYN (Synchronize): The client sends a SYN packet with its initial sequence number (x).
2. SYN-ACK: The server acknowledges the SYN (ack=x+1) and sends its own SYN (seq=y).
3. ACK: The client acknowledges the server's SYN (ack=y+1).
Why This Matters:
The three-way handshake ensures both parties are ready to communicate and establishes initial sequence numbers for reliable, ordered delivery. However, it costs 1 RTT before any data can flow.
For a server 50ms away (100ms RTT), this adds 100ms to every new connection.
| Server Location | Approx. RTT | Handshake Time | Impact |
|---|---|---|---|
| Same data center | 0.5-2ms | 1-2ms | Negligible |
| Same region | 5-20ms | 5-20ms | Minor |
| Across continent | 30-70ms | 30-70ms | Noticeable |
| Intercontinental | 100-200ms | 100-200ms | Significant |
| Opposite hemisphere | 200-300ms | 200-300ms | Severe |
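To ground the numbers in this table, here's a minimal sketch that times just the TCP connect from Node.js (host and port are placeholders):

```typescript
import * as net from 'net';
import { performance } from 'perf_hooks';

// Times only the TCP three-way handshake: the connect callback fires once
// the SYN-ACK has arrived and the final ACK is sent, i.e., about one RTT.
function measureTcpHandshake(host: string, port: number): Promise<number> {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    const socket = net.createConnection({ host, port }, () => {
      const handshakeMs = performance.now() - start;
      socket.end();
      resolve(handshakeMs);
    });
    socket.on('error', reject);
  });
}

// Usage: compare your own endpoints against the table above
measureTcpHandshake('example.com', 443).then(ms =>
  console.log(`TCP handshake: ${ms.toFixed(1)}ms`)
);
```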
TCP Fast Open (TFO):
TCP Fast Open is an extension that allows data to be sent in the initial SYN packet (for repeat connections):
First connection (obtains TFO cookie):
SYN → SYN-ACK w/ cookie → ACK + data
Subsequent connections:
SYN + cookie + data → SYN-ACK + response
Saves 1 RTT on connection establishment!
TFO requires OS support on both client and server, explicit opt-in by the application, and a cookie obtained on a prior connection. Because data in the SYN can be replayed, it should only carry idempotent requests; middleboxes that strip the TFO option can also force a fallback to the standard handshake.
Connection States and Resources:
Each TCP connection consumes server resources:
| Resource | Typical Cost | Concern |
|---|---|---|
| File descriptor | 1 | Limited per process (ulimit) |
| Memory (buffers) | 10-50KB | Scales with connection count |
| CPU (state management) | Minimal | Context switching at scale |
| Ephemeral port | 1 (client-side) | ~64K theoretical max per destination IP:port; OS default ranges are smaller |
Connection Limits:
After closing a TCP connection, it enters TIME_WAIT state for 60 seconds (typically). A high-volume client making many short connections can exhaust ephemeral ports or accumulate memory in TIME_WAIT sockets. This is a key reason to use connection pooling and persistent connections.
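As a minimal sketch of that advice in Node.js (the agent tuning values are illustrative assumptions, not recommendations), a single shared keep-alive agent reuses sockets instead of opening, and later TIME_WAIT-ing, a connection per request:

```typescript
import * as https from 'https';

// One shared agent per process: sockets are reused across requests,
// so short-lived connections don't pile up in TIME_WAIT.
const pooledAgent = new https.Agent({
  keepAlive: true,     // Reuse sockets instead of closing after each request
  maxSockets: 50,      // Cap concurrent connections per host (illustrative)
  maxFreeSockets: 10,  // Idle sockets kept warm in the pool (illustrative)
  timeout: 30_000,     // Destroy sockets idle longer than this
});

// All requests through this helper share the same pool
function apiRequest(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const req = https.request(
      { host: 'api.example.com', path, agent: pooledAgent },
      (res) => {
        let body = '';
        res.on('data', chunk => (body += chunk));
        res.on('end', () => resolve(body));
      }
    );
    req.on('error', reject);
    req.end();
  });
}
```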
For HTTPS connections, after TCP is established, the TLS handshake must complete before any HTTP data can be exchanged. This is the most complex phase of connection establishment.
TLS 1.2 Handshake (Legacy):
Client Server
│ │
│ ──────── ClientHello ─────────────────────▶ │ t=0
│ (cipher suites, random) │
│ │
│ ◀─────── ServerHello ───────────────────── │ t=RTT/2
│ ◀─────── Certificate ───────────────────── │
│ ◀─────── ServerKeyExchange ─────────────── │
│ ◀─────── ServerHelloDone ──────────────── │
│ │
│ ──────── ClientKeyExchange ────────────────▶│ t=RTT
│ ──────── ChangeCipherSpec ─────────────────▶│
│ ──────── Finished ─────────────────────────▶│
│ │
│ ◀─────── ChangeCipherSpec ──────────────── │ t=1.5 RTT
│ ◀─────── Finished ──────────────────────── │
│ │
│ [Encrypted Tunnel Established] │ t=2 RTT
Total time: 2 RTT additional (on top of TCP handshake)
TLS 1.3 Handshake (Modern):
TLS 1.3 reduces the handshake to just 1 RTT:
Client Server
│ │
│ ──────── ClientHello ─────────────────────▶ │ t=0
│ (cipher suites, key_share) │
│ [Key material included!] │
│ │
│ ◀─────── ServerHello ───────────────────── │ t=RTT/2
│ ◀─────── EncryptedExtensions ──────────── │
│ ◀─────── Certificate ───────────────────── │
│ ◀─────── CertificateVerify ─────────────── │
│ ◀─────── Finished ──────────────────────── │
│ │
│ ──────── Finished ─────────────────────────▶│ t=RTT
│ │
│ [Encrypted Tunnel Established] │
Total time: 1 RTT additional (on top of TCP handshake)
TLS 1.3 with 0-RTT Resumption:
For resumed connections with session tickets:
Client Server
│ │
│ ──── ClientHello + Early Data ────────────▶ │ t=0
│ (session ticket + encrypted data) │
│ │
│ ◀──── ServerHello + Response ───────────── │ t=RTT/2
│ │
Total additional time: 0 RTT!
Data sent immediately with first packet.
| Scenario | TLS 1.2 | TLS 1.3 | TLS 1.3 0-RTT |
|---|---|---|---|
| New connection (50ms RTT) | +100ms | +50ms | N/A (no session) |
| Resumed connection (50ms RTT) | +50ms (w/ tickets) | +50ms | +0ms |
| New connection (200ms RTT) | +400ms | +200ms | N/A |
| Resumed connection (200ms RTT) | +200ms | +200ms | +0ms |
Certificate Validation:
During the TLS handshake, the client must validate the server's certificate:
- Chain of trust: the certificate must chain up to a root CA the client trusts.
- Validity period: the current time must fall within the certificate's notBefore/notAfter window.
- Hostname match: the requested hostname must match the certificate's Subject Alternative Name.
- Revocation: the certificate must not appear on a CRL or be reported revoked via OCSP.
OCSP Stapling:
Without stapling, the client must contact the CA's OCSP server to check revocation—adding latency. OCSP stapling lets the server include a signed, time-stamped OCSP response, eliminating this round-trip:
```nginx
# Nginx OCSP stapling configuration
ssl_stapling on;
ssl_stapling_verify on;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;
```
Session Resumption:
To avoid full handshake overhead on reconnect:
- Session IDs (TLS 1.2): the server keeps session state and the client presents the ID when reconnecting.
- Session tickets (TLS 1.2): the server hands the client an encrypted, self-contained ticket, so no server-side state is needed.
- PSK resumption (TLS 1.3): the server sends NewSessionTicket messages after the handshake; the ticket acts as a pre-shared key and is what enables 0-RTT.
Session resumption can reduce handshake from 2 RTT to 1 RTT (TLS 1.2) or enable 0-RTT (TLS 1.3).
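To observe resumption in practice, here's a minimal Node.js sketch using the tls module (the target host is a placeholder): it saves the session ticket from a first connection and presents it on a second.

```typescript
import * as tls from 'tls';

// Connect and capture the session ticket. For TLS 1.3 the ticket arrives
// after the handshake completes, so we listen for the 'session' event
// rather than calling getSession() inside the connect callback.
function connectOnce(host: string, session?: Buffer): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host, port: 443, servername: host, session });
    socket.once('secureConnect', () => {
      console.log(`Session reused: ${socket.isSessionReused()}`);
    });
    socket.once('session', (ticket) => {
      socket.end();
      resolve(ticket);
    });
    socket.on('error', reject);
  });
}

async function demoResumption(host: string): Promise<void> {
  const ticket = await connectOnce(host);  // Full handshake; ticket saved
  await connectOnce(host, ticket);         // Abbreviated handshake, if the server supports it
}

demoResumption('example.com').catch(console.error);
```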
TLS 1.3 0-RTT early data is not protected against replay attacks. An attacker could capture and resend early data. Only use 0-RTT for idempotent requests (GET, HEAD), never for state-changing operations (POST, PUT, DELETE).
With the connection established and encrypted, the HTTP request can finally be transmitted. These phases are where application logic takes over from protocol mechanics.
Phase 4: Request Transmission
The client sends the HTTP request over the established connection:

Client → [Serialize Request] → [Encrypt (TLS)] → [TCP Segments] → [Network] → Server
Factors Affecting Request Transmission Time:
| Factor | Impact | Optimization |
|---|---|---|
| Request body size | Linear with size | Compress, minimize |
| Available bandwidth | Direct | Often limited by last mile |
| Header size | Per-request overhead | Use HTTP/2 (HPACK) |
| Number of segments | More segments = more overhead | Large MTU if available |
| Network quality | Packet loss causes retransmits | QoS, redundant paths |
For typical API requests (1-10KB), transmission takes 1-10ms.
Phase 5: Server Processing
This is where the actual work happens and is entirely application-dependent:
┌─────────────────────────────────────────────────────────────────────┐
│ Server Processing Breakdown │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Connection Accept & Request Parsing [1-5ms] │
│ - Accept connection from listener │
│ - Parse HTTP headers │
│ - Route to handler │
│ │
│ 2. Authentication & Authorization [5-50ms] │
│ - JWT validation │
│ - Permission checks │
│ - Rate limit evaluation │
│ │
│ 3. Business Logic [Variable] │
│ - Input validation │
│ - Core processing │
│ - External service calls (can add 10-500ms each!) │
│ │
│ 4. Data Access [1-100ms] │
│ - Database queries │
│ - Cache lookups │
│ - File system access │
│ │
│ 5. Response Construction [1-20ms] │
│ - Serialize response body │
│ - Set headers │
│ - Compression (if enabled) │
│ │
└─────────────────────────────────────────────────────────────────────┘
Server Processing Optimization:
1. Minimize External Calls: Each synchronous call to another service adds its full request lifecycle. Where possible, batch multiple operations into a single call, parallelize independent calls, cache responses, and move non-critical work off the request path (queues, background jobs).
2. Optimize Data Access: Add the right indexes, cache hot reads, batch queries to avoid N+1 patterns, and pool database connections.
3. Efficient Serialization: JSON serialization of large objects is measurable; consider faster serializers or binary formats for large or high-frequency payloads.
4. Response Compression: Compress responses above a size threshold when the client advertises support, and skip content that is already compressed (see the sketch below).
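A minimal sketch of threshold-based gzip compression with Node's built-in zlib (the 1KB threshold is an illustrative assumption):

```typescript
import * as zlib from 'zlib';
import * as http from 'http';

const COMPRESSION_THRESHOLD = 1024; // Don't compress tiny bodies: overhead exceeds savings

function sendJson(req: http.IncomingMessage, res: http.ServerResponse, data: unknown): void {
  const body = Buffer.from(JSON.stringify(data));
  const acceptsGzip = /\bgzip\b/.test(req.headers['accept-encoding'] ?? '');

  if (acceptsGzip && body.length >= COMPRESSION_THRESHOLD) {
    zlib.gzip(body, (err, compressed) => {
      res.setHeader('Content-Type', 'application/json');
      if (err || compressed.length >= body.length) {
        // Fall back to uncompressed on error or if compression didn't help
        res.end(body);
        return;
      }
      res.setHeader('Content-Encoding', 'gzip');
      res.end(compressed);
    });
  } else {
    res.setHeader('Content-Type', 'application/json');
    res.end(body);
  }
}
```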
```typescript
// Middleware for detailed request lifecycle timing
import { performance } from 'perf_hooks';

// Application-specific helpers assumed to exist elsewhere
declare function authenticateRequest(req: any): Promise<void>;
declare function logRequestTiming(req: any, timing: Partial<RequestTiming>): void;
declare const db: { query(sql: string, params: unknown[]): Promise<any> };
declare const profileService: { get(id: string): Promise<unknown> };

interface RequestTiming {
  parseStart: number;
  parseEnd: number;
  authStart: number;
  authEnd: number;
  handlerStart: number;
  handlerEnd: number;
  dbQueries: Array<{ query: string; durationMs: number }>;
  externalCalls: Array<{ service: string; durationMs: number }>;
  serializeStart: number;
  serializeEnd: number;
  totalServerTime: number;
}

function timingMiddleware(req: any, res: any, next: () => void) {
  const timing: Partial<RequestTiming> = {
    dbQueries: [],
    externalCalls: [],
  };

  const startTime = performance.now();
  timing.parseStart = startTime;

  // Attach timing object to request
  req.timing = timing;

  // Instrument response completion
  const originalEnd = res.end;
  res.end = function (...args: unknown[]) {
    timing.totalServerTime = performance.now() - startTime;

    // Add Server-Timing header for client visibility
    res.setHeader('Server-Timing', buildServerTimingHeader(timing));

    // Log detailed timing
    logRequestTiming(req, timing);

    return originalEnd.apply(this, args);
  };

  timing.parseEnd = performance.now();
  next();
}

function buildServerTimingHeader(timing: Partial<RequestTiming>): string {
  const entries: string[] = [];

  entries.push(`parse;dur=${((timing.parseEnd ?? 0) - (timing.parseStart ?? 0)).toFixed(1)}`);

  if (timing.authEnd && timing.authStart) {
    entries.push(`auth;dur=${(timing.authEnd - timing.authStart).toFixed(1)}`);
  }

  if (timing.handlerEnd && timing.handlerStart) {
    entries.push(`handler;dur=${(timing.handlerEnd - timing.handlerStart).toFixed(1)}`);
  }

  const totalDb = timing.dbQueries?.reduce((sum, q) => sum + q.durationMs, 0) ?? 0;
  if (totalDb > 0) {
    entries.push(`db;dur=${totalDb.toFixed(1)}`);
  }

  const totalExternal = timing.externalCalls?.reduce((sum, c) => sum + c.durationMs, 0) ?? 0;
  if (totalExternal > 0) {
    entries.push(`external;dur=${totalExternal.toFixed(1)}`);
  }

  entries.push(`total;dur=${timing.totalServerTime?.toFixed(1)}`);

  return entries.join(', ');
}

// Usage in handler
async function getUserHandler(req: any, res: any) {
  const timing = req.timing as RequestTiming;

  // Auth timing
  timing.authStart = performance.now();
  await authenticateRequest(req);
  timing.authEnd = performance.now();

  // Handler timing
  timing.handlerStart = performance.now();

  // Database query (instrumented)
  const dbStart = performance.now();
  const user = await db.query('SELECT * FROM users WHERE id = $1', [req.params.id]);
  timing.dbQueries.push({
    query: 'getUserById',
    durationMs: performance.now() - dbStart,
  });

  // External call (instrumented)
  if (user.needsProfileEnrichment) {
    const extStart = performance.now();
    const profile = await profileService.get(user.id);
    timing.externalCalls.push({
      service: 'profile-service',
      durationMs: performance.now() - extStart,
    });
  }

  timing.handlerEnd = performance.now();

  // Serialize response
  timing.serializeStart = performance.now();
  const responseBody = JSON.stringify(user);
  timing.serializeEnd = performance.now();

  res.json(user);
}
```

The final phases complete the request lifecycle, returning data to the client and releasing resources.
Phase 6: Response Transmission
The server sends the HTTP response back through the same connection:
Server → [Serialize Response] → [Encrypt (TLS)] → [TCP Segment] → [Network] → Client
Response Size Impact:
| Response Size | Typical Transmission Time (100 Mbps) | Notes |
|---|---|---|
| 1 KB | <1ms | Single packet |
| 10 KB | ~1ms | Few packets |
| 100 KB | ~8ms | Many packets; latency becomes noticeable |
| 1 MB | ~80ms | Significant, consider streaming |
| 10 MB | ~800ms | Very significant, definitely stream |
Streaming vs Buffered Responses:
For large responses, streaming allows the client to begin processing before the full response is transmitted:
Buffered: [--------- Server builds full response ---------] → [Transmit all]
|<------------- Client waits --------------->| |<- Process ->|
Streaming: [Build+Send chunk 1][Chunk 2][Chunk 3][Chunk N]
|<- Process ->| |<- ... ->| |<- Final ->|
First byte arrives much sooner!
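For the server side, here's a minimal sketch that streams rows as NDJSON with Node's http module, so the first byte leaves as soon as the first row is ready (the row source is a stand-in for a DB cursor or generator):

```typescript
import * as http from 'http';

// Stream rows as newline-delimited JSON: the first byte is sent as soon
// as the first row is serialized, not after the whole result set.
async function streamRows(
  res: http.ServerResponse,
  rows: AsyncIterable<unknown> // stand-in for a DB cursor or generator
): Promise<void> {
  res.setHeader('Content-Type', 'application/x-ndjson');
  // No Content-Length set: Node falls back to chunked transfer encoding

  for await (const row of rows) {
    const ok = res.write(JSON.stringify(row) + '\n');
    if (!ok) {
      // Respect backpressure: wait until the socket buffer drains
      await new Promise<void>(resolve => res.once('drain', () => resolve()));
    }
  }
  res.end();
}
```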
Phase 7: Response Processing (Client-Side)
The client processes the received response: check the status code, read headers, decompress the body if needed, deserialize it (e.g., JSON.parse), and hand the result to application code.
Phase 8: Cleanup
Resources are released or prepared for reuse: the connection is either closed (freeing the socket and buffers) or returned to a keep-alive pool, and TLS session tickets may be stored for later resumption.
```typescript
// Complete request lifecycle with client-side timing

interface RequestLifecycleMetrics {
  dnsStart: number;
  dnsEnd: number;
  tcpStart: number;
  tcpEnd: number;
  tlsStart: number;
  tlsEnd: number;
  requestStart: number;
  requestEnd: number;
  responseStart: number;  // First byte
  responseEnd: number;    // Last byte
  parseStart: number;
  parseEnd: number;
  // Derived metrics
  dnsTime: number;
  connectionTime: number; // TCP + TLS
  ttfb: number;           // Time to First Byte
  downloadTime: number;
  totalTime: number;
}

// In browsers, use the Performance API
function measureWithPerformanceAPI(url: string): PerformanceResourceTiming | null {
  const entries = performance.getEntriesByName(url);
  if (entries.length === 0) return null;

  const entry = entries[entries.length - 1] as PerformanceResourceTiming;

  console.log(`Request to ${url}:`);
  console.log(`  DNS: ${entry.domainLookupEnd - entry.domainLookupStart}ms`);
  console.log(`  TCP: ${entry.connectEnd - entry.connectStart}ms`);
  console.log(`  TLS: ${entry.secureConnectionStart ? entry.connectEnd - entry.secureConnectionStart : 0}ms`);
  console.log(`  TTFB: ${entry.responseStart - entry.requestStart}ms`);
  console.log(`  Download: ${entry.responseEnd - entry.responseStart}ms`);
  console.log(`  Total: ${entry.responseEnd - entry.startTime}ms`);

  return entry;
}

// For Node.js, manual instrumentation
async function measureRequestLifecycle<T>(
  url: string,
  options: RequestInit = {}
): Promise<{ data: T; metrics: RequestLifecycleMetrics }> {
  const metrics: Partial<RequestLifecycleMetrics> = {};

  // Note: DNS and TCP timing require lower-level access.
  // This example focuses on what's measurable at the HTTP level.
  metrics.requestStart = performance.now();

  const response = await fetch(url, options);
  metrics.responseStart = performance.now();
  metrics.ttfb = metrics.responseStart - metrics.requestStart;

  // Read response body
  const text = await response.text();
  metrics.responseEnd = performance.now();
  metrics.downloadTime = metrics.responseEnd - metrics.responseStart;

  // Parse response
  metrics.parseStart = performance.now();
  const data = JSON.parse(text) as T;
  metrics.parseEnd = performance.now();

  metrics.totalTime = metrics.parseEnd - metrics.requestStart;

  // Server timing (if provided)
  const serverTiming = response.headers.get('server-timing');
  if (serverTiming) {
    console.log('Server timing:', serverTiming);
  }

  return {
    data,
    metrics: metrics as RequestLifecycleMetrics,
  };
}

// Application-specific chunk handler assumed to exist elsewhere
declare function processChunk(chunk: string): Promise<void>;

// Streaming response processing
async function processStreamingResponse(url: string): Promise<void> {
  const response = await fetch(url);
  if (!response.body) {
    throw new Error('No response body');
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let bytesReceived = 0;
  const startTime = performance.now();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    bytesReceived += value.length;
    const elapsed = performance.now() - startTime;
    const throughput = (bytesReceived / 1024) / (elapsed / 1000); // KB/s
    console.log(`Received ${bytesReceived} bytes, throughput: ${throughput.toFixed(1)} KB/s`);

    // Process chunk immediately instead of waiting for the full response
    const chunk = decoder.decode(value, { stream: true });
    await processChunk(chunk);
  }

  console.log(`Total: ${bytesReceived} bytes in ${performance.now() - startTime}ms`);
}
```

Understanding the request lifecycle enables systematic optimization. Here's a framework for reducing end-to-end latency:
Optimization Priority (by typical impact):
1. Connection Reuse [Eliminates DNS + TCP + TLS overhead]
↓
2. Server Processing Time [Often the largest component]
↓
3. Geographic Proximity [CDN, edge deployment]
↓
4. Response Size [Compression, minimal payloads]
↓
5. Protocol Selection [HTTP/2, HTTP/3 for specific cases]
Always measure which lifecycle phases dominate your latency before optimizing. If server processing is 90% of your latency, no amount of connection optimization will help significantly. Use distributed tracing (OpenTelemetry, Jaeger) and client-side metrics (Resource Timing API) to identify bottlenecks.
We've traced the complete journey of an HTTP request from initiation to completion. This deep understanding is essential for building and debugging high-performance distributed systems.
What's Next:
Now that we understand the complete request lifecycle, we'll examine connection management—how to efficiently manage pools of connections, handle connection failures, and tune connection parameters for optimal performance and resource utilization.
You now have a comprehensive understanding of every phase of the HTTP request lifecycle. This knowledge enables you to identify latency bottlenecks, make informed optimization decisions, and reason about network behavior at a professional level.