When engineers first implement timeouts, a common mistake is treating "timeout" as a single, monolithic concept—setting one timeout value and expecting it to protect against all waiting scenarios. This oversimplification misses a crucial distinction: the failure modes during connection establishment are fundamentally different from the failure modes during data transfer.
A connection timeout protects you during the handshake phase, when your client is attempting to establish a transport-layer connection with a server. A read timeout (sometimes called a socket timeout or response timeout) protects you after the connection is established, during the actual exchange of data. Understanding this distinction is essential for effective timeout configuration—too short on connection means rejecting reachable servers; too short on read means abandoning in-progress work.
By the end of this page, you will understand the precise distinction between connection and read timeouts, the different failure modes each protects against, how to size each timeout appropriately for your use case, and the common pitfalls in timeout configuration.
To understand where different timeouts apply, we must first understand the phases of a network request. Consider a simple HTTP request from a client to a server:
Phase 1: DNS Resolution
Before anything else, the client must resolve the server's hostname to an IP address. This typically involves checking the local DNS cache, then querying the configured resolver and waiting for recursive resolution to complete.
DNS resolution can fail silently or hang if DNS servers are unreachable. Many libraries have a separate DNS timeout, though it's often bundled with the connection timeout.
Phase 2: TCP Handshake (Connection Establishment)
Once the IP address is known, the client initiates a TCP connection: it sends a SYN packet, the server replies with SYN-ACK, and the client completes the handshake with an ACK.
This three-way handshake typically completes in milliseconds on a healthy network but can hang indefinitely if the host is unreachable, a firewall silently drops the packets, the network path is broken, or the server's accept backlog is full.
The connection timeout governs this phase.
Phase 3: TLS Handshake (For HTTPS)
For secure connections, an additional handshake occurs after TCP establishment: the client and server exchange hello messages, the server presents its certificate for validation, and the two sides negotiate session keys.
This adds latency (potentially 1-2 additional round trips) and introduces more failure points. TLS handshake issues can cause connections to hang during certificate validation or key exchange. Some libraries include TLS handshake time in the connection timeout; others have a separate TLS timeout.
Phase 4: Request Transmission and Response Reception
Once the connection is established (and optionally secured), the actual HTTP transaction begins: the client transmits the request headers and body, the server processes the request, and the server streams back the response headers and body.
The read timeout governs the waiting between sending a request and receiving the complete response.
This phase can hang if the server is slow or deadlocked, if it crashed after receiving the request, or if the network stalls mid-transfer.
The connection timeout defines the maximum time your client will wait to establish a transport-layer connection with a server. This timeout fires if the TCP (and optionally TLS) handshake doesn't complete within the specified duration.
What it protects against:
Unreachable hosts — The IP is valid but the host is powered off or network-disconnected. Without RST packets (which require an active host), the client keeps retrying SYN packets according to OS-level configurations (often waiting minutes).
Firewall black-holing — Some firewalls silently drop packets rather than returning ICMP unreachable or TCP RST. From the client's perspective, this is indistinguishable from a very slow network.
Network partitions — The path to the server is broken somewhere in the network. Packets leave but never arrive.
Server overwhelm — The server's connection backlog is full. New SYN packets are ignored until existing connections are accepted.
| Failure Mode | Symptom | Without Connection Timeout | With Connection Timeout |
|---|---|---|---|
| Host unreachable (no RST) | SYN sent, no response | Wait for OS-level timeout (2+ minutes) | Fail after configured timeout |
| Firewall dropping packets | SYN sent, silently dropped | Wait indefinitely | Fail after configured timeout |
| Server backlog full | SYN sent, no SYN-ACK | Wait for backlog to drain (may never happen) | Fail after configured timeout |
| DNS failure | Resolution hangs | Wait for DNS timeout (varies) | May be covered if DNS is included |
| TLS certificate issue | Handshake stalls | Wait indefinitely (depending on library) | Fail if included in connection timeout |
Typical connection timeout values:
Connection timeouts are generally short—measured in hundreds of milliseconds to a few seconds. The reasoning: a healthy handshake completes within a few network round trips, so anything longer almost always signals a problem, and failing fast frees the caller to retry or try another host.
Common ranges: 200-500ms within a datacenter, 1-3s across regions, and 2-5s for external services over the internet.
Connection pooling implications:
When using connection pools, the connection timeout applies only when the pool must establish a new connection: when the pool is empty, below its minimum size, or replacing a broken or expired connection. Requests served by an already-established pooled connection skip the handshake entirely.
Many connection pool implementations have a separate "acquire timeout" for waiting to get a connection from the pool, distinct from the timeout for establishing new connections.
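The acquire/connect distinction can be sketched in a few lines. This is an illustrative pattern, not any specific pool library's API: `Pool` and `acquire()` are hypothetical stand-ins for whatever your pool exposes.

```typescript
// Sketch: a pool "acquire timeout" distinct from the connection timeout.
// `Pool` and `acquire()` are hypothetical stand-ins for your pool library.
interface Pool<T> {
  acquire(): Promise<T>;
}

function acquireWithTimeout<T>(pool: Pool<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`pool acquire timed out after ${timeoutMs}ms`)),
      timeoutMs
    );
    pool.acquire().then(
      (conn) => { clearTimeout(timer); resolve(conn); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}
```

If the pool has to open a brand-new connection to satisfy the acquire, the separate connection timeout applies inside `acquire()` itself; the wrapper above only bounds the wait for *a* connection, new or reused.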
```java
import java.net.http.HttpClient;
import java.time.Duration;

import org.apache.http.client.config.RequestConfig;

// Java 11+ HttpClient with explicit connection timeout
HttpClient client = HttpClient.newBuilder()
    .connectTimeout(Duration.ofSeconds(2)) // Connection timeout
    .build();

// Legacy Apache HttpClient
RequestConfig config = RequestConfig.custom()
    .setConnectTimeout(2000)               // Connection establishment timeout (ms)
    .setConnectionRequestTimeout(1000)     // Time to get connection from pool
    .build();
```

DNS resolution often happens before the connection timeout starts counting. If DNS takes 30 seconds and connection establishment then times out after 2 seconds, your effective timeout is 32 seconds. Configure DNS timeouts explicitly or use IP addresses for critical paths.
The read timeout (also called socket timeout, response timeout, or data timeout) defines the maximum time your client will wait to receive data after the connection is established and the request is sent. Crucially, this timeout applies to the gaps between data reception, not the total response time.
What it protects against:
Server processing delays — The server received your request but is slow processing it (database locks, external dependencies, CPU saturation).
Server crashes during processing — The server crashed after receiving your request but before sending a response. The TCP connection might remain half-open.
Network issues during transfer — Packets are being lost or delayed, causing slow or stalled data transfer.
Server bugs — The server entered an infinite loop or deadlock and will never respond.
Slow streams — The server is streaming a response very slowly (perhaps because its upstream dependency is slow).
Critical distinction: Socket timeout vs. total timeout
This is where confusion often arises. A read/socket timeout typically measures the time between receiving any data, not the total time for the complete response.
Socket timeout behavior (most HTTP libraries): the timer starts when the client begins waiting for data and resets every time any bytes arrive; it fires only if the gap between data events exceeds the configured duration.
Example: 5-second socket timeout with streaming response:
T+0s: Request sent, waiting for response...
T+4s: First byte received, clock resets
T+8s: More data received, clock resets
T+12s: More data received, clock resets
T+15s: Response complete (no timeout, despite 15s total time)
The response took 15 seconds total, but because data arrived every 4 seconds, the 5-second socket timeout never fired.
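This clock-reset behavior is easy to model. The sketch below is a toy simulation, not a real socket API: given the arrival times of data chunks, it reports when a socket timeout would fire.

```typescript
// Model of a read/socket timeout: the clock resets each time data arrives.
// Returns the time (s) at which the timeout fires, or null if it never does.
function socketTimeoutFiresAt(
  chunkArrivals: number[], // times (s) at which data chunks arrive, ascending
  socketTimeout: number    // max allowed gap (s) between data events
): number | null {
  let lastActivity = 0; // clock starts when the request is sent
  for (const t of chunkArrivals) {
    if (t - lastActivity > socketTimeout) {
      return lastActivity + socketTimeout; // gap too long: timeout fires
    }
    lastActivity = t; // data arrived: reset the clock
  }
  return null; // every gap was within the timeout
}

// The streaming example from the text: chunks at 4s, 8s, 12s, complete at
// 15s, with a 5s socket timeout. Every gap is 4s or less, so no timeout.
socketTimeoutFiresAt([4, 8, 12, 15], 5); // → null
```

Change the timeout to 3 seconds and the same stream fails at T+3s, before the first byte ever arrives, which is exactly the trade-off the text describes.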
Total timeout behavior:
Some clients offer a separate "total" or "request" timeout that bounds the entire operation regardless of streaming:
T+0s: Request sent, waiting for response...
T+4s: First byte received
T+8s: More data received
T+10s: TIMEOUT! Total timeout of 10s exceeded
This is safer for many use cases—you don't want a malicious server keeping your connection open indefinitely by dribbling bytes.
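In environments with the standard `AbortSignal` API (Node 17.3+, modern browsers), a total timeout can be attached to a fetch-style client in one line. A minimal sketch; the helper name is illustrative:

```typescript
// Sketch: a total/request timeout using the standard AbortSignal API.
// Unlike a socket timeout, AbortSignal.timeout never resets: it aborts the
// whole operation at the deadline, even if bytes are still trickling in.
async function fetchWithTotalTimeout(url: string, totalMs: number) {
  return fetch(url, { signal: AbortSignal.timeout(totalMs) });
}
```

When the signal fires, both the in-flight request and any in-progress body read are aborted, so a server dribbling one byte every few seconds cannot hold the client past the deadline.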
| Timeout Type | Clock Behavior | Fires When | Best For |
|---|---|---|---|
| Socket/Read Timeout | Resets on any data | No data for N seconds | Detecting dead connections |
| Response Timeout | Starts on first byte wait | Headers not received in N seconds | Detecting slow servers |
| Total/Request Timeout | Never resets | Total time exceeds N seconds | Bounding overall latency |
| Idle Timeout | Resets on any activity | No activity in either direction | Connection pool cleanup |
Typical read timeout values:
Read timeouts are generally longer than connection timeouts because they include server processing time:
Common ranges by operation type: sub-second for caches, 5-10s for internal service calls, 30-60s for external APIs, and 30s or more for complex analytics queries.
The timeout should reflect your SLA, not your hope. A common mistake is setting read timeouts based on average response time: if your service averages 100ms but occasionally takes 5s, a 200ms timeout fails every slow-but-legitimate response. Instead, derive timeouts from a high percentile of observed latency (p99 or p99.9) plus headroom, and let your SLA cap the upper bound.
If your overall SLA is 500ms, and you call 3 downstream services sequentially, you cannot give each service a 500ms timeout—you'd potentially wait 1.5 seconds. Budget your timeouts: perhaps 150ms each, with 50ms buffer for your own processing.
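The budgeting arithmetic from this example can be captured in a small helper. A sketch, assuming the SLA is split evenly across sequential calls after reserving a fixed buffer for the service's own processing:

```typescript
// Split an overall SLA across N sequential downstream calls, reserving a
// buffer for local processing. Hypothetical helper for illustration.
function perCallBudgetMs(
  slaMs: number,
  sequentialCalls: number,
  bufferMs: number
): number {
  const available = slaMs - bufferMs;
  if (available <= 0) throw new Error("buffer consumes the entire SLA");
  return Math.floor(available / sequentialCalls);
}

// The example from the text: 500ms SLA, 3 sequential calls, 50ms buffer.
perCallBudgetMs(500, 3, 50); // → 150
```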
Configuring timeouts correctly requires understanding both the technical mechanics and the operational context. These patterns help avoid common pitfalls.
Pattern: Layered timeout configuration
In complex systems, you often have multiple layers that each need timeout configuration:
API Gateway (total request timeout: 30s)
└── Service A (client timeout: 10s)
└── Database Pool (acquire timeout: 2s)
└── Database Query (statement timeout: 8s)
Each layer should have timeouts, but inner layer timeouts should be shorter than outer layers. If the database query timeout is 8s and Service A's overall timeout is 10s, there's 2s buffer for connection, network, and processing overhead.
Anti-pattern: Timeouts longer than outer layer:
API Gateway (total request timeout: 10s) # Outer layer
└── Service A (client timeout: 30s) # WRONG: Inner > Outer
Here, Service A might wait 30 seconds for a dependency, but the API Gateway already gave up after 10 seconds. The client got an error, but Service A continues working on a doomed request, wasting resources.
```typescript
// Layered timeout configuration example
const config = {
  // External gateway/proxy timeout (if applicable)
  externalTimeout: 30_000, // 30s

  // Service-level request timeout
  requestTimeout: 25_000, // 25s, less than external

  // Individual downstream call timeouts
  downstreamTimeouts: {
    paymentService: {
      connection: 2_000, // 2s to connect
      read: 15_000,      // 15s for payment processing
    },
    inventoryService: {
      connection: 1_000, // 1s to connect (same DC)
      read: 5_000,       // 5s for inventory check
    },
    cache: {
      connection: 500, // 500ms to Redis
      read: 1_000,     // 1s max for cache operations
    },
    database: {
      acquire: 2_000, // 2s to get connection from pool
      query: 10_000,  // 10s for query execution
    },
  },
};

// Validation: ensure inner timeouts don't exceed outer
function validateTimeouts(cfg: typeof config): void {
  const maxDownstream = Math.max(
    cfg.downstreamTimeouts.paymentService.connection +
      cfg.downstreamTimeouts.paymentService.read,
    cfg.downstreamTimeouts.inventoryService.connection +
      cfg.downstreamTimeouts.inventoryService.read,
    // ... etc
  );
  if (maxDownstream > cfg.requestTimeout) {
    throw new Error(
      `Downstream timeout (${maxDownstream}ms) exceeds request timeout (${cfg.requestTimeout}ms)`
    );
  }
}
```

Consider using different timeout values for different environments. Development environments might use longer timeouts for debugging ease, while production uses strict timeouts. CI/CD environments might use the strictest timeouts to catch timeout-sensitive code early.
Even experienced engineers stumble over certain timeout-related issues. These pitfalls are common enough to deserve explicit attention.
Pitfall deep dive: The retry multiplication problem
Retries and timeouts interact in subtle ways. Consider this configuration:
Read timeout: 10 seconds
Max retries: 3 (meaning 4 total attempts)
No backoff or exponential delay
Worst case latency: 40 seconds (10s × 4 attempts)
If your SLA is 30 seconds, this configuration violates the SLA even with aggressive per-request timeouts.
The solution: Calculate your timeout budget considering retries:
SLA: 30 seconds
Max attempts: 4 (1 initial + 3 retries)
Per-attempt timeout: 30s ÷ 4 = 7.5s (round down to 7s for safety)
Alternatively, implement a total timeout that spans all retry attempts:
Total request timeout: 30 seconds (hard deadline)
Per-attempt timeout: 10 seconds
Retries continue until total timeout exhausted
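Both budgeting approaches reduce to simple arithmetic. A sketch of the calculations above; the helper names are illustrative:

```typescript
// Worst-case latency when every attempt runs to its full timeout
// (no backoff between attempts).
function worstCaseLatencyMs(perAttemptTimeoutMs: number, maxAttempts: number): number {
  return perAttemptTimeoutMs * maxAttempts;
}

// Per-attempt budget that keeps all attempts within the SLA, rounded
// down to whole seconds for safety, as in the text.
function perAttemptBudgetMs(slaMs: number, maxAttempts: number): number {
  return Math.floor(slaMs / maxAttempts / 1000) * 1000;
}

worstCaseLatencyMs(10_000, 4); // → 40000: violates a 30s SLA
perAttemptBudgetMs(30_000, 4); // → 7000 (30s ÷ 4 = 7.5s, rounded down to 7s)
```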
A subtle pitfall: when using connection pools, a long read timeout doesn't just affect one request—it holds a connection from the pool. If all pooled connections are stuck waiting for slow responses, new requests can't acquire connections. Set connection pool acquire timeouts to fail fast rather than queue indefinitely.
While connection and read timeouts receive most of the attention, there's a third timeout type that's often overlooked: the write timeout. This timeout governs how long the client waits to send data to the server.
When write timeouts matter: sending large request bodies (file uploads, batch payloads), operating over slow or congested networks, or talking to a receiver that has stopped reading from its socket.
The mechanics:
TCP uses flow control to prevent fast senders from overwhelming slow receivers. If the receiver's buffer is full, the sender blocks until buffer space is available. A write timeout protects the sender from blocking indefinitely in this situation.
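A toy model makes the blocking behavior concrete. This assumes a fixed receiver buffer and a constant drain rate (real TCP windows are dynamic), so treat it as an illustration rather than a protocol simulation:

```typescript
// Toy model of TCP flow control from the sender's side: a write of `bytes`
// can only proceed as fast as the receiver drains its buffer. Returns how
// long the sender blocks (s), or null if the write never completes.
function writeBlockTimeSecs(
  bytes: number,
  receiverBufferBytes: number,
  drainBytesPerSec: number
): number | null {
  if (bytes <= receiverBufferBytes) return 0; // fits in the window: no blocking
  if (drainBytesPerSec <= 0) return null;     // receiver stalled: blocks forever
  return (bytes - receiverBufferBytes) / drainBytesPerSec;
}

writeBlockTimeSecs(3000, 1000, 100); // → 20: sender blocks for 20s
writeBlockTimeSecs(3000, 1000, 0);   // → null: without a write timeout,
                                     //   the sender waits forever
```

The `null` case is precisely the scenario a write timeout converts from an indefinite hang into a bounded failure.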
| Timeout Type | Governs | Typical Values | When It Matters |
|---|---|---|---|
| Connection | TCP + TLS handshake | 1-5 seconds | Every connection |
| Read | Waiting for response data | 5-60 seconds | Every request |
| Write | Sending request data | 5-30 seconds | Large request bodies, slow networks |
Write timeout configuration:
Many HTTP client libraries don't expose write timeouts directly, or bundle them with read timeouts. Here's how to configure them in libraries that support it:
```go
import (
	"context"
	"net"
	"net/http"
	"time"
)

transport := &http.Transport{
	DialContext: (&net.Dialer{
		Timeout: 2 * time.Second, // Connection timeout
	}).DialContext,
	// Write timeout is not directly configurable on Transport.
	// Use a context deadline for clients.
	// For servers, configure WriteTimeout:
	// server := &http.Server{
	//     WriteTimeout: 10 * time.Second,
	// }
}

// For clients, use a context with a deadline that covers the entire operation
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

req, _ := http.NewRequestWithContext(ctx, "POST", url, largeBody)
resp, err := client.Do(req)
```

If your service accepts file uploads or large POST bodies, write timeouts are essential. Without them, a client with a slow network (or a malicious slow-loris attack) can hold your server's connection and memory indefinitely while slowly trickling data.
Let's synthesize everything into actionable configuration guidance for common scenarios.
| Scenario | Connection Timeout | Read Timeout | Total Timeout | Rationale |
|---|---|---|---|---|
| Internal service (same DC) | 500ms | 5s | 10s | Low latency expected; fast failure on issues |
| Internal service (different region) | 2s | 10s | 30s | Network latency is higher; more variability |
| External API (payment) | 3s | 30s | 45s | Third-party may be slow; their SLA matters |
| Cache (Redis/Memcached) | 200ms | 1s | 2s | Cache should be fast; a slow cache defeats its purpose |
| Database query (simple) | 1s | 5s | 10s | Simple queries should be quick |
| Database query (complex) | 1s | 30s | 60s | Analytics queries can be slow |
| Health check | 500ms | 2s | 3s | Health checks must be fast; slow check = unhealthy |
| File upload endpoint | 2s | 10s per MB | Based on size | Scale with expected file sizes |
Template configuration for a typical web service:
```yaml
# Timeout configuration template
timeouts:
  # Default timeouts for service-to-service calls
  default:
    connection_ms: 1000   # 1s to establish connection
    read_ms: 10000        # 10s to receive response
    write_ms: 10000       # 10s to send request
    total_ms: 15000       # 15s overall deadline

  # Override per service/operation
  services:
    payment-service:
      connection_ms: 2000 # Payment infra may be in different network
      read_ms: 30000      # Payment processing can be slow
      total_ms: 45000     # Give plenty of room

    inventory-service:
      connection_ms: 500  # Same datacenter
      read_ms: 5000       # Quick lookups
      total_ms: 8000

    recommendation-engine:
      connection_ms: 1000
      read_ms: 3000
      total_ms: 5000      # Note: recommendations are non-critical; fail fast

  # Infrastructure
  cache:
    connection_ms: 200
    read_ms: 500
    total_ms: 1000

  database:
    connection_pool:
      acquire_ms: 2000    # Time to get connection from pool
      connection_ms: 1000
    default_query_ms: 5000   # Simple queries
    complex_query_ms: 30000  # Reports, aggregations
```

It's better to start with aggressive timeouts and loosen them based on metrics than to start permissive and tighten. Aggressive timeouts surface latency issues early; permissive timeouts hide problems until they become outages.
What's next:
You now understand the crucial distinction between connection and read timeouts, the failure modes each protects against, and how to size them for your use case. The next page explores timeout propagation: how timeout values flow through a chain of service calls, and how to ensure deadlines are respected across service boundaries.