While NGINX and HAProxy were born in the era of monolithic applications and physical servers, Envoy emerged from a fundamentally different context: the cloud-native, microservices-first world of modern distributed systems. Created by Matt Klein at Lyft in 2016 and open-sourced shortly after, Envoy was designed specifically to address the challenges of dynamic, containerized, ephemeral infrastructure.
Envoy's creation was motivated by a critical observation: traditional proxies were designed for static configuration files and long-lived servers. In a Kubernetes world where pods spin up and down constantly, where services auto-scale in response to load, and where deployments happen dozens of times per day, the reload-based configuration model becomes a significant operational burden.
Today, Envoy serves as the foundation for major service mesh implementations including Istio, AWS App Mesh, Google Cloud Traffic Director, and the Consul Connect data plane. Its adoption spans organizations from startups to hyperscalers, making it arguably the most important networking software of the cloud-native era.
By completing this page, you will understand Envoy's architecture and threading model, master its xDS dynamic configuration APIs, comprehend its observability and debugging capabilities, and recognize optimal deployment patterns including sidecar proxies and edge deployments.
Envoy's architecture diverges from traditional proxies in several fundamental ways, all driven by the requirements of cloud-native infrastructure.
Core Architectural Principles:
Threading Model:
Envoy employs a multi-threaded model with careful attention to thread-local storage (TLS) to minimize lock contention:
When configuration updates arrive via xDS, the main thread atomically swaps shared pointers in each worker's TLS. Workers pick up changes on their next event loop iteration without any locking on the hot path.
Connection Draining:
Unlike NGINX, which typically needs external orchestration for fully graceful shutdowns, and HAProxy, which handles draining well at the process level, Envoy builds sophisticated connection draining directly into its configuration model. When a configuration update removes a cluster or listener, the affected listeners drain gracefully: existing connections are allowed to complete while new connections are refused.
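A rough sketch of the knobs involved (a fragment only, assuming the rest of the listener and its HTTP connection manager are defined as usual):

```yaml
# Fragment: drain-related settings at the listener level (rest of the listener elided)
listeners:
  - name: ingress
    # DEFAULT drains on listener modification/removal, hot restart, and
    # health-check failure; MODIFY_ONLY skips the health-check case.
    drain_type: DEFAULT
    # Inside the HTTP connection manager, drain_timeout bounds the graceful
    # drain window for in-flight HTTP connections, e.g.:
    #   drain_timeout: 30s
```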
Envoy's 'hot restart' capability enables binary upgrades without dropping connections. The new process starts, inherits listening sockets from the old process, and takes over new connections while the old process drains existing ones. This enables true zero-downtime upgrades in production.
Envoy's xDS (x Discovery Services) APIs represent a paradigm shift in proxy configuration. Rather than static configuration files that require reloads, xDS enables a control plane to stream configuration updates to Envoy instances in real-time.
The xDS API Family:
| API | Purpose | Configuration Managed |
|---|---|---|
| LDS (Listener DS) | Configure listeners | Ports, protocols, TLS settings, filter chains |
| RDS (Route DS) | Configure routing | Virtual hosts, routes, traffic policies |
| CDS (Cluster DS) | Configure clusters | Backend groups, load balancing, health checks |
| EDS (Endpoint DS) | Configure endpoints | Individual backend addresses and weights |
| SDS (Secret DS) | Configure secrets | TLS certificates, keys, trusted CAs |
| ECDS (Extension Config DS) | Configure extensions | Dynamic Wasm filters, custom extensions |
| ADS (Aggregated DS) | Unified stream | All xDS resources over single gRPC stream |
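Most of these APIs appear in the examples throughout this page; to illustrate SDS specifically, here is a hedged sketch of a listener transport socket that pulls its serving certificate from the control plane over the aggregated stream. The secret name `server_cert` is a placeholder known only to the control plane.

```yaml
# Sketch: TLS certificate delivered via SDS instead of a file on disk
# (fragment; attaches to a listener's filter chain)
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
    common_tls_context:
      tls_certificate_sds_secret_configs:
        - name: server_cert          # placeholder secret name served by the control plane
          sds_config:
            resource_api_version: V3
            ads: {}                  # fetch the secret over the aggregated xDS stream
```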
Configuration Hierarchy:
Envoy's configuration model follows a logical hierarchy:
```
Listeners (LDS)
└── Filter Chains
    └── HTTP Connection Manager
        └── Routes (RDS)
            └── Virtual Hosts
                └── Routes
                    └── Clusters (CDS)
                        └── Endpoints (EDS)
```
This separation enables independent updates to different configuration layers. For example, when a new service instance starts, only EDS updates are needed—the listener, routes, and cluster configuration remain unchanged.
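As a concrete, minimal sketch of these layers, the following statically defined configuration inlines every level. The names `backend_service` and `backend.example.internal` are placeholders; in a dynamic setup the route configuration and endpoints would be served by RDS and EDS instead of being inlined.

```yaml
# Minimal sketch of the Listener → Filter Chain → HCM → Route → Cluster layers
# (static form; with xDS, route_config and endpoints come from RDS/EDS)
static_resources:
  listeners:
    - name: ingress                          # Listener (LDS layer)
      address:
        socket_address: { address: 0.0.0.0, port_value: 8080 }
      filter_chains:                         # Filter Chain
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:                # Routes (RDS layer, inlined here)
                  virtual_hosts:
                    - name: default          # Virtual Host
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route: { cluster: backend_service }   # points at a Cluster
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: backend_service                  # Cluster (CDS layer)
      type: STRICT_DNS
      connect_timeout: 2s
      load_assignment:                       # Endpoints (EDS layer, inlined here)
        cluster_name: backend_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: backend.example.internal, port_value: 8080 }
```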
```yaml
# Envoy bootstrap configuration
# Defines admin interface and how to connect to xDS control plane
admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

# Static resources (bootstrap-time configuration)
static_resources:
  clusters:
    # Control plane cluster for xDS
    - name: xds_cluster
      type: STRICT_DNS
      connect_timeout: 5s
      load_assignment:
        cluster_name: xds_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: control-plane.example.com
                      port_value: 18000
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}

# Dynamic resources via xDS
dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster
  # Get listeners from LDS
  lds_config:
    resource_api_version: V3
    ads: {}
  # Get clusters from CDS
  cds_config:
    resource_api_version: V3
    ads: {}

# Node identification for control plane
node:
  cluster: my-cluster
  id: my-node-id
  metadata:
    role: sidecar
    namespace: production
    service: api-gateway
```

xDS Protocol Details:
xDS operates over gRPC streaming connections: the control plane pushes configuration resources, and Envoy explicitly acknowledges (ACK) or rejects (NACK) each update. This enables incremental, reload-free configuration changes with clear feedback to the control plane.
Popular xDS control planes include Istio Pilot, Consul Connect, and Gloo Edge, as well as custom control planes built with go-control-plane or similar libraries.

Envoy requires a bootstrap configuration file to start: it configures the admin interface and tells Envoy how to reach the xDS control plane. All other configuration can then be managed dynamically. In Kubernetes, the bootstrap is typically generated by an init container or mutating webhook.
Envoy implements sophisticated load balancing capabilities designed for microservices environments where dozens of upstream services, each with variable instance counts, must be managed dynamically.
| Policy | Behavior | Use Case |
|---|---|---|
| Round Robin | Cycles through endpoints sequentially | Default, general-purpose |
| Least Request | Routes to endpoint with fewest active requests | Variable latency backends |
| Ring Hash | Consistent hashing on configurable keys | Cache affinity, sharding |
| Maglev | Google's consistent hash variant | Large scale, minimal disruption |
| Random | Random selection (weighted) | Simple, high-scale scenarios |
| Original DST | Routes to original destination IP | Transparent proxying |
```yaml
# Cluster with least request balancing
clusters:
  - name: api_cluster
    type: EDS
    connect_timeout: 5s
    lb_policy: LEAST_REQUEST
    least_request_lb_config:
      choice_count: 2  # Power of two random choices

    # Health checking
    health_checks:
      - timeout: 5s
        interval: 10s
        healthy_threshold: 2
        unhealthy_threshold: 3
        http_health_check:
          path: /health
          expected_statuses:
            - start: 200
              end: 299

    # Circuit breaker settings
    circuit_breakers:
      thresholds:
        - priority: DEFAULT
          max_connections: 1000
          max_pending_requests: 1000
          max_requests: 1000
          max_retries: 10

    # Outlier detection (automatic ejection of failing endpoints)
    outlier_detection:
      consecutive_5xx: 5
      interval: 10s
      base_ejection_time: 30s
      max_ejection_percent: 50
      enforcing_consecutive_5xx: 100
      enforcing_success_rate: 100
      success_rate_minimum_hosts: 5
      success_rate_request_volume: 100
      success_rate_stdev_factor: 1900

    # Connection pooling
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        common_http_protocol_options:
          max_connections_per_host: 100
          idle_timeout: 60s
        explicit_http_config:
          http2_protocol_options: {}

  # Ring hash for cache distribution
  - name: cache_cluster
    type: EDS
    connect_timeout: 5s
    lb_policy: RING_HASH
    ring_hash_lb_config:
      minimum_ring_size: 1024
      maximum_ring_size: 8388608
      hash_function: XX_HASH
    # Configure what to hash on
    # (done at route level with hash_policy)
```

Advanced Traffic Management:
Envoy's route configuration enables sophisticated traffic manipulation beyond basic load balancing:
```yaml
# Route configuration with traffic management
routes:
  - match:
      prefix: /api/
    route:
      # Weighted cluster routing (canary deployments)
      weighted_clusters:
        clusters:
          - name: api_v1
            weight: 90
          - name: api_v2
            weight: 10

      # Retry policy
      retry_policy:
        retry_on: "5xx,reset,connect-failure"
        num_retries: 3
        per_try_timeout: 5s
        retry_back_off:
          base_interval: 0.1s
          max_interval: 1s
        retriable_status_codes:
          - 503
          - 504

      # Timeout configuration
      timeout: 60s
      idle_timeout: 30s

      # Hash policy for consistent routing
      hash_policy:
        - header:
            header_name: "x-user-id"
        - cookie:
            name: session_id
            ttl: 3600s
        - connection_properties:
            source_ip: true

      # Request hedging (send duplicate requests)
      hedge_policy:
        hedge_on_per_try_timeout: true
        initial_requests: 2

      # Traffic mirroring (shadow testing)
      request_mirror_policies:
        - cluster: api_shadow
          runtime_fraction:
            default_value:
              numerator: 10
              denominator: HUNDRED

  # Header-based routing for A/B testing
  - match:
      prefix: /
      headers:
        - name: x-experiment
          exact_match: "treatment"
    route:
      cluster: experiment_treatment

  - match:
      prefix: /
    route:
      cluster: experiment_control
```

Envoy's request mirroring sends duplicate requests to a shadow cluster without affecting primary traffic. This enables testing new backend versions with production traffic patterns while completely isolating the shadow responses, a powerful technique for validating performance and correctness before promotion.
Envoy was designed with the principle that observability should be built-in, not bolted-on. Every proxied connection generates rich telemetry without requiring application changes—a critical capability for debugging distributed systems.
```yaml
# Stats configuration
stats_config:
  stats_tags:
    - tag_name: cluster_name
      regex: "^cluster\.((.+?)\.)"
    - tag_name: route_name
      regex: "^http\.route\.((.+?)\.)"
  use_all_default_tags: true

# Access logging with detailed timing breakdown
http_filters:
  - name: envoy.filters.http.router
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

# In listener/route configuration:
access_log:
  - name: envoy.access_loggers.stream
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
      log_format:
        json_format:
          # Request info
          timestamp: "%START_TIME%"
          method: "%REQ(:METHOD)%"
          path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
          protocol: "%PROTOCOL%"
          # Response info
          response_code: "%RESPONSE_CODE%"
          response_flags: "%RESPONSE_FLAGS%"
          # Timing breakdown (microseconds)
          duration: "%DURATION%"
          time_to_first_byte: "%RESPONSE_DURATION%"
          time_to_last_rx_byte: "%REQUEST_DURATION%"
          upstream_response_time: "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"
          # Connection info
          downstream_address: "%DOWNSTREAM_REMOTE_ADDRESS%"
          upstream_host: "%UPSTREAM_HOST%"
          upstream_cluster: "%UPSTREAM_CLUSTER%"
          # Tracing
          request_id: "%REQ(X-REQUEST-ID)%"
          trace_id: "%REQ(X-B3-TRACEID)%"
          # TLS info
          tls_version: "%DOWNSTREAM_TLS_VERSION%"
          tls_cipher: "%DOWNSTREAM_TLS_CIPHER%"

# Distributed tracing configuration
tracing:
  http:
    name: envoy.tracers.zipkin
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.ZipkinConfig
      collector_cluster: zipkin_cluster
      collector_endpoint: /api/v2/spans
      collector_endpoint_version: HTTP_JSON
      shared_span_context: true
      trace_id_128bit: true
```

Response Flags:
Envoy's access logs include RESPONSE_FLAGS that provide crucial debugging information. Common flags include:
- `UH`: No healthy upstream (all endpoints unavailable)
- `UF`: Upstream connection failure
- `UO`: Upstream overflow (circuit breaker triggered)
- `NR`: No route configured for the request
- `URX`: Request rejected because the upstream retry limit or maximum connect attempts were exceeded
- `DC`: Downstream connection termination
- `LH`: Local service health check failure
- `UT`: Upstream request timeout
- `UC`: Upstream connection termination

These flags enable rapid root cause identification when investigating failures.
Envoy natively exposes the four golden signals of SRE: Latency (histogram metrics), Traffic (request rate), Errors (response codes, connection failures), and Saturation (connection pool usage, pending requests). This enables comprehensive service-level monitoring without application instrumentation.
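Because the admin interface exposes all of these statistics in Prometheus format at /stats/prometheus, collecting them is often just a matter of pointing a scrape job at the admin port configured in the bootstrap above. The job name and target below are illustrative:

```yaml
# Prometheus scrape job collecting Envoy's built-in metrics from the admin
# interface (port 9901 in the bootstrap example above); target is a placeholder.
scrape_configs:
  - job_name: envoy-sidecars
    metrics_path: /stats/prometheus          # admin endpoint serving Prometheus-format stats
    static_configs:
      - targets: ['envoy-sidecar.example.internal:9901']
```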
Envoy's defining deployment pattern is as a sidecar proxy in service mesh architectures. In this model, every application instance is paired with its own Envoy instance, creating a distributed network of proxies that handle all service-to-service communication.
Service Mesh Architecture:
```
┌───────────────────────────────────────────────────────────────┐
│                         Control Plane                          │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐      │
│   │    Istio    │     │   Consul    │     │    Gloo     │      │
│   │    Pilot    │     │   Connect   │     │    Edge     │      │
│   └──────┬──────┘     └──────┬──────┘     └──────┬──────┘      │
│          │                   │                   │             │
│          └───────────────────┼───────────────────┘             │
│                              │  xDS (gRPC)                     │
└──────────────────────────────┼──────────────────────────────────┘
                               │
                               ▼
┌────────────────────────────────────────────────────────────────┐
│                           Data Plane                            │
│                                                                  │
│  ┌───────────────────────────┐     ┌───────────────────────────┐ │
│  │           Pod A           │     │           Pod B           │ │
│  │ ┌─────────┐ ┌──────────┐  │     │  ┌──────────┐ ┌─────────┐ │ │
│  │ │ Service │─│  Envoy   │──┼─────┼──│  Envoy   │─│ Service │ │ │
│  │ │    A    │ │ Sidecar  │  │     │  │ Sidecar  │ │    B    │ │ │
│  │ └─────────┘ └──────────┘  │     │  └──────────┘ └─────────┘ │ │
│  └───────────────────────────┘     └───────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
```

Sidecar Injection:
In Kubernetes environments, sidecar injection happens automatically via mutating admission webhooks. When a pod is created, the mesh's admission controller modifies the pod specification to add an init container that installs the traffic-redirection rules and the Envoy sidecar container itself, roughly as sketched below.
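A hedged sketch of what the mutated pod spec looks like after injection; container names and images are illustrative, not the exact output of any particular mesh's webhook:

```yaml
# Hypothetical pod spec after sidecar injection (names/images are placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: api-gateway
spec:
  initContainers:
    - name: proxy-init                      # installs the iptables REDIRECT rules shown below
      image: example.com/proxy-init:latest
      securityContext:
        capabilities:
          add: ["NET_ADMIN", "NET_RAW"]
  containers:
    - name: app                             # the original application container
      image: example.com/api-gateway:latest
    - name: envoy-sidecar                   # injected Envoy proxy
      image: envoyproxy/envoy:v1.30-latest  # illustrative image tag
      args: ["-c", "/etc/envoy/bootstrap.yaml"]
```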
Traffic Interception:
Envoy as a sidecar intercepts both inbound and outbound traffic using iptables REDIRECT rules:
```bash
# Inbound traffic: redirect incoming connections to Envoy (port 15006)
iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-port 15006

# Exclude Envoy's own traffic to prevent loops
# (must be appended before the OUTPUT redirect so it matches first)
iptables -t nat -A OUTPUT -m owner --uid-owner 1337 -j RETURN

# Outbound traffic: redirect outgoing connections to Envoy (port 15001)
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-port 15001

# Applications connect to localhost:$PORT but actually talk to the remote service
#
# Flow: App → iptables → Envoy (15001) → Remote Envoy (15006) → Remote App
```

Sidecars add resource overhead: ~50MB memory per instance, 5-10% CPU increase, and 1-3ms latency per hop. At scale (thousands of pods), this translates into significant infrastructure cost. Evaluate whether the benefits (mTLS, observability, traffic control) justify the overhead for your use case.
Envoy's extensibility model enables sophisticated customization without forking the core proxy. The primary extension mechanisms are filters, WebAssembly (Wasm), and external processors.
```yaml
http_filters:
  # Rate limiting filter
  - name: envoy.filters.http.ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: "api_gateway"
      rate_limit_service:
        grpc_service:
          envoy_grpc:
            cluster_name: ratelimit_cluster
        transport_api_version: V3

  # JWT authentication
  - name: envoy.filters.http.jwt_authn
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
      providers:
        auth0:
          issuer: https://your-domain.auth0.com/
          audiences:
            - your-api-identifier
          remote_jwks:
            http_uri:
              uri: https://your-domain.auth0.com/.well-known/jwks.json
              cluster: auth0_cluster
              timeout: 5s
            cache_duration: 600s
          forward: true
          forward_payload_header: x-jwt-payload
      rules:
        - match:
            prefix: /api/
          requires:
            provider_name: auth0

  # External authorization
  - name: envoy.filters.http.ext_authz
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
      grpc_service:
        envoy_grpc:
          cluster_name: authz_cluster
        timeout: 2s
      include_peer_certificate: true
      transport_api_version: V3

  # Fault injection for chaos testing
  - name: envoy.filters.http.fault
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
      delay:
        header_delay: {}
        fixed_delay: 5s
        percentage:
          numerator: 10
          denominator: HUNDRED
      abort:
        header_abort: {}
        http_status: 503
        percentage:
          numerator: 5
          denominator: HUNDRED

  # WASM extension
  - name: envoy.filters.http.wasm
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
      config:
        name: my_custom_filter
        vm_config:
          runtime: envoy.wasm.runtime.v8
          code:
            local:
              filename: /etc/envoy/wasm/custom_filter.wasm
        configuration:
          "@type": type.googleapis.com/google.protobuf.StringValue
          value: |
            {"setting": "value"}

  # Router must be last
  - name: envoy.filters.http.router
```

WebAssembly Extensions:
Wasm enables writing custom filters in Rust, Go, C++, or AssemblyScript that run within Envoy's sandbox. This provides near-native performance with memory safety and isolated execution.
Use cases include custom authentication and authorization logic, request and response transformation, custom metrics and logging, and integration with internal systems without maintaining a fork of Envoy.
Wasm filters can be dynamically loaded via ECDS (Extension Config Discovery Service), enabling runtime updates without Envoy restarts.
The proxy-wasm project provides standard SDKs for building Envoy-compatible Wasm filters. The Rust SDK (proxy-wasm-rust-sdk) is most mature. Filters built with proxy-wasm are portable across Envoy, NGINX (with ngx_wasm_module), and other compatible proxies.
Envoy's performance profile reflects its design priorities: feature richness and observability, with acceptable overhead rather than maximum raw throughput.
| Aspect | Typical Value | Notes |
|---|---|---|
| Latency overhead | 1-3ms per hop | Higher with complex filter chains, TLS |
| Memory per instance | 50-100MB baseline | Increases with connection count, config size |
| CPU usage | 5-15% of app CPU | Higher with tracing, complex routing |
| Throughput | 10,000-50,000 RPS/core | Depends on request size, filter complexity |
| Connection overhead | ~8KB per connection | Includes TLS state if enabled |
Comparison with HAProxy/NGINX:
Envoy typically shows 10-30% higher latency than HAProxy for equivalent workloads. This is the cost of its richer per-request processing: built-in telemetry and tracing on every request, dynamic xDS configuration handling, and longer, extensible filter chains.
For edge/ingress deployments where every microsecond matters, HAProxy or NGINX may be preferable. For sidecar deployments where developer experience and observability are primary concerns, Envoy's overhead is acceptable.
Tuning Recommendations:
- Set `--concurrency` to the number of CPU cores allocated to Envoy

In Kubernetes, configure sidecar resource requests/limits based on observed usage. Start with 100m CPU / 128Mi memory for light sidecars and 250m CPU / 256Mi for heavy ones; use the Vertical Pod Autoscaler to optimize over time.
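As a starting point, a light sidecar's container stanza might look like the following (values taken from the guidance above; the container name and image tag are illustrative):

```yaml
# Starting-point resources for an injected Envoy sidecar container
# (adjust from observed usage or let VPA tune over time)
containers:
  - name: envoy-sidecar
    image: envoyproxy/envoy:v1.30-latest   # illustrative image tag
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 250m
        memory: 256Mi
```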
Envoy's unique capabilities make it the clear choice for certain architectures while being overkill for others.
Summary:
Envoy represents the next generation of service proxies, designed from the ground up for cloud-native, dynamic, microservices architectures. Its combination of xDS-based dynamic configuration, rich observability, and extensibility via Wasm makes it the foundation of modern service mesh infrastructure.
The trade-off is complexity: Envoy's configuration model is more sophisticated than NGINX or HAProxy, and its operational overhead (memory, CPU, latency) is higher. For organizations investing in platform engineering and service mesh infrastructure, this complexity is manageable and the benefits are substantial. For simpler deployments, NGINX or HAProxy remains preferable.
In the next page, we'll examine AWS ALB/NLB—managed load balancing services that eliminate operational overhead entirely.
You now possess comprehensive knowledge of Envoy as a cloud-native service proxy—from its xDS configuration APIs to observability features, service mesh integration, and extensibility. Next, we'll explore AWS ALB and NLB, understanding when managed cloud solutions are the optimal choice.