Mutual TLS (mTLS) represents the most robust form of service-to-service authentication. Unlike standard TLS where only the server proves its identity, mTLS requires both parties—client and server—to present and validate certificates. This mutual verification establishes cryptographic trust in both directions.
In microservices architectures, mTLS provides:
Strong Identity: Each service has a cryptographic identity (certificate) that cannot be forged without access to the private key.
Transport Encryption: All communication is encrypted, protecting against eavesdropping and tampering.
Zero-Trust Foundation: mTLS is a cornerstone of zero-trust architecture—services trust no one by default and verify every connection.
Compliance Requirements: Industries like finance, healthcare, and government often mandate mTLS for internal communications.
This page explores mTLS in depth: how it works, how to implement it at the API Gateway, certificate lifecycle management, and the operational challenges of running mTLS at scale.
Standard TLS (what you use with HTTPS) only authenticates the server—the client verifies the server's certificate and trusts it. The server has no assurance about who the client is. mTLS adds client authentication: the server also requests and validates a client certificate, establishing mutual trust.
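As a minimal sketch of that difference, assuming a Node.js service: the two options that turn a plain TLS server into an mTLS server are `requestCert` and `rejectUnauthorized`. The helper name `buildMtlsServerOptions` and its parameters are hypothetical; the option names are from Node's `tls` module.

```typescript
import type { TlsOptions } from "node:tls";

// Hypothetical helper: builds server options that require a client certificate.
function buildMtlsServerOptions(
  serverCertPem: string, // the server's own certificate (PEM)
  serverKeyPem: string,  // the server's private key (PEM)
  clientCaPem: string,   // CA bundle used to validate client certificates
): TlsOptions {
  return {
    cert: serverCertPem,
    key: serverKeyPem,
    ca: clientCaPem,
    requestCert: true,        // ask the client for a certificate (the mTLS part)
    rejectUnauthorized: true, // fail the handshake if the client cert does not validate
    minVersion: "TLSv1.2",
  };
}
```

The result can be passed to `tls.createServer` or `https.createServer`. Leaving `rejectUnauthorized` at `false` would request a client certificate without enforcing it, which is not mutual authentication.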
Understanding the mTLS handshake is essential for proper implementation and troubleshooting. Let's examine the protocol in detail.
The mTLS handshake extends the standard TLS handshake with client certificate exchange:
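1. Certificate Request (mTLS specific)
The server sends a CertificateRequest message indicating it requires client authentication. This message lists the signature algorithms the server accepts and, optionally, the distinguished names of the CAs it trusts.
2. Client Certificate
The client sends its certificate chain: its own end-entity certificate followed by any intermediate CA certificates needed to build a path to a trusted root.
3. CertificateVerify
The client proves it possesses the private key corresponding to the certificate by signing a hash of all handshake messages exchanged so far. The server verifies this signature using the public key from the client's certificate.
4. Certificate Validation
Both parties validate the other's certificate: the chain must lead to a trusted CA, the current time must fall within the validity period, the certificate must not be revoked, and the identity in the certificate must match the expected peer.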
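The CertificateVerify step can be sketched with Node's `crypto` module. This is a simplified illustration, not the TLS wire format: real TLS stacks build the transcript internally and TLS 1.3 adds a context string before signing. The transcript contents and function names here are hypothetical.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// Key pair standing in for the one bound to the client's certificate.
const { publicKey, privateKey } = generateKeyPairSync("ec", { namedCurve: "P-256" });

// Stand-in for the handshake messages exchanged so far (hypothetical bytes).
const transcript: Buffer[] = [
  Buffer.from("ClientHello"),
  Buffer.from("ServerHello"),
  Buffer.from("Certificate"),
];

// Hash every handshake message in order.
function transcriptHash(messages: Buffer[]): Buffer {
  const hash = createHash("sha256");
  for (const message of messages) hash.update(message);
  return hash.digest();
}

// Client side: sign the transcript hash with the private key.
function proveKeyPossession(messages: Buffer[]): Buffer {
  return sign("sha256", transcriptHash(messages), privateKey);
}

// Server side: verify with the public key taken from the client's certificate.
function verifyKeyPossession(messages: Buffer[], signature: Buffer): boolean {
  return verify("sha256", transcriptHash(messages), publicKey, signature);
}
```

Because the signature covers the whole transcript, a signature captured from one handshake does not verify against another, which is what prevents replay of a stolen CertificateVerify message.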
    Certificate:
        Data:
            Version: 3 (0x2)
            Serial Number: 0a:bc:de:f1:23:45:67:89
            Signature Algorithm: sha256WithRSAEncryption
            Issuer: CN = Internal CA, O = Example Corp
            Validity:
                Not Before: Jan 1 00:00:00 2024 GMT
                Not After : Jan 1 00:00:00 2025 GMT
            Subject: CN = payment-service.internal
            Subject Public Key Info:
                Public Key Algorithm: rsaEncryption
                    RSA Public-Key: (2048 bit)
            X509v3 extensions:
                X509v3 Key Usage: critical
                    Digital Signature, Key Encipherment
                X509v3 Extended Key Usage:
                    TLS Web Client Authentication, TLS Web Server Authentication
                X509v3 Subject Alternative Name:
                    DNS:payment-service.internal
                    DNS:payment-service.prod.svc.cluster.local
                    URI:spiffe://example.com/ns/prod/sa/payment-service
        Signature Algorithm: sha256WithRSAEncryption
            a1:b2:c3:d4:e5...

The SPIFFE (Secure Production Identity Framework for Everyone) URI in the Subject Alternative Name (SAN) provides a standardized way to identify services across different platforms. SPIFFE URIs follow the format spiffe://trust-domain/path and are widely used in service mesh implementations.
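To make the `spiffe://trust-domain/path` format concrete, here is a small parsing sketch. The `SpiffeId` type and `parseSpiffeId` function are hypothetical names, and the checks are deliberately minimal compared with the full SPIFFE ID rules.

```typescript
interface SpiffeId {
  trustDomain: string; // e.g. "example.com"
  path: string;        // e.g. "/ns/prod/sa/payment-service"
}

// Splits a SPIFFE URI into trust domain and path; throws on anything else.
function parseSpiffeId(uri: string): SpiffeId {
  const prefix = "spiffe://";
  if (!uri.startsWith(prefix)) {
    throw new Error(`not a SPIFFE ID: ${uri}`);
  }
  const rest = uri.slice(prefix.length);
  const slash = rest.indexOf("/");
  const trustDomain = slash === -1 ? rest : rest.slice(0, slash);
  const path = slash === -1 ? "" : rest.slice(slash);
  if (trustDomain.length === 0) {
    throw new Error(`SPIFFE ID has an empty trust domain: ${uri}`);
  }
  return { trustDomain, path };
}

const id = parseSpiffeId("spiffe://example.com/ns/prod/sa/payment-service");
console.log(id.trustDomain, id.path);
```

Comparing the trust domain separately from the path lets a verifier first reject identities from foreign trust domains and only then apply per-service rules.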
A robust mTLS implementation requires a well-designed Certificate Authority (CA) infrastructure. For internal service-to-service communication, you typically operate your own private CA rather than using public CAs.
A production CA architecture typically includes multiple layers:
                    ┌─────────────────┐
                    │     Root CA     │  ← Offline, HSM-protected
                    │   (10+ years)   │
                    └────────┬────────┘
                             │
            ┌────────────────┼────────────────┐
            │                │                │
    ┌───────▼───────┐ ┌──────▼───────┐ ┌──────▼───────┐
    │ Intermediate  │ │ Intermediate │ │ Intermediate │
    │   CA (Prod)   │ │  CA (Stage)  │ │   CA (Dev)   │
    │  (1-3 years)  │ │ (1-3 years)  │ │ (1-3 years)  │
    └───────┬───────┘ └──────────────┘ └──────────────┘
            │
        ┌───┴──────┬─────────┐
        │          │         │
    ┌───▼────┐ ┌───▼───┐ ┌───▼───┐
    │Service │ │Service│ │Service│
    │ Cert   │ │ Cert  │ │ Cert  │
    │(hours) │ │(hours)│ │(hours)│
    └────────┘ └───────┘ └───────┘
Root CA: Offline, protected by Hardware Security Module (HSM), used only to sign intermediate CAs. Validity 10-20 years.
Intermediate CAs: Online (possibly automated), sign end-entity certificates. Often segmented by environment or team. Validity 1-5 years.
End-Entity Certificates: The actual service certificates. Short-lived (hours to days) in modern systems.
    # Internal CA configuration for service certificates
    ca:
      root:
        algorithm: RSA
        keySize: 4096
        validity: "87600h"        # 10 years
        storage: HSM              # Hardware Security Module
        pathLenConstraint: 2      # Allow 2 levels of intermediate CAs

      intermediate:
        production:
          algorithm: RSA
          keySize: 4096
          validity: "26280h"      # 3 years
          pathLenConstraint: 0    # Cannot sign other CAs
          nameConstraints:
            permitted:
              - ".prod.internal"
              - ".prod.svc.cluster.local"
            excluded:
              - ".stage.internal"
              - ".dev.internal"

        staging:
          algorithm: ECDSA
          curve: P-256
          validity: "8760h"       # 1 year
          pathLenConstraint: 0
          nameConstraints:
            permitted:
              - ".stage.internal"

    # End-entity (service) certificate policy
    service_certificates:
      algorithm: ECDSA
      curve: P-256
      validity: "24h"             # Short-lived certificates
      max_validity: "168h"        # 1 week maximum

      required_extensions:
        - keyUsage: ["digitalSignature", "keyEncipherment"]
        - extKeyUsage: ["clientAuth", "serverAuth"]

      san_policies:
        # Must include SPIFFE URI
        require_spiffe: true
        spiffe_trust_domain: "example.com"

        # Allowed DNS patterns
        allowed_dns_patterns:
          - "*.prod.internal"
          - "*.prod.svc.cluster.local"

        # Allowed IP SANs (for pod IPs if needed)
        allow_ip_san: true

The root CA private key is the most critical secret in your mTLS infrastructure. If compromised, an attacker can issue trusted certificates for any service. Keep the root CA offline and protect it with HSM. Only bring it online to sign new intermediate CAs (rare events).
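The `nameConstraints` entries in the configuration above limit which DNS names an intermediate CA may sign for. A rough sketch of that check follows; `matchesSubtree` and `isNamePermitted` are hypothetical names, and the matching is simplified relative to the name-constraint rules in RFC 5280.

```typescript
// True if dnsName falls inside the given subtree.
// A leading dot (".prod.internal") is treated as "subdomains only".
function matchesSubtree(dnsName: string, subtree: string): boolean {
  const name = dnsName.toLowerCase();
  const base = subtree.toLowerCase();
  if (base.startsWith(".")) {
    return name.endsWith(base);
  }
  return name === base || name.endsWith("." + base);
}

// Excluded subtrees win over permitted ones; an empty permitted list allows any name.
function isNamePermitted(
  dnsName: string,
  permitted: string[],
  excluded: string[],
): boolean {
  if (excluded.some((subtree) => matchesSubtree(dnsName, subtree))) {
    return false;
  }
  return permitted.length === 0 || permitted.some((subtree) => matchesSubtree(dnsName, subtree));
}

// Values copied from the production intermediate CA above.
const prodPermitted = [".prod.internal", ".prod.svc.cluster.local"];
const prodExcluded = [".stage.internal", ".dev.internal"];
console.log(isNamePermitted("payment-service.prod.internal", prodPermitted, prodExcluded));
```

With these constraints, a compromised production intermediate CA still cannot issue a certificate that validates for a staging or development name.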
The API Gateway serves as the mTLS termination point for external services and the mTLS origination point for internal services. Configuration must address both directions.
Configure the gateway to require and validate client certificates:
    # Envoy Proxy mTLS Configuration
    static_resources:
      listeners:
        - name: mtls_listener
          address:
            socket_address:
              address: 0.0.0.0
              port_value: 8443
          filter_chains:
            - filters:
                - name: envoy.filters.network.http_connection_manager
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                    stat_prefix: ingress_mtls
                    route_config:
                      name: local_route
                      virtual_hosts:
                        - name: backend
                          domains: ["*"]
                          routes:
                            - match:
                                prefix: "/"
                              route:
                                cluster: backend_cluster
                    # The connection manager needs a terminal HTTP filter
                    http_filters:
                      - name: envoy.filters.http.router
                        typed_config:
                          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
              transport_socket:
                name: envoy.transport_sockets.tls
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
                  require_client_certificate: true
                  common_tls_context:
                    # Server certificate and key
                    tls_certificates:
                      - certificate_chain:
                          filename: "/certs/gateway.crt"
                        private_key:
                          filename: "/certs/gateway.key"

                    # Trusted CAs for client certificates
                    validation_context:
                      trusted_ca:
                        filename: "/certs/ca-bundle.crt"

                      # Optional: Certificate Revocation List
                      crl:
                        filename: "/certs/crl.pem"

                      # Verify client certificate against specific SANs
                      # (a certificate is accepted if any one matcher matches)
                      match_typed_subject_alt_names:
                        - san_type: DNS
                          matcher:
                            suffix: ".internal"
                        - san_type: URI
                          matcher:
                            prefix: "spiffe://example.com/"

                    # TLS parameters
                    tls_params:
                      tls_minimum_protocol_version: TLSv1_2
                      tls_maximum_protocol_version: TLSv1_3
                      cipher_suites:
                        - ECDHE-ECDSA-AES128-GCM-SHA256
                        - ECDHE-RSA-AES128-GCM-SHA256
                        - ECDHE-ECDSA-AES256-GCM-SHA384
                        - ECDHE-RSA-AES256-GCM-SHA384

When the gateway connects to backend services, it presents its own client certificate:
    # Envoy Cluster with mTLS to backend
    clusters:
      - name: payment_service
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        load_assignment:
          cluster_name: payment_service
          endpoints:
            - lb_endpoints:
                - endpoint:
                    address:
                      socket_address:
                        address: payment-service.internal
                        port_value: 8443
        transport_socket:
          name: envoy.transport_sockets.tls
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
            common_tls_context:
              # Client certificate for gateway-to-service auth
              tls_certificates:
                - certificate_chain:
                    filename: "/certs/gateway-client.crt"
                  private_key:
                    filename: "/certs/gateway-client.key"

              # Validate server certificate
              validation_context:
                trusted_ca:
                  filename: "/certs/internal-ca.crt"
                match_typed_subject_alt_names:
                  - san_type: DNS
                    matcher:
                      exact: "payment-service.internal"

            # SNI for proper routing
            sni: payment-service.internal

After mTLS validation, extract the client identity from the certificate and propagate it to backend services. Common approaches: pass the Subject Distinguished Name (DN), the SPIFFE ID, or a specific SAN value as a header. This allows services to make authorization decisions based on the verified identity.
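One way to sketch the SPIFFE-ID variant of identity propagation, assuming the gateway has the SAN list as a comma-separated string in the form Node's `X509Certificate.subjectAltName` returns. The header name `x-client-identity` and both function names are hypothetical.

```typescript
// Returns the first SPIFFE URI found in a SAN string such as
// "DNS:payment-service.internal, URI:spiffe://example.com/ns/prod/sa/payment-service".
function extractSpiffeId(subjectAltName: string): string | undefined {
  for (const entry of subjectAltName.split(",")) {
    const trimmed = entry.trim();
    if (trimmed.startsWith("URI:spiffe://")) {
      return trimmed.slice("URI:".length);
    }
  }
  return undefined;
}

// Builds the header the gateway forwards to backends after mTLS validation.
// "x-client-identity" is an assumed header name, not a standard.
function identityHeaders(subjectAltName: string): Record<string, string> {
  const spiffeId = extractSpiffeId(subjectAltName);
  if (spiffeId === undefined) {
    throw new Error("client certificate has no SPIFFE URI SAN");
  }
  return { "x-client-identity": spiffeId };
}

console.log(
  identityHeaders(
    "DNS:payment-service.internal, URI:spiffe://example.com/ns/prod/sa/payment-service",
  ),
);
```

Whatever header is chosen, the gateway should strip any copy of it from the incoming request before setting its own value; otherwise a client could assert an identity it never proved.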
Managing certificates at scale is one of the biggest challenges of mTLS. Modern systems issue thousands of certificates with short lifetimes, requiring robust automation.
Short-lived certificates reduce the blast radius of key compromise. However, they require automated rotation to avoid outages.
    import { webcrypto } from "node:crypto";

    // CAClient, CertificateConfig, and the helper methods generateCSR,
    // exportPrivateKey, notifyCertificateChange, and alertCritical are
    // assumed to be defined elsewhere.

    interface CertificateInfo {
      certificate: string;        // PEM-encoded certificate
      privateKey: string;         // PEM-encoded private key
      chain: string[];            // Intermediate CA chain
      notBefore: Date;
      notAfter: Date;
      serialNumber: string;
      subjectAltNames: string[];
    }

    class CertificateManager {
      private currentCert: CertificateInfo | null = null;
      private rotationTimer: NodeJS.Timeout | null = null;

      constructor(
        private caClient: CAClient,
        private config: CertificateConfig
      ) {}

      async start(): Promise<void> {
        // Initial certificate fetch
        await this.rotateCertificate();

        // Schedule rotation
        this.scheduleNextRotation();
      }

      async rotateCertificate(): Promise<void> {
        console.log("Requesting new certificate from CA...");

        // Generate new key pair
        const { publicKey, privateKey } = await webcrypto.subtle.generateKey(
          {
            name: "ECDSA",
            namedCurve: "P-256",
          },
          true,
          ["sign", "verify"]
        );

        // Create Certificate Signing Request (CSR)
        const csr = await this.generateCSR(publicKey, privateKey);

        // Request signed certificate from CA
        const signedCert = await this.caClient.signCSR({
          csr: csr,
          validity: this.config.validity,
          sans: this.config.subjectAltNames,
          spiffeId: this.config.spiffeId,
        });

        // Prepare new certificate bundle
        const newCert: CertificateInfo = {
          certificate: signedCert.certificate,
          privateKey: await this.exportPrivateKey(privateKey),
          chain: signedCert.chain,
          notBefore: new Date(signedCert.notBefore),
          notAfter: new Date(signedCert.notAfter),
          serialNumber: signedCert.serialNumber,
          subjectAltNames: signedCert.sans,
        };

        // Atomic swap of current certificate
        const previousCert = this.currentCert;
        this.currentCert = newCert;

        // Notify listeners (e.g., reload TLS context)
        await this.notifyCertificateChange(newCert, previousCert);

        console.log(`Certificate rotated. New expiry: ${newCert.notAfter.toISOString()}`);
      }

      private scheduleNextRotation(): void {
        if (!this.currentCert) return;

        // Rotate at 75% of certificate lifetime
        const lifetime = this.currentCert.notAfter.getTime() -
                         this.currentCert.notBefore.getTime();
        const rotationPoint = this.currentCert.notBefore.getTime() + (lifetime * 0.75);
        const delay = rotationPoint - Date.now();

        console.log(`Next rotation scheduled in ${Math.round(delay / 1000)} seconds`);

        this.rotationTimer = setTimeout(async () => {
          try {
            await this.rotateCertificate();
            this.scheduleNextRotation();
          } catch (error) {
            console.error("Certificate rotation failed:", error);
            // Retry with exponential backoff
            await this.retryRotation();
          }
        }, Math.max(delay, 60000)); // Minimum 1 minute
      }

      private async retryRotation(): Promise<void> {
        const maxRetries = 5;
        for (let attempt = 1; attempt <= maxRetries; attempt++) {
          const delay = Math.pow(2, attempt) * 1000; // Exponential backoff
          console.log(`Retry attempt ${attempt}/${maxRetries} in ${delay}ms`);

          await new Promise(resolve => setTimeout(resolve, delay));

          try {
            await this.rotateCertificate();
            this.scheduleNextRotation();
            return;
          } catch (error) {
            console.error(`Rotation retry ${attempt} failed:`, error);
          }
        }

        // All retries failed - alert and continue with existing cert
        this.alertCritical("Certificate rotation failed after all retries");
      }

      getCertificate(): CertificateInfo | null {
        return this.currentCert;
      }
    }

Several tools automate certificate issuance for mTLS:
| Tool | Description | Use Case |
|---|---|---|
| cert-manager | Kubernetes-native certificate management | Container workloads |
| Vault PKI | HashiCorp Vault as CA | Multi-environment, hybrid |
| SPIRE/SPIFFE | Universal identity framework | Cross-platform service mesh |
| AWS PCA | Managed private CA | AWS-native workloads |
| Step-ca | Lightweight, open-source CA | Small to medium scale |
    # cert-manager Certificate for service
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: payment-service
      namespace: production
    spec:
      # Secret to store certificate and key
      secretName: payment-service-tls

      # Certificate validity
      duration: 24h
      renewBefore: 8h  # Renew 8 hours before expiry (16 hours after issuance)

      # Subject
      commonName: payment-service.production.svc.cluster.local

      # Subject Alternative Names
      dnsNames:
        - payment-service
        - payment-service.production
        - payment-service.production.svc
        - payment-service.production.svc.cluster.local

      uris:
        - spiffe://cluster.local/ns/production/sa/payment-service

      # Private key settings
      privateKey:
        algorithm: ECDSA
        size: 256
        rotationPolicy: Always  # Generate new key on each renewal

      # Issuer reference (Intermediate CA)
      issuerRef:
        name: internal-ca-issuer
        kind: ClusterIssuer

      # Key usages for mTLS
      usages:
        - digital signature
        - key encipherment
        - client auth
        - server auth

    ---
    # Issuer configuration (CA setup)
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: internal-ca-issuer
    spec:
      # Using HashiCorp Vault as CA
      vault:
        server: https://vault.internal:8200
        path: pki/sign/service-role
        auth:
          kubernetes:
            role: cert-manager
            mountPath: /v1/auth/kubernetes
            # Kubernetes auth also needs a secretRef or serviceAccountRef
            # identifying the service account token to present to Vault.

When a certificate's private key is compromised or a service is decommissioned, you must revoke the certificate to prevent misuse. Two primary mechanisms exist for revocation checking:
Certificate Revocation Lists (CRLs) are files published by the CA containing the serial numbers of revoked certificates. Clients download and cache the CRL, then check incoming certificates against it.
    import { X509Certificate } from "node:crypto";

    // node:crypto has no CRL parser, so this sketch assumes a PKI/ASN.1 library
    // supplies the two operations below. Both names are hypothetical.
    interface ParsedCrl {
      // Serial numbers of revoked certificates as uppercase hex,
      // the same form X509Certificate.serialNumber uses
      revokedSerials: Set<string>;
    }

    interface CrlSupport {
      // Parses a DER-encoded CRL; assumed to verify the CRL signature as well
      parseCrl(der: Buffer): ParsedCrl;
      // Reads the CRL Distribution Points extension from a certificate
      extractCrlDistributionPoints(certificate: X509Certificate): string[];
    }

    class CrlChecker {
      private crlCache: Map<string, { crl: ParsedCrl; fetchedAt: Date }> = new Map();
      private refreshInterval = 30 * 60 * 1000; // 30 minutes

      constructor(private support: CrlSupport) {}

      async isRevoked(certificate: X509Certificate): Promise<boolean> {
        // Extract CRL Distribution Point from certificate
        const crlUrls = this.support.extractCrlDistributionPoints(certificate);

        for (const url of crlUrls) {
          const crl = await this.getCrl(url);
          if (crl && this.checkCrl(crl, certificate)) {
            return true; // Certificate is revoked
          }
        }

        // Not found in any CRL. Note this also fails open when no CRL
        // could be fetched; stricter deployments should reject instead.
        return false;
      }

      private async getCrl(url: string): Promise<ParsedCrl | null> {
        // Check cache
        const cached = this.crlCache.get(url);
        if (cached && this.isFresh(cached.fetchedAt)) {
          return cached.crl;
        }

        // Fetch fresh CRL
        try {
          const response = await fetch(url, {
            signal: AbortSignal.timeout(5000),
          });

          if (!response.ok) {
            console.warn(`Failed to fetch CRL from ${url}: HTTP ${response.status}`);
            return cached?.crl ?? null; // Use stale cache if available
          }

          const derData = await response.arrayBuffer();
          const crl = this.support.parseCrl(Buffer.from(derData));

          // Cache the CRL
          this.crlCache.set(url, { crl, fetchedAt: new Date() });

          return crl;
        } catch (error) {
          console.error(`Error fetching CRL from ${url}:`, error);
          return cached?.crl ?? null;
        }
      }

      private checkCrl(crl: ParsedCrl, certificate: X509Certificate): boolean {
        // Check if the serial number is in the revoked list
        return crl.revokedSerials.has(certificate.serialNumber.toUpperCase());
      }

      private isFresh(fetchedAt: Date): boolean {
        return Date.now() - fetchedAt.getTime() < this.refreshInterval;
      }
    }

The Online Certificate Status Protocol (OCSP) provides real-time revocation checking. The client sends the certificate's serial number to an OCSP responder, which returns the current status.
    import { X509Certificate } from "node:crypto";

    interface OcspResponse {
      status: "good" | "revoked" | "unknown";
      producedAt: Date;
      thisUpdate: Date;
      nextUpdate: Date;
      revocationReason?: string;
      revocationTime?: Date;
    }

    class OcspChecker {
      private responseCache: Map<string, OcspResponse> = new Map();

      async checkStatus(
        certificate: X509Certificate,
        issuerCertificate: X509Certificate
      ): Promise<OcspResponse> {
        const cacheKey = certificate.serialNumber;

        // Check cache
        const cached = this.responseCache.get(cacheKey);
        if (cached && this.isValid(cached)) {
          return cached;
        }

        // Build OCSP request
        const ocspRequest = this.buildOcspRequest(certificate, issuerCertificate);

        // Get OCSP responder URL from certificate
        const ocspUrl = this.extractOcspResponder(certificate);
        if (!ocspUrl) {
          throw new Error("No OCSP responder URL in certificate");
        }

        // Send OCSP request
        const response = await fetch(ocspUrl, {
          method: "POST",
          headers: {
            "Content-Type": "application/ocsp-request",
          },
          body: ocspRequest,
          signal: AbortSignal.timeout(5000),
        });

        if (!response.ok) {
          throw new Error(`OCSP request failed: HTTP ${response.status}`);
        }

        const ocspResponseData = await response.arrayBuffer();
        const ocspResponse = this.parseOcspResponse(Buffer.from(ocspResponseData));

        // Verify OCSP response signature
        if (!this.verifyOcspSignature(ocspResponse, issuerCertificate)) {
          throw new Error("OCSP response signature verification failed");
        }

        // Cache successful response
        this.responseCache.set(cacheKey, ocspResponse);

        return ocspResponse;
      }

      private isValid(response: OcspResponse): boolean {
        const now = new Date();
        return now >= response.thisUpdate && now <= response.nextUpdate;
      }

      // ... implementation details for building/parsing OCSP messages
    }

    // OCSP Stapling: Server-side optimization
    // Gateway fetches OCSP response and "staples" it to TLS handshake
    // Eliminates client's need to contact OCSP responder

    // Envoy OCSP stapling configuration:
    // tls_certificates:
    //   - certificate_chain: { filename: "/certs/server.crt" }
    //     private_key: { filename: "/certs/server.key" }
    //     ocsp_staple: { filename: "/certs/server.ocsp" }

| Aspect | CRL | OCSP |
|---|---|---|
| Timeliness | May be hours old | Real-time |
| Bandwidth | Full list downloaded | Single certificate query |
| Privacy | Better (no per-cert query) | CA knows which certs are validated |
| Availability | Cached, offline-friendly | Requires responder availability |
| Scalability | Better for many revocations | Better for few revocations |
With very short-lived certificates (hours to days), revocation checking becomes less critical. If a certificate expires in 24 hours, the compromise window is inherently limited. Many modern systems use this approach to avoid the complexity of revocation infrastructure entirely.
Service meshes like Istio, Linkerd, and Consul Connect automate mTLS for all service-to-service communication. The mesh handles certificate issuance, rotation, and mTLS configuration transparently.
In a service mesh, each pod receives a sidecar proxy that handles all network traffic:
    ┌─────────────────────────────────────────────────────────┐
    │                           Pod                           │
    │  ┌──────────────┐           ┌──────────────┐            │
    │  │   Service    │◄─────────►│   Sidecar    │◄───────────┼──mTLS──►
    │  │  Container   │ localhost │    Proxy     │            │
    │  │ (plain HTTP) │           │ (terminates  │            │
    │  └──────────────┘           │    mTLS)     │            │
    │                             └──────────────┘            │
    └─────────────────────────────────────────────────────────┘
The application communicates with the sidecar over localhost (plain HTTP), while the sidecar handles mTLS with other pods.
    # Namespace-wide mTLS enforcement
    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: default
      namespace: production
    spec:
      mtls:
        mode: STRICT  # Require mTLS for all traffic

    ---
    # Authorization based on mTLS identity
    apiVersion: security.istio.io/v1beta1
    kind: AuthorizationPolicy
    metadata:
      name: payment-service-policy
      namespace: production
    spec:
      selector:
        matchLabels:
          app: payment-service
      # The default action is ALLOW: once an ALLOW policy selects a workload,
      # any request that matches none of its rules is denied. No explicit
      # "deny everything else" rule is needed, and adding a notPrincipals
      # rule here would allow the traffic it was meant to block.
      rules:
        # Allow only specific services to access payment-service
        - from:
            - source:
                principals:
                  - "cluster.local/ns/production/sa/order-service"
                  - "cluster.local/ns/production/sa/refund-service"
          to:
            - operation:
                methods: ["GET", "POST"]
                paths: ["/api/v1/payments/*"]

    ---
    # Certificate configuration
    # Istio has no CertificateConfig resource. Workload certificate lifetime
    # (24h in this example) is controlled through istiod settings such as the
    # DEFAULT_WORKLOAD_CERT_TTL environment variable, and a custom root or
    # intermediate CA is supplied through the cacerts Secret in the
    # istio-system namespace. Check the documentation for your Istio version.

| Benefit | Description |
|---|---|
| Zero Application Changes | Apps use plain HTTP; sidecar handles mTLS |
| Automatic Certificate Management | Mesh control plane handles issuance/rotation |
| Identity-Based Authorization | Policies use SPIFFE identities, not IP addresses |
| Consistent Security | Every pod gets the same security model |
| Observability | Mesh provides encrypted traffic metrics |
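The identity-based authorization row can be illustrated outside the mesh as well. The sketch below mirrors the AuthorizationPolicy shown earlier in plain TypeScript; the `AllowRule` type and `isAllowed` function are hypothetical and only approximate how a mesh proxy evaluates ALLOW rules.

```typescript
interface AllowRule {
  principals: string[]; // verified identities taken from the peer certificate
  methods: string[];
  pathPrefix: string;
}

// Same principals, methods, and path as the payment-service policy above.
const paymentRules: AllowRule[] = [
  {
    principals: [
      "cluster.local/ns/production/sa/order-service",
      "cluster.local/ns/production/sa/refund-service",
    ],
    methods: ["GET", "POST"],
    pathPrefix: "/api/v1/payments/",
  },
];

// A request is allowed only if some rule matches principal, method, and path.
// Anything that matches no rule is denied.
function isAllowed(
  rules: AllowRule[],
  principal: string,
  method: string,
  path: string,
): boolean {
  return rules.some(
    (rule) =>
      rule.principals.includes(principal) &&
      rule.methods.includes(method) &&
      path.startsWith(rule.pathPrefix),
  );
}

console.log(
  isAllowed(paymentRules, "cluster.local/ns/production/sa/order-service", "POST", "/api/v1/payments/123"),
);
```

The decision never looks at source IP addresses, so it keeps working when pods are rescheduled and addresses change.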
When using a service mesh, the API Gateway typically operates in one of two modes:
1. Mesh-Aware Gateway: Gateway is part of the mesh, receives mesh identity, mTLS to all internal services handled automatically.
2. External Gateway: Gateway sits outside the mesh, terminates external TLS, and initiates new mTLS connections into the mesh.
    # Istio Ingress Gateway with mTLS termination
    apiVersion: networking.istio.io/v1beta1
    kind: Gateway
    metadata:
      name: api-gateway
      namespace: istio-system
    spec:
      selector:
        istio: ingressgateway
      servers:
        # External HTTPS with client certificate (mTLS)
        - port:
            number: 443
            name: https
            protocol: HTTPS
          tls:
            mode: MUTUAL  # Require client certificate
            # With credentialName, the server cert, key, and client CA all come
            # from the Secret (tls.crt, tls.key, ca.crt); a caCertificates file
            # path applies only to file-mounted certificates.
            credentialName: api-gateway-tls
          hosts:
            - "api.example.com"

    ---
    # Virtual Service routing
    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: api-routes
      namespace: production
    spec:
      hosts:
        - "api.example.com"
      gateways:
        - istio-system/api-gateway
      http:
        - match:
            - uri:
                prefix: /payments
          route:
            - destination:
                host: payment-service
                port:
                  number: 8080
          # Gateway-to-service: automatic mTLS via mesh

Running mTLS at scale introduces operational complexity. Understanding common challenges and solutions is essential for production success.
mTLS failures can be cryptic. A systematic approach helps:
    # 1. Test mTLS connection with OpenSSL
    openssl s_client -connect payment-service:8443 \
      -cert /certs/client.crt \
      -key /certs/client.key \
      -CAfile /certs/ca.crt \
      -verify 2 \
      -verify_return_error

    # Expected output includes:
    # - Verify return code: 0 (ok)
    # - Server certificate chain
    # - "Acceptable client certificate CA names" (what CAs server trusts)

    # 2. Check certificate details
    openssl x509 -in /certs/client.crt -text -noout

    # Key things to verify:
    # - Validity dates (Not Before, Not After)
    # - Subject and Issuer
    # - Subject Alternative Names (SANs)
    # - Key Usage and Extended Key Usage (must include clientAuth)

    # 3. Verify certificate chain
    openssl verify -CAfile /certs/root-ca.crt \
      -untrusted /certs/intermediate-ca.crt \
      /certs/client.crt

    # 4. Test with curl (verbose)
    curl -v \
      --cert /certs/client.crt \
      --key /certs/client.key \
      --cacert /certs/ca.crt \
      https://payment-service:8443/health

    # 5. Check for common issues
    # - Clock skew between client and server (affects validity checks)
    # - Wrong CA in trust store
    # - Certificate missing required Extended Key Usage
    # - Expired certificate or CRL
    # - Hostname/SAN mismatch

Extended Key Usage: clientAuth for client certs, serverAuth for server certs.
Clock skew: a local clock that runs ahead of or behind the certificate's validity window (for example, earlier than notBefore) causes failures.

Monitor certificate lifecycle and mTLS errors proactively:
    # Prometheus alerting rules for certificate monitoring
    # The expiry thresholds below suit longer-lived certificates (gateway or
    # intermediate CA certs). A 24h service certificate is always within
    # 24 hours of expiry, so rely on the rotation-failure alert for those.
    groups:
      - name: certificate-alerts
        rules:
          # Alert when certificate expires within 24 hours
          - alert: CertificateExpiringSoon
            expr: |
              (x509_cert_not_after - time()) / 3600 < 24
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Certificate expiring within 24 hours"
              description: "Certificate {{ $labels.subject }} expires at {{ $labels.not_after }}"

          # Alert when certificate expires within 7 days
          - alert: CertificateExpiringInWeek
            expr: |
              (x509_cert_not_after - time()) / 3600 / 24 < 7
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Certificate expiring within 7 days"

          # Alert on mTLS handshake failures
          # Envoy metric names vary with version and stats configuration;
          # confirm them against your proxy's /stats/prometheus output.
          - alert: MTLSHandshakeFailures
            expr: |
              rate(envoy_listener_ssl_connection_error[5m]) > 0.1
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Elevated mTLS handshake errors"
              description: "{{ $value | humanize }} errors per second on {{ $labels.pod }}"

          # Alert on certificate rotation failure
          - alert: CertificateRotationFailed
            expr: |
              certmanager_certificate_ready_status{condition="False"} == 1
            for: 15m
            labels:
              severity: critical
            annotations:
              summary: "Certificate rotation failed"
              description: "cert-manager failed to renew {{ $labels.name }}"

    ---
    # Metrics to collect

    # From Envoy sidecar (names as exposed by your Envoy version):
    # - envoy_listener_ssl_handshake
    # - envoy_listener_ssl_connection_error
    # - envoy_listener_ssl_fail_verify_error

    # From cert-manager:
    # - certmanager_certificate_ready_status
    # - certmanager_certificate_renewal_timestamp_seconds
    # - certmanager_certificate_expiration_timestamp_seconds

Mutual TLS provides the strongest form of service-to-service authentication, establishing cryptographic trust in both directions. While it adds operational complexity, the security benefits make it essential for zero-trust architectures and sensitive internal communications.
You've now mastered the four pillars of gateway authentication: centralized authentication architecture, OAuth2/JWT validation, API key management, and mTLS for service-to-service security. These patterns form the foundation of secure, production-grade API gateways. Continue to the next module to explore Backend-for-Frontend (BFF) patterns.