You can have the most sophisticated tracing infrastructure in the world—Jaeger clusters, petabytes of storage, beautiful dashboards—and it will all be worthless if your requests don't carry trace context across service boundaries.
Context propagation is the single most critical aspect of distributed tracing implementation. It's also where most tracing implementations fail.
A trace is only as good as the chain of context propagation that creates it. One service that doesn't propagate context creates a gap. One middleware component that strips headers fragments your traces. One message queue that loses metadata breaks causal chains.
This page will make you an expert in context propagation—understanding how it works, where it breaks, and how to ensure your traces are complete.
By the end of this page, you will understand: how trace context flows through HTTP headers; the W3C Trace Context standard; propagation through message queues; handling asynchronous and scheduled tasks; common propagation failures and how to detect them; and best practices for ensuring complete traces.
When a request moves from one service to another, the trace context must move with it. But process boundaries are barriers—nothing automatically flows from one process to another. Every piece of information that needs to survive the journey must be explicitly transmitted.
What happens without propagation:
```
Service A (Order Service)          Service B (Payment Service)
─────────────────────────          ──────────────────────────
Trace ID: aaaa-aaaa-aaaa           Trace ID: bbbb-bbbb-bbbb  ← NEW trace!

Span: "ProcessOrder"               Span: "ProcessPayment" (root span!)
│                                  │
├─ Span: "ValidateOrder"           ├─ Span: "ValidateCard"
├─ Span: "CalculateTax"            └─ Span: "ChargeCard"
└─ Span: "HTTP POST /payment"
   │
   └───── HTTP Call ────────▸ (Context NOT propagated!)

Result:
  TWO separate traces instead of ONE unified trace
  No parent-child relationship between services
  Cannot see the complete request flow
  Debugging requires manual correlation by timestamp
```
What happens with proper propagation:
```
Service A (Order Service)          Service B (Payment Service)
─────────────────────────          ──────────────────────────
Trace ID: aaaa-aaaa-aaaa           Trace ID: aaaa-aaaa-aaaa  ← SAME trace!

Span: "ProcessOrder"
│
├─ Span: "ValidateOrder"
├─ Span: "CalculateTax"
└─ Span: "HTTP POST /payment"
   │
   └───── HTTP Call ────────▸      Span: "ProcessPayment" (child span!)
         (Headers include:)        │  parentSpanId: "HTTP POST /payment"
         traceparent: 00-aaaa-..   ├─ Span: "ValidateCard"
                                   └─ Span: "ChargeCard"

Result:
  ONE unified trace showing complete request flow
  Clear parent-child relationship
  Full visibility into end-to-end latency
```
The difference is simple but profound: with propagation, you have one story; without it, you have scattered fragments.
Broken propagation is insidious because it fails silently. You still get traces—they're just incomplete. Services still report spans—they just don't connect. Unless you specifically look for propagation gaps, you might not notice the problem until an incident forces you to trace a request across services.
Before standardization, each tracing system used its own propagation format: Zipkin used B3 headers, Jaeger used uber-trace-id, AWS X-Ray used X-Amzn-Trace-Id. This fragmentation made it impossible to trace requests across systems using different vendors.
The W3C Trace Context specification solved this by defining a universal format that all tracing systems can support.
The traceparent Header Format:
The traceparent header has a precise format:
```
traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}

Example:
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01

Breakdown:
┌────┬──────────────────────────────────┬──────────────────┬────┐
│ 00 │ 0af7651916cd43dd8448eb211c80319c │ b7ad6b7169203331 │ 01 │
└────┴──────────────────────────────────┴──────────────────┴────┘
  │                  │                          │             │
  │                  │                          │             └─ Trace flags
  │                  │                          │                01 = sampled
  │                  │                          │                00 = not sampled
  │                  │                          │
  │                  │                          └─ Parent span ID (16 hex chars = 8 bytes)
  │                  │                             The span ID of the immediate parent
  │                  │
  │                  └─ Trace ID (32 hex chars = 16 bytes)
  │                     Unique identifier for the entire trace
  │                     Generated once at trace start, never changes
  │
  └─ Version (always "00" for the current spec)
     Reserved for future format changes

Total header size: 55 characters (very lightweight!)
```
The tracestate Header:
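To make the format concrete, here is a minimal parsing sketch (illustrative only, not a library API) that splits a version-00 traceparent value into its four fields and checks the sampled flag:

```typescript
// Minimal traceparent parser (sketch, assumes W3C Trace Context version 00)
interface TraceParent {
  version: string;
  traceId: string;   // 32 lowercase hex chars, must not be all zeros
  parentId: string;  // 16 lowercase hex chars, must not be all zeros
  sampled: boolean;  // bit 0 of the trace-flags byte
}

function parseTraceparent(header: string): TraceParent | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    header.trim()
  );
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // All-zero trace or span IDs are invalid per the spec
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return {
    version,
    traceId,
    parentId,
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}

// parseTraceparent('00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01')
// => { version: '00', traceId: '0af7...', parentId: 'b7ad...', sampled: true }
```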
The tracestate header allows vendors to attach their own data while maintaining interoperability:
```
# Simple vendor-specific state
tracestate: vendorname=vendorvalue

# Multiple vendors (each can add their own)
tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE

# Real-world examples:
tracestate: dd=s:1;t.dm:-0               # Datadog sampling decision
tracestate: nr=0-0-1234567-abcdef123     # New Relic trace state
tracestate: @aws-xray=Root=1-5f84c7a7-abc123def456
```
W3C Trace Context is now the industry standard. OpenTelemetry uses it by default. Major cloud providers (AWS, Azure, GCP) support it. Modern tracing backends (Jaeger, Zipkin, Tempo) all understand it. If you're starting fresh, use W3C Trace Context. If you have legacy systems, most tracing libraries support both W3C and legacy formats during migration.
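For example, OpenTelemetry JS lets you run the W3C and legacy B3 propagators side by side during a migration. A sketch, assuming the standard OpenTelemetry packages:

```typescript
import { propagation } from '@opentelemetry/api';
import { CompositePropagator, W3CTraceContextPropagator } from '@opentelemetry/core';
import { B3Propagator } from '@opentelemetry/propagator-b3';

// Inject and extract BOTH formats: W3C traceparent for modern services,
// B3 headers for legacy Zipkin-instrumented services.
propagation.setGlobalPropagator(
  new CompositePropagator({
    propagators: [new W3CTraceContextPropagator(), new B3Propagator()],
  })
);
```

Once every service understands W3C Trace Context, you can drop the legacy propagator and inject only one format.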
HTTP is the most common transport in microservices architectures, and HTTP header propagation is the most common propagation mechanism. Let's see how this works in practice.
```typescript
// SERVICE A: Making an outgoing HTTP call

import { context, trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

const propagator = new W3CTraceContextPropagator();

async function callPaymentService(orderId: string, amount: number) {
  const tracer = trace.getTracer('order-service');

  // Start a CLIENT span for the outgoing call
  return tracer.startActiveSpan(
    'PaymentService.processPayment',
    { kind: SpanKind.CLIENT },
    async (span) => {
      try {
        // Prepare headers object
        const headers: Record<string, string> = {
          'Content-Type': 'application/json',
        };

        // INJECT: Add trace context to outgoing headers
        // This adds the 'traceparent' and 'tracestate' headers
        propagator.inject(context.active(), headers, {
          set: (carrier, key, value) => {
            carrier[key] = value;
          },
        });

        // Headers now contain:
        // {
        //   'Content-Type': 'application/json',
        //   'traceparent': '00-abc123...-def456...-01',
        //   'tracestate': ''
        // }

        const response = await fetch('http://payment-service/api/process', {
          method: 'POST',
          headers,
          body: JSON.stringify({ orderId, amount }),
        });

        span.setStatus({ code: SpanStatusCode.OK });
        return response.json();
      } catch (error) {
        span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
        throw error;
      } finally {
        span.end();
      }
    }
  );
}
```
The Inject/Extract Pattern:
Context propagation follows a consistent pattern across all transports:
1. INJECT (Sender Side): serialize the active trace context into the outgoing carrier (HTTP headers, message attributes, gRPC metadata).
2. EXTRACT (Receiver Side): read the carrier, reconstruct the remote context, and start new spans as children of the remote parent (see the sketch below).
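Here is the receiving side of the earlier HTTP call, as a minimal sketch (the Express route and service names are assumptions, not taken from the original code):

```typescript
// SERVICE B: Extracting context from the incoming HTTP request (sketch)

import { context, trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { W3CTraceContextPropagator } from '@opentelemetry/core';
import express from 'express';

const propagator = new W3CTraceContextPropagator();
const tracer = trace.getTracer('payment-service');
const app = express();

app.post('/api/process', express.json(), (req, res) => {
  // EXTRACT: Read traceparent/tracestate from the incoming headers
  const parentCtx = propagator.extract(context.active(), req.headers, {
    get: (carrier, key) => carrier[key],
    keys: (carrier) => Object.keys(carrier),
  });

  // Start a SERVER span whose parent is the caller's remote CLIENT span
  tracer.startActiveSpan(
    'ProcessPayment',
    { kind: SpanKind.SERVER },
    parentCtx,
    (span) => {
      try {
        // ... validate card, charge card ...
        span.setStatus({ code: SpanStatusCode.OK });
        res.json({ status: 'ok' });
      } finally {
        span.end();
      }
    }
  );
});
```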
Most tracing libraries handle this automatically through middleware and instrumentation, but understanding the underlying mechanism helps when debugging propagation issues.
In practice, you rarely write inject/extract code manually. OpenTelemetry and other tracing libraries provide automatic instrumentation for popular HTTP libraries (Express, Flask, Spring, axios, requests, etc.) that handles propagation transparently. The code examples above show what happens under the hood.
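For instance, a Node.js service can enable transparent propagation with the OpenTelemetry SDK's auto-instrumentations (a sketch assuming the standard `@opentelemetry/sdk-node` and `@opentelemetry/auto-instrumentations-node` packages):

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

// Patches popular libraries (http, express, kafkajs, pg, ...) so that
// inject on outgoing calls and extract on incoming ones happen automatically.
const sdk = new NodeSDK({
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
```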
Message queues present a unique challenge for context propagation. Unlike synchronous HTTP calls, where request and response are tightly coupled, message queues introduce temporal decoupling (a message may be consumed seconds or hours after it was produced), fan-out (one message may reach many consumers), and broker-specific serialization that the trace context must survive.
Despite these challenges, maintaining trace context through message queues is essential for understanding asynchronous workflows.
| System | Where to Store Context | Key Considerations |
|---|---|---|
| Apache Kafka | Record headers (key-value pairs) | Headers survive serialization; use string encoding for trace context |
| RabbitMQ | Message properties/headers | AMQP headers support arbitrary key-value pairs |
| AWS SQS | Message attributes | Limited to 10 attributes; reserve space for trace context |
| AWS SNS | Message attributes | Attributes propagate to subscribed SQS queues |
| Google Pub/Sub | Message attributes | Attributes are string key-value pairs (only the message body is base64-encoded) |
| Azure Service Bus | Application properties | Custom properties support trace context |
```typescript
// PRODUCER: Injecting context into a Kafka message

import { context, trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { Kafka, IHeaders } from 'kafkajs';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

const propagator = new W3CTraceContextPropagator();
const kafka = new Kafka({ brokers: ['kafka:9092'] });
const producer = kafka.producer();
const tracer = trace.getTracer('order-service');

async function publishOrderEvent(order: Order) {
  return tracer.startActiveSpan(
    'kafka.produce order.created',
    { kind: SpanKind.PRODUCER }, // PRODUCER span kind for async messaging
    async (span) => {
      try {
        // Create headers from trace context
        const headers: IHeaders = {};
        propagator.inject(context.active(), headers, {
          set: (carrier, key, value) => {
            // Kafka header values travel as bytes
            carrier[key] = Buffer.from(value);
          },
        });

        // Add span attributes
        span.setAttributes({
          'messaging.system': 'kafka',
          'messaging.destination': 'order.created',
          'messaging.destination_kind': 'topic',
          'order.id': order.id,
        });

        await producer.send({
          topic: 'order.created',
          messages: [{
            key: order.id,
            value: JSON.stringify(order),
            headers, // Headers contain traceparent and tracestate
          }],
        });

        span.setStatus({ code: SpanStatusCode.OK });
      } finally {
        span.end();
      }
    }
  );
}
```
For message queues, you have a choice: treat the consumer span as a child of the producer span (same trace), or start a new trace and use 'links' to reference the producer. Child relationships work well for short queues. Links work better when messages sit in queues for hours/days, as they avoid extremely long-duration traces.
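The consumer side mirrors the producer: extract the context from the record headers before starting a CONSUMER span. A sketch under the same assumptions (kafkajs, W3C propagator; `handleOrder` is a hypothetical business handler):

```typescript
// CONSUMER: Extracting context from Kafka record headers (sketch)

import { context, trace, SpanKind } from '@opentelemetry/api';
import { Kafka } from 'kafkajs';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

const propagator = new W3CTraceContextPropagator();
const kafka = new Kafka({ brokers: ['kafka:9092'] });
const consumer = kafka.consumer({ groupId: 'payment-service' });
const tracer = trace.getTracer('payment-service');

declare function handleOrder(order: unknown): Promise<void>; // hypothetical

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'order.created' });
  await consumer.run({
    eachMessage: async ({ message }) => {
      // EXTRACT: rebuild the producer's context from the record headers
      const parentCtx = propagator.extract(context.active(), message.headers ?? {}, {
        get: (carrier, key) => carrier[key]?.toString(),
        keys: (carrier) => Object.keys(carrier),
      });

      // The CONSUMER span becomes a child of the producer span (same trace)
      await tracer.startActiveSpan(
        'kafka.consume order.created',
        { kind: SpanKind.CONSUMER },
        parentCtx,
        async (span) => {
          try {
            await handleOrder(JSON.parse(message.value!.toString()));
          } finally {
            span.end();
          }
        }
      );
    },
  });
}
```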
Within a single service, trace context must also propagate across threads, async operations, and scheduled tasks. This is actually where many propagation issues occur—not at service boundaries, but within a single codebase.
```java
// PROBLEM: Thread pool loses context

ExecutorService executor = Executors.newFixedThreadPool(10);

public void processOrder(Order order) {
    Span currentSpan = Span.current(); // Has context here

    // BAD: Context is lost when work runs on a pool thread
    executor.submit(() -> {
        // Span.current() returns a no-op span here!
        validateOrder(order); // This span has no parent
    });
}

// SOLUTION: Wrap with context propagation

import io.opentelemetry.context.Context;

public void processOrderFixed(Order order) {
    Context currentContext = Context.current(); // Capture context

    // GOOD: Make context current in the new thread
    executor.submit(currentContext.wrap(() -> {
        // Context.current() now returns the captured context
        // Span.current() works correctly
        validateOrder(order); // This span IS connected
    }));
}

// EVEN BETTER: Use a context-aware executor
ExecutorService tracedExecutor = Context.taskWrapping(
    Executors.newFixedThreadPool(10));

// Now all submitted tasks automatically carry context
tracedExecutor.submit(() -> {
    // Context automatically propagated
    validateOrder(order);
});
```
Every language/runtime handles async context differently. Java uses ThreadLocal with explicit wrapping. Node.js uses AsyncLocalStorage. Go uses context.Context passed explicitly. Python uses contextvars. Always use your tracing library's recommended patterns for your language—don't try to invent your own context propagation.
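For comparison, the Node.js equivalents in OpenTelemetry JS are `context.with()` and `context.bind()`. A sketch: with the AsyncLocalStorage-based context manager most async hops are followed automatically, so explicit capture is mainly needed for work the context manager cannot see (`myCustomQueue` below is a hypothetical stand-in for such a scheduler):

```typescript
import { context, trace } from '@opentelemetry/api';

// Hypothetical stand-in for any scheduler the context manager can't follow
// (hand-rolled worker pools, custom event buses, etc.)
declare const myCustomQueue: { push(fn: () => void): void };

const tracer = trace.getTracer('order-service');

function processOrder(order: { id: string }) {
  // Capture the currently active context...
  const captured = context.active();

  // ...and re-enter it explicitly where automatic propagation can't reach
  myCustomQueue.push(() => {
    context.with(captured, () => {
      // Spans started here parent correctly under the captured context
      tracer.startActiveSpan('validateOrder', (span) => {
        // ... validation work ...
        span.end();
      });
    });
  });

  // context.bind() returns a function permanently tied to a context,
  // no matter who invokes it later
  myCustomQueue.push(context.bind(captured, () => {
    tracer.startActiveSpan('notifyWarehouse', (span) => span.end());
  }));
}
```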
Real-world systems often mix technologies: HTTP services calling gRPC services, REST APIs publishing to Kafka, Lambda functions receiving from SQS. Each technology boundary requires appropriate propagation handling.
| Protocol/Transport | Standard Header/Field | Format |
|---|---|---|
| HTTP/1.1 & HTTP/2 | traceparent, tracestate headers | W3C Trace Context |
| gRPC | grpc-trace-bin metadata (binary) or W3C headers | Binary or W3C |
| GraphQL | HTTP headers (GraphQL over HTTP) | W3C Trace Context |
| WebSocket | HTTP headers in handshake + custom frames | App-specific |
| TCP (raw) | Application-defined framing | App-specific |
| Database Queries | Query comments or session attributes | Vendor-specific |
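As an illustration of the gRPC row, a client interceptor for @grpc/grpc-js can inject W3C context into call metadata in text form. A sketch; in practice the OpenTelemetry gRPC instrumentation does this for you:

```typescript
import { context, propagation } from '@opentelemetry/api';
import { InterceptingCall, Interceptor, Metadata } from '@grpc/grpc-js';

// Injects traceparent/tracestate into outgoing gRPC metadata.
// Assumes a global propagator (e.g., W3C) has already been registered.
const traceContextInterceptor: Interceptor = (options, nextCall) =>
  new InterceptingCall(nextCall(options), {
    start(metadata: Metadata, listener, next) {
      propagation.inject(context.active(), metadata, {
        set: (carrier, key, value) => carrier.set(key, value),
      });
      next(metadata, listener);
    },
  });

// Attach when constructing a client:
// new MyServiceClient(address, credentials, { interceptors: [traceContextInterceptor] });
```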
Special Case: Database Query Tracing
Database calls need special handling. You can't add HTTP headers to a SQL query. Options include:
```sql
-- Option 1: SQL Comments (SQLCommenter pattern)
-- Trace context embedded in a query comment (parsed by tooling for tracing)

SELECT * FROM orders WHERE user_id = 123
/*traceparent='00-abc123-def456-01',
  db.name='ecommerce',
  action='getOrders',
  controller='OrderController'*/

-- Pros: Works with any database, visible in slow query logs
-- Cons: Adds overhead, requires parsing on the database side

-- Option 2: Session Variables (PostgreSQL example)

SET application_name = 'order-service';
SET myapp.trace_id = '00-abc123-def456-01';
SELECT * FROM orders WHERE user_id = 123;

-- Trace info now visible in pg_stat_activity
-- Can be logged by the database for correlation

-- Option 3: Distributed tracing database extensions
-- Some databases have native tracing support:
--   - PostgreSQL: pg_stat_statements + OpenTelemetry exporter
--   - MySQL: Performance Schema integration
--   - MongoDB: Profiler with trace correlation
```
Special Case: Serverless Functions
Serverless environments (AWS Lambda, Azure Functions, Google Cloud Functions) require unique handling because execution environments are ephemeral, a single function may be invoked by several trigger types (HTTP, queues, schedules), and the platform often layers in its own tracing (such as AWS X-Ray):
```typescript
// AWS Lambda with multiple trigger types

import { APIGatewayProxyEvent, SQSEvent } from 'aws-lambda';
import { context as otelContext, trace } from '@opentelemetry/api';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

const propagator = new W3CTraceContextPropagator();

// HTTP Trigger - extract from headers
export function handleHttpTrigger(event: APIGatewayProxyEvent) {
  const ctx = propagator.extract(otelContext.active(), event.headers, {
    get: (h, k) => h[k] ?? h[k.toLowerCase()],
    keys: (h) => Object.keys(h),
  });

  return otelContext.with(ctx, () => {
    // Process with context from the HTTP caller
  });
}

// SQS Trigger - extract from message attributes
export function handleSqsTrigger(event: SQSEvent) {
  for (const record of event.Records) {
    const attributes = record.messageAttributes;

    // Build a headers-like object from SQS attributes
    const headers: Record<string, string> = {};
    if (attributes.traceparent) {
      headers.traceparent = attributes.traceparent.stringValue!;
    }
    if (attributes.tracestate) {
      headers.tracestate = attributes.tracestate.stringValue!;
    }

    const ctx = propagator.extract(otelContext.active(), headers, {
      get: (h, k) => h[k],
      keys: (h) => Object.keys(h),
    });

    otelContext.with(ctx, () => {
      // Process with context from the message producer
    });
  }
}
```
AWS X-Ray, Google Cloud Trace, and Azure Application Insights all have native trace context propagation. When using these services, understand how they interact with W3C Trace Context and OpenTelemetry. Most now support W3C format, making interoperability possible.
When traces are fragmented, you need systematic approaches to identify where propagation is breaking.
```
# PromQL: Find services creating too many root spans
# (indicates propagation is breaking at those services)

sum by (service_name) (
  rate(trace_span_started_total{parent_span_id=""}[5m])
)
/
sum by (service_name) (
  rate(trace_span_started_total[5m])
)
> 0.5

# This shows services where >50% of spans are root spans
# Most internal services should have very few root spans
```
The most common propagation breakers:
1. API gateways/proxies that don't forward headers
2. Services not using instrumented HTTP clients
3. Thread pool usage without context wrapping
4. Legacy code using old tracing libraries
5. Message queue consumers not extracting context
6. WebSocket connections where the initial handshake isn't propagated
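As a complement to metrics, you can flag suspicious root spans directly in the SDK. Below is a sketch of a custom SpanProcessor for OpenTelemetry JS; note that the `parentSpanId` field shown matches SDK 1.x, while newer SDK versions expose a `parentSpanContext` instead:

```typescript
import { Context } from '@opentelemetry/api';
import { Span, SpanProcessor, ReadableSpan } from '@opentelemetry/sdk-trace-base';

// Flags every root span at creation time. In an internal service, a root
// span usually means an upstream caller failed to propagate context.
class RootSpanAuditor implements SpanProcessor {
  onStart(span: Span, _parentContext: Context): void {
    if (!span.parentSpanId) {
      console.warn(
        `root span "${span.name}" started (trace ${span.spanContext().traceId})`
      );
    }
  }
  onEnd(_span: ReadableSpan): void {}
  shutdown(): Promise<void> { return Promise.resolve(); }
  forceFlush(): Promise<void> { return Promise.resolve(); }
}

// Register alongside your exporter, e.g.:
// provider.addSpanProcessor(new RootSpanAuditor());
```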
We've thoroughly explored context propagation—the critical mechanism that makes distributed tracing possible. Let's consolidate:
- Trace context never crosses a process boundary on its own; it must be explicitly injected by the sender and extracted by the receiver.
- W3C Trace Context (the traceparent and tracestate headers) is the universal format; prefer it, and run legacy formats alongside it only during migration.
- The same inject/extract pattern applies to every transport: HTTP headers, Kafka record headers, SQS/SNS message attributes, gRPC metadata.
- Context also breaks inside a single service: thread pools, async callbacks, and scheduled tasks all need context-aware wrapping.
- Broken propagation fails silently; monitor the ratio of root spans per service to catch gaps before an incident does.
What's Next:
Now that we understand how traces flow through systems, we need to look at the tools that collect, store, and visualize them. The next page covers Jaeger and Zipkin—the two most popular open-source distributed tracing systems. You'll learn their architectures, how to deploy them, and how to choose between them.
You now have deep expertise in context propagation—the invisible thread that ties distributed traces together. This knowledge is essential for implementing tracing correctly and diagnosing issues when traces are incomplete. Every fragmented trace is a propagation problem waiting to be solved.