You can have the most sophisticated tracing infrastructure in the world—Jaeger clusters, petabytes of storage, beautiful dashboards—and it will all be worthless if your requests don't carry trace context across service boundaries.
Context propagation is the single most critical aspect of distributed tracing implementation. It's also where most tracing implementations fail.
A trace is only as good as the chain of context propagation that creates it. One service that doesn't propagate context creates a gap. One middleware component that strips headers fragments your traces. One message queue that loses metadata breaks causal chains.
This page will make you an expert in context propagation—understanding how it works, where it breaks, and how to ensure your traces are complete.
By the end of this page, you will understand: how trace context flows through HTTP headers; the W3C Trace Context standard; propagation through message queues; handling asynchronous and scheduled tasks; common propagation failures and how to detect them; and best practices for ensuring complete traces.
When a request moves from one service to another, the trace context must move with it. But process boundaries are barriers—nothing automatically flows from one process to another. Every piece of information that needs to survive the journey must be explicitly transmitted.
What happens without propagation:
```
Service A (Order Service)          Service B (Payment Service)
─────────────────────────          ──────────────────────────
Trace ID: aaaa-aaaa-aaaa           Trace ID: bbbb-bbbb-bbbb  ← NEW trace!

Span: "ProcessOrder"               Span: "ProcessPayment" (root span!)
│                                  │
├─ Span: "ValidateOrder"           ├─ Span: "ValidateCard"
├─ Span: "CalculateTax"            └─ Span: "ChargeCard"
└─ Span: "HTTP POST /payment"
   │
   └───── HTTP Call ────────▸ (Context NOT propagated!)

Result:
  TWO separate traces instead of ONE unified trace
  No parent-child relationship between services
  Cannot see the complete request flow
  Debugging requires manual correlation by timestamp
```
What happens with proper propagation:
```
Service A (Order Service)          Service B (Payment Service)
─────────────────────────          ──────────────────────────
Trace ID: aaaa-aaaa-aaaa           Trace ID: aaaa-aaaa-aaaa  ← SAME trace!

Span: "ProcessOrder"
│
├─ Span: "ValidateOrder"
├─ Span: "CalculateTax"
└─ Span: "HTTP POST /payment"
   │
   └───── HTTP Call ────────▸      Span: "ProcessPayment" (child span!)
         (Headers include:)        │  parentSpanId: "HTTP POST /payment"
         traceparent: 00-aaaa-..   ├─ Span: "ValidateCard"
                                   └─ Span: "ChargeCard"

Result:
  ONE unified trace showing complete request flow
  Clear parent-child relationship
  Full visibility into end-to-end latency
```
The difference is simple but profound: with propagation, you have one story; without it, you have scattered fragments.
Broken propagation is insidious because it fails silently. You still get traces—they're just incomplete. Services still report spans—they just don't connect. Unless you specifically look for propagation gaps, you might not notice the problem until an incident forces you to trace a request across services.
Before standardization, each tracing system used its own propagation format: Zipkin used B3 headers, Jaeger used uber-trace-id, AWS X-Ray used X-Amzn-Trace-Id. This fragmentation made it impossible to trace requests across systems using different vendors.
The W3C Trace Context specification solved this by defining a universal format that all tracing systems can support.
The traceparent Header Format:
The traceparent header has a precise format:
```
traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}

Example:
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01

Breakdown:
┌────┬──────────────────────────────────┬──────────────────┬────┐
│ 00 │ 0af7651916cd43dd8448eb211c80319c │ b7ad6b7169203331 │ 01 │
└────┴──────────────────────────────────┴──────────────────┴────┘
  │                  │                          │             │
  │                  │                          │             └─ Trace flags
  │                  │                          │                01 = sampled
  │                  │                          │                00 = not sampled
  │                  │                          │
  │                  │                          └─ Parent span ID (16 hex chars = 8 bytes)
  │                  │                             The span ID of the immediate parent
  │                  │
  │                  └─ Trace ID (32 hex chars = 16 bytes)
  │                     Unique identifier for the entire trace
  │                     Generated once at trace start, never changes
  │
  └─ Version (always "00" for the current spec)
     Reserved for future format changes

Total header size: 55 characters (very lightweight!)
```
The tracestate Header:
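To make the format concrete, here is a minimal parsing sketch (illustrative only, not a library API) that splits a version-00 traceparent value into its four fields and checks the sampled flag:

```typescript
// Minimal traceparent parser (sketch, assumes W3C Trace Context version 00)
interface TraceParent {
  version: string;
  traceId: string;   // 32 lowercase hex chars, must not be all zeros
  parentId: string;  // 16 lowercase hex chars, must not be all zeros
  sampled: boolean;  // bit 0 of the trace-flags byte
}

function parseTraceparent(header: string): TraceParent | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    header.trim()
  );
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // All-zero trace or span IDs are invalid per the spec
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return {
    version,
    traceId,
    parentId,
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}

// parseTraceparent('00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01')
// => { version: '00', traceId: '0af7...', parentId: 'b7ad...', sampled: true }
```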
The tracestate header allows vendors to attach their own data while maintaining interoperability:
```
# Simple vendor-specific state
tracestate: vendorname=vendorvalue

# Multiple vendors (each can add their own)
tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE

# Real-world examples:
tracestate: dd=s:1;t.dm:-0               # Datadog sampling decision
tracestate: nr=0-0-1234567-abcdef123     # New Relic trace state
tracestate: @aws-xray=Root=1-5f84c7a7-abc123def456
```
W3C Trace Context is now the industry standard. OpenTelemetry uses it by default. Major cloud providers (AWS, Azure, GCP) support it. Modern tracing backends (Jaeger, Zipkin, Tempo) all understand it. If you're starting fresh, use W3C Trace Context. If you have legacy systems, most tracing libraries support both W3C and legacy formats during migration.
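For example, OpenTelemetry JS lets you run the W3C and legacy B3 propagators side by side during a migration. A sketch, assuming the standard OpenTelemetry packages:

```typescript
import { propagation } from '@opentelemetry/api';
import { CompositePropagator, W3CTraceContextPropagator } from '@opentelemetry/core';
import { B3Propagator } from '@opentelemetry/propagator-b3';

// Inject and extract BOTH formats: W3C traceparent for modern services,
// B3 headers for legacy Zipkin-instrumented services.
propagation.setGlobalPropagator(
  new CompositePropagator({
    propagators: [new W3CTraceContextPropagator(), new B3Propagator()],
  })
);
```

Once every service understands W3C Trace Context, you can drop the legacy propagator and inject only one format.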
HTTP is the most common transport in microservices architectures, and HTTP header propagation is the most common propagation mechanism. Let's see how this works in practice.
```typescript
// SERVICE A: Making an outgoing HTTP call

import { context, trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

const propagator = new W3CTraceContextPropagator();

async function callPaymentService(orderId: string, amount: number) {
  const tracer = trace.getTracer('order-service');

  // Start a CLIENT span for the outgoing call
  return tracer.startActiveSpan(
    'PaymentService.processPayment',
    { kind: SpanKind.CLIENT },
    async (span) => {
      try {
        // Prepare headers object
        const headers: Record<string, string> = {
          'Content-Type': 'application/json',
        };

        // INJECT: Add trace context to outgoing headers
        // This adds the 'traceparent' and 'tracestate' headers
        propagator.inject(context.active(), headers, {
          set: (carrier, key, value) => {
            carrier[key] = value;
          },
        });

        // Headers now contain:
        // {
        //   'Content-Type': 'application/json',
        //   'traceparent': '00-abc123...-def456...-01',
        //   'tracestate': ''
        // }

        const response = await fetch('http://payment-service/api/process', {
          method: 'POST',
          headers,
          body: JSON.stringify({ orderId, amount }),
        });

        span.setStatus({ code: SpanStatusCode.OK });
        return response.json();
      } catch (error) {
        span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
        throw error;
      } finally {
        span.end();
      }
    }
  );
}
```
The Inject/Extract Pattern:
Context propagation follows a consistent pattern across all transports:
1. INJECT (Sender Side): serialize the active trace context into the outgoing carrier (HTTP headers, message attributes, gRPC metadata).
2. EXTRACT (Receiver Side): read the carrier, reconstruct the remote context, and start new spans as children of the remote parent (see the sketch below).
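Here is the receiving side of the earlier HTTP call, as a minimal sketch (the Express route and service names are assumptions, not taken from the original code):

```typescript
// SERVICE B: Extracting context from the incoming HTTP request (sketch)

import { context, trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { W3CTraceContextPropagator } from '@opentelemetry/core';
import express from 'express';

const propagator = new W3CTraceContextPropagator();
const tracer = trace.getTracer('payment-service');
const app = express();

app.post('/api/process', express.json(), (req, res) => {
  // EXTRACT: Read traceparent/tracestate from the incoming headers
  const parentCtx = propagator.extract(context.active(), req.headers, {
    get: (carrier, key) => carrier[key],
    keys: (carrier) => Object.keys(carrier),
  });

  // Start a SERVER span whose parent is the caller's remote CLIENT span
  tracer.startActiveSpan(
    'ProcessPayment',
    { kind: SpanKind.SERVER },
    parentCtx,
    (span) => {
      try {
        // ... validate card, charge card ...
        span.setStatus({ code: SpanStatusCode.OK });
        res.json({ status: 'ok' });
      } finally {
        span.end();
      }
    }
  );
});
```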
Most tracing libraries handle this automatically through middleware and instrumentation, but understanding the underlying mechanism helps when debugging propagation issues.
In practice, you rarely write inject/extract code manually. OpenTelemetry and other tracing libraries provide automatic instrumentation for popular HTTP libraries (Express, Flask, Spring, axios, requests, etc.) that handles propagation transparently. The code examples above show what happens under the hood.
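For instance, a Node.js service can enable transparent propagation with the OpenTelemetry SDK's auto-instrumentations (a sketch assuming the standard `@opentelemetry/sdk-node` and `@opentelemetry/auto-instrumentations-node` packages):

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

// Patches popular libraries (http, express, kafkajs, pg, ...) so that
// inject on outgoing calls and extract on incoming ones happen automatically.
const sdk = new NodeSDK({
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
```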
Message queues present a unique challenge for context propagation. Unlike synchronous HTTP calls, where request and response are tightly coupled, message queues introduce temporal decoupling (a message may be consumed seconds or hours after it was produced), fan-out (one message may reach many consumers), and broker-specific serialization that the trace context must survive.
Despite these challenges, maintaining trace context through message queues is essential for understanding asynchronous workflows.
| System | Where to Store Context | Key Considerations |
|---|---|---|
| Apache Kafka | Record headers (key-value pairs) | Headers survive serialization; use string encoding for trace context |
| RabbitMQ | Message properties/headers | AMQP headers support arbitrary key-value pairs |
| AWS SQS | Message attributes | Limited to 10 attributes; reserve space for trace context |
| AWS SNS | Message attributes | Attributes propagate to subscribed SQS queues |
| Google Pub/Sub | Message attributes | Attributes are string key-value pairs (only the message body is base64-encoded) |
| Azure Service Bus | Application properties | Custom properties support trace context |
```typescript
// PRODUCER: Injecting context into a Kafka message

import { context, trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { Kafka, IHeaders } from 'kafkajs';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

const propagator = new W3CTraceContextPropagator();
const kafka = new Kafka({ brokers: ['kafka:9092'] });
const producer = kafka.producer();
const tracer = trace.getTracer('order-service');

async function publishOrderEvent(order: Order) {
  return tracer.startActiveSpan(
    'kafka.produce order.created',
    { kind: SpanKind.PRODUCER }, // PRODUCER span kind for async messaging
    async (span) => {
      try {
        // Create headers from trace context
        const headers: IHeaders = {};
        propagator.inject(context.active(), headers, {
          set: (carrier, key, value) => {
            // Kafka header values travel as bytes
            carrier[key] = Buffer.from(value);
          },
        });

        // Add span attributes
        span.setAttributes({
          'messaging.system': 'kafka',
          'messaging.destination': 'order.created',
          'messaging.destination_kind': 'topic',
          'order.id': order.id,
        });

        await producer.send({
          topic: 'order.created',
          messages: [{
            key: order.id,
            value: JSON.stringify(order),
            headers, // Headers contain traceparent and tracestate
          }],
        });

        span.setStatus({ code: SpanStatusCode.OK });
      } finally {
        span.end();
      }
    }
  );
}
```
For message queues, you have a choice: treat the consumer span as a child of the producer span (same trace), or start a new trace and use 'links' to reference the producer. Child relationships work well for short queues. Links work better when messages sit in queues for hours/days, as they avoid extremely long-duration traces.
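The consumer side mirrors the producer: extract the context from the record headers before starting a CONSUMER span. A sketch under the same assumptions (kafkajs, W3C propagator; `handleOrder` is a hypothetical business handler):

```typescript
// CONSUMER: Extracting context from Kafka record headers (sketch)

import { context, trace, SpanKind } from '@opentelemetry/api';
import { Kafka } from 'kafkajs';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

const propagator = new W3CTraceContextPropagator();
const kafka = new Kafka({ brokers: ['kafka:9092'] });
const consumer = kafka.consumer({ groupId: 'payment-service' });
const tracer = trace.getTracer('payment-service');

declare function handleOrder(order: unknown): Promise<void>; // hypothetical

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'order.created' });
  await consumer.run({
    eachMessage: async ({ message }) => {
      // EXTRACT: rebuild the producer's context from the record headers
      const parentCtx = propagator.extract(context.active(), message.headers ?? {}, {
        get: (carrier, key) => carrier[key]?.toString(),
        keys: (carrier) => Object.keys(carrier),
      });

      // The CONSUMER span becomes a child of the producer span (same trace)
      await tracer.startActiveSpan(
        'kafka.consume order.created',
        { kind: SpanKind.CONSUMER },
        parentCtx,
        async (span) => {
          try {
            await handleOrder(JSON.parse(message.value!.toString()));
          } finally {
            span.end();
          }
        }
      );
    },
  });
}
```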
Within a single service, trace context must also propagate across threads, async operations, and scheduled tasks. This is actually where many propagation issues occur—not at service boundaries, but within a single codebase.
```java
// PROBLEM: Thread pool loses context

ExecutorService executor = Executors.newFixedThreadPool(10);

public void processOrder(Order order) {
    Span currentSpan = Span.current(); // Has context here

    // BAD: Context is lost when work runs on a pool thread
    executor.submit(() -> {
        // Span.current() returns a no-op span here!
        validateOrder(order); // This span has no parent
    });
}

// SOLUTION: Wrap with context propagation

import io.opentelemetry.context.Context;

public void processOrderFixed(Order order) {
    Context currentContext = Context.current(); // Capture context

    // GOOD: Make context current in the new thread
    executor.submit(currentContext.wrap(() -> {
        // Context.current() now returns the captured context
        // Span.current() works correctly
        validateOrder(order); // This span IS connected
    }));
}

// EVEN BETTER: Use a context-aware executor
ExecutorService tracedExecutor = Context.taskWrapping(
    Executors.newFixedThreadPool(10));

// Now all submitted tasks automatically carry context
tracedExecutor.submit(() -> {
    // Context automatically propagated
    validateOrder(order);
});
```
Every language/runtime handles async context differently. Java uses ThreadLocal with explicit wrapping. Node.js uses AsyncLocalStorage. Go uses context.Context passed explicitly. Python uses contextvars. Always use your tracing library's recommended patterns for your language—don't try to invent your own context propagation.
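For comparison, the Node.js equivalents in OpenTelemetry JS are `context.with()` and `context.bind()`. A sketch: with the AsyncLocalStorage-based context manager most async hops are followed automatically, so explicit capture is mainly needed for work the context manager cannot see (`myCustomQueue` below is a hypothetical stand-in for such a scheduler):

```typescript
import { context, trace } from '@opentelemetry/api';

// Hypothetical stand-in for any scheduler the context manager can't follow
// (hand-rolled worker pools, custom event buses, etc.)
declare const myCustomQueue: { push(fn: () => void): void };

const tracer = trace.getTracer('order-service');

function processOrder(order: { id: string }) {
  // Capture the currently active context...
  const captured = context.active();

  // ...and re-enter it explicitly where automatic propagation can't reach
  myCustomQueue.push(() => {
    context.with(captured, () => {
      // Spans started here parent correctly under the captured context
      tracer.startActiveSpan('validateOrder', (span) => {
        // ... validation work ...
        span.end();
      });
    });
  });

  // context.bind() returns a function permanently tied to a context,
  // no matter who invokes it later
  myCustomQueue.push(context.bind(captured, () => {
    tracer.startActiveSpan('notifyWarehouse', (span) => span.end());
  }));
}
```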
Real-world systems often mix technologies: HTTP services calling gRPC services, REST APIs publishing to Kafka, Lambda functions receiving from SQS. Each technology boundary requires appropriate propagation handling.
| Protocol/Transport | Standard Header/Field | Format |
|---|---|---|
| HTTP/1.1 & HTTP/2 | traceparent, tracestate headers | W3C Trace Context |
| gRPC | grpc-trace-bin metadata (binary) or W3C headers | Binary or W3C |
| GraphQL | HTTP headers (GraphQL over HTTP) | W3C Trace Context |
| WebSocket | HTTP headers in handshake + custom frames | App-specific |
| TCP (raw) | Application-defined framing | App-specific |
| Database Queries | Query comments or session attributes | Vendor-specific |
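As an illustration of the gRPC row, a client interceptor for @grpc/grpc-js can inject W3C context into call metadata in text form. A sketch; in practice the OpenTelemetry gRPC instrumentation does this for you:

```typescript
import { context, propagation } from '@opentelemetry/api';
import { InterceptingCall, Interceptor, Metadata } from '@grpc/grpc-js';

// Injects traceparent/tracestate into outgoing gRPC metadata.
// Assumes a global propagator (e.g., W3C) has already been registered.
const traceContextInterceptor: Interceptor = (options, nextCall) =>
  new InterceptingCall(nextCall(options), {
    start(metadata: Metadata, listener, next) {
      propagation.inject(context.active(), metadata, {
        set: (carrier, key, value) => carrier.set(key, value),
      });
      next(metadata, listener);
    },
  });

// Attach when constructing a client:
// new MyServiceClient(address, credentials, { interceptors: [traceContextInterceptor] });
```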
Special Case: Database Query Tracing
Database calls need special handling. You can't add HTTP headers to a SQL query. Options include:
```sql
-- Option 1: SQL Comments (SQLCommenter pattern)
-- Trace context embedded in a query comment (parsed by tooling for tracing)

SELECT * FROM orders WHERE user_id = 123
/*traceparent='00-abc123-def456-01',
  db.name='ecommerce',
  action='getOrders',
  controller='OrderController'*/

-- Pros: Works with any database, visible in slow query logs
-- Cons: Adds overhead, requires parsing on the database side

-- Option 2: Session Variables (PostgreSQL example)

SET application_name = 'order-service';
SET myapp.trace_id = '00-abc123-def456-01';
SELECT * FROM orders WHERE user_id = 123;

-- Trace info now visible in pg_stat_activity
-- Can be logged by the database for correlation

-- Option 3: Distributed tracing database extensions
-- Some databases have native tracing support:
--   - PostgreSQL: pg_stat_statements + OpenTelemetry exporter
--   - MySQL: Performance Schema integration
--   - MongoDB: Profiler with trace correlation
```
Special Case: Serverless Functions
Serverless environments (AWS Lambda, Azure Functions, Google Cloud Functions) require unique handling because execution environments are ephemeral, a single function may be invoked by several trigger types (HTTP, queues, schedules), and the platform often layers in its own tracing (such as AWS X-Ray):
```typescript
// AWS Lambda with multiple trigger types

import { APIGatewayProxyEvent, SQSEvent } from 'aws-lambda';
import { context as otelContext, trace } from '@opentelemetry/api';
import { W3CTraceContextPropagator } from '@opentelemetry/core';

const propagator = new W3CTraceContextPropagator();

// HTTP Trigger - extract from headers
export function handleHttpTrigger(event: APIGatewayProxyEvent) {
  const ctx = propagator.extract(otelContext.active(), event.headers, {
    get: (h, k) => h[k] ?? h[k.toLowerCase()],
    keys: (h) => Object.keys(h),
  });

  return otelContext.with(ctx, () => {
    // Process with context from the HTTP caller
  });
}

// SQS Trigger - extract from message attributes
export function handleSqsTrigger(event: SQSEvent) {
  for (const record of event.Records) {
    const attributes = record.messageAttributes;

    // Build a headers-like object from SQS attributes
    const headers: Record<string, string> = {};
    if (attributes.traceparent) {
      headers.traceparent = attributes.traceparent.stringValue!;
    }
    if (attributes.tracestate) {
      headers.tracestate = attributes.tracestate.stringValue!;
    }

    const ctx = propagator.extract(otelContext.active(), headers, {
      get: (h, k) => h[k],
      keys: (h) => Object.keys(h),
    });

    otelContext.with(ctx, () => {
      // Process with context from the message producer
    });
  }
}
```
AWS X-Ray, Google Cloud Trace, and Azure Application Insights all have native trace context propagation. When using these services, understand how they interact with W3C Trace Context and OpenTelemetry. Most now support W3C format, making interoperability possible.
When traces are fragmented, you need systematic approaches to identify where propagation is breaking.
```
# PromQL: Find services creating too many root spans
# (indicates propagation is breaking at those services)

sum by (service_name) (
  rate(trace_span_started_total{parent_span_id=""}[5m])
)
/
sum by (service_name) (
  rate(trace_span_started_total[5m])
)
> 0.5

# This shows services where >50% of spans are root spans
# Most internal services should have very few root spans
```
The most common propagation breakers:
1. API gateways/proxies that don't forward headers
2. Services not using instrumented HTTP clients
3. Thread pool usage without context wrapping
4. Legacy code using old tracing libraries
5. Message queue consumers not extracting context
6. WebSocket connections where the initial handshake isn't propagated
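As a complement to metrics, you can flag suspicious root spans directly in the SDK. Below is a sketch of a custom SpanProcessor for OpenTelemetry JS; note that the `parentSpanId` field shown matches SDK 1.x, while newer SDK versions expose a `parentSpanContext` instead:

```typescript
import { Context } from '@opentelemetry/api';
import { Span, SpanProcessor, ReadableSpan } from '@opentelemetry/sdk-trace-base';

// Flags every root span at creation time. In an internal service, a root
// span usually means an upstream caller failed to propagate context.
class RootSpanAuditor implements SpanProcessor {
  onStart(span: Span, _parentContext: Context): void {
    if (!span.parentSpanId) {
      console.warn(
        `root span "${span.name}" started (trace ${span.spanContext().traceId})`
      );
    }
  }
  onEnd(_span: ReadableSpan): void {}
  shutdown(): Promise<void> { return Promise.resolve(); }
  forceFlush(): Promise<void> { return Promise.resolve(); }
}

// Register alongside your exporter, e.g.:
// provider.addSpanProcessor(new RootSpanAuditor());
```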
We've thoroughly explored context propagation—the critical mechanism that makes distributed tracing possible. Let's consolidate:
- Trace context never crosses a process boundary on its own; it must be explicitly injected by the sender and extracted by the receiver.
- W3C Trace Context (the traceparent and tracestate headers) is the universal format; prefer it, and run legacy formats alongside it only during migration.
- The same inject/extract pattern applies to every transport: HTTP headers, Kafka record headers, SQS/SNS message attributes, gRPC metadata.
- Context also breaks inside a single service: thread pools, async callbacks, and scheduled tasks all need context-aware wrapping.
- Broken propagation fails silently; monitor the ratio of root spans per service to catch gaps before an incident does.
What's Next:
Now that we understand how traces flow through systems, we need to look at the tools that collect, store, and visualize them. The next page covers Jaeger and Zipkin—the two most popular open-source distributed tracing systems. You'll learn their architectures, how to deploy them, and how to choose between them.
You now have deep expertise in context propagation—the invisible thread that ties distributed traces together. This knowledge is essential for implementing tracing correctly and diagnosing issues when traces are incomplete. Every fragmented trace is a propagation problem waiting to be solved.