Traditional logging produces output like this:
2024-01-15 10:23:45 INFO OrderService - Processing order 12345 for user john@example.com, total $149.99
2024-01-15 10:23:46 ERROR PaymentProcessor - Payment failed for order 12345: card declined (insufficient funds)
This looks readable to humans, but it's a nightmare for machines. Want to find all orders over $100? You need regex. Want to correlate by user email? More regex. Want to count payment failures by decline reason? Even more complex pattern matching.
Now consider the same information as structured logs:
{"timestamp":"2024-01-15T10:23:45.847Z","level":"INFO","event":"ORDER_PROCESSING_STARTED","orderId":"12345","userId":"usr_abc123","userEmail":"john@example.com","totalAmount":149.99,"currency":"USD"}
{"timestamp":"2024-01-15T10:23:46.234Z","level":"ERROR","event":"PAYMENT_FAILED","orderId":"12345","errorCode":"card_declined","declineReason":"insufficient_funds"}
Now you can query: event:PAYMENT_FAILED AND declineReason:insufficient_funds to find all insufficient funds declines. You can aggregate by declineReason to see failure distributions. You can join on orderId to trace complete order flows. The structure enables analysis that was previously impossible.
By the end of this page, you will understand why structured logging is essential at scale, how to design effective log schemas, implement structured logging in major languages, and avoid common pitfalls that undermine the benefits.
Structured logging is the practice of outputting log entries as structured data—typically JSON—rather than free-form text strings. This seemingly simple change has profound implications for how you can use your logs.
A query like errorCode:TIMEOUT AND service:payment instantly finds every payment-service timeout, with no regex required. Compare how common tasks differ between the two approaches:

| Task | Unstructured Approach | Structured Approach |
|---|---|---|
| Find all errors for user X | grep + regex extraction + hope format is consistent | level:ERROR AND userId:X |
| Count orders by status | Parse logs, extract status field, aggregate | event:ORDER_* \| stats count by status |
| Calculate p99 latency | Parse duration from log text, convert units, calculate | durationMs:* \| percentile(durationMs, 99) |
| Find slow database queries | grep 'database' + parse duration + compare threshold | event:DATABASE_QUERY AND durationMs:>100 |
| Correlate request to response | Match request ID across logs manually | correlationId:abc123 \| sort timestamp |
For a small application, unstructured logs might suffice. But as soon as you have multiple services, high volume, or need to investigate complex issues, structured logging becomes essential. Start structured from day one—retrofitting is painful.
While structured logging can use various formats (logfmt, XML, etc.), JSON has become the de facto standard: every major language can produce and parse it, it supports nested data, it remains human-readable, and virtually all log aggregation tools understand it natively.
```jsonc
// Each log entry is a single line of JSON (JSON Lines format)
// Basic structure: timestamp, level, event, then context-specific fields

// Application startup
{"timestamp":"2024-01-15T10:00:00.000Z","level":"INFO","event":"SERVICE_STARTED","service":"order-service","version":"2.4.1","environment":"production","instanceId":"i-0abc123","startupTimeMs":3247}

// Business operation
{"timestamp":"2024-01-15T10:00:15.123Z","level":"INFO","event":"ORDER_CREATED","correlationId":"req-xyz-789","userId":"usr-abc-123","orderId":"ord-456","itemCount":3,"totalAmount":149.99,"currency":"USD","source":"web"}

// External call with timing
{"timestamp":"2024-01-15T10:00:15.347Z","level":"INFO","event":"PAYMENT_PROCESSED","correlationId":"req-xyz-789","orderId":"ord-456","gateway":"stripe","transactionId":"txn-stripe-999","amount":149.99,"durationMs":224,"responseCode":"approved"}

// Error with full context
{"timestamp":"2024-01-15T10:05:23.847Z","level":"ERROR","event":"PAYMENT_FAILED","correlationId":"req-abc-456","orderId":"ord-789","gateway":"stripe","amount":299.99,"errorCode":"card_declined","declineReason":"insufficient_funds","cardLast4":"4242","retryable":false,"stack":"at PaymentProcessor.charge (payment.js:47)\n at OrderService.process (order.js:123)"}

// Nested structure for complex data
{"timestamp":"2024-01-15T10:10:00.000Z","level":"INFO","event":"BATCH_COMPLETED","batchId":"batch-123","stats":{"processed":1247,"failed":3,"skipped":12},"timing":{"totalMs":45230,"avgPerItemMs":36},"failures":[{"itemId":"item-1","error":"validation_failed"},{"itemId":"item-2","error":"duplicate_key"},{"itemId":"item-3","error":"timeout"}]}
```

JSON Lines Format:
Notice that each log entry is a single line of JSON. This is the JSON Lines (jsonl) format, and it's critical for log processing:
Tools like jq, grep, and log aggregators expect one JSON object per line. Never output pretty-printed (multi-line) JSON in production logs: it breaks log processing pipelines. Newlines within string values must be escaped (\n), and stack traces should be a single escaped string field, not multiple log lines.
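As a minimal sketch of this rule (the `emitLog` helper and field names are illustrative, not a particular library's API), JSON.stringify already escapes embedded newlines, so a multi-line stack trace stays on one output line:

```typescript
// Hypothetical helper: write a log entry as one JSON Lines record.
// JSON.stringify escapes newlines (\n) inside string values, so the
// multi-line stack trace remains a single line of output.
function emitLog(entry: Record<string, unknown>): void {
  process.stdout.write(JSON.stringify(entry) + "\n");
}

try {
  throw new Error("card declined");
} catch (err) {
  const e = err as Error;
  emitLog({
    timestamp: new Date().toISOString(),
    level: "ERROR",
    event: "PAYMENT_FAILED",
    errorMessage: e.message,
    // The whole stack becomes one escaped string field, not separate log lines.
    stack: e.stack,
  });
}
```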
While JSON is flexible, consistency is crucial. A well-designed log schema ensures that logs across your organization can be queried, aggregated, and analyzed together.
```typescript
/**
 * Core log schema defining required and optional fields.
 * Use this as a contract across all services.
 */

// Core fields present in every log entry
interface LogEntryCore {
  // ISO 8601 timestamp with timezone
  timestamp: string;

  // Standard log level
  level: 'TRACE' | 'DEBUG' | 'INFO' | 'WARN' | 'ERROR' | 'FATAL';

  // Event type identifier - SCREAMING_SNAKE_CASE
  event: string;

  // Service that produced the log
  service: string;

  // Service version (semantic version)
  version: string;
}

// Contextual fields (recommended for all logs)
interface LogContext {
  // Request correlation ID
  correlationId?: string;

  // Deployment environment
  environment?: 'production' | 'staging' | 'development';

  // Instance identification
  hostname?: string;
  instanceId?: string;

  // Actor information
  userId?: string;
  sessionId?: string;
}

// Timing information (for operations)
interface LogTiming {
  // Operation duration in milliseconds
  durationMs?: number;

  // Specific phase timings
  phases?: Record<string, number>;
}

// Error information (for ERROR/FATAL levels)
interface LogError {
  error?: {
    message: string;
    type: string;
    code?: string;
    stack?: string;
    cause?: string;
  };
}

// Complete log entry type
type LogEntry = LogEntryCore & LogContext & LogTiming & LogError & {
  // Additional event-specific fields
  [key: string]: unknown;
};

// Example usage with type safety
const logEntry: LogEntry = {
  timestamp: new Date().toISOString(),
  level: 'INFO',
  event: 'ORDER_CREATED',
  service: 'order-service',
  version: '2.4.1',
  correlationId: 'req-abc-123',
  environment: 'production',
  userId: 'usr-xyz-789',
  orderId: 'ord-456',
  totalAmount: 149.99,
  itemCount: 3,
  durationMs: 47
};
```

Consider using a shared logging library across services that enforces the core schema. This prevents drift and ensures queryability. The library can add common fields automatically (timestamp, service, version, hostname).
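As an illustration of that tip, here is a minimal sketch of such a shared wrapper, assuming the `LogEntry` and `LogContext` types from the schema above; the class name and service metadata values are hypothetical:

```typescript
import { hostname } from 'os';

// Minimal shared logger sketch: each service constructs it once with its own
// metadata, and the common fields are stamped onto every entry automatically.
class StructuredLogger {
  constructor(
    private readonly service: string,
    private readonly version: string,
    private readonly environment: 'production' | 'staging' | 'development',
  ) {}

  log(level: LogEntry['level'], event: string, fields: Record<string, unknown> = {}): void {
    const entry: LogEntry = {
      timestamp: new Date().toISOString(),
      level,
      event,
      service: this.service,
      version: this.version,
      environment: this.environment,
      hostname: hostname(),
      ...fields, // event-specific context supplied by the caller
    };
    // One JSON object per line (JSON Lines)
    process.stdout.write(JSON.stringify(entry) + '\n');
  }

  info(event: string, fields?: Record<string, unknown>): void { this.log('INFO', event, fields); }
  error(event: string, fields?: Record<string, unknown>): void { this.log('ERROR', event, fields); }
}

// Usage: service metadata is set once; call sites only supply event context.
const logger = new StructuredLogger('order-service', '2.4.1', 'production');
logger.info('ORDER_CREATED', { orderId: 'ord-456', totalAmount: 149.99, currency: 'USD' });
```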
Consistent naming conventions make logs queryable and understandable across teams. Establish standards early and enforce them through shared libraries and code review.
| Category | Convention | Examples |
|---|---|---|
| Field names | camelCase | userId, orderId, totalAmount, durationMs |
| Event names | SCREAMING_SNAKE_CASE | ORDER_CREATED, PAYMENT_FAILED, USER_LOGIN |
| Boolean fields | Positive phrasing, no is prefix in JSON | enabled, retryable, success (not isEnabled) |
| IDs | Include type in name | userId, orderId, transactionId (not just id) |
| Durations | Include unit in name | durationMs, timeoutSeconds, cacheTtlMinutes |
| Counts | Include 'Count' suffix | itemCount, retryCount, errorCount |
| Amounts | Include currency separately | amount: 149.99, currency: "USD" |
Avoid these naming anti-patterns:

- Ambiguous names: id (which ID?), value (what value?), data (what data?)
- Inconsistent casing: user_id, userId, UserId across services
- Abbreviations: amt, qty, msg — use full words for clarity
- Missing units: timeout: 30 (seconds? milliseconds? minutes?)
- Negative booleans: notFound, disabled — harder to reason about
{ "ts": "2024-01-15T10:00:00Z", "lvl": "err", "msg": "failed", "id": "123", "amt": 149.99, "dur": 250, "is_success": false, "retry_count": 3} // Problems:// - Abbreviations (ts, lvl, amt, dur)// - Ambiguous id// - Missing units on dur// - Inconsistent casing// - Negative boolean phrasing123456789101112131415161718
{ "timestamp": "2024-01-15T10:00:00Z", "level": "ERROR", "event": "PAYMENT_FAILED", "orderId": "ord-123", "amount": 149.99, "currency": "USD", "durationMs": 250, "success": false, "retryCount": 3} // Clear:// - Full field names// - Typed ID (orderId)// - Unit in name (durationMs)// - camelCase consistency// - Positive booleanMost modern logging frameworks support structured output natively or via configuration. Here's how to implement structured logging in common languages:
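For Node.js/TypeScript, one widely used option is pino, which emits JSON Lines by default. The sketch below is illustrative rather than prescriptive; the service metadata and field values are assumptions:

```typescript
import pino from 'pino';

// pino writes one JSON object per line by default.
// The `base` fields are stamped onto every log entry.
const logger = pino({
  level: 'info',
  base: { service: 'order-service', version: '2.4.1', environment: 'production' },
});

// Child loggers carry per-request context such as a correlation ID.
const requestLogger = logger.child({ correlationId: 'req-xyz-789', userId: 'usr-abc-123' });

const startTime = Date.now();
// ... create the order ...
requestLogger.info(
  {
    event: 'ORDER_CREATED',
    orderId: 'ord-456',
    itemCount: 3,
    totalAmount: 149.99,
    currency: 'USD',
    durationMs: Date.now() - startTime,
  },
  'ORDER_CREATED',
);
```

For Java, the example below uses SLF4J with Logback and the logstash encoder.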
```java
// Using SLF4J with Logback and logstash-logback-encoder

// 1. Add dependency: net.logstash.logback:logstash-logback-encoder

// 2. Configure logback.xml for JSON output
/*
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <includeMdcKeyName>correlationId</includeMdcKeyName>
      <includeMdcKeyName>userId</includeMdcKeyName>
      <customFields>{"service":"order-service","version":"2.4.1"}</customFields>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDOUT" />
  </root>
</configuration>
*/

// 3. Use structured logging in code
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import net.logstash.logback.argument.StructuredArguments;
import static net.logstash.logback.argument.StructuredArguments.*;

public class OrderService {
    private static final Logger logger = LoggerFactory.getLogger(OrderService.class);

    public Order createOrder(CreateOrderRequest request) {
        // Set MDC context (included in all logs)
        MDC.put("correlationId", request.getCorrelationId());
        MDC.put("userId", request.getUserId());

        try {
            long startTime = System.currentTimeMillis();
            Order order = processOrder(request);

            // Structured logging with key-value pairs
            logger.info("ORDER_CREATED",
                kv("event", "ORDER_CREATED"),
                kv("orderId", order.getId()),
                kv("itemCount", order.getItems().size()),
                kv("totalAmount", order.getTotalAmount()),
                kv("currency", order.getCurrency()),
                kv("durationMs", System.currentTimeMillis() - startTime));

            return order;
        } catch (Exception e) {
            logger.error("ORDER_CREATION_FAILED",
                kv("event", "ORDER_CREATION_FAILED"),
                kv("errorCode", e instanceof BusinessException
                    ? ((BusinessException) e).getCode() : "UNKNOWN"),
                kv("errorMessage", e.getMessage()),
                e);
            throw e;
        } finally {
            MDC.clear();
        }
    }
}

/**
 * Output:
 * {"@timestamp":"2024-01-15T10:00:00.000Z","level":"INFO","logger":"OrderService",
 *  "message":"ORDER_CREATED","correlationId":"req-123","userId":"usr-456",
 *  "service":"order-service","version":"2.4.1","event":"ORDER_CREATED",
 *  "orderId":"ord-789","itemCount":3,"totalAmount":149.99,"currency":"USD",
 *  "durationMs":47}
 */
```

Structured logging can represent complex, nested data structures. This is powerful but requires thoughtful design to maintain queryability.
```javascript
// Nested objects for related data
logger.info({
  event: 'BATCH_PROCESSING_COMPLETE',
  batchId: 'batch-123',

  // Nested object for aggregate stats
  stats: {
    processed: 1247,
    failed: 3,
    skipped: 12,
    successRate: 99.76
  },

  // Nested object for timing breakdown
  timing: {
    totalMs: 45230,
    avgPerItemMs: 36,
    p99Ms: 127,
    phases: {
      fetchMs: 5000,
      transformMs: 35000,
      persistMs: 5230
    }
  },

  // Array of failure details (keep small!)
  failures: [
    { itemId: 'item-1', error: 'validation_failed', field: 'email' },
    { itemId: 'item-2', error: 'duplicate_key', key: 'user-123' },
    { itemId: 'item-3', error: 'timeout', operation: 'external_api' }
  ]
});

// Output enables queries like:
// - stats.successRate:>95
// - timing.phases.transformMs:>30000
// - failures.error:timeout

// ⚠️ CAUTION: Keep nested structures shallow
// Deep nesting (>3 levels) becomes hard to query

// ❌ BAD: Deeply nested, hard to query
logger.info({
  event: 'ORDER_PLACED',
  order: {
    customer: {
      address: {
        shipping: {
          city: 'Seattle' // 4 levels deep - hard to query!
        }
      }
    }
  }
});

// ✅ GOOD: Flattened for queryability
logger.info({
  event: 'ORDER_PLACED',
  orderId: 'ord-123',
  customerId: 'cust-456',
  shippingCity: 'Seattle',
  shippingState: 'WA',
  shippingCountry: 'US'
});
```

Use nesting to organize related fields logically (stats, timing, error), but avoid deep hierarchies. For fields you'll query frequently, consider flattening to top level with clear prefixes: shippingCity instead of address.shipping.city.
Even with structured logging, there are ways to undermine its benefits. Watch out for these common mistakes:
- String concatenation inside fields: event: 'Order ' + orderId + ' created' defeats the purpose. Use event: 'ORDER_CREATED', orderId: orderId.
- Inconsistent types: sometimes userId: '123', sometimes userId: 123. Pick one (string for IDs) and stick with it.
- Inconsistent naming: userId vs user_id vs UserId across services. Standardize and enforce.
```javascript
// ❌ String concatenation
logger.info({
  message: `Order ${orderId} created for user ${userId}`,
  amount: `$${amount}`
});

// ❌ Inconsistent types
logger.info({ userId: 123 });   // number
logger.info({ userId: '456' }); // string

// ❌ Unbounded array
logger.info({
  event: 'BATCH_COMPLETE',
  processedIds: allIds // could be 10,000 items!
});

// ❌ Massive object dump
logger.info({
  event: 'REQUEST_RECEIVED',
  body: request.body // could be megabytes
});
```
```javascript
// ✅ Structured fields
logger.info({
  event: 'ORDER_CREATED',
  orderId: orderId,
  userId: userId,
  amount: amount,
  currency: 'USD'
});

// ✅ Consistent string type for IDs
logger.info({ userId: '123' });
logger.info({ userId: '456' });

// ✅ Bounded summary
logger.info({
  event: 'BATCH_COMPLETE',
  processedCount: allIds.length,
  sampleIds: allIds.slice(0, 5)
});

// ✅ Extract relevant fields
logger.info({
  event: 'REQUEST_RECEIVED',
  contentType: request.contentType,
  contentLength: request.body.length,
  bodyHash: hash(request.body)
});
```

Structured logs shine when integrated with modern log management platforms. Here's how they enable powerful analysis:
| Platform | Key Features | Query Example |
|---|---|---|
| Elasticsearch/OpenSearch | Full-text + structured search, aggregations | event:PAYMENT_* AND durationMs:>1000 |
| Datadog | Live tail, pattern recognition, alerting | @event:ORDER_CREATED @totalAmount:>100 |
| Splunk | SPL queries, dashboards, ML | event=PAYMENT_FAILED \| stats count by declineReason |
| Grafana Loki | Label-based, Prometheus-like | {service="order-service"} \|= "ERROR" |
| AWS CloudWatch | Insights queries, alarms | fields event, durationMs \| filter level="ERROR" |
Dashboard and Alerting Capabilities:

With structured logs, you can build dashboards directly from log fields: for example, order volume by status, error counts by event and errorCode, and latency percentiles computed from durationMs. You can alert on the same fields: for example, a spike in PAYMENT_FAILED events, an error rate above a threshold, or p99 durationMs breaching an SLO.
None of this is practical with unstructured logs—you'd need complex regex that breaks every time the format changes.
Log platforms have limits on unique field names and cardinality. Avoid creating fields with unbounded values (like logging every unique user ID as a field name). Use fields for structure, values for data.
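A small sketch of what that means in practice (the event and field names are illustrative): unbounded values belong in the value position under a stable field name, never in the field name itself.

```jsonc
// ❌ Unbounded field names: every user creates a brand-new field,
//    quickly exhausting the platform's field and cardinality limits
{"event":"USER_LOGIN","user_usr-abc-123":true}
{"event":"USER_LOGIN","user_usr-def-456":true}

// ✅ Stable field name; the unbounded user ID stays in the value
{"event":"USER_LOGIN","userId":"usr-abc-123"}
{"event":"USER_LOGIN","userId":"usr-def-456"}
```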
Let's consolidate the key insights about structured logging:

- Emit logs as single-line JSON (JSON Lines), never pretty-printed, with newlines escaped and stack traces kept as one string field.
- Define a core schema (timestamp, level, event, service, version, correlationId) and enforce it with a shared logging library.
- Use consistent naming: camelCase fields, SCREAMING_SNAKE_CASE events, typed IDs, units in field names, positive booleans.
- Keep nesting shallow and flatten frequently queried fields; avoid string concatenation, unbounded arrays, and massive object dumps.
- Lean on your log platform for queries, aggregations, dashboards, and alerts, while respecting field and cardinality limits.
Module Complete:
You've now completed the comprehensive module on Logging in LLD.
With this knowledge, you can design logging that transforms your system from a black box into an observable, debuggable, measurable platform.
Congratulations! You've mastered Logging in LLD. You can now design and implement logging strategies that provide comprehensive observability, enable rapid debugging, and support operational excellence. Next, explore Observability Design to learn about metrics, tracing, and building fully observable systems.