In the early days of computing, logging was simple: printf("Something happened at %s", timestamp). This worked when systems were monolithic, traffic was predictable, and a single engineer could mentally trace execution through log files. Those days are long gone.
Modern distributed systems generate petabytes of logs daily. Netflix produces over 400 billion logging events per day. Google's distributed systems emit logs from millions of machines simultaneously. At this scale, unstructured logs—human-readable text strings—become essentially useless.
Structured logging represents a fundamental paradigm shift: instead of logging text for humans to read, we log machine-parseable data structures that humans can query. This distinction enables everything modern observability depends on: full-text search, metric aggregation, anomaly detection, and cross-service correlation.
By the end of this page, you will understand why structured logging is non-negotiable for production systems. You'll master JSON logging formats, schema design patterns, field typing conventions, and parsing strategies that enable your logs to become a queryable source of truth rather than an unreadable stream of text.
To appreciate structured logging, we must first understand why unstructured logging fails. Consider a typical unstructured log line:
2024-01-15 14:32:07 ERROR Payment processing failed for user john@example.com, amount $99.99, reason: insufficient funds
This looks perfectly readable. An engineer can glance at it and understand what happened. But consider the challenges at scale:
- **Fragile parsing:** To extract the user, you write a regex like `(\S+@\S+)`. But what if another developer logs "for customer john@example.com"? Your parser breaks. Every format variation requires parser updates.
- **Ambiguity:** Is `$99.99` the transaction amount or the account balance? Without explicit field names, meaning depends on position, which changes across log versions.
- **No aggregation:** You cannot run `SUM(amount)` or `GROUP BY reason` when those values are embedded in prose.

| Dimension | Unstructured Log | Structured Log (JSON) |
|---|---|---|
| Example | Payment failed for user john@example.com, amount $99.99 | {"event":"payment_failed","user":"john@example.com","amount":99.99} |
| Field Extraction | Regex parsing (fragile) | Direct key access (reliable) |
| Schema Evolution | Breaks parsers silently | Explicit field versioning |
| Aggregation | Manual text processing | Native database aggregations |
| Storage Efficiency | Higher (labels embedded in prose, no repeated keys) | Lower raw size (repeated keys), largely recovered by compression |
| Query Speed | O(n) text scanning | O(1) indexed field lookup |
Engineers often argue that unstructured logs are more "readable." This is only true when reading individual lines. At scale, no human reads logs line-by-line. We query, filter, and aggregate. Optimizing for line-by-line readability actively harms queryability—the property that actually matters.
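The fragility gap is easy to demonstrate. A minimal Python sketch, reusing the log line and regex from above, contrasting regex extraction with direct key access:

```python
import json
import re

# Unstructured: extract the user with a regex (fragile)
unstructured = ("2024-01-15 14:32:07 ERROR Payment processing failed "
                "for user john@example.com, amount $99.99, reason: insufficient funds")
match = re.search(r"for user (\S+@\S+),", unstructured)
user = match.group(1) if match else None  # breaks if the wording ever changes

# Structured: direct key access (reliable)
structured = '{"event":"payment_failed","user":"john@example.com","amount":99.99}'
record = json.loads(structured)

print(user)            # john@example.com
print(record["user"])  # john@example.com

# A wording change silently breaks the regex but not the JSON parse
variant = unstructured.replace("for user", "for customer")
print(re.search(r"for user (\S+@\S+),", variant))  # None
```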
JSON (JavaScript Object Notation) has become the de facto standard for structured logging. Its ubiquity stems from several properties that make it ideal for logging infrastructure:
```json
{
  "timestamp": "2024-01-15T14:32:07.123456Z",
  "level": "ERROR",
  "logger": "payment.processor",
  "message": "Payment processing failed",
  "service": {
    "name": "payment-service",
    "version": "2.14.3",
    "environment": "production",
    "instance_id": "payment-prod-us-east-1a-7f8c9d"
  },
  "trace": {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7",
    "parent_span_id": "83d9c3e5a8d2f71c"
  },
  "event": {
    "type": "payment_failure",
    "user_id": "usr_a1b2c3d4e5",
    "amount_cents": 9999,
    "currency": "USD",
    "failure_reason": "insufficient_funds",
    "payment_method": "credit_card",
    "card_last_four": "4242"
  },
  "context": {
    "request_id": "req_x1y2z3",
    "session_id": "sess_m9n8o7",
    "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0)",
    "client_ip": "192.168.1.100",
    "geo_country": "US"
  },
  "error": {
    "type": "InsufficientFundsException",
    "message": "Account balance 45.00 insufficient for charge 99.99",
    "stack_trace": "at PaymentProcessor.charge(PaymentProcessor.java:142)..."
  }
}
```

Anatomy of a Production Log Entry
The example above demonstrates enterprise-grade structured logging. Let's analyze each section:
Core Fields (timestamp, level, logger, message): Universal across all log entries. The timestamp uses ISO 8601 with microsecond precision—critical for ordering events in distributed systems.
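Because ISO 8601 strings sort lexicographically in chronological order, log systems can order events with plain string comparison. A small illustrative helper (the `iso_timestamp` function name is ours, not from any library):

```python
from datetime import datetime, timezone

def iso_timestamp() -> str:
    """UTC timestamp in ISO 8601 with microsecond precision and 'Z' suffix."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

# ISO 8601 strings sort lexicographically in chronological order,
# so log systems can order events without parsing dates at all.
earlier = "2024-01-15T14:32:07.123456Z"
later = "2024-01-15T14:32:07.123457Z"
assert earlier < later  # plain string comparison
print(iso_timestamp())
```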
Service Metadata: Identifies where the log originated. In a microservices architecture with hundreds of services, this is essential for filtering and routing.
Distributed Tracing Context: The trace_id links this log to all other logs from the same request across services. This enables reconstructing full request flows.
Event Specifics: The event object contains domain-specific data. Note that amount_cents is an integer (not a float)—this prevents floating-point precision issues in financial calculations.
Request Context: Client information for security auditing, debugging, and analytics.
Error Details: Structured error information enables automated error categorization and alerting on specific exception types.
In production, each JSON log entry MUST be a single line—no pretty-printing. Log aggregators parse line-by-line; multi-line logs break parsing. Stack traces should be escaped newlines within a string field, not literal newlines in the JSON.
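A quick Python sketch of that rule: serialize the whole entry, stack trace included, with `json.dumps`, which escapes the trace's newlines so the record stays on one physical line:

```python
import json
import traceback

try:
    raise ValueError("insufficient funds")
except ValueError:
    entry = {
        "level": "ERROR",
        "message": "Payment processing failed",
        # The stack trace goes into one string field; json.dumps escapes
        # its newlines to \n, so the emitted record stays on one line.
        "stack_trace": traceback.format_exc(),
    }

line = json.dumps(entry)  # compact, single physical line
assert "\n" not in line   # no literal newlines
assert "\\n" in line      # newlines escaped inside the string value
print(line)
```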
Structured logging without consistent schema design descends into chaos. If each team invents their own field names and structures, aggregation becomes impossible. Schema design establishes the contract between log producers and log consumers.
The Two-Level Schema Pattern
Production logging schemas typically follow a two-level pattern:
This pattern enables consistent querying across all logs while allowing flexibility for domain-specific data.
```json
// Common Envelope (same for all logs)
{
  "timestamp": "2024-01-15T14:32:07.123456Z",
  "level": "INFO",
  "service": "order-service",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",

  // Domain Payload (varies by event type)
  "event_type": "order_placed",
  "payload": {
    "order_id": "ord_x1y2z3",
    "customer_id": "cust_a1b2c3",
    "total_cents": 15999,
    "items_count": 3
  }
}
```

- **Naming convention:** Use snake_case field names like `user_id`, `order_total`. Camel case works too, but pick one and enforce it organization-wide.
- **Namespacing:** Instead of bare `user_id`, `order_id`, use `user.id`, `order.id` or `event.user_id`. Prevents collisions.
- **Versioning:** Add `schema_version: "1.2"` when making breaking changes. Consumers can handle multiple versions.
- **Consistency:** If it's `user_id` in one service, don't call it `userId`, `uid`, or `customer_id` elsewhere.

| Field | Type | Required | Description |
|---|---|---|---|
| timestamp | ISO 8601 string | Yes | When the event occurred (UTC, microsecond precision) |
| level | enum string | Yes | Severity: TRACE, DEBUG, INFO, WARN, ERROR, FATAL |
| logger | string | Yes | Fully qualified logger name (e.g., com.company.service.Class) |
| message | string | Yes | Human-readable summary (for rare manual inspection) |
| service | string | Yes | Originating service/application name |
| version | string | Recommended | Service version (semver) |
| environment | enum string | Yes | dev, staging, production |
| instance_id | string | Recommended | Unique identifier for the process/container |
| trace_id | string | Recommended | Distributed trace identifier (W3C format) |
| span_id | string | Recommended | Current span within the trace |
Teams that allow ad-hoc field naming discover, during their first production incident, that they cannot join logs across services. The payment-service logged customerId, auth-service logged userId, and order-service logged cust_id—all referring to the same entity. Schema governance prevents this.
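One way to enforce such governance is a small normalization step in the logging path. A hypothetical sketch; the alias table and the `enforce_canonical_fields` helper are illustrative, not a standard API:

```python
# Hypothetical schema-governance check: rename known aliases of a
# canonical field name before a log record is emitted.
CANONICAL = {
    "userId": "user_id",
    "uid": "user_id",
    "customerId": "user_id",
    "cust_id": "user_id",
}

def enforce_canonical_fields(record: dict) -> dict:
    """Rename known aliases to the canonical name; fail on conflicts."""
    cleaned = {}
    for key, value in record.items():
        canonical = CANONICAL.get(key, key)
        if canonical in cleaned and cleaned[canonical] != value:
            raise ValueError(f"conflicting values for {canonical}")
        cleaned[canonical] = value
    return cleaned

print(enforce_canonical_fields({"userId": "usr_a1b2c3", "amount_cents": 9999}))
# {'user_id': 'usr_a1b2c3', 'amount_cents': 9999}
```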
Structured logging's power depends on consistent field typing. Log aggregation systems like Elasticsearch create indexes based on field types. If the same field contains a string in one log and a number in another, indexing fails catastrophically.
The Type Mapping Problem
Consider this scenario: Your payment service logs {"amount": 99.99} (float). Your refund service logs {"amount": "99.99"} (string). When Elasticsearch indexes the first log, it creates a float mapping. The second log's string value triggers a mapping conflict error—the log is rejected or the field is ignored.
This problem is insidious because it manifests at aggregation time, not at logging time. Your services run fine for months, then a dashboard breaks because 5% of logs have type mismatches.
| Data Category | Type | Example Field | Rationale |
|---|---|---|---|
| Identifiers | string | {"user_id": "usr_a1b2c3"} | IDs may contain letters/hyphens, should never be treated as numbers |
| Monetary Values | integer (cents) | {"amount_cents": 9999} | Avoids floating-point precision errors; divide by 100 for display |
| Timestamps | ISO 8601 string | {"created_at": "2024-01-15T14:32:07Z"} | Universal parsing, timezone-aware, sortable as string |
| Durations | integer (milliseconds) | {"latency_ms": 142} | Consistent unit; convert to seconds for display if needed |
| Booleans | boolean | {"is_premium": true} | Never use strings like "true" or "false" |
| Enumerations | string | {"status": "completed"} | Lowercase, underscored values from fixed set |
| Counts | integer | {"retry_count": 3} | Always non-negative integers |
| Bytes/Sizes | integer | {"payload_bytes": 1048576} | Use bytes as base unit; convert for display |
| Percentages | float (0-1) or integer (0-100) | {"cpu_usage": 0.75} | Document which convention you use; be consistent |
| IP Addresses | string | {"client_ip": "192.168.1.1"} | Enables CIDR queries in some systems |
- **Numeric IDs:** `{"order_id": 12345}` breaks when IDs exceed JavaScript's safe integer range or when your ID format changes to include letters.
- **Stringified booleans:** `{"success": "true"}` requires string comparison instead of boolean logic in queries.
- **Missing values:** Represent absent data consistently, as `null` or by omitting the field entirely. Some systems treat these differently.
- **Mixed-type arrays:** `{"tags": ["premium", 42, true]}` causes indexing issues. Array elements should share a type.
- **Float money:** `{"amount": 99.99}` can produce `99.98999999` due to floating-point representation. Use integer cents.
- **Unix timestamps:** `{"time": 1705329127}` loses timezone context and requires manual conversion. ISO 8601 is universally parseable.
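Two of these anti-patterns, float money and unix timestamps, have mechanical fixes. A small Python sketch using `Decimal` to avoid float drift; the helper names are illustrative:

```python
from datetime import datetime, timezone
from decimal import Decimal

def to_cents(amount: str) -> int:
    """Parse a decimal money string into exact integer cents."""
    return int(Decimal(amount) * 100)

def unix_to_iso(epoch_seconds: int) -> str:
    """Convert a unix timestamp to a timezone-aware ISO 8601 string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )

print(to_cents("99.99"))        # 9999  (no floating-point drift)
print(unix_to_iso(1705329127))  # 2024-01-15T14:32:07Z
```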
```typescript
interface LogEvent {
  // Common envelope - always present
  timestamp: string;  // ISO 8601
  level: 'TRACE' | 'DEBUG' | 'INFO' | 'WARN' | 'ERROR' | 'FATAL';
  service: string;
  version: string;
  trace_id?: string;
  span_id?: string;
}

interface PaymentFailedEvent extends LogEvent {
  event_type: 'payment_failed';
  payload: {
    user_id: string;       // String, not number
    amount_cents: number;  // Integer cents
    currency: string;      // ISO 4217 code
    failure_reason: 'insufficient_funds' | 'card_declined' | 'fraud_suspected';
    payment_method: 'credit_card' | 'bank_transfer' | 'crypto';
  };
}

// Type system prevents logging wrong types
function logPaymentFailure(event: PaymentFailedEvent): void {
  console.log(JSON.stringify(event));
}
```

Don't rely on developers remembering type conventions. Use typed logging interfaces (TypeScript, Java generics, Protocol Buffers) that make it impossible to log wrong types. Catch type errors at compile time, not during a production incident.
Structured logs are only valuable if your infrastructure can parse and process them efficiently. This section covers the journey from application log emission to queryable storage.
The Log Processing Pipeline
Modern log processing follows a consistent architecture regardless of specific tooling:
```ini
[SERVICE]
    Flush            1
    Log_Level        info
    Parsers_File     parsers.conf

[INPUT]
    Name             tail
    Path             /var/log/containers/*.log
    Parser           docker
    Tag              kube.*
    Mem_Buf_Limit    50MB
    Skip_Long_Lines  On

[FILTER]
    Name             parser
    Match            kube.*
    Key_Name         log
    Parser           json
    Reserve_Data     On

[FILTER]
    Name             kubernetes
    Match            kube.*
    Kube_URL         https://kubernetes.default.svc:443
    Merge_Log        On
    K8S-Logging.Parser  On

[FILTER]
    Name             modify
    Match            *
    # Add processing metadata
    Add              pipeline_version 1.2.3
    Add              processed_at ${TIMESTAMP}

[OUTPUT]
    Name             es
    Match            *
    Host             elasticsearch.logging.svc
    Port             9200
    Index            logs-%Y.%m.%d
    Type             _doc
```

Organizations that emit unstructured logs and parse them centrally suffer from chronic pipeline instability. A single application changing log format can break parsing for all services. Invariably, someone asks: "Why didn't we just log JSON from the start?" The answer is: you should have.
Structured logging introduces overhead compared to simple string logging. Understanding and mitigating this overhead is crucial for performance-sensitive applications.
Sources of Structured Logging Overhead
| Logging Approach | Ops/Second | Latency p99 | Memory Allocation |
|---|---|---|---|
| String concatenation | 2,500,000 | 0.8μs | 48 bytes/log |
| StringBuilder pattern | 3,100,000 | 0.6μs | 128 bytes/log |
| Naive JSON (allocating) | 450,000 | 4.2μs | 1,024 bytes/log |
| Optimized JSON (pooled) | 1,800,000 | 1.1μs | 96 bytes/log |
| Async JSON (buffered) | 2,400,000 | 0.9μs | 64 bytes/log (amortized) |
```java
// ANTI-PATTERN: Allocates on every log call
logger.info(new LogEvent()
    .setTimestamp(Instant.now().toString())
    .setLevel("INFO")
    .setUserId(userId)
    .setAction("login")
    .toJson());

// OPTIMIZED: Reusable builders, lazy evaluation
import com.yourcompany.logging.StructuredLogger;

logger.info(log -> log
    .event("user_login")
    .field("user_id", userId)
    .field("session_id", () -> expensiveSessionLookup())  // Only called if logging
    .field("user_agent", request::getUserAgent));  // Method reference, no lambda allocation

// Implementation uses:
// 1. Thread-local StringBuilder pool for JSON building
// 2. Lazy field evaluation (suppliers only invoked if log level enabled)
// 3. Async writes with bounded queue
// 4. Pre-allocated field name constants to avoid String allocation
```

With lazy evaluation, `log.debug(() -> expensiveToString())` costs nothing in production if DEBUG is disabled.

Asynchronous logging dramatically improves performance but introduces a risk: if the application crashes, logs still in the async queue are lost. For critical audit logs (financial transactions, security events), use synchronous writes. For debug/info logs, async is usually acceptable.
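The same async trade-off exists in Python's standard library: `logging.handlers.QueueHandler` enqueues records while a `QueueListener` thread performs the I/O. A minimal sketch (the `JsonFormatter` is a deliberately stripped-down illustration):

```python
import json
import logging
import logging.handlers
import queue

# Bounded queue caps memory; records enqueued but not yet flushed
# are lost on a hard crash - the trade-off described above.
log_queue: "queue.Queue[logging.LogRecord]" = queue.Queue(maxsize=10_000)

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

# Background thread drains the queue and performs the actual write
listener = logging.handlers.QueueListener(log_queue, handler)
listener.start()

logger = logging.getLogger("payment")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

logger.info("payment_processed")  # returns immediately; I/O happens off-thread
listener.stop()  # flushes remaining records; call this on shutdown
```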
Adopting structured logging in existing applications requires integrating with your language's logging framework. Here are patterns for major ecosystems:
The Adapter Pattern
Most applications use a logging facade (SLF4J, Python logging, Winston) with pluggable backends. Structured logging is typically implemented as:
```python
import structlog
import logging

# Configure structlog for JSON output
structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.UnicodeDecoder(),
        structlog.processors.JSONRenderer()  # Output as JSON
    ],
    wrapper_class=structlog.stdlib.BoundLogger,
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
    cache_logger_on_first_use=True,
)

# Usage in application code
logger = structlog.get_logger()

# Bind context that persists across log calls
logger = logger.bind(
    service="payment-service",
    version="2.14.3",
    environment="production"
)

# In request handler - bind request-specific context
logger = logger.bind(
    trace_id=request.headers.get("X-Trace-Id"),
    user_id=authenticated_user.id,
    request_id=generate_request_id()
)

# Log with event-specific fields
logger.info(
    "payment_processed",
    amount_cents=9999,
    currency="USD",
    payment_method="credit_card",
    latency_ms=142
)

# Output:
# {"timestamp":"2024-01-15T14:32:07.123456Z","level":"info","logger":"payment.handler",
#  "service":"payment-service","version":"2.14.3","environment":"production",
#  "trace_id":"4bf92f...","user_id":"usr_a1b2c3","request_id":"req_x1y2z3",
#  "event":"payment_processed","amount_cents":9999,"currency":"USD",...}
```
```xml
<!-- logback.xml configuration -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <includeMdcKeyName>trace_id</includeMdcKeyName>
      <includeMdcKeyName>span_id</includeMdcKeyName>
      <includeMdcKeyName>user_id</includeMdcKeyName>
      <customFields>{"service":"payment-service","version":"2.14.3"}</customFields>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```

```java
// Application code
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import net.logstash.logback.marker.Markers;

public class PaymentService {
    private static final Logger logger = LoggerFactory.getLogger(PaymentService.class);

    public void processPayment(PaymentRequest request) {
        // Set request context in MDC (automatically included in all logs)
        MDC.put("trace_id", request.getTraceId());
        MDC.put("user_id", request.getUserId());

        try {
            // Log with structured fields using Markers
            logger.info(
                Markers.append("event", "payment_initiated")
                    .and(Markers.append("amount_cents", request.getAmountCents()))
                    .and(Markers.append("currency", request.getCurrency())),
                "Processing payment"
            );
            // ... payment logic ...
        } finally {
            MDC.clear(); // Clean up thread-local context
        }
    }
}
```

MDC (Mapped Diagnostic Context) uses thread-local storage. When using async processing (CompletableFuture, reactive streams), context is lost when execution moves to another thread. Use context-propagating schedulers or explicit context passing in async code.
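Python has the same pitfall and the same cure: thread-local context does not survive `await`, but `contextvars` travel with the task (structlog exposes this via its `structlog.contextvars` module). A library-agnostic sketch with a hypothetical `log` helper:

```python
import asyncio
import contextvars
import json

# Context variable plays the role of MDC, but survives async hops
trace_id_var: contextvars.ContextVar[str] = contextvars.ContextVar(
    "trace_id", default="-"
)

def log(event: str) -> str:
    """Hypothetical helper: emit a JSON line with the bound trace_id."""
    line = json.dumps({"event": event, "trace_id": trace_id_var.get()})
    print(line)
    return line

async def handle_request(trace_id: str) -> str:
    trace_id_var.set(trace_id)
    await asyncio.sleep(0)  # context survives the await, unlike a thread-local
    return log("payment_initiated")  # still carries the request's trace_id

line = asyncio.run(handle_request("4bf92f3577b34da6a3ce929d0e0e4736"))
```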
Structured logging is the foundation of production observability. Without it, logs are archaeological artifacts requiring manual excavation. With it, logs become a queryable database of system behavior.
Key takeaways from this page:

- Unstructured text logs fail at scale: extraction depends on fragile regexes, and aggregation over prose is impossible.
- Log single-line JSON with a common envelope (timestamp, level, service, trace context) plus a domain-specific payload.
- Enforce a schema: consistent field names, explicit versioning, and one canonical name per entity across services.
- Type fields deliberately: string IDs, integer cents for money, ISO 8601 timestamps, real booleans. Mixed types break indexing.
- Mitigate logging overhead with pooled builders, lazy field evaluation, and (where loss is acceptable) asynchronous writes.
What's next:
Now that you understand how to structure logs, the next page explores what to log. Log levels (DEBUG, INFO, WARN, ERROR) seem simple but are frequently misused. We'll establish rigorous criteria for when each level applies and how improper leveling corrupts observability.
You now understand structured logging—the first pillar of production-grade logging systems. You can design schemas, choose appropriate field types, build parsing pipelines, and integrate structured logging into existing applications. Next, we'll master log levels to ensure your logs contain the right information at the right verbosity.