At 3:47 AM on a Friday night, your phone buzzes with an urgent alert: "Critical system failure. Revenue impact: $50,000/hour." You stumble to your laptop, bleary-eyed, and start investigating. But where do you begin?
Without logs, you're flying blind. The system is a black box—you see that it's failing, but you have no visibility into why. Was it a spike in traffic? A database connection issue? A corrupted message in the queue? A third-party API timeout? A null pointer exception nobody anticipated?
With comprehensive, well-designed logging, the story is different. Within minutes, you've traced the request flow, identified the failing component, found the exact error with full context, and either fixed the issue or implemented a workaround. The difference between a 4-hour outage and a 15-minute resolution often comes down to one thing: how well you've instrumented your code with logging.
By the end of this page, you will understand why logging is not a luxury or an afterthought—but a fundamental engineering discipline that separates production-ready code from fragile prototypes. You'll see logging as your primary tool for observability, debugging, auditing, and understanding system behavior in the wild.
Logging is the practice of recording events, states, and diagnostic information from a running software system into a persistent or semi-persistent store for later analysis. Unlike console output during development, logs are designed to survive beyond the immediate execution context and serve multiple stakeholders: developers, operators, security teams, and even automated monitoring systems.
Logs are fundamentally a communication channel between your running code and the humans (and machines) that need to understand what happened. They answer questions like:

- What happened, and in what order?
- Why did this operation fail, and what was the system's state at the time?
- Who did what, and when?
- Where is time being spent?
But logging is more than just print() statements scattered through your code. Professional logging is intentional, structured, and designed for the consumers who will read the logs—often under stressful conditions like production incidents.
Always write logs with your future self in mind—exhausted, stressed, debugging at 3 AM with limited context. Will this log message tell you what you need to know? Will it help you narrow down the problem quickly? If not, it's not a useful log.
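The difference is easy to see side by side. Below is a minimal sketch using only `java.util.logging` from the JDK (the event name and identifiers are illustrative, not from any particular system): the first message fails the 3 AM test, while the second tells the responder which order, which user, and why.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingContrast {
    private static final Logger logger = Logger.getLogger(LoggingContrast.class.getName());

    // A context-free message: true, but useless at 3 AM.
    static String vagueMessage() {
        return "Payment failed";
    }

    // The same event with the context a responder actually needs.
    // The event name and identifiers (orderId, errorCode) are illustrative.
    static String contextualMessage(String orderId, String userId, String errorCode) {
        return String.format(
                "PAYMENT_FAILED: orderId=%s, userId=%s, errorCode=%s",
                orderId, userId, errorCode);
    }

    public static void main(String[] args) {
        logger.warning(vagueMessage());
        logger.log(Level.WARNING, contextualMessage("ord-123", "user-42", "CARD_DECLINED"));
    }
}
```

The second message can be grepped by order ID, correlated with other events for the same user, and understood without opening the source code.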
Logs vs. Other Observability Signals:
In modern systems, logging is one of the three pillars of observability:

- Logs: detailed, timestamped records of discrete events, with full context
- Metrics: aggregated numeric measurements over time (request counts, latencies, error rates)
- Traces: records of a single request's path as it crosses service boundaries
While all three are essential, logs remain the most detailed source of truth. Metrics tell you that response times increased; traces show you which services were involved; but logs tell you exactly what went wrong, with the full context necessary to understand and fix the problem.
Logging serves multiple purposes, and understanding each helps you log the right information at the right level. Every log statement should exist for a clear reason—random logging creates noise that obscures the signals you need.
| Purpose | Key Questions Answered | Example Log Entry |
|---|---|---|
| Debugging | Why did this fail? What was the state? | ERROR: Failed to process order #12345 - inventory check failed for SKU XYZ, available: 0, requested: 5 |
| Operational | Is the system healthy? What's the throughput? | INFO: Processed 1,247 requests in the last minute, avg latency: 45ms, p99: 120ms |
| Auditing | Who did what, and when? | AUDIT: User admin@company.com deleted customer record #67890 at 2024-01-15T10:23:45Z |
| Performance | Where are the bottlenecks? | DEBUG: Database query SELECT * FROM orders took 2.3s (threshold: 100ms) |
| Security | Was there unauthorized access? | WARN: Login attempt for user 'admin' failed 5 times from IP 192.168.1.100 in 60 seconds |
The Common Thread:
All these purposes share a common requirement: context. A log without context is nearly useless. Knowing that an error occurred is not helpful if you don't know which request caused it, what the inputs were, which server handled it, and what operations preceded it.
The art of logging is providing enough context to be useful without so much noise that the important information gets buried. This balance is a skill that develops with experience—and is the focus of later sections in this module.
Logging is not just for production. It plays different but equally important roles across the entire development lifecycle. Understanding these roles helps you design logging that serves all stages effectively.
Good logging takes time to design and implement. It's an investment that pays dividends throughout the system's lifetime. Like tests, logging is often skipped under deadline pressure—and like skipping tests, this decision creates debt that compounds over time.
Poor logging creates real business costs and engineering friction. These costs are often invisible until an incident reveals the gaps—at which point it's too late to add the logs you needed.
| Scenario | Poor Logging Result | Business Impact |
|---|---|---|
| Payment failures | Error logged without transaction ID | 4-hour MTTR instead of 15 minutes; customer escalation |
| Security breach | No access logs for admin actions | Cannot determine scope of breach; regulatory penalties |
| Performance degradation | No timing information logged | Cannot identify bottleneck; weeks of speculation |
| Data corruption | Mutations logged without before/after values | Cannot reconstruct correct state; data loss |
| Intermittent failures | Error logged without request context | Cannot reproduce; issue persists for months |
Every log statement should pass the '3 AM test': If you're woken up at 3 AM because of an issue related to this code, will this log help you understand what happened? If not, it's either missing critical information or shouldn't exist at all.
In mature engineering organizations, logging is not an afterthought—it's a first-class concern, treated with the same rigor as functionality, testing, and security. This mindset shift has profound implications for how code is designed and reviewed.
The Design Perspective:
When you design a class or module, ask yourself:

- What will I need to know when this code misbehaves in production?
- Which identifiers (order ID, user ID, request ID) must appear in every related log line?
- Which operations are worth timing?
- What context should accompany each error?
These questions are not an afterthought—they inform the design itself. Sometimes, considering logging reveals that you need to capture additional context, add timing, or structure error handling differently.
Logging influences design, and good design makes logging easier.
```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderProcessor {
    private static final Logger logger = LoggerFactory.getLogger(OrderProcessor.class);

    private final InventoryService inventoryService;
    private final PaymentService paymentService;
    private final OrderRepository orderRepository;

    /**
     * Process an order with comprehensive logging for observability.
     *
     * Note how logging is woven into the design:
     * - Entry logging with key context (order ID, user)
     * - Stage completions for tracing flow
     * - Performance timing for bottleneck identification
     * - Error logging with full context for diagnosis
     * - Exit logging with outcome summary
     */
    public OrderResult processOrder(Order order) {
        String orderId = order.getId();
        String userId = order.getUserId();
        long startTime = System.currentTimeMillis();

        // Entry log with key identifiers
        logger.info("ORDER_PROCESSING_STARTED: orderId={}, userId={}, itemCount={}, totalAmount={}",
                orderId, userId, order.getItems().size(), order.getTotalAmount());

        try {
            // Inventory check with timing
            long inventoryStart = System.currentTimeMillis();
            InventoryResult inventoryResult = inventoryService.checkAndReserve(order.getItems());
            long inventoryDuration = System.currentTimeMillis() - inventoryStart;

            logger.info("INVENTORY_CHECK_COMPLETE: orderId={}, available={}, durationMs={}",
                    orderId, inventoryResult.isAvailable(), inventoryDuration);

            if (!inventoryResult.isAvailable()) {
                logger.warn("ORDER_REJECTED_INVENTORY: orderId={}, unavailableItems={}",
                        orderId, inventoryResult.getUnavailableItems());
                return OrderResult.rejected("Insufficient inventory");
            }

            // Payment processing with timing
            long paymentStart = System.currentTimeMillis();
            PaymentResult paymentResult = paymentService.processPayment(order.getPaymentDetails());
            long paymentDuration = System.currentTimeMillis() - paymentStart;

            logger.info("PAYMENT_PROCESSING_COMPLETE: orderId={}, success={}, transactionId={}, durationMs={}",
                    orderId, paymentResult.isSuccess(), paymentResult.getTransactionId(), paymentDuration);

            if (!paymentResult.isSuccess()) {
                // Roll back inventory reservation
                inventoryService.releaseReservation(inventoryResult.getReservationId());
                logger.warn("ORDER_REJECTED_PAYMENT: orderId={}, reason={}, errorCode={}",
                        orderId, paymentResult.getFailureReason(), paymentResult.getErrorCode());
                return OrderResult.rejected("Payment failed: " + paymentResult.getFailureReason());
            }

            // Persist order
            order.setStatus(OrderStatus.CONFIRMED);
            order.setPaymentTransactionId(paymentResult.getTransactionId());
            orderRepository.save(order);

            long totalDuration = System.currentTimeMillis() - startTime;

            // Success log with complete summary
            logger.info("ORDER_PROCESSING_COMPLETE: orderId={}, userId={}, status=CONFIRMED, "
                            + "transactionId={}, totalDurationMs={}, inventoryDurationMs={}, paymentDurationMs={}",
                    orderId, userId, paymentResult.getTransactionId(), totalDuration,
                    inventoryDuration, paymentDuration);

            return OrderResult.success(orderId, paymentResult.getTransactionId());

        } catch (InventoryServiceException e) {
            // The trailing exception argument makes SLF4J log the full stack trace
            logger.error("ORDER_PROCESSING_FAILED: orderId={}, stage=INVENTORY, error={}, errorType={}",
                    orderId, e.getMessage(), e.getClass().getSimpleName(), e);
            return OrderResult.error("Inventory service error");
        } catch (PaymentServiceException e) {
            logger.error("ORDER_PROCESSING_FAILED: orderId={}, stage=PAYMENT, error={}, errorCode={}",
                    orderId, e.getMessage(), e.getErrorCode(), e);
            return OrderResult.error("Payment service error");
        } catch (Exception e) {
            long totalDuration = System.currentTimeMillis() - startTime;
            logger.error("ORDER_PROCESSING_FAILED: orderId={}, userId={}, stage=UNKNOWN, "
                            + "error={}, durationMs={}",
                    orderId, userId, e.getMessage(), totalDuration, e);
            throw new OrderProcessingException("Unexpected error processing order", e);
        }
    }
}
```

Key observations from the code:
Consistent Event Naming: Every log has an event name (ORDER_PROCESSING_STARTED, INVENTORY_CHECK_COMPLETE) that makes grepping and filtering easy.
Contextual Identifiers: orderId and userId appear in every related log, enabling correlation across the request lifecycle.
Timing Data: Duration measurements at each stage allow performance analysis without additional instrumentation.
Outcome Logging: Success and failure paths both log their outcomes with appropriate levels and relevant details.
Exception Handling: Errors are logged with stage identification and stack traces, making diagnosis straightforward.
This is what first-class logging looks like—intentional, comprehensive, and designed for the people who will read it under pressure.
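Consistent event names and identifiers pay off at query time. This toy sketch, with an in-memory list standing in for a real log store, shows how a responder could isolate one order's lifecycle with a simple substring filter; the log lines mirror the event names above:

```java
import java.util.List;
import java.util.stream.Collectors;

public class LogFilter {
    // With consistent "key=value" identifiers, reconstructing one order's
    // lifecycle is a substring match -- the same query works in grep,
    // a log search UI, or (as here) plain Java streams.
    static List<String> forOrder(List<String> logLines, String orderId) {
        return logLines.stream()
                .filter(line -> line.contains("orderId=" + orderId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "ORDER_PROCESSING_STARTED: orderId=o-1, userId=u-9",
                "ORDER_PROCESSING_STARTED: orderId=o-2, userId=u-3",
                "INVENTORY_CHECK_COMPLETE: orderId=o-1, available=true",
                "ORDER_PROCESSING_COMPLETE: orderId=o-1, status=CONFIRMED");
        // Only the three o-1 lines survive the filter
        System.out.println(forOrder(logs, "o-1"));
    }
}
```

Without the consistent `orderId=` convention, the same question would require fragile per-message parsing.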
In modern distributed systems, logging is not just about individual components writing to local files. It's an architectural concern that spans the entire system. Understanding this architecture helps you design effective logging strategies.
The Logging Pipeline:
Sources: Every component produces logs—services, databases, infrastructure, external integrations. Each has different formats and volumes.
Collection: Agents collect logs from sources and ship them to central infrastructure. This must be reliable—lost logs are useless logs.
Processing: Logs are parsed into structured data, enriched with metadata (environment, datacenter, version), and filtered for relevance.
Storage: Logs are stored for querying and long-term retention. Hot storage for recent logs; cold storage for compliance.
Consumption: Dashboards visualize patterns, alerts trigger on anomalies, and engineers query during investigations.
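The processing stage can be sketched in miniature: parse an `EVENT: key=value` line into structured fields, then enrich it with deployment metadata. The field names and the `env`/`version` metadata keys are illustrative, assuming the log format used in the example above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LogLineParser {
    // Parse "EVENT: k1=v1, k2=v2" into structured fields -- a toy version
    // of the pipeline's processing stage.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] parts = line.split(":", 2);
        fields.put("event", parts[0].trim());
        for (String pair : parts[1].split(",")) {
            String[] kv = pair.split("=", 2);
            fields.put(kv[0].trim(), kv[1].trim());
        }
        return fields;
    }

    // Enrichment: attach metadata the emitting code doesn't know about,
    // such as the environment and deployed version (illustrative keys).
    static Map<String, String> enrich(Map<String, String> fields, String env, String version) {
        fields.put("env", env);
        fields.put("version", version);
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> f = enrich(
                parse("ORDER_REJECTED_PAYMENT: orderId=o-7, errorCode=CARD_DECLINED"),
                "prod", "1.4.2");
        System.out.println(f);
    }
}
```

Note that parsing only works because the source emitted a consistent format; this is why the writing conventions and the pipeline design are two sides of the same decision.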
Implications for Class Design:
Understanding this pipeline affects how you write logs: consistent, structured formats make the processing stage reliable; stable event names and field keys make querying possible; and every log line carries real collection and storage costs, so volume matters.
Logs cannot be retroactively enhanced. If you don't log the correlation ID, the user ID, or the input parameters when the event happens, you cannot add them later. Think about future debugging needs now.
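One way to guarantee the correlation ID is present at event time is to establish it once at the request boundary and have every log line in that request pick it up automatically. The sketch below is a minimal stand-in for the MDC (Mapped Diagnostic Context) facility that frameworks such as SLF4J provide; the class and method names are illustrative:

```java
public class RequestContext {
    // Per-thread correlation ID, set once when a request enters the system.
    private static final ThreadLocal<String> CORRELATION_ID = new ThreadLocal<>();

    static void begin(String correlationId) {
        CORRELATION_ID.set(correlationId);
    }

    static void end() {
        CORRELATION_ID.remove(); // avoid leaking IDs between pooled-thread requests
    }

    // Every log line gets the correlation ID stamped on automatically,
    // so no call site can forget it.
    static String log(String message) {
        return String.format("[cid=%s] %s", CORRELATION_ID.get(), message);
    }

    public static void main(String[] args) {
        begin("req-8f3a");
        try {
            System.out.println(log("INVENTORY_CHECK_COMPLETE: orderId=o-1"));
        } finally {
            end();
        }
    }
}
```

Centralizing the ID this way means the context is captured when the event happens, which is exactly what cannot be reconstructed later.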
Effective logging is not just technical—it's a mindset. The best loggers think like operators, anticipate failure modes, and design for observability from the start. Above all, they write every log line for a stranger who will read it under pressure.
Well-written logs have a consistent voice—clear, informative, professional. They don't include sarcastic comments, vague descriptions, or developer inside jokes. They're written as if they'll be read by a customer support engineer who needs to explain the issue to a client.
We've explored the foundational importance of logging in software systems. Let's consolidate the key insights:

- Logs are the most detailed observability signal: metrics and traces point to a problem, but logs explain it.
- Every useful log carries context: identifiers, timing, and outcome, captured at the moment the event happens.
- Logging serves many purposes (debugging, operations, auditing, performance, security), and every statement should exist for a clear reason.
- Logging is a first-class design concern, deserving the same rigor as functionality, testing, and security.
What's Next:
Now that we understand why logging matters, we'll explore how to log effectively. The next page covers logging levels—the hierarchy of severity that helps you balance signal and noise, and ensures that the most important information is always visible.
You now understand the fundamental importance of logging in software systems. Logging is not overhead—it's insurance against the unknown. Next, we'll learn how to use logging levels to categorize and prioritize log messages effectively.