In the realm of production systems, there exists a fundamental truth that separates mature engineering organizations from struggling ones: you cannot fix what you cannot see. Software systems in production are like submarines operating deep beneath the ocean surface—their internal state is hidden from direct observation, and operators must rely on carefully designed instrumentation to understand what's happening inside.
Observability is the practice of designing systems that emit sufficient information about their internal state to enable operators, developers, and automated systems to understand behavior, diagnose issues, and make informed decisions. Unlike traditional monitoring, which focuses on watching for known problems, observability enables answering arbitrary questions about system behavior—including questions you didn't think to ask when you built the system.
By the end of this page, you will understand the core principles of observable class design, learn to distinguish observability from monitoring, master techniques for exposing internal state without compromising encapsulation, and develop the mindset of 'observability by design' that characterizes elite engineering organizations.
Before diving into implementation, we must establish a clear understanding of what observability means and how it differs from traditional monitoring. This distinction fundamentally shapes how we design our classes.
Monitoring is about watching for known problems. You define metrics, thresholds, and alerts in advance based on anticipated failure modes. When a metric exceeds a threshold—CPU at 90%, error rate above 1%—you trigger an alert. Monitoring answers questions like "Is X within acceptable bounds?"
Observability is about understanding unknown problems. It's the property of a system that allows you to ask arbitrary questions about its internal state without deploying new code. Observability answers questions like "Why did request latency spike at 3:47 AM for users in the EU region with premium accounts accessing the billing service?"
| Aspect | Traditional Monitoring | Observability |
|---|---|---|
| Philosophy | Watch for known problems | Enable investigation of unknown problems |
| Questions Answered | Predefined metrics and thresholds | Arbitrary ad-hoc queries |
| Data Model | Aggregated metrics (averages, percentiles) | High-cardinality events with rich context |
| Investigation Style | Dashboard-first, alert-driven | Exploration-first, hypothesis-driven |
| Failure Mode | Misses novel problems not anticipated | May have data volume/cost challenges |
| Design Impact | Add metrics after the fact | Build visibility into the architecture |
Observability is traditionally built on three complementary data types: Logs (discrete events with timestamps and context), Metrics (numerical measurements aggregated over time), and Traces (request flow across distributed components). Each provides a different lens into system behavior, and truly observable systems integrate all three.
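As a rough illustration in plain Java (no observability library; the record names here are made up for this sketch, not from any framework), the three data types differ mainly in shape: logs are discrete contextual events, metrics are named numbers with tags, and trace spans are timed steps linked by a shared trace ID:

```java
import java.time.Instant;
import java.util.Map;

// Log: a discrete, timestamped event with arbitrary key-value context
record LogEvent(Instant timestamp, String message, Map<String, Object> context) {}

// Metric: a named numeric measurement with low-cardinality tags,
// designed to be aggregated over time
record MetricSample(String name, double value, Map<String, String> tags) {}

// Trace span: one timed step of a request's journey; spans sharing a
// traceId reconstruct the full request flow across components
record TraceSpan(String traceId, String spanId, String operation,
                 Instant start, Instant end) {}

class ThreePillarsDemo {
    public static void main(String[] args) {
        var log = new LogEvent(Instant.now(), "order processed",
                Map.of("orderId", "ord-123", "customerId", "cust-9"));
        var metric = new MetricSample("orders.processed", 1.0,
                Map.of("service", "order-service"));
        var span = new TraceSpan("trace-abc", "span-1", "processOrder",
                Instant.now(), Instant.now());
        System.out.println(log.message() + " / " + metric.name() + " / " + span.operation());
    }
}
```

Note how the log event carries high-cardinality context (a specific order ID) while the metric carries only low-cardinality tags; that division is what keeps metrics cheap to aggregate and logs useful for investigation.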
The observability equation:
A system's observability is not a binary property—it exists on a spectrum. We can express it conceptually as:
Observability = (Data Richness × Context) / (Noise + Effort)
The goal of observable class design is to maximize the numerator (rich, contextual data) while minimizing the denominator (noise and investigative effort).
Designing observable classes requires internalizing a set of principles that guide every design decision. These principles ensure that observability becomes a first-class architectural concern rather than an afterthought bolted on during production incidents.
Observability isn't free. Every metric, log, and trace has costs in CPU cycles, memory, network bandwidth, and storage. The goal isn't to instrument everything—it's to instrument strategically. Elite teams develop intuition for what matters and what doesn't, maximizing return on observability investment.
Let's examine what an observable class looks like in practice. We'll build from a naive implementation to a fully instrumented one, demonstrating how observability concerns integrate with business logic.
The Non-Observable Baseline:
Consider a typical service class that processes orders:
```java
// Non-observable: invisible to operators
public class OrderService {
    private final InventoryClient inventory;
    private final PaymentClient payments;
    private final OrderRepository repository;

    public OrderService(
            InventoryClient inventory,
            PaymentClient payments,
            OrderRepository repository
    ) {
        this.inventory = inventory;
        this.payments = payments;
        this.repository = repository;
    }

    public Order processOrder(OrderRequest request) {
        // Reserve inventory
        inventory.reserve(request.getItems());

        // Process payment
        payments.charge(request.getCustomerId(), request.getTotal());

        // Create and save order
        Order order = new Order(request);
        return repository.save(order);
    }
}
```

This implementation works but is completely opaque. When orders fail in production, operators have no visibility into:

- which step failed (inventory reservation, payment, or persistence)
- how long each step took, or where latency accumulates
- how often failures occur and what causes them
- which customer or request a given failure belongs to
The Observable Version:
```java
public class ObservableOrderService {
    private static final Logger log =
        LoggerFactory.getLogger(ObservableOrderService.class);

    // Metrics for quantitative behavior
    private final Counter ordersProcessed;
    private final Timer orderLatency;
    private final AtomicInteger activeOrders;
    private final MeterRegistry metrics;  // kept so failures can be tagged by reason

    // Dependencies
    private final InventoryClient inventory;
    private final PaymentClient payments;
    private final OrderRepository repository;
    private final Tracer tracer;

    public ObservableOrderService(
            InventoryClient inventory,
            PaymentClient payments,
            OrderRepository repository,
            MeterRegistry metrics,
            Tracer tracer
    ) {
        this.inventory = inventory;
        this.payments = payments;
        this.repository = repository;
        this.metrics = metrics;
        this.tracer = tracer;

        // Initialize metrics with meaningful tags
        this.ordersProcessed = metrics.counter("orders.processed", "service", "order-service");
        this.orderLatency = metrics.timer("orders.latency", "service", "order-service");
        // gauge() returns the state object, so the service can mutate it directly
        this.activeOrders = metrics.gauge("orders.active", new AtomicInteger(0));
        // Failure counters are created at failure time so each can carry
        // a "reason" tag for the exception type
    }

    public Order processOrder(OrderRequest request) {
        // Create trace span for distributed tracing
        Span span = tracer.spanBuilder("processOrder")
            .setAttribute("order.customerId", request.getCustomerId())
            .setAttribute("order.itemCount", request.getItems().size())
            .setAttribute("order.total", request.getTotal().doubleValue())
            .startSpan();

        // Structured logging with context
        try (Scope scope = span.makeCurrent()) {
            log.info("Processing order started",
                kv("customerId", request.getCustomerId()),
                kv("itemCount", request.getItems().size()),
                kv("totalAmount", request.getTotal()),
                kv("traceId", span.getSpanContext().getTraceId())
            );

            activeOrders.incrementAndGet();

            return orderLatency.record(() -> {
                try {
                    // Step 1: Reserve inventory with observability
                    Span inventorySpan = tracer.spanBuilder("reserveInventory")
                        .startSpan();
                    try {
                        inventory.reserve(request.getItems());
                        inventorySpan.setStatus(StatusCode.OK);
                    } finally {
                        inventorySpan.end();
                    }

                    // Step 2: Process payment with observability
                    Span paymentSpan = tracer.spanBuilder("processPayment")
                        .setAttribute("payment.amount", request.getTotal().doubleValue())
                        .startSpan();
                    try {
                        payments.charge(request.getCustomerId(), request.getTotal());
                        paymentSpan.setStatus(StatusCode.OK);
                    } finally {
                        paymentSpan.end();
                    }

                    // Step 3: Persist order
                    Order order = new Order(request);
                    Order saved = repository.save(order);

                    ordersProcessed.increment();
                    span.setStatus(StatusCode.OK);
                    span.setAttribute("order.id", saved.getId());

                    log.info("Order processed successfully",
                        kv("orderId", saved.getId()),
                        kv("customerId", request.getCustomerId()),
                        kv("traceId", span.getSpanContext().getTraceId())
                    );

                    return saved;

                } catch (RuntimeException e) {
                    // Catch RuntimeException (not Exception) so the rethrow
                    // compiles inside the Timer's Supplier lambda
                    metrics.counter("orders.failed",
                            "service", "order-service",
                            "reason", e.getClass().getSimpleName())
                        .increment();
                    span.recordException(e);
                    span.setStatus(StatusCode.ERROR, e.getMessage());

                    log.error("Order processing failed",
                        kv("customerId", request.getCustomerId()),
                        kv("errorType", e.getClass().getSimpleName()),
                        kv("errorMessage", e.getMessage()),
                        kv("traceId", span.getSpanContext().getTraceId()),
                        e
                    );
                    throw e;

                } finally {
                    activeOrders.decrementAndGet();
                }
            });
        } finally {
            span.end();
        }
    }
}
```

What makes this observable:
The instrumented version provides multiple dimensions of visibility: counters and timers quantify throughput, failure rates, and latency; structured logs capture individual events with correlatable context; trace spans break each request into its inventory, payment, and persistence steps; and the active-orders gauge reports in-flight work in real time.
A common concern when designing observable classes is the apparent tension between encapsulation (hiding internal state) and observability (exposing internal state). How do we reconcile these seemingly contradictory goals?
The key insight is that encapsulation hides implementation details from callers while observability exposes behavior to operators. These are different audiences with different needs:
Techniques for Preserving Encapsulation:
1. Expose Behavior, Not State
Rather than exposing internal variables, emit events about what the class did:
```java
// Bad: exposes internal state structure (and leaks the internal map)
metrics.gauge("cache.entries", cache.getInternalMap().size());

// Good: exposes a behavioral metric through the class's public surface
metrics.gauge("cache.size", () -> cache.size());
```
2. Use Opaque Identifiers
Include correlation IDs without exposing their meaning or generation strategy:
```java
// Good: opaque but correlatable
log.info("Request processed", kv("requestId", request.getId()));
```
3. Abstract Instrumentation
Create observability facades that hide instrumentation details:
```java
public interface ObservabilityFacade {
    void recordSuccess(String operation);
    void recordFailure(String operation, Exception e);
    void recordLatency(String operation, Duration duration);
}
```
This allows changing instrumentation implementation (Prometheus to Datadog) without touching business logic.
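For illustration, here is a minimal in-memory implementation of such a facade. This is a hypothetical sketch (the class name and counter storage are invented here); a production version would delegate to Micrometer, a Prometheus client, or a vendor SDK behind the same interface:

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Same contract as in the text, repeated so this sketch is self-contained
interface ObservabilityFacade {
    void recordSuccess(String operation);
    void recordFailure(String operation, Exception e);
    void recordLatency(String operation, Duration duration);
}

// In-memory backend, handy for tests; swap in a metrics-library-backed
// implementation in production without touching any callers
class InMemoryObservability implements ObservabilityFacade {
    private final Map<String, LongAdder> successes = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> failures = new ConcurrentHashMap<>();
    private final Map<String, Duration> lastLatency = new ConcurrentHashMap<>();

    @Override public void recordSuccess(String operation) {
        successes.computeIfAbsent(operation, k -> new LongAdder()).increment();
    }
    @Override public void recordFailure(String operation, Exception e) {
        failures.computeIfAbsent(operation, k -> new LongAdder()).increment();
    }
    @Override public void recordLatency(String operation, Duration duration) {
        lastLatency.put(operation, duration);
    }

    long successCount(String operation) {
        LongAdder a = successes.get(operation);
        return a == null ? 0 : a.sum();
    }
    long failureCount(String operation) {
        LongAdder a = failures.get(operation);
        return a == null ? 0 : a.sum();
    }
}
```

Because business logic depends only on the interface, tests can assert against the in-memory counts directly instead of mocking a metrics library.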
Consider defining an Observable interface that standardizes how classes expose their state for monitoring. This creates a contract between classes and observability infrastructure, making instrumentation consistent and predictable across your codebase.
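One possible shape for such a contract is sketched below. The interface and class names are illustrative, not from any framework; the key idea is that a class publishes a read-only snapshot of operationally relevant numbers without revealing its internal structure:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical contract between classes and observability infrastructure
interface Observable {
    String observableName();
    Map<String, Object> observabilitySnapshot();
}

// Example: a cache exposes size and hit/miss counts, never its internal map
class SimpleCache implements Observable {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    String get(String key) {
        String value = entries.get(key);
        if (value != null) hits.incrementAndGet(); else misses.incrementAndGet();
        return value;
    }

    void put(String key, String value) { entries.put(key, value); }

    @Override public String observableName() { return "simple-cache"; }

    @Override public Map<String, Object> observabilitySnapshot() {
        // Immutable copy: callers can inspect state but cannot mutate it
        return Map.of("size", entries.size(),
                      "hits", hits.get(),
                      "misses", misses.get());
    }
}
```

Infrastructure code can then iterate over every `Observable` in the application and export each snapshot to gauges or an admin endpoint, keeping instrumentation uniform.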
Embedding observability directly in business logic clutters code and creates tight coupling. Several patterns exist to separate concerns while maintaining comprehensive instrumentation.
Inject observability infrastructure as dependencies, enabling substitution for testing and configuration flexibility:
```java
public class UserService {
    private final Metrics metrics;
    private final Logger log;

    // Injected observability concerns
    public UserService(Metrics metrics, LoggerFactory loggerFactory) {
        this.metrics = metrics;
        this.log = loggerFactory.getLogger(UserService.class);
    }

    public User createUser(CreateUserRequest request) {
        return metrics.timed("user.create", () -> {
            log.info("Creating user", kv("email", request.getEmail()));
            // Business logic that creates and returns the User...
        });
    }
}
```
Advantages: Testable, flexible, explicit dependencies
Disadvantages: Adds constructor parameters, requires DI framework
Use Dependency Injection as the default for most services. Apply AOP for cross-cutting concerns with consistent behavior (timing all controller methods). Use Decorators when you need fine-grained control or when working with interfaces from external libraries.
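As a sketch of the decorator option (all names here are illustrative): business logic lives in a plain implementation with no instrumentation at all, and a decorator implementing the same interface adds timing around each call:

```java
import java.util.concurrent.atomic.AtomicLong;

interface UserLookup {
    String findUser(String id);
}

// Plain implementation: contains no observability code whatsoever
class DatabaseUserLookup implements UserLookup {
    @Override public String findUser(String id) {
        return "user-" + id;  // stand-in for a real database query
    }
}

// Decorator: records call counts and latency without touching business logic
class TimedUserLookup implements UserLookup {
    private final UserLookup delegate;
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong calls = new AtomicLong();

    TimedUserLookup(UserLookup delegate) { this.delegate = delegate; }

    @Override public String findUser(String id) {
        long start = System.nanoTime();
        try {
            return delegate.findUser(id);
        } finally {
            // finally ensures timing is recorded even when the delegate throws
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
        }
    }

    long callCount() { return calls.get(); }
    long totalNanos() { return totalNanos.get(); }
}
```

Callers depend only on `UserLookup`, so the timed wrapper can be added or removed entirely in wiring code; in a real system the decorator would publish to a metrics registry rather than hold its own counters.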
Some classes maintain significant internal state—caches, connection pools, queues, buffers. This state directly impacts system behavior and must be visible to operators.
```java
public class ObservableConnectionPool implements ConnectionPool {
    private final BlockingQueue<Connection> availableConnections;
    private final Set<Connection> borrowedConnections;
    private final PoolConfig config;

    // Event metrics
    private final Counter borrowSuccessCounter;
    private final Counter borrowFailureCounter;
    private final Counter exhaustedCounter;
    private final Timer borrowLatency;
    private final Timer waitTime;

    public ObservableConnectionPool(PoolConfig config, MeterRegistry registry) {
        this.config = config;
        this.availableConnections = new ArrayBlockingQueue<>(config.getMaxSize());
        this.borrowedConnections = ConcurrentHashMap.newKeySet();

        // Register gauges as real-time state reporters; the registry polls
        // these functions itself, so nothing needs to be stored in a field
        registry.gauge("pool.connections.available",
            Tags.of("pool", config.getName()),
            this, pool -> pool.availableConnections.size());
        registry.gauge("pool.connections.borrowed",
            Tags.of("pool", config.getName()),
            this, pool -> pool.borrowedConnections.size());

        this.borrowSuccessCounter = registry.counter("pool.borrow.success", "pool", config.getName());
        this.borrowFailureCounter = registry.counter("pool.borrow.failure", "pool", config.getName());
        this.exhaustedCounter = registry.counter("pool.exhausted", "pool", config.getName());
        this.borrowLatency = registry.timer("pool.borrow.latency", "pool", config.getName());
        // A Timer captures the wait-time distribution (histogram/percentiles)
        this.waitTime = registry.timer("pool.wait.time", "pool", config.getName());
    }

    @Override
    public Connection borrow(Duration timeout) {
        return borrowLatency.record(() -> {
            long startWait = System.nanoTime();
            try {
                Connection conn = availableConnections.poll(
                    timeout.toMillis(), TimeUnit.MILLISECONDS);
                waitTime.record(System.nanoTime() - startWait, TimeUnit.NANOSECONDS);

                if (conn == null) {
                    exhaustedCounter.increment();
                    borrowFailureCounter.increment();
                    throw new PoolExhaustedException("No connections available");
                }

                borrowedConnections.add(conn);
                borrowSuccessCounter.increment();
                return conn;

            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                borrowFailureCounter.increment();
                throw new PoolException("Interrupted while waiting", e);
            }
        });
    }

    // Expose pool state for debugging/admin endpoints
    public PoolStats getStats() {
        return new PoolStats(
            availableConnections.size(),
            borrowedConnections.size(),
            config.getMaxSize(),
            exhaustedCounter.count(),
            borrowSuccessCounter.count()
        );
    }
}
```

We've established the foundational principles for designing classes that expose their behavior for operational visibility: instrument strategically rather than exhaustively, preserve encapsulation by exposing behavior instead of internal structure, separate observability concerns from business logic through injection, facades, or decorators, and make significant internal state (pools, caches, queues) visible through gauges and snapshot APIs.
What's Next:
Now that we understand how to design observable classes, the next page focuses on Metrics and Monitoring Hooks—the specific techniques for quantifying class behavior through counters, gauges, timers, and histograms, and how to design classes with built-in metric emission.
You now understand the principles and patterns for designing observable classes. This foundation enables building systems where behavior is visible, problems are diagnosable, and operators have the information they need to maintain reliability at scale.