If metrics are the vital signs of your system—heart rate, temperature, blood pressure—then logs are the patient's medical history. They tell the story of what happened, when, why, and in what context. Metrics tell you something is wrong; logs tell you what went wrong.
Consider this scenario: Your latency dashboard shows a sudden spike. Requests that normally complete in 50ms are now taking 2 seconds. The metric alerts you to the problem, but it doesn't explain the cause. Is it a database query? A third-party API timeout? A lock contention issue? A bad deployment?
Logs answer these questions. They capture the individual events—the specific requests, errors, and state changes—that together tell the complete story of what your system is doing.
In this page, we'll examine logs as the second pillar of observability: what they are, how to structure them for maximum utility, how to aggregate and query them at scale, and the design patterns that make logging effective in production systems.
By the end of this page, you will understand logs not as an afterthought but as a first-class observability signal. You'll learn the distinction between unstructured and structured logging, master log level semantics, understand aggregation architectures, and internalize the practices that make logs useful during incident response.
At their core, logs are timestamped records of discrete events that occur within a system. Each log entry (or log line, log event, log message) represents something that happened at a specific point in time: a request arriving, a query completing, an error being thrown, a configuration being reloaded.
Logs differ fundamentally from metrics in their granularity and nature:
| Aspect | Metrics | Logs |
|---|---|---|
| Nature | Numerical measurements | Event records (text/structured) |
| Granularity | Aggregated (sampled at intervals) | Individual events |
| Cardinality | Bounded by labels | Unbounded (any string) |
| Query pattern | Mathematical operations, aggregations | Search, filtering, pattern matching |
| Storage cost | Lower (numerical, compressed) | Higher (text, verbose) |
| Use case | Alerting, dashboards, trends | Debugging, audit trails, forensics |
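To make the contrast concrete, here is a minimal sketch (not from the original text; it assumes `prom-client` and `pino`, and the names like `httpRequestsTotal` and `recordRequest` are ours) of the same "request completed" event recorded both ways: the metric only moves an aggregate counter with bounded labels, while the log captures the individual event with its full, high-cardinality context.

```typescript
// Sketch: the same event as a metric (aggregate) and as a log (individual record)
import client from 'prom-client';
import pino from 'pino';

const logger = pino();

// Metric: a bounded set of label values, aggregated over time
const httpRequestsTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
});

function recordRequest(method: string, route: string, status: number, userId: string, durationMs: number) {
  // Metric side: only the counter increments; userId and durationMs are not recorded here
  httpRequestsTotal.inc({ method, route, status: String(status) });

  // Log side: one record per event, carrying unbounded context for later search and filtering
  logger.info({ method, route, status, user_id: userId, duration_ms: durationMs, msg: 'Request completed' });
}
```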
The anatomy of a log entry:
A well-formed log entry contains several components:
```text
# Unstructured log (traditional format)
2024-01-08 10:23:45.123 INFO [checkout-service] User 12345 placed order ORD-789 for $299.99

# Semi-structured log (key-value pairs)
2024-01-08T10:23:45.123Z level=INFO service=checkout user_id=12345 order_id=ORD-789 amount=299.99 msg="Order placed successfully"

# Structured log (JSON)
{
  "timestamp": "2024-01-08T10:23:45.123Z",
  "level": "INFO",
  "service": "checkout-service",
  "host": "checkout-pod-abc123",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "message": "Order placed successfully",
  "user_id": "12345",
  "order_id": "ORD-789",
  "amount": 299.99,
  "currency": "USD",
  "payment_method": "credit_card",
  "duration_ms": 127
}
```

Always use structured logging (JSON or another machine-parseable format). Unstructured text logs are easy to write but nearly impossible to query effectively at scale: "find all errors for user 12345" becomes a regex nightmare with unstructured logs, but a simple filter with structured logs.
Log levels are not arbitrary classifications—they carry specific semantic meaning that guides both what to log and how urgently to investigate. Misusing log levels creates noise (everything is ERROR) or blindness (critical issues are INFO).
Here's the standard hierarchy and when to use each:

| Level | When to use |
|---|---|
| FATAL | The application cannot continue and is about to exit |
| ERROR | An operation failed unexpectedly and needs investigation |
| WARN | Something unexpected or degraded happened, but the system recovered |
| INFO | Normal operations worth recording (startup, requests, significant state changes) |
| DEBUG | Detailed diagnostic information, typically disabled in production |
| TRACE | Extremely fine-grained detail, enabled only for targeted debugging |
A common mistake is logging routine failures as ERROR. If a 'user not found' condition in an API is logged as ERROR, you'll see thousands of 'errors' that are actually normal behavior. Reserve ERROR for true failures—things that indicate bugs or system problems, not expected outcomes. Use appropriate levels to maintain signal clarity.
```typescript
// Correct log level usage examples

// FATAL: Application cannot continue
logger.fatal({
  err: dbConnectionError,
  msg: "Cannot connect to database after 10 retries, shutting down"
});
process.exit(1);

// ERROR: Operation failed unexpectedly
try {
  await processPayment(order);
} catch (err) {
  logger.error({
    err,
    orderId: order.id,
    userId: order.userId,
    msg: "Payment processing failed"
  });
  throw err;
}

// WARN: Recovered from issue, but noteworthy
const result = await fetchWithRetry(url, { retries: 3 });
if (result.attemptCount > 1) {
  logger.warn({
    url,
    attempts: result.attemptCount,
    msg: "Succeeded after retries"
  });
}

// INFO: Normal operation worth recording
logger.info({
  userId: user.id,
  action: "login",
  method: "oauth",
  provider: "google",
  msg: "User authenticated successfully"
});

// DEBUG: Detailed diagnostic info (usually disabled in prod)
logger.debug({
  query: sql,
  params,
  duration_ms: queryDuration,
  rows_returned: result.length,
  msg: "Database query executed"
});

// NOT an error - expected business logic outcome
const user = await findUser(userId);
if (!user) {
  // WRONG: logger.error("User not found");
  // CORRECT:
  logger.debug({ userId, msg: "User lookup returned no results" });
  return res.status(404).json({ error: "User not found" });
}
```

Structured logging—emitting logs as machine-parseable data structures (typically JSON)—fundamentally changes what you can do with your logs. Let's understand why it matters and how to implement it effectively.
Why structured logging wins:

- Queryable fields: filtering on `user_id` or `order_id` is an exact match against a field, not a regex against free text.
- Machine-friendly: aggregation pipelines, alerting rules, and dashboards can consume fields directly.
- Consistent context: the same field names appear across services, which makes cross-service correlation possible.
- Evolvable: adding a new field doesn't break downstream consumers the way changing a text format does.
Essential fields for structured logs:
Every log event should include a core set of fields that provide context without requiring the message to be parsed:
| Field | Purpose | Example |
|---|---|---|
| timestamp | When the event occurred (ISO 8601) | 2024-01-08T10:23:45.123Z |
| level | Severity classification | INFO, ERROR, WARN |
| message | Human-readable event description | Order placed successfully |
| service | Name of the emitting service | checkout-service |
| host | Hostname or pod name | checkout-pod-abc123 |
| trace_id | Distributed trace identifier | 4bf92f3577b34da6a3ce929d0e0e4736 |
| span_id | Span within the trace | 00f067aa0ba902b7 |
| environment | Deployment environment | production, staging |
| version | Application version/build | v1.2.3-abc123 |
| request_id | Unique request identifier | req_xyz789 |
```typescript
// TypeScript example: Setting up structured logging with Pino

import pino from 'pino';

// Create a base logger with standard fields
const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  // Format options for structured output
  formatters: {
    level: (label) => ({ level: label }),
    bindings: (bindings) => ({
      host: bindings.hostname,
      pid: bindings.pid,
    }),
  },
  // Add standard context to every log
  base: {
    service: 'checkout-service',
    version: process.env.APP_VERSION || 'unknown',
    environment: process.env.NODE_ENV || 'development',
  },
  // ISO timestamp format
  timestamp: pino.stdTimeFunctions.isoTime,
});

// Create child loggers with request-specific context
function createRequestLogger(req) {
  return logger.child({
    request_id: req.id,
    trace_id: req.headers['x-trace-id'],
    user_id: req.user?.id,
    method: req.method,
    path: req.path,
  });
}

// Usage in request handler
app.use((req, res, next) => {
  req.log = createRequestLogger(req);
  const start = Date.now();

  res.on('finish', () => {
    req.log.info({
      status: res.statusCode,
      duration_ms: Date.now() - start,
      msg: 'Request completed'
    });
  });

  next();
});

// In application code - all logs automatically include request context
async function processOrder(req, order) {
  req.log.info({
    order_id: order.id,
    amount: order.total,
    items: order.items.length,
    msg: 'Processing order'
  });

  // ... processing logic

  req.log.info({ order_id: order.id, msg: 'Order completed successfully' });
}
```

Use child loggers to propagate context without repeating fields. Create a child logger at request entry with request_id, user_id, and trace_id, then pass it through your code. Every log automatically includes this context without explicit repetition.
In distributed systems, logs are born across hundreds or thousands of containers, VMs, and serverless functions. Making sense of them requires centralized log aggregation: collecting, processing, storing, and querying logs from all sources in one place.
The standard log aggregation pipeline:

1. Collection: lightweight agents (Fluent Bit, Fluentd, Promtail) tail container log files or capture stdout on every host.
2. Processing: logs are parsed, enriched with metadata (Kubernetes labels, cluster, environment), filtered, and redacted.
3. Storage and indexing: events land in a backend such as Elasticsearch or Loki, indexed for retrieval.
4. Query and visualization: engineers search, filter, and build dashboards in tools like Kibana or Grafana.
Popular log aggregation stacks:
| Stack | Components | Best For | Trade-offs |
|---|---|---|---|
| ELK/Elastic | Elasticsearch, Logstash, Kibana | Full-text search, complex queries | Resource-intensive, operational complexity |
| EFK | Elasticsearch, Fluent Bit/Fluentd, Kibana | Kubernetes-native collection | Similar to ELK, lighter collection |
| Grafana Loki | Loki, Promtail, Grafana | Label-based indexing, cost-efficient | Less full-text capability, simpler queries |
| AWS CloudWatch | CloudWatch Logs, Insights | AWS-native, serverless-friendly | AWS lock-in, query limitations |
| Datadog/Splunk | Managed SaaS platforms | Ease of operation, features | Cost at scale, vendor lock-in |
Grafana Loki takes a different approach than Elasticsearch. Instead of indexing every word in every log, Loki indexes only the labels (like Prometheus). Log content is stored as compressed chunks. This dramatically reduces storage and operational cost, at the expense of less powerful full-text search. For most operational uses, label-based filtering (service, pod, level) is sufficient.
```ini
# Fluent Bit configuration for Kubernetes log collection
# This collects container logs and enriches them with K8s metadata

[SERVICE]
    Flush             1
    Log_Level         info
    Daemon            off
    Parsers_File      parsers.conf

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            docker
    Tag               kube.*
    Refresh_Interval  5
    Mem_Buf_Limit     50MB
    Skip_Long_Lines   On

[FILTER]
    Name                 kubernetes
    Match                kube.*
    Kube_URL             https://kubernetes.default.svc:443
    Kube_CA_File         /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File      /var/run/secrets/kubernetes.io/serviceaccount/token
    Merge_Log            On
    Merge_Log_Key        log_processed
    K8S-Logging.Parser   On
    K8S-Logging.Exclude  On

[FILTER]
    Name    modify
    Match   kube.*
    # Add cluster identifier
    Add     cluster production-us-east
    # Remove unnecessary fields to reduce size
    Remove  $.kubernetes.pod_id
    Remove  $.kubernetes.docker_id

[OUTPUT]
    Name                   loki
    Match                  kube.*
    Host                   loki.monitoring.svc
    Port                   3100
    Labels                 job=kubernetes, cluster=${cluster}, namespace=${kubernetes['namespace_name']}, pod=${kubernetes['pod_name']}, container=${kubernetes['container_name']}
    Line_Format            json
    Auto_Kubernetes_Labels off
```

Logs are expensive. Unlike metrics (small numerical values), logs contain rich textual data—often kilobytes per event. At scale, log storage and processing costs can dwarf the cost of the application infrastructure itself.
The cost equation:
A mid-sized deployment might generate 100 GB of logs per day. With 30-day retention, at $0.50/GB/month for storage and $0.10/GB for ingestion, that's roughly $1,500/month for storage plus $300/month for ingestion—about $1,800/month before query costs, and the number multiplies with longer retention, replication, and additional environments. Companies routinely find logging is 20-40% of their cloud bill.
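As a sanity check, here is that arithmetic as a small script; the volume, retention period, and prices are illustrative assumptions, not vendor quotes.

```typescript
// Back-of-the-envelope log cost estimate (illustrative numbers only)
const gbPerDay = 100;                // daily log volume
const retentionDays = 30;            // how long logs stay in hot storage
const storagePricePerGbMonth = 0.5;  // $/GB-month (assumed)
const ingestPricePerGb = 0.1;        // $/GB ingested (assumed)

const storedGb = gbPerDay * retentionDays;                  // ~3,000 GB resident at steady state
const storagePerMonth = storedGb * storagePricePerGbMonth;  // ~$1,500
const ingestPerMonth = gbPerDay * 30 * ingestPricePerGb;    // ~$300

console.log(
  `storage: $${storagePerMonth}/mo, ingestion: $${ingestPerMonth}/mo, ` +
  `total: $${storagePerMonth + ingestPerMonth}/mo (before query costs)`
);
```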
Cost control strategies: tier your retention (table below), sample high-volume events (example further down), drop known-noisy or low-value logs at the collector, and keep DEBUG/TRACE disabled in production.
| Tier | Retention | Storage | Access | Use Case |
|---|---|---|---|---|
| Hot | 7 days | Elasticsearch/Loki SSD | Interactive queries | Active debugging, incident response |
| Warm | 30 days | Elasticsearch/Loki HDD | Slower queries | Recent investigation, trend analysis |
| Cold | 1+ years | S3/Blob storage | Batch retrieval | Compliance, audit, forensics |
| Archive | 7+ years | Glacier/Archive tier | Rare retrieval | Legal/regulatory requirements |
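How these tiers are implemented depends on the backend. As one hedged example, an Elasticsearch-based stack can express hot-to-warm-to-delete transitions as an index lifecycle management (ILM) policy; the sketch below assumes rollover-managed log indices, and the policy name, thresholds, and endpoint URL are our own placeholders. Cold and archive tiers would typically live in object storage (S3/Glacier) outside the cluster.

```typescript
// Sketch: registering an Elasticsearch ILM policy for tiered log retention
// (policy name, thresholds, and cluster URL are assumptions; auth is omitted)
const policy = {
  policy: {
    phases: {
      hot: {
        actions: {
          // Roll to a new index daily or at 50 GB, whichever comes first
          rollover: { max_age: '1d', max_size: '50gb' },
        },
      },
      warm: {
        min_age: '7d',
        actions: {
          forcemerge: { max_num_segments: 1 },  // compact for cheaper, slower queries
          set_priority: { priority: 50 },       // recover after hot indices
        },
      },
      delete: {
        min_age: '30d',
        actions: { delete: {} },  // older data survives only in object-storage exports
      },
    },
  },
};

await fetch('https://elasticsearch.internal:9200/_ilm/policy/logs-retention', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(policy),
});
```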
```typescript
// Implementing log sampling for high-volume events

import pino from 'pino';
import crypto from 'crypto';

// Create logger with sampling configuration
const logger = pino({
  level: 'info',
});

// Sampling configuration
const SAMPLE_RATES = {
  'request.success': 0.1,  // Log 10% of successful requests
  'request.error': 1.0,    // Log 100% of errors
  'health.check': 0.01,    // Log 1% of health checks
  'cache.hit': 0.05,       // Log 5% of cache hits
  'cache.miss': 0.2,       // Log 20% of cache misses
};

// Deterministic sampling based on request ID
// This ensures the same request is always sampled the same way
// across all services (important for distributed tracing)
function shouldSample(eventType: string, requestId: string): boolean {
  const rate = SAMPLE_RATES[eventType] ?? 1.0;
  if (rate >= 1.0) return true;
  if (rate <= 0) return false;

  // Hash the request ID to get a consistent random value
  const hash = crypto.createHash('md5').update(requestId).digest();
  const value = hash.readUInt32BE(0) / 0xFFFFFFFF;
  return value < rate;
}

// Wrapper for sampled logging
function logSampled(level: string, eventType: string, data: any) {
  const requestId = data.request_id || crypto.randomUUID();

  if (shouldSample(eventType, requestId)) {
    logger[level]({
      ...data,
      event_type: eventType,
      sampled: true,
      sample_rate: SAMPLE_RATES[eventType] ?? 1.0,
    });
  }
}

// Usage
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const eventType = res.statusCode >= 400 ? 'request.error' : 'request.success';
    logSampled('info', eventType, {
      request_id: req.id,
      method: req.method,
      path: req.path,
      status: res.statusCode,
      duration_ms: Date.now() - start,
      msg: 'Request completed',
    });
  });
  next();
});
```

In a monolith, following a request's path through logs is straightforward—everything is in one process. In microservices, a single user action might touch 10 different services, each generating independent logs. Without correlation, you're looking at 10 disconnected log files with no way to connect them.
Correlation IDs solve this problem.
A correlation ID (also called request ID or trace ID) is a unique identifier generated at the system's entry point (API gateway, load balancer, or first service) and propagated through every subsequent service call. Every log from every service includes this ID, enabling you to query 'show me all logs for request abc123' and see the complete picture.
```typescript
// Correlation ID propagation across services

// ========== API Gateway / Entry Point ==========
app.use((req, res, next) => {
  // Generate or accept correlation ID from incoming request
  const correlationId =
    req.headers['x-correlation-id'] ||
    req.headers['x-request-id'] ||
    crypto.randomUUID();

  // Make it available throughout this request
  req.correlationId = correlationId;

  // Include in response for client correlation
  res.setHeader('x-correlation-id', correlationId);

  // Create logger with correlation context
  req.log = logger.child({
    correlation_id: correlationId,
    trace_id: req.headers['traceparent']?.split('-')[1] || correlationId,
  });

  next();
});

// ========== When calling downstream services ==========
async function callPaymentService(order, correlationId) {
  // W3C traceparent requires a 32-character lowercase hex trace ID,
  // so strip the dashes from the UUID before embedding it
  const traceId = correlationId.replace(/-/g, '');
  const spanId = crypto.randomBytes(8).toString('hex');

  const response = await fetch('http://payment-service/process', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-correlation-id': correlationId,            // Propagate!
      'traceparent': `00-${traceId}-${spanId}-01`,  // W3C trace context
    },
    body: JSON.stringify(order),
  });

  return response.json();
}

// ========== Downstream service receives and continues ==========
// (In the payment service)
app.use((req, res, next) => {
  // Extract correlation ID from upstream
  const correlationId = req.headers['x-correlation-id'];

  req.log = logger.child({
    correlation_id: correlationId,
    service: 'payment-service',
    upstream: req.headers['x-caller-service'],
  });

  req.log.info({ msg: 'Request received' });
  next();
});
```

Context propagation patterns:
The W3C Trace Context specification defines a standard way to propagate trace information via HTTP headers ('traceparent' and 'tracestate'). Using this standard ensures compatibility with OpenTelemetry and other distributed tracing systems. A single trace ID flows through logs and traces, connecting all observability signals.
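For reference, a traceparent header carries four dash-separated fields: a version, a 32-hex-character trace ID, a 16-hex-character parent span ID, and trace flags. The small parsing sketch below is ours (the helper name and interface are not from any library) and shows how a service can lift the trace ID into its logger.

```typescript
// Parse a W3C traceparent header, e.g.
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
interface TraceContext {
  version: string;
  traceId: string;   // 32 lowercase hex chars, shared by the whole request
  parentId: string;  // 16 lowercase hex chars, the caller's span
  sampled: boolean;  // lowest bit of the trace-flags field
}

function parseTraceparent(header: string | undefined): TraceContext | null {
  if (!header) return null;
  const match = header.match(/^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/);
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 };
}

// Usage sketch: attach the parsed trace ID to the request logger so logs and traces share an ID
// const ctx = parseTraceparent(req.headers['traceparent'] as string | undefined);
// req.log = logger.child({ trace_id: ctx?.traceId, parent_span_id: ctx?.parentId });
```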
Querying correlated logs:
With proper correlation, debugging becomes dramatically easier:
```text
# Elasticsearch/Kibana query - All logs for a specific request
correlation_id: "abc-123-def-456"

# Filter by service within that request
correlation_id: "abc-123-def-456" AND service: "payment-service"

# Loki LogQL - All logs for a request across all services
{cluster="production"} |= "abc-123-def-456"

# With JSON parsing in Loki
{cluster="production"} | json | correlation_id = "abc-123-def-456"

# Show only errors for this request
{cluster="production"} | json | correlation_id = "abc-123-def-456" | level = "ERROR"

# Timeline reconstruction - seeing the flow
{cluster="production"} | json | correlation_id = "abc-123-def-456" | line_format "{{.timestamp}} [{{.service}}] {{.message}}"
```

Logs are a double-edged sword. They provide invaluable insight but can also become a liability. Logs often contain sensitive information—and that information can persist far longer than you intend.
What accidentally ends up in logs:

- Credentials: passwords, API keys, bearer tokens, session cookies
- Authorization headers and JWTs captured by full request/response logging
- Personal data (PII): email addresses, names, phone numbers, SSNs
- Payment data: credit card numbers in order payloads or error dumps
- Internal details useful to attackers: stack traces, connection strings, internal hostnames
Numerous breaches have been traced to logs. In one famous case, a company logged full HTTP requests including Authorization headers. Those logs were stored in an insufficiently protected log aggregation system. Attackers accessed the logs and extracted tokens to impersonate users. The fix was easy; the damage was done.
Log security best practices:
```typescript
// Implementing log redaction

import pino from 'pino';

const REDACT_PATTERNS = [
  // Credit card numbers
  { pattern: /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g, replacement: '[REDACTED_CC]' },
  // Email addresses
  { pattern: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, replacement: '[REDACTED_EMAIL]' },
  // AWS keys
  { pattern: /AKIA[0-9A-Z]{16}/g, replacement: '[REDACTED_AWS_KEY]' },
  // JWT tokens
  { pattern: /eyJ[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*/g, replacement: '[REDACTED_JWT]' },
  // Authorization headers
  { pattern: /Bearer [a-zA-Z0-9._-]+/g, replacement: 'Bearer [REDACTED]' },
];

const SENSITIVE_FIELDS = ['password', 'secret', 'token', 'apiKey', 'authorization', 'creditCard', 'ssn'];

function redactSensitiveData(obj: any): any {
  if (typeof obj === 'string') {
    let result = obj;
    for (const { pattern, replacement } of REDACT_PATTERNS) {
      result = result.replace(pattern, replacement);
    }
    return result;
  }

  if (Array.isArray(obj)) {
    return obj.map(redactSensitiveData);
  }

  if (obj && typeof obj === 'object') {
    const result: any = {};
    for (const [key, value] of Object.entries(obj)) {
      // Completely redact known sensitive fields
      if (SENSITIVE_FIELDS.some(f => key.toLowerCase().includes(f.toLowerCase()))) {
        result[key] = '[REDACTED]';
      } else {
        result[key] = redactSensitiveData(value);
      }
    }
    return result;
  }

  return obj;
}

// Pino redaction hook
const logger = pino({
  level: 'info',
  hooks: {
    logMethod(inputArgs, method) {
      if (inputArgs.length >= 1) {
        inputArgs[0] = redactSensitiveData(inputArgs[0]);
      }
      return method.apply(this, inputArgs);
    },
  },
  // Built-in redaction for known paths
  redact: {
    paths: ['req.headers.authorization', 'req.headers.cookie', '*.password', '*.secret'],
    censor: '[REDACTED]',
  },
});
```

We've explored logs comprehensively—from their fundamental nature to practical implementation patterns. Let's consolidate the key takeaways:

- Emit structured (JSON) logs with a consistent set of core fields: timestamp, level, service, trace and correlation IDs.
- Use log levels semantically; reserve ERROR for genuine failures so the signal stays clean.
- Centralize logs through an aggregation pipeline, and control cost with sampling and tiered retention.
- Propagate a correlation ID (ideally the W3C trace ID) through every service so a single request can be reconstructed end to end.
- Treat logs as sensitive data: redact secrets and PII before they leave the process, and restrict access to the log store.
What's next:
Metrics tell you what is happening—quantitative measurements over time. Logs tell you what happened—the events and context behind the numbers. But when a request flows through 10 services, neither metrics nor logs alone show you the complete path.
Next, we'll explore Traces—the third pillar of observability. Distributed tracing captures the full journey of a request through your system, connecting the dots between services and revealing where time is spent.
You now understand logs as the narrative companion to metrics. Structured logging with proper correlation and security practices enables effective debugging and incident response. Next, we'll see how traces complete the observability picture by showing request paths through distributed systems.