A metric name is an interface contract. Once you expose a metric, dashboards and alerts depend on its name, labels, and semantics. Renaming a metric means updating every query, dashboard, and alert that references it. This is why thoughtful naming from the start is one of the highest-ROI investments in observability.
Beyond avoiding churn, good naming enables discoverability. Engineers should be able to guess metric names based on conventions. When someone wants to know HTTP error rates, they should know to look for http_*_total with a status label—without reading documentation.
This page covers established naming conventions from the Prometheus community, anti-patterns to avoid, and strategies for maintaining consistency across large organizations.
By the end of this page, you will understand metric naming best practices: base units, suffixes for metric types, label usage, namespace prefixes, and documentation requirements. You'll be able to design metric names that are consistent, predictable, and maintainable at scale.
A well-designed metric name communicates what is being measured at a glance. The Prometheus naming convention follows a structured pattern:
<namespace>_<subsystem>_<name>_<unit>_<suffix>
Each component serves a specific purpose:
| Component | Purpose | Examples |
|---|---|---|
| namespace | Application or organization prefix | myapp_, prometheus_, node_ |
| subsystem | Functional area within application | http_, db_, cache_, queue_ |
| name | What is actually being measured | requests_, bytes_, duration_ |
| unit | The base unit of measurement | seconds_, bytes_, ratio_ |
| suffix | Metric type indicator | _total, _count, _sum, _bucket |
Complete Examples:
| Metric Name | Breakdown |
|---|---|
http_requests_total | http (subsystem) + requests (name) + total (suffix) |
myapp_db_query_duration_seconds | myapp (namespace) + db (subsystem) + query_duration (name) + seconds (unit) |
node_disk_read_bytes_total | node (namespace) + disk (subsystem) + read_bytes (name) + total (suffix) |
process_cpu_seconds_total | process (namespace) + cpu (subsystem) + seconds (unit) + total (suffix) |
Character Rules:
Metric and label names must match [a-zA-Z_][a-zA-Z0-9_]* (colons are also valid in metric names but are reserved for recording rules). Names beginning with a double underscore (__) are reserved for internal use.

A good naming convention means engineers can guess metric names. If they know your HTTP metrics follow 'http_<action>_<unit>_total', they can guess http_requests_total, http_response_bytes_total, and http_errors_total without looking at documentation.
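If you assemble names in code, client_golang's prometheus.BuildFQName helper joins namespace, subsystem, and name with underscores, which makes it easy to apply the pattern consistently. A minimal sketch (the myapp names are placeholders):

```go
package main

import (
    "fmt"

    "github.com/prometheus/client_golang/prometheus"
)

func main() {
    // BuildFQName joins non-empty components with underscores:
    // namespace_subsystem_name
    fmt.Println(prometheus.BuildFQName("myapp", "http", "requests_total"))
    // myapp_http_requests_total

    // Empty components are skipped, so a library without a subsystem
    // still produces a valid name.
    fmt.Println(prometheus.BuildFQName("myapp", "", "build_info"))
    // myapp_build_info
}
```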
One of the most important conventions is using base units consistently. Using base units eliminates confusion and enables mathematical operations without conversion.
The Base Unit Rule:
Always use the fundamental SI base unit, not derived units:
| Measurement | Use (Base Unit) | Avoid (Derived Units) |
|---|---|---|
| Time | seconds | milliseconds, microseconds |
| Data size | bytes | kilobytes, megabytes |
| Temperature | celsius | fahrenheit |
| Ratio | ratio (0-1) | percentage (0-100) |
Why This Matters:
Consider mixing units in queries:
# If some metrics are in seconds and some in milliseconds:
http_request_duration_seconds{service="A"} + db_query_latency_ms{service="A"}
# This produces meaningless results!
# With consistent base units:
http_request_duration_seconds{service="A"} + db_query_duration_seconds{service="A"}
# Clean addition, correct results
```go
// CORRECT: Base units
var (
    // Time in seconds
    requestDuration = prometheus.NewHistogram(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request latency in seconds",
            Buckets: []float64{.001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
        },
    )

    // Size in bytes
    responseSize = prometheus.NewHistogram(
        prometheus.HistogramOpts{
            Name:    "http_response_size_bytes",
            Help:    "HTTP response size in bytes",
            Buckets: prometheus.ExponentialBuckets(100, 10, 6),
        },
    )

    // Ratio (0.0 to 1.0)
    cacheHitRatio = prometheus.NewGauge(
        prometheus.GaugeOpts{
            Name: "cache_hit_ratio",
            Help: "Cache hit ratio (0.0 to 1.0)",
        },
    )
)

// INCORRECT: Don't do this
var (
    // DON'T use milliseconds
    requestDurationMs = prometheus.NewHistogram(
        prometheus.HistogramOpts{
            Name: "http_request_duration_ms", // Wrong!
            // ...
        },
    )

    // DON'T use percentages
    cacheHitPercent = prometheus.NewGauge(
        prometheus.GaugeOpts{
            Name: "cache_hit_percent", // Wrong!
            // ...
        },
    )
)
```

Base units don't mean you display base units. Grafana (and similar tools) can convert seconds to milliseconds or bytes to megabytes for display. Store in base units; transform in presentation. This keeps storage consistent while allowing flexible visualization.
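In instrumentation code, the conversion to base units usually happens at observation time. A minimal, runnable sketch under the conventions above (the port and handler are illustrative):

```go
package main

import (
    "log"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Help:    "HTTP request latency in seconds",
    Buckets: prometheus.DefBuckets,
})

func handler(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    w.Write([]byte("hello"))

    // time.Duration.Seconds() yields float64 seconds, the base unit,
    // whether the request took microseconds or minutes.
    requestDuration.Observe(time.Since(start).Seconds())
}

func main() {
    http.HandleFunc("/", handler)
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```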
Suffixes indicate the metric type, enabling correct usage without checking documentation. Each metric type has expected suffixes:
| Metric Type | Suffix | Example | Notes |
|---|---|---|---|
| Counter | _total | http_requests_total | Always use _total for counters |
| Gauge | (none or _<unit>) | temperature_celsius | No special suffix needed |
| Histogram bucket | _bucket | http_request_duration_seconds_bucket | Auto-generated with {le="..."} |
| Histogram sum | _sum | http_request_duration_seconds_sum | Auto-generated, total of observations |
| Histogram count | _count | http_request_duration_seconds_count | Auto-generated, number of observations |
| Summary quantile | (none) | go_gc_duration_seconds | With {quantile="..."} |
| Info metric | _info | build_info | Value always 1, labels carry data |
Counter Naming:
<namespace>_<subsystem>_<name>_total
http_requests_total
errors_total
processed_bytes_total
cache_hits_total
Gauge Naming:
Gauges don't need a suffix, but include the unit:
temperature_celsius
memory_usage_bytes
queue_depth
active_connections
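A gauge carries its unit in the name and is either set explicitly or sampled via a callback at scrape time. A sketch under those conventions (the queue channel is a hypothetical stand-in for a real work queue):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var queue = make(chan []byte, 1024) // hypothetical in-memory work queue

// Set-style gauge: updated explicitly by the application.
var memoryUsageBytes = prometheus.NewGauge(prometheus.GaugeOpts{
    Name: "memory_usage_bytes",
    Help: "Resident memory used by the process in bytes",
})

// Callback-style gauge: sampled at scrape time, no extra bookkeeping.
var queueDepth = prometheus.NewGaugeFunc(
    prometheus.GaugeOpts{
        Name: "queue_depth",
        Help: "Number of messages currently waiting in the queue",
    },
    func() float64 { return float64(len(queue)) },
)

func init() {
    prometheus.MustRegister(memoryUsageBytes, queueDepth)
}
```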
Histogram Naming:
Histograms auto-generate three series. You name the base; Prometheus adds suffixes:
# You define:
http_request_duration_seconds
# Prometheus creates:
http_request_duration_seconds_bucket{le="0.1"}
http_request_duration_seconds_bucket{le="0.5"}
http_request_duration_seconds_bucket{le="+Inf"}
http_request_duration_seconds_sum
http_request_duration_seconds_count
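Because the _bucket, _sum, and _count series are generated for you, queries only ever reference the base name plus the expected suffix. A sketch of the definition and the kind of PromQL it enables (queries shown as comments):

```go
package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// You name the base; the client library and Prometheus expose
// http_request_duration_seconds_bucket/_sum/_count from it.
var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Help:    "HTTP request latency in seconds",
    Buckets: prometheus.DefBuckets,
})

// Typical queries against the generated series:
//
//   p99 latency:
//     histogram_quantile(0.99,
//       rate(http_request_duration_seconds_bucket[5m]))
//
//   Average latency:
//     rate(http_request_duration_seconds_sum[5m])
//       / rate(http_request_duration_seconds_count[5m])
```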
```go
// Info metrics: expose metadata as labels
// Value is always 1; labels carry the information

var buildInfo = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "myapp_build_info",
        Help: "Build information about the running instance",
    },
    []string{"version", "commit_sha", "build_date", "go_version"},
)

func init() {
    // Set value to 1, labels carry the actual info
    buildInfo.WithLabelValues(
        "1.2.3",
        "abc123def",
        "2024-01-15",
        runtime.Version(),
    ).Set(1)
}

// Query in PromQL:
//   myapp_build_info{version="1.2.3"}
//   Returns 1 if this version is running

// Join with other metrics:
//   rate(http_requests_total[5m])
//     * on(instance) group_left(version)
//     myapp_build_info
```

Never create a counter without the _total suffix. 'http_requests' (without _total) could be misinterpreted as a gauge. The suffix signals that rate() should be applied and that values only increase.
Labels are the dimensionality mechanism in Prometheus—they turn one metric into many time series. Carefully chosen labels enable powerful queries; poorly chosen labels cause cardinality explosions.
The Golden Rule:
Labels should have a bounded, small set of possible values.
Every unique combination of labels creates a distinct time series. Labels with unbounded cardinality (user IDs, request IDs, email addresses) can create millions of series, overwhelming Prometheus.
Cardinality Impact:
Let's calculate series created by label choices:
# Good: Bounded labels
http_requests_total{method, status_code, endpoint}
5 methods × 10 status codes × 50 endpoints = 2,500 series
# Bad: Unbounded labels
http_requests_total{method, status_code, user_id}
5 methods × 10 status codes × 1,000,000 users = 50,000,000 series!
50 million series would overwhelm even the largest Prometheus deployment.
Label Naming:
- Use snake_case: status_code, not statusCode
- Prefer the more specific name: http_status_code over status if ambiguity exists
- Use true/false for boolean values, not yes/no or 1/0
```go
// GOOD: Bounded, meaningful labels
var httpRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total HTTP requests",
    },
    []string{"method", "endpoint", "status_code"},
)

// GOOD: Normalize unbounded values
func normalizeEndpoint(path string) string {
    // Convert /users/12345 → /users/:id
    // Convert /orders/abc-123/items/456 → /orders/:id/items/:id
    patterns := []struct {
        regex       *regexp.Regexp
        replacement string
    }{
        {regexp.MustCompile(`/users/[^/]+`), "/users/:id"},
        {regexp.MustCompile(`/orders/[^/]+`), "/orders/:id"},
        {regexp.MustCompile(`/items/[^/]+`), "/items/:id"},
    }

    result := path
    for _, p := range patterns {
        result = p.regex.ReplaceAllString(result, p.replacement)
    }
    return result
}

// Usage:
func handleRequest(r *http.Request, statusCode int) {
    normalizedPath := normalizeEndpoint(r.URL.Path)
    httpRequests.WithLabelValues(
        r.Method,
        normalizedPath, // Bounded!
        strconv.Itoa(statusCode),
    ).Inc()
}

// BAD: Don't do this
var httpRequestsBad = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total HTTP requests",
    },
    []string{"method", "path", "user_id"}, // path and user_id are unbounded!
)
```

For dynamic URL paths, use the route template (with placeholders) rather than the actual path: '/users/:id' instead of '/users/12345'. Most web frameworks can provide the matched route template for exactly this purpose.
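If your HTTP framework exposes the matched route, you can skip the regex normalization and label with the template directly. A sketch assuming gorilla/mux (other routers have similar accessors; status-code labeling is omitted for brevity):

```go
package main

import (
    "net/http"

    "github.com/gorilla/mux"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var httpRequests = promauto.NewCounterVec(
    prometheus.CounterOpts{Name: "http_requests_total", Help: "Total HTTP requests"},
    []string{"method", "endpoint"},
)

func instrument(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        next.ServeHTTP(w, r)

        // The matched template ("/users/{id}") is bounded even though
        // the concrete path ("/users/12345") is not.
        endpoint := r.URL.Path
        if route := mux.CurrentRoute(r); route != nil {
            if tmpl, err := route.GetPathTemplate(); err == nil {
                endpoint = tmpl
            }
        }
        httpRequests.WithLabelValues(r.Method, endpoint).Inc()
    })
}

func main() {
    r := mux.NewRouter()
    r.Handle("/users/{id}", instrument(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("ok"))
    })))
    http.ListenAndServe(":8080", r)
}
```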
As organizations grow, metrics from different teams and applications must coexist. Namespaces and subsystems provide organizational structure.
Namespace:
The namespace prefix identifies the application or domain. It prevents collisions when multiple teams contribute metrics:
mycompany_payments_* # Payments team
mycompany_auth_* # Auth team
mycompany_inventory_* # Inventory team
Subsystem:
Within a namespace, subsystems identify functional areas:
```
# Namespace: myapp
# Subsystems: http, db, cache, queue

myapp_http_requests_total
myapp_http_request_duration_seconds
myapp_http_response_size_bytes

myapp_db_query_duration_seconds
myapp_db_connections_active
myapp_db_errors_total

myapp_cache_hits_total
myapp_cache_misses_total
myapp_cache_size_bytes

myapp_queue_messages_total
myapp_queue_depth
myapp_queue_processing_seconds

# Standard exporters use their own namespaces:
node_*     # Node exporter (Linux system metrics)
process_*  # Process metrics (built into client libraries)
go_*       # Go runtime metrics
mysql_*    # MySQL exporter
redis_*    # Redis exporter
```

When to Use Namespaces:
| Scenario | Recommendation |
|---|---|
| Single small application | Optional, but recommended |
| Multiple internal applications | Required for each app |
| Shared libraries | Use library name as namespace |
| Exporters | Use target system name |
| Company-wide standards | Company or org prefix |
Prometheus Client Library Namespacing:
```go
package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// Define namespace and subsystems as constants
const (
    namespace      = "myapp"
    subsystemHTTP  = "http"
    subsystemDB    = "db"
    subsystemCache = "cache"
)

// HTTP metrics
var (
    HTTPRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Namespace: namespace,
            Subsystem: subsystemHTTP,
            Name:      "requests_total",
            Help:      "Total HTTP requests",
        },
        []string{"method", "endpoint", "status_code"},
    )
    // Result: myapp_http_requests_total

    HTTPRequestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Namespace: namespace,
            Subsystem: subsystemHTTP,
            Name:      "request_duration_seconds",
            Help:      "HTTP request duration in seconds",
            Buckets:   prometheus.DefBuckets,
        },
        []string{"method", "endpoint"},
    )
    // Result: myapp_http_request_duration_seconds
)

// Database metrics
var (
    DBQueryDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Namespace: namespace,
            Subsystem: subsystemDB,
            Name:      "query_duration_seconds",
            Help:      "Database query duration in seconds",
            Buckets:   []float64{.001, .005, .01, .05, .1, .5, 1},
        },
        []string{"query_type"},
    )
    // Result: myapp_db_query_duration_seconds
)
```

The specific naming scheme matters less than consistent application. Document your conventions, enforce them in code reviews, and use linting tools to check compliance. Inconsistent naming is far worse than imperfect but consistent naming.
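A sketch of how these package-level metrics might be used from a handler elsewhere in the application (the import path and route name are illustrative):

```go
package api

import (
    "net/http"
    "strconv"
    "time"

    "example.com/myapp/internal/metrics" // hypothetical path to the metrics package above
)

func ListUsers(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    status := http.StatusOK

    w.WriteHeader(status)
    w.Write([]byte("[]"))

    // Counter and histogram share method/endpoint label values, so request
    // rates and latencies can be correlated in PromQL.
    metrics.HTTPRequestsTotal.
        WithLabelValues(r.Method, "/users", strconv.Itoa(status)).Inc()
    metrics.HTTPRequestDuration.
        WithLabelValues(r.Method, "/users").Observe(time.Since(start).Seconds())
}
```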
Every metric should be self-documenting through its HELP string. The HELP text appears in /metrics output and metric browsers, serving as inline documentation.
Good HELP strings:
```go
// GOOD: Descriptive HELP strings
var (
    httpRequestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests received, labeled by method, endpoint, and status code.",
        },
        []string{"method", "endpoint", "status_code"},
    )

    httpRequestDurationSeconds = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request latency in seconds, from request received to response sent.",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "endpoint"},
    )

    dbConnectionsActive = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "db_connections_active",
            Help: "Number of currently active database connections by pool name. Includes connections in use and idle.",
        },
        []string{"pool"},
    )

    cacheHitRatio = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "cache_hit_ratio",
            Help: "Cache hit ratio over the last minute (0.0 = all misses, 1.0 = all hits). Computed as hits / (hits + misses).",
        },
        []string{"cache_name"},
    )
)

// BAD: Unhelpful HELP strings
var (
    badMetric1 = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "HTTP requests", // Too vague
        },
    )

    badMetric2 = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "errors_total",
            Help: "Total errors", // What kind? Where?
        },
    )
)
```

Documentation Beyond HELP:
For comprehensive observability, maintain additional documentation:
Example Metric Catalog Entry:
metric: http_request_duration_seconds
type: histogram
subsystem: http
owner: platform-team
description: |
Latency distribution for HTTP requests, measured from when the
request handler begins to when the response is fully written.
labels:
method: HTTP method (GET, POST, PUT, DELETE)
endpoint: Normalized route pattern (e.g., /users/:id)
buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
alerts:
- high_latency_p99 (warning at 1s, critical at 5s)
dashboards:
- service-overview
- api-latency-breakdown
Automate metric documentation from code. Parse your instrumentation to extract metric definitions and generate catalog entries. This keeps documentation synchronized with reality.
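One way to automate this is to read metric metadata straight from the client registry, since every registered collector already carries its name, HELP text, and type. A minimal sketch using prometheus.DefaultGatherer (the blank import path is a hypothetical stand-in for your own metrics package; the output format is up to you):

```go
package main

import (
    "fmt"
    "log"

    "github.com/prometheus/client_golang/prometheus"

    _ "example.com/myapp/internal/metrics" // hypothetical: registers the app's metrics
)

func main() {
    // Gather returns the metric families currently registered with the
    // default registry, including their HELP text and type.
    families, err := prometheus.DefaultGatherer.Gather()
    if err != nil {
        log.Fatal(err)
    }

    for _, mf := range families {
        fmt.Printf("metric: %s\n", mf.GetName())
        fmt.Printf("  type: %s\n", mf.GetType())
        fmt.Printf("  help: %s\n", mf.GetHelp())
    }
}
```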
Recognizing anti-patterns helps you avoid common naming mistakes:
- Encoding label values in names: http_requests_get_total instead of http_requests_total{method="GET"}. Labels allow filtering; metric names don't.
- Mixing units: response_time_ms mixed with query_duration_seconds. Pick base units (seconds) and stick to them.
- Missing counter suffix: http_requests for a counter. Always add _total for counters.
- Non-counters named *_total. Reserve the _total suffix for counters.
- Inconsistent casing: responseTime, response_time, and ResponseTime. Always use snake_case.
- Redundant label prefixes: http_requests_total{http_method="GET"}. The http_ in the label duplicates the metric name's prefix.
- Generic names: count, value, metric. Names should describe specifically what's measured.

Anti-Pattern: Encoding Labels in Names
# BAD: Label values encoded in metric name
http_requests_get_total
http_requests_post_total
http_requests_put_total
http_requests_delete_total
# GOOD: Use labels
http_requests_total{method="GET"}
http_requests_total{method="POST"}
http_requests_total{method="PUT"}
http_requests_total{method="DELETE"}
Why labels are better:
- One metric name to discover: http_requests_total
- Filtering when you need it: http_requests_total{method="GET"}
- Aggregation across any dimension: sum by(endpoint)(rate(http_requests_total[5m]))

| Anti-Pattern | Problem | Correct Version |
|---|---|---|
| request_count | No _total, vague | http_requests_total |
| latency_ms | Non-base unit | request_duration_seconds |
| cpu_percent | Percentage, not ratio | cpu_usage_ratio (0-1) |
| activeConnections | CamelCase | active_connections |
| requests_total{userId="123"} | Unbounded label | Drop user_id or aggregate |
| database_query_time | Vague, missing unit | db_query_duration_seconds |
Renaming existing metrics breaks all queries, dashboards, and alerts that reference them. For legacy metrics with bad names, consider adding correctly-named aliases rather than renaming. Document the migration path and phase out old names gradually.
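One low-risk way to alias in the instrumentation itself is to dual-write the old and new names for a transition window, then delete the legacy metric once dashboards and alerts reference the new one. A sketch (the legacy name is illustrative):

```go
package metrics

import (
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    // Legacy name kept temporarily so existing dashboards keep working.
    // Remove once queries and alerts reference the _seconds metric.
    dbQueryTimeLegacy = promauto.NewHistogram(prometheus.HistogramOpts{
        Name: "database_query_time",
        Help: "DEPRECATED: use db_query_duration_seconds instead.",
    })

    dbQueryDuration = promauto.NewHistogram(prometheus.HistogramOpts{
        Name: "db_query_duration_seconds",
        Help: "Database query duration in seconds.",
    })
)

// observeQuery records the same observation under both names during the
// migration window.
func observeQuery(d time.Duration) {
    seconds := d.Seconds()
    dbQueryTimeLegacy.Observe(seconds)
    dbQueryDuration.Observe(seconds)
}
```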
For organizations with many teams, establishing and enforcing naming standards is essential. Here's a practical approach:
```python
#!/usr/bin/env python3
"""Example metric naming linter - validates conventions"""

import re
from typing import List

# Naming convention rules
RULES = {
    "snake_case": r'^[a-z][a-z0-9_]*$',
    "valid_chars": r'^[a-zA-Z_:][a-zA-Z0-9_:]*$',
    "no_double_underscore": r'^(?!.*__)',
    "base_units": ['seconds', 'bytes', 'total', 'ratio', 'celsius'],
    "counter_suffix": '_total',
    "reserved_prefixes": ['__'],
}


def lint_metric_name(name: str) -> List[str]:
    """Check a metric name against conventions."""
    errors = []

    # Check snake_case
    if not re.match(RULES['snake_case'], name):
        errors.append(f"'{name}' should be snake_case (lowercase with underscores)")

    # Check valid characters
    if not re.match(RULES['valid_chars'], name):
        errors.append(f"'{name}' contains invalid characters")

    # Check for double underscores
    if not re.match(RULES['no_double_underscore'], name):
        errors.append(f"'{name}' should not contain double underscores")

    # Check for reserved prefixes
    for prefix in RULES['reserved_prefixes']:
        if name.startswith(prefix):
            errors.append(f"'{name}' uses reserved prefix '{prefix}'")

    return errors


def lint_label_name(name: str) -> List[str]:
    """Check a label name against conventions."""
    errors = []

    if not re.match(r'^[a-z][a-z0-9_]*$', name):
        errors.append(f"Label '{name}' should be snake_case")

    # Check for high-cardinality indicators
    high_cardinality_patterns = ['user_id', 'session_id', 'request_id',
                                 'transaction_id', 'ip_address', 'email']
    for pattern in high_cardinality_patterns:
        if pattern in name:
            errors.append(f"Label '{name}' may have unbounded cardinality")

    return errors


def recommend_improvements(name: str) -> List[str]:
    """Suggest improvements to metric names."""
    suggestions = []

    # Check for missing unit
    has_unit = any(name.endswith(f'_{unit}') for unit in RULES['base_units'])
    if not has_unit and not name.endswith('_info'):
        suggestions.append("Consider adding a unit suffix (e.g., _seconds, _bytes)")

    # Check counter suffix
    if 'count' in name or 'total' in name.split('_')[:-1]:
        if not name.endswith('_total'):
            suggestions.append("Counter metrics should end with '_total'")

    return suggestions


# Usage example
if __name__ == '__main__':
    test_metrics = [
        'http_requests_total',         # Good
        'responseTime',                # Bad: not snake_case
        'http_requests',               # Should be _total
        'user_session_seconds_total',  # Good
        '__internal_metric',           # Bad: reserved prefix
    ]

    for metric in test_metrics:
        errors = lint_metric_name(metric)
        suggestions = recommend_improvements(metric)
        print(f"\n{metric}:")
        if errors:
            print(f"  Errors: {errors}")
        if suggestions:
            print(f"  Suggestions: {suggestions}")
        if not errors and not suggestions:
            print("  ✓ Passes all checks")
```

Standards only work if adopted. Make compliance easy (provide tools and templates), visible (publish leaderboards), and valued (celebrate great observability). Technical enforcement without cultural buy-in leads to workarounds.
Metric naming may seem like a minor concern, but it's a force multiplier. Consistent naming enables discoverability, prevents confusion, and reduces cognitive load for everyone who uses your observability system.
- The pattern <namespace>_<subsystem>_<name>_<unit>_<suffix> provides consistency and discoverability.
- Counters end in _total; histograms auto-generate _bucket, _sum, and _count.

What's Next:
With naming conventions established, the next page addresses cardinality considerations—the single most common cause of observability system failures. You'll learn to calculate cardinality impact, identify problematic patterns, and design metrics that scale.
You now have a comprehensive understanding of metric naming conventions. These standards will serve you throughout your observability journey, making your metrics discoverable, consistent, and maintainable.