A metric name is an interface contract. Once you expose a metric, dashboards and alerts depend on its name, labels, and semantics. Renaming a metric means updating every query, dashboard, and alert that references it. This is why thoughtful naming from the start is one of the highest-ROI investments in observability.
Beyond avoiding churn, good naming enables discoverability. Engineers should be able to guess metric names based on conventions. When someone wants to know HTTP error rates, they should know to look for http_*_total with a status label—without reading documentation.
This page covers established naming conventions from the Prometheus community, anti-patterns to avoid, and strategies for maintaining consistency across large organizations.
By the end of this page, you will understand metric naming best practices: base units, suffixes for metric types, label usage, namespace prefixes, and documentation requirements. You'll be able to design metric names that are consistent, predictable, and maintainable at scale.
A well-designed metric name communicates what is being measured at a glance. The Prometheus naming convention follows a structured pattern:
<namespace>_<subsystem>_<name>_<unit>_<suffix>
Each component serves a specific purpose:
| Component | Purpose | Examples |
|---|---|---|
| namespace | Application or organization prefix | myapp_, prometheus_, node_ |
| subsystem | Functional area within application | http_, db_, cache_, queue_ |
| name | What is actually being measured | requests_, bytes_, duration_ |
| unit | The base unit of measurement | seconds_, bytes_, ratio_ |
| suffix | Metric type indicator | _total, _count, _sum, _bucket |
Complete Examples:
| Metric Name | Breakdown |
|---|---|
http_requests_total | http (subsystem) + requests (name) + total (suffix) |
myapp_db_query_duration_seconds | myapp (namespace) + db (subsystem) + query_duration (name) + seconds (unit) |
node_disk_read_bytes_total | node (namespace) + disk (subsystem) + read_bytes (name) + total (suffix) |
process_cpu_seconds_total | process (namespace) + cpu (subsystem) + seconds (unit) + total (suffix) |
Character Rules:
Metric and label names must match [a-zA-Z_][a-zA-Z0-9_]* (colons are also valid in metric names but are reserved for recording rules). Names beginning with a double underscore (__) are reserved for internal use.

A good naming convention means engineers can guess metric names. If they know your HTTP metrics follow 'http_<action>_<unit>_total', they can guess http_requests_total, http_response_bytes_total, and http_errors_total without looking at documentation.
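If you assemble names in code, client_golang's prometheus.BuildFQName helper joins namespace, subsystem, and name with underscores, which makes it easy to apply the pattern consistently. A minimal sketch (the myapp names are placeholders):

```go
package main

import (
    "fmt"

    "github.com/prometheus/client_golang/prometheus"
)

func main() {
    // BuildFQName joins non-empty components with underscores:
    // namespace_subsystem_name
    fmt.Println(prometheus.BuildFQName("myapp", "http", "requests_total"))
    // myapp_http_requests_total

    // Empty components are skipped, so a library without a subsystem
    // still produces a valid name.
    fmt.Println(prometheus.BuildFQName("myapp", "", "build_info"))
    // myapp_build_info
}
```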
One of the most important conventions is using base units consistently. Using base units eliminates confusion and enables mathematical operations without conversion.
The Base Unit Rule:
Always use the fundamental SI base unit, not derived units:
| Measurement | Use (Base Unit) | Avoid (Derived Units) |
|---|---|---|
| Time | seconds | milliseconds, microseconds |
| Data size | bytes | kilobytes, megabytes |
| Temperature | celsius | fahrenheit |
| Ratio | ratio (0-1) | percentage (0-100) |
Why This Matters:
Consider mixing units in queries:
# If some metrics are in seconds and some in milliseconds:
http_request_duration_seconds{service="A"} + db_query_latency_ms{service="A"}
# This produces meaningless results!
# With consistent base units:
http_request_duration_seconds{service="A"} + db_query_duration_seconds{service="A"}
# Clean addition, correct results
```go
// CORRECT: Base units
var (
    // Time in seconds
    requestDuration = prometheus.NewHistogram(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request latency in seconds",
            Buckets: []float64{.001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
        },
    )

    // Size in bytes
    responseSize = prometheus.NewHistogram(
        prometheus.HistogramOpts{
            Name:    "http_response_size_bytes",
            Help:    "HTTP response size in bytes",
            Buckets: prometheus.ExponentialBuckets(100, 10, 6),
        },
    )

    // Ratio (0.0 to 1.0)
    cacheHitRatio = prometheus.NewGauge(
        prometheus.GaugeOpts{
            Name: "cache_hit_ratio",
            Help: "Cache hit ratio (0.0 to 1.0)",
        },
    )
)

// INCORRECT: Don't do this
var (
    // DON'T use milliseconds
    requestDurationMs = prometheus.NewHistogram(
        prometheus.HistogramOpts{
            Name: "http_request_duration_ms", // Wrong!
            // ...
        },
    )

    // DON'T use percentages
    cacheHitPercent = prometheus.NewGauge(
        prometheus.GaugeOpts{
            Name: "cache_hit_percent", // Wrong!
            // ...
        },
    )
)
```

Base units don't mean you display base units. Grafana (and similar tools) can convert seconds to milliseconds or bytes to megabytes for display. Store in base units; transform in presentation. This keeps storage consistent while allowing flexible visualization.
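In instrumentation code, the conversion to base units usually happens at observation time. A minimal, runnable sketch under the conventions above (the port and handler are illustrative):

```go
package main

import (
    "log"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Help:    "HTTP request latency in seconds",
    Buckets: prometheus.DefBuckets,
})

func handler(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    w.Write([]byte("hello"))

    // time.Duration.Seconds() yields float64 seconds, the base unit,
    // whether the request took microseconds or minutes.
    requestDuration.Observe(time.Since(start).Seconds())
}

func main() {
    http.HandleFunc("/", handler)
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```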
Suffixes indicate the metric type, enabling correct usage without checking documentation. Each metric type has expected suffixes:
| Metric Type | Suffix | Example | Notes |
|---|---|---|---|
| Counter | _total | http_requests_total | Always use _total for counters |
| Gauge | (none or _<unit>) | temperature_celsius | No special suffix needed |
| Histogram bucket | _bucket | http_request_duration_seconds_bucket | Auto-generated with {le="..."} |
| Histogram sum | _sum | http_request_duration_seconds_sum | Auto-generated, total of observations |
| Histogram count | _count | http_request_duration_seconds_count | Auto-generated, number of observations |
| Summary quantile | (none) | go_gc_duration_seconds | With {quantile="..."} |
| Info metric | _info | build_info | Value always 1, labels carry data |
Counter Naming:
<namespace>_<subsystem>_<name>_total
http_requests_total
errors_total
processed_bytes_total
cache_hits_total
Gauge Naming:
Gauges don't need a suffix, but include the unit:
temperature_celsius
memory_usage_bytes
queue_depth
active_connections
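A gauge carries its unit in the name and is either set explicitly or sampled via a callback at scrape time. A sketch under those conventions (the queue channel is a hypothetical stand-in for a real work queue):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var queue = make(chan []byte, 1024) // hypothetical in-memory work queue

// Set-style gauge: updated explicitly by the application.
var memoryUsageBytes = prometheus.NewGauge(prometheus.GaugeOpts{
    Name: "memory_usage_bytes",
    Help: "Resident memory used by the process in bytes",
})

// Callback-style gauge: sampled at scrape time, no extra bookkeeping.
var queueDepth = prometheus.NewGaugeFunc(
    prometheus.GaugeOpts{
        Name: "queue_depth",
        Help: "Number of messages currently waiting in the queue",
    },
    func() float64 { return float64(len(queue)) },
)

func init() {
    prometheus.MustRegister(memoryUsageBytes, queueDepth)
}
```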
Histogram Naming:
Histograms auto-generate three series. You name the base; Prometheus adds suffixes:
# You define:
http_request_duration_seconds
# Prometheus creates:
http_request_duration_seconds_bucket{le="0.1"}
http_request_duration_seconds_bucket{le="0.5"}
http_request_duration_seconds_bucket{le="+Inf"}
http_request_duration_seconds_sum
http_request_duration_seconds_count
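Because the _bucket, _sum, and _count series are generated for you, queries only ever reference the base name plus the expected suffix. A sketch of the definition and the kind of PromQL it enables (queries shown as comments):

```go
package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// You name the base; the client library and Prometheus expose
// http_request_duration_seconds_bucket/_sum/_count from it.
var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Help:    "HTTP request latency in seconds",
    Buckets: prometheus.DefBuckets,
})

// Typical queries against the generated series:
//
//   p99 latency:
//     histogram_quantile(0.99,
//       rate(http_request_duration_seconds_bucket[5m]))
//
//   Average latency:
//     rate(http_request_duration_seconds_sum[5m])
//       / rate(http_request_duration_seconds_count[5m])
```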
```go
// Info metrics: expose metadata as labels
// Value is always 1; labels carry the information

var buildInfo = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "myapp_build_info",
        Help: "Build information about the running instance",
    },
    []string{"version", "commit_sha", "build_date", "go_version"},
)

func init() {
    // Set value to 1, labels carry the actual info
    buildInfo.WithLabelValues(
        "1.2.3",
        "abc123def",
        "2024-01-15",
        runtime.Version(),
    ).Set(1)
}

// Query in PromQL:
//   myapp_build_info{version="1.2.3"}
//   Returns 1 if this version is running

// Join with other metrics:
//   rate(http_requests_total[5m])
//     * on(instance) group_left(version)
//     myapp_build_info
```

Never create a counter without the _total suffix. 'http_requests' (without _total) could be misinterpreted as a gauge. The suffix signals that rate() should be applied and that values only increase.
Labels are the dimensionality mechanism in Prometheus—they turn one metric into many time series. Carefully chosen labels enable powerful queries; poorly chosen labels cause cardinality explosions.
The Golden Rule:
Labels should have a bounded, small set of possible values.
Every unique combination of labels creates a distinct time series. Labels with unbounded cardinality (user IDs, request IDs, email addresses) can create millions of series, overwhelming Prometheus.
Cardinality Impact:
Let's calculate series created by label choices:
# Good: Bounded labels
http_requests_total{method, status_code, endpoint}
5 methods × 10 status codes × 50 endpoints = 2,500 series
# Bad: Unbounded labels
http_requests_total{method, status_code, user_id}
5 methods × 10 status codes × 1,000,000 users = 50,000,000 series!
50 million series would overwhelm even the largest Prometheus deployment.
Label Naming:
- Use snake_case: status_code, not statusCode
- Prefer the more specific name: http_status_code over status if ambiguity exists
- Use true/false for boolean values, not yes/no or 1/0
```go
// GOOD: Bounded, meaningful labels
var httpRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total HTTP requests",
    },
    []string{"method", "endpoint", "status_code"},
)

// GOOD: Normalize unbounded values
func normalizeEndpoint(path string) string {
    // Convert /users/12345 → /users/:id
    // Convert /orders/abc-123/items/456 → /orders/:id/items/:id
    patterns := []struct {
        regex       *regexp.Regexp
        replacement string
    }{
        {regexp.MustCompile(`/users/[^/]+`), "/users/:id"},
        {regexp.MustCompile(`/orders/[^/]+`), "/orders/:id"},
        {regexp.MustCompile(`/items/[^/]+`), "/items/:id"},
    }

    result := path
    for _, p := range patterns {
        result = p.regex.ReplaceAllString(result, p.replacement)
    }
    return result
}

// Usage:
func handleRequest(r *http.Request, statusCode int) {
    normalizedPath := normalizeEndpoint(r.URL.Path)
    httpRequests.WithLabelValues(
        r.Method,
        normalizedPath, // Bounded!
        strconv.Itoa(statusCode),
    ).Inc()
}

// BAD: Don't do this
var httpRequestsBad = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total HTTP requests",
    },
    []string{"method", "path", "user_id"}, // path and user_id are unbounded!
)
```

For dynamic URL paths, use the route template (with placeholders) rather than the actual path: '/users/:id' instead of '/users/12345'. Most web frameworks can provide the matched route template for exactly this purpose.
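If your HTTP framework exposes the matched route, you can skip the regex normalization and label with the template directly. A sketch assuming gorilla/mux (other routers have similar accessors; status-code labeling is omitted for brevity):

```go
package main

import (
    "net/http"

    "github.com/gorilla/mux"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var httpRequests = promauto.NewCounterVec(
    prometheus.CounterOpts{Name: "http_requests_total", Help: "Total HTTP requests"},
    []string{"method", "endpoint"},
)

func instrument(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        next.ServeHTTP(w, r)

        // The matched template ("/users/{id}") is bounded even though
        // the concrete path ("/users/12345") is not.
        endpoint := r.URL.Path
        if route := mux.CurrentRoute(r); route != nil {
            if tmpl, err := route.GetPathTemplate(); err == nil {
                endpoint = tmpl
            }
        }
        httpRequests.WithLabelValues(r.Method, endpoint).Inc()
    })
}

func main() {
    r := mux.NewRouter()
    r.Handle("/users/{id}", instrument(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("ok"))
    })))
    http.ListenAndServe(":8080", r)
}
```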
As organizations grow, metrics from different teams and applications must coexist. Namespaces and subsystems provide organizational structure.
Namespace:
The namespace prefix identifies the application or domain. It prevents collisions when multiple teams contribute metrics:
mycompany_payments_* # Payments team
mycompany_auth_* # Auth team
mycompany_inventory_* # Inventory team
Subsystem:
Within a namespace, subsystems identify functional areas:
```
# Namespace: myapp
# Subsystems: http, db, cache, queue

myapp_http_requests_total
myapp_http_request_duration_seconds
myapp_http_response_size_bytes

myapp_db_query_duration_seconds
myapp_db_connections_active
myapp_db_errors_total

myapp_cache_hits_total
myapp_cache_misses_total
myapp_cache_size_bytes

myapp_queue_messages_total
myapp_queue_depth
myapp_queue_processing_seconds

# Standard exporters use their own namespaces:
node_*     # Node exporter (Linux system metrics)
process_*  # Process metrics (built into client libraries)
go_*       # Go runtime metrics
mysql_*    # MySQL exporter
redis_*    # Redis exporter
```

When to Use Namespaces:
| Scenario | Recommendation |
|---|---|
| Single small application | Optional, but recommended |
| Multiple internal applications | Required for each app |
| Shared libraries | Use library name as namespace |
| Exporters | Use target system name |
| Company-wide standards | Company or org prefix |
Prometheus Client Library Namespacing:
```go
package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// Define namespace and subsystems as constants
const (
    namespace      = "myapp"
    subsystemHTTP  = "http"
    subsystemDB    = "db"
    subsystemCache = "cache"
)

// HTTP metrics
var (
    HTTPRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Namespace: namespace,
            Subsystem: subsystemHTTP,
            Name:      "requests_total",
            Help:      "Total HTTP requests",
        },
        []string{"method", "endpoint", "status_code"},
    )
    // Result: myapp_http_requests_total

    HTTPRequestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Namespace: namespace,
            Subsystem: subsystemHTTP,
            Name:      "request_duration_seconds",
            Help:      "HTTP request duration in seconds",
            Buckets:   prometheus.DefBuckets,
        },
        []string{"method", "endpoint"},
    )
    // Result: myapp_http_request_duration_seconds
)

// Database metrics
var (
    DBQueryDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Namespace: namespace,
            Subsystem: subsystemDB,
            Name:      "query_duration_seconds",
            Help:      "Database query duration in seconds",
            Buckets:   []float64{.001, .005, .01, .05, .1, .5, 1},
        },
        []string{"query_type"},
    )
    // Result: myapp_db_query_duration_seconds
)
```

The specific naming scheme matters less than consistent application. Document your conventions, enforce them in code reviews, and use linting tools to check compliance. Inconsistent naming is far worse than imperfect but consistent naming.
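A sketch of how these package-level metrics might be used from a handler elsewhere in the application (the import path and route name are illustrative):

```go
package api

import (
    "net/http"
    "strconv"
    "time"

    "example.com/myapp/internal/metrics" // hypothetical path to the metrics package above
)

func ListUsers(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    status := http.StatusOK

    w.WriteHeader(status)
    w.Write([]byte("[]"))

    // Counter and histogram share method/endpoint label values, so request
    // rates and latencies can be correlated in PromQL.
    metrics.HTTPRequestsTotal.
        WithLabelValues(r.Method, "/users", strconv.Itoa(status)).Inc()
    metrics.HTTPRequestDuration.
        WithLabelValues(r.Method, "/users").Observe(time.Since(start).Seconds())
}
```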
Every metric should be self-documenting through its HELP string. The HELP text appears in /metrics output and metric browsers, serving as inline documentation.
Good HELP strings:
```go
// GOOD: Descriptive HELP strings
var (
    httpRequestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests received, labeled by method, endpoint, and status code.",
        },
        []string{"method", "endpoint", "status_code"},
    )

    httpRequestDurationSeconds = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request latency in seconds, from request received to response sent.",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "endpoint"},
    )

    dbConnectionsActive = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "db_connections_active",
            Help: "Number of currently active database connections by pool name. Includes connections in use and idle.",
        },
        []string{"pool"},
    )

    cacheHitRatio = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "cache_hit_ratio",
            Help: "Cache hit ratio over the last minute (0.0 = all misses, 1.0 = all hits). Computed as hits / (hits + misses).",
        },
        []string{"cache_name"},
    )
)

// BAD: Unhelpful HELP strings
var (
    badMetric1 = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "HTTP requests", // Too vague
        },
    )

    badMetric2 = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "errors_total",
            Help: "Total errors", // What kind? Where?
        },
    )
)
```

Documentation Beyond HELP:
For comprehensive observability, maintain additional documentation:
Example Metric Catalog Entry:
metric: http_request_duration_seconds
type: histogram
subsystem: http
owner: platform-team
description: |
Latency distribution for HTTP requests, measured from when the
request handler begins to when the response is fully written.
labels:
method: HTTP method (GET, POST, PUT, DELETE)
endpoint: Normalized route pattern (e.g., /users/:id)
buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
alerts:
- high_latency_p99 (warning at 1s, critical at 5s)
dashboards:
- service-overview
- api-latency-breakdown
Automate metric documentation from code. Parse your instrumentation to extract metric definitions and generate catalog entries. This keeps documentation synchronized with reality.
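One way to automate this is to read metric metadata straight from the client registry, since every registered collector already carries its name, HELP text, and type. A minimal sketch using prometheus.DefaultGatherer (the blank import path is a hypothetical stand-in for your own metrics package; the output format is up to you):

```go
package main

import (
    "fmt"
    "log"

    "github.com/prometheus/client_golang/prometheus"

    _ "example.com/myapp/internal/metrics" // hypothetical: registers the app's metrics
)

func main() {
    // Gather returns the metric families currently registered with the
    // default registry, including their HELP text and type.
    families, err := prometheus.DefaultGatherer.Gather()
    if err != nil {
        log.Fatal(err)
    }

    for _, mf := range families {
        fmt.Printf("metric: %s\n", mf.GetName())
        fmt.Printf("  type: %s\n", mf.GetType())
        fmt.Printf("  help: %s\n", mf.GetHelp())
    }
}
```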
Recognizing anti-patterns helps you avoid common naming mistakes:
- Encoding label values in names: http_requests_get_total instead of http_requests_total{method="GET"}. Labels allow filtering; metric names don't.
- Mixing units: response_time_ms mixed with query_duration_seconds. Pick base units (seconds) and stick to them.
- Missing counter suffix: http_requests for a counter. Always add _total for counters.
- Non-counters named *_total. Reserve the _total suffix for counters.
- Inconsistent casing: responseTime, response_time, and ResponseTime. Always use snake_case.
- Redundant label prefixes: http_requests_total{http_method="GET"}. The http_ in the label duplicates the metric name's prefix.
- Generic names: count, value, metric. Names should describe specifically what's measured.

Anti-Pattern: Encoding Labels in Names
# BAD: Label values encoded in metric name
http_requests_get_total
http_requests_post_total
http_requests_put_total
http_requests_delete_total
# GOOD: Use labels
http_requests_total{method="GET"}
http_requests_total{method="POST"}
http_requests_total{method="PUT"}
http_requests_total{method="DELETE"}
Why labels are better:
- One metric name to discover: http_requests_total
- Filtering when you need it: http_requests_total{method="GET"}
- Aggregation across any dimension: sum by(endpoint)(rate(http_requests_total[5m]))

| Anti-Pattern | Problem | Correct Version |
|---|---|---|
| request_count | No _total, vague | http_requests_total |
| latency_ms | Non-base unit | request_duration_seconds |
| cpu_percent | Percentage, not ratio | cpu_usage_ratio (0-1) |
| activeConnections | CamelCase | active_connections |
| requests_total{userId="123"} | Unbounded label | Drop user_id or aggregate |
| database_query_time | Vague, missing unit | db_query_duration_seconds |
Renaming existing metrics breaks all queries, dashboards, and alerts that reference them. For legacy metrics with bad names, consider adding correctly-named aliases rather than renaming. Document the migration path and phase out old names gradually.
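One low-risk way to alias in the instrumentation itself is to dual-write the old and new names for a transition window, then delete the legacy metric once dashboards and alerts reference the new one. A sketch (the legacy name is illustrative):

```go
package metrics

import (
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    // Legacy name kept temporarily so existing dashboards keep working.
    // Remove once queries and alerts reference the _seconds metric.
    dbQueryTimeLegacy = promauto.NewHistogram(prometheus.HistogramOpts{
        Name: "database_query_time",
        Help: "DEPRECATED: use db_query_duration_seconds instead.",
    })

    dbQueryDuration = promauto.NewHistogram(prometheus.HistogramOpts{
        Name: "db_query_duration_seconds",
        Help: "Database query duration in seconds.",
    })
)

// observeQuery records the same observation under both names during the
// migration window.
func observeQuery(d time.Duration) {
    seconds := d.Seconds()
    dbQueryTimeLegacy.Observe(seconds)
    dbQueryDuration.Observe(seconds)
}
```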
For organizations with many teams, establishing and enforcing naming standards is essential. Here's a practical approach:
```python
#!/usr/bin/env python3
"""Example metric naming linter - validates conventions"""

import re
from typing import List

# Naming convention rules
RULES = {
    "snake_case": r'^[a-z][a-z0-9_]*$',
    "valid_chars": r'^[a-zA-Z_:][a-zA-Z0-9_:]*$',
    "no_double_underscore": r'^(?!.*__)',
    "base_units": ['seconds', 'bytes', 'total', 'ratio', 'celsius'],
    "counter_suffix": '_total',
    "reserved_prefixes": ['__'],
}


def lint_metric_name(name: str) -> List[str]:
    """Check a metric name against conventions."""
    errors = []

    # Check snake_case
    if not re.match(RULES['snake_case'], name):
        errors.append(f"'{name}' should be snake_case (lowercase with underscores)")

    # Check valid characters
    if not re.match(RULES['valid_chars'], name):
        errors.append(f"'{name}' contains invalid characters")

    # Check for double underscores
    if not re.match(RULES['no_double_underscore'], name):
        errors.append(f"'{name}' should not contain double underscores")

    # Check for reserved prefixes
    for prefix in RULES['reserved_prefixes']:
        if name.startswith(prefix):
            errors.append(f"'{name}' uses reserved prefix '{prefix}'")

    return errors


def lint_label_name(name: str) -> List[str]:
    """Check a label name against conventions."""
    errors = []

    if not re.match(r'^[a-z][a-z0-9_]*$', name):
        errors.append(f"Label '{name}' should be snake_case")

    # Check for high-cardinality indicators
    high_cardinality_patterns = ['user_id', 'session_id', 'request_id',
                                 'transaction_id', 'ip_address', 'email']
    for pattern in high_cardinality_patterns:
        if pattern in name:
            errors.append(f"Label '{name}' may have unbounded cardinality")

    return errors


def recommend_improvements(name: str) -> List[str]:
    """Suggest improvements to metric names."""
    suggestions = []

    # Check for missing unit
    has_unit = any(name.endswith(f'_{unit}') for unit in RULES['base_units'])
    if not has_unit and not name.endswith('_info'):
        suggestions.append("Consider adding a unit suffix (e.g., _seconds, _bytes)")

    # Check counter suffix
    if 'count' in name or 'total' in name.split('_')[:-1]:
        if not name.endswith('_total'):
            suggestions.append("Counter metrics should end with '_total'")

    return suggestions


# Usage example
if __name__ == '__main__':
    test_metrics = [
        'http_requests_total',         # Good
        'responseTime',                # Bad: not snake_case
        'http_requests',               # Should be _total
        'user_session_seconds_total',  # Good
        '__internal_metric',           # Bad: reserved prefix
    ]

    for metric in test_metrics:
        errors = lint_metric_name(metric)
        suggestions = recommend_improvements(metric)
        print(f"\n{metric}:")
        if errors:
            print(f"  Errors: {errors}")
        if suggestions:
            print(f"  Suggestions: {suggestions}")
        if not errors and not suggestions:
            print("  ✓ Passes all checks")
```

Standards only work if adopted. Make compliance easy (provide tools and templates), visible (publish leaderboards), and valued (celebrate great observability). Technical enforcement without cultural buy-in leads to workarounds.
Metric naming may seem like a minor concern, but it's a force multiplier. Consistent naming enables discoverability, prevents confusion, and reduces cognitive load for everyone who uses your observability system.
- The pattern <namespace>_<subsystem>_<name>_<unit>_<suffix> provides consistency and discoverability.
- Counters end in _total; histograms auto-generate _bucket, _sum, and _count.

What's Next:
With naming conventions established, the next page addresses cardinality considerations—the single most common cause of observability system failures. You'll learn to calculate cardinality impact, identify problematic patterns, and design metrics that scale.
You now have a comprehensive understanding of metric naming conventions. These standards will serve you throughout your observability journey, making your metrics discoverable, consistent, and maintainable.