When the monitoring and observability revolution accelerated in the mid-2010s, the database landscape wasn't ready. Organizations attempted to shoehorn time-series data into PostgreSQL, Cassandra, or OpenTSDB (built on HBase), but these solutions were operationally complex, underperformed at scale, or both. Into this gap stepped InfluxDB—a purpose-built time-series database designed from the ground up for the unique characteristics of time-stamped data.
Launched in 2013 by InfluxData, InfluxDB pioneered many concepts that have become standard in the TSDB space: a schemaless approach to time-series storage, aggressive compression through custom storage engines, specialized query languages for temporal data, and first-class support for metrics, events, logs, and traces. Today, InfluxDB powers monitoring systems at thousands of organizations, from startups to Fortune 500 enterprises, handling trillions of data points across IoT, infrastructure monitoring, real-time analytics, and scientific research.
By the end of this page, you will understand: (1) InfluxDB's evolution from 1.x to the modern 3.0 architecture, (2) The Time-Structured Merge Tree (TSM) storage engine that powers its performance, (3) The data model including buckets, measurements, tags, and fields, (4) The Flux query language and its functional paradigm, and (5) When to choose InfluxDB over alternatives.
Understanding InfluxDB requires appreciating its evolution. The database has undergone substantial architectural changes across major versions, each addressing limitations discovered at scale.
| Version | Era | Key Characteristics | Storage Engine |
|---|---|---|---|
| 1.x | 2015-2019 | Single-node focus, InfluxQL (SQL-like), no native clustering in OSS | TSM (Time-Structured Merge Tree) |
| 2.x | 2019-2023 | Unified platform (Telegraf, InfluxDB, Chronograf, Kapacitor), Flux language, built-in UI | TSM with improvements |
| 3.0 | 2024+ | Rewritten on Apache Arrow and DataFusion, SQL + InfluxQL, unlimited cardinality | Apache Parquet + Object Storage |
InfluxDB 1.x: The Foundation
InfluxDB 1.x established the core concepts that define time-series databases today. It introduced:
- The line protocol write format
- The tags-and-fields data model with nanosecond timestamps
- InfluxQL, a SQL-like query language for temporal data
- Retention policies for automatic data expiry
- The TSM storage engine
However, 1.x had significant limitations: the open-source version was single-node only, high cardinality (many unique tag combinations) degraded performance severely, and the storage engine required careful tuning.
InfluxDB 2.x: The Unified Platform
InfluxDB 2.x consolidated the TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor) into a single binary. Key additions:
- The Flux functional query language
- A built-in UI for dashboards and data exploration (absorbing Chronograf)
- Background tasks for downsampling and alerting (absorbing Kapacitor)
- The organization and bucket hierarchy with token-based authentication
InfluxDB 3.0: The Modern Rewrite
InfluxDB 3.0 represents a ground-up rewrite addressing the cardinality problem and modern cloud-native deployments:
- Query and compute layers rebuilt on Apache Arrow and the DataFusion query engine
- Data persisted as Apache Parquet files on object storage (e.g., S3)
- SQL support restored alongside InfluxQL
- Effectively unlimited series cardinality, since there is no in-memory series index
In InfluxDB 1.x/2.x, each unique combination of measurement + tags created an in-memory index entry. With millions of unique values (e.g., user_id as a tag), memory usage exploded and query performance collapsed. This single issue drove many organizations away from InfluxDB—and drove the complete 3.0 rewrite.
InfluxDB's architecture is optimized for the unique write and query patterns of time-series data. We'll examine both the classic 2.x architecture (still widely deployed) and the modern 3.0 architecture.
```
INFLUXDB 2.x ARCHITECTURE
=========================

┌─────────────────────────────────────────────────────────────────────┐
│                             WRITE PATH                              │
└─────────────────────────────────────────────────────────────────────┘
                                  │
        ┌─────────────────────────┼─────────────────────────┐
        │                         │                         │
        ▼                         ▼                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  HTTP/gRPC API  │     │    Telegraf     │     │   Client Libs   │
│  Line Protocol  │     │   (Collector)   │     │  (Python, Go)   │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                                 ▼
                      ┌─────────────────────┐
                      │    Write Buffer     │
                      │     (In-Memory)     │
                      └──────────┬──────────┘
                                 │
                  ┌──────────────┼──────────────┐
                  │              │              │
                  ▼              ▼              ▼
            ┌──────────┐   ┌──────────┐   ┌──────────┐
            │   WAL    │   │   WAL    │   │   WAL    │
            │ (Shard 1)│   │ (Shard 2)│   │ (Shard N)│
            └────┬─────┘   └────┬─────┘   └────┬─────┘
                 │              │              │
                 ▼              ▼              ▼
            ┌──────────┐   ┌──────────┐   ┌──────────┐
            │   TSM    │   │   TSM    │   │   TSM    │
            │  Files   │   │  Files   │   │  Files   │
            └──────────┘   └──────────┘   └──────────┘
                                 │
                                 ▼
                      ┌─────────────────────┐
                      │  Compaction Engine  │
                      │    (Levels 1-4)     │
                      └─────────────────────┘

SHARDING: Data is sharded by time + series (shard group duration: 1 day default)
WAL:      Write-Ahead Log ensures durability before acknowledging writes
TSM:      Time-Structured Merge Tree - the core storage format
```

Key Architectural Components:
- Write buffer (cache): batches incoming points in memory before they reach disk
- WAL: per-shard write-ahead log that makes writes durable before they are acknowledged
- TSM files: immutable, compressed files holding the persisted data
- Compaction engine: background process that merges TSM files across levels 1-4
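The write path above can be sketched in a few lines of Python: append to a WAL and fsync before acknowledging, buffer the point in memory, and snapshot the buffer to an immutable sorted file once it grows past a threshold. The file formats here are illustrative only, not InfluxDB's actual binary formats.

```python
import json
import os
import tempfile

class Shard:
    """Toy model of a shard's write path: WAL -> cache -> snapshot file."""

    def __init__(self, dir_, snapshot_threshold=3):
        self.dir = dir_
        self.wal = open(os.path.join(dir_, "wal.log"), "a")
        self.cache = []                      # in-memory write buffer
        self.threshold = snapshot_threshold
        self.tsm_count = 0

    def write(self, point):
        self.wal.write(json.dumps(point) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())          # durable before we acknowledge
        self.cache.append(point)
        if len(self.cache) >= self.threshold:
            self.snapshot()
        return "ack"

    def snapshot(self):
        # L1 compaction step: cache -> immutable sorted "TSM-like" file
        self.tsm_count += 1
        path = os.path.join(self.dir, f"{self.tsm_count:06d}.tsm")
        with open(path, "w") as f:
            for p in sorted(self.cache, key=lambda p: (p["series"], p["time"])):
                f.write(json.dumps(p) + "\n")
        self.cache.clear()

shard = Shard(tempfile.mkdtemp())
for t in range(4):
    shard.write({"series": "cpu,host=a", "time": t, "value": float(t)})
# After 4 writes with a threshold of 3: one snapshot file, one point still cached.
```

The key ordering guarantee is that the fsync happens before the write is acknowledged, so a crash can lose the in-memory cache but never an acknowledged point.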
InfluxDB 3.0 replaces TSM with Apache Parquet files stored on object storage (S3). The series index is replaced by Parquet column statistics and partition pruning. This eliminates the in-memory cardinality constraint and enables virtually unlimited scale—at the cost of some single-node simplicity.
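The partition-pruning idea can be sketched as follows. The file names and min/max statistics below are hypothetical; InfluxDB 3.0's real catalog tracks comparable per-file metadata so that queries skip files whose time range cannot match.

```python
from dataclasses import dataclass

@dataclass
class ParquetFileMeta:
    """Hypothetical per-file statistics, like those kept in a Parquet catalog."""
    path: str
    min_time: int  # nanoseconds since epoch (toy values here)
    max_time: int

def prune(files, query_start, query_end):
    """Keep only files whose [min_time, max_time] overlaps the query range."""
    return [f for f in files if f.max_time >= query_start and f.min_time <= query_end]

files = [
    ParquetFileMeta("day1.parquet", 0, 86_399),
    ParquetFileMeta("day2.parquet", 86_400, 172_799),
    ParquetFileMeta("day3.parquet", 172_800, 259_199),
]

# A query covering only "day 2" touches one file; the others are never opened.
hits = prune(files, 100_000, 150_000)
```

Because pruning works on file-level statistics rather than a per-series index, memory cost no longer scales with the number of unique series.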
The Time-Structured Merge Tree (TSM) is InfluxDB's custom storage engine, specifically designed for time-series workloads. Understanding TSM explains InfluxDB's performance characteristics and operational requirements.
```
TSM FILE FORMAT
===============

┌─────────────────────────────────────────────────────────────────────┐
│                           TSM FILE LAYOUT                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                         DATA BLOCKS                           │  │
│  ├───────────────────────────────────────────────────────────────┤  │
│  │ Block 1: Series Key: "cpu,host=server01" + Field: "usage"     │  │
│  │   Timestamps: [t1, t2, t3, ..., t1000]  (Compressed)          │  │
│  │   Values:     [v1, v2, v3, ..., v1000]  (Compressed)          │  │
│  ├───────────────────────────────────────────────────────────────┤  │
│  │ Block 2: Series Key: "cpu,host=server02" + Field: "usage"     │  │
│  │   Timestamps: [t1, t2, t3, ..., t800]                         │  │
│  │   Values:     [v1, v2, v3, ..., v800]                         │  │
│  ├───────────────────────────────────────────────────────────────┤  │
│  │ ... more data blocks ...                                      │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                        INDEX SECTION                          │  │
│  ├───────────────────────────────────────────────────────────────┤  │
│  │ Series Key → [Block offset, Min time, Max time, Block size]   │  │
│  │ "cpu,host=server01#usage" → [offset=0,    min=t1, max=t1000]  │  │
│  │ "cpu,host=server02#usage" → [offset=8192, min=t1, max=t800]   │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                           FOOTER                              │  │
│  │              Index offset, Version, Checksum                  │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

COMPRESSION TECHNIQUES:
- Timestamps: Delta-of-delta encoding + RLE (Run-Length Encoding)
    [t1, t1+10s, t1+20s, ...] → [t1, 10, 10, 10, ...] → [t1, 10×N]
- Integer values: Simple8B, ZigZag, Delta encoding
- Float values: XOR encoding (Gorilla-style)
    Similar floats XOR to values with many leading/trailing zeros
- String values: Snappy compression
```

TSM Design Principles:
- Files are immutable once written; new data arrives in new files and conflicts are resolved at read and compaction time
- Writes are sequential: WAL append first, then batched snapshots into TSM files
- Data is organized by series key, then time, so reading one series is a sequential scan
- Aggressive, type-specific compression keeps both disk footprint and I/O costs low
| Level | Trigger | Result | Purpose |
|---|---|---|---|
| L1 (Snapshot) | Cache full or flush interval | Cache → L1 TSM file | Persist in-memory data |
| L2 (Compact) | Multiple L1 files exist | Merge L1 → L2 | Reduce file count |
| L3 (Full) | Multiple L2 files | Merge L2 → L3 | Optimize read paths |
| L4 (Cold) | Data aging out | Final optimization | Maximum compression |
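The core of every compaction level is the same merge operation: combine several sorted files into one, keeping (series key, timestamp) order and letting the newest write win on duplicates. A minimal sketch, using hypothetical in-memory "files" of tuples rather than real TSM blocks:

```python
def compact(files):
    """Merge sorted point lists; later files win on duplicate (series, time)."""
    points = {}
    for f in files:                        # iterate oldest -> newest
        for series, ts, value in f:
            points[(series, ts)] = value   # newest write overwrites older ones
    return sorted((s, t, v) for (s, t), v in points.items())

# Two L1 files with one overlapping point (host=a at t=110):
l1_a = [("cpu,host=a#usage", 100, 1.0), ("cpu,host=a#usage", 110, 2.0)]
l1_b = [("cpu,host=a#usage", 110, 2.5), ("cpu,host=b#usage", 100, 9.0)]

merged = compact([l1_a, l1_b])
# The duplicate resolves to the newer value (2.5), and output stays sorted
# by series key, then time — the layout larger compacted files need.
```

Real compaction streams blocks instead of materializing a dict, but the invariant is the same: fewer, larger, sorted, deduplicated files.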
TSM's compression achieves remarkable ratios. For metrics data with regular intervals and slowly-changing values, expect 10-20x compression. A naive storage of 16 bytes/point (8B timestamp + 8B float) compresses to 1-2 bytes/point, turning petabyte requirements into manageable terabytes.
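The timestamp side of that compression can be demonstrated directly. This sketch models the delta-of-delta plus run-length idea in plain Python; real TSM emits a bit-packed binary format, but the collapse is the same: regularly spaced timestamps reduce to a start value, one delta, and a single run.

```python
def delta_of_delta_rle(timestamps):
    """Encode as (first_timestamp, first_delta, [[delta_of_delta, run_length], ...])."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    runs = []
    for d in dods:
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1           # extend the current run
        else:
            runs.append([d, 1])        # start a new run
    return timestamps[0], deltas[0] if deltas else None, runs

# 1000 points at a perfect 10-second interval:
ts = [1_705_315_800 + 10 * i for i in range(1000)]
first, first_delta, runs = delta_of_delta_rle(ts)
# All deltas are 10, so every delta-of-delta is 0 and the whole
# 1000-point column collapses into a single run.
```

Irregular timestamps break the runs apart, which is why regularly sampled metrics compress so much better than bursty event data.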
InfluxDB's data model differs significantly from relational databases. Mastering these concepts is essential for effective schema design and query writing.
```
INFLUXDB DATA HIERARCHY
=======================

ORGANIZATION (InfluxDB 2.x+)
└── BUCKET (equivalent to database + retention policy)
    └── MEASUREMENT (equivalent to table name)
        └── POINT (a single data record)
            ├── TIMESTAMP (nanosecond precision)
            ├── TAGS (indexed key-value pairs for filtering)
            │   ├── tag_key: tag_value
            │   └── tag_key: tag_value
            └── FIELDS (non-indexed values being measured)
                ├── field_key: field_value (int/float/string/bool)
                └── field_key: field_value

EXAMPLE: Server Monitoring Data
===============================

Measurement: "cpu"

Point 1:
  time:   2024-01-15T10:30:00.000000000Z
  tags:   host=web-server-01, region=us-east, env=prod
  fields: usage_user=45.2, usage_system=12.3, usage_idle=42.5

Point 2:
  time:   2024-01-15T10:30:00.000000000Z
  tags:   host=web-server-02, region=us-east, env=prod
  fields: usage_user=38.1, usage_system=8.7, usage_idle=53.2

LINE PROTOCOL (InfluxDB's native write format):

cpu,host=web-server-01,region=us-east,env=prod usage_user=45.2,usage_system=12.3,usage_idle=42.5 1705315800000000000
cpu,host=web-server-02,region=us-east,env=prod usage_user=38.1,usage_system=8.7,usage_idle=53.2 1705315800000000000

Format: <measurement>,<tags> <fields> <timestamp>
```

Core Concepts Explained:
- Bucket: a named location for data with an attached retention period (database + retention policy in 1.x terms)
- Measurement: the name of the thing being recorded, analogous to a table name
- Tags: indexed key-value metadata used for filtering and grouping; keep their cardinality bounded
- Fields: the actual measured values; not indexed, so filtering on them scans data
- Timestamp: nanosecond-precision time that, together with measurement and tags, identifies a point
Using high-cardinality values as tags is the most common mistake. If you have 1 million unique user_ids as tags, you'll have 1 million series—each with its own in-memory index entry. This consumes gigabytes of RAM and devastates query performance. Store high-cardinality data as fields instead.
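A minimal line protocol serializer makes the tag-versus-field decision concrete: in the example below the high-cardinality `user_id` goes into the field set, not the tag set, so it never creates new series. The escaping here is simplified (real clients also escape `=` and handle more edge cases), and the measurement and key names are hypothetical.

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Serialize one point to InfluxDB line protocol (simplified escaping)."""
    esc = lambda s: s.replace(",", "\\,").replace(" ", "\\ ")

    # Tags are sorted by key; values are always unquoted strings.
    tag_part = ",".join(f"{esc(k)}={esc(v)}" for k, v in sorted(tags.items()))

    def fmt(v):
        if isinstance(v, bool):
            return "true" if v else "false"
        if isinstance(v, int):
            return f"{v}i"          # integer fields carry an 'i' suffix
        if isinstance(v, float):
            return repr(v)
        return f'"{v}"'             # string fields are double-quoted

    field_part = ",".join(f"{esc(k)}={fmt(v)}" for k, v in fields.items())
    return f"{esc(measurement)},{tag_part} {field_part} {ts_ns}"

line = to_line_protocol(
    "api_requests",
    tags={"host": "web-server-01", "env": "prod"},       # low cardinality: safe as tags
    fields={"latency_ms": 12.5, "user_id": "u-948121"},  # high cardinality: a field
    ts_ns=1705315800000000000,
)
# → api_requests,env=prod,host=web-server-01 latency_ms=12.5,user_id="u-948121" 1705315800000000000
```

With `user_id` as a field, this measurement produces one series per (host, env) pair; as a tag, it would produce one series per user.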
InfluxDB 2.x introduced Flux—a functional data scripting language replacing the SQL-like InfluxQL. While the learning curve is steeper, Flux provides powerful capabilities for complex time-series analysis, data transformation, and multi-source joins.
```flux
// FLUX FUNDAMENTALS
// -----------------
// Flux uses a pipe-forward model: data flows through transformations
// Source → Filter → Transform → Aggregate → Output

// Basic Query: Get CPU usage from the last hour
from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> filter(fn: (r) => r["host"] == "web-server-01")

// Aggregation: Average CPU per 5-minute window
from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> aggregateWindow(every: 5m, fn: mean)
  |> yield(name: "5min_average")

// Group and Aggregate: Average by host
from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> group(columns: ["host"])
  |> mean()
  |> group()  // Ungroup for output

// Calculated Fields: Compute utilization percentage
from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user" or r["_field"] == "usage_system")
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> map(fn: (r) => ({ r with total_usage: r.usage_user + r.usage_system }))

// Alerting: Find when CPU exceeds threshold
from(bucket: "server-metrics")
  |> range(start: -5m)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> aggregateWindow(every: 1m, fn: mean)
  |> filter(fn: (r) => r["_value"] > 80.0)
  |> map(fn: (r) => ({ r with alert_level: "critical" }))
```

Flux Key Concepts:
- Pipe-forward (`|>`): each function receives the tables produced by the previous one
- Every query begins with a source (`from`) and a time bound (`range`)
- Rows carry metadata columns: `_measurement`, `_field`, `_value`, `_time`
- `group()` controls how rows are partitioned into tables, which determines what aggregates operate over
While Flux has a learning curve, its functional nature shines for time-series workflows. Operations like 'window by 5 minutes, calculate rate, then average across hosts' are natural in Flux but require complex subqueries in SQL. InfluxDB 3.0 restores SQL/InfluxQL support for those who prefer it.
Beyond basic queries, Flux enables sophisticated time-series analysis including downsampling, gap filling, moving averages, and cross-measurement correlation.
```flux
// MOVING AVERAGE: Smooth noisy data
from(bucket: "server-metrics")
  |> range(start: -4h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> movingAverage(n: 10)  // 10-point moving average
  |> yield(name: "smoothed")

// RATE OF CHANGE: Derivative/rate calculations
from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "http_requests_total")
  |> derivative(unit: 1s, nonNegative: true)  // Requests per second
  |> yield(name: "request_rate")

// PERCENTILES: Distribution analysis
from(bucket: "server-metrics")
  |> range(start: -24h)
  |> filter(fn: (r) => r["_measurement"] == "response_time")
  |> aggregateWindow(
       every: 1h,
       fn: (column, tables=<-) => tables |> quantile(q: 0.95, column: column),
     )
  |> yield(name: "p95_latency")

// GAP FILLING: Handle missing data points
from(bucket: "sensor-data")
  |> range(start: -6h)
  |> filter(fn: (r) => r["_measurement"] == "temperature")
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: true)
  |> fill(usePrevious: true)  // Carry forward last known value
  |> yield(name: "filled")

// CROSS-MEASUREMENT JOIN: Correlate different metrics
cpu = from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> aggregateWindow(every: 5m, fn: mean)

memory = from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "memory")
  |> filter(fn: (r) => r["_field"] == "used_percent")
  |> aggregateWindow(every: 5m, fn: mean)

join(tables: {cpu: cpu, memory: memory}, on: ["_time", "host"])
  |> map(fn: (r) => ({
       _time: r._time,
       host: r.host,
       cpu: r._value_cpu,
       memory: r._value_memory,
       pressure_score: r._value_cpu * 0.6 + r._value_memory * 0.4,
     }))

// CONTINUOUS DOWNSAMPLING TASK: Background processing
option task = {name: "downsample-1h", every: 1h}

from(bucket: "raw-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] =~ /cpu|memory/)
  |> aggregateWindow(every: 5m, fn: mean)
  |> to(bucket: "downsampled-metrics", org: "myorg")
```

Operating InfluxDB in production requires understanding its resource requirements, tuning parameters, and common failure modes.
| Metric | Recommendation | Notes |
|---|---|---|
| Memory | Plan for series cardinality × ~1KB | Primary constraint in 1.x/2.x; less critical in 3.0 |
| CPU | Scale with concurrent queries + compaction | Compaction is CPU-intensive; more cores = faster |
| Disk IOPS | SSDs strongly recommended | WAL and compaction are I/O intensive |
| Disk Space | Plan headroom beyond raw data size | Full compactions rewrite files and temporarily need extra space |
| Network | Consider ingestion bandwidth | 1M points/sec at 100 bytes = ~100 MB/s |
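The memory and network figures in the table are simple arithmetic, worth sanity-checking during capacity planning. The ~1KB-per-series and 100-bytes-per-point numbers are the rough planning figures quoted above, not measured constants:

```python
# Series index memory (InfluxDB 1.x/2.x): cardinality × ~1KB per series.
series_cardinality = 1_000_000
bytes_per_series = 1024                               # rough planning figure
mem_gib = series_cardinality * bytes_per_series / 2**30   # ≈ 0.95 GiB of RAM

# Ingestion bandwidth: points/sec × typical line-protocol point size.
points_per_sec = 1_000_000
bytes_per_point = 100                                 # typical serialized size
ingest_mb_per_sec = points_per_sec * bytes_per_point / 1e6   # 100.0 MB/s
```

A million-series deployment thus needs roughly a gigabyte of RAM just for the series index on 1.x/2.x, before any query or cache memory, which is exactly the constraint 3.0 removes.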
We've explored InfluxDB comprehensively—from its architectural foundations through its data model, query language, and operational requirements.
When to Choose InfluxDB:
- Metrics-heavy workloads: infrastructure monitoring, IoT telemetry, real-time analytics
- High write throughput of regular-interval data, where TSM compression shines
- Teams that can keep tag cardinality bounded (1.x/2.x), or that adopt 3.0 for high-cardinality workloads
What's Next:
We'll explore TimescaleDB—a radically different approach to time-series databases. Rather than building from scratch, TimescaleDB extends PostgreSQL with time-series superpowers, offering full SQL compatibility and the familiar PostgreSQL ecosystem.
You now understand InfluxDB's architecture, data model, and query language at a depth sufficient for production deployment. You can evaluate when InfluxDB is the right choice and design schemas that avoid common pitfalls.