When the monitoring and observability revolution accelerated in the mid-2010s, the database landscape wasn't ready. Organizations attempted to shoehorn time-series data into PostgreSQL, Cassandra, or OpenTSDB (built on HBase), but these solutions were operationally complex, underperformed at scale, or both. Into this gap stepped InfluxDB—a purpose-built time-series database designed from the ground up for the unique characteristics of time-stamped data.
Launched in 2013 by InfluxData, InfluxDB pioneered many concepts that have become standard in the TSDB space: a schemaless approach to time-series storage, aggressive compression through custom storage engines, specialized query languages for temporal data, and first-class support for metrics, events, logs, and traces. Today, InfluxDB powers monitoring systems at thousands of organizations, from startups to Fortune 500 enterprises, handling trillions of data points across IoT, infrastructure monitoring, real-time analytics, and scientific research.
By the end of this page, you will understand: (1) InfluxDB's evolution from 1.x to the modern 3.0 architecture, (2) The Time-Structured Merge Tree (TSM) storage engine that powers its performance, (3) The data model including buckets, measurements, tags, and fields, (4) The Flux query language and its functional paradigm, and (5) When to choose InfluxDB over alternatives.
Understanding InfluxDB requires appreciating its evolution. The database has undergone substantial architectural changes across major versions, each addressing limitations discovered at scale.
| Version | Era | Key Characteristics | Storage Engine |
|---|---|---|---|
| 1.x | 2015-2019 | Single-node focus, InfluxQL (SQL-like), no native clustering in OSS | TSM (Time-Structured Merge Tree) |
| 2.x | 2019-2023 | Unified platform (Telegraf, InfluxDB, Chronograf, Kapacitor), Flux language, built-in UI | TSM with improvements |
| 3.0 | 2024+ | Rewritten on Apache Arrow and DataFusion, SQL + InfluxQL, unlimited cardinality | Apache Parquet + Object Storage |
InfluxDB 1.x: The Foundation
InfluxDB 1.x established the core concepts that define time-series databases today. It introduced:
- The line protocol write format
- The tags-and-fields data model with nanosecond timestamps
- InfluxQL, a SQL-like query language for temporal data
- Retention policies for automatic data expiry
- The TSM storage engine
However, 1.x had significant limitations: the open-source version was single-node only, high cardinality (many unique tag combinations) degraded performance severely, and the storage engine required careful tuning.
InfluxDB 2.x: The Unified Platform
InfluxDB 2.x consolidated the TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor) into a single binary. Key additions:
- The Flux functional query language
- A built-in UI for dashboards and data exploration (absorbing Chronograf)
- Background tasks for downsampling and alerting (absorbing Kapacitor)
- The organization and bucket hierarchy with token-based authentication
InfluxDB 3.0: The Modern Rewrite
InfluxDB 3.0 represents a ground-up rewrite addressing the cardinality problem and modern cloud-native deployments:
- Query and compute layers rebuilt on Apache Arrow and the DataFusion query engine
- Data persisted as Apache Parquet files on object storage (e.g., S3)
- SQL support restored alongside InfluxQL
- Effectively unlimited series cardinality, since there is no in-memory series index
In InfluxDB 1.x/2.x, each unique combination of measurement + tags created an in-memory index entry. With millions of unique values (e.g., user_id as a tag), memory usage exploded and query performance collapsed. This single issue drove many organizations away from InfluxDB—and drove the complete 3.0 rewrite.
InfluxDB's architecture is optimized for the unique write and query patterns of time-series data. We'll examine both the classic 2.x architecture (still widely deployed) and the modern 3.0 architecture.
```
INFLUXDB 2.x ARCHITECTURE
=========================

┌─────────────────────────────────────────────────────────────────────┐
│                             WRITE PATH                              │
└─────────────────────────────────────────────────────────────────────┘
                                  │
        ┌─────────────────────────┼─────────────────────────┐
        │                         │                         │
        ▼                         ▼                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  HTTP/gRPC API  │     │    Telegraf     │     │   Client Libs   │
│  Line Protocol  │     │   (Collector)   │     │  (Python, Go)   │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                                 ▼
                      ┌─────────────────────┐
                      │    Write Buffer     │
                      │     (In-Memory)     │
                      └──────────┬──────────┘
                                 │
                  ┌──────────────┼──────────────┐
                  │              │              │
                  ▼              ▼              ▼
            ┌──────────┐   ┌──────────┐   ┌──────────┐
            │   WAL    │   │   WAL    │   │   WAL    │
            │ (Shard 1)│   │ (Shard 2)│   │ (Shard N)│
            └────┬─────┘   └────┬─────┘   └────┬─────┘
                 │              │              │
                 ▼              ▼              ▼
            ┌──────────┐   ┌──────────┐   ┌──────────┐
            │   TSM    │   │   TSM    │   │   TSM    │
            │  Files   │   │  Files   │   │  Files   │
            └──────────┘   └──────────┘   └──────────┘
                                 │
                                 ▼
                      ┌─────────────────────┐
                      │  Compaction Engine  │
                      │    (Levels 1-4)     │
                      └─────────────────────┘

SHARDING: Data is sharded by time + series (shard group duration: 1 day default)
WAL:      Write-Ahead Log ensures durability before acknowledging writes
TSM:      Time-Structured Merge Tree - the core storage format
```

Key Architectural Components:
- Write buffer (cache): batches incoming points in memory before they reach disk
- WAL: per-shard write-ahead log that makes writes durable before they are acknowledged
- TSM files: immutable, compressed files holding the persisted data
- Compaction engine: background process that merges TSM files across levels 1-4
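The write path above can be sketched in a few lines of Python: append to a WAL and fsync before acknowledging, buffer the point in memory, and snapshot the buffer to an immutable sorted file once it grows past a threshold. The file formats here are illustrative only, not InfluxDB's actual binary formats.

```python
import json
import os
import tempfile

class Shard:
    """Toy model of a shard's write path: WAL -> cache -> snapshot file."""

    def __init__(self, dir_, snapshot_threshold=3):
        self.dir = dir_
        self.wal = open(os.path.join(dir_, "wal.log"), "a")
        self.cache = []                      # in-memory write buffer
        self.threshold = snapshot_threshold
        self.tsm_count = 0

    def write(self, point):
        self.wal.write(json.dumps(point) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())          # durable before we acknowledge
        self.cache.append(point)
        if len(self.cache) >= self.threshold:
            self.snapshot()
        return "ack"

    def snapshot(self):
        # L1 compaction step: cache -> immutable sorted "TSM-like" file
        self.tsm_count += 1
        path = os.path.join(self.dir, f"{self.tsm_count:06d}.tsm")
        with open(path, "w") as f:
            for p in sorted(self.cache, key=lambda p: (p["series"], p["time"])):
                f.write(json.dumps(p) + "\n")
        self.cache.clear()

shard = Shard(tempfile.mkdtemp())
for t in range(4):
    shard.write({"series": "cpu,host=a", "time": t, "value": float(t)})
# After 4 writes with a threshold of 3: one snapshot file, one point still cached.
```

The key ordering guarantee is that the fsync happens before the write is acknowledged, so a crash can lose the in-memory cache but never an acknowledged point.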
InfluxDB 3.0 replaces TSM with Apache Parquet files stored on object storage (S3). The series index is replaced by Parquet column statistics and partition pruning. This eliminates the in-memory cardinality constraint and enables virtually unlimited scale—at the cost of some single-node simplicity.
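The partition-pruning idea can be sketched as follows. The file names and min/max statistics below are hypothetical; InfluxDB 3.0's real catalog tracks comparable per-file metadata so that queries skip files whose time range cannot match.

```python
from dataclasses import dataclass

@dataclass
class ParquetFileMeta:
    """Hypothetical per-file statistics, like those kept in a Parquet catalog."""
    path: str
    min_time: int  # nanoseconds since epoch (toy values here)
    max_time: int

def prune(files, query_start, query_end):
    """Keep only files whose [min_time, max_time] overlaps the query range."""
    return [f for f in files if f.max_time >= query_start and f.min_time <= query_end]

files = [
    ParquetFileMeta("day1.parquet", 0, 86_399),
    ParquetFileMeta("day2.parquet", 86_400, 172_799),
    ParquetFileMeta("day3.parquet", 172_800, 259_199),
]

# A query covering only "day 2" touches one file; the others are never opened.
hits = prune(files, 100_000, 150_000)
```

Because pruning works on file-level statistics rather than a per-series index, memory cost no longer scales with the number of unique series.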
The Time-Structured Merge Tree (TSM) is InfluxDB's custom storage engine, specifically designed for time-series workloads. Understanding TSM explains InfluxDB's performance characteristics and operational requirements.
```
TSM FILE FORMAT
===============

┌─────────────────────────────────────────────────────────────────────┐
│                           TSM FILE LAYOUT                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                         DATA BLOCKS                           │  │
│  ├───────────────────────────────────────────────────────────────┤  │
│  │ Block 1: Series Key: "cpu,host=server01" + Field: "usage"     │  │
│  │   Timestamps: [t1, t2, t3, ..., t1000]  (Compressed)          │  │
│  │   Values:     [v1, v2, v3, ..., v1000]  (Compressed)          │  │
│  ├───────────────────────────────────────────────────────────────┤  │
│  │ Block 2: Series Key: "cpu,host=server02" + Field: "usage"     │  │
│  │   Timestamps: [t1, t2, t3, ..., t800]                         │  │
│  │   Values:     [v1, v2, v3, ..., v800]                         │  │
│  ├───────────────────────────────────────────────────────────────┤  │
│  │ ... more data blocks ...                                      │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                        INDEX SECTION                          │  │
│  ├───────────────────────────────────────────────────────────────┤  │
│  │ Series Key → [Block offset, Min time, Max time, Block size]   │  │
│  │ "cpu,host=server01#usage" → [offset=0,    min=t1, max=t1000]  │  │
│  │ "cpu,host=server02#usage" → [offset=8192, min=t1, max=t800]   │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                           FOOTER                              │  │
│  │              Index offset, Version, Checksum                  │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

COMPRESSION TECHNIQUES:
- Timestamps: Delta-of-delta encoding + RLE (Run-Length Encoding)
    [t1, t1+10s, t1+20s, ...] → [t1, 10, 10, 10, ...] → [t1, 10×N]
- Integer values: Simple8B, ZigZag, Delta encoding
- Float values: XOR encoding (Gorilla-style)
    Similar floats XOR to values with many leading/trailing zeros
- String values: Snappy compression
```

TSM Design Principles:
- Files are immutable once written; new data arrives in new files and conflicts are resolved at read and compaction time
- Writes are sequential: WAL append first, then batched snapshots into TSM files
- Data is organized by series key, then time, so reading one series is a sequential scan
- Aggressive, type-specific compression keeps both disk footprint and I/O costs low
| Level | Trigger | Result | Purpose |
|---|---|---|---|
| L1 (Snapshot) | Cache full or flush interval | Cache → L1 TSM file | Persist in-memory data |
| L2 (Compact) | Multiple L1 files exist | Merge L1 → L2 | Reduce file count |
| L3 (Full) | Multiple L2 files | Merge L2 → L3 | Optimize read paths |
| L4 (Cold) | Data aging out | Final optimization | Maximum compression |
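The core of every compaction level is the same merge operation: combine several sorted files into one, keeping (series key, timestamp) order and letting the newest write win on duplicates. A minimal sketch, using hypothetical in-memory "files" of tuples rather than real TSM blocks:

```python
def compact(files):
    """Merge sorted point lists; later files win on duplicate (series, time)."""
    points = {}
    for f in files:                        # iterate oldest -> newest
        for series, ts, value in f:
            points[(series, ts)] = value   # newest write overwrites older ones
    return sorted((s, t, v) for (s, t), v in points.items())

# Two L1 files with one overlapping point (host=a at t=110):
l1_a = [("cpu,host=a#usage", 100, 1.0), ("cpu,host=a#usage", 110, 2.0)]
l1_b = [("cpu,host=a#usage", 110, 2.5), ("cpu,host=b#usage", 100, 9.0)]

merged = compact([l1_a, l1_b])
# The duplicate resolves to the newer value (2.5), and output stays sorted
# by series key, then time — the layout larger compacted files need.
```

Real compaction streams blocks instead of materializing a dict, but the invariant is the same: fewer, larger, sorted, deduplicated files.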
TSM's compression achieves remarkable ratios. For metrics data with regular intervals and slowly-changing values, expect 10-20x compression. A naive storage of 16 bytes/point (8B timestamp + 8B float) compresses to 1-2 bytes/point, turning petabyte requirements into manageable terabytes.
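The timestamp side of that compression can be demonstrated directly. This sketch models the delta-of-delta plus run-length idea in plain Python; real TSM emits a bit-packed binary format, but the collapse is the same: regularly spaced timestamps reduce to a start value, one delta, and a single run.

```python
def delta_of_delta_rle(timestamps):
    """Encode as (first_timestamp, first_delta, [[delta_of_delta, run_length], ...])."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    runs = []
    for d in dods:
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1           # extend the current run
        else:
            runs.append([d, 1])        # start a new run
    return timestamps[0], deltas[0] if deltas else None, runs

# 1000 points at a perfect 10-second interval:
ts = [1_705_315_800 + 10 * i for i in range(1000)]
first, first_delta, runs = delta_of_delta_rle(ts)
# All deltas are 10, so every delta-of-delta is 0 and the whole
# 1000-point column collapses into a single run.
```

Irregular timestamps break the runs apart, which is why regularly sampled metrics compress so much better than bursty event data.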
InfluxDB's data model differs significantly from relational databases. Mastering these concepts is essential for effective schema design and query writing.
```
INFLUXDB DATA HIERARCHY
=======================

ORGANIZATION (InfluxDB 2.x+)
└── BUCKET (equivalent to database + retention policy)
    └── MEASUREMENT (equivalent to table name)
        └── POINT (a single data record)
            ├── TIMESTAMP (nanosecond precision)
            ├── TAGS (indexed key-value pairs for filtering)
            │   ├── tag_key: tag_value
            │   └── tag_key: tag_value
            └── FIELDS (non-indexed values being measured)
                ├── field_key: field_value (int/float/string/bool)
                └── field_key: field_value

EXAMPLE: Server Monitoring Data
===============================

Measurement: "cpu"

Point 1:
  time:   2024-01-15T10:30:00.000000000Z
  tags:   host=web-server-01, region=us-east, env=prod
  fields: usage_user=45.2, usage_system=12.3, usage_idle=42.5

Point 2:
  time:   2024-01-15T10:30:00.000000000Z
  tags:   host=web-server-02, region=us-east, env=prod
  fields: usage_user=38.1, usage_system=8.7, usage_idle=53.2

LINE PROTOCOL (InfluxDB's native write format):

cpu,host=web-server-01,region=us-east,env=prod usage_user=45.2,usage_system=12.3,usage_idle=42.5 1705315800000000000
cpu,host=web-server-02,region=us-east,env=prod usage_user=38.1,usage_system=8.7,usage_idle=53.2 1705315800000000000

Format: <measurement>,<tags> <fields> <timestamp>
```

Core Concepts Explained:
- Bucket: a named location for data with an attached retention period (database + retention policy in 1.x terms)
- Measurement: the name of the thing being recorded, analogous to a table name
- Tags: indexed key-value metadata used for filtering and grouping; keep their cardinality bounded
- Fields: the actual measured values; not indexed, so filtering on them scans data
- Timestamp: nanosecond-precision time that, together with measurement and tags, identifies a point
Using high-cardinality values as tags is the most common mistake. If you have 1 million unique user_ids as tags, you'll have 1 million series—each with its own in-memory index entry. This consumes gigabytes of RAM and devastates query performance. Store high-cardinality data as fields instead.
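A minimal line protocol serializer makes the tag-versus-field decision concrete: in the example below the high-cardinality `user_id` goes into the field set, not the tag set, so it never creates new series. The escaping here is simplified (real clients also escape `=` and handle more edge cases), and the measurement and key names are hypothetical.

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Serialize one point to InfluxDB line protocol (simplified escaping)."""
    esc = lambda s: s.replace(",", "\\,").replace(" ", "\\ ")

    # Tags are sorted by key; values are always unquoted strings.
    tag_part = ",".join(f"{esc(k)}={esc(v)}" for k, v in sorted(tags.items()))

    def fmt(v):
        if isinstance(v, bool):
            return "true" if v else "false"
        if isinstance(v, int):
            return f"{v}i"          # integer fields carry an 'i' suffix
        if isinstance(v, float):
            return repr(v)
        return f'"{v}"'             # string fields are double-quoted

    field_part = ",".join(f"{esc(k)}={fmt(v)}" for k, v in fields.items())
    return f"{esc(measurement)},{tag_part} {field_part} {ts_ns}"

line = to_line_protocol(
    "api_requests",
    tags={"host": "web-server-01", "env": "prod"},       # low cardinality: safe as tags
    fields={"latency_ms": 12.5, "user_id": "u-948121"},  # high cardinality: a field
    ts_ns=1705315800000000000,
)
# → api_requests,env=prod,host=web-server-01 latency_ms=12.5,user_id="u-948121" 1705315800000000000
```

With `user_id` as a field, this measurement produces one series per (host, env) pair; as a tag, it would produce one series per user.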
InfluxDB 2.x introduced Flux—a functional data scripting language replacing the SQL-like InfluxQL. While the learning curve is steeper, Flux provides powerful capabilities for complex time-series analysis, data transformation, and multi-source joins.
```flux
// FLUX FUNDAMENTALS
// -----------------
// Flux uses a pipe-forward model: data flows through transformations
// Source → Filter → Transform → Aggregate → Output

// Basic Query: Get CPU usage from the last hour
from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> filter(fn: (r) => r["host"] == "web-server-01")

// Aggregation: Average CPU per 5-minute window
from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> aggregateWindow(every: 5m, fn: mean)
  |> yield(name: "5min_average")

// Group and Aggregate: Average by host
from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> group(columns: ["host"])
  |> mean()
  |> group()  // Ungroup for output

// Calculated Fields: Compute utilization percentage
from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user" or r["_field"] == "usage_system")
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> map(fn: (r) => ({ r with total_usage: r.usage_user + r.usage_system }))

// Alerting: Find when CPU exceeds threshold
from(bucket: "server-metrics")
  |> range(start: -5m)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> aggregateWindow(every: 1m, fn: mean)
  |> filter(fn: (r) => r["_value"] > 80.0)
  |> map(fn: (r) => ({ r with alert_level: "critical" }))
```

Flux Key Concepts:
- Pipe-forward (`|>`): each function receives the tables produced by the previous one
- Every query begins with a source (`from`) and a time bound (`range`)
- Rows carry metadata columns: `_measurement`, `_field`, `_value`, `_time`
- `group()` controls how rows are partitioned into tables, which determines what aggregates operate over
While Flux has a learning curve, its functional nature shines for time-series workflows. Operations like 'window by 5 minutes, calculate rate, then average across hosts' are natural in Flux but require complex subqueries in SQL. InfluxDB 3.0 restores SQL/InfluxQL support for those who prefer it.
Beyond basic queries, Flux enables sophisticated time-series analysis including downsampling, gap filling, moving averages, and cross-measurement correlation.
```flux
// MOVING AVERAGE: Smooth noisy data
from(bucket: "server-metrics")
  |> range(start: -4h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> movingAverage(n: 10)  // 10-point moving average
  |> yield(name: "smoothed")

// RATE OF CHANGE: Derivative/rate calculations
from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "http_requests_total")
  |> derivative(unit: 1s, nonNegative: true)  // Requests per second
  |> yield(name: "request_rate")

// PERCENTILES: Distribution analysis
from(bucket: "server-metrics")
  |> range(start: -24h)
  |> filter(fn: (r) => r["_measurement"] == "response_time")
  |> aggregateWindow(
       every: 1h,
       fn: (column, tables=<-) => tables |> quantile(q: 0.95, column: column),
     )
  |> yield(name: "p95_latency")

// GAP FILLING: Handle missing data points
from(bucket: "sensor-data")
  |> range(start: -6h)
  |> filter(fn: (r) => r["_measurement"] == "temperature")
  |> aggregateWindow(every: 1m, fn: mean, createEmpty: true)
  |> fill(usePrevious: true)  // Carry forward last known value
  |> yield(name: "filled")

// CROSS-MEASUREMENT JOIN: Correlate different metrics
cpu = from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["_field"] == "usage_user")
  |> aggregateWindow(every: 5m, fn: mean)

memory = from(bucket: "server-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "memory")
  |> filter(fn: (r) => r["_field"] == "used_percent")
  |> aggregateWindow(every: 5m, fn: mean)

join(tables: {cpu: cpu, memory: memory}, on: ["_time", "host"])
  |> map(fn: (r) => ({
       _time: r._time,
       host: r.host,
       cpu: r._value_cpu,
       memory: r._value_memory,
       pressure_score: r._value_cpu * 0.6 + r._value_memory * 0.4,
     }))

// CONTINUOUS DOWNSAMPLING TASK: Background processing
option task = {name: "downsample-1h", every: 1h}

from(bucket: "raw-metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] =~ /cpu|memory/)
  |> aggregateWindow(every: 5m, fn: mean)
  |> to(bucket: "downsampled-metrics", org: "myorg")
```

Operating InfluxDB in production requires understanding its resource requirements, tuning parameters, and common failure modes.
| Metric | Recommendation | Notes |
|---|---|---|
| Memory | Plan for series cardinality × ~1KB | Primary constraint in 1.x/2.x; less critical in 3.0 |
| CPU | Scale with concurrent queries + compaction | Compaction is CPU-intensive; more cores = faster |
| Disk IOPS | SSDs strongly recommended | WAL and compaction are I/O intensive |
| Disk Space | Plan headroom beyond raw data size | Full compactions rewrite files and temporarily need extra space |
| Network | Consider ingestion bandwidth | 1M points/sec at 100 bytes = ~100 MB/s |
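The memory and network figures in the table are simple arithmetic, worth sanity-checking during capacity planning. The ~1KB-per-series and 100-bytes-per-point numbers are the rough planning figures quoted above, not measured constants:

```python
# Series index memory (InfluxDB 1.x/2.x): cardinality × ~1KB per series.
series_cardinality = 1_000_000
bytes_per_series = 1024                               # rough planning figure
mem_gib = series_cardinality * bytes_per_series / 2**30   # ≈ 0.95 GiB of RAM

# Ingestion bandwidth: points/sec × typical line-protocol point size.
points_per_sec = 1_000_000
bytes_per_point = 100                                 # typical serialized size
ingest_mb_per_sec = points_per_sec * bytes_per_point / 1e6   # 100.0 MB/s
```

A million-series deployment thus needs roughly a gigabyte of RAM just for the series index on 1.x/2.x, before any query or cache memory, which is exactly the constraint 3.0 removes.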
We've explored InfluxDB comprehensively—from its architectural foundations through its data model, query language, and operational requirements.
When to Choose InfluxDB:
- Metrics-heavy workloads: infrastructure monitoring, IoT telemetry, real-time analytics
- High write throughput of regular-interval data, where TSM compression shines
- Teams that can keep tag cardinality bounded (1.x/2.x), or that adopt 3.0 for high-cardinality workloads
What's Next:
We'll explore TimescaleDB—a radically different approach to time-series databases. Rather than building from scratch, TimescaleDB extends PostgreSQL with time-series superpowers, offering full SQL compatibility and the familiar PostgreSQL ecosystem.
You now understand InfluxDB's architecture, data model, and query language at a depth sufficient for production deployment. You can evaluate when InfluxDB is the right choice and design schemas that avoid common pitfalls.