Time-series databases have moved from specialized niche solutions to essential infrastructure powering some of the world's most demanding applications. From Tesla's fleet telemetry processing millions of data points per vehicle per day, to Uber's marketplace dynamics tracking supply and demand across thousands of cities in real-time, to Netflix's streaming health monitoring billions of playback events—TSDBs underpin the real-time pulse of modern digital experiences.
Understanding these use cases isn't just academic. Each domain has unique characteristics that influence TSDB selection, schema design, and operational requirements. The patterns you learn from observability differ from those in IoT, which differ again from financial analytics. This page explores real-world deployments across major domains, extracting practical insights you can apply to your own time-series challenges.
By the end of this page, you will understand: (1) How TSDBs power infrastructure monitoring and observability, (2) The unique requirements of IoT deployments at scale, (3) Financial time-series use cases and their precision demands, (4) Application performance monitoring (APM) patterns, (5) Emerging use cases in ML/AI pipelines and real-time analytics, and (6) How to match TSDB selection to domain requirements.
Infrastructure monitoring was the original killer application for time-series databases. Every server, container, network device, and cloud service generates metrics that operations teams need to collect, visualize, alert on, and analyze. This domain defined the core TSDB requirements we see today.
INFRASTRUCTURE MONITORING ARCHITECTURE
======================================

SOURCES
  Servers (10,000+)    Containers (50,000+)    Cloud APIs (AWS, GCP)
        |                      |                       |
  Telegraf (agent)     Prometheus exporters     CloudWatch exporter
        \______________________|_______________________/
                               |
                               v
  TIME-SERIES DATABASE
  (InfluxDB / TimescaleDB / VictoriaMetrics)

  Metrics stored:
  - cpu_usage{host,cpu,mode}
  - memory_used{host}
  - disk_io_bytes{host,device,direction}
  - network_bytes{host,interface,direction}
  - container_cpu{pod,namespace,container}
                               |
         +---------------------+---------------------+
         |                     |                     |
  Grafana Dashboard    Alerting (PagerDuty)    Long-term Storage

SCALE CHARACTERISTICS:
- Collection interval: 10-60 seconds
- Metrics per host: 100-500 baseline, 1,000+ with custom metrics
- Cardinality: hosts × metrics × label combinations
- Retention: 7-30 days raw, 1-2 years aggregated
- Query patterns: dashboards (high read), alerts (continuous), ad-hoc (bursty)

| Organization Scale | Hosts | Metrics/Second | Series Count | TSDB Choice |
|---|---|---|---|---|
| Small Startup | 10-100 | ~1K | ~10K | Prometheus, InfluxDB OSS |
| Mid-size Company | 100-1K | ~100K | ~1M | VictoriaMetrics, TimescaleDB |
| Large Enterprise | 1K-10K | ~1M+ | ~10M+ | Cortex, Thanos, M3, InfluxDB Cloud |
| Hyperscaler | 100K+ | ~100M+ | ~1B+ | Custom (Netflix Atlas, Uber M3) |
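The "hosts × metrics × label combinations" multiplication above is worth doing before choosing a TSDB. A minimal capacity-planning sketch in Python (the input numbers are illustrative, not a benchmark):

```python
# Back-of-envelope sizing for a monitoring deployment. Real cardinality
# depends on label value distributions, so treat this as a rough estimate.

def estimate_load(hosts, metrics_per_host, label_combinations_per_metric,
                  scrape_interval_s):
    """Rough active series count and ingest rate for capacity planning."""
    series = hosts * metrics_per_host * label_combinations_per_metric
    datapoints_per_s = series / scrape_interval_s
    return series, datapoints_per_s

# A mid-size company from the table: ~500 hosts, 200 metrics each,
# ~10 label combinations per metric, scraped every 15 seconds.
series, dps = estimate_load(hosts=500, metrics_per_host=200,
                            label_combinations_per_metric=10,
                            scrape_interval_s=15)
print(f"~{series:,} active series, ~{dps:,.0f} datapoints/second")
# ~1,000,000 active series, matching the "Mid-size Company" row above
```

Running the numbers like this before deployment often reveals that cardinality, not raw write volume, is the limiting factor.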
Prometheus has become the de facto standard for cloud-native monitoring, but its single-node design limits scale. Production deployments often use Prometheus for collection with a scalable TSDB backend: VictoriaMetrics, Cortex, Thanos, or M3. These accept Prometheus remote_write and provide long-term storage, high availability, and global query capabilities.
The Internet of Things generates time-series data at unprecedented scale. Billions of sensors—in vehicles, factories, smart cities, wearables, and homes—continuously stream measurements. IoT time-series has unique characteristics that differentiate it from IT infrastructure monitoring.
IOT TIME-SERIES ARCHITECTURE
============================

EDGE TIER
  Sensors reporting at mixed rates (1ms, 100ms, 1s, ...)
          |
          v
  Edge Gateway: local buffering, compression, downsampling
          |
          |  MQTT / HTTP / gRPC (Cellular, LoRaWAN, WiFi)
          v
CLOUD TIER
  Message Broker (Kafka, MQTT broker, AWS IoT Core)
          |
          v
  TIME-SERIES DATABASE (TimescaleDB, InfluxDB, QuestDB)

  Measurements:
  - temperature{device_id, location}
  - pressure{device_id, unit}
  - vibration{device_id, axis}
  - gps_position{vehicle_id}
          |
          v
  Consumers: real-time alerting, analytics/BI, ML models

IOT-SPECIFIC CHALLENGES:
1. Device cardinality: millions of unique device IDs
2. Irregular intervals: sensors may report only on change
3. Late data: connectivity issues cause out-of-order arrival
4. Edge buffering: devices must store data when offline
5. Geographic distribution: data arrives from global locations
6. Data quality: sensors fail, produce outliers, and drift over time

| Domain | Device Count | Data Points/Day | Key Metrics |
|---|---|---|---|
| Connected Vehicles | 1-10 million | 1-10B per day | GPS, speed, fuel, engine diagnostics |
| Smart Manufacturing | 10K-100K sensors | 100M-1B per day | Temperature, pressure, vibration, throughput |
| Smart Grid/Utilities | Millions of meters | Billions per day | Power consumption, voltage, frequency |
| Wearables/Health | Millions of devices | Variable | Heart rate, activity, sleep, SpO2 |
| Smart Buildings | 1K-100K per building | 10M-100M per day | HVAC, occupancy, energy, access |
IoT naturally creates high cardinality: each device is unique. Traditional TSDBs like InfluxDB 1.x/2.x struggle when device_id is stored as a tag (in-memory index). Solutions: use InfluxDB 3.0 (Parquet-based), TimescaleDB (no in-memory index), or store device_id as a field (not indexed) with a separate device registry table.
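The edge gateway's downsampling step from the architecture above can be sketched in a few lines. This is an illustrative sketch, not a real gateway protocol; the function name, device IDs, and values are invented for the example:

```python
from collections import defaultdict

def downsample(readings, interval_s):
    """Average raw sensor readings into fixed time buckets before uplink.

    readings: iterable of (epoch_seconds, device_id, value) tuples.
    Returns {(bucket_start, device_id): mean_value}, a sketch of the
    compression/downsampling an edge gateway performs so that a burst
    of millisecond-rate samples becomes one value per interval.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for ts, device, value in readings:
        bucket = ts - (ts % interval_s)   # align to interval boundary
        acc = sums[(bucket, device)]
        acc[0] += value
        acc[1] += 1
    return {key: s / n for key, (s, n) in sums.items()}

raw = [(100, "dev-1", 20.0), (130, "dev-1", 22.0), (160, "dev-1", 24.0)]
print(downsample(raw, interval_s=60))
# {(60, 'dev-1'): 20.0, (120, 'dev-1'): 23.0}
```

The same bucketing logic also helps with late data: readings buffered while a device was offline land in their original time buckets when they finally arrive.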
Financial markets generate some of the most demanding time-series workloads. Stock prices, order book updates, trade executions, and market microstructure data arrive at microsecond granularity during trading hours. Financial time-series has unique requirements around precision, latency, and regulatory compliance.
-- FINANCIAL TIME-SERIES SCHEMA (TimescaleDB)
-- =============================================

-- Tick data: individual trades
CREATE TABLE trades (
    time       TIMESTAMPTZ NOT NULL,
    symbol     TEXT NOT NULL,
    exchange   TEXT NOT NULL,
    price      NUMERIC(20, 8) NOT NULL,  -- Precise decimal for currency
    quantity   NUMERIC(20, 8) NOT NULL,
    trade_id   TEXT,
    trade_type TEXT  -- 'buy', 'sell', 'auction'
);
SELECT create_hypertable('trades', 'time');
CREATE INDEX ON trades (symbol, time DESC);

-- OHLCV bars: aggregated candlesticks
CREATE TABLE ohlcv_1min (
    time        TIMESTAMPTZ NOT NULL,
    symbol      TEXT NOT NULL,
    open        NUMERIC(20, 8),
    high        NUMERIC(20, 8),
    low         NUMERIC(20, 8),
    close       NUMERIC(20, 8),
    volume      NUMERIC(30, 8),
    trade_count INTEGER,
    vwap        NUMERIC(20, 8)  -- Volume-Weighted Average Price
);
SELECT create_hypertable('ohlcv_1min', 'time');

-- Order book snapshots (quote data)
CREATE TABLE order_book (
    time      TIMESTAMPTZ NOT NULL,
    symbol    TEXT NOT NULL,
    bid_price NUMERIC(20, 8),
    bid_size  NUMERIC(20, 8),
    ask_price NUMERIC(20, 8),
    ask_size  NUMERIC(20, 8),
    spread    NUMERIC(20, 8) GENERATED ALWAYS AS (ask_price - bid_price) STORED
);
SELECT create_hypertable('order_book', 'time');

-- Continuous aggregate for OHLCV generation
CREATE MATERIALIZED VIEW ohlcv_1min_agg
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 minute', time) AS bucket,
    symbol,
    first(price, time) AS open,
    max(price)         AS high,
    min(price)         AS low,
    last(price, time)  AS close,
    sum(quantity)      AS volume,
    count(*)           AS trade_count,
    sum(price * quantity) / sum(quantity) AS vwap
FROM trades
GROUP BY bucket, symbol;

-- Query: Get OHLCV for a symbol over a date range
SELECT * FROM ohlcv_1min_agg
WHERE symbol = 'AAPL'
  AND bucket >= '2024-01-01'
  AND bucket < '2024-01-15'
ORDER BY bucket;

-- Query: Calculate returns over time
SELECT
    bucket,
    symbol,
    close,
    (close - LAG(close) OVER (PARTITION BY symbol ORDER BY bucket))
      / LAG(close) OVER (PARTITION BY symbol ORDER BY bucket) AS return_1m
FROM ohlcv_1min_agg
WHERE symbol = 'AAPL'
  AND bucket >= NOW() - INTERVAL '1 day';

| Data Type | Volume | Latency Requirement | Retention |
|---|---|---|---|
| Tick Data | Millions/day per exchange | < 1ms for trading systems | 1-7 years (regulatory) |
| Quote Data | 100M+ updates/day | < 10ms for analysis | 90 days - 1 year |
| OHLCV (1min) | ~500K bars/day all symbols | < 100ms for dashboards | Forever (reference data) |
| Order Book Snapshots | Varies by frequency | Real-time for trading | 30-90 days |
| Fundamental Data | Quarterly per company | Not latency-sensitive | Forever |
Financial time-series often requires: (1) Full SQL with window functions for complex analytics, (2) NUMERIC types for precise decimal arithmetic, (3) JOINs with reference data (company info, instrument definitions). TimescaleDB's PostgreSQL foundation provides all of these out of the box, making it popular in fintech and quant finance.
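The OHLCV/VWAP aggregation in the continuous aggregate above is simple enough to verify by hand. A Python sketch with hypothetical tick values, using Decimal as a stand-in for the SQL NUMERIC columns (floats would accumulate rounding error that is unacceptable for financial data):

```python
from decimal import Decimal

def ohlcv(trades):
    """Aggregate a list of (price, quantity) ticks into one OHLCV bar.

    Mirrors the SQL: first/last by time order, min/max for low/high,
    and VWAP = sum(price * quantity) / sum(quantity).
    """
    prices = [p for p, _ in trades]
    volume = sum(q for _, q in trades)
    vwap = sum(p * q for p, q in trades) / volume
    return {"open": prices[0], "high": max(prices), "low": min(prices),
            "close": prices[-1], "volume": volume, "vwap": vwap}

ticks = [(Decimal("100.00"), Decimal("10")),
         (Decimal("101.00"), Decimal("5")),
         (Decimal("100.50"), Decimal("20"))]
bar = ohlcv(ticks)
print(bar["vwap"])  # (1000 + 505 + 2010) / 35, computed exactly in Decimal
```

Note that `Decimal("100.00") * Decimal("10")` is exact, whereas the float equivalent already carries binary rounding; this is the same reason the schema uses NUMERIC rather than DOUBLE PRECISION for prices.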
Modern observability platforms rely on the 'three pillars': metrics, logs, and traces. Time-series databases store the metrics pillar, while specialized systems handle logs (Elasticsearch) and traces (Jaeger, Tempo). APM extends this with application-specific instrumentation for request latency, error rates, and dependency mapping.
APPLICATION PERFORMANCE MONITORING
==================================

THE THREE PILLARS OF OBSERVABILITY:

1. METRICS (time-series database)
   http_request_duration_seconds{
       service="api-gateway",
       endpoint="/api/users",
       method="GET",
       status_code="200"
   } = histogram buckets [0.1, 0.25, 0.5, 1, 2.5, 5]

   http_requests_total{...}          = counter
   http_request_errors_total{...}    = counter
   active_connections{service="..."} = gauge

2. LOGS (log aggregation system)
   {
       "timestamp": "2024-01-15T10:30:00.123Z",
       "level": "ERROR",
       "service": "payment-service",
       "trace_id": "abc123",
       "message": "Payment failed: card declined",
       "user_id": "user_456"
   }

3. TRACES (distributed tracing system)
   Trace ID: abc123
   ├── Span: api-gateway (50ms total)
   │   ├── Span: auth-service (10ms)
   │   └── Span: payment-service (35ms)
   │       ├── Span: db-query (15ms)
   │       └── Span: stripe-api (18ms)
   └── Response returned

RED METRICS (Rate, Errors, Duration):
- Request Rate: requests per second per endpoint
- Error Rate: percentage of failed requests
- Duration: latency distribution (p50, p95, p99)

USE METRICS (Utilization, Saturation, Errors):
- CPU, memory, disk utilization
- Queue depths, connection pool saturation
- Resource-level error rates
# KEY APM QUERIES (PromQL; Flux has equivalent patterns)
# ======================================================

# Request Rate (RED: R)
sum(rate(http_requests_total{service="api-gateway"}[5m])) by (endpoint)

# Error Rate (RED: E)
sum(rate(http_requests_total{service="api-gateway",status_code=~"5.."}[5m]))
  / sum(rate(http_requests_total{service="api-gateway"}[5m])) * 100

# Latency Percentiles (RED: D)
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket{service="api-gateway"}[5m]))
    by (le, endpoint))

# Apdex Score (Application Performance Index)
# Apdex = (Satisfied + Tolerating/2) / Total. Because histogram buckets are
# cumulative, this simplifies to (bucket{le=satisfied} + bucket{le=tolerating}) / 2 / total
(
  sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
  + sum(rate(http_request_duration_seconds_bucket{le="2.0"}[5m]))
) / 2 / sum(rate(http_request_duration_seconds_count[5m]))

# Service Dependency Health
sum(rate(http_requests_total{status_code="200"}[5m])) by (upstream_service)
  / sum(rate(http_requests_total[5m])) by (upstream_service)

# Saturation: Request Queue Depth
avg(pending_requests{service="api-gateway"}) by (pod)

# SLO Burn Rate
(
  1 - (
    sum(rate(http_requests_total{status_code="200"}[1h]))
    / sum(rate(http_requests_total[1h]))
  )
) / (1 - 0.999)  # 99.9% SLO target

APM often needs high-cardinality dimensions: user_id, request_id, trace_id. Traditional TSDBs can't handle these as tags. Solutions: (1) Store high-cardinality data in logs/traces, not metrics, (2) Use sampling to reduce cardinality, (3) Choose TSDBs designed for high cardinality (Honeycomb, InfluxDB 3.0, ClickHouse).
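The Apdex arithmetic is easy to get wrong in PromQL (cumulative buckets already include the satisfied requests), so it helps to check the formula in plain code. A sketch with made-up request counts:

```python
def apdex(satisfied_le, tolerating_le, total):
    """Apdex score from cumulative histogram bucket counts.

    satisfied_le:  requests at or under the 'satisfied' threshold (e.g. 0.5s).
    tolerating_le: requests at or under the 'tolerating' threshold (e.g. 2s);
                   cumulative, so it already includes the satisfied ones.

    Apdex = (Satisfied + Tolerating/2) / Total, where Tolerating is the
    count strictly between the two thresholds.
    """
    tolerating = tolerating_le - satisfied_le
    return (satisfied_le + tolerating / 2) / total

# 800 requests <= 0.5s, 950 requests <= 2s, 1000 total
print(apdex(800, 950, 1000))  # (800 + 150/2) / 1000 = 0.875
```

Note that `(800 + 75) / 1000` equals `(800 + 950) / 2 / 1000`, which is why the cumulative-bucket form of the PromQL query divides the *sum* of both buckets by two.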
Beyond infrastructure and APM, time-series databases increasingly power real-time business analytics—tracking revenue, user engagement, conversion funnels, and operational KPIs as they happen, not hours or days later in batch reports.
| Domain | Metrics Tracked | Query Pattern | Latency Need |
|---|---|---|---|
| E-commerce | Orders, revenue, cart abandonment, inventory | Aggregates by product, region, time | < 1 minute freshness |
| Gaming | Concurrent players, in-game events, revenue | Leaderboards, engagement metrics | Real-time (< 10s) |
| Advertising | Impressions, clicks, conversions, spend | Campaign performance, A/B tests | < 5 minutes freshness |
| Streaming Media | Concurrent streams, quality metrics, engagement | By title, region, device type | Real-time (< 30s) |
| SaaS Platforms | Active users, feature usage, API calls | By customer, plan, feature | < 1 minute freshness |
-- REAL-TIME E-COMMERCE ANALYTICS (TimescaleDB)
-- =============================================

-- Event tracking table
CREATE TABLE user_events (
    time       TIMESTAMPTZ NOT NULL,
    event_type TEXT NOT NULL,  -- 'page_view', 'add_to_cart', 'purchase'
    user_id    TEXT,
    session_id TEXT,
    product_id TEXT,
    category   TEXT,
    revenue    NUMERIC(10, 2),
    region     TEXT
);
SELECT create_hypertable('user_events', 'time');

-- Real-time revenue by category (1-minute buckets, refreshed every minute)
CREATE MATERIALIZED VIEW revenue_by_category_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 minute', time) AS bucket,
    category,
    SUM(revenue) AS total_revenue,
    COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchase_count,
    COUNT(DISTINCT user_id) AS unique_buyers
FROM user_events
WHERE event_type = 'purchase'
GROUP BY bucket, category;

SELECT add_continuous_aggregate_policy('revenue_by_category_hourly',
    start_offset => INTERVAL '5 minutes',
    end_offset => INTERVAL '1 minute',
    schedule_interval => INTERVAL '1 minute');

-- Live dashboard query: Revenue and conversion for the last hour
WITH funnel AS (
    SELECT
        time_bucket('5 minutes', time) AS bucket,
        COUNT(*) FILTER (WHERE event_type = 'page_view') AS views,
        COUNT(*) FILTER (WHERE event_type = 'add_to_cart') AS add_to_carts,
        COUNT(*) FILTER (WHERE event_type = 'purchase') AS purchases,
        SUM(revenue) FILTER (WHERE event_type = 'purchase') AS revenue
    FROM user_events
    WHERE time > NOW() - INTERVAL '1 hour'
    GROUP BY bucket
    ORDER BY bucket
)
SELECT
    bucket,
    views,
    add_to_carts,
    purchases,
    revenue,
    ROUND(100.0 * add_to_carts / NULLIF(views, 0), 2) AS cart_rate,
    ROUND(100.0 * purchases / NULLIF(add_to_carts, 0), 2) AS conversion_rate
FROM funnel;

-- Anomaly detection: Significant revenue drop
SELECT
    time_bucket('5 minutes', time) AS bucket,
    SUM(revenue) AS revenue,
    AVG(SUM(revenue)) OVER (
        ORDER BY time_bucket('5 minutes', time) ROWS 12 PRECEDING) AS moving_avg,
    SUM(revenue) < 0.5 * AVG(SUM(revenue)) OVER (
        ORDER BY time_bucket('5 minutes', time) ROWS 12 PRECEDING) AS is_anomaly
FROM user_events
WHERE event_type = 'purchase'
  AND time > NOW() - INTERVAL '2 hours'
GROUP BY bucket
ORDER BY bucket;

For pure analytics on event data, columnar OLAP databases like ClickHouse, Apache Druid, or DuckDB may outperform traditional TSDBs. However, when you need both time-series operations (rate calculations, gap filling) AND analytical queries (funnels, cohorts), TSDBs like TimescaleDB provide a unified solution without data movement between systems.
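The moving-average anomaly rule from the SQL above can be verified in plain Python. A sketch with invented revenue figures (the 0.5 threshold and 12-bucket window mirror the query):

```python
def flag_revenue_drops(revenue, window=12, threshold=0.5):
    """Flag buckets whose revenue falls below threshold x trailing mean.

    Mirrors the SQL: the moving average covers the current bucket plus
    up to `window` preceding ones (ROWS 12 PRECEDING includes the
    current row in its frame).
    """
    flags = []
    for i, value in enumerate(revenue):
        trailing = revenue[max(0, i - window): i + 1]
        moving_avg = sum(trailing) / len(trailing)
        flags.append(value < threshold * moving_avg)
    return flags

series = [100, 110, 105, 95, 40]  # sudden drop in the last bucket
print(flag_revenue_drops(series))
# [False, False, False, False, True]
```

Because the current bucket is inside the averaging frame, a drop must be severe to trigger; excluding the current row (ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING in SQL) would make the detector more sensitive.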
Time-series databases are expanding into new domains as organizations recognize the value of temporal data analysis in areas previously served by specialized or general-purpose databases.
-- ML FEATURE STORE USE CASE
-- Point-in-time correct feature retrieval

-- Feature table with versioned features
CREATE TABLE user_features (
    time          TIMESTAMPTZ NOT NULL,
    user_id       TEXT NOT NULL,
    feature_name  TEXT NOT NULL,
    feature_value DOUBLE PRECISION NOT NULL
);
SELECT create_hypertable('user_features', 'time');

-- Insert features as they're computed
INSERT INTO user_features VALUES
('2024-01-15 10:00:00', 'user_123', 'purchase_count_30d', 5),
('2024-01-15 10:00:00', 'user_123', 'avg_order_value', 87.50),
('2024-01-15 10:00:00', 'user_123', 'days_since_last_purchase', 3);

-- Point-in-time feature retrieval (no data leakage!)
-- Get features as they were known at prediction time, not current values
SELECT
    l.user_id,
    l.label_time,
    f.feature_name,
    f.feature_value
FROM labels l
CROSS JOIN LATERAL (
    SELECT DISTINCT ON (feature_name) feature_name, feature_value
    FROM user_features
    WHERE user_id = l.user_id
      AND time <= l.label_time  -- Only features known before the label
    ORDER BY feature_name, time DESC
) f
WHERE l.user_id = 'user_123';

We're seeing convergence between TSDBs and other database categories. TimescaleDB adds analytics capabilities. ClickHouse adds time-series functions. QuestDB optimizes for both time-series AND OLAP. The future may be fewer, more capable databases rather than specialized tools for each use case.
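The point-in-time retrieval logic above (take the latest value of each feature known at or before the label time) can be sketched in plain Python. A toy example with hypothetical rows:

```python
def features_as_of(feature_rows, user_id, as_of):
    """Latest value of each feature known at or before `as_of`.

    feature_rows: list of (time, user_id, feature_name, value) tuples,
    the same shape as the user_features table. Filtering to
    time <= as_of is what prevents label leakage: the training set
    never sees feature values computed after the label was observed.
    """
    latest = {}
    for ts, uid, name, value in sorted(feature_rows):
        if uid == user_id and ts <= as_of:
            latest[name] = value  # later rows overwrite earlier ones
    return latest

rows = [
    (1, "user_123", "purchase_count_30d", 5),
    (2, "user_123", "purchase_count_30d", 6),  # recomputed after the label
]
print(features_as_of(rows, "user_123", as_of=1))
# {'purchase_count_30d': 5} -- the value known at label time, not the latest
```

This is exactly what the `DISTINCT ON ... ORDER BY time DESC` lateral subquery does per label row, just expressed imperatively.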
With multiple TSDBs available, selection depends on matching your specific requirements to database strengths. No single TSDB is best for all use cases.
| Requirement | Best Options | Why |
|---|---|---|
| Full SQL support | TimescaleDB, QuestDB | PostgreSQL-compatible; familiar query language |
| Prometheus ecosystem | VictoriaMetrics, Cortex, Thanos | Native PromQL; drop-in Prometheus replacement |
| Maximum write throughput | QuestDB, ClickHouse, VictoriaMetrics | Optimized ingestion engines |
| IoT with high device cardinality | TimescaleDB, InfluxDB 3.0, QuestDB | Handle millions of unique device IDs |
| Unified observability platform | InfluxDB, Datadog, Grafana Cloud | Integrated collection, storage, visualization |
| Financial analytics (precise decimals) | TimescaleDB, QuestDB | Native NUMERIC types; window functions |
| Minimal operational overhead | InfluxDB Cloud, Timescale Cloud, Grafana Cloud | Fully managed services |
| Edge/embedded deployment | QuestDB (single binary), InfluxDB OSS | Small footprint; no dependencies |
For most organizations: Start with Prometheus + Grafana for infrastructure monitoring (it's free and ubiquitous). Add VictoriaMetrics or Thanos for scale. Use TimescaleDB if you need SQL analytics or IoT. Consider managed services to avoid operational burden. Don't over-engineer—you can migrate later if needed.
We've explored the breadth of time-series database applications, from the infrastructure monitoring that birthed the category to emerging use cases in ML feature stores and real-time business analytics.
Module Complete:
You've now completed the comprehensive exploration of time-series databases. From foundational concepts through specific technologies (InfluxDB, TimescaleDB), operational patterns (retention policies), and real-world applications—you have the knowledge to design, deploy, and operate time-series solutions at scale.
Congratulations! You've mastered time-series databases at a depth suitable for production deployment. You understand when TSDBs are appropriate, how to select among options, and how to design for scale. Whether building IoT platforms, observability systems, or real-time analytics, you have the foundation to succeed.