The database landscape presents developers with an uncomfortable choice: adopt a purpose-built time-series database and abandon familiar SQL tooling, or shoehorn time-series data into PostgreSQL and suffer performance penalties. TimescaleDB eliminates this tradeoff by extending PostgreSQL with native time-series capabilities.
Launched in 2017, TimescaleDB takes a radically different approach from purpose-built TSDBs like InfluxDB. Rather than creating an entirely new database, TimescaleDB is a PostgreSQL extension that adds transparent time-partitioning, columnar compression, and time-series-specific query optimizations—all while preserving full SQL compatibility. Your existing PostgreSQL tools, ORMs, drivers, and expertise work unchanged. You get the performance of a TSDB with the richness of PostgreSQL.
By the end of this page, you will understand:

1. TimescaleDB's architecture and how hypertables work
2. Automatic time-partitioning (chunking) and its benefits
3. Columnar compression achieving 90%+ storage reduction
4. Time-series-specific SQL functions and continuous aggregates
5. When TimescaleDB excels versus alternatives
6. Migration paths from vanilla PostgreSQL
TimescaleDB's fundamental insight is that PostgreSQL is already an excellent database—it just needs help with the specific characteristics of time-series workloads. Rather than reinventing storage engines, query planners, and replication, TimescaleDB leverages PostgreSQL's mature foundation and adds targeted enhancements.
What TimescaleDB Adds to PostgreSQL:
- Hypertables with transparent, automatic time-partitioning (chunking)
- Columnar compression for dramatic storage reduction
- Continuous aggregates that keep summaries up to date automatically
- Data lifecycle management: retention, compression, and tiering policies
- Time-series SQL functions: `time_bucket()`, `locf()` (last observation carried forward), `interpolate()`, and gap-filling operations

If you already have time-series data in PostgreSQL, migration to TimescaleDB can be as simple as: (1) install the extension, (2) create a new hypertable, (3) `INSERT INTO new_table SELECT * FROM old_table`. Your existing queries work unchanged but now run on optimized infrastructure.
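Those three migration steps can be sketched in SQL. This is a minimal, hedged example; `old_metrics` and `new_metrics` are placeholder names, and the table is assumed to have a `time` column:

```sql
-- Step 1: install the extension (once per database)
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Step 2: create a matching table and convert it to a hypertable
-- (old_metrics / new_metrics are illustrative names)
CREATE TABLE new_metrics (LIKE old_metrics INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
SELECT create_hypertable('new_metrics', 'time');

-- Step 3: copy the existing rows; each lands in the correct chunk automatically
INSERT INTO new_metrics SELECT * FROM old_metrics;
```

For large tables, batching the copy by time range keeps transactions small, but the single `INSERT ... SELECT` above captures the idea.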
The hypertable is TimescaleDB's fundamental abstraction. A hypertable looks and acts like a regular PostgreSQL table but internally consists of many smaller chunks, each containing data for a specific time range.
```
HYPERTABLE ARCHITECTURE
=======================

        ┌─────────────────────────────────────────┐
        │          HYPERTABLE: metrics            │
        │        (Virtual/Logical Table)          │
        │                                         │
        │  SELECT * FROM metrics                  │
        │  WHERE time > NOW() - INTERVAL '1 day'  │
        └───────────────────┬─────────────────────┘
                            │
                            │ Query Planning:
                            │ "Only need chunks for last 24h"
                            │
        ┌───────────────────┼────────────────────┐
        │                   │                    │
        ▼                   ▼                    ▼
┌───────────────┐   ┌───────────────┐    ┌───────────────┐
│   Chunk 1     │   │   Chunk 2     │    │   Chunk 3     │
│  (Jan 1-7)    │   │  (Jan 8-14)   │ ...│   (Today)     │
│               │   │               │    │               │
│  Regular PG   │   │  Regular PG   │    │  Regular PG   │
│  table with   │   │  table with   │    │  table with   │
│  indexes      │   │  indexes      │    │  indexes      │
│               │   │               │    │               │
│  ○ SKIPPED    │   │  ○ SKIPPED    │    │  ● SCANNED    │
│ (not in range)│   │ (not in range)│    │  (in range!)  │
└───────────────┘   └───────────────┘    └───────────────┘

CHUNK EXCLUSION:
- Query planner knows each chunk's time range
- Chunks outside the WHERE clause are excluded before any scan
- Only relevant chunks are touched → orders of magnitude faster

EACH CHUNK IS:
- A real PostgreSQL table (in the _timescaledb_internal schema)
- Indexed on its own (indexes inherited from the hypertable)
- Compressible independently
- Droppable instantly (for retention)
```
```sql
-- Step 1: Create a regular PostgreSQL table
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ      NOT NULL,
    sensor_id   INTEGER          NOT NULL,
    location    TEXT             NOT NULL,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION,
    pressure    DOUBLE PRECISION
);

-- Step 2: Convert to hypertable with automatic time-partitioning
-- This is the key TimescaleDB operation!
SELECT create_hypertable('sensor_data', 'time');

-- Or with a custom chunk interval (default is 7 days)
SELECT create_hypertable(
    'sensor_data', 'time',
    chunk_time_interval => INTERVAL '1 day'
);

-- Step 3: Add indexes (inherited by all chunks automatically)
CREATE INDEX ON sensor_data (sensor_id, time DESC);
CREATE INDEX ON sensor_data (location, time DESC);

-- From this point, INSERT/SELECT work exactly like regular tables,
-- but internally data is partitioned across chunks by time

-- Insert data (routed to the appropriate chunk automatically)
INSERT INTO sensor_data VALUES
    (NOW(), 1, 'warehouse-a', 22.5, 45.0, 1013.25),
    (NOW(), 2, 'warehouse-b', 24.1, 42.0, 1012.80);

-- Query with a time filter (only relevant chunks are scanned)
SELECT
    time_bucket('1 hour', time) AS hour,
    sensor_id,
    AVG(temperature) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour, sensor_id
ORDER BY hour DESC;
```

Why Chunking Matters:
- Chunk exclusion: queries with time predicates skip irrelevant chunks before any scan.
- Indexes are per-chunk, so the indexes for recent chunks stay small and memory-resident, keeping inserts fast.
- Retention is trivial: dropping old data removes whole chunks, each an instant DROP TABLE operation. No slow DELETE + VACUUM cycles.

Chunk interval affects performance. Too small means too many chunks and metadata overhead; too large means chunks too big to benefit from exclusion. Rule of thumb: aim for chunks that fit in 25% of available memory. For 64 GB of RAM, target roughly 16 GB of uncompressed chunk size, and adjust `chunk_time_interval` based on your data ingestion rate.
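If the default interval turns out wrong for your ingestion rate, future chunks can use a different interval via `set_chunk_time_interval` (existing chunks keep their original boundaries). The 16-day value below is purely illustrative, assuming roughly 1 GB/day of ingest against a ~16 GB chunk target:

```sql
-- Illustrative: at ~1 GB/day of ingest, a 16-day interval yields ~16 GB chunks.
-- Applies only to chunks created after this call.
SELECT set_chunk_time_interval('sensor_data', INTERVAL '16 days');
```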
TimescaleDB's native compression transforms chunks from PostgreSQL's row-based heap format into a columnar layout with aggressive type-specific compression. This achieves 90-95% storage reduction while keeping data fully queryable.
```sql
-- Enable compression on a hypertable
ALTER TABLE sensor_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',  -- Separate segments per sensor
    timescaledb.compress_orderby = 'time DESC'     -- Order within segments
);

-- Add a compression policy: compress chunks older than 7 days
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');

-- Or compress specific chunks manually
SELECT compress_chunk('_timescaledb_internal._hyper_1_10_chunk');

-- View compression statistics
SELECT
    chunk_name,
    pg_size_pretty(before_compression_total_bytes) AS before,
    pg_size_pretty(after_compression_total_bytes) AS after,
    ROUND((1 - after_compression_total_bytes::numeric /
               before_compression_total_bytes::numeric) * 100, 1) AS compression_ratio
FROM chunk_compression_stats('sensor_data')
ORDER BY chunk_name;

-- Example output:
-- chunk_name       | before  | after | compression_ratio
-- _hyper_1_1_chunk | 1200 MB | 84 MB | 93.0
-- _hyper_1_2_chunk | 1180 MB | 78 MB | 93.4
-- _hyper_1_3_chunk | 1210 MB | 82 MB | 93.2
```

How Compression Works:
```
COMPRESSION TRANSFORMATION
==========================

BEFORE COMPRESSION (Row-based PostgreSQL heap):
┌──────────────────────────────────────────────────────────────────────┐
│ Row 1: time=2024-01-15T10:00:00 | sensor_id=1 | temp=22.5 | hum=45.0 │
│ Row 2: time=2024-01-15T10:00:01 | sensor_id=1 | temp=22.6 | hum=45.1 │
│ Row 3: time=2024-01-15T10:00:02 | sensor_id=1 | temp=22.5 | hum=45.0 │
│ ... (millions of rows)                                               │
└──────────────────────────────────────────────────────────────────────┘

AFTER COMPRESSION (Columnar segments):
┌─────────────────────────────────────────────────────────────────────┐
│ Segment: sensor_id=1                                                │
│ ┌────────────────────────────────────────────────────────────────┐  │
│ │ time column: [t1, t2, t3, ...] → Delta-of-delta + LZ4          │  │
│ │ Regular intervals compress extremely well                      │  │
│ │ 1000 timestamps → ~100 bytes                                   │  │
│ └────────────────────────────────────────────────────────────────┘  │
│ ┌────────────────────────────────────────────────────────────────┐  │
│ │ temperature column: [22.5, 22.6, 22.5, ...] → Gorilla + LZ4    │  │
│ │ Similar floats XOR to values with many zeros                   │  │
│ │ 1000 floats → ~400 bytes                                       │  │
│ └────────────────────────────────────────────────────────────────┘  │
│ ┌────────────────────────────────────────────────────────────────┐  │
│ │ humidity column: [45.0, 45.1, 45.0, ...] → Gorilla + LZ4       │  │
│ └────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

COMPRESSION ALGORITHMS BY TYPE:
- Timestamps: Delta-of-delta encoding (regular intervals → tiny deltas)
- Floats: Gorilla compression (XOR encoding for similar values)
- Integers: Delta + Simple8B bit-packing
- Strings: Dictionary encoding + LZ4
- All types: Final LZ4 pass for additional compression
```

Compressed chunks are append-only. Updates and deletes require decompressing the chunk first. This matches time-series workloads (append-mostly) but is important to know for hybrid use cases.
Also, query performance on compressed chunks may be slightly slower than uncompressed—but the storage savings often outweigh this for cold data.
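When a correction to already-compressed data is unavoidable, the chunk can be decompressed, modified, and recompressed. A minimal sketch; the chunk name is illustrative (look up real names in `timescaledb_information.chunks`):

```sql
-- Decompress the target chunk so UPDATE/DELETE work again (name is illustrative)
SELECT decompress_chunk('_timescaledb_internal._hyper_1_10_chunk');

-- ...apply corrections with ordinary UPDATE/DELETE statements...

-- Recompress once the fix is in place
SELECT compress_chunk('_timescaledb_internal._hyper_1_10_chunk');
```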
TimescaleDB extends PostgreSQL with time-series-specific functions that simplify common analytical patterns. These functions are optimized for the chunked storage model.
```sql
-- TIME_BUCKET: Group data into time intervals
-- The most-used TimescaleDB function
SELECT
    time_bucket('5 minutes', time) AS bucket,
    sensor_id,
    AVG(temperature) AS avg_temp,
    MAX(temperature) AS max_temp,
    MIN(temperature) AS min_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY bucket, sensor_id
ORDER BY bucket, sensor_id;

-- TIME_BUCKET with origin (align buckets to a specific time)
SELECT
    time_bucket('1 hour', time, TIMESTAMP '2024-01-01 00:00:00') AS bucket,
    COUNT(*) AS readings
FROM sensor_data
GROUP BY bucket;

-- GAP FILLING: Generate rows for missing time intervals
SELECT
    time_bucket_gapfill('1 minute', time) AS minute,
    sensor_id,
    AVG(temperature) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
  AND sensor_id = 1
GROUP BY minute, sensor_id
ORDER BY minute;

-- LOCF (Last Observation Carried Forward): Fill gaps with the previous value
SELECT
    time_bucket_gapfill('1 minute', time) AS minute,
    sensor_id,
    locf(AVG(temperature)) AS temperature  -- Carry forward last known value
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
  AND sensor_id = 1
GROUP BY minute, sensor_id;

-- INTERPOLATE: Linear interpolation between known values
SELECT
    time_bucket_gapfill('1 minute', time) AS minute,
    interpolate(AVG(temperature)) AS temperature
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
  AND sensor_id = 1
GROUP BY minute;

-- FIRST/LAST: Get the first or last value in each bucket
SELECT
    time_bucket('1 hour', time) AS hour,
    first(temperature, time) AS opening_temp,
    last(temperature, time) AS closing_temp,
    MAX(temperature) AS high,
    MIN(temperature) AS low
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;

-- DELTA: Calculate difference from previous value (for counters)
SELECT
    time,
    sensor_id,
    delta(total_requests) AS requests_in_period
FROM (
    SELECT
        time_bucket('1 minute', time) AS time,
        sensor_id,
        last(request_counter, time) AS total_requests
    FROM http_metrics
    WHERE time > NOW() - INTERVAL '1 hour'
    GROUP BY 1, 2
) subq;

-- HISTOGRAM: Distribution of values
SELECT histogram(temperature, 20.0, 30.0, 10) AS temp_histogram
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours';
```

| Function | Purpose | Use Case |
|---|---|---|
| `time_bucket(interval, time)` | Group timestamps into fixed intervals | Downsampling, aggregation windows |
| `time_bucket_gapfill()` | Generate rows for missing intervals | Creating continuous time series |
| `locf(value)` | Last observation carried forward | Fill gaps with previous value |
| `interpolate(value)` | Linear interpolation | Smooth estimation between points |
| `first(value, time)` | First value ordered by time | Opening prices, initial readings |
| `last(value, time)` | Last value ordered by time | Closing prices, final readings |
| `histogram(value, min, max, buckets)` | Value distribution | Latency distributions, percentiles |
| `approximate_row_count()` | Fast table size estimate | Dashboard queries avoiding COUNT(*) |
PostgreSQL's `date_trunc` only supports calendar units (hour, day, week). `time_bucket` supports arbitrary intervals (5 minutes, 15 minutes, 4 hours) and handles timezone-aware bucketing correctly. Always prefer `time_bucket` for time-series work.
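A side-by-side illustration of that difference, using the `sensor_data` table from earlier: `date_trunc` can only snap to calendar units, while `time_bucket` accepts arbitrary widths:

```sql
-- Plain PostgreSQL: limited to calendar units such as 'hour'
SELECT date_trunc('hour', time) AS bucket, AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY bucket;

-- TimescaleDB: arbitrary intervals, e.g. 15 minutes
SELECT time_bucket('15 minutes', time) AS bucket, AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY bucket;
```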
Continuous aggregates are materialized views that automatically refresh as new data arrives. They pre-compute aggregations, making queries over long time ranges return instantly instead of scanning billions of rows.
```sql
-- Create a continuous aggregate for hourly temperature summaries
CREATE MATERIALIZED VIEW hourly_temperature
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    sensor_id,
    location,
    AVG(temperature) AS avg_temp,
    MAX(temperature) AS max_temp,
    MIN(temperature) AS min_temp,
    COUNT(*) AS reading_count
FROM sensor_data
GROUP BY hour, sensor_id, location
WITH NO DATA;  -- Don't populate immediately

-- Add an automatic refresh policy
SELECT add_continuous_aggregate_policy('hourly_temperature',
    start_offset => INTERVAL '3 hours',      -- Refresh data starting from 3 hours ago
    end_offset => INTERVAL '1 hour',         -- Up to 1 hour ago (allow late data)
    schedule_interval => INTERVAL '1 hour'   -- Run every hour
);

-- Manually refresh a specific time range
CALL refresh_continuous_aggregate('hourly_temperature', '2024-01-01', '2024-01-15');

-- Query the continuous aggregate (instant response!)
SELECT * FROM hourly_temperature
WHERE hour > NOW() - INTERVAL '30 days'
  AND location = 'warehouse-a'
ORDER BY hour DESC;

-- HIERARCHICAL AGGREGATES: Build on top of existing aggregates
CREATE MATERIALIZED VIEW daily_temperature
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', hour) AS day,
    sensor_id,
    location,
    AVG(avg_temp) AS avg_temp,
    MAX(max_temp) AS max_temp,
    MIN(min_temp) AS min_temp,
    SUM(reading_count) AS total_readings
FROM hourly_temperature
GROUP BY day, sensor_id, location
WITH DATA;

-- REAL-TIME AGGREGATES: Combine materialized + recent unmaterialized data
-- By default, queries transparently merge materialized data with recent raw data

-- View refresh status
SELECT * FROM timescaledb_information.continuous_aggregates
WHERE view_name = 'hourly_temperature';

-- See materialization progress
SELECT * FROM timescaledb_information.continuous_aggregate_stats
WHERE view_name = 'hourly_temperature';
```

How Continuous Aggregates Work:
```
CONTINUOUS AGGREGATE ARCHITECTURE
=================================

   RAW DATA                        CONTINUOUS AGGREGATE
  (hypertable)                      (materialized view)
┌──────────────┐                ┌──────────────────────────┐
│ Raw chunk 1  │    Refresh     │ Materialized result      │
│ Raw chunk 2  │  ───────────►  │ (pre-computed hourly     │
│ Raw chunk 3  │    Policy      │  aggregates)             │
│ ... (100s)   │                │                          │
└──────────────┘                └──────────────────────────┘
       │                                    │
       ▼                                    ▼
┌──────────────┐                ┌──────────────────────────┐
│ Recent data  │◄── UNION ────► │ Query over 30 days       │
│ (not yet     │                │ Instant! Pre-computed +  │
│ materialized)│                │ recent data merged       │
└──────────────┘                └──────────────────────────┘

REFRESH WINDOW:
│◄────── start_offset ──────►│◄─ end_offset ─►│ NOW
│                            │                │
Old data already       Refresh zone      Too recent
materialized           (update this)     (wait for late data)

REAL-TIME AGGREGATES:
- Queries automatically combine:
  1. Materialized data (fast, pre-computed)
  2. Recent raw data (live, computed at query time)
- Seamless to the application—it looks like one table
```

Without continuous aggregates, a query like "average temperature per hour for the last year" would scan billions of raw data points. With continuous aggregates, it reads ~8,760 pre-computed hourly rows. Query time drops from minutes to milliseconds, and the aggregates typically occupy less than 1% of the raw data's storage.
Time-series data naturally ages—second-level granularity becomes less valuable over time. TimescaleDB provides comprehensive data lifecycle management through retention policies, compression policies, and tiered storage.
```sql
-- RETENTION POLICY: Automatically drop old chunks
-- This is the most efficient way to delete time-series data

-- Drop chunks older than 90 days
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');

-- For continuous aggregates, keep summaries longer than raw data
SELECT add_retention_policy('hourly_temperature', INTERVAL '2 years');
SELECT add_retention_policy('daily_temperature', INTERVAL '10 years');

-- View existing retention policies
SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';

-- Drop a retention policy
SELECT remove_retention_policy('sensor_data');

-- COMPRESSION POLICY: Compress chunks older than a threshold
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');

-- Reorder to optimize compression (optional, can improve ratios)
SELECT add_reorder_policy('sensor_data', 'sensor_data_sensor_id_time_idx');

-- TIERED STORAGE (TimescaleDB Cloud / Enterprise)
-- Move cold data to object storage (S3, GCS)
SELECT add_tiering_policy('sensor_data', INTERVAL '30 days');

-- COMPLETE LIFECYCLE EXAMPLE:
-- Days 0-7:   Uncompressed, fast writes, hot storage
-- Days 7-30:  Compressed, warm storage
-- Days 30-90: Tiered to object storage, cold storage
-- Day 90+:    Dropped automatically

-- View chunk information including compression and tiering status
SELECT
    chunk_table,
    chunk_name,
    range_start,
    range_end,
    pg_size_pretty(chunk_bytes) AS size,
    is_compressed,
    is_tiered
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start DESC
LIMIT 20;
```

| Stage | Age | State | Cost/Performance |
|---|---|---|---|
| Hot | 0-7 days | Uncompressed, local SSD | High cost, fastest queries/writes |
| Warm | 7-30 days | Compressed, local disk | Medium cost, fast queries, no writes |
| Cold | 30-90 days | Tiered to object storage | Low cost, slower queries, archival |
| Expired | 90+ days | Automatically dropped | No cost, no access |
A powerful pattern: keep raw data for 30 days (operational debugging), hourly aggregates for 2 years (trend analysis), daily aggregates forever (historical views). This provides the best of both worlds—recent detail and long-term insights—with minimal storage cost.
Maximizing TimescaleDB performance requires understanding how hypertables interact with PostgreSQL's query planner and tuning configurations appropriately.
- Always include a time predicate—even `WHERE time > '1970-01-01'` is better than nothing, because it enables chunk exclusion.
- Match indexes to query shape: for `WHERE sensor_id = X AND time > Y ORDER BY time`, create an index on `(sensor_id, time DESC)`.
- Set `max_parallel_workers_per_gather` high enough to parallelize scans across chunks.
```sql
-- Check whether queries are using chunk exclusion
EXPLAIN ANALYZE
SELECT AVG(temperature) FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour';

-- Look for: "Chunks excluded: 157" in the output
-- This means 157 chunks were skipped without scanning!

-- View chunk information
SELECT * FROM chunks_detailed_size('sensor_data');

-- Monitor compression effectiveness
SELECT
    pg_size_pretty(before_compression_total_bytes) AS before,
    pg_size_pretty(after_compression_total_bytes) AS after,
    ROUND((1 - after_compression_total_bytes::numeric /
               before_compression_total_bytes::numeric) * 100, 1) AS ratio
FROM hypertable_compression_stats('sensor_data');

-- Key PostgreSQL settings for TimescaleDB
ALTER SYSTEM SET shared_buffers = '8GB';                -- 25% of RAM
ALTER SYSTEM SET effective_cache_size = '24GB';         -- 75% of RAM
ALTER SYSTEM SET work_mem = '64MB';                     -- Per-operation sort memory
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;   -- Parallel query workers
ALTER SYSTEM SET random_page_cost = 1.1;                -- For SSDs
SELECT pg_reload_conf();
```

We've explored TimescaleDB comprehensively—from its PostgreSQL extension architecture through hypertables, compression, and data lifecycle management.
When to Choose TimescaleDB:

- You have PostgreSQL expertise, tooling, ORMs, or drivers you want to keep using unchanged.
- You need full SQL richness—joins, window functions, and relational data alongside time-series.
- Your workload is append-mostly time-series (metrics, sensors, events) that benefits from chunking, compression, and continuous aggregates.
What's Next:
We'll explore retention policies in depth—understanding how TSDBs manage data lifecycle, from high-resolution recent data through progressively downsampled archives to eventual expiration. This is crucial for cost management at scale.
You now understand TimescaleDB's architecture, can design efficient hypertable schemas, implement compression and continuous aggregates, and configure data lifecycle policies. You're equipped to evaluate when TimescaleDB is the right choice and deploy it effectively.