The database landscape presents developers with an uncomfortable choice: adopt a purpose-built time-series database and abandon familiar SQL tooling, or shoehorn time-series data into PostgreSQL and suffer performance penalties. TimescaleDB eliminates this tradeoff by extending PostgreSQL with native time-series capabilities.
Launched in 2017, TimescaleDB takes a radically different approach from purpose-built TSDBs like InfluxDB. Rather than creating an entirely new database, TimescaleDB is a PostgreSQL extension that adds transparent time-partitioning, columnar compression, and time-series-specific query optimizations—all while preserving full SQL compatibility. Your existing PostgreSQL tools, ORMs, drivers, and expertise work unchanged. You get the performance of a TSDB with the richness of PostgreSQL.
By the end of this page, you will understand:

1. TimescaleDB's architecture and how hypertables work
2. Automatic time-partitioning (chunking) and its benefits
3. Columnar compression achieving 90%+ storage reduction
4. Time-series-specific SQL functions and continuous aggregates
5. When TimescaleDB excels versus alternatives
6. Migration paths from vanilla PostgreSQL
TimescaleDB's fundamental insight is that PostgreSQL is already an excellent database—it just needs help with the specific characteristics of time-series workloads. Rather than reinventing storage engines, query planners, and replication, TimescaleDB leverages PostgreSQL's mature foundation and adds targeted enhancements.
What TimescaleDB Adds to PostgreSQL:
- Hypertables with transparent, automatic time-partitioning (chunking)
- Columnar compression for dramatic storage reduction
- Continuous aggregates that keep summaries up to date automatically
- Data lifecycle management: retention, compression, and tiering policies
- Time-series SQL functions: `time_bucket()`, `locf()` (last observation carried forward), `interpolate()`, and gap-filling operations

If you already have time-series data in PostgreSQL, migration to TimescaleDB can be as simple as: (1) install the extension, (2) create a new hypertable, (3) `INSERT INTO new_table SELECT * FROM old_table`. Your existing queries work unchanged but now run on optimized infrastructure.
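Those three migration steps can be sketched in SQL. This is a minimal, hedged example; `old_metrics` and `new_metrics` are placeholder names, and the table is assumed to have a `time` column:

```sql
-- Step 1: install the extension (once per database)
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Step 2: create a matching table and convert it to a hypertable
-- (old_metrics / new_metrics are illustrative names)
CREATE TABLE new_metrics (LIKE old_metrics INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
SELECT create_hypertable('new_metrics', 'time');

-- Step 3: copy the existing rows; each lands in the correct chunk automatically
INSERT INTO new_metrics SELECT * FROM old_metrics;
```

For large tables, batching the copy by time range keeps transactions small, but the single `INSERT ... SELECT` above captures the idea.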
The hypertable is TimescaleDB's fundamental abstraction. A hypertable looks and acts like a regular PostgreSQL table but internally consists of many smaller chunks, each containing data for a specific time range.
```
HYPERTABLE ARCHITECTURE
=======================

        ┌─────────────────────────────────────────┐
        │          HYPERTABLE: metrics            │
        │        (Virtual/Logical Table)          │
        │                                         │
        │  SELECT * FROM metrics                  │
        │  WHERE time > NOW() - INTERVAL '1 day'  │
        └───────────────────┬─────────────────────┘
                            │
                            │ Query Planning:
                            │ "Only need chunks for last 24h"
                            │
        ┌───────────────────┼────────────────────┐
        │                   │                    │
        ▼                   ▼                    ▼
┌───────────────┐   ┌───────────────┐    ┌───────────────┐
│   Chunk 1     │   │   Chunk 2     │    │   Chunk 3     │
│  (Jan 1-7)    │   │  (Jan 8-14)   │ ...│   (Today)     │
│               │   │               │    │               │
│  Regular PG   │   │  Regular PG   │    │  Regular PG   │
│  table with   │   │  table with   │    │  table with   │
│  indexes      │   │  indexes      │    │  indexes      │
│               │   │               │    │               │
│  ○ SKIPPED    │   │  ○ SKIPPED    │    │  ● SCANNED    │
│ (not in range)│   │ (not in range)│    │  (in range!)  │
└───────────────┘   └───────────────┘    └───────────────┘

CHUNK EXCLUSION:
- Query planner knows each chunk's time range
- Chunks outside the WHERE clause are excluded before any scan
- Only relevant chunks are touched → orders of magnitude faster

EACH CHUNK IS:
- A real PostgreSQL table (in the _timescaledb_internal schema)
- Indexed on its own (indexes inherited from the hypertable)
- Compressible independently
- Droppable instantly (for retention)
```
```sql
-- Step 1: Create a regular PostgreSQL table
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ      NOT NULL,
    sensor_id   INTEGER          NOT NULL,
    location    TEXT             NOT NULL,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION,
    pressure    DOUBLE PRECISION
);

-- Step 2: Convert to hypertable with automatic time-partitioning
-- This is the key TimescaleDB operation!
SELECT create_hypertable('sensor_data', 'time');

-- Or with a custom chunk interval (default is 7 days)
SELECT create_hypertable(
    'sensor_data', 'time',
    chunk_time_interval => INTERVAL '1 day'
);

-- Step 3: Add indexes (inherited by all chunks automatically)
CREATE INDEX ON sensor_data (sensor_id, time DESC);
CREATE INDEX ON sensor_data (location, time DESC);

-- From this point, INSERT/SELECT work exactly like regular tables,
-- but internally data is partitioned across chunks by time

-- Insert data (routed to the appropriate chunk automatically)
INSERT INTO sensor_data VALUES
    (NOW(), 1, 'warehouse-a', 22.5, 45.0, 1013.25),
    (NOW(), 2, 'warehouse-b', 24.1, 42.0, 1012.80);

-- Query with a time filter (only relevant chunks are scanned)
SELECT
    time_bucket('1 hour', time) AS hour,
    sensor_id,
    AVG(temperature) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour, sensor_id
ORDER BY hour DESC;
```

Why Chunking Matters:
- Chunk exclusion: queries with time predicates skip irrelevant chunks before any scan.
- Indexes are per-chunk, so the indexes for recent chunks stay small and memory-resident, keeping inserts fast.
- Retention is trivial: dropping old data removes whole chunks, each an instant DROP TABLE operation. No slow DELETE + VACUUM cycles.

Chunk interval affects performance. Too small means too many chunks and metadata overhead; too large means chunks too big to benefit from exclusion. Rule of thumb: aim for chunks that fit in 25% of available memory. For 64 GB of RAM, target roughly 16 GB of uncompressed chunk size, and adjust `chunk_time_interval` based on your data ingestion rate.
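If the default interval turns out wrong for your ingestion rate, future chunks can use a different interval via `set_chunk_time_interval` (existing chunks keep their original boundaries). The 16-day value below is purely illustrative, assuming roughly 1 GB/day of ingest against a ~16 GB chunk target:

```sql
-- Illustrative: at ~1 GB/day of ingest, a 16-day interval yields ~16 GB chunks.
-- Applies only to chunks created after this call.
SELECT set_chunk_time_interval('sensor_data', INTERVAL '16 days');
```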
TimescaleDB's native compression transforms chunks from PostgreSQL's row-based heap format into a columnar layout with aggressive type-specific compression. This achieves 90-95% storage reduction while keeping data fully queryable.
```sql
-- Enable compression on a hypertable
ALTER TABLE sensor_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',  -- Separate segments per sensor
    timescaledb.compress_orderby = 'time DESC'     -- Order within segments
);

-- Add a compression policy: compress chunks older than 7 days
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');

-- Or compress specific chunks manually
SELECT compress_chunk('_timescaledb_internal._hyper_1_10_chunk');

-- View compression statistics
SELECT
    chunk_name,
    pg_size_pretty(before_compression_total_bytes) AS before,
    pg_size_pretty(after_compression_total_bytes) AS after,
    ROUND((1 - after_compression_total_bytes::numeric /
               before_compression_total_bytes::numeric) * 100, 1) AS compression_ratio
FROM chunk_compression_stats('sensor_data')
ORDER BY chunk_name;

-- Example output:
-- chunk_name       | before  | after | compression_ratio
-- _hyper_1_1_chunk | 1200 MB | 84 MB | 93.0
-- _hyper_1_2_chunk | 1180 MB | 78 MB | 93.4
-- _hyper_1_3_chunk | 1210 MB | 82 MB | 93.2
```

How Compression Works:
```
COMPRESSION TRANSFORMATION
==========================

BEFORE COMPRESSION (Row-based PostgreSQL heap):
┌──────────────────────────────────────────────────────────────────────┐
│ Row 1: time=2024-01-15T10:00:00 | sensor_id=1 | temp=22.5 | hum=45.0 │
│ Row 2: time=2024-01-15T10:00:01 | sensor_id=1 | temp=22.6 | hum=45.1 │
│ Row 3: time=2024-01-15T10:00:02 | sensor_id=1 | temp=22.5 | hum=45.0 │
│ ... (millions of rows)                                               │
└──────────────────────────────────────────────────────────────────────┘

AFTER COMPRESSION (Columnar segments):
┌─────────────────────────────────────────────────────────────────────┐
│ Segment: sensor_id=1                                                │
│ ┌────────────────────────────────────────────────────────────────┐  │
│ │ time column: [t1, t2, t3, ...] → Delta-of-delta + LZ4          │  │
│ │ Regular intervals compress extremely well                      │  │
│ │ 1000 timestamps → ~100 bytes                                   │  │
│ └────────────────────────────────────────────────────────────────┘  │
│ ┌────────────────────────────────────────────────────────────────┐  │
│ │ temperature column: [22.5, 22.6, 22.5, ...] → Gorilla + LZ4    │  │
│ │ Similar floats XOR to values with many zeros                   │  │
│ │ 1000 floats → ~400 bytes                                       │  │
│ └────────────────────────────────────────────────────────────────┘  │
│ ┌────────────────────────────────────────────────────────────────┐  │
│ │ humidity column: [45.0, 45.1, 45.0, ...] → Gorilla + LZ4       │  │
│ └────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

COMPRESSION ALGORITHMS BY TYPE:
- Timestamps: Delta-of-delta encoding (regular intervals → tiny deltas)
- Floats: Gorilla compression (XOR encoding for similar values)
- Integers: Delta + Simple8B bit-packing
- Strings: Dictionary encoding + LZ4
- All types: Final LZ4 pass for additional compression
```

Compressed chunks are append-only. Updates and deletes require decompressing the chunk first. This matches time-series workloads (append-mostly) but is important to know for hybrid use cases.
Also, query performance on compressed chunks may be slightly slower than uncompressed—but the storage savings often outweigh this for cold data.
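When a correction to already-compressed data is unavoidable, the chunk can be decompressed, modified, and recompressed. A minimal sketch; the chunk name is illustrative (look up real names in `timescaledb_information.chunks`):

```sql
-- Decompress the target chunk so UPDATE/DELETE work again (name is illustrative)
SELECT decompress_chunk('_timescaledb_internal._hyper_1_10_chunk');

-- ...apply corrections with ordinary UPDATE/DELETE statements...

-- Recompress once the fix is in place
SELECT compress_chunk('_timescaledb_internal._hyper_1_10_chunk');
```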
TimescaleDB extends PostgreSQL with time-series-specific functions that simplify common analytical patterns. These functions are optimized for the chunked storage model.
```sql
-- TIME_BUCKET: Group data into time intervals
-- The most-used TimescaleDB function
SELECT
    time_bucket('5 minutes', time) AS bucket,
    sensor_id,
    AVG(temperature) AS avg_temp,
    MAX(temperature) AS max_temp,
    MIN(temperature) AS min_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY bucket, sensor_id
ORDER BY bucket, sensor_id;

-- TIME_BUCKET with origin (align buckets to a specific time)
SELECT
    time_bucket('1 hour', time, TIMESTAMP '2024-01-01 00:00:00') AS bucket,
    COUNT(*) AS readings
FROM sensor_data
GROUP BY bucket;

-- GAP FILLING: Generate rows for missing time intervals
SELECT
    time_bucket_gapfill('1 minute', time) AS minute,
    sensor_id,
    AVG(temperature) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
  AND sensor_id = 1
GROUP BY minute, sensor_id
ORDER BY minute;

-- LOCF (Last Observation Carried Forward): Fill gaps with the previous value
SELECT
    time_bucket_gapfill('1 minute', time) AS minute,
    sensor_id,
    locf(AVG(temperature)) AS temperature  -- Carry forward last known value
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
  AND sensor_id = 1
GROUP BY minute, sensor_id;

-- INTERPOLATE: Linear interpolation between known values
SELECT
    time_bucket_gapfill('1 minute', time) AS minute,
    interpolate(AVG(temperature)) AS temperature
FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour'
  AND sensor_id = 1
GROUP BY minute;

-- FIRST/LAST: Get the first or last value in each bucket
SELECT
    time_bucket('1 hour', time) AS hour,
    first(temperature, time) AS opening_temp,
    last(temperature, time) AS closing_temp,
    MAX(temperature) AS high,
    MIN(temperature) AS low
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;

-- DELTA: Calculate difference from previous value (for counters)
SELECT
    time,
    sensor_id,
    delta(total_requests) AS requests_in_period
FROM (
    SELECT
        time_bucket('1 minute', time) AS time,
        sensor_id,
        last(request_counter, time) AS total_requests
    FROM http_metrics
    WHERE time > NOW() - INTERVAL '1 hour'
    GROUP BY 1, 2
) subq;

-- HISTOGRAM: Distribution of values
SELECT histogram(temperature, 20.0, 30.0, 10) AS temp_histogram
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours';
```

| Function | Purpose | Use Case |
|---|---|---|
| `time_bucket(interval, time)` | Group timestamps into fixed intervals | Downsampling, aggregation windows |
| `time_bucket_gapfill()` | Generate rows for missing intervals | Creating continuous time series |
| `locf(value)` | Last observation carried forward | Fill gaps with previous value |
| `interpolate(value)` | Linear interpolation | Smooth estimation between points |
| `first(value, time)` | First value ordered by time | Opening prices, initial readings |
| `last(value, time)` | Last value ordered by time | Closing prices, final readings |
| `histogram(value, min, max, buckets)` | Value distribution | Latency distributions, percentiles |
| `approximate_row_count()` | Fast table size estimate | Dashboard queries avoiding COUNT(*) |
PostgreSQL's `date_trunc` only supports calendar units (hour, day, week). `time_bucket` supports arbitrary intervals (5 minutes, 15 minutes, 4 hours) and handles timezone-aware bucketing correctly. Always prefer `time_bucket` for time-series work.
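A side-by-side illustration of that difference, using the `sensor_data` table from earlier: `date_trunc` can only snap to calendar units, while `time_bucket` accepts arbitrary widths:

```sql
-- Plain PostgreSQL: limited to calendar units such as 'hour'
SELECT date_trunc('hour', time) AS bucket, AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY bucket;

-- TimescaleDB: arbitrary intervals, e.g. 15 minutes
SELECT time_bucket('15 minutes', time) AS bucket, AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY bucket;
```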
Continuous aggregates are materialized views that automatically refresh as new data arrives. They pre-compute aggregations, making queries over long time ranges return instantly instead of scanning billions of rows.
```sql
-- Create a continuous aggregate for hourly temperature summaries
CREATE MATERIALIZED VIEW hourly_temperature
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    sensor_id,
    location,
    AVG(temperature) AS avg_temp,
    MAX(temperature) AS max_temp,
    MIN(temperature) AS min_temp,
    COUNT(*) AS reading_count
FROM sensor_data
GROUP BY hour, sensor_id, location
WITH NO DATA;  -- Don't populate immediately

-- Add an automatic refresh policy
SELECT add_continuous_aggregate_policy('hourly_temperature',
    start_offset => INTERVAL '3 hours',      -- Refresh data starting from 3 hours ago
    end_offset => INTERVAL '1 hour',         -- Up to 1 hour ago (allow late data)
    schedule_interval => INTERVAL '1 hour'   -- Run every hour
);

-- Manually refresh a specific time range
CALL refresh_continuous_aggregate('hourly_temperature', '2024-01-01', '2024-01-15');

-- Query the continuous aggregate (instant response!)
SELECT * FROM hourly_temperature
WHERE hour > NOW() - INTERVAL '30 days'
  AND location = 'warehouse-a'
ORDER BY hour DESC;

-- HIERARCHICAL AGGREGATES: Build on top of existing aggregates
CREATE MATERIALIZED VIEW daily_temperature
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', hour) AS day,
    sensor_id,
    location,
    AVG(avg_temp) AS avg_temp,
    MAX(max_temp) AS max_temp,
    MIN(min_temp) AS min_temp,
    SUM(reading_count) AS total_readings
FROM hourly_temperature
GROUP BY day, sensor_id, location
WITH DATA;

-- REAL-TIME AGGREGATES: Combine materialized + recent unmaterialized data
-- By default, queries transparently merge materialized data with recent raw data

-- View refresh status
SELECT * FROM timescaledb_information.continuous_aggregates
WHERE view_name = 'hourly_temperature';

-- See materialization progress
SELECT * FROM timescaledb_information.continuous_aggregate_stats
WHERE view_name = 'hourly_temperature';
```

How Continuous Aggregates Work:
```
CONTINUOUS AGGREGATE ARCHITECTURE
=================================

   RAW DATA                        CONTINUOUS AGGREGATE
  (hypertable)                      (materialized view)
┌──────────────┐                ┌──────────────────────────┐
│ Raw chunk 1  │    Refresh     │ Materialized result      │
│ Raw chunk 2  │  ───────────►  │ (pre-computed hourly     │
│ Raw chunk 3  │    Policy      │  aggregates)             │
│ ... (100s)   │                │                          │
└──────────────┘                └──────────────────────────┘
       │                                    │
       ▼                                    ▼
┌──────────────┐                ┌──────────────────────────┐
│ Recent data  │◄── UNION ────► │ Query over 30 days       │
│ (not yet     │                │ Instant! Pre-computed +  │
│ materialized)│                │ recent data merged       │
└──────────────┘                └──────────────────────────┘

REFRESH WINDOW:
│◄────── start_offset ──────►│◄─ end_offset ─►│ NOW
│                            │                │
Old data already       Refresh zone      Too recent
materialized           (update this)     (wait for late data)

REAL-TIME AGGREGATES:
- Queries automatically combine:
  1. Materialized data (fast, pre-computed)
  2. Recent raw data (live, computed at query time)
- Seamless to the application—it looks like one table
```

Without continuous aggregates, a query like "average temperature per hour for the last year" would scan billions of raw data points. With continuous aggregates, it reads ~8,760 pre-computed hourly rows. Query time drops from minutes to milliseconds, and the aggregates typically occupy less than 1% of the raw data's storage.
Time-series data naturally ages—second-level granularity becomes less valuable over time. TimescaleDB provides comprehensive data lifecycle management through retention policies, compression policies, and tiered storage.
```sql
-- RETENTION POLICY: Automatically drop old chunks
-- This is the most efficient way to delete time-series data

-- Drop chunks older than 90 days
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');

-- For continuous aggregates, keep summaries longer than raw data
SELECT add_retention_policy('hourly_temperature', INTERVAL '2 years');
SELECT add_retention_policy('daily_temperature', INTERVAL '10 years');

-- View existing retention policies
SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';

-- Drop a retention policy
SELECT remove_retention_policy('sensor_data');

-- COMPRESSION POLICY: Compress chunks older than a threshold
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');

-- Reorder to optimize compression (optional, can improve ratios)
SELECT add_reorder_policy('sensor_data', 'sensor_data_sensor_id_time_idx');

-- TIERED STORAGE (TimescaleDB Cloud / Enterprise)
-- Move cold data to object storage (S3, GCS)
SELECT add_tiering_policy('sensor_data', INTERVAL '30 days');

-- COMPLETE LIFECYCLE EXAMPLE:
-- Days 0-7:   Uncompressed, fast writes, hot storage
-- Days 7-30:  Compressed, warm storage
-- Days 30-90: Tiered to object storage, cold storage
-- Day 90+:    Dropped automatically

-- View chunk information including compression and tiering status
SELECT
    chunk_table,
    chunk_name,
    range_start,
    range_end,
    pg_size_pretty(chunk_bytes) AS size,
    is_compressed,
    is_tiered
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start DESC
LIMIT 20;
```

| Stage | Age | State | Cost/Performance |
|---|---|---|---|
| Hot | 0-7 days | Uncompressed, local SSD | High cost, fastest queries/writes |
| Warm | 7-30 days | Compressed, local disk | Medium cost, fast queries, no writes |
| Cold | 30-90 days | Tiered to object storage | Low cost, slower queries, archival |
| Expired | 90+ days | Automatically dropped | No cost, no access |
A powerful pattern: keep raw data for 30 days (operational debugging), hourly aggregates for 2 years (trend analysis), daily aggregates forever (historical views). This provides the best of both worlds—recent detail and long-term insights—with minimal storage cost.
Maximizing TimescaleDB performance requires understanding how hypertables interact with PostgreSQL's query planner and tuning configurations appropriately.
- Always include a time predicate—even `WHERE time > '1970-01-01'` is better than nothing, because it enables chunk exclusion.
- Match indexes to query shape: for `WHERE sensor_id = X AND time > Y ORDER BY time`, create an index on `(sensor_id, time DESC)`.
- Set `max_parallel_workers_per_gather` high enough to parallelize scans across chunks.
```sql
-- Check whether queries are using chunk exclusion
EXPLAIN ANALYZE
SELECT AVG(temperature) FROM sensor_data
WHERE time > NOW() - INTERVAL '1 hour';

-- Look for: "Chunks excluded: 157" in the output
-- This means 157 chunks were skipped without scanning!

-- View chunk information
SELECT * FROM chunks_detailed_size('sensor_data');

-- Monitor compression effectiveness
SELECT
    pg_size_pretty(before_compression_total_bytes) AS before,
    pg_size_pretty(after_compression_total_bytes) AS after,
    ROUND((1 - after_compression_total_bytes::numeric /
               before_compression_total_bytes::numeric) * 100, 1) AS ratio
FROM hypertable_compression_stats('sensor_data');

-- Key PostgreSQL settings for TimescaleDB
ALTER SYSTEM SET shared_buffers = '8GB';                -- 25% of RAM
ALTER SYSTEM SET effective_cache_size = '24GB';         -- 75% of RAM
ALTER SYSTEM SET work_mem = '64MB';                     -- Per-operation sort memory
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;   -- Parallel query workers
ALTER SYSTEM SET random_page_cost = 1.1;                -- For SSDs
SELECT pg_reload_conf();
```

We've explored TimescaleDB comprehensively—from its PostgreSQL extension architecture through hypertables, compression, and data lifecycle management.
When to Choose TimescaleDB:

- You have PostgreSQL expertise, tooling, ORMs, or drivers you want to keep using unchanged.
- You need full SQL richness—joins, window functions, and relational data alongside time-series.
- Your workload is append-mostly time-series (metrics, sensors, events) that benefits from chunking, compression, and continuous aggregates.
What's Next:
We'll explore retention policies in depth—understanding how TSDBs manage data lifecycle, from high-resolution recent data through progressively downsampled archives to eventual expiration. This is crucial for cost management at scale.
You now understand TimescaleDB's architecture, can design efficient hypertable schemas, implement compression and continuous aggregates, and configure data lifecycle policies. You're equipped to evaluate when TimescaleDB is the right choice and deploy it effectively.