Time-series data grows relentlessly. A modest deployment collecting metrics every 10 seconds across 1,000 servers generates 8.64 million data points per day—over 3 billion per year. At scale, organizations generate trillions of data points monthly. Without intelligent management, storage costs explode, query performance degrades, and operational complexity becomes unmanageable.
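The arithmetic behind those figures is worth checking once; a minimal sketch in Python, assuming one metric per server sampled every 10 seconds as above:

```python
# Back-of-the-envelope data-point volume for the deployment described above.
SERVERS = 1_000
INTERVAL_SECONDS = 10                            # one sample per server every 10s
SAMPLES_PER_DAY = 86_400 // INTERVAL_SECONDS     # 8,640 samples/server/day

points_per_day = SERVERS * SAMPLES_PER_DAY       # 8,640,000
points_per_year = points_per_day * 365           # 3,153,600,000 — over 3 billion

print(f"{points_per_day:,} points/day, {points_per_year:,} points/year")
# → 8,640,000 points/day, 3,153,600,000 points/year
```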
The fundamental insight of time-series data management is that not all data is equally valuable over time. Second-level precision from two years ago rarely matters—hourly or daily summaries suffice for trend analysis. Yesterday's server CPU spike is critical for debugging; the same data from 2019 is merely historical context. Retention policies formalize this insight, automatically managing data lifecycle from high-resolution recent data through progressively summarized archives to eventual expiration.
By the end of this page, you will understand: (1) The principles of time-series data aging and value decay, (2) Retention policy patterns across different TSDBs, (3) Downsampling strategies for long-term storage efficiency, (4) Multi-tier retention architectures, (5) Cost optimization through intelligent retention, and (6) Regulatory and compliance considerations affecting retention decisions.
Understanding retention policies begins with recognizing that time-series data follows a predictable value decay curve. The utility of high-resolution data decreases over time, while storage costs remain constant. This asymmetry drives retention strategy.
```
TIME-SERIES DATA VALUE OVER TIME
================================

VALUE
  ▲
  │ █████
  │ █████       HIGH VALUE
  │ █████       - Real-time alerting
  │ █████       - Incident debugging
  │ █████       - Anomaly detection
  │ █████████
  │ █████████   MEDIUM VALUE
  │ █████████   - Trend analysis
  │ █████████   - Capacity planning
  │ █████████████
  │ █████████████        LOW VALUE
  │ ████████████████     - Historical context
  │ ███████████████████  - Compliance/audit
  │ █████████████████████████  - Long-term patterns
  │ ██████████████████████████████████
  ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬► TIME
    Minutes  Hours  Days  Weeks  Months  Years

WHAT THIS MEANS FOR STORAGE:
Age: 0-24h   → Full resolution (10s intervals)   → Critical for ops
Age: 1-7d    → High resolution (1m intervals)    → Debugging, trends
Age: 7-30d   → Medium resolution (5m intervals)  → Capacity planning
Age: 30-90d  → Hourly aggregates                 → Long-term patterns
Age: 90d-1y  → Daily aggregates                  → Historical context
Age: 1y+    → Monthly/quarterly summaries       → Compliance, archives

AS DATA AGES:
✓ Reduce resolution (downsample)
✓ Increase compression (colder storage)
✓ Move to cheaper storage tiers
✓ Eventually delete (if not required for compliance)
```

Think of retention not as 'how long to keep data' but as 'at what resolution to keep data over time.' You might keep raw data for 30 days, 5-minute aggregates for 6 months, and daily aggregates forever. The data never disappears—it just loses precision as it ages.
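The storage schedule above is just an age-to-resolution lookup; a minimal sketch in Python (the thresholds and tier labels come straight from the schedule, everything else is illustrative):

```python
from datetime import timedelta

# Resolution schedule from the storage breakdown above: (max age, resolution kept).
SCHEDULE = [
    (timedelta(hours=24), "10s raw"),
    (timedelta(days=7),   "1m"),
    (timedelta(days=30),  "5m"),
    (timedelta(days=90),  "1h"),
    (timedelta(days=365), "1d"),
]

def resolution_for(age: timedelta) -> str:
    """Return the resolution at which data of a given age is stored."""
    for max_age, resolution in SCHEDULE:
        if age <= max_age:
            return resolution
    return "monthly/quarterly summary"

print(resolution_for(timedelta(hours=3)))   # → 10s raw
print(resolution_for(timedelta(days=45)))   # → 1h
print(resolution_for(timedelta(days=400)))  # → monthly/quarterly summary
```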
Different time-series databases implement retention policies with varying levels of sophistication. Understanding these patterns helps in selecting and configuring retention strategies.
| Database | Retention Mechanism | Key Features |
|---|---|---|
| InfluxDB | Retention policies per bucket | Define duration per bucket; older data automatically dropped |
| TimescaleDB | Chunk-based drop_chunks() | Drop entire time-partitioned chunks older than threshold |
| Prometheus | Global retention flag | Single --storage.tsdb.retention.time for entire instance |
| VictoriaMetrics | -retentionPeriod flag | Global retention with optional per-tenant overrides |
| QuestDB | Partition-level TTL | Drop partitions older than specified period |
```
-- ============================================================
-- INFLUXDB: Bucket-level retention (configured at bucket creation)
-- ============================================================

-- InfluxDB CLI: Create bucket with 30-day retention
# influx bucket create --name raw-metrics --retention 30d

-- InfluxDB CLI: Update retention on existing bucket
# influx bucket update --id <bucket-id> --retention 90d

-- InfluxDB Flux: Check bucket retention
buckets() |> filter(fn: (r) => r.name == "raw-metrics")

-- ============================================================
-- TIMESCALEDB: Flexible chunk-based retention policies
-- ============================================================

-- Add automated retention policy
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');

-- View existing retention policies
SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';

-- Manual chunk dropping (one-time cleanup)
SELECT drop_chunks('sensor_data', INTERVAL '90 days');

-- Drop specific chunks older than timestamp
SELECT drop_chunks('sensor_data', older_than => '2024-01-01'::timestamptz);

-- Different retention for different tables
SELECT add_retention_policy('raw_metrics', INTERVAL '7 days');
SELECT add_retention_policy('hourly_metrics', INTERVAL '1 year');
SELECT add_retention_policy('daily_metrics', INTERVAL '10 years');

-- Remove retention policy
SELECT remove_retention_policy('sensor_data');

-- ============================================================
-- PROMETHEUS: Global retention at startup
-- ============================================================

# prometheus --storage.tsdb.retention.time=90d
# prometheus --storage.tsdb.retention.size=100GB   # Size-based

-- ============================================================
-- QUESTDB: Partition-level TTL
-- ============================================================

-- Set retention at table creation
CREATE TABLE metrics (
    ts TIMESTAMP,
    host SYMBOL,
    value DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY
WITH maxUncommittedRows=100000, o3MaxLag=5m
TTL 90d;  -- Data older than 90 days automatically dropped

-- Modify retention on existing table
ALTER TABLE metrics SET TTL 30d;
```

In partition-aware TSDBs (TimescaleDB, QuestDB, InfluxDB), retention is O(1)—dropping a partition is simply metadata deletion. In contrast, row-level DELETE in traditional databases is O(n)—each row must be marked deleted and later vacuumed. This is why TSDBs partition by time: efficient retention is a first-class requirement.
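The O(1)-vs-O(n) difference can be sketched with a toy time-partitioned store (a hypothetical model, not any particular TSDB's internals): retention drops whole partitions by key instead of scanning rows:

```python
from datetime import date, timedelta

# Toy time-partitioned store: one dict entry ("partition") per day.
# Dropping a partition is a single dict deletion, independent of row count.
partitions: dict[date, list[float]] = {
    date(2024, 1, 1) + timedelta(days=i): [0.0] * 10_000   # 10k rows/day
    for i in range(120)
}

def apply_retention(store: dict[date, list[float]], keep_days: int, today: date) -> int:
    """Drop whole partitions older than the window; O(partitions), not O(rows)."""
    cutoff = today - timedelta(days=keep_days)
    expired = [day for day in store if day < cutoff]
    for day in expired:
        del store[day]           # metadata-only drop, no per-row work
    return len(expired)

dropped = apply_retention(partitions, keep_days=90, today=date(2024, 4, 30))
print(dropped, len(partitions))  # → 30 90
```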
Downsampling (also called rollup or aggregation) reduces data resolution by computing summary statistics over time windows. Instead of deleting old data entirely, downsampling preserves essential patterns at reduced storage cost.
```
DOWNSAMPLING PROCESS
====================

RAW DATA (10-second resolution):
┌────────────────────────────────────────────────────────────┐
│ 10:00:00 → 45.2   10:00:10 → 45.8   10:00:20 → 46.1       │
│ 10:00:30 → 45.9   10:00:40 → 46.3   10:00:50 → 46.0       │
│ 10:01:00 → 45.7   10:01:10 → 45.5   10:01:20 → 45.3 ...   │
│                                                            │
│ 8,640 points/day × 365 days = 3,153,600 points/year        │
└────────────────────────────────────────────────────────────┘
              │
              │ Downsample to 5-minute
              │ Keep: avg, min, max, count
              ▼
5-MINUTE AGGREGATES:
┌────────────────────────────────────────────────────────────┐
│ 10:00:00 → avg=45.72, min=45.2, max=46.3, count=30         │
│ 10:05:00 → avg=44.88, min=43.9, max=45.8, count=30         │
│ 10:10:00 → avg=45.11, min=44.2, max=46.0, count=30         │
│                                                            │
│ 288 points/day × 365 days = 105,120 points/year            │
│ 97% storage reduction!                                     │
└────────────────────────────────────────────────────────────┘
              │
              │ Downsample to hourly
              │ Keep: avg, min, max
              ▼
HOURLY AGGREGATES:
┌────────────────────────────────────────────────────────────┐
│ 10:00:00 → avg=45.33, min=43.1, max=47.2                   │
│ 11:00:00 → avg=44.92, min=42.8, max=46.9                   │
│                                                            │
│ 24 points/day × 365 days = 8,760 points/year               │
│ 99.7% storage reduction from raw!                          │
└────────────────────────────────────────────────────────────┘

AGGREGATION FUNCTIONS TO PRESERVE:
- AVERAGE: Overall level
- MIN/MAX: Outliers, peak values
- COUNT: Sample size for weighted re-aggregation
- SUM: For counters (total events)
- PERCENTILES: Distribution shape (P50, P95, P99)
- FIRST/LAST: Opening/closing values (financial data)
```
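The rollup step described above is straightforward to express in code; a minimal sketch in Python (bucket width and aggregates mirror the 5-minute tier, the sample values are illustrative):

```python
from collections import defaultdict

def downsample(points: list[tuple[int, float]], bucket_seconds: int):
    """Roll raw (unix_ts, value) points up into per-bucket summary rows.

    Keeps avg, min, max, and count so the buckets can themselves be
    re-aggregated later (count enables correctly weighted averages).
    """
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: {
            "avg": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
            "count": len(vals),
        }
        for start, vals in sorted(buckets.items())
    }

# Six 10-second samples → one 60-second bucket.
raw = [(0, 45.2), (10, 45.8), (20, 46.1), (30, 45.9), (40, 46.3), (50, 46.0)]
rollup = downsample(raw, bucket_seconds=60)
print(rollup[0])  # avg ≈ 45.88, min 45.2, max 46.3, count 6
```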
```
-- ============================================================
-- TIMESCALEDB: Continuous Aggregates for automatic downsampling
-- ============================================================

-- Create hourly rollup (automatically maintained)
CREATE MATERIALIZED VIEW hourly_metrics
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    sensor_id,
    AVG(value) AS avg_value,
    MIN(value) AS min_value,
    MAX(value) AS max_value,
    COUNT(*) AS sample_count
FROM raw_metrics
GROUP BY bucket, sensor_id
WITH NO DATA;

-- Add refresh policy
SELECT add_continuous_aggregate_policy('hourly_metrics',
    start_offset      => INTERVAL '4 hours',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

-- Create daily rollup FROM hourly (hierarchical aggregates)
CREATE MATERIALIZED VIEW daily_metrics
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', bucket) AS day,
    sensor_id,
    AVG(avg_value) AS avg_value,   -- NB: unweighted; weight by sample_count if bucket sizes vary
    MIN(min_value) AS min_value,
    MAX(max_value) AS max_value,
    SUM(sample_count) AS total_samples
FROM hourly_metrics
GROUP BY day, sensor_id;

-- Set different retention for each tier
SELECT add_retention_policy('raw_metrics', INTERVAL '7 days');
SELECT add_retention_policy('hourly_metrics', INTERVAL '6 months');
SELECT add_retention_policy('daily_metrics', INTERVAL '5 years');

-- ============================================================
-- INFLUXDB: Continuous Queries / Tasks for downsampling
-- ============================================================

// Flux task for hourly downsampling
option task = {
    name: "downsample_hourly",
    every: 1h
}

from(bucket: "raw-metrics")
    |> range(start: -2h, stop: -1h)
    |> filter(fn: (r) => r["_measurement"] == "cpu")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
    |> to(bucket: "hourly-metrics", org: "myorg")

-- ============================================================
-- PROMETHEUS: Recording Rules for aggregation
-- ============================================================

# prometheus.yml recording rules
groups:
  - name: downsampling
    interval: 1m
    rules:
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job)
      - record: job:http_request_duration:p99:5m
        expr: histogram_quantile(0.99,
          sum(rate(http_request_duration_bucket[5m])) by (job, le))
```

Be careful what you aggregate: 1) Averages of averages are wrong for unequal sample sizes—weight by count. 2) Percentiles can't be re-aggregated—store histograms or t-digests instead. 3) Counters need rate() before aggregation—a sum of raw counter values is meaningless. 4) Don't downsample too aggressively—5-minute resolution is usually fine, but jumping straight to hourly can lose important patterns.
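The first pitfall (averages of averages) is easy to demonstrate with made-up numbers; a small Python sketch:

```python
# Two buckets with unequal sample counts: naive average-of-averages is biased.
buckets = [
    {"avg": 10.0, "count": 100},  # busy period: 100 samples averaging 10.0
    {"avg": 90.0, "count": 4},    # quiet period: 4 samples averaging 90.0
]

naive = sum(b["avg"] for b in buckets) / len(buckets)
weighted = (
    sum(b["avg"] * b["count"] for b in buckets)
    / sum(b["count"] for b in buckets)
)

print(naive)     # → 50.0 — treats both buckets as equal
print(weighted)  # ≈ 13.08 — true mean of the underlying 104 samples
```

This is why the downsampling examples keep a `count` column: without it, the warm and cold tiers cannot be re-aggregated correctly.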
Production deployments typically implement multi-tier retention combining different storage technologies, resolutions, and access patterns. This architecture optimizes for both cost and query performance.
```
MULTI-TIER TIME-SERIES ARCHITECTURE
===================================

TIER 0: REAL-TIME (In-Memory Buffer)
├── Age: 0 - 15 minutes
├── Resolution: Full (every data point)
├── Storage: RAM / Write-ahead log
├── Purpose: Real-time dashboards, alerting
├── Latency: < 10ms query
└── Cost: $$$$ (RAM is expensive)
        ↓ Flush to disk
TIER 1: HOT STORAGE (Local SSD)
├── Age: 0 - 7 days
├── Resolution: Full (10s, 30s, 1m intervals)
├── Storage: NVMe SSD, uncompressed chunks
├── Purpose: Operational debugging, incident response
├── Latency: < 100ms query
└── Cost: $$$ (Fast SSD)
        ↓ Compress aging chunks
TIER 2: WARM STORAGE (Local Disk, Compressed)
├── Age: 7 - 30 days
├── Resolution: Full, but compressed
├── Storage: SSD or HDD, compressed chunks
├── Purpose: Recent trend analysis, capacity planning
├── Latency: < 1s query
└── Cost: $$ (Compression reduces space 90%)
        ↓ Tier to object storage / downsample
TIER 3: COLD STORAGE (Object Storage)
├── Age: 30 days - 1 year
├── Resolution: 5-minute or hourly aggregates
├── Storage: S3 / GCS / Azure Blob (Parquet files)
├── Purpose: Historical analysis, compliance
├── Latency: 5-30s query
└── Cost: $ (Object storage is cheap)
        ↓ Further aggregate
TIER 4: ARCHIVE (Long-term Object Storage)
├── Age: 1+ years
├── Resolution: Daily or weekly aggregates
├── Storage: S3 Glacier / Cold storage classes
├── Purpose: Compliance, audit, long-term trends
├── Latency: Minutes to hours (restore required)
└── Cost: ¢ (Archive storage is very cheap)
        ↓ Retention policy
TIER 5: DELETION
├── Age: Beyond retention period
├── Purpose: Compliance with data retention laws
└── Action: Automatic, policy-driven deletion
```

| Data Type | Hot (Raw) | Warm (Compressed) | Cold (Aggregated) | Archive |
|---|---|---|---|---|
| Infrastructure Metrics | 7 days @ 10s | 30 days @ 10s | 1 year @ 5m | 5 years @ 1h |
| Application Traces | 2 days @ full | 7 days (sampled) | 30 days (errors only) | — |
| Business Metrics | 30 days @ 1m | 1 year @ 5m | 5 years @ 1h | Forever @ 1d |
| Security Logs | 90 days @ full | 1 year @ full | 7 years @ full | — |
| IoT Sensor Data | 24h @ 1s | 7 days @ 1m | 90 days @ 5m | 2 years @ 1h |
Modern TSDBs handle tier transitions automatically. TimescaleDB's retention and compression policies run on schedule. InfluxDB Cloud automatically tiers data. The key is defining the policies correctly upfront—then the system manages transitions without manual intervention.
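On the query side, the same tier boundaries determine where a given time range should be answered from. A hypothetical routing sketch (tier names and age cutoffs follow the architecture above; the function itself is illustrative):

```python
from datetime import datetime, timedelta

# Tier boundaries from the architecture above: age of the oldest data each tier holds.
TIERS = [
    ("hot (raw)",         timedelta(days=7)),
    ("warm (compressed)", timedelta(days=30)),
    ("cold (aggregated)", timedelta(days=365)),
    ("archive",           timedelta(days=365 * 10)),
]

def route_query(start: datetime, now: datetime) -> str:
    """Pick the fastest tier that still covers the query's start time."""
    age = now - start
    for tier, max_age in TIERS:
        if age <= max_age:
            return tier
    return "deleted (beyond retention)"

now = datetime(2025, 6, 1)
print(route_query(datetime(2025, 5, 30), now))  # → hot (raw)
print(route_query(datetime(2025, 3, 1), now))   # → cold (aggregated)
```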
Intelligent retention policies can reduce storage costs by 90-99% compared to naive 'keep everything forever' approaches. Let's quantify the impact.
```
COST ANALYSIS: 1 YEAR OF METRICS DATA
=====================================

ASSUMPTIONS:
- 10,000 servers monitored
- 100 metrics per server
- 10-second collection interval
- 16 bytes per raw data point (timestamp + value + overhead)

RAW DATA GENERATION:
- Points per day   = 10K servers × 100 metrics × 8,640 intervals = 8.64B points
- Storage per day  = 8.64B × 16 bytes = 138 GB
- Storage per year = 138 GB × 365 = 50.4 TB

SCENARIO 1: No Retention (Keep Everything Raw)
├── Storage required: 50.4 TB
├── Cloud storage cost (SSD): $100/TB/month
├── Annual cost: 50.4 TB × $100 × 12 = $60,480
└── Query performance: DEGRADING (scanning years of data)

SCENARIO 2: Simple Retention (Delete after 30 days)
├── Storage required: 138 GB × 30 = 4.14 TB
├── Annual cost: 4.14 TB × $100 × 12 = $4,968
├── Savings: 92%
└── Limitation: No historical analysis possible

SCENARIO 3: Smart Tiered Retention
├── Hot (30 days raw): 4.14 TB × $100/TB = $414/month
├── Warm (6 months @ 5min): 0.23 TB × $30/TB = $7/month
├── Cold (2 years @ hourly): 0.02 TB × $5/TB = $0.10/month
├── Monthly cost: $421
├── Annual cost: $5,052
├── Savings: 92% with full historical access!
└── Query performance: OPTIMAL (appropriate tier for each query)

COST BREAKDOWN BY TIER:
┌──────────────────────────────────────────────────────────┐
│ Raw hot data:    4.14 TB × $100 = $4,968/year  (84.9%)   │
│ Compressed warm: 0.23 TB × $30  =    $83/year   (1.4%)   │
│ Aggregated cold: 0.02 TB × $5   =     $1/year  (0.02%)   │
│ Query & compute: —              =   $800/year  (13.7%)   │
└──────────────────────────────────────────────────────────┘
Total annual cost: ~$5,852 vs $60,480 for raw everything = 90% savings
```

Storage isn't the only cost. More data means: slower queries (degraded user experience), higher backup costs, longer disaster recovery, more complexity in compliance audits, and increased infrastructure management overhead. Lean retention policies have compounding benefits beyond raw storage savings.
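The raw-generation figures fall out of a few multiplications; a sketch using the same assumptions (10K servers, 100 metrics, 10s interval, 16 bytes/point — note the narrative rounds 138.24 GB/day to 138, so its dollar figures are slightly lower):

```python
SERVERS, METRICS, INTERVAL_S, BYTES_PER_POINT = 10_000, 100, 10, 16

points_per_day = SERVERS * METRICS * (86_400 // INTERVAL_S)  # 8.64B points
gb_per_day = points_per_day * BYTES_PER_POINT / 1e9          # ≈138 GB
tb_per_year = gb_per_day * 365 / 1000                        # ≈50.5 TB

# Scenario comparison at $100/TB/month for hot SSD storage.
keep_everything_annual = tb_per_year * 100 * 12
delete_after_30d_annual = (gb_per_day * 30 / 1000) * 100 * 12

print(f"{gb_per_day:.2f} GB/day, {tb_per_year:.2f} TB/year")
print(f"keep everything: ${keep_everything_annual:,.0f}/yr; "
      f"30-day retention: ${delete_after_30d_annual:,.0f}/yr")
```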
Retention policies must balance operational optimization with regulatory requirements. Different industries and jurisdictions mandate specific retention periods for different data types.
| Regulation | Data Type | Minimum Retention | Notes |
|---|---|---|---|
| SOX (US) | Financial records, audit logs | 7 years | Applies to public companies |
| HIPAA (US) | Healthcare records, access logs | 6 years | From date of creation or last use |
| GDPR (EU) | Personal data processing logs | As long as necessary | Must also support deletion rights |
| PCI-DSS | Cardholder transaction logs | 1 year (accessible) | 3 months readily accessible |
| SEC Rule 17a-4 | Financial communications | 3-6 years | Broker-dealer specific |
| FINRA | Trading records | 6 years | Securities industry |
| MiFID II (EU) | Trading records | 5 years | Can be extended to 7 years |
Compliance creates a tension: you must delete data per privacy laws (GDPR right to erasure) while retaining data per financial regulations (SOX). Resolution requires careful data classification: personal data in one bucket with short retention, financial records in another with long retention, and clear boundaries between them.
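One way to make that classification executable is a per-class policy table consulted by the retention job. A hypothetical sketch (class names and periods are illustrative, not legal guidance):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetentionRule:
    retention_days: int   # how long the class is kept
    erasable: bool        # whether GDPR-style erasure requests apply

# Separate buckets with clear boundaries: personal data is short-lived and
# erasable; financial records are long-lived and exempt from erasure requests.
POLICY = {
    "personal_data":     RetentionRule(retention_days=90,      erasable=True),
    "financial_records": RetentionRule(retention_days=7 * 365, erasable=False),  # e.g. SOX: 7 years
    "system_metrics":    RetentionRule(retention_days=30,      erasable=True),
}

def can_erase_on_request(data_class: str) -> bool:
    """Erasure requests apply only to classes not pinned by a retention mandate."""
    return POLICY[data_class].erasable

print(can_erase_on_request("personal_data"))      # → True
print(can_erase_on_request("financial_records"))  # → False
```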
Implementing robust retention policies requires more than just setting a duration. Here are proven patterns for production-grade retention management.
```sql
-- ============================================================
-- PATTERN 1: Hierarchical Retention with Continuous Aggregates
-- ============================================================

-- Raw data (hot tier): 7 days
CREATE TABLE sensor_data (
    time      TIMESTAMPTZ NOT NULL,
    device_id TEXT,
    metric    TEXT,
    value     DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'time');
SELECT add_retention_policy('sensor_data', INTERVAL '7 days');
ALTER TABLE sensor_data SET (timescaledb.compress);  -- required before adding a compression policy
SELECT add_compression_policy('sensor_data', INTERVAL '1 day');

-- 5-minute aggregates (warm tier): 90 days
CREATE MATERIALIZED VIEW sensor_5min
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('5 minutes', time) AS bucket,
    device_id,
    metric,
    AVG(value) AS avg,
    MIN(value) AS min,
    MAX(value) AS max
FROM sensor_data
GROUP BY bucket, device_id, metric;
SELECT add_retention_policy('sensor_5min', INTERVAL '90 days');

-- Hourly aggregates (cold tier): 2 years
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', bucket) AS bucket,
    device_id,
    metric,
    AVG(avg) AS avg,
    MIN(min) AS min,
    MAX(max) AS max
FROM sensor_5min
GROUP BY 1, 2, 3;
SELECT add_retention_policy('sensor_hourly', INTERVAL '2 years');

-- Daily aggregates (archive tier): "forever" (10 years)
CREATE MATERIALIZED VIEW sensor_daily
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', bucket) AS bucket,
    device_id,
    metric,
    AVG(avg) AS avg,
    MIN(min) AS min,
    MAX(max) AS max
FROM sensor_hourly
GROUP BY 1, 2, 3;
SELECT add_retention_policy('sensor_daily', INTERVAL '10 years');

-- ============================================================
-- PATTERN 2: Metadata-Driven Retention Configuration
-- ============================================================

-- Store retention configurations in a meta table
CREATE TABLE retention_config (
    table_name             TEXT PRIMARY KEY,
    retention_days         INTEGER NOT NULL,
    compression_after_days INTEGER,
    tier_to_s3_after_days  INTEGER,
    compliance_requirement TEXT,
    owner                  TEXT,
    last_reviewed          TIMESTAMPTZ
);

INSERT INTO retention_config VALUES
('application_logs',    30,   3, 14,   'INTERNAL', 'platform-team', NOW()),
('security_audit_logs', 2555, 7, 30,   'SOX',      'security-team', NOW()),
('user_activity',       90,   7, 30,   'GDPR',     'privacy-team',  NOW()),
('system_metrics',      7,    1, NULL, 'INTERNAL', 'sre-team',      NOW());

-- Apply retention policies from config
DO $$
DECLARE
    config RECORD;
BEGIN
    FOR config IN SELECT * FROM retention_config LOOP
        EXECUTE format(
            'SELECT add_retention_policy(%L, INTERVAL ''%s days'')',
            config.table_name,
            config.retention_days
        );
    END LOOP;
END $$;

-- ============================================================
-- PATTERN 3: Logical Retention with Soft Deletes
-- ============================================================

-- For data that might need undeletion (compliance holds)
ALTER TABLE audit_logs ADD COLUMN _deleted_at TIMESTAMPTZ;
ALTER TABLE audit_logs ADD COLUMN _hold_until TIMESTAMPTZ;

-- Create view that filters deleted records
CREATE VIEW active_audit_logs AS
SELECT * FROM audit_logs
WHERE _deleted_at IS NULL OR _hold_until > NOW();

-- Soft delete function that respects legal holds
CREATE FUNCTION soft_delete_old_records(table_name TEXT, days INTEGER)
RETURNS INTEGER AS $$
DECLARE
    count INTEGER;
BEGIN
    EXECUTE format(
        'UPDATE %I SET _deleted_at = NOW()
         WHERE time < NOW() - INTERVAL ''%s days''
           AND _deleted_at IS NULL
           AND (_hold_until IS NULL OR _hold_until < NOW())',
        table_name, days
    );
    GET DIAGNOSTICS count = ROW_COUNT;
    RETURN count;
END;
$$ LANGUAGE plpgsql;
```

We've explored the principles and practices of time-series data retention—from understanding value decay through implementing multi-tier architectures.
What's Next:
We'll explore real-world time-series database use cases in depth—examining how organizations deploy TSDBs for IoT, infrastructure monitoring, financial analytics, and observability. We'll see retention policies in action across different domains and scale requirements.
You now understand the principles of time-series data lifecycle management, can design multi-tier retention architectures, calculate cost impacts, and implement retention policies that balance operational needs with compliance requirements. You're equipped to make informed decisions about data aging and expiration.