We live in a world that generates timestamps at an extraordinary rate. Every IoT sensor reading, every metric from your infrastructure, every financial transaction, every user interaction—each carries a timestamp that positions it in the continuous flow of time.
The scale is staggering. And this data isn't just large—it has unique characteristics that traditional databases struggle to handle. Time-series data follows distinct patterns: high write volume, append-mostly semantics, time-based queries, aggregation needs, and eventual expiration.
Column-family stores have emerged as a leading solution for time-series workloads. Their architecture—write-optimized LSM trees, time-based compaction strategies, and tunable consistency—aligns remarkably well with time-series requirements.
This page explores the unique characteristics of time-series data, how to model time-series in column-family stores, bucketing and partitioning strategies, aggregation and downsampling techniques, TTL and data lifecycle management, and real-world patterns from IoT, monitoring, and analytics domains.
Before diving into schema design, we must understand what makes time-series data fundamentally different from transactional data.
Time-series data is a sequence of data points ordered by time. Each data point typically consists of a timestamp, an identifier for its source (sensor ID, host, metric name, tags), and one or more measured values. Several characteristics follow from this shape:
| Characteristic | Description | Implication |
|---|---|---|
| Time-Ordered | Data arrives roughly in chronological order | Append-optimized storage works well |
| Write-Heavy | Inserts vastly outnumber updates | LSM trees > B-trees |
| Immutable | Historical data rarely changes | No update-in-place needed |
| Expiring | Old data loses value over time | TTL and archival essential |
| Query by Time Range | "Last hour", "Today vs Yesterday" | Clustering by time |
| Aggregation-Focused | Averages, sums, percentiles over windows | Pre-aggregation helps |
1. High-Volume Ingestion
Thousands of sources write simultaneously, so writes must be fast, durable, and able to absorb bursts without blocking.
2. Range Queries
Most queries are time-bounded: "the last hour", "today vs. yesterday", or the window around an incident.
3. Aggregation Queries
Raw data is often too granular; dashboards and reports need averages, sums, and percentiles computed over time windows.
4. Cardinality Concerns
The number of unique time series can explode: hosts × metrics × tag combinations multiply quickly, and each combination is its own series with its own partitions.
Time-series and event sourcing both involve timestamped data, but differ in intent. Time-series measures state at points in time (sensor readings). Event sourcing captures state transitions (order placed, payment received). Column-family stores can support both, but modeling differs.
The fundamental challenge in time-series schema design is balancing partition size against query efficiency. Let's explore the evolution of time-series schemas.
The naive approach is partitioning by identifier only:
-- DON'T DO THIS: Partition grows forever
CREATE TABLE sensor_readings_bad (
    sensor_id UUID,
    reading_time TIMESTAMP,
    temperature DECIMAL,
    humidity DECIMAL,
    PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Problems:
-- 1. Partition size grows unbounded (100 readings/sec × 86400 sec/day × 365 days ≈ 3B rows!)
-- 2. SSTable compaction becomes expensive
-- 3. Read queries may scan huge partitions
-- 4. Single node hotspot for high-volume sensors

Time bucketing adds a time component to the partition key, limiting partition size:
-- Pattern 1: Daily Buckets
-- Good for: Moderate volume, day-level queries common
CREATE TABLE sensor_readings_daily (
    sensor_id UUID,
    reading_date DATE,        -- Bucket: one partition per day per sensor
    reading_time TIMESTAMP,
    temperature DECIMAL,
    humidity DECIMAL,
    PRIMARY KEY ((sensor_id, reading_date), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Insert
INSERT INTO sensor_readings_daily (sensor_id, reading_date, reading_time, temperature, humidity)
VALUES (?, toDate(now()), toTimestamp(now()), 23.5, 65.2);

-- Query today's readings
SELECT * FROM sensor_readings_daily
WHERE sensor_id = ? AND reading_date = toDate(now());

-- Query time range within today
SELECT * FROM sensor_readings_daily
WHERE sensor_id = ? AND reading_date = toDate(now())
AND reading_time >= '2024-01-15T10:00:00'
AND reading_time <= '2024-01-15T11:00:00';

-- -----------------------------------------------------------
-- Pattern 2: Hourly Buckets
-- Good for: High volume, hour-level queries common
CREATE TABLE metrics_hourly (
    host_id UUID,
    metric_name TEXT,
    hour TIMESTAMP,           -- Truncated to hour: 2024-01-15T10:00:00
    event_time TIMESTAMP,
    value DOUBLE,
    PRIMARY KEY ((host_id, metric_name, hour), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- Helper function in application to compute hour bucket:
-- hour = timestamp.truncatedTo(HOURS)

-- -----------------------------------------------------------
-- Pattern 3: Reverse Time UUID (Natural Ordering)
-- Good for: When you always query recent data first
CREATE TABLE events_by_source (
    source_id UUID,
    day DATE,
    event_id TIMEUUID,        -- Time-ordered UUID, latest first
    event_type TEXT,
    payload TEXT,
    PRIMARY KEY ((source_id, day), event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);

-- TIMEUUID embeds the timestamp, providing natural ordering
-- Queries get the most recent events without explicit timestamp filtering

The optimal bucket size depends on your workload:
| Factor | Smaller Buckets (Hour) | Larger Buckets (Day) |
|---|---|---|
| Write volume per source | Higher tolerance | Lower tolerance |
| Query range | Usually small | Often spans days |
| Number of partitions | Many more | Fewer |
| Partition size | Smaller, easier to compact | Larger |
| Query efficiency | May need multiple requests | Single request for day |
Good Rule of Thumb:
(writes_per_second × seconds_in_bucket) × row_size < 100 MB

For variable-volume sources, consider dynamic bucket sizing. Calculate the bucket in your application based on expected volume: high-volume sensors use hourly buckets, low-volume sensors use daily. Include the bucket type in the partition key for clarity.
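As a sketch of that idea, the application can choose the bucket granularity per source and prefix the bucket value with its type. The helper below is hypothetical, and the row-size and threshold constants are assumptions you would tune for your schema:

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: choose a bucket per sensor from its expected write rate,
// keeping (writes/sec × seconds_in_bucket) × row_size comfortably under ~100 MB.
public final class BucketChooser {

    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:00").withZone(ZoneOffset.UTC);

    // Assumed average row size for the estimate (adjust to your table)
    private static final long ROW_SIZE_BYTES = 100;
    private static final long MAX_PARTITION_BYTES = 100L * 1024 * 1024;

    /** Returns a bucket string such as "D#2024-01-15" or "H#2024-01-15T10:00". */
    public static String bucketFor(double writesPerSecond, Instant ts) {
        long dailyBytes = (long) (writesPerSecond * 86_400) * ROW_SIZE_BYTES;
        if (dailyBytes < MAX_PARTITION_BYTES) {
            return "D#" + DAY.format(ts.truncatedTo(ChronoUnit.DAYS));
        }
        return "H#" + HOUR.format(ts.truncatedTo(ChronoUnit.HOURS));
    }
}

Queries must derive the same bucket string for the time range they target, so keep the sizing rule in one shared utility used by both writers and readers.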
High-volume time-series ingestion requires careful optimization at every layer.
Sending individual writes for each data point is inefficient. Batch related writes together:
-- UNLOGGED batch: Group writes to reduce round trips
-- Use when: Grouping multiple data points, ideally within the same partition
BEGIN UNLOGGED BATCH
    INSERT INTO metrics (host, hour, ts, cpu) VALUES (?, ?, ?, ?);
    INSERT INTO metrics (host, hour, ts, cpu) VALUES (?, ?, ?, ?);
    INSERT INTO metrics (host, hour, ts, cpu) VALUES (?, ?, ?, ?);
    -- ... up to 50-100 statements
APPLY BATCH;

-- LOGGED batch: Atomic across statements, even across partitions (extra coordination cost)
-- Use when: The write must be atomic (rare for time-series)
BEGIN BATCH
    INSERT INTO sensor_data (sensor, day, ts, temp) VALUES (?, ?, ?, ?);
    UPDATE sensor_status SET last_reading = ? WHERE sensor_id = ?;
APPLY BATCH;

Don't wait for acknowledgment of each write. Use async patterns:
Java Driver Example:
List<CompletionStage<AsyncResultSet>> futures = new ArrayList<>();
for (DataPoint point : batch) {
BoundStatement statement = preparedInsert.bind(
point.getSensorId(),
point.getDate(),
point.getTimestamp(),
point.getValue()
);
futures.add(session.executeAsync(statement));
}
// Wait for all to complete
CompletableFuture.allOf(
futures.stream()
.map(CompletionStage::toCompletableFuture)
.toArray(CompletableFuture[]::new)
).join();
For time-series ingestion, lower consistency levels are often acceptable:
| Consistency | Use Case |
|---|---|
| ONE | Best throughput; data may be lost if a node fails before replication |
| LOCAL_ONE | Same as ONE, but acknowledged within the local datacenter only |
| LOCAL_QUORUM | Good balance; survives one replica failure |
| QUORUM | Higher durability; cross-datacenter latency |
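With the DataStax Java driver 4.x, the consistency level can be set per statement. The sketch below assumes prepared statements like the earlier examples; preparedInsert, preparedAggInsert, and the bound values are placeholders:

import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.BoundStatement;

// Relaxed consistency for raw ingestion: fastest acknowledgment, relies on replication
BoundStatement fastWrite = preparedInsert
        .bind(hostId, hour, ts, value)
        .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_ONE);
session.executeAsync(fastWrite);

// Stronger consistency for data you can't afford to lose (e.g., rollups)
BoundStatement durableWrite = preparedAggInsert
        .bind(hostId, metricName, hour, minValue, maxValue, avgValue)
        .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_QUORUM);
session.execute(durableWrite);

A cluster-wide default can also be set in the driver configuration (basic.request.consistency), reserving per-statement overrides for the exceptions.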
Always use prepared statements for time-series ingestion:
// Prepare once, reuse many times
PreparedStatement prepared = session.prepare(
"INSERT INTO metrics (host, hour, ts, value) VALUES (?, ?, ?, ?)"
);
// Execute thousands of times
for (DataPoint p : points) {
session.executeAsync(prepared.bind(p.host, p.hour, p.ts, p.value));
}
Benefits: the statement is parsed once on the server, each execution sends only the bound values, and the driver can use the bound partition key for token-aware routing to a replica.
When sensors start simultaneously (power restoration, deployment), they create write spikes. Implement jitter: each sensor waits a random delay (0-30 seconds) before first write. This spreads load and prevents coordinator overload.
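A minimal sketch of that jitter, assuming each agent runs it once at startup; startReportingLoop() is a placeholder for the normal write loop:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Spread startup load: each sensor delays its first write by a random 0-30 s,
// so a fleet that powers on together doesn't overwhelm the coordinators.
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
long jitterMillis = ThreadLocalRandom.current().nextLong(TimeUnit.SECONDS.toMillis(30));

scheduler.schedule(
        () -> startReportingLoop(),   // hypothetical: begins the normal write cadence
        jitterMillis,
        TimeUnit.MILLISECONDS);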
Time-series queries follow predictable patterns. Optimizing for these patterns dramatically improves performance.
1. Recent Data ("Last N minutes")
-- Query last hour of data for a sensor
-- Efficient: Single partition, clustering order matches query

-- Table clustered DESC (most recent first)
CREATE TABLE sensor_readings (
    sensor_id UUID,
    day DATE,
    reading_time TIMESTAMP,
    value DECIMAL,
    PRIMARY KEY ((sensor_id, day), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Query recent data with limit
SELECT * FROM sensor_readings
WHERE sensor_id = ? AND day = toDate(now())
LIMIT 100;  -- Most recent 100 readings

-- Query specific time range
SELECT * FROM sensor_readings
WHERE sensor_id = ? AND day = toDate(now())
AND reading_time >= toTimestamp(now()) - 1h  -- Last hour
ORDER BY reading_time DESC;

2. Multi-Day Range Queries
When queries span multiple time buckets, you must query multiple partitions:
-- Querying 7 days of data requires 7 partition queries

-- Option 1: Multiple sequential queries (simple but slow)
SELECT * FROM sensor_readings WHERE sensor_id = ? AND day = '2024-01-15';
SELECT * FROM sensor_readings WHERE sensor_id = ? AND day = '2024-01-14';
-- ... (7 queries total)

-- Option 2: IN clause (limited to small sets)
SELECT * FROM sensor_readings
WHERE sensor_id = ?
AND day IN ('2024-01-15', '2024-01-14', '2024-01-13', '2024-01-12',
            '2024-01-11', '2024-01-10', '2024-01-09');

-- Option 3: Async parallel queries (best performance; see the Java sketch below)
-- Application code:
-- 1. Generate list of days in range
-- 2. Fire async query for each day
-- 3. Merge results client-side

-- Option 4: Use TOKEN function for full scans (analytics only)
-- WARNING: This scans the entire cluster - use for batch jobs only
SELECT * FROM sensor_readings
WHERE TOKEN(sensor_id, day) > TOKEN(?, '2024-01-09')
AND TOKEN(sensor_id, day) <= TOKEN(?, '2024-01-15');
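Option 3 leaves the fan-out to application code. A minimal sketch with the Java driver, assuming a prepared per-day query (preparedDayQuery), a sensorId, a start/end day, and a results collection—all placeholders:

import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import com.datastax.oss.driver.api.core.cql.AsyncResultSet;

// 1. Generate the list of day buckets in the requested range
List<LocalDate> days = new ArrayList<>();
for (LocalDate d = startDay; !d.isAfter(endDay); d = d.plusDays(1)) {
    days.add(d);
}

// 2. Fire one async query per day partition
List<CompletionStage<AsyncResultSet>> futures = new ArrayList<>();
for (LocalDate day : days) {
    futures.add(session.executeAsync(preparedDayQuery.bind(sensorId, day)));
}

// 3. Merge results client-side once every partition has answered
CompletableFuture.allOf(
        futures.stream().map(CompletionStage::toCompletableFuture)
               .toArray(CompletableFuture[]::new)
).join();

for (CompletionStage<AsyncResultSet> f : futures) {
    AsyncResultSet rs = f.toCompletableFuture().join();
    // currentPage() holds only the first page; loop with fetchNextPage() for large partitions
    rs.currentPage().forEach(row -> results.add(row));   // results is a placeholder collection
}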
For large result sets, use stateful pagination:
-- First page
SELECT * FROM sensor_readings
WHERE sensor_id = ? AND day = '2024-01-15'
LIMIT 100;
-- Returns: 100 rows, last row has reading_time = '2024-01-15T10:30:00'

-- Next page: Use last timestamp as cursor
SELECT * FROM sensor_readings
WHERE sensor_id = ? AND day = '2024-01-15'
AND reading_time < '2024-01-15T10:30:00'  -- Start after last seen
LIMIT 100;

-- Alternatively: Use the driver's paging state
-- Java example:
ResultSet rs = session.execute(
    SimpleStatement.builder("SELECT * FROM sensor_readings WHERE sensor_id = ? AND day = ?")
        .addPositionalValues(sensorId, day)
        .setPageSize(100)
        .build());
ByteBuffer pagingState = rs.getExecutionInfo().getPagingState();

// For next page:
session.execute(
    SimpleStatement.builder("SELECT * FROM sensor_readings WHERE sensor_id = ? AND day = ?")
        .addPositionalValues(sensorId, day)
        .setPagingState(pagingState)
        .setPageSize(100)
        .build());

Most time-series queries access recent data. Cassandra's row cache can store hot partitions in memory. For the current time bucket (today's partition), enable row caching to serve frequent queries from RAM instead of disk.
Raw time-series data is often too granular for queries. A sensor reporting every second generates 86,400 points per day—but dashboards typically show hourly or daily summaries.
Maintain multiple tables at different granularities:
-- Raw data: Full resolution, short retention
CREATE TABLE metrics_raw (
    host_id UUID,
    metric_name TEXT,
    hour TIMESTAMP,
    event_time TIMESTAMP,
    value DOUBLE,
    PRIMARY KEY ((host_id, metric_name, hour), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
AND default_time_to_live = 604800;  -- 7 days retention

-- Hourly aggregates: Pre-computed, medium retention
CREATE TABLE metrics_hourly_agg (
    host_id UUID,
    metric_name TEXT,
    hour TIMESTAMP,
    min_value DOUBLE,
    max_value DOUBLE,
    avg_value DOUBLE,
    sum_value DOUBLE,
    count_value BIGINT,
    p50_value DOUBLE,
    p95_value DOUBLE,
    p99_value DOUBLE,
    PRIMARY KEY ((host_id, metric_name), hour)
) WITH CLUSTERING ORDER BY (hour DESC)
AND default_time_to_live = 7776000;  -- 90 days retention

-- Daily aggregates: Long-term storage
CREATE TABLE metrics_daily_agg (
    host_id UUID,
    metric_name TEXT,
    day DATE,
    min_value DOUBLE,
    max_value DOUBLE,
    avg_value DOUBLE,
    sum_value DOUBLE,
    count_value BIGINT,
    PRIMARY KEY ((host_id, metric_name), day)
) WITH CLUSTERING ORDER BY (day DESC);
-- No TTL: Keep forever for historical analysis

Computing aggregates can be done multiple ways:
1. Write-Time Aggregation
Update aggregate tables on every write:
// On each raw data point:
BatchStatement batch = BatchStatement.builder(DefaultBatchType.UNLOGGED)
.addStatement(insertRaw.bind(hostId, metric, hour, time, value))
.addStatement(updateHourlyAgg.bind(value, value, value, value, 1, hostId, metric, hour))
.build();
session.execute(batch);
Pros: Aggregates always current.
Cons: Write amplification, counter limitations.
2. Scheduled Batch Aggregation
Periodically compute aggregates from raw data:
Every hour:
1. Query raw data for previous hour
2. Compute aggregates (min, max, avg, percentiles)
3. Write to aggregate table
4. (Optional) Delete raw data older than retention
Pros: Clean separation, complex aggregates possible.
Cons: Aggregates lag by the batch interval.
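To make option 2 concrete, here is a minimal sketch of an hourly rollup job for one (host, metric) series, assuming the metrics_raw and metrics_hourly_agg tables above and two prepared statements (selectRawForHour, insertHourlyAgg); percentile computation is omitted for brevity:

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

// Runs once per hour (e.g., from a scheduler) for one (host, metric) series.
Instant hour = Instant.now().truncatedTo(ChronoUnit.HOURS).minus(1, ChronoUnit.HOURS);

// 1. Read the previous hour's raw points from its single partition
ResultSet raw = session.execute(selectRawForHour.bind(hostId, metricName, hour));

// 2. Compute aggregates client-side
double min = Double.MAX_VALUE, max = -Double.MAX_VALUE, sum = 0;
long count = 0;
for (Row row : raw) {
    double v = row.getDouble("value");
    min = Math.min(min, v);
    max = Math.max(max, v);
    sum += v;
    count++;
}

// 3. Write the rollup row; raw data simply expires via its 7-day TTL
if (count > 0) {
    session.execute(insertHourlyAgg.bind(
            hostId, metricName, hour, min, max, sum / count, sum, count));
}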
3. Stream Processing
Use Kafka/Flink/Spark Streaming:
Kafka Topic (raw events)
→ Flink Window Aggregation
→ Kafka Topic (aggregates)
→ Cassandra Sink
Pros: Near-real-time, scalable, backpressure handling.
Cons: Operational complexity.
| Strategy | Latency | Complexity | Accuracy | Best For |
|---|---|---|---|---|
| Write-Time | Real-time | Low | Exact | Simple counters, sums |
| Batch | Minutes-hours | Medium | Exact | Complex aggregates |
| Stream | Seconds | High | Exact | High-volume, low-latency |
Cassandra supports counter columns for write-time aggregation (sums, counts). However, counters have limitations: no TTL, no batch with non-counters, idempotency challenges. For complex aggregates like percentiles, use batch or stream processing.
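As an illustration, the sketch below maintains a count and a scaled integer sum at write time (counters hold only 64-bit integers), leaving the average to be derived at read time. The metric_counters table and statement names are hypothetical:

import com.datastax.oss.driver.api.core.cql.PreparedStatement;

// Hypothetical counter table (the sum is stored ×100 because counters are integers):
//   CREATE TABLE metric_counters (
//       host_id UUID, metric_name TEXT, hour TIMESTAMP,
//       reading_count COUNTER, sum_times_100 COUNTER,
//       PRIMARY KEY ((host_id, metric_name), hour));

PreparedStatement bumpCounters = session.prepare(
        "UPDATE metric_counters " +
        "SET reading_count = reading_count + 1, sum_times_100 = sum_times_100 + ? " +
        "WHERE host_id = ? AND metric_name = ? AND hour = ?");

// On each raw data point: increment the count and the scaled sum
long scaled = Math.round(value * 100);           // e.g. 23.57 -> 2357
session.executeAsync(bumpCounters.bind(scaled, hostId, metricName, hour));

// At read time: avg = (sum_times_100 / 100.0) / reading_count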
Time-series data has a natural lifecycle: high value when fresh, diminishing value over time, eventually worthless. Managing this lifecycle efficiently is crucial for storage costs and query performance.
Cassandra TTL (Time To Live) automatically expires data:
-- Table-level default TTL
CREATE TABLE metrics_short_term (
    sensor_id UUID,
    day DATE,
    reading_time TIMESTAMP,
    value DECIMAL,
    PRIMARY KEY ((sensor_id, day), reading_time)
) WITH default_time_to_live = 604800;  -- 7 days (604800 seconds)

-- All inserts without explicit TTL use the table default
INSERT INTO metrics_short_term (sensor_id, day, reading_time, value)
VALUES (?, ?, ?, ?);  -- Expires in 7 days

-- Row-level TTL (overrides table default)
INSERT INTO metrics_short_term (sensor_id, day, reading_time, value)
VALUES (?, ?, ?, ?)
USING TTL 86400;  -- This row expires in 1 day

-- Column-level TTL
UPDATE metrics_short_term
USING TTL 3600  -- 1 hour
SET value = ?
WHERE sensor_id = ? AND day = ? AND reading_time = ?;

-- Check remaining TTL
SELECT TTL(value), WRITETIME(value) FROM metrics_short_term
WHERE sensor_id = ? AND day = ? AND reading_time = ?;

Important: Expired data isn't immediately deleted. It lingers as tombstones until compaction removes it, which has implications: reads that scan many expired cells pay a tombstone-scanning cost, and disk space is reclaimed only once compaction (and gc_grace_seconds) allows.
TWCS is designed specifically for time-series with TTL:
-- Optimal TWCS configuration for time-series
CREATE TABLE sensor_metrics (
    sensor_id UUID,
    hour TIMESTAMP,
    reading_time TIMESTAMP,
    value DECIMAL,
    PRIMARY KEY ((sensor_id, hour), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC)
AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': 1
}
AND default_time_to_live = 604800;  -- 7 days

-- Rules for TWCS:
-- 1. compaction_window_size should align with the partition bucket
--    (hourly buckets → hourly compaction windows)
-- 2. TTL should be much longer than the compaction window
--    (7 days TTL >> 1 hour window)
-- 3. All data in the table should have a similar TTL
--    (don't mix 1-day and 1-year retention)

Mixing explicit DELETEs with TTL data breaks TWCS efficiency. Deletes create tombstones that may span time windows, forcing cross-window compaction. For time-series, rely on TTL for data expiration. If you must delete, consider truncating entire tables of old data instead.
Let's examine concrete implementations across different domains.
The first example is an infrastructure monitoring platform.
Requirements: ingest metrics continuously from a large fleet of hosts, drive dashboards from pre-computed rollups, and evaluate alerts against the most recent values.
Scale: every host reports many metrics at short intervals, so raw data is kept for only 7 days while hourly rollups are retained for 90 days.
-- Raw metrics
CREATE TABLE infra_metrics_raw (
    host_id UUID,
    metric_name TEXT,
    hour TIMESTAMP,              -- Hourly buckets
    collected_at TIMESTAMP,
    value DOUBLE,
    tags MAP<TEXT, TEXT>,        -- region, env, etc.
    PRIMARY KEY ((host_id, metric_name, hour), collected_at)
) WITH CLUSTERING ORDER BY (collected_at DESC)
AND compaction = {'class': 'TimeWindowCompactionStrategy',
                  'compaction_window_unit': 'HOURS',
                  'compaction_window_size': 1}
AND default_time_to_live = 604800;

-- Hourly rollups for dashboards
CREATE TABLE infra_metrics_hourly (
    host_id UUID,
    metric_name TEXT,
    hour TIMESTAMP,
    min_val DOUBLE,
    max_val DOUBLE,
    avg_val DOUBLE,
    count_val BIGINT,
    PRIMARY KEY ((host_id, metric_name), hour)
) WITH CLUSTERING ORDER BY (hour DESC)
AND default_time_to_live = 7776000;  -- 90 days

-- Alert evaluation: Recent values by metric across all hosts
CREATE TABLE metric_by_name (
    metric_name TEXT,
    hour TIMESTAMP,
    host_id UUID,
    last_value DOUBLE,
    last_collected TIMESTAMP,
    PRIMARY KEY ((metric_name, hour), host_id)
) WITH default_time_to_live = 3600;  -- 1 hour only

The second example is a multi-tenant IoT sensor platform.
Requirements: isolate each tenant's data, support several reading types per sensor, record a data-quality score with every reading, evaluate threshold-based alerts, and retain readings for 30 days.
-- Multi-tenant time-series
CREATE TABLE iot_readings (
    tenant_id UUID,              -- Customer isolation
    sensor_id UUID,
    day DATE,                    -- Daily buckets
    reading_time TIMESTAMP,
    reading_type TEXT,           -- temperature, pressure, etc.
    value DOUBLE,
    unit TEXT,
    quality_score TINYINT,       -- Data quality indicator
    PRIMARY KEY ((tenant_id, sensor_id, day), reading_type, reading_time)
) WITH CLUSTERING ORDER BY (reading_type ASC, reading_time DESC)
AND default_time_to_live = 2592000;  -- 30 days

-- Sensor metadata (infrequently changing)
CREATE TABLE sensor_metadata (
    tenant_id UUID,
    sensor_id UUID,
    name TEXT,
    location TEXT,
    installed_at TIMESTAMP,
    reading_types SET<TEXT>,
    config MAP<TEXT, TEXT>,
    PRIMARY KEY ((tenant_id), sensor_id)
);

-- Alert state (current thresholds and violations)
CREATE TABLE sensor_alerts (
    tenant_id UUID,
    alert_id UUID,
    sensor_id UUID,
    reading_type TEXT,
    threshold_type TEXT,         -- above, below, rate_of_change
    threshold_value DOUBLE,
    current_status TEXT,         -- ok, warning, critical
    last_triggered TIMESTAMP,
    PRIMARY KEY ((tenant_id, sensor_id), alert_id)
);

The third example is a financial market-data platform.
Requirements: capture raw ticks in exchange-sequence order with high-precision timestamps, serve pre-computed OHLC candles at multiple intervals, and keep a long-term trade tape for replay and backtesting.
-- Raw ticks (high precision)
CREATE TABLE market_ticks (
    symbol TEXT,
    trading_date DATE,
    seq_number BIGINT,           -- Exchange sequence for ordering
    tick_time TIMESTAMP,         -- Microsecond precision
    bid_price DECIMAL,
    ask_price DECIMAL,
    bid_size INT,
    ask_size INT,
    trade_price DECIMAL,
    trade_size INT,
    exchange TEXT,
    PRIMARY KEY ((symbol, trading_date), seq_number)
) WITH CLUSTERING ORDER BY (seq_number ASC)
AND compression = {'class': 'ZstdCompressor'};  -- Better compression

-- OHLC candles (pre-computed at multiple intervals)
CREATE TABLE ohlc_candles (
    symbol TEXT,
    interval TEXT,               -- '1m', '5m', '1h', '1d'
    candle_time TIMESTAMP,
    open_price DECIMAL,
    high_price DECIMAL,
    low_price DECIMAL,
    close_price DECIMAL,
    volume BIGINT,
    trade_count INT,
    vwap DECIMAL,                -- Volume-weighted average price
    PRIMARY KEY ((symbol, interval), candle_time)
) WITH CLUSTERING ORDER BY (candle_time DESC);

-- Trade tape for replay/backtesting
CREATE TABLE trade_tape (
    symbol TEXT,
    year INT,                    -- Yearly partitions for archival
    trade_id BIGINT,
    trade_time TIMESTAMP,
    price DECIMAL,
    size INT,
    side TEXT,                   -- buy/sell
    PRIMARY KEY ((symbol, year), trade_id)
) WITH CLUSTERING ORDER BY (trade_id ASC);

We've explored time-series handling comprehensively—from its defining characteristics through production-ready schema designs. The essential insights: bound partition growth with time bucketing, pre-aggregate at multiple resolutions for dashboards and long-term analysis, tune ingestion with batching, asynchronous writes, and appropriate consistency levels, and manage the data lifecycle with TTL and TimeWindowCompactionStrategy.
What's Next:
Now that we understand time-series deeply, the final page explores Column-Family Use Cases broadly—examining when column-family databases are the right choice and when alternatives might serve better.
You now possess production-grade knowledge of time-series data modeling in column-family stores. These patterns—time bucketing, multi-resolution aggregation, TWCS compaction, and TTL management—are directly applicable to IoT platforms, monitoring systems, and analytical applications.