Consider a business intelligence dashboard displaying metrics such as this month's total revenue, order counts, average order value, and unique visitors.
In a normalized schema, every dashboard refresh recomputes these statistics from raw transactional data. As your data grows, these queries become progressively slower. What took milliseconds with 10,000 orders takes seconds with 10 million orders, and minutes with 100 million orders.
Pre-computed aggregates solve this by calculating and storing summary statistics in advance. Instead of scanning millions of rows to answer "what's this month's revenue?", you query a single pre-computed row.
By the end of this page, you will master the design and implementation of pre-computed aggregates. You'll understand granularity hierarchies, incremental vs. full refresh strategies, consistency guarantees, and the architectural patterns that make aggregates maintainable at scale.
A pre-computed aggregate is a stored summary statistic calculated from detailed source data. Unlike derived columns (which compute values for individual rows), aggregates combine data across multiple rows into summary records.
Terminology:
| Term | Definition | Example |
|---|---|---|
| Source data | The detailed transactional records | Individual orders, page views, transactions |
| Aggregate | The computed summary value | SUM, COUNT, AVG, MIN, MAX, etc. |
| Dimension | The grouping key(s) for aggregation | Date, region, category, customer segment |
| Granularity | The level of detail in the aggregate | Daily, weekly, monthly, by-region, by-store |
| Aggregate table | The table storing pre-computed summaries | daily_sales_summary, monthly_revenue_by_region |
How aggregates transform queries:
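As a sketch of the before/after shape of such a query, assuming the order_items and daily_sales_summary tables defined later on this page:

```sql
-- Before (normalized schema): every dashboard refresh scans and groups raw rows
SELECT category_id,
       SUM(quantity * unit_price) AS total_revenue,
       COUNT(DISTINCT order_id)   AS order_count
FROM order_items
WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY category_id;

-- After (pre-computed aggregate): read a handful of summary rows instead
SELECT category_id,
       SUM(total_revenue) AS total_revenue,
       SUM(order_count)   AS order_count
FROM daily_sales_summary
WHERE summary_date >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY category_id;
```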
Common aggregate types:
| Aggregate Function | Use Case | Incremental Maintenance |
|---|---|---|
| COUNT | Row counts, visitor counts | Easy: +1 or -1 per change |
| SUM | Revenue, quantities, totals | Easy: +value or -value per change |
| AVG | Average order value, ratings | Moderate: Store sum and count separately |
| MIN / MAX | First/last values, extremes | Hard: May need full recalc on delete |
| COUNT DISTINCT | Unique visitors, unique products | Hard: Requires probabilistic structures or full recalc |
| PERCENTILE | Median, 95th percentile | Very hard: Typically requires full recalc |
The complexity of maintaining an aggregate dictates whether incremental updates are practical or periodic full recalculation is required.
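AVG illustrates the component trick: rather than maintaining the average itself, keep the incrementally maintainable pieces (sum and count) and divide at read time. A minimal sketch against the daily_sales_summary table defined below:

```sql
-- avg_order_value is derived from components that are cheap to maintain incrementally
SELECT summary_date,
       category_id,
       total_revenue / NULLIF(order_count, 0) AS avg_order_value
FROM daily_sales_summary
WHERE summary_date >= CURRENT_DATE - 7;
```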
Effective aggregate design requires understanding your query patterns and choosing appropriate granularity levels. The goal is to pre-answer the questions your applications actually ask.
Granularity hierarchy example (temporal):
Raw transactions (most granular)
↓ aggregate ↓
Hourly summaries
↓ aggregate ↓
Daily summaries
↓ aggregate ↓
Monthly summaries
↓ aggregate ↓
Yearly summaries (least granular)
Each level in the hierarchy is both a consumer of more granular data and a source for less granular aggregates. This layered approach enables efficient queries at any time scale.
```sql
-- Example: Multi-granularity sales aggregate schema

-- Source table: Individual order items (millions of rows)
CREATE TABLE order_items (
    order_item_id SERIAL PRIMARY KEY,
    order_id      INT NOT NULL,
    product_id    INT NOT NULL,
    category_id   INT NOT NULL,
    region_id     INT NOT NULL,
    order_date    DATE NOT NULL,
    quantity      INT NOT NULL,
    unit_price    DECIMAL(10, 2) NOT NULL,
    total_amount  DECIMAL(12, 2) GENERATED ALWAYS AS (quantity * unit_price) STORED
);

-- Aggregate Level 1: Daily sales by category and region
-- Granularity: Day × Category × Region (tens of thousands of rows)
CREATE TABLE daily_sales_summary (
    summary_id     SERIAL PRIMARY KEY,
    summary_date   DATE NOT NULL,
    category_id    INT NOT NULL,
    region_id      INT NOT NULL,
    -- Aggregate columns
    total_revenue  DECIMAL(14, 2) NOT NULL DEFAULT 0,
    total_quantity INT NOT NULL DEFAULT 0,
    order_count    INT NOT NULL DEFAULT 0,
    -- For AVG calculation: store components separately
    -- avg_order_value = total_revenue / order_count
    -- Metadata
    last_updated   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (summary_date, category_id, region_id)
);

-- Aggregate Level 2: Monthly sales by category
-- Granularity: Month × Category (thousands of rows)
CREATE TABLE monthly_category_summary (
    summary_id       SERIAL PRIMARY KEY,
    summary_month    DATE NOT NULL,  -- First day of month
    category_id      INT NOT NULL,
    total_revenue    DECIMAL(16, 2) NOT NULL DEFAULT 0,
    total_quantity   INT NOT NULL DEFAULT 0,
    order_count      INT NOT NULL DEFAULT 0,
    unique_customers INT NOT NULL DEFAULT 0,  -- Requires special handling
    last_updated     TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (summary_month, category_id)
);

-- Aggregate Level 3: Yearly company-wide totals
-- Granularity: Year (tens of rows)
CREATE TABLE yearly_company_summary (
    summary_id      SERIAL PRIMARY KEY,
    summary_year    INT NOT NULL,
    total_revenue   DECIMAL(18, 2) NOT NULL DEFAULT 0,
    total_orders    INT NOT NULL DEFAULT 0,
    total_customers INT NOT NULL DEFAULT 0,
    avg_order_value DECIMAL(12, 2) NOT NULL DEFAULT 0,
    last_updated    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (summary_year)
);

-- Index design for aggregate tables
CREATE INDEX idx_daily_summary_date ON daily_sales_summary (summary_date DESC);
CREATE INDEX idx_daily_summary_category ON daily_sales_summary (category_id, summary_date DESC);
CREATE INDEX idx_monthly_summary_category ON monthly_category_summary (category_id, summary_month DESC);
```

Before creating aggregate tables, catalog your actual query patterns. What dimensions are always filtered? What time granularity is needed? What metrics are displayed together? Design aggregates that directly answer these queries without additional computation.
Multi-dimensional aggregates:
Real-world analytics often require drilling down across multiple dimensions. A well-designed aggregate schema supports common combinations:
| Aggregate Table | Dimensions | Query Examples |
|---|---|---|
| daily_sales_by_region | date, region | Regional daily trends |
| daily_sales_by_category | date, category | Category daily trends |
| monthly_sales_by_region_category | month, region, category | Regional category performance |
| customer_segment_summary | segment, acquisition_month | Cohort analysis |
| product_performance | product, month | Product-level trends |
Each aggregate table trades storage for query speed. The key is identifying which dimension combinations are frequently queried together.
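As a sketch of what such a table buys you, assume monthly_sales_by_region_category has columns summary_month, region_id, category_id, and total_revenue (hypothetical names); a regional performance query then reads a few hundred summary rows instead of millions of order items:

```sql
-- Hypothetical aggregate table: monthly_sales_by_region_category
SELECT region_id,
       SUM(total_revenue) AS revenue
FROM monthly_sales_by_region_category
WHERE category_id = 5
  AND summary_month >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '12 months'
GROUP BY region_id
ORDER BY revenue DESC;
```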
Maintaining aggregate accuracy as source data changes is the central challenge. There are three fundamental approaches, each with distinct trade-offs.
Strategy 1: Full Refresh (Complete Recalculation)
Periodically recalculate aggregates from scratch by querying source data.
```sql
-- Full refresh: Recalculate daily summary from source data
-- Typically run as a scheduled job (e.g., nightly, hourly)

-- Step 1: Calculate new aggregates
CREATE TEMP TABLE new_daily_summary AS
SELECT
    order_date AS summary_date,
    category_id,
    region_id,
    SUM(total_amount) AS total_revenue,
    SUM(quantity) AS total_quantity,
    COUNT(DISTINCT order_id) AS order_count
FROM order_items
WHERE order_date >= CURRENT_DATE - INTERVAL '7 days'  -- Scope to recent data
GROUP BY order_date, category_id, region_id;

-- Step 2: Merge into aggregate table (upsert pattern)
INSERT INTO daily_sales_summary
    (summary_date, category_id, region_id, total_revenue, total_quantity, order_count, last_updated)
SELECT
    summary_date, category_id, region_id,
    total_revenue, total_quantity, order_count,
    CURRENT_TIMESTAMP
FROM new_daily_summary
ON CONFLICT (summary_date, category_id, region_id)
DO UPDATE SET
    total_revenue = EXCLUDED.total_revenue,
    total_quantity = EXCLUDED.total_quantity,
    order_count = EXCLUDED.order_count,
    last_updated = CURRENT_TIMESTAMP;

-- Step 3: Cleanup
DROP TABLE new_daily_summary;
```

| Pros | Cons |
|---|---|
| Simple to implement and debug | Expensive for large datasets |
| Guaranteed accuracy (no drift) | Must scan all source data |
| Handles all aggregate types (including MIN/MAX) | Stale data between refreshes |
| No triggers or complex maintenance code | Resource-intensive during refresh |
Strategy 2: Incremental Update (Real-time Maintenance)
Update aggregates immediately when source data changes, using triggers or application logic.
```sql
-- Incremental update: Maintain aggregates via triggers

CREATE OR REPLACE FUNCTION maintain_daily_sales_aggregate()
RETURNS TRIGGER AS $$
DECLARE
    affected_date     DATE;
    affected_category INT;
    affected_region   INT;
    delta_revenue     DECIMAL(12, 2);
    delta_quantity    INT;
    delta_orders      INT;
BEGIN
    -- Determine affected dimensions and deltas based on operation
    IF TG_OP = 'INSERT' THEN
        affected_date := NEW.order_date;
        affected_category := NEW.category_id;
        affected_region := NEW.region_id;
        delta_revenue := NEW.quantity * NEW.unit_price;
        delta_quantity := NEW.quantity;
        delta_orders := 1;  -- Simplified; see note on unique order counting
    ELSIF TG_OP = 'DELETE' THEN
        affected_date := OLD.order_date;
        affected_category := OLD.category_id;
        affected_region := OLD.region_id;
        delta_revenue := -(OLD.quantity * OLD.unit_price);
        delta_quantity := -OLD.quantity;
        delta_orders := -1;
    ELSIF TG_OP = 'UPDATE' THEN
        -- Handle updates by processing as delete + insert if dimensions changed
        IF OLD.order_date != NEW.order_date
           OR OLD.category_id != NEW.category_id
           OR OLD.region_id != NEW.region_id THEN
            -- Decrement old aggregate
            INSERT INTO daily_sales_summary
                (summary_date, category_id, region_id, total_revenue, total_quantity, order_count)
            VALUES
                (OLD.order_date, OLD.category_id, OLD.region_id,
                 -(OLD.quantity * OLD.unit_price), -OLD.quantity, -1)
            ON CONFLICT (summary_date, category_id, region_id)
            DO UPDATE SET
                total_revenue = daily_sales_summary.total_revenue + EXCLUDED.total_revenue,
                total_quantity = daily_sales_summary.total_quantity + EXCLUDED.total_quantity,
                order_count = daily_sales_summary.order_count + EXCLUDED.order_count,
                last_updated = CURRENT_TIMESTAMP;

            -- The new dimension row receives the full NEW values
            delta_revenue := NEW.quantity * NEW.unit_price;
            delta_quantity := NEW.quantity;
            delta_orders := 1;
        ELSE
            -- Same dimensions: apply only the difference
            delta_revenue := (NEW.quantity * NEW.unit_price) - (OLD.quantity * OLD.unit_price);
            delta_quantity := NEW.quantity - OLD.quantity;
            delta_orders := 0;  -- Same order, different values
        END IF;

        affected_date := NEW.order_date;
        affected_category := NEW.category_id;
        affected_region := NEW.region_id;
    END IF;

    -- Apply delta to aggregate table (upsert with increment)
    INSERT INTO daily_sales_summary
        (summary_date, category_id, region_id, total_revenue, total_quantity, order_count, last_updated)
    VALUES
        (affected_date, affected_category, affected_region,
         delta_revenue, delta_quantity, delta_orders, CURRENT_TIMESTAMP)
    ON CONFLICT (summary_date, category_id, region_id)
    DO UPDATE SET
        total_revenue = daily_sales_summary.total_revenue + EXCLUDED.total_revenue,
        total_quantity = daily_sales_summary.total_quantity + EXCLUDED.total_quantity,
        order_count = daily_sales_summary.order_count + EXCLUDED.order_count,
        last_updated = CURRENT_TIMESTAMP;

    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_items_aggregate_trigger
AFTER INSERT OR UPDATE OR DELETE ON order_items
FOR EACH ROW EXECUTE FUNCTION maintain_daily_sales_aggregate();
```

| Pros | Cons |
|---|---|
| Real-time aggregate accuracy | Complex trigger logic |
| No periodic refresh overhead | Potential for drift over time |
| Consistent query latency | Adds latency to every write |
| No stale data window | Difficult for MIN/MAX/COUNT DISTINCT |
Strategy 3: Hybrid Approach (Incremental + Periodic Reconciliation)
The most robust approach combines real-time incremental updates with periodic full reconciliation to detect and correct drift.
```sql
-- Hybrid approach: Incremental updates + periodic reconciliation

-- 1. Use triggers for real-time incremental updates (as shown above)
-- 2. Run periodic reconciliation job to detect and fix drift

-- Reconciliation job: Compare computed vs. stored aggregates
CREATE OR REPLACE FUNCTION reconcile_daily_sales_aggregates(
    p_start_date DATE DEFAULT CURRENT_DATE - INTERVAL '7 days',
    p_end_date   DATE DEFAULT CURRENT_DATE
)
RETURNS TABLE (
    summary_date     DATE,
    category_id      INT,
    region_id        INT,
    stored_revenue   DECIMAL(14,2),
    computed_revenue DECIMAL(14,2),
    discrepancy      DECIMAL(14,2),
    corrected        BOOLEAN
) AS $$
BEGIN
    RETURN QUERY
    WITH computed AS (
        SELECT
            oi.order_date AS summary_date,
            oi.category_id,
            oi.region_id,
            COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS computed_revenue,
            COALESCE(SUM(oi.quantity), 0) AS computed_quantity,
            COUNT(DISTINCT oi.order_id) AS computed_orders
        FROM order_items oi
        WHERE oi.order_date BETWEEN p_start_date AND p_end_date
        GROUP BY oi.order_date, oi.category_id, oi.region_id
    ),
    comparison AS (
        SELECT
            COALESCE(c.summary_date, s.summary_date) AS summary_date,
            COALESCE(c.category_id, s.category_id) AS category_id,
            COALESCE(c.region_id, s.region_id) AS region_id,
            COALESCE(s.total_revenue, 0) AS stored_revenue,
            COALESCE(c.computed_revenue, 0) AS computed_revenue,
            COALESCE(c.computed_revenue, 0) - COALESCE(s.total_revenue, 0) AS discrepancy,
            COALESCE(c.computed_quantity, 0) AS computed_quantity,
            COALESCE(c.computed_orders, 0) AS computed_orders
        FROM computed c
        FULL OUTER JOIN daily_sales_summary s
            ON c.summary_date = s.summary_date
            AND c.category_id = s.category_id
            AND c.region_id = s.region_id
        WHERE s.summary_date BETWEEN p_start_date AND p_end_date
           OR c.summary_date BETWEEN p_start_date AND p_end_date
    )
    -- Auto-fix discrepancies
    UPDATE daily_sales_summary dst
    SET total_revenue = cmp.computed_revenue,
        total_quantity = cmp.computed_quantity,
        order_count = cmp.computed_orders,
        last_updated = CURRENT_TIMESTAMP
    FROM comparison cmp
    WHERE dst.summary_date = cmp.summary_date
      AND dst.category_id = cmp.category_id
      AND dst.region_id = cmp.region_id
      AND ABS(cmp.discrepancy) > 0.01  -- Tolerance for rounding differences
    RETURNING
        dst.summary_date, dst.category_id, dst.region_id,
        cmp.stored_revenue, cmp.computed_revenue, cmp.discrepancy, true;
END;
$$ LANGUAGE plpgsql;

-- Schedule reconciliation to run during off-peak hours
-- Example: Run nightly at 3 AM via pg_cron or external scheduler
-- SELECT * FROM reconcile_daily_sales_aggregates();
```

If your application requires real-time accurate aggregates (financial systems, inventory counts), use incremental updates with frequent reconciliation. If slight staleness is acceptable (analytics dashboards, reporting), periodic full refresh is simpler and less error-prone.
Not all aggregates are equally maintainable. While SUM and COUNT can be updated incrementally with simple delta arithmetic, other aggregates require special treatment.
COUNT DISTINCT (Unique Counts)
Counting unique values (unique customers, unique products, etc.) is challenging because the stored count alone does not carry enough information to update it incrementally: on insert you must know whether the value has already been counted for that period, and on delete you must know whether the value still appears in other source rows. You must either track the underlying set of values, accept approximation, or recalculate periodically.
Solutions for COUNT DISTINCT:
```sql
-- Strategy 1: Store component elements, not just the count
-- Maintains a set of unique values; count is derived

CREATE TABLE monthly_unique_customers (
    summary_month    DATE NOT NULL,
    category_id      INT NOT NULL,
    customer_id      INT NOT NULL,  -- Store each unique customer
    first_order_date DATE,
    last_order_date  DATE,
    order_count      INT DEFAULT 1,
    PRIMARY KEY (summary_month, category_id, customer_id)
);

-- Count is derived by counting rows
SELECT summary_month, category_id, COUNT(*) AS unique_customers
FROM monthly_unique_customers
GROUP BY summary_month, category_id;

-- Strategy 2: HyperLogLog for approximate counts (PostgreSQL extension)
-- Trades perfect accuracy for dramatically reduced storage and computation

CREATE EXTENSION IF NOT EXISTS hll;

CREATE TABLE monthly_unique_visitors_hll (
    summary_month   DATE PRIMARY KEY,
    unique_visitors hll  -- HyperLogLog sketch
);

-- Insert/update by adding to HLL sketch
INSERT INTO monthly_unique_visitors_hll (summary_month, unique_visitors)
VALUES (DATE_TRUNC('month', CURRENT_DATE), hll_empty())
ON CONFLICT (summary_month) DO NOTHING;

UPDATE monthly_unique_visitors_hll
SET unique_visitors = hll_add(unique_visitors, hll_hash_text('user_12345'))
WHERE summary_month = DATE_TRUNC('month', CURRENT_DATE);

-- Query approximate count (typically within 2% accuracy)
SELECT summary_month, hll_cardinality(unique_visitors) AS approx_unique_visitors
FROM monthly_unique_visitors_hll;

-- Strategy 3: Periodic recalculation (simplest, most accurate)
-- Accept staleness, recalculate nightly

UPDATE monthly_category_summary mcs
SET unique_customers = (
    SELECT COUNT(DISTINCT o.customer_id)
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.order_id
    WHERE DATE_TRUNC('month', o.order_date) = mcs.summary_month
      AND oi.category_id = mcs.category_id
);
```

MIN / MAX Aggregates
Minimum and maximum values are difficult to maintain incrementally because inserts and deletes are asymmetric: a new row can only improve the stored extreme (compare and replace), but deleting the row that holds the current minimum or maximum leaves no next-best candidate unless you rescan the source data or keep extra candidates on hand.
Solutions for MIN/MAX:
```sql
-- Strategy 1: Store top-N values, not just the single min/max
-- Deleting the current max still leaves other candidates

CREATE TABLE daily_top_orders (
    summary_date DATE NOT NULL,
    rank         INT NOT NULL CHECK (rank BETWEEN 1 AND 10),
    order_id     INT NOT NULL,
    order_total  DECIMAL(12, 2) NOT NULL,
    PRIMARY KEY (summary_date, rank)
);

-- When a new order arrives, check if it enters top-10
-- When an order is deleted, promote next candidate

-- Strategy 2: Accept eventual consistency for min/max
-- Store current min/max, mark as "potentially stale" on delete

CREATE TABLE daily_sales_extremes (
    summary_date    DATE PRIMARY KEY,
    max_order_id    INT,
    max_order_total DECIMAL(12, 2),
    min_order_id    INT,
    min_order_total DECIMAL(12, 2),
    needs_recalc    BOOLEAN DEFAULT FALSE  -- Flag when max/min deleted
);

-- Trigger on delete: flag for recalculation
CREATE OR REPLACE FUNCTION flag_extreme_recalc()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE daily_sales_extremes
    SET needs_recalc = TRUE
    WHERE summary_date = OLD.order_date
      AND (max_order_id = OLD.order_id OR min_order_id = OLD.order_id);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

-- Attach to the source table (assumes an orders table with order_date and order_total)
CREATE TRIGGER orders_flag_extreme_recalc
AFTER DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION flag_extreme_recalc();

-- Background job recalculates flagged rows
UPDATE daily_sales_extremes dse
SET max_order_id = sub.max_order_id,
    max_order_total = sub.max_total,
    min_order_id = sub.min_order_id,
    min_order_total = sub.min_total,
    needs_recalc = FALSE
FROM (
    SELECT
        order_date,
        (ARRAY_AGG(order_id ORDER BY order_total DESC))[1] AS max_order_id,
        MAX(order_total) AS max_total,
        (ARRAY_AGG(order_id ORDER BY order_total ASC))[1] AS min_order_id,
        MIN(order_total) AS min_total
    FROM orders
    WHERE order_date IN (SELECT summary_date FROM daily_sales_extremes WHERE needs_recalc)
    GROUP BY order_date
) sub
WHERE dse.summary_date = sub.order_date;
```

Different aggregate types may require different maintenance strategies in the same system. A dashboard might need real-time SUM/COUNT but accept nightly-recalculated COUNT DISTINCT and MIN/MAX. Design your aggregate tables to accommodate mixed refresh cadences.
A well-designed aggregate hierarchy enables efficient querying at any granularity level. Roll-up combines finer-grained aggregates into coarser ones. Drill-down queries more granular aggregates (or source data) for details.
Hierarchical aggregate example:
```sql
-- Level 0: Source data (most granular)
-- order_items table (millions of rows)

-- Level 1: Hourly aggregates
CREATE TABLE hourly_sales (
    summary_hour TIMESTAMP NOT NULL,  -- truncated to hour
    region_id    INT NOT NULL,
    category_id  INT NOT NULL,
    revenue      DECIMAL(14, 2) NOT NULL DEFAULT 0,
    units_sold   INT NOT NULL DEFAULT 0,
    order_count  INT NOT NULL DEFAULT 0,
    PRIMARY KEY (summary_hour, region_id, category_id)
);

-- Level 2: Daily aggregates (built from hourly)
CREATE TABLE daily_sales (
    summary_date DATE NOT NULL,
    region_id    INT NOT NULL,
    category_id  INT NOT NULL,
    revenue      DECIMAL(14, 2) NOT NULL DEFAULT 0,
    units_sold   INT NOT NULL DEFAULT 0,
    order_count  INT NOT NULL DEFAULT 0,
    PRIMARY KEY (summary_date, region_id, category_id)
);

-- Level 3: Monthly aggregates (built from daily)
CREATE TABLE monthly_sales (
    summary_month DATE NOT NULL,  -- first of month
    region_id     INT NOT NULL,
    category_id   INT NOT NULL,
    revenue       DECIMAL(16, 2) NOT NULL DEFAULT 0,
    units_sold    INT NOT NULL DEFAULT 0,
    order_count   INT NOT NULL DEFAULT 0,
    PRIMARY KEY (summary_month, region_id, category_id)
);

-- Roll-up function: Aggregate hourly → daily
CREATE OR REPLACE FUNCTION rollup_hourly_to_daily(p_date DATE)
RETURNS void AS $$
BEGIN
    INSERT INTO daily_sales (summary_date, region_id, category_id, revenue, units_sold, order_count)
    SELECT p_date, region_id, category_id,
           SUM(revenue), SUM(units_sold), SUM(order_count)
    FROM hourly_sales
    WHERE summary_hour >= p_date
      AND summary_hour < p_date + INTERVAL '1 day'
    GROUP BY region_id, category_id
    ON CONFLICT (summary_date, region_id, category_id)
    DO UPDATE SET
        revenue = EXCLUDED.revenue,
        units_sold = EXCLUDED.units_sold,
        order_count = EXCLUDED.order_count;
END;
$$ LANGUAGE plpgsql;

-- Roll-up function: Aggregate daily → monthly
CREATE OR REPLACE FUNCTION rollup_daily_to_monthly(p_month DATE)
RETURNS void AS $$
BEGIN
    INSERT INTO monthly_sales (summary_month, region_id, category_id, revenue, units_sold, order_count)
    SELECT DATE_TRUNC('month', p_month), region_id, category_id,
           SUM(revenue), SUM(units_sold), SUM(order_count)
    FROM daily_sales
    WHERE summary_date >= DATE_TRUNC('month', p_month)
      AND summary_date < DATE_TRUNC('month', p_month) + INTERVAL '1 month'
    GROUP BY region_id, category_id
    ON CONFLICT (summary_month, region_id, category_id)
    DO UPDATE SET
        revenue = EXCLUDED.revenue,
        units_sold = EXCLUDED.units_sold,
        order_count = EXCLUDED.order_count;
END;
$$ LANGUAGE plpgsql;

-- Query patterns at different granularities:

-- Dashboard: Show today's hourly trend (queries hourly aggregates)
SELECT summary_hour, SUM(revenue) AS total_revenue
FROM hourly_sales
WHERE summary_hour >= CURRENT_DATE
  AND region_id = 1
GROUP BY summary_hour
ORDER BY summary_hour;

-- Report: Show this month's daily trend (queries daily aggregates)
SELECT summary_date, SUM(revenue) AS daily_revenue
FROM daily_sales
WHERE summary_date >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY summary_date
ORDER BY summary_date;

-- Exec summary: Year-over-year monthly comparison (queries monthly aggregates)
SELECT summary_month, SUM(revenue) AS monthly_revenue
FROM monthly_sales
WHERE summary_month >= CURRENT_DATE - INTERVAL '2 years'
GROUP BY summary_month
ORDER BY summary_month;
```

Benefits of hierarchical aggregates:
Query efficiency at all scales — Yearly trend queries don't scan billions of source rows
Incremental maintenance — Each level only needs to process from the level below it
Selective drill-down — Start at coarse level, drill into finer granularity only when needed
Data retention tiers — Keep hourly data for 30 days, daily for 2 years, monthly forever (see the pruning sketch after this list)
Parallel processing — Different aggregate levels can be maintained by different jobs/servers
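A minimal sketch of the retention-tier idea, assuming the hourly_sales and daily_sales tables defined above and that the corresponding roll-ups have already run for the affected periods:

```sql
-- Prune fine-grained aggregates once coarser levels have been rolled up
DELETE FROM hourly_sales
WHERE summary_hour < CURRENT_DATE - INTERVAL '30 days';  -- keep hourly data for 30 days

DELETE FROM daily_sales
WHERE summary_date < CURRENT_DATE - INTERVAL '2 years';  -- keep daily data for 2 years

-- monthly_sales is retained indefinitely
```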
Temporal: Second → Minute → Hour → Day → Week → Month → Quarter → Year. Organizational: Store → District → Region → Country → Global. Product: SKU → Product → Subcategory → Category → Department. Design your hierarchy to match how your business naturally analyzes data.
Most modern database systems support materialized views—stored query results that can be periodically refreshed. Materialized views provide a declarative way to create pre-computed aggregates without manual maintenance code.
Advantages of materialized views: the aggregate is defined declaratively as a single SQL query, refreshed with one command, and can be indexed and queried exactly like a table, with no trigger or maintenance code to write.
Disadvantages: in PostgreSQL, a refresh recomputes the entire view from source data (incremental maintenance is limited and database-dependent), results are stale between refreshes, and a non-concurrent refresh blocks readers while a concurrent refresh requires a unique index.
```sql
-- PostgreSQL: Materialized Views for Pre-computed Aggregates

-- Create materialized view for daily sales summary
CREATE MATERIALIZED VIEW mv_daily_sales_summary AS
SELECT
    DATE(o.order_date) AS summary_date,
    oi.category_id,
    oi.region_id,
    SUM(oi.quantity * oi.unit_price) AS total_revenue,
    SUM(oi.quantity) AS total_quantity,
    COUNT(DISTINCT o.order_id) AS order_count,
    COUNT(DISTINCT o.customer_id) AS unique_customers,
    AVG(oi.quantity * oi.unit_price) AS avg_line_item_value
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY DATE(o.order_date), oi.category_id, oi.region_id
WITH NO DATA;  -- Create structure only, don't populate yet

-- Create indexes on the materialized view (can be done before the initial refresh)
CREATE UNIQUE INDEX idx_mv_daily_sales_pk
    ON mv_daily_sales_summary (summary_date, category_id, region_id);
CREATE INDEX idx_mv_daily_sales_date
    ON mv_daily_sales_summary (summary_date DESC);
CREATE INDEX idx_mv_daily_sales_category
    ON mv_daily_sales_summary (category_id, summary_date DESC);

-- Initial population
REFRESH MATERIALIZED VIEW mv_daily_sales_summary;

-- Subsequent refreshes (exclusive lock - blocks queries during refresh)
REFRESH MATERIALIZED VIEW mv_daily_sales_summary;

-- Concurrent refresh (no lock - queries can continue, requires unique index)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_sales_summary;

-- Query the materialized view exactly like a regular table
SELECT summary_date, SUM(total_revenue) AS daily_total
FROM mv_daily_sales_summary
WHERE category_id = 5
  AND summary_date >= CURRENT_DATE - 30
GROUP BY summary_date
ORDER BY summary_date;

-- Monthly rollup materialized view (aggregates the daily view)
CREATE MATERIALIZED VIEW mv_monthly_sales_summary AS
SELECT
    DATE_TRUNC('month', summary_date) AS summary_month,
    category_id,
    region_id,
    SUM(total_revenue) AS total_revenue,
    SUM(total_quantity) AS total_quantity,
    SUM(order_count) AS order_count
    -- Note: Cannot directly sum unique_customers (would overcount)
FROM mv_daily_sales_summary
GROUP BY DATE_TRUNC('month', summary_date), category_id, region_id;

-- Unique index so the monthly view can also be refreshed CONCURRENTLY
CREATE UNIQUE INDEX idx_mv_monthly_sales_pk
    ON mv_monthly_sales_summary (summary_month, category_id, region_id);

-- Scheduled refresh (using pg_cron extension)
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Refresh daily MV every hour
SELECT cron.schedule(
    'refresh-daily-sales',
    '0 * * * *',  -- Every hour on the hour
    'REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_sales_summary'
);

-- Refresh monthly MV once per day
SELECT cron.schedule(
    'refresh-monthly-sales',
    '0 2 * * *',  -- 2 AM daily
    'REFRESH MATERIALIZED VIEW CONCURRENTLY mv_monthly_sales_summary'
);
```

REFRESH MATERIALIZED VIEW CONCURRENTLY requires a unique index on the materialized view. Without a unique index, you must use blocking refresh, which prevents queries during the refresh operation. Plan your materialized view indexes carefully.
Comparing Materialized Views vs. Manual Aggregate Tables:
| Aspect | Materialized Views | Manual Aggregate Tables |
|---|---|---|
| Definition | Declarative SQL query | Explicit schema + maintenance code |
| Refresh | Single REFRESH command | Custom update logic (triggers/jobs) |
| Incremental update | Limited (database-dependent) | Full control over delta logic |
| Query complexity | Direct SELECT from source | Must match aggregate schema |
| Index control | Full index control | Full index control |
| Cross-table refresh | Automatic via view query | Must coordinate manually |
| Production maturity | Varies by database | Battle-tested patterns available |
Pre-computed aggregates are among the most impactful denormalization techniques for analytical workloads. When designed and maintained correctly, they transform query performance from seconds to milliseconds.
What's Next:
Pre-computed aggregates address analytical query performance. But many transactional queries are slowed by joins—retrieving related data from normalized tables. The next page explores duplicating foreign key data: storing copies of related data in the same table to eliminate join operations.
You now understand pre-computed aggregates as a denormalization technique. You can design aggregate schemas, implement maintenance strategies, handle complex aggregate types, and build hierarchical aggregate structures. Next, we explore duplicating foreign key data for join elimination.