Consider a retail analytics platform whose executive dashboards must answer questions spanning stores, regions, product categories, and customer segments, complete with period-over-period comparisons, in well under a second.

With a normalized transactional database containing billions of records, each dashboard refresh could require joining and aggregating millions of rows across half a dozen tables.

No amount of indexing makes this performant. The solution is summary tables: purpose-built denormalized structures designed to answer analytical questions efficiently.
Summary tables represent the synthesis of all denormalization techniques: they contain derived columns, pre-computed aggregates, duplicated dimension data, and merged attributes, all organized for optimum query performance.
By the end of this page, you will master the design and implementation of summary tables. You'll understand their role in analytics architecture, design principles for different use cases, refresh strategies, and best practices from data warehousing that apply to all database systems.
A summary table (also called an aggregate table or reporting table) is a denormalized structure designed to serve a specific set of analytical queries. Unlike operational tables optimized for transactions, summary tables are optimized for read-heavy analytical workloads.
Key characteristics:
| Aspect | Operational Tables | Summary Tables |
|---|---|---|
| Purpose | Record transactions, maintain state | Serve analytics, reporting |
| Optimization | Insert/update performance | Query performance |
| Normalization | Typically normalized (3NF/BCNF) | Heavily denormalized |
| Data freshness | Real-time | May be delayed (minutes to days) |
| Row size | Narrow, compact | Wide, comprehensive |
| Query patterns | Transaction lookup, single-entity updates | Aggregations, trends, comparisons |
Anatomy of a summary table:
```sql
-- Anatomy of a well-designed summary table
-- Example: Daily Sales Summary for retail analytics

CREATE TABLE daily_sales_summary (
    -- =========================================
    -- DIMENSION KEYS (What we're measuring by)
    -- =========================================
    summary_date DATE NOT NULL,
    store_id INT NOT NULL,
    product_category_id INT NOT NULL,
    customer_segment VARCHAR(20) NOT NULL,

    -- =========================================
    -- DENORMALIZED DIMENSION ATTRIBUTES
    -- (Copied from dimension tables to avoid joins)
    -- =========================================
    store_name VARCHAR(100) NOT NULL,
    store_region VARCHAR(50) NOT NULL,
    store_city VARCHAR(50) NOT NULL,
    category_name VARCHAR(100) NOT NULL,
    category_group VARCHAR(50) NOT NULL,

    -- =========================================
    -- AGGREGATE MEASURES (Pre-computed metrics)
    -- =========================================
    total_revenue DECIMAL(14, 2) NOT NULL DEFAULT 0,
    total_quantity INT NOT NULL DEFAULT 0,
    order_count INT NOT NULL DEFAULT 0,
    unique_customer_count INT NOT NULL DEFAULT 0,

    -- For average calculation (store components, not just result)
    avg_order_value DECIMAL(10, 2) GENERATED ALWAYS AS
        (CASE WHEN order_count > 0
              THEN total_revenue / order_count ELSE 0 END) STORED,

    -- =========================================
    -- DERIVED CALCULATIONS
    -- =========================================
    revenue_per_customer DECIMAL(10, 2) GENERATED ALWAYS AS
        (CASE WHEN unique_customer_count > 0
              THEN total_revenue / unique_customer_count ELSE 0 END) STORED,
    units_per_order DECIMAL(8, 2) GENERATED ALWAYS AS
        (CASE WHEN order_count > 0
              THEN total_quantity::DECIMAL / order_count ELSE 0 END) STORED,

    -- =========================================
    -- COMPARISON METRICS (Pre-computed for trends)
    -- =========================================
    prev_day_revenue DECIMAL(14, 2),   -- Yesterday's revenue (same dimension combo)
    prev_week_revenue DECIMAL(14, 2),  -- Same day last week
    prev_year_revenue DECIMAL(14, 2),  -- Same day last year
    day_over_day_change DECIMAL(8, 4) GENERATED ALWAYS AS
        (CASE WHEN prev_day_revenue > 0
              THEN (total_revenue - prev_day_revenue) / prev_day_revenue
              ELSE NULL END) STORED,
    year_over_year_change DECIMAL(8, 4) GENERATED ALWAYS AS
        (CASE WHEN prev_year_revenue > 0
              THEN (total_revenue - prev_year_revenue) / prev_year_revenue
              ELSE NULL END) STORED,

    -- =========================================
    -- METADATA
    -- =========================================
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    source_record_count INT,  -- How many source rows contributed

    -- Composite primary key on all dimension keys
    PRIMARY KEY (summary_date, store_id, product_category_id, customer_segment)
);

-- Indexes for common query patterns
CREATE INDEX idx_summary_date_desc ON daily_sales_summary(summary_date DESC);
CREATE INDEX idx_summary_store_date ON daily_sales_summary(store_id, summary_date DESC);
CREATE INDEX idx_summary_category_date ON daily_sales_summary(product_category_id, summary_date DESC);
CREATE INDEX idx_summary_region_date ON daily_sales_summary(store_region, summary_date DESC);
```

Summary tables should be designed starting from the queries they need to answer. List every chart, dashboard widget, and report. Identify dimensions, measures, and comparisons. Then design the table to answer those specific questions with simple SELECTs—no joins, minimal aggregation.
Summary tables draw heavily from dimensional modeling, a design technique developed for data warehousing. Understanding its core concepts helps design effective summary tables.
Key concepts:
Facts are the quantitative measurements you want to analyze: revenue, quantity sold, cost, order counts.

Dimensions are the perspectives by which you analyze facts: time, store, product category, customer segment.

Grain is the level of detail each row represents: one row per hour per product per store is a fine grain; one row per month per region is a coarse grain.
```sql
-- Example: Different grain levels for the same subject area

-- FINE GRAIN: Hourly sales by product
-- High detail, many rows, supports any drill-down
CREATE TABLE hourly_product_sales (
    summary_hour TIMESTAMP NOT NULL,
    product_id INT NOT NULL,
    store_id INT NOT NULL,
    quantity_sold INT NOT NULL,
    revenue DECIMAL(10, 2) NOT NULL,
    cost DECIMAL(10, 2) NOT NULL,
    transaction_count INT NOT NULL,
    -- Denormalized attributes
    product_name VARCHAR(200) NOT NULL,
    product_category VARCHAR(100) NOT NULL,
    store_name VARCHAR(100) NOT NULL,
    store_region VARCHAR(50) NOT NULL,
    PRIMARY KEY (summary_hour, product_id, store_id)
);
-- Estimated rows: 24 hours × 365 days × 10,000 products × 500 stores = ~43B rows/year
-- Use case: Detailed drill-down, anomaly detection

-- MEDIUM GRAIN: Daily sales by category by store
-- Balanced detail and performance
CREATE TABLE daily_category_sales (
    summary_date DATE NOT NULL,
    category_id INT NOT NULL,
    store_id INT NOT NULL,
    quantity_sold INT NOT NULL,
    revenue DECIMAL(12, 2) NOT NULL,
    order_count INT NOT NULL,
    unique_products_sold INT NOT NULL,
    -- Denormalized
    category_name VARCHAR(100) NOT NULL,
    store_name VARCHAR(100) NOT NULL,
    store_region VARCHAR(50) NOT NULL,
    PRIMARY KEY (summary_date, category_id, store_id)
);
-- Estimated rows: 365 days × 100 categories × 500 stores = ~18M rows/year
-- Use case: Daily dashboards, store performance

-- COARSE GRAIN: Monthly sales by region
-- Executive-level summaries
CREATE TABLE monthly_regional_sales (
    summary_month DATE NOT NULL,  -- First of month
    region VARCHAR(50) NOT NULL,
    total_revenue DECIMAL(16, 2) NOT NULL,
    total_orders INT NOT NULL,
    total_customers INT NOT NULL,
    store_count INT NOT NULL,
    avg_order_value DECIMAL(10, 2),
    revenue_per_store DECIMAL(14, 2),
    -- Trend comparisons
    prev_month_revenue DECIMAL(16, 2),
    prev_year_month_revenue DECIMAL(16, 2),
    PRIMARY KEY (summary_month, region)
);
-- Estimated rows: 12 months × 10 regions × years = ~120 rows/year
-- Use case: Executive dashboards, investor reports
```

Star Schema Pattern:
The classic dimensional modeling pattern places fact tables at the center with dimension tables radiating outward like a star. In summary tables, we often embed dimension attributes directly rather than joining, but understanding the star schema helps identify what to embed:
┌─────────────┐
│ dim_product │
└──────┬──────┘
│
┌─────────────┐ ┌──────┴──────┐ ┌─────────────┐
│ dim_store │────│ fact_sales │────│ dim_time │
└─────────────┘ └──────┬──────┘ └─────────────┘
│
┌──────┴──────┐
│dim_customer │
└─────────────┘
Summary tables flatten this structure by absorbing dimension attributes into the fact table, trading normalization for query speed.
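To make this trade concrete, here is a minimal sketch using SQLite via Python's `sqlite3` (the chapter's DDL targets PostgreSQL, and all table and column names here are invented for the demo). It shows that a flattened summary answers the same regional question as the star join, without touching the dimension table at query time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star layout: the fact row carries only a key into dim_store...
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, revenue REAL);
    -- ...the flattened summary absorbs the region attribute directly.
    CREATE TABLE sales_summary (store_id INTEGER, region TEXT, revenue REAL);

    INSERT INTO dim_store VALUES (1, 'West'), (2, 'East');
    INSERT INTO fact_sales VALUES (1, 100.0), (2, 250.0), (1, 50.0);
    INSERT INTO sales_summary
        SELECT f.store_id, d.region, SUM(f.revenue)
        FROM fact_sales f JOIN dim_store d USING (store_id)
        GROUP BY f.store_id, d.region;
""")

# Star-schema query: join + aggregate at query time.
star = conn.execute("""
    SELECT d.region, SUM(f.revenue)
    FROM fact_sales f JOIN dim_store d USING (store_id)
    GROUP BY d.region ORDER BY d.region""").fetchall()

# Flattened query: region already lives on the summary row, no join needed.
flat = conn.execute("""
    SELECT region, SUM(revenue)
    FROM sales_summary GROUP BY region ORDER BY region""").fetchall()

print(star == flat, flat)  # True [('East', 250.0), ('West', 150.0)]
```

Both queries return identical results; the flattened one simply pays the join cost once, at population time, instead of on every dashboard refresh.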
Finer grain = more flexibility for drill-down but more rows and slower queries. Coarser grain = faster queries but less detail. Many systems maintain multiple grain levels: fine-grained for recent data, coarse-grained for historical. Choose based on actual query requirements.
Different analytical needs require different summary table designs. Here are common patterns:
Pattern 1: Time-Series Summary
Optimized for trend analysis, period comparisons, and time-based filtering:
```sql
-- Pattern 1: Time-Series Summary Table
-- Optimized for: trends, comparisons, time-range queries

CREATE TABLE time_series_revenue (
    -- Time dimension at day grain
    summary_date DATE PRIMARY KEY,

    -- Core metrics
    total_revenue DECIMAL(16, 2) NOT NULL,
    total_orders INT NOT NULL,
    total_customers INT NOT NULL,

    -- Rolling aggregates (pre-computed for performance)
    revenue_7d_avg DECIMAL(14, 2),   -- 7-day rolling average
    revenue_30d_avg DECIMAL(14, 2),  -- 30-day rolling average
    revenue_90d_sum DECIMAL(16, 2),  -- 90-day rolling sum

    -- Period comparisons
    revenue_prev_day DECIMAL(16, 2),
    revenue_prev_week DECIMAL(16, 2),   -- Same weekday last week
    revenue_prev_month DECIMAL(16, 2),  -- Same day last month
    revenue_prev_year DECIMAL(16, 2),   -- Same day last year

    -- Pre-computed growth rates
    dod_growth_rate DECIMAL(8, 4),  -- Day-over-day
    wow_growth_rate DECIMAL(8, 4),  -- Week-over-week (same weekday)
    mom_growth_rate DECIMAL(8, 4),  -- Month-over-month (same day)
    yoy_growth_rate DECIMAL(8, 4),  -- Year-over-year

    -- Day-of-week patterns
    day_of_week INT,  -- 0=Sunday, 6=Saturday
    is_weekend BOOLEAN,
    is_holiday BOOLEAN,

    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Query example: Get trend with growth rates for last 30 days
SELECT summary_date, total_revenue, revenue_7d_avg, revenue_30d_avg,
       yoy_growth_rate, wow_growth_rate
FROM time_series_revenue
WHERE summary_date > CURRENT_DATE - INTERVAL '30 days'
ORDER BY summary_date;  -- Uses PK index directly
```

Pattern 2: Ranking/Leaderboard Summary
Optimized for top-N queries, rankings, and competitive analysis:
```sql
-- Pattern 2: Ranking Summary Table
-- Optimized for: top products, best performers, leaderboards

CREATE TABLE daily_product_rankings (
    summary_date DATE NOT NULL,
    product_id INT NOT NULL,

    -- Denormalized product info
    product_name VARCHAR(200) NOT NULL,
    product_category VARCHAR(100) NOT NULL,
    product_brand VARCHAR(100) NOT NULL,

    -- Performance metrics
    units_sold INT NOT NULL,
    revenue DECIMAL(12, 2) NOT NULL,
    order_count INT NOT NULL,

    -- Pre-computed rankings (saves expensive window functions)
    units_rank_overall INT,        -- Rank by units across all products
    revenue_rank_overall INT,      -- Rank by revenue across all products
    units_rank_in_category INT,    -- Rank within category
    revenue_rank_in_category INT,  -- Rank within category

    -- Percentile positions
    revenue_percentile DECIMAL(5, 2),  -- What percentile this product is in

    -- Change metrics
    prev_day_revenue DECIMAL(12, 2),
    prev_week_revenue DECIMAL(12, 2),
    rank_change_from_prev_day INT,  -- Positive = moved up

    PRIMARY KEY (summary_date, product_id)
);

-- Supporting indexes for common queries
CREATE INDEX idx_rankings_date_revenue
    ON daily_product_rankings(summary_date, revenue DESC);
CREATE INDEX idx_rankings_date_category
    ON daily_product_rankings(summary_date, product_category, revenue_rank_in_category);

-- Query example: Top 10 products by revenue today
SELECT product_name, product_category, revenue,
       revenue_rank_overall, rank_change_from_prev_day
FROM daily_product_rankings
WHERE summary_date = CURRENT_DATE
  AND revenue_rank_overall <= 10
ORDER BY revenue_rank_overall;  -- No window functions needed!
```

Pattern 3: Cohort Analysis Summary
Optimized for analyzing groups of entities over time:
```sql
-- Pattern 3: Cohort Analysis Summary Table
-- Optimized for: retention analysis, customer lifetime value, behavior patterns

CREATE TABLE customer_cohort_summary (
    -- Cohort definition
    cohort_month DATE NOT NULL,    -- Month of first purchase
    activity_month DATE NOT NULL,  -- Month being measured

    -- Metrics for this cohort in this activity month
    active_customers INT NOT NULL,   -- Customers from cohort who were active
    total_cohort_size INT NOT NULL,  -- Original cohort size (for retention calc)
    total_revenue DECIMAL(14, 2) NOT NULL,
    total_orders INT NOT NULL,

    -- Pre-computed retention and LTV metrics
    retention_rate DECIMAL(6, 4) GENERATED ALWAYS AS
        (active_customers::DECIMAL / NULLIF(total_cohort_size, 0)) STORED,
    revenue_per_customer DECIMAL(10, 2) GENERATED ALWAYS AS
        (total_revenue / NULLIF(active_customers, 0)) STORED,
    orders_per_customer DECIMAL(8, 2) GENERATED ALWAYS AS
        (total_orders::DECIMAL / NULLIF(active_customers, 0)) STORED,

    -- Cumulative metrics (for LTV calculation)
    cumulative_revenue DECIMAL(16, 2),  -- Total revenue from cohort since start
    cumulative_orders INT,              -- Total orders from cohort since start

    months_since_cohort INT GENERATED ALWAYS AS
        (EXTRACT(YEAR FROM age(activity_month, cohort_month)) * 12 +
         EXTRACT(MONTH FROM age(activity_month, cohort_month))) STORED,

    PRIMARY KEY (cohort_month, activity_month)
);

-- Query example: Retention curve for 2024 cohorts
SELECT cohort_month, months_since_cohort, retention_rate, cumulative_revenue
FROM customer_cohort_summary
WHERE cohort_month >= '2024-01-01'
  AND months_since_cohort <= 12
ORDER BY cohort_month, months_since_cohort;

-- Query example: Compare cohort LTV at 6 months
SELECT cohort_month,
       retention_rate AS m6_retention,
       cumulative_revenue / total_cohort_size AS ltv_per_customer
FROM customer_cohort_summary
WHERE months_since_cohort = 6
ORDER BY cohort_month;
```

Summary tables should pre-compute operations that are expensive at query time: rankings (window functions), rolling averages, cumulative sums, and complex derived metrics. If you find yourself writing complex SQL in every dashboard query, that complexity should move to summary table population.
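As a runnable illustration of that principle, here is a small sketch using SQLite via Python's `sqlite3` (the ranking pattern above is PostgreSQL; table and column names here are invented, and the window functions require SQLite 3.25+, bundled with modern Python). The expensive `RANK()` runs once at population time, so the dashboard read degenerates to a plain filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_sales (      -- per-product daily totals (source)
        summary_date TEXT, product_id INTEGER, category TEXT, revenue REAL);
    CREATE TABLE product_rankings (   -- summary with pre-computed ranks
        summary_date TEXT, product_id INTEGER, category TEXT, revenue REAL,
        revenue_rank_overall INTEGER, revenue_rank_in_category INTEGER,
        PRIMARY KEY (summary_date, product_id));
""")
conn.executemany("INSERT INTO product_sales VALUES ('2024-06-01', ?, ?, ?)",
                 [(1, "toys", 500.0), (2, "toys", 900.0), (3, "food", 700.0)])

# Window functions run ONCE, at population time...
conn.execute("""
    INSERT INTO product_rankings
    SELECT summary_date, product_id, category, revenue,
           RANK() OVER (PARTITION BY summary_date ORDER BY revenue DESC),
           RANK() OVER (PARTITION BY summary_date, category ORDER BY revenue DESC)
    FROM product_sales""")

# ...so every dashboard read is a simple filter on the primary-key prefix.
top = conn.execute("""
    SELECT product_id, revenue_rank_overall FROM product_rankings
    WHERE summary_date = '2024-06-01' AND revenue_rank_overall <= 2
    ORDER BY revenue_rank_overall""").fetchall()
print(top)  # [(2, 1), (3, 2)]
```

The same division of labor applies to rolling averages and percentiles: compute them in the population job, store the result as an ordinary column.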
Summary tables must be populated from source transactional data. The population strategy depends on data volume, freshness requirements, and computational complexity.
Strategy 1: Scheduled Full Rebuild
Simplest approach—completely recalculate the summary from source data:
```sql
-- Strategy 1: Full Rebuild
-- Best for: Small to medium data volumes, simple queries, daily refresh OK

CREATE OR REPLACE PROCEDURE rebuild_daily_sales_summary(p_date DATE)
LANGUAGE plpgsql
AS $$
BEGIN
    -- Delete existing data for this date
    DELETE FROM daily_sales_summary WHERE summary_date = p_date;

    -- Rebuild from source
    INSERT INTO daily_sales_summary (
        summary_date, store_id, product_category_id, customer_segment,
        store_name, store_region, store_city, category_name, category_group,
        total_revenue, total_quantity, order_count, unique_customer_count,
        source_record_count
    )
    SELECT
        DATE(o.order_date) AS summary_date,
        s.store_id,
        p.category_id,
        c.segment AS customer_segment,
        -- Denormalized dimension attributes
        s.name AS store_name,
        s.region AS store_region,
        s.city AS store_city,
        cat.name AS category_name,
        cat.group_name AS category_group,
        -- Aggregate measures
        SUM(oi.quantity * oi.unit_price) AS total_revenue,
        SUM(oi.quantity) AS total_quantity,
        COUNT(DISTINCT o.order_id) AS order_count,
        COUNT(DISTINCT o.customer_id) AS unique_customer_count,
        COUNT(*) AS source_record_count
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    JOIN stores s ON o.store_id = s.store_id
    JOIN products p ON oi.product_id = p.product_id
    JOIN categories cat ON p.category_id = cat.category_id
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE DATE(o.order_date) = p_date
    GROUP BY DATE(o.order_date), s.store_id, p.category_id, c.segment,
             s.name, s.region, s.city, cat.name, cat.group_name;

    -- Update comparison columns (requires previous data to exist)
    UPDATE daily_sales_summary curr
    SET prev_day_revenue = prev.total_revenue,
        prev_week_revenue = week_ago.total_revenue,
        prev_year_revenue = year_ago.total_revenue
    FROM daily_sales_summary prev,
         daily_sales_summary week_ago,
         daily_sales_summary year_ago
    WHERE curr.summary_date = p_date
      AND prev.summary_date = p_date - INTERVAL '1 day'
      AND prev.store_id = curr.store_id
      AND prev.product_category_id = curr.product_category_id
      AND prev.customer_segment = curr.customer_segment
      -- Similar joins for week_ago and year_ago...
    ;

    COMMIT;
END;
$$;

-- Schedule nightly at 2 AM (CURRENT_DATE - 1 keeps the argument a DATE)
SELECT cron.schedule('rebuild-daily-summary', '0 2 * * *',
    $$CALL rebuild_daily_sales_summary(CURRENT_DATE - 1)$$);
```

Strategy 2: Incremental Update
More efficient for large volumes—only process changed or new data:
```sql
-- Strategy 2: Incremental Update
-- Best for: Large data volumes, near-real-time requirements

-- Track what's been processed
CREATE TABLE summary_watermark (
    table_name VARCHAR(100) PRIMARY KEY,
    last_processed_id BIGINT,
    last_processed_time TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE OR REPLACE PROCEDURE incremental_update_hourly_sales()
LANGUAGE plpgsql
AS $$
DECLARE
    v_last_id BIGINT;
    v_new_max_id BIGINT;
BEGIN
    -- Get watermark
    SELECT last_processed_id INTO v_last_id
    FROM summary_watermark
    WHERE table_name = 'hourly_sales_summary';

    IF v_last_id IS NULL THEN
        v_last_id := 0;
    END IF;

    -- Find new maximum
    SELECT MAX(order_item_id) INTO v_new_max_id FROM order_items;

    -- Skip if the source is empty or there is no new data
    IF v_new_max_id IS NULL OR v_new_max_id <= v_last_id THEN
        RETURN;
    END IF;

    -- Process new records (upsert pattern)
    INSERT INTO hourly_sales_summary (
        summary_hour, store_id, category_id,
        store_name, category_name,
        total_revenue, total_quantity, order_count
    )
    SELECT
        DATE_TRUNC('hour', o.order_date) AS summary_hour,
        o.store_id,
        p.category_id,
        s.name AS store_name,
        cat.name AS category_name,
        SUM(oi.quantity * oi.unit_price) AS total_revenue,
        SUM(oi.quantity) AS total_quantity,
        COUNT(DISTINCT o.order_id) AS order_count
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.order_id
    JOIN stores s ON o.store_id = s.store_id
    JOIN products p ON oi.product_id = p.product_id
    JOIN categories cat ON p.category_id = cat.category_id
    WHERE oi.order_item_id > v_last_id
      AND oi.order_item_id <= v_new_max_id
    GROUP BY DATE_TRUNC('hour', o.order_date), o.store_id, p.category_id,
             s.name, cat.name
    ON CONFLICT (summary_hour, store_id, category_id) DO UPDATE SET
        total_revenue = hourly_sales_summary.total_revenue + EXCLUDED.total_revenue,
        total_quantity = hourly_sales_summary.total_quantity + EXCLUDED.total_quantity,
        order_count = hourly_sales_summary.order_count + EXCLUDED.order_count,
        updated_at = CURRENT_TIMESTAMP;

    -- Update watermark
    UPDATE summary_watermark
    SET last_processed_id = v_new_max_id,
        last_processed_time = CURRENT_TIMESTAMP,
        updated_at = CURRENT_TIMESTAMP
    WHERE table_name = 'hourly_sales_summary';

    IF NOT FOUND THEN
        INSERT INTO summary_watermark (table_name, last_processed_id, last_processed_time)
        VALUES ('hourly_sales_summary', v_new_max_id, CURRENT_TIMESTAMP);
    END IF;

    COMMIT;
END;
$$;

-- Run every 5 minutes
SELECT cron.schedule('incremental-hourly-sales', '*/5 * * * *',
    $$CALL incremental_update_hourly_sales()$$);
```

| Aspect | Full Rebuild | Incremental Update |
|---|---|---|
| Complexity | Simple, straightforward SQL | Complex, requires watermarking and upsert logic |
| Performance | May be slow for large datasets | Fast, processes only new data |
| Correctness | Always accurate (recalculates everything) | May drift if logic has bugs; requires periodic full rebuild |
| Resource usage | Heavy during rebuild window | Spread over time, lighter per execution |
| Freshness | Stale until next rebuild | Near-real-time (depends on frequency) |
| Handling deletes/updates | Automatically included | Requires additional logic for corrections |
Most production systems use both: incremental updates for recent data (last few days/weeks) for near-real-time freshness, plus periodic full rebuilds for older data to correct any drift and ensure accuracy. This combines the best of both approaches.
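A minimal sketch of that watermark-plus-rebuild loop, using SQLite via Python's `sqlite3` (all names invented for the demo; in production this logic lives in scheduled stored procedures like the ones above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, day TEXT, amount REAL);
    CREATE TABLE daily_summary (day TEXT PRIMARY KEY, revenue REAL, orders INTEGER);
    CREATE TABLE watermark (name TEXT PRIMARY KEY, last_id INTEGER);
""")

def incremental_update(conn):
    """Fold only rows past the watermark into the summary (frequent, cheap)."""
    row = conn.execute("SELECT last_id FROM watermark WHERE name = 'daily'").fetchone()
    last_id = row[0] if row else 0
    new_max = conn.execute("SELECT MAX(order_id) FROM orders").fetchone()[0] or 0
    if new_max <= last_id:
        return  # nothing new since the last run
    conn.execute("""
        INSERT INTO daily_summary
        SELECT day, SUM(amount), COUNT(*) FROM orders
        WHERE order_id > ? AND order_id <= ? GROUP BY day
        ON CONFLICT(day) DO UPDATE SET
            revenue = revenue + excluded.revenue,
            orders  = orders  + excluded.orders""", (last_id, new_max))
    conn.execute("""
        INSERT INTO watermark VALUES ('daily', ?)
        ON CONFLICT(name) DO UPDATE SET last_id = excluded.last_id""", (new_max,))

def full_rebuild(conn):
    """Periodic recompute from source corrects any incremental drift (rare, heavy)."""
    conn.execute("DELETE FROM daily_summary")
    conn.execute("INSERT INTO daily_summary "
                 "SELECT day, SUM(amount), COUNT(*) FROM orders GROUP BY day")
    new_max = conn.execute("SELECT COALESCE(MAX(order_id), 0) FROM orders").fetchone()[0]
    conn.execute("""
        INSERT INTO watermark VALUES ('daily', ?)
        ON CONFLICT(name) DO UPDATE SET last_id = excluded.last_id""", (new_max,))

conn.executemany("INSERT INTO orders (day, amount) VALUES (?, ?)",
                 [("2024-06-01", 10.0), ("2024-06-01", 20.0)])
incremental_update(conn)   # first batch: 2 orders, 30.0
conn.execute("INSERT INTO orders (day, amount) VALUES ('2024-06-01', 5.0)")
incremental_update(conn)   # second batch folds in the new 5.0 order
rows = conn.execute("SELECT * FROM daily_summary").fetchall()
print(rows)  # [('2024-06-01', 35.0, 3)]
```

The incremental path keeps the summary fresh between runs; `full_rebuild` produces the identical result from scratch, which is exactly why it can serve as the periodic correction pass.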
Efficient analytics systems maintain summary tables at multiple granularity levels, with coarser tables aggregated from finer ones rather than from source data. This roll-up hierarchy reduces computation and ensures consistency.
Example hierarchy:
Source: order_items (billions of rows)
↓ aggregate ↓
Level 1: hourly_sales_summary (millions of rows)
↓ roll up ↓
Level 2: daily_sales_summary (hundreds of thousands)
↓ roll up ↓
Level 3: monthly_sales_summary (thousands)
↓ roll up ↓
Level 4: yearly_sales_summary (tens of rows)
```sql
-- Roll-up Hierarchy Implementation

-- Level 1: Hourly summary (built from source)
CREATE TABLE hourly_sales (
    summary_hour TIMESTAMP NOT NULL,
    store_id INT NOT NULL,
    category_id INT NOT NULL,
    revenue DECIMAL(12, 2) NOT NULL,
    quantity INT NOT NULL,
    orders INT NOT NULL,
    PRIMARY KEY (summary_hour, store_id, category_id)
);

-- Level 2: Daily summary (rolled up from hourly)
CREATE TABLE daily_sales (
    summary_date DATE NOT NULL,
    store_id INT NOT NULL,
    category_id INT NOT NULL,
    revenue DECIMAL(14, 2) NOT NULL,
    quantity INT NOT NULL,
    orders INT NOT NULL,
    -- Additional daily-level metrics
    peak_hour INT,  -- Hour with highest revenue
    PRIMARY KEY (summary_date, store_id, category_id)
);

-- Level 3: Monthly summary (rolled up from daily)
CREATE TABLE monthly_sales (
    summary_month DATE NOT NULL,  -- First of month
    store_id INT NOT NULL,
    category_id INT NOT NULL,
    revenue DECIMAL(16, 2) NOT NULL,
    quantity INT NOT NULL,
    orders INT NOT NULL,
    -- Additional monthly-level metrics
    operating_days INT,  -- Days with >0 revenue
    avg_daily_revenue DECIMAL(12, 2),
    PRIMARY KEY (summary_month, store_id, category_id)
);

-- Roll-up from hourly to daily
CREATE OR REPLACE PROCEDURE rollup_hourly_to_daily(p_date DATE)
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO daily_sales (
        summary_date, store_id, category_id,
        revenue, quantity, orders, peak_hour
    )
    SELECT
        p_date,
        store_id,
        category_id,
        SUM(revenue),
        SUM(quantity),
        SUM(orders),
        -- Peak hour requires a correlated lookup on the hourly data
        (SELECT EXTRACT(HOUR FROM summary_hour)
         FROM hourly_sales h2
         WHERE h2.store_id = hourly_sales.store_id
           AND h2.category_id = hourly_sales.category_id
           AND DATE(h2.summary_hour) = p_date
         ORDER BY h2.revenue DESC
         LIMIT 1) AS peak_hour
    FROM hourly_sales
    WHERE DATE(summary_hour) = p_date
    GROUP BY store_id, category_id
    ON CONFLICT (summary_date, store_id, category_id) DO UPDATE SET
        revenue = EXCLUDED.revenue,
        quantity = EXCLUDED.quantity,
        orders = EXCLUDED.orders,
        peak_hour = EXCLUDED.peak_hour;
END;
$$;

-- Roll-up from daily to monthly
CREATE OR REPLACE PROCEDURE rollup_daily_to_monthly(p_month DATE)
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO monthly_sales (
        summary_month, store_id, category_id,
        revenue, quantity, orders, operating_days, avg_daily_revenue
    )
    SELECT
        DATE_TRUNC('month', p_month),
        store_id,
        category_id,
        SUM(revenue),
        SUM(quantity),
        SUM(orders),
        COUNT(*) AS operating_days,  -- Days with data
        AVG(revenue) AS avg_daily_revenue
    FROM daily_sales
    WHERE summary_date >= DATE_TRUNC('month', p_month)
      AND summary_date < DATE_TRUNC('month', p_month) + INTERVAL '1 month'
    GROUP BY store_id, category_id
    ON CONFLICT (summary_month, store_id, category_id) DO UPDATE SET
        revenue = EXCLUDED.revenue,
        quantity = EXCLUDED.quantity,
        orders = EXCLUDED.orders,
        operating_days = EXCLUDED.operating_days,
        avg_daily_revenue = EXCLUDED.avg_daily_revenue;
END;
$$;

-- Schedule the cascade: hourly → daily → monthly
-- Hourly → daily: runs every hour to fold in the just-completed hour
SELECT cron.schedule('hourly-to-daily', '10 * * * *',  -- 10 min past each hour
    $$CALL rollup_hourly_to_daily(CURRENT_DATE)$$);

-- Daily → monthly: runs at 3 AM to refresh the current month
SELECT cron.schedule('daily-to-monthly', '0 3 * * *',
    $$CALL rollup_daily_to_monthly(CURRENT_DATE)$$);
```

Not all metrics can be rolled up by simple summation. AVG, COUNT DISTINCT, percentiles, and other non-additive measures require storing intermediate components (sums and counts, distinct sets) or recalculating from source. Design your schema with roll-up compatibility in mind.
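The classic trap here is averaging averages. A few lines of plain Python arithmetic show why a summary must carry the additive components (revenue sum and order count) rather than only a derived average:

```python
# Two days for one store: (total_revenue, order_count)
day1 = (1000.0, 10)   # average order value 100.00
day2 = (300.0, 2)     # average order value 150.00

# WRONG roll-up: averaging the stored daily averages weights both days equally.
avg_of_avgs = ((day1[0] / day1[1]) + (day2[0] / day2[1])) / 2
print(avg_of_avgs)            # 125.0 -- overstates the true figure

# RIGHT roll-up: sum the additive components, derive the ratio at the end.
monthly_avg = (day1[0] + day2[0]) / (day1[1] + day2[1])
print(round(monthly_avg, 2))  # 108.33 -- 1300.0 revenue over 12 orders
```

This is exactly why the anatomy example earlier stores `total_revenue` and `order_count` and derives `avg_order_value` from them, instead of persisting the average alone.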
Let's examine a complete summary table architecture for a SaaS analytics platform that serves executive dashboards, operational reports, and customer-facing analytics.
Each audience has different freshness and granularity requirements, so the architecture layers summaries from near-real-time buckets up to monthly executive roll-ups:
```sql
-- SaaS Analytics Platform: Summary Table Architecture

-- =====================================================
-- LEVEL 1: Near-Real-Time Summaries (5-minute lag)
-- =====================================================

-- Active usage tracking
CREATE TABLE realtime_usage_summary (
    bucket_time TIMESTAMP NOT NULL,  -- 5-minute buckets
    tenant_id UUID NOT NULL,
    feature_category VARCHAR(50) NOT NULL,
    active_users INT NOT NULL,
    api_calls INT NOT NULL,
    error_count INT NOT NULL,
    avg_response_ms DECIMAL(8, 2),
    p99_response_ms DECIMAL(8, 2),
    PRIMARY KEY (bucket_time, tenant_id, feature_category)
);

-- =====================================================
-- LEVEL 2: Hourly Operational Summaries
-- =====================================================

CREATE TABLE hourly_tenant_metrics (
    summary_hour TIMESTAMP NOT NULL,
    tenant_id UUID NOT NULL,
    -- Denormalized tenant info
    tenant_name VARCHAR(200) NOT NULL,
    tenant_plan VARCHAR(50) NOT NULL,
    tenant_region VARCHAR(50) NOT NULL,
    -- Usage metrics
    active_users INT NOT NULL,
    new_users INT NOT NULL,
    api_calls INT NOT NULL,
    data_processed_mb DECIMAL(12, 2) NOT NULL,
    -- Revenue metrics (for billing)
    billable_units INT NOT NULL,
    estimated_charges DECIMAL(10, 2) NOT NULL,
    -- Performance metrics
    avg_response_ms DECIMAL(8, 2),
    error_rate DECIMAL(6, 4),
    PRIMARY KEY (summary_hour, tenant_id)
);

-- =====================================================
-- LEVEL 3: Daily Analytics Summaries
-- =====================================================

CREATE TABLE daily_business_metrics (
    summary_date DATE NOT NULL,
    -- Overall platform metrics
    total_tenants INT NOT NULL,
    active_tenants INT NOT NULL,  -- At least 1 API call
    new_tenants INT NOT NULL,
    churned_tenants INT NOT NULL,
    total_users INT NOT NULL,
    active_users INT NOT NULL,
    new_users INT NOT NULL,
    total_revenue DECIMAL(14, 2) NOT NULL,
    mrr DECIMAL(14, 2) NOT NULL,  -- Monthly Recurring Revenue (prorated daily)
    -- Engagement metrics
    avg_api_calls_per_tenant DECIMAL(10, 2),
    avg_users_per_tenant DECIMAL(8, 2),
    -- Comparisons (pre-computed)
    prev_day_revenue DECIMAL(14, 2),
    prev_week_revenue DECIMAL(14, 2),
    prev_month_revenue DECIMAL(14, 2),
    prev_year_revenue DECIMAL(14, 2),
    dod_growth DECIMAL(8, 4),
    wow_growth DECIMAL(8, 4),
    mom_growth DECIMAL(8, 4),
    yoy_growth DECIMAL(8, 4),
    PRIMARY KEY (summary_date)
);

CREATE TABLE daily_plan_breakdown (
    summary_date DATE NOT NULL,
    plan_name VARCHAR(50) NOT NULL,
    tenant_count INT NOT NULL,
    user_count INT NOT NULL,
    revenue DECIMAL(12, 2) NOT NULL,
    -- Plan-specific retention
    churn_count INT NOT NULL,
    expansion_count INT NOT NULL,    -- Upgraded from this plan
    contraction_count INT NOT NULL,  -- Downgraded to this plan
    PRIMARY KEY (summary_date, plan_name)
);

-- =====================================================
-- LEVEL 4: Monthly Executive Summaries
-- =====================================================

CREATE TABLE monthly_executive_dashboard (
    summary_month DATE NOT NULL,  -- First of month
    -- Revenue
    mrr_start DECIMAL(14, 2) NOT NULL,
    mrr_end DECIMAL(14, 2) NOT NULL,
    new_mrr DECIMAL(12, 2) NOT NULL,
    expansion_mrr DECIMAL(12, 2) NOT NULL,
    contraction_mrr DECIMAL(12, 2) NOT NULL,
    churn_mrr DECIMAL(12, 2) NOT NULL,
    net_mrr_change DECIMAL(12, 2) GENERATED ALWAYS AS
        (new_mrr + expansion_mrr - contraction_mrr - churn_mrr) STORED,
    -- Growth rates
    mrr_growth_rate DECIMAL(8, 4),
    arr DECIMAL(16, 2) NOT NULL,  -- Annual Recurring Revenue
    -- Customers
    customers_start INT NOT NULL,
    customers_end INT NOT NULL,
    new_customers INT NOT NULL,
    churned_customers INT NOT NULL,
    churn_rate DECIMAL(6, 4),
    net_retention_rate DECIMAL(8, 4),  -- Dollar retention
    -- Engagement
    avg_dau INT,  -- Daily Active Users (average)
    avg_mau INT,  -- Monthly Active Users
    dau_mau_ratio DECIMAL(4, 2),  -- Stickiness
    -- Efficiency
    ltv DECIMAL(10, 2),  -- Lifetime Value
    cac DECIMAL(10, 2),  -- Customer Acquisition Cost
    ltv_cac_ratio DECIMAL(6, 2),
    PRIMARY KEY (summary_month)
);

-- =====================================================
-- COHORT ANALYSIS TABLES
-- =====================================================

CREATE TABLE monthly_cohort_metrics (
    cohort_month DATE NOT NULL,     -- When customer signed up
    activity_month DATE NOT NULL,   -- Month being measured
    cohort_size INT NOT NULL,       -- Original cohort count
    active_customers INT NOT NULL,  -- Still active this month
    revenue DECIMAL(14, 2) NOT NULL,
    cumulative_revenue DECIMAL(16, 2) NOT NULL,
    retention_rate DECIMAL(6, 4) GENERATED ALWAYS AS
        (active_customers::DECIMAL / NULLIF(cohort_size, 0)) STORED,
    ltv_to_date DECIMAL(10, 2) GENERATED ALWAYS AS
        (cumulative_revenue / NULLIF(cohort_size, 0)) STORED,
    PRIMARY KEY (cohort_month, activity_month)
);
```

This architecture serves a dashboard with 20+ widgets in under 100ms total. Each widget queries a single summary table with simple WHERE clauses—no joins, no aggregations at query time. The executive monthly summary queries return in <5ms, enabling real-time interaction even for complex SaaS metrics.
Summary tables are the culmination of denormalization techniques, purpose-built for analytics performance. When designed correctly, they transform complex analytical queries from seconds or minutes to milliseconds.
Module Complete:
You have now mastered the five core denormalization techniques: derived columns, pre-computed aggregates, duplicated dimension data, merged attributes, and summary tables.
With these techniques, you can strategically introduce redundancy to optimize database performance while understanding and managing the trade-offs involved.
Congratulations! You've completed the Denormalization Techniques module. You now have the knowledge to apply strategic denormalization across all common scenarios—from simple derived columns to comprehensive summary table architectures. Remember: denormalization is a tool, not a goal. Use it where the performance benefits justify the maintenance complexity.