Consider a retail analytics platform whose executive dashboards must answer questions spanning stores, regions, product categories, and customer segments, complete with period-over-period comparisons, in well under a second.

With a normalized transactional database containing billions of records, each dashboard refresh could require joining and aggregating millions of rows across half a dozen tables.

No amount of indexing makes this performant. The solution is summary tables: purpose-built denormalized structures designed to answer analytical questions efficiently.
Summary tables represent the synthesis of all denormalization techniques: they contain derived columns, pre-computed aggregates, duplicated dimension data, and merged attributes, all organized for optimum query performance.
By the end of this page, you will master the design and implementation of summary tables. You'll understand their role in analytics architecture, design principles for different use cases, refresh strategies, and best practices from data warehousing that apply to all database systems.
A summary table (also called an aggregate table or reporting table) is a denormalized structure designed to serve a specific set of analytical queries. Unlike operational tables optimized for transactions, summary tables are optimized for read-heavy analytical workloads.
Key characteristics:
| Aspect | Operational Tables | Summary Tables |
|---|---|---|
| Purpose | Record transactions, maintain state | Serve analytics, reporting |
| Optimization | Insert/update performance | Query performance |
| Normalization | Typically normalized (3NF/BCNF) | Heavily denormalized |
| Data freshness | Real-time | May be delayed (minutes to days) |
| Row size | Narrow, compact | Wide, comprehensive |
| Query patterns | Transaction lookup, single-entity updates | Aggregations, trends, comparisons |
Anatomy of a summary table:
```sql
-- Anatomy of a well-designed summary table
-- Example: Daily Sales Summary for retail analytics

CREATE TABLE daily_sales_summary (
    -- =========================================
    -- DIMENSION KEYS (What we're measuring by)
    -- =========================================
    summary_date DATE NOT NULL,
    store_id INT NOT NULL,
    product_category_id INT NOT NULL,
    customer_segment VARCHAR(20) NOT NULL,

    -- =========================================
    -- DENORMALIZED DIMENSION ATTRIBUTES
    -- (Copied from dimension tables to avoid joins)
    -- =========================================
    store_name VARCHAR(100) NOT NULL,
    store_region VARCHAR(50) NOT NULL,
    store_city VARCHAR(50) NOT NULL,
    category_name VARCHAR(100) NOT NULL,
    category_group VARCHAR(50) NOT NULL,

    -- =========================================
    -- AGGREGATE MEASURES (Pre-computed metrics)
    -- =========================================
    total_revenue DECIMAL(14, 2) NOT NULL DEFAULT 0,
    total_quantity INT NOT NULL DEFAULT 0,
    order_count INT NOT NULL DEFAULT 0,
    unique_customer_count INT NOT NULL DEFAULT 0,

    -- For average calculation (store components, not just result)
    avg_order_value DECIMAL(10, 2) GENERATED ALWAYS AS
        (CASE WHEN order_count > 0
              THEN total_revenue / order_count ELSE 0 END) STORED,

    -- =========================================
    -- DERIVED CALCULATIONS
    -- =========================================
    revenue_per_customer DECIMAL(10, 2) GENERATED ALWAYS AS
        (CASE WHEN unique_customer_count > 0
              THEN total_revenue / unique_customer_count ELSE 0 END) STORED,
    units_per_order DECIMAL(8, 2) GENERATED ALWAYS AS
        (CASE WHEN order_count > 0
              THEN total_quantity::DECIMAL / order_count ELSE 0 END) STORED,

    -- =========================================
    -- COMPARISON METRICS (Pre-computed for trends)
    -- =========================================
    prev_day_revenue DECIMAL(14, 2),   -- Yesterday's revenue (same dimension combo)
    prev_week_revenue DECIMAL(14, 2),  -- Same day last week
    prev_year_revenue DECIMAL(14, 2),  -- Same day last year
    day_over_day_change DECIMAL(8, 4) GENERATED ALWAYS AS
        (CASE WHEN prev_day_revenue > 0
              THEN (total_revenue - prev_day_revenue) / prev_day_revenue
              ELSE NULL END) STORED,
    year_over_year_change DECIMAL(8, 4) GENERATED ALWAYS AS
        (CASE WHEN prev_year_revenue > 0
              THEN (total_revenue - prev_year_revenue) / prev_year_revenue
              ELSE NULL END) STORED,

    -- =========================================
    -- METADATA
    -- =========================================
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    source_record_count INT,  -- How many source rows contributed

    -- Composite primary key on all dimension keys
    PRIMARY KEY (summary_date, store_id, product_category_id, customer_segment)
);

-- Indexes for common query patterns
CREATE INDEX idx_summary_date_desc ON daily_sales_summary(summary_date DESC);
CREATE INDEX idx_summary_store_date ON daily_sales_summary(store_id, summary_date DESC);
CREATE INDEX idx_summary_category_date ON daily_sales_summary(product_category_id, summary_date DESC);
CREATE INDEX idx_summary_region_date ON daily_sales_summary(store_region, summary_date DESC);
```

Summary tables should be designed starting from the queries they need to answer. List every chart, dashboard widget, and report. Identify dimensions, measures, and comparisons. Then design the table to answer those specific questions with simple SELECTs—no joins, minimal aggregation.
Summary tables draw heavily from dimensional modeling, a design technique developed for data warehousing. Understanding its core concepts helps design effective summary tables.
Key concepts:
Facts are the quantitative measurements you want to analyze: revenue, quantity sold, cost, order counts.

Dimensions are the perspectives by which you analyze facts: time, store, product category, customer segment.

Grain is the level of detail each row represents: one row per hour per product per store is a fine grain; one row per month per region is a coarse grain.
```sql
-- Example: Different grain levels for the same subject area

-- FINE GRAIN: Hourly sales by product
-- High detail, many rows, supports any drill-down
CREATE TABLE hourly_product_sales (
    summary_hour TIMESTAMP NOT NULL,
    product_id INT NOT NULL,
    store_id INT NOT NULL,
    quantity_sold INT NOT NULL,
    revenue DECIMAL(10, 2) NOT NULL,
    cost DECIMAL(10, 2) NOT NULL,
    transaction_count INT NOT NULL,
    -- Denormalized attributes
    product_name VARCHAR(200) NOT NULL,
    product_category VARCHAR(100) NOT NULL,
    store_name VARCHAR(100) NOT NULL,
    store_region VARCHAR(50) NOT NULL,
    PRIMARY KEY (summary_hour, product_id, store_id)
);
-- Estimated rows: 24 hours × 365 days × 10,000 products × 500 stores = ~43B rows/year
-- Use case: Detailed drill-down, anomaly detection

-- MEDIUM GRAIN: Daily sales by category by store
-- Balanced detail and performance
CREATE TABLE daily_category_sales (
    summary_date DATE NOT NULL,
    category_id INT NOT NULL,
    store_id INT NOT NULL,
    quantity_sold INT NOT NULL,
    revenue DECIMAL(12, 2) NOT NULL,
    order_count INT NOT NULL,
    unique_products_sold INT NOT NULL,
    -- Denormalized
    category_name VARCHAR(100) NOT NULL,
    store_name VARCHAR(100) NOT NULL,
    store_region VARCHAR(50) NOT NULL,
    PRIMARY KEY (summary_date, category_id, store_id)
);
-- Estimated rows: 365 days × 100 categories × 500 stores = ~18M rows/year
-- Use case: Daily dashboards, store performance

-- COARSE GRAIN: Monthly sales by region
-- Executive-level summaries
CREATE TABLE monthly_regional_sales (
    summary_month DATE NOT NULL,  -- First of month
    region VARCHAR(50) NOT NULL,
    total_revenue DECIMAL(16, 2) NOT NULL,
    total_orders INT NOT NULL,
    total_customers INT NOT NULL,
    store_count INT NOT NULL,
    avg_order_value DECIMAL(10, 2),
    revenue_per_store DECIMAL(14, 2),
    -- Trend comparisons
    prev_month_revenue DECIMAL(16, 2),
    prev_year_month_revenue DECIMAL(16, 2),
    PRIMARY KEY (summary_month, region)
);
-- Estimated rows: 12 months × 10 regions × years = ~120 rows/year
-- Use case: Executive dashboards, investor reports
```

Star Schema Pattern:
The classic dimensional modeling pattern places fact tables at the center with dimension tables radiating outward like a star. In summary tables, we often embed dimension attributes directly rather than joining, but understanding the star schema helps identify what to embed:
┌─────────────┐
│ dim_product │
└──────┬──────┘
│
┌─────────────┐ ┌──────┴──────┐ ┌─────────────┐
│ dim_store │────│ fact_sales │────│ dim_time │
└─────────────┘ └──────┬──────┘ └─────────────┘
│
┌──────┴──────┐
│dim_customer │
└─────────────┘
Summary tables flatten this structure by absorbing dimension attributes into the fact table, trading normalization for query speed.
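To make this trade concrete, here is a minimal sketch using SQLite via Python's `sqlite3` (the chapter's DDL targets PostgreSQL, and all table and column names here are invented for the demo). It shows that a flattened summary answers the same regional question as the star join, without touching the dimension table at query time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star layout: the fact row carries only a key into dim_store...
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, revenue REAL);
    -- ...the flattened summary absorbs the region attribute directly.
    CREATE TABLE sales_summary (store_id INTEGER, region TEXT, revenue REAL);

    INSERT INTO dim_store VALUES (1, 'West'), (2, 'East');
    INSERT INTO fact_sales VALUES (1, 100.0), (2, 250.0), (1, 50.0);
    INSERT INTO sales_summary
        SELECT f.store_id, d.region, SUM(f.revenue)
        FROM fact_sales f JOIN dim_store d USING (store_id)
        GROUP BY f.store_id, d.region;
""")

# Star-schema query: join + aggregate at query time.
star = conn.execute("""
    SELECT d.region, SUM(f.revenue)
    FROM fact_sales f JOIN dim_store d USING (store_id)
    GROUP BY d.region ORDER BY d.region""").fetchall()

# Flattened query: region already lives on the summary row, no join needed.
flat = conn.execute("""
    SELECT region, SUM(revenue)
    FROM sales_summary GROUP BY region ORDER BY region""").fetchall()

print(star == flat, flat)  # True [('East', 250.0), ('West', 150.0)]
```

Both queries return identical results; the flattened one simply pays the join cost once, at population time, instead of on every dashboard refresh.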
Finer grain = more flexibility for drill-down but more rows and slower queries. Coarser grain = faster queries but less detail. Many systems maintain multiple grain levels: fine-grained for recent data, coarse-grained for historical. Choose based on actual query requirements.
Different analytical needs require different summary table designs. Here are common patterns:
Pattern 1: Time-Series Summary
Optimized for trend analysis, period comparisons, and time-based filtering:
```sql
-- Pattern 1: Time-Series Summary Table
-- Optimized for: trends, comparisons, time-range queries

CREATE TABLE time_series_revenue (
    -- Time dimension at day grain
    summary_date DATE PRIMARY KEY,

    -- Core metrics
    total_revenue DECIMAL(16, 2) NOT NULL,
    total_orders INT NOT NULL,
    total_customers INT NOT NULL,

    -- Rolling aggregates (pre-computed for performance)
    revenue_7d_avg DECIMAL(14, 2),   -- 7-day rolling average
    revenue_30d_avg DECIMAL(14, 2),  -- 30-day rolling average
    revenue_90d_sum DECIMAL(16, 2),  -- 90-day rolling sum

    -- Period comparisons
    revenue_prev_day DECIMAL(16, 2),
    revenue_prev_week DECIMAL(16, 2),   -- Same weekday last week
    revenue_prev_month DECIMAL(16, 2),  -- Same day last month
    revenue_prev_year DECIMAL(16, 2),   -- Same day last year

    -- Pre-computed growth rates
    dod_growth_rate DECIMAL(8, 4),  -- Day-over-day
    wow_growth_rate DECIMAL(8, 4),  -- Week-over-week (same weekday)
    mom_growth_rate DECIMAL(8, 4),  -- Month-over-month (same day)
    yoy_growth_rate DECIMAL(8, 4),  -- Year-over-year

    -- Day-of-week patterns
    day_of_week INT,  -- 0=Sunday, 6=Saturday
    is_weekend BOOLEAN,
    is_holiday BOOLEAN,

    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Query example: Get trend with growth rates for last 30 days
SELECT summary_date, total_revenue, revenue_7d_avg, revenue_30d_avg,
       yoy_growth_rate, wow_growth_rate
FROM time_series_revenue
WHERE summary_date > CURRENT_DATE - INTERVAL '30 days'
ORDER BY summary_date;  -- Uses PK index directly
```

Pattern 2: Ranking/Leaderboard Summary
Optimized for top-N queries, rankings, and competitive analysis:
```sql
-- Pattern 2: Ranking Summary Table
-- Optimized for: top products, best performers, leaderboards

CREATE TABLE daily_product_rankings (
    summary_date DATE NOT NULL,
    product_id INT NOT NULL,

    -- Denormalized product info
    product_name VARCHAR(200) NOT NULL,
    product_category VARCHAR(100) NOT NULL,
    product_brand VARCHAR(100) NOT NULL,

    -- Performance metrics
    units_sold INT NOT NULL,
    revenue DECIMAL(12, 2) NOT NULL,
    order_count INT NOT NULL,

    -- Pre-computed rankings (saves expensive window functions)
    units_rank_overall INT,        -- Rank by units across all products
    revenue_rank_overall INT,      -- Rank by revenue across all products
    units_rank_in_category INT,    -- Rank within category
    revenue_rank_in_category INT,  -- Rank within category

    -- Percentile positions
    revenue_percentile DECIMAL(5, 2),  -- What percentile this product is in

    -- Change metrics
    prev_day_revenue DECIMAL(12, 2),
    prev_week_revenue DECIMAL(12, 2),
    rank_change_from_prev_day INT,  -- Positive = moved up

    PRIMARY KEY (summary_date, product_id)
);

-- Supporting indexes for common queries
CREATE INDEX idx_rankings_date_revenue
    ON daily_product_rankings(summary_date, revenue DESC);
CREATE INDEX idx_rankings_date_category
    ON daily_product_rankings(summary_date, product_category, revenue_rank_in_category);

-- Query example: Top 10 products by revenue today
SELECT product_name, product_category, revenue,
       revenue_rank_overall, rank_change_from_prev_day
FROM daily_product_rankings
WHERE summary_date = CURRENT_DATE
  AND revenue_rank_overall <= 10
ORDER BY revenue_rank_overall;  -- No window functions needed!
```

Pattern 3: Cohort Analysis Summary
Optimized for analyzing groups of entities over time:
```sql
-- Pattern 3: Cohort Analysis Summary Table
-- Optimized for: retention analysis, customer lifetime value, behavior patterns

CREATE TABLE customer_cohort_summary (
    -- Cohort definition
    cohort_month DATE NOT NULL,    -- Month of first purchase
    activity_month DATE NOT NULL,  -- Month being measured

    -- Metrics for this cohort in this activity month
    active_customers INT NOT NULL,   -- Customers from cohort who were active
    total_cohort_size INT NOT NULL,  -- Original cohort size (for retention calc)
    total_revenue DECIMAL(14, 2) NOT NULL,
    total_orders INT NOT NULL,

    -- Pre-computed retention and LTV metrics
    retention_rate DECIMAL(6, 4) GENERATED ALWAYS AS
        (active_customers::DECIMAL / NULLIF(total_cohort_size, 0)) STORED,
    revenue_per_customer DECIMAL(10, 2) GENERATED ALWAYS AS
        (total_revenue / NULLIF(active_customers, 0)) STORED,
    orders_per_customer DECIMAL(8, 2) GENERATED ALWAYS AS
        (total_orders::DECIMAL / NULLIF(active_customers, 0)) STORED,

    -- Cumulative metrics (for LTV calculation)
    cumulative_revenue DECIMAL(16, 2),  -- Total revenue from cohort since start
    cumulative_orders INT,              -- Total orders from cohort since start

    months_since_cohort INT GENERATED ALWAYS AS
        (EXTRACT(YEAR FROM age(activity_month, cohort_month)) * 12 +
         EXTRACT(MONTH FROM age(activity_month, cohort_month))) STORED,

    PRIMARY KEY (cohort_month, activity_month)
);

-- Query example: Retention curve for 2024 cohorts
SELECT cohort_month, months_since_cohort, retention_rate, cumulative_revenue
FROM customer_cohort_summary
WHERE cohort_month >= '2024-01-01'
  AND months_since_cohort <= 12
ORDER BY cohort_month, months_since_cohort;

-- Query example: Compare cohort LTV at 6 months
SELECT cohort_month,
       retention_rate AS m6_retention,
       cumulative_revenue / total_cohort_size AS ltv_per_customer
FROM customer_cohort_summary
WHERE months_since_cohort = 6
ORDER BY cohort_month;
```

Summary tables should pre-compute operations that are expensive at query time: rankings (window functions), rolling averages, cumulative sums, and complex derived metrics. If you find yourself writing complex SQL in every dashboard query, that complexity should move to summary table population.
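As a runnable illustration of that principle, here is a small sketch using SQLite via Python's `sqlite3` (the ranking pattern above is PostgreSQL; table and column names here are invented, and the window functions require SQLite 3.25+, bundled with modern Python). The expensive `RANK()` runs once at population time, so the dashboard read degenerates to a plain filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_sales (      -- per-product daily totals (source)
        summary_date TEXT, product_id INTEGER, category TEXT, revenue REAL);
    CREATE TABLE product_rankings (   -- summary with pre-computed ranks
        summary_date TEXT, product_id INTEGER, category TEXT, revenue REAL,
        revenue_rank_overall INTEGER, revenue_rank_in_category INTEGER,
        PRIMARY KEY (summary_date, product_id));
""")
conn.executemany("INSERT INTO product_sales VALUES ('2024-06-01', ?, ?, ?)",
                 [(1, "toys", 500.0), (2, "toys", 900.0), (3, "food", 700.0)])

# Window functions run ONCE, at population time...
conn.execute("""
    INSERT INTO product_rankings
    SELECT summary_date, product_id, category, revenue,
           RANK() OVER (PARTITION BY summary_date ORDER BY revenue DESC),
           RANK() OVER (PARTITION BY summary_date, category ORDER BY revenue DESC)
    FROM product_sales""")

# ...so every dashboard read is a simple filter on the primary-key prefix.
top = conn.execute("""
    SELECT product_id, revenue_rank_overall FROM product_rankings
    WHERE summary_date = '2024-06-01' AND revenue_rank_overall <= 2
    ORDER BY revenue_rank_overall""").fetchall()
print(top)  # [(2, 1), (3, 2)]
```

The same division of labor applies to rolling averages and percentiles: compute them in the population job, store the result as an ordinary column.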
Summary tables must be populated from source transactional data. The population strategy depends on data volume, freshness requirements, and computational complexity.
Strategy 1: Scheduled Full Rebuild
Simplest approach—completely recalculate the summary from source data:
```sql
-- Strategy 1: Full Rebuild
-- Best for: Small to medium data volumes, simple queries, daily refresh OK

CREATE OR REPLACE PROCEDURE rebuild_daily_sales_summary(p_date DATE)
LANGUAGE plpgsql
AS $$
BEGIN
    -- Delete existing data for this date
    DELETE FROM daily_sales_summary WHERE summary_date = p_date;

    -- Rebuild from source
    INSERT INTO daily_sales_summary (
        summary_date, store_id, product_category_id, customer_segment,
        store_name, store_region, store_city, category_name, category_group,
        total_revenue, total_quantity, order_count, unique_customer_count,
        source_record_count
    )
    SELECT
        DATE(o.order_date) AS summary_date,
        s.store_id,
        p.category_id,
        c.segment AS customer_segment,
        -- Denormalized dimension attributes
        s.name AS store_name,
        s.region AS store_region,
        s.city AS store_city,
        cat.name AS category_name,
        cat.group_name AS category_group,
        -- Aggregate measures
        SUM(oi.quantity * oi.unit_price) AS total_revenue,
        SUM(oi.quantity) AS total_quantity,
        COUNT(DISTINCT o.order_id) AS order_count,
        COUNT(DISTINCT o.customer_id) AS unique_customer_count,
        COUNT(*) AS source_record_count
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    JOIN stores s ON o.store_id = s.store_id
    JOIN products p ON oi.product_id = p.product_id
    JOIN categories cat ON p.category_id = cat.category_id
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE DATE(o.order_date) = p_date
    GROUP BY DATE(o.order_date), s.store_id, p.category_id, c.segment,
             s.name, s.region, s.city, cat.name, cat.group_name;

    -- Update comparison columns (requires previous data to exist)
    UPDATE daily_sales_summary curr
    SET prev_day_revenue = prev.total_revenue,
        prev_week_revenue = week_ago.total_revenue,
        prev_year_revenue = year_ago.total_revenue
    FROM daily_sales_summary prev,
         daily_sales_summary week_ago,
         daily_sales_summary year_ago
    WHERE curr.summary_date = p_date
      AND prev.summary_date = p_date - INTERVAL '1 day'
      AND prev.store_id = curr.store_id
      AND prev.product_category_id = curr.product_category_id
      AND prev.customer_segment = curr.customer_segment
      -- Similar joins for week_ago and year_ago...
    ;

    COMMIT;
END;
$$;

-- Schedule nightly at 2 AM (CURRENT_DATE - 1 keeps the argument a DATE)
SELECT cron.schedule('rebuild-daily-summary', '0 2 * * *',
    $$CALL rebuild_daily_sales_summary(CURRENT_DATE - 1)$$);
```

Strategy 2: Incremental Update
More efficient for large volumes—only process changed or new data:
```sql
-- Strategy 2: Incremental Update
-- Best for: Large data volumes, near-real-time requirements

-- Track what's been processed
CREATE TABLE summary_watermark (
    table_name VARCHAR(100) PRIMARY KEY,
    last_processed_id BIGINT,
    last_processed_time TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE OR REPLACE PROCEDURE incremental_update_hourly_sales()
LANGUAGE plpgsql
AS $$
DECLARE
    v_last_id BIGINT;
    v_new_max_id BIGINT;
BEGIN
    -- Get watermark
    SELECT last_processed_id INTO v_last_id
    FROM summary_watermark
    WHERE table_name = 'hourly_sales_summary';

    IF v_last_id IS NULL THEN
        v_last_id := 0;
    END IF;

    -- Find new maximum
    SELECT MAX(order_item_id) INTO v_new_max_id FROM order_items;

    -- Skip if the source is empty or there is no new data
    IF v_new_max_id IS NULL OR v_new_max_id <= v_last_id THEN
        RETURN;
    END IF;

    -- Process new records (upsert pattern)
    INSERT INTO hourly_sales_summary (
        summary_hour, store_id, category_id,
        store_name, category_name,
        total_revenue, total_quantity, order_count
    )
    SELECT
        DATE_TRUNC('hour', o.order_date) AS summary_hour,
        o.store_id,
        p.category_id,
        s.name AS store_name,
        cat.name AS category_name,
        SUM(oi.quantity * oi.unit_price) AS total_revenue,
        SUM(oi.quantity) AS total_quantity,
        COUNT(DISTINCT o.order_id) AS order_count
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.order_id
    JOIN stores s ON o.store_id = s.store_id
    JOIN products p ON oi.product_id = p.product_id
    JOIN categories cat ON p.category_id = cat.category_id
    WHERE oi.order_item_id > v_last_id
      AND oi.order_item_id <= v_new_max_id
    GROUP BY DATE_TRUNC('hour', o.order_date), o.store_id, p.category_id,
             s.name, cat.name
    ON CONFLICT (summary_hour, store_id, category_id) DO UPDATE SET
        total_revenue = hourly_sales_summary.total_revenue + EXCLUDED.total_revenue,
        total_quantity = hourly_sales_summary.total_quantity + EXCLUDED.total_quantity,
        order_count = hourly_sales_summary.order_count + EXCLUDED.order_count,
        updated_at = CURRENT_TIMESTAMP;

    -- Update watermark
    UPDATE summary_watermark
    SET last_processed_id = v_new_max_id,
        last_processed_time = CURRENT_TIMESTAMP,
        updated_at = CURRENT_TIMESTAMP
    WHERE table_name = 'hourly_sales_summary';

    IF NOT FOUND THEN
        INSERT INTO summary_watermark (table_name, last_processed_id, last_processed_time)
        VALUES ('hourly_sales_summary', v_new_max_id, CURRENT_TIMESTAMP);
    END IF;

    COMMIT;
END;
$$;

-- Run every 5 minutes
SELECT cron.schedule('incremental-hourly-sales', '*/5 * * * *',
    $$CALL incremental_update_hourly_sales()$$);
```

| Aspect | Full Rebuild | Incremental Update |
|---|---|---|
| Complexity | Simple, straightforward SQL | Complex, requires watermarking and upsert logic |
| Performance | May be slow for large datasets | Fast, processes only new data |
| Correctness | Always accurate (recalculates everything) | May drift if logic has bugs; requires periodic full rebuild |
| Resource usage | Heavy during rebuild window | Spread over time, lighter per execution |
| Freshness | Stale until next rebuild | Near-real-time (depends on frequency) |
| Handling deletes/updates | Automatically included | Requires additional logic for corrections |
Most production systems use both: incremental updates for recent data (last few days/weeks) for near-real-time freshness, plus periodic full rebuilds for older data to correct any drift and ensure accuracy. This combines the best of both approaches.
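A minimal sketch of that watermark-plus-rebuild loop, using SQLite via Python's `sqlite3` (all names invented for the demo; in production this logic lives in scheduled stored procedures like the ones above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, day TEXT, amount REAL);
    CREATE TABLE daily_summary (day TEXT PRIMARY KEY, revenue REAL, orders INTEGER);
    CREATE TABLE watermark (name TEXT PRIMARY KEY, last_id INTEGER);
""")

def incremental_update(conn):
    """Fold only rows past the watermark into the summary (frequent, cheap)."""
    row = conn.execute("SELECT last_id FROM watermark WHERE name = 'daily'").fetchone()
    last_id = row[0] if row else 0
    new_max = conn.execute("SELECT MAX(order_id) FROM orders").fetchone()[0] or 0
    if new_max <= last_id:
        return  # nothing new since the last run
    conn.execute("""
        INSERT INTO daily_summary
        SELECT day, SUM(amount), COUNT(*) FROM orders
        WHERE order_id > ? AND order_id <= ? GROUP BY day
        ON CONFLICT(day) DO UPDATE SET
            revenue = revenue + excluded.revenue,
            orders  = orders  + excluded.orders""", (last_id, new_max))
    conn.execute("""
        INSERT INTO watermark VALUES ('daily', ?)
        ON CONFLICT(name) DO UPDATE SET last_id = excluded.last_id""", (new_max,))

def full_rebuild(conn):
    """Periodic recompute from source corrects any incremental drift (rare, heavy)."""
    conn.execute("DELETE FROM daily_summary")
    conn.execute("INSERT INTO daily_summary "
                 "SELECT day, SUM(amount), COUNT(*) FROM orders GROUP BY day")
    new_max = conn.execute("SELECT COALESCE(MAX(order_id), 0) FROM orders").fetchone()[0]
    conn.execute("""
        INSERT INTO watermark VALUES ('daily', ?)
        ON CONFLICT(name) DO UPDATE SET last_id = excluded.last_id""", (new_max,))

conn.executemany("INSERT INTO orders (day, amount) VALUES (?, ?)",
                 [("2024-06-01", 10.0), ("2024-06-01", 20.0)])
incremental_update(conn)   # first batch: 2 orders, 30.0
conn.execute("INSERT INTO orders (day, amount) VALUES ('2024-06-01', 5.0)")
incremental_update(conn)   # second batch folds in the new 5.0 order
rows = conn.execute("SELECT * FROM daily_summary").fetchall()
print(rows)  # [('2024-06-01', 35.0, 3)]
```

The incremental path keeps the summary fresh between runs; `full_rebuild` produces the identical result from scratch, which is exactly why it can serve as the periodic correction pass.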
Efficient analytics systems maintain summary tables at multiple granularity levels, with coarser tables aggregated from finer ones rather than from source data. This roll-up hierarchy reduces computation and ensures consistency.
Example hierarchy:
Source: order_items (billions of rows)
↓ aggregate ↓
Level 1: hourly_sales_summary (millions of rows)
↓ roll up ↓
Level 2: daily_sales_summary (hundreds of thousands)
↓ roll up ↓
Level 3: monthly_sales_summary (thousands)
↓ roll up ↓
Level 4: yearly_sales_summary (tens of rows)
```sql
-- Roll-up Hierarchy Implementation

-- Level 1: Hourly summary (built from source)
CREATE TABLE hourly_sales (
    summary_hour TIMESTAMP NOT NULL,
    store_id INT NOT NULL,
    category_id INT NOT NULL,
    revenue DECIMAL(12, 2) NOT NULL,
    quantity INT NOT NULL,
    orders INT NOT NULL,
    PRIMARY KEY (summary_hour, store_id, category_id)
);

-- Level 2: Daily summary (rolled up from hourly)
CREATE TABLE daily_sales (
    summary_date DATE NOT NULL,
    store_id INT NOT NULL,
    category_id INT NOT NULL,
    revenue DECIMAL(14, 2) NOT NULL,
    quantity INT NOT NULL,
    orders INT NOT NULL,
    -- Additional daily-level metrics
    peak_hour INT,  -- Hour with highest revenue
    PRIMARY KEY (summary_date, store_id, category_id)
);

-- Level 3: Monthly summary (rolled up from daily)
CREATE TABLE monthly_sales (
    summary_month DATE NOT NULL,  -- First of month
    store_id INT NOT NULL,
    category_id INT NOT NULL,
    revenue DECIMAL(16, 2) NOT NULL,
    quantity INT NOT NULL,
    orders INT NOT NULL,
    -- Additional monthly-level metrics
    operating_days INT,  -- Days with >0 revenue
    avg_daily_revenue DECIMAL(12, 2),
    PRIMARY KEY (summary_month, store_id, category_id)
);

-- Roll-up from hourly to daily
CREATE OR REPLACE PROCEDURE rollup_hourly_to_daily(p_date DATE)
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO daily_sales (
        summary_date, store_id, category_id,
        revenue, quantity, orders, peak_hour
    )
    SELECT
        p_date,
        store_id,
        category_id,
        SUM(revenue),
        SUM(quantity),
        SUM(orders),
        -- Peak hour requires a correlated lookup on the hourly data
        (SELECT EXTRACT(HOUR FROM summary_hour)
         FROM hourly_sales h2
         WHERE h2.store_id = hourly_sales.store_id
           AND h2.category_id = hourly_sales.category_id
           AND DATE(h2.summary_hour) = p_date
         ORDER BY h2.revenue DESC
         LIMIT 1) AS peak_hour
    FROM hourly_sales
    WHERE DATE(summary_hour) = p_date
    GROUP BY store_id, category_id
    ON CONFLICT (summary_date, store_id, category_id) DO UPDATE SET
        revenue = EXCLUDED.revenue,
        quantity = EXCLUDED.quantity,
        orders = EXCLUDED.orders,
        peak_hour = EXCLUDED.peak_hour;
END;
$$;

-- Roll-up from daily to monthly
CREATE OR REPLACE PROCEDURE rollup_daily_to_monthly(p_month DATE)
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO monthly_sales (
        summary_month, store_id, category_id,
        revenue, quantity, orders, operating_days, avg_daily_revenue
    )
    SELECT
        DATE_TRUNC('month', p_month),
        store_id,
        category_id,
        SUM(revenue),
        SUM(quantity),
        SUM(orders),
        COUNT(*) AS operating_days,  -- Days with data
        AVG(revenue) AS avg_daily_revenue
    FROM daily_sales
    WHERE summary_date >= DATE_TRUNC('month', p_month)
      AND summary_date < DATE_TRUNC('month', p_month) + INTERVAL '1 month'
    GROUP BY store_id, category_id
    ON CONFLICT (summary_month, store_id, category_id) DO UPDATE SET
        revenue = EXCLUDED.revenue,
        quantity = EXCLUDED.quantity,
        orders = EXCLUDED.orders,
        operating_days = EXCLUDED.operating_days,
        avg_daily_revenue = EXCLUDED.avg_daily_revenue;
END;
$$;

-- Schedule the cascade: hourly → daily → monthly
-- Hourly → daily: runs every hour to fold in the just-completed hour
SELECT cron.schedule('hourly-to-daily', '10 * * * *',  -- 10 min past each hour
    $$CALL rollup_hourly_to_daily(CURRENT_DATE)$$);

-- Daily → monthly: runs at 3 AM to refresh the current month
SELECT cron.schedule('daily-to-monthly', '0 3 * * *',
    $$CALL rollup_daily_to_monthly(CURRENT_DATE)$$);
```

Not all metrics can be rolled up by simple summation. AVG, COUNT DISTINCT, percentiles, and other non-additive measures require storing intermediate components (sums and counts, distinct sets) or recalculating from source. Design your schema with roll-up compatibility in mind.
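The classic trap here is averaging averages. A few lines of plain Python arithmetic show why a summary must carry the additive components (revenue sum and order count) rather than only a derived average:

```python
# Two days for one store: (total_revenue, order_count)
day1 = (1000.0, 10)   # average order value 100.00
day2 = (300.0, 2)     # average order value 150.00

# WRONG roll-up: averaging the stored daily averages weights both days equally.
avg_of_avgs = ((day1[0] / day1[1]) + (day2[0] / day2[1])) / 2
print(avg_of_avgs)            # 125.0 -- overstates the true figure

# RIGHT roll-up: sum the additive components, derive the ratio at the end.
monthly_avg = (day1[0] + day2[0]) / (day1[1] + day2[1])
print(round(monthly_avg, 2))  # 108.33 -- 1300.0 revenue over 12 orders
```

This is exactly why the anatomy example earlier stores `total_revenue` and `order_count` and derives `avg_order_value` from them, instead of persisting the average alone.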
Let's examine a complete summary table architecture for a SaaS analytics platform that serves executive dashboards, operational reports, and customer-facing analytics.
Each audience has different freshness and granularity requirements, so the architecture layers summaries from near-real-time buckets up to monthly executive roll-ups:
```sql
-- SaaS Analytics Platform: Summary Table Architecture

-- =====================================================
-- LEVEL 1: Near-Real-Time Summaries (5-minute lag)
-- =====================================================

-- Active usage tracking
CREATE TABLE realtime_usage_summary (
    bucket_time TIMESTAMP NOT NULL,  -- 5-minute buckets
    tenant_id UUID NOT NULL,
    feature_category VARCHAR(50) NOT NULL,
    active_users INT NOT NULL,
    api_calls INT NOT NULL,
    error_count INT NOT NULL,
    avg_response_ms DECIMAL(8, 2),
    p99_response_ms DECIMAL(8, 2),
    PRIMARY KEY (bucket_time, tenant_id, feature_category)
);

-- =====================================================
-- LEVEL 2: Hourly Operational Summaries
-- =====================================================

CREATE TABLE hourly_tenant_metrics (
    summary_hour TIMESTAMP NOT NULL,
    tenant_id UUID NOT NULL,
    -- Denormalized tenant info
    tenant_name VARCHAR(200) NOT NULL,
    tenant_plan VARCHAR(50) NOT NULL,
    tenant_region VARCHAR(50) NOT NULL,
    -- Usage metrics
    active_users INT NOT NULL,
    new_users INT NOT NULL,
    api_calls INT NOT NULL,
    data_processed_mb DECIMAL(12, 2) NOT NULL,
    -- Revenue metrics (for billing)
    billable_units INT NOT NULL,
    estimated_charges DECIMAL(10, 2) NOT NULL,
    -- Performance metrics
    avg_response_ms DECIMAL(8, 2),
    error_rate DECIMAL(6, 4),
    PRIMARY KEY (summary_hour, tenant_id)
);

-- =====================================================
-- LEVEL 3: Daily Analytics Summaries
-- =====================================================

CREATE TABLE daily_business_metrics (
    summary_date DATE NOT NULL,
    -- Overall platform metrics
    total_tenants INT NOT NULL,
    active_tenants INT NOT NULL,  -- At least 1 API call
    new_tenants INT NOT NULL,
    churned_tenants INT NOT NULL,
    total_users INT NOT NULL,
    active_users INT NOT NULL,
    new_users INT NOT NULL,
    total_revenue DECIMAL(14, 2) NOT NULL,
    mrr DECIMAL(14, 2) NOT NULL,  -- Monthly Recurring Revenue (prorated daily)
    -- Engagement metrics
    avg_api_calls_per_tenant DECIMAL(10, 2),
    avg_users_per_tenant DECIMAL(8, 2),
    -- Comparisons (pre-computed)
    prev_day_revenue DECIMAL(14, 2),
    prev_week_revenue DECIMAL(14, 2),
    prev_month_revenue DECIMAL(14, 2),
    prev_year_revenue DECIMAL(14, 2),
    dod_growth DECIMAL(8, 4),
    wow_growth DECIMAL(8, 4),
    mom_growth DECIMAL(8, 4),
    yoy_growth DECIMAL(8, 4),
    PRIMARY KEY (summary_date)
);

CREATE TABLE daily_plan_breakdown (
    summary_date DATE NOT NULL,
    plan_name VARCHAR(50) NOT NULL,
    tenant_count INT NOT NULL,
    user_count INT NOT NULL,
    revenue DECIMAL(12, 2) NOT NULL,
    -- Plan-specific retention
    churn_count INT NOT NULL,
    expansion_count INT NOT NULL,    -- Upgraded from this plan
    contraction_count INT NOT NULL,  -- Downgraded to this plan
    PRIMARY KEY (summary_date, plan_name)
);

-- =====================================================
-- LEVEL 4: Monthly Executive Summaries
-- =====================================================

CREATE TABLE monthly_executive_dashboard (
    summary_month DATE NOT NULL,  -- First of month
    -- Revenue
    mrr_start DECIMAL(14, 2) NOT NULL,
    mrr_end DECIMAL(14, 2) NOT NULL,
    new_mrr DECIMAL(12, 2) NOT NULL,
    expansion_mrr DECIMAL(12, 2) NOT NULL,
    contraction_mrr DECIMAL(12, 2) NOT NULL,
    churn_mrr DECIMAL(12, 2) NOT NULL,
    net_mrr_change DECIMAL(12, 2) GENERATED ALWAYS AS
        (new_mrr + expansion_mrr - contraction_mrr - churn_mrr) STORED,
    -- Growth rates
    mrr_growth_rate DECIMAL(8, 4),
    arr DECIMAL(16, 2) NOT NULL,  -- Annual Recurring Revenue
    -- Customers
    customers_start INT NOT NULL,
    customers_end INT NOT NULL,
    new_customers INT NOT NULL,
    churned_customers INT NOT NULL,
    churn_rate DECIMAL(6, 4),
    net_retention_rate DECIMAL(8, 4),  -- Dollar retention
    -- Engagement
    avg_dau INT,  -- Daily Active Users (average)
    avg_mau INT,  -- Monthly Active Users
    dau_mau_ratio DECIMAL(4, 2),  -- Stickiness
    -- Efficiency
    ltv DECIMAL(10, 2),  -- Lifetime Value
    cac DECIMAL(10, 2),  -- Customer Acquisition Cost
    ltv_cac_ratio DECIMAL(6, 2),
    PRIMARY KEY (summary_month)
);

-- =====================================================
-- COHORT ANALYSIS TABLES
-- =====================================================

CREATE TABLE monthly_cohort_metrics (
    cohort_month DATE NOT NULL,     -- When customer signed up
    activity_month DATE NOT NULL,   -- Month being measured
    cohort_size INT NOT NULL,       -- Original cohort count
    active_customers INT NOT NULL,  -- Still active this month
    revenue DECIMAL(14, 2) NOT NULL,
    cumulative_revenue DECIMAL(16, 2) NOT NULL,
    retention_rate DECIMAL(6, 4) GENERATED ALWAYS AS
        (active_customers::DECIMAL / NULLIF(cohort_size, 0)) STORED,
    ltv_to_date DECIMAL(10, 2) GENERATED ALWAYS AS
        (cumulative_revenue / NULLIF(cohort_size, 0)) STORED,
    PRIMARY KEY (cohort_month, activity_month)
);
```

This architecture serves a dashboard with 20+ widgets in under 100ms total. Each widget queries a single summary table with simple WHERE clauses—no joins, no aggregations at query time. The executive monthly summary queries return in <5ms, enabling real-time interaction even for complex SaaS metrics.
Summary tables are the culmination of denormalization techniques, purpose-built for analytics performance. When designed correctly, they transform complex analytical queries from seconds or minutes to milliseconds.
Module Complete:
You have now mastered the five core denormalization techniques: derived columns, pre-computed aggregates, duplicated dimension data, merged attributes, and summary tables.
With these techniques, you can strategically introduce redundancy to optimize database performance while understanding and managing the trade-offs involved.
Congratulations! You've completed the Denormalization Techniques module. You now have the knowledge to apply strategic denormalization across all common scenarios—from simple derived columns to comprehensive summary table architectures. Remember: denormalization is a tool, not a goal. Use it where the performance benefits justify the maintenance complexity.