Dimensional modeling is more than a schema pattern—it's a complete methodology for translating business analytic requirements into physical database structures. Pioneered by Ralph Kimball and popularized in The Data Warehouse Toolkit, dimensional modeling has become the dominant approach for designing data warehouses and analytical systems.
Where the star schema defines the structure, dimensional modeling provides the process: how to interview business users, identify measurements and context, design for historical tracking, and build scalable architectures that integrate across the enterprise. Mastering this methodology transforms you from someone who understands data structures to someone who can design complete analytical solutions.
By the end of this page, you will understand the complete dimensional modeling methodology. You'll learn the four-step design process, master slowly changing dimension techniques, and understand bus architecture principles for enterprise-scale integration. This knowledge enables you to lead analytical data warehouse design projects.
Dimensional modeling operates from a fundamentally different philosophy than traditional database design. Understanding this philosophy is essential for proper application.
Business Process Orientation:
Dimensional modeling starts with business processes, not data entities. Traditional ER modeling asks "What entities exist and how do they relate?" Dimensional modeling asks "What business processes do we want to analyze?"
Examples of business processes (these reappear in the bus matrix later in this page):

- Retail sales transactions
- Inventory movements
- Order fulfillment
- Product returns
- Customer support interactions
- Marketing campaign responses
Each business process generates measurable events—these become fact tables.
| Aspect | ER Modeling (OLTP) | Dimensional Modeling (OLAP) |
|---|---|---|
| Starting Point | Business entities (Customer, Product, Order) | Business processes (Selling, Shipping, Supporting) |
| Design Goal | Data integrity in transactions | Query performance and usability |
| Schema Form | Normalized (3NF, BCNF) | Denormalized (Star Schema) |
| Primary Users | Application developers | Business analysts, executives |
| Query Pattern | Many simple queries | Few complex analytical queries |
| Data Freshness | Real-time current state | Historical snapshots over time |
| Success Metric | Transaction throughput | Query response time, user adoption |
Core Principles:

- Design for the business user's understanding, not for the machine's convenience
- Optimize for query performance and usability rather than update efficiency
- Favor simple, predictable star structures that BI tools and analysts can exploit
Bill Inmon advocated normalized enterprise data warehouses (corporate information factory) while Kimball advocated dimensional models. Today, most practitioners use Kimball's dimensional approach for analytical consumption while using Inmon-style normalized staging areas. The 'data lakehouse' pattern combines both philosophies.
Kimball's dimensional modeling methodology defines a rigorous four-step process for designing each fact table. Following this process ensures consistent, well-designed dimensional models.
Step 1: Select the Business Process
Identify the operational business process to model. This should be a measurable event that generates data:
Good business processes have:

- Measurable events that generate numeric facts
- Identifiable source systems that capture those events
- A clear business owner who can validate the requirements
```sql
-- FOUR-STEP DESIGN PROCESS EXAMPLE
-- Business Process: E-Commerce Order Fulfillment

-- ============================================
-- STEP 1: SELECT THE BUSINESS PROCESS
-- ============================================
-- Process: Order fulfillment
-- Events: Order placed → Order shipped → Order delivered
-- Sources: Order Management System, Shipping System
-- Owner: Operations Team

-- ============================================
-- STEP 2: DECLARE THE GRAIN
-- ============================================
-- Grain options:
-- a) One row per order (order header level)
-- b) One row per order line item (most detail)
-- c) One row per shipment
-- d) One row per order lifecycle stage

-- Selected Grain: One row per ORDER LINE ITEM
-- Rationale:
-- - Enables product-level analysis
-- - Can aggregate to order level
-- - Captures individual item metrics

-- ============================================
-- STEP 3: IDENTIFY THE DIMENSIONS
-- ============================================
-- Who, What, When, Where, Why, How?
-- - WHEN: Order Date, Ship Date, Delivery Date (date dimension)
-- - WHO: Customer (customer dimension)
-- - WHAT: Product purchased (product dimension)
-- - WHERE: Shipping destination (geography dimension)
-- - WHERE: Fulfillment warehouse (warehouse dimension)
-- - HOW: Shipping method (shipping dimension)
-- - WHY: Promotion applied (promotion dimension)

-- ============================================
-- STEP 4: IDENTIFY THE FACTS
-- ============================================
-- Measurements at the grain:
-- - Quantity ordered (additive)
-- - Unit price at time of sale (non-additive)
-- - Extended price = qty × unit price (additive)
-- - Discount amount (additive)
-- - Net revenue (additive)
-- - Unit cost (non-additive)
-- - Gross profit (additive)
-- - Shipping cost allocated (additive)

-- RESULTING FACT TABLE DESIGN:
CREATE TABLE fact_order_line (
    -- Step 2: Grain = one row per order line item
    order_line_key        BIGINT PRIMARY KEY,

    -- Step 3: Dimension keys
    order_date_key        BIGINT NOT NULL REFERENCES dim_date(date_key),
    ship_date_key         BIGINT REFERENCES dim_date(date_key),
    delivery_date_key     BIGINT REFERENCES dim_date(date_key),
    customer_key          BIGINT NOT NULL REFERENCES dim_customer(customer_key),
    product_key           BIGINT NOT NULL REFERENCES dim_product(product_key),
    ship_to_geography_key BIGINT NOT NULL REFERENCES dim_geography(geography_key),
    warehouse_key         BIGINT NOT NULL REFERENCES dim_warehouse(warehouse_key),
    shipping_method_key   BIGINT NOT NULL REFERENCES dim_shipping(shipping_key),
    promotion_key         BIGINT NOT NULL REFERENCES dim_promotion(promotion_key),

    -- Degenerate dimension
    order_number          VARCHAR(50) NOT NULL,
    line_number           INT NOT NULL,

    -- Step 4: Fact measures
    quantity              INT NOT NULL,
    unit_price            DECIMAL(10,2) NOT NULL,
    extended_price        DECIMAL(12,2) NOT NULL,
    discount_amount       DECIMAL(10,2) DEFAULT 0,
    net_revenue           DECIMAL(12,2) NOT NULL,
    unit_cost             DECIMAL(10,2),
    gross_profit          DECIMAL(12,2),
    shipping_cost         DECIMAL(10,2)
);
```

Step 2: Declare the Grain
The grain defines what each fact table row represents. This is the most critical decision:
Grain Guidelines:

- Prefer the most atomic grain the source data supports—you can always aggregate up, never down
- State the grain as a business sentence ("one row per order line item")
- Every dimension and every fact must be consistent with the declared grain
Step 3: Identify the Dimensions
Dimensions provide context—the who, what, when, where, why, and how. For each dimension:

- Ask which of those questions it answers about the event
- Verify it is single-valued at the declared grain
- Reuse a conformed dimension if one already exists rather than creating a new one
Step 4: Identify the Facts
Facts are the numeric measurements at the declared grain. For each fact:

- Verify it is true to the grain (exactly one value per row)
- Classify it as additive, semi-additive, or non-additive
- Derive computed facts (extended price, gross profit) in the ETL so every user sees consistent values
If a measurement doesn't make sense at the declared grain, either the measurement is wrong or the grain needs reconsideration. Example: 'Order Total' doesn't belong in a line-item grain fact table—it would repeat for every line in the order, breaking additivity.
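The double-counting trap described above is easy to demonstrate. The following is a minimal sketch using Python's built-in sqlite3 module and hypothetical data: when an order-level total is stored on line-grain rows, any SUM over it silently inflates the result.

```python
import sqlite3

# Hypothetical miniature fact table at line-item grain, illustrating why an
# order-level total must not be stored on line-grain rows: it repeats per line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_order_line (
    order_number TEXT,
    line_number  INTEGER,
    net_revenue  REAL,   -- additive at line grain
    order_total  REAL    -- WRONG grain: repeats on every line of the order
);
INSERT INTO fact_order_line VALUES
    ('ORD-1', 1, 60.0, 100.0),
    ('ORD-1', 2, 40.0, 100.0),   -- order_total repeated for line 2
    ('ORD-2', 1, 50.0,  50.0);
""")

correct = conn.execute("SELECT SUM(net_revenue) FROM fact_order_line").fetchone()[0]
wrong   = conn.execute("SELECT SUM(order_total) FROM fact_order_line").fetchone()[0]
print(correct)  # 150.0 — the line-grain measure sums correctly
print(wrong)    # 250.0 — the repeated order_total double-counts ORD-1
```

The fix is either to drop `order_total` from the line-grain table (it can always be recomputed by grouping on `order_number`) or to allocate it down to the line level.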
Dimension attributes change over time. A customer moves to a new city. A product is recategorized. An employee is promoted. Slowly Changing Dimensions (SCD) define how to handle these changes while preserving analytical accuracy.
The Challenge:
When a customer moves from New York to Chicago:

- Should last year's orders report under New York (where they actually shipped) or Chicago (the customer's current city)?
- Should a "sales by region" trend change retroactively when the dimension is updated?
The answer depends on the business question being asked. SCD techniques provide options.
| Type | Behavior | History Preserved | Use Case |
|---|---|---|---|
| Type 0 | Retain original value forever | Original only | Fixed attributes (birth date, original signup date) |
| Type 1 | Overwrite with new value | None | Corrections, attributes where history doesn't matter |
| Type 2 | Add new row with new value | Full history | Attributes critical for historical accuracy |
| Type 3 | Add column for previous value | Limited (usually 1) | Need current and previous only |
| Type 4 | Separate history table | Full history | Mini-dimensions for rapidly changing attributes |
| Type 6 | Combination of 1, 2, 3 | Full + current | Need historical analysis AND current-state reporting |
```sql
-- SLOWLY CHANGING DIMENSION IMPLEMENTATIONS

-- ============================================
-- TYPE 1: Overwrite (No History)
-- ============================================
-- Customer email changed: just update
UPDATE dim_customer
SET email = 'new.email@domain.com',
    updated_at = CURRENT_TIMESTAMP
WHERE customer_key = 12345;

-- Historical orders will show new email
-- Simple but loses history

-- ============================================
-- TYPE 2: Add New Row (Full History)
-- ============================================
-- Customer moves from NYC to Chicago

-- Step 1: Expire the current record
UPDATE dim_customer
SET expiration_date = CURRENT_DATE - 1,
    is_current = false
WHERE customer_key = 12345
  AND is_current = true;

-- Step 2: Insert new version with new address
INSERT INTO dim_customer (
    customer_key,      -- NEW surrogate key
    customer_id,       -- Same natural key
    customer_name,
    city,              -- New value
    state,
    effective_date,
    expiration_date,
    is_current,
    version_number
)
SELECT
    NEXT VALUE FOR customer_key_seq,
    customer_id,
    customer_name,
    'Chicago',         -- New city
    'IL',              -- New state
    CURRENT_DATE,
    '9999-12-31',
    true,
    version_number + 1
FROM dim_customer
WHERE customer_key = 12345;

-- Historical orders keep old customer_key (NYC)
-- New orders get new customer_key (Chicago)

-- Query: Show sales by customer's HISTORICAL location
SELECT
    c.city,
    SUM(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_customer c ON f.customer_key = c.customer_key
GROUP BY c.city;
-- NYC and Chicago appear separately

-- ============================================
-- TYPE 3: Previous Value Column
-- ============================================
CREATE TABLE dim_customer_type3 (
    customer_key BIGINT PRIMARY KEY,
    customer_id VARCHAR(50),
    customer_name VARCHAR(255),
    -- Current and previous value
    current_city VARCHAR(100),
    previous_city VARCHAR(100),
    city_change_date DATE,
    -- Or for segment changes
    current_segment VARCHAR(50),
    previous_segment VARCHAR(50),
    segment_change_date DATE
);

-- Update when customer moves:
UPDATE dim_customer_type3
SET previous_city = current_city,
    current_city = 'Chicago',
    city_change_date = CURRENT_DATE
WHERE customer_key = 12345;

-- Can analyze: "Sales by current segment" AND "Sales by previous segment"
-- But only one level of history

-- ============================================
-- TYPE 6 (Hybrid): Combination
-- ============================================
CREATE TABLE dim_customer_type6 (
    customer_key BIGINT PRIMARY KEY,
    customer_id VARCHAR(50),
    customer_name VARCHAR(255),
    -- Type 2 fields (historical row versioning)
    historical_city VARCHAR(100),   -- Value when this version was created
    effective_date DATE,
    expiration_date DATE,
    is_current BOOLEAN,
    -- Type 1 field (current value on ALL rows)
    current_city VARCHAR(100),      -- Updated on ALL rows when customer moves
    -- Type 3 field (previous value)
    previous_city VARCHAR(100)
);

-- Benefits:
-- historical_city: Accurate point-in-time analysis
-- current_city: "All orders from currently-Chicago customers"
-- previous_city: "Customers who moved from NYC"
```

Choosing SCD Type:
The right SCD type depends on the attribute and business requirements:
Use Type 1 when:

- The change is a correction of bad data
- The attribute's history has no analytical value

Use Type 2 when:

- The attribute is critical for historically accurate reporting
- Facts must be analyzable in the context that existed when they occurred

Use Type 3 when:

- Only the current and immediately previous values matter
- Users need "before vs. after" comparisons around a single change, such as a territory realignment

Use Type 6 when:

- The same attribute must support both point-in-time historical analysis and current-state reporting
For most analytical warehouses, Type 2 SCD is the default choice for important dimension attributes. The storage overhead is manageable, and it provides the flexibility to answer both historical and current-state questions. Use Type 1 only for attributes where history truly doesn't matter.
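A practical consequence of Type 2 worth seeing end to end is the fact-load lookup: each incoming transaction must be assigned the surrogate key whose validity window contains the transaction date, not simply the current row. The following is a minimal sketch using Python's sqlite3 module, with hypothetical data and the column names used in this chapter's examples:

```python
import sqlite3

# Type 2 dimension with two versions of the same customer (natural key C-100):
# the NYC version, then the Chicago version after a move on 2024-03-15.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key    INTEGER PRIMARY KEY,   -- surrogate key (one per version)
    customer_id     TEXT,                  -- natural key (stable)
    city            TEXT,
    effective_date  TEXT,
    expiration_date TEXT,
    is_current      INTEGER
);
INSERT INTO dim_customer VALUES
    (12345, 'C-100', 'New York', '2020-01-01', '2024-03-14', 0),
    (12399, 'C-100', 'Chicago',  '2024-03-15', '9999-12-31', 1);
""")

def lookup_surrogate_key(customer_id: str, tx_date: str) -> int:
    """Return the surrogate key whose validity window contains tx_date."""
    row = conn.execute(
        """SELECT customer_key FROM dim_customer
           WHERE customer_id = ?
             AND effective_date <= ?
             AND expiration_date >= ?""",
        (customer_id, tx_date, tx_date),
    ).fetchone()
    return row[0]

print(lookup_surrogate_key('C-100', '2023-06-01'))  # 12345 (NYC version)
print(lookup_surrogate_key('C-100', '2024-06-01'))  # 12399 (Chicago version)
```

Because each fact row permanently carries the version-specific surrogate key assigned at load time, historical queries group correctly without any date arithmetic at query time.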
As organizations build multiple dimensional models for different business processes, integration becomes critical. The bus architecture provides a framework for building integrated, enterprise-wide analytical capability.
Conformed Dimensions:
A conformed dimension is a dimension shared across multiple fact tables with identical meaning. When dim_product is used by both fact_sales and fact_inventory, it must have:

- The same surrogate keys for the same products
- The same attribute names and definitions
- The same attribute values and hierarchies
Conformed dimensions enable drill-across queries—joining results from different fact tables.
```sql
-- DRILL-ACROSS QUERY: Compare metrics from different fact tables
-- Only possible with conformed dimensions

-- Compare sales quantity vs returns quantity by product category
WITH sales_by_category AS (
    SELECT
        p.category_name,
        SUM(fs.quantity) AS quantity_sold,
        SUM(fs.net_revenue) AS sales_revenue
    FROM fact_sales fs
    JOIN dim_product p ON fs.product_key = p.product_key
    JOIN dim_date d ON fs.date_key = d.date_key
    WHERE d.calendar_year = 2024
    GROUP BY p.category_name
),
returns_by_category AS (
    SELECT
        p.category_name,
        SUM(fr.quantity) AS quantity_returned,
        SUM(fr.refund_amount) AS refund_amount
    FROM fact_returns fr
    JOIN dim_product p ON fr.product_key = p.product_key   -- SAME dimension
    JOIN dim_date d ON fr.return_date_key = d.date_key     -- SAME dimension
    WHERE d.calendar_year = 2024
    GROUP BY p.category_name
)
SELECT
    COALESCE(s.category_name, r.category_name) AS category,
    s.quantity_sold,
    r.quantity_returned,
    ROUND(r.quantity_returned * 100.0
          / NULLIF(s.quantity_sold, 0), 2) AS return_rate_pct,
    s.sales_revenue,
    r.refund_amount,
    s.sales_revenue - COALESCE(r.refund_amount, 0) AS net_revenue
FROM sales_by_category s
FULL OUTER JOIN returns_by_category r
    ON s.category_name = r.category_name
ORDER BY return_rate_pct DESC;

-- This query is ONLY possible because:
-- 1. dim_product is conformed across fact_sales and fact_returns
-- 2. dim_date is conformed across both fact tables
-- 3. category_name means the same thing in both contexts
```

The Bus Matrix:
The bus matrix is a planning tool that documents which dimensions are used by which business processes/fact tables. It visualizes conformed dimension coverage across the enterprise.
| Business Process | Date | Product | Customer | Store | Employee | Promotion | Shipping |
|---|---|---|---|---|---|---|---|
| Retail Sales | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Inventory | ✓ | ✓ | | ✓ | | | |
| Order Fulfillment | ✓ | ✓ | ✓ | | | ✓ | ✓ |
| Returns | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Customer Support | ✓ | | ✓ | | ✓ | | |
| Marketing Campaigns | ✓ | ✓ | ✓ | | | ✓ | |
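One way to operationalize the matrix is to treat each row as a set of dimension names; the conformed dimensions shared by two processes are then simply the set intersection, and those are the only legal drill-across join keys. The sketch below uses illustrative dimension sets (your organization's actual bus matrix is the source of truth):

```python
# Hypothetical bus matrix: business process -> set of conformed dimensions.
# The sets here are illustrative, not a definitive enterprise design.
bus_matrix = {
    "Retail Sales": {"Date", "Product", "Customer", "Store", "Employee", "Promotion"},
    "Returns":      {"Date", "Product", "Customer", "Store", "Employee"},
    "Inventory":    {"Date", "Product", "Store"},
}

def drill_across_dims(process_a: str, process_b: str) -> set:
    """Dimensions on which the two fact tables can be legally compared."""
    return bus_matrix[process_a] & bus_matrix[process_b]

print(sorted(drill_across_dims("Retail Sales", "Inventory")))
# ['Date', 'Product', 'Store']
```

Checking proposed drill-across reports against this intersection during design review catches invalid comparisons before anyone writes SQL.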
Conformity Requirements:
Identical Dimensions:

- The same physical dimension table (or an exact replica) is shared by every fact table
Subset Conformity:
- dim_product includes all products; fact_inventory may only use active products

Shrunken Dimensions:

- dim_customer for sales includes full demographics
- dim_customer for support might be a shrunken version without purchase history

Conformed dimensions require governance. When marketing wants to add a 'campaign_type' hierarchy to dim_product, every fact table using that dimension must accommodate the change. A data governance team should approve dimension changes with the enterprise-wide impact in mind.
Beyond transaction facts and periodic snapshots, accumulating snapshot fact tables track the lifecycle of a process from start to finish. They're ideal for analyzing process efficiency, bottlenecks, and completion rates.
Use Cases:

- Order fulfillment (order → payment → pick → pack → ship → delivery)
- Any workflow with a defined sequence of milestones and a recognizable end state
Key Characteristics:

- One row per process instance (e.g., one row per order), not one row per event
- Multiple date foreign keys, one per milestone, filled in as the process progresses
- Rows are revisited and updated in place, unlike append-only transaction facts
- Lag measures capture the time spent between milestones
```sql
-- ACCUMULATING SNAPSHOT: Order Fulfillment Pipeline

CREATE TABLE fact_order_fulfillment (
    -- Grain: One row per order
    order_key BIGINT PRIMARY KEY,
    order_number VARCHAR(50) NOT NULL,

    -- Milestone Date Keys (filled as process progresses)
    order_date_key BIGINT NOT NULL REFERENCES dim_date(date_key),
    payment_date_key BIGINT REFERENCES dim_date(date_key),
    warehouse_receive_key BIGINT REFERENCES dim_date(date_key),
    pick_date_key BIGINT REFERENCES dim_date(date_key),
    pack_date_key BIGINT REFERENCES dim_date(date_key),
    ship_date_key BIGINT REFERENCES dim_date(date_key),
    delivery_date_key BIGINT REFERENCES dim_date(date_key),

    -- Other Dimensions
    customer_key BIGINT NOT NULL REFERENCES dim_customer(customer_key),
    ship_to_geography_key BIGINT NOT NULL REFERENCES dim_geography(geography_key),
    shipping_method_key BIGINT REFERENCES dim_shipping(shipping_key),

    -- Current Status
    current_status VARCHAR(50) NOT NULL,

    -- Measures: Lag times between milestones
    payment_lag_days INT,       -- order to payment
    warehouse_lag_days INT,     -- payment to warehouse
    pick_lag_days INT,          -- warehouse to pick
    pack_lag_days INT,          -- pick to pack
    ship_lag_days INT,          -- pack to ship
    delivery_lag_days INT,      -- ship to delivery
    total_lead_time_days INT,   -- order to delivery

    -- Value Measures
    order_amount DECIMAL(12,2),
    shipping_cost DECIMAL(10,2),

    -- Audit
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- LOADING PATTERN: Updates as order progresses
-- NOTE: subtracting YYYYMMDD smart keys, as below, only works within a
-- calendar month; production ETL should compute lags with DATEDIFF on the
-- actual dates.

-- Initial insert when order placed
INSERT INTO fact_order_fulfillment (
    order_key, order_number, order_date_key,
    customer_key, ship_to_geography_key,
    current_status, order_amount
)
VALUES (101, 'ORD-2024-001', 20240115, 5001, 3001, 'Ordered', 299.99);

-- Update when payment received
UPDATE fact_order_fulfillment
SET payment_date_key = 20240116,
    payment_lag_days = 20240116 - order_date_key,
    current_status = 'Paid',
    last_updated = CURRENT_TIMESTAMP
WHERE order_key = 101;

-- Update when shipped
UPDATE fact_order_fulfillment
SET ship_date_key = 20240118,
    ship_lag_days = 20240118 - pack_date_key,
    shipping_method_key = 2,
    current_status = 'Shipped',
    last_updated = CURRENT_TIMESTAMP
WHERE order_key = 101;

-- Update when delivered
UPDATE fact_order_fulfillment
SET delivery_date_key = 20240121,
    delivery_lag_days = 20240121 - ship_date_key,
    total_lead_time_days = 20240121 - order_date_key,
    current_status = 'Delivered',
    last_updated = CURRENT_TIMESTAMP
WHERE order_key = 101;

-- ANALYSIS QUERIES

-- Average time at each pipeline stage
SELECT
    AVG(payment_lag_days) AS avg_payment_days,
    AVG(pick_lag_days) AS avg_pick_days,
    AVG(pack_lag_days) AS avg_pack_days,
    AVG(ship_lag_days) AS avg_ship_days,
    AVG(delivery_lag_days) AS avg_delivery_days,
    AVG(total_lead_time_days) AS avg_total_days
FROM fact_order_fulfillment
WHERE delivery_date_key IS NOT NULL;  -- Completed orders only

-- Identify bottleneck stage by shipping method
SELECT
    sm.shipping_method_name,
    AVG(pick_lag_days) AS avg_pick_days,
    AVG(pack_lag_days) AS avg_pack_days,
    AVG(ship_lag_days) AS avg_ship_days,
    AVG(delivery_lag_days) AS avg_delivery_days
FROM fact_order_fulfillment f
JOIN dim_shipping sm ON f.shipping_method_key = sm.shipping_key
WHERE delivery_date_key IS NOT NULL
GROUP BY sm.shipping_method_name;
```

Like choosing the right SCD type, choosing between accumulating snapshots and transaction facts depends on requirements. Use accumulating snapshots when: (1) you care about process duration and bottlenecks, (2) the process has a defined end state, and (3) updates are infrequent. Use transaction facts when every event matters individually.
Complex business scenarios require specialized dimensional modeling techniques beyond the basics.
Factless Fact Tables:
Some business events have no numeric measures—just the occurrence of an event involving certain dimensions. Examples:
Factless fact tables contain only dimension foreign keys, possibly with a count column of 1.
```sql
-- FACTLESS FACT TABLE: Promotion Coverage

CREATE TABLE fact_promotion_coverage (
    date_key BIGINT NOT NULL REFERENCES dim_date(date_key),
    product_key BIGINT NOT NULL REFERENCES dim_product(product_key),
    store_key BIGINT NOT NULL REFERENCES dim_store(store_key),
    promotion_key BIGINT NOT NULL REFERENCES dim_promotion(promotion_key),
    -- No measures! Just the fact that this product was on this promotion
    -- at this store on this date
    PRIMARY KEY (date_key, product_key, store_key, promotion_key)
);

-- Query: What percentage of products in Electronics were on promotion?
SELECT
    all_p.category_name,
    COUNT(DISTINCT pc.product_key) AS products_on_promo,
    COUNT(DISTINCT all_p.product_key) AS total_products,
    ROUND(COUNT(DISTINCT pc.product_key) * 100.0
          / COUNT(DISTINCT all_p.product_key), 2) AS promo_coverage_pct
FROM dim_product all_p
LEFT JOIN fact_promotion_coverage pc
    ON all_p.product_key = pc.product_key
   -- Date filter lives in the join condition so unpromoted products
   -- survive the LEFT JOIN and still count toward total_products
   AND pc.date_key IN (SELECT date_key FROM dim_date
                       WHERE calendar_month = 'December 2024')
WHERE all_p.category_name = 'Electronics'
GROUP BY all_p.category_name;

-- BRIDGE TABLE: Many-to-Many Relationships

-- Customer with multiple accounts
CREATE TABLE bridge_customer_account (
    customer_key BIGINT REFERENCES dim_customer(customer_key),
    account_key BIGINT REFERENCES dim_account(account_key),
    allocation_factor DECIMAL(5,4) DEFAULT 1.0,  -- Weighting if needed
    is_primary BOOLEAN DEFAULT false,
    PRIMARY KEY (customer_key, account_key)
);

-- Usage in query: allocate revenue to customers
SELECT
    c.customer_name,
    SUM(f.revenue * b.allocation_factor) AS allocated_revenue
FROM fact_account_activity f
JOIN bridge_customer_account b ON f.account_key = b.account_key
JOIN dim_customer c ON b.customer_key = c.customer_key
GROUP BY c.customer_name;

-- MULTI-VALUED DIMENSIONS: Using Positional Weighting

-- Product with multiple colors (can't pick just one)
CREATE TABLE bridge_product_color (
    product_key BIGINT REFERENCES dim_product(product_key),
    color_key BIGINT REFERENCES dim_color(color_key),
    color_weight DECIMAL(5,4) DEFAULT 1.0,  -- Sum to 1.0 per product
    PRIMARY KEY (product_key, color_key)
);

-- Query: Sales by color (distributed across multi-color products)
SELECT
    col.color_name,
    SUM(f.revenue * bc.color_weight) AS weighted_revenue
FROM fact_sales f
JOIN bridge_product_color bc ON f.product_key = bc.product_key
JOIN dim_color col ON bc.color_key = col.color_key
GROUP BY col.color_name;
```

Every advanced technique adds complexity. Bridge tables require weighted aggregation. Factless facts require different query patterns. Use advanced techniques only when the basic star schema genuinely can't handle the requirement. Premature optimization of the model structure is just as dangerous as premature code optimization.
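The safety condition behind bridge-table weighting can be checked directly: as long as each product's weights sum to 1.0, distributing revenue across colors preserves the grand total. The following is a minimal sketch using Python's sqlite3 module with hypothetical data:

```python
import sqlite3

# Hypothetical data demonstrating the bridge-table invariant: per-product
# weights summing to 1.0 mean the weighted roll-up neither inflates nor
# loses revenue relative to the raw fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (product_key INTEGER, revenue REAL);
CREATE TABLE bridge_product_color (
    product_key INTEGER, color_key INTEGER, color_weight REAL
);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0);
-- Product 1 is two-tone (60/40 split); product 2 is a single color
INSERT INTO bridge_product_color VALUES
    (1, 10, 0.6), (1, 20, 0.4), (2, 10, 1.0);
""")

total = conn.execute("SELECT SUM(revenue) FROM fact_sales").fetchone()[0]
weighted_total = conn.execute("""
    SELECT SUM(f.revenue * b.color_weight)
    FROM fact_sales f
    JOIN bridge_product_color b ON f.product_key = b.product_key
""").fetchone()[0]
print(total, weighted_total)  # both 150.0 — totals are preserved
```

An ETL-time check that each product's weights sum to 1.0 (and that every fact product has bridge rows) prevents the silent over- or under-counting that broken bridge tables cause.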
Dimensional modeling provides the methodology that transforms business requirements into analytical database designs. Let's consolidate the key principles:

- Start from business processes, not data entities
- Declare the grain first, and keep every dimension and fact true to it
- Surround facts with conformed dimensions so models integrate across the enterprise
- Choose SCD techniques attribute by attribute, based on how history must be preserved
- Reach for advanced patterns (accumulating snapshots, factless facts, bridge tables) only when the basic star schema can't express the requirement
What's Next:
With OLTP normalization, OLAP denormalization, star schema patterns, and dimensional modeling methodology covered, we're prepared to explore hybrid approaches. Real-world systems increasingly require both transactional and analytical capabilities—HTAP (Hybrid Transaction/Analytical Processing) systems, lambda architectures, and real-time analytics platforms bridge the OLTP/OLAP divide.
The next page examines these hybrid strategies, helping you design systems that serve both operational and analytical needs.
You now understand dimensional modeling methodology—the complete process for designing analytical data warehouses. From the four-step design process through slowly changing dimensions to bus architecture and advanced patterns, you have the tools to design enterprise-scale analytical solutions. Next, we'll explore hybrid approaches that combine OLTP and OLAP characteristics.