Dimensional modeling is more than a schema pattern—it's a complete methodology for translating business analytic requirements into physical database structures. Pioneered by Ralph Kimball and popularized in The Data Warehouse Toolkit, dimensional modeling has become the dominant approach for designing data warehouses and analytical systems.
Where the star schema defines the structure, dimensional modeling provides the process: how to interview business users, identify measurements and context, design for historical tracking, and build scalable architectures that integrate across the enterprise. Mastering this methodology transforms you from someone who understands data structures to someone who can design complete analytical solutions.
By the end of this page, you will understand the complete dimensional modeling methodology. You'll learn the four-step design process, master slowly changing dimension techniques, and understand bus architecture principles for enterprise-scale integration. This knowledge enables you to lead analytical data warehouse design projects.
Dimensional modeling operates from a fundamentally different philosophy than traditional database design. Understanding this philosophy is essential for proper application.
Business Process Orientation:
Dimensional modeling starts with business processes, not data entities. Traditional ER modeling asks "What entities exist and how do they relate?" Dimensional modeling asks "What business processes do we want to analyze?"
Examples of business processes (these reappear in the bus matrix later in this page):

- Retail sales transactions
- Inventory movements
- Order fulfillment
- Product returns
- Customer support interactions
- Marketing campaign responses
Each business process generates measurable events—these become fact tables.
| Aspect | ER Modeling (OLTP) | Dimensional Modeling (OLAP) |
|---|---|---|
| Starting Point | Business entities (Customer, Product, Order) | Business processes (Selling, Shipping, Supporting) |
| Design Goal | Data integrity in transactions | Query performance and usability |
| Schema Form | Normalized (3NF, BCNF) | Denormalized (Star Schema) |
| Primary Users | Application developers | Business analysts, executives |
| Query Pattern | Many simple queries | Few complex analytical queries |
| Data Freshness | Real-time current state | Historical snapshots over time |
| Success Metric | Transaction throughput | Query response time, user adoption |
Core Principles:

- Design for the business user's understanding, not for the machine's convenience
- Optimize for query performance and usability rather than update efficiency
- Favor simple, predictable star structures that BI tools and analysts can exploit
Bill Inmon advocated normalized enterprise data warehouses (corporate information factory) while Kimball advocated dimensional models. Today, most practitioners use Kimball's dimensional approach for analytical consumption while using Inmon-style normalized staging areas. The 'data lakehouse' pattern combines both philosophies.
Kimball's dimensional modeling methodology defines a rigorous four-step process for designing each fact table. Following this process ensures consistent, well-designed dimensional models.
Step 1: Select the Business Process
Identify the operational business process to model. This should be a measurable event that generates data:
Good business processes have:

- Measurable events that generate numeric facts
- Identifiable source systems that capture those events
- A clear business owner who can validate the requirements
```sql
-- FOUR-STEP DESIGN PROCESS EXAMPLE
-- Business Process: E-Commerce Order Fulfillment

-- ============================================
-- STEP 1: SELECT THE BUSINESS PROCESS
-- ============================================
-- Process: Order fulfillment
-- Events: Order placed → Order shipped → Order delivered
-- Sources: Order Management System, Shipping System
-- Owner: Operations Team

-- ============================================
-- STEP 2: DECLARE THE GRAIN
-- ============================================
-- Grain options:
-- a) One row per order (order header level)
-- b) One row per order line item (most detail)
-- c) One row per shipment
-- d) One row per order lifecycle stage

-- Selected Grain: One row per ORDER LINE ITEM
-- Rationale:
-- - Enables product-level analysis
-- - Can aggregate to order level
-- - Captures individual item metrics

-- ============================================
-- STEP 3: IDENTIFY THE DIMENSIONS
-- ============================================
-- Who, What, When, Where, Why, How?
-- - WHEN: Order Date, Ship Date, Delivery Date (date dimension)
-- - WHO: Customer (customer dimension)
-- - WHAT: Product purchased (product dimension)
-- - WHERE: Shipping destination (geography dimension)
-- - WHERE: Fulfillment warehouse (warehouse dimension)
-- - HOW: Shipping method (shipping dimension)
-- - WHY: Promotion applied (promotion dimension)

-- ============================================
-- STEP 4: IDENTIFY THE FACTS
-- ============================================
-- Measurements at the grain:
-- - Quantity ordered (additive)
-- - Unit price at time of sale (non-additive)
-- - Extended price = qty × unit price (additive)
-- - Discount amount (additive)
-- - Net revenue (additive)
-- - Unit cost (non-additive)
-- - Gross profit (additive)
-- - Shipping cost allocated (additive)

-- RESULTING FACT TABLE DESIGN:
CREATE TABLE fact_order_line (
    -- Step 2: Grain = one row per order line item
    order_line_key        BIGINT PRIMARY KEY,

    -- Step 3: Dimension keys
    order_date_key        BIGINT NOT NULL REFERENCES dim_date(date_key),
    ship_date_key         BIGINT REFERENCES dim_date(date_key),
    delivery_date_key     BIGINT REFERENCES dim_date(date_key),
    customer_key          BIGINT NOT NULL REFERENCES dim_customer(customer_key),
    product_key           BIGINT NOT NULL REFERENCES dim_product(product_key),
    ship_to_geography_key BIGINT NOT NULL REFERENCES dim_geography(geography_key),
    warehouse_key         BIGINT NOT NULL REFERENCES dim_warehouse(warehouse_key),
    shipping_method_key   BIGINT NOT NULL REFERENCES dim_shipping(shipping_key),
    promotion_key         BIGINT NOT NULL REFERENCES dim_promotion(promotion_key),

    -- Degenerate dimension
    order_number          VARCHAR(50) NOT NULL,
    line_number           INT NOT NULL,

    -- Step 4: Fact measures
    quantity              INT NOT NULL,
    unit_price            DECIMAL(10,2) NOT NULL,
    extended_price        DECIMAL(12,2) NOT NULL,
    discount_amount       DECIMAL(10,2) DEFAULT 0,
    net_revenue           DECIMAL(12,2) NOT NULL,
    unit_cost             DECIMAL(10,2),
    gross_profit          DECIMAL(12,2),
    shipping_cost         DECIMAL(10,2)
);
```

Step 2: Declare the Grain
The grain defines what each fact table row represents. This is the most critical decision:
Grain Guidelines:

- Prefer the most atomic grain the source data supports—you can always aggregate up, never down
- State the grain as a business sentence ("one row per order line item")
- Every dimension and every fact must be consistent with the declared grain
Step 3: Identify the Dimensions
Dimensions provide context—the who, what, when, where, why, and how. For each dimension:

- Ask which of those questions it answers about the event
- Verify it is single-valued at the declared grain
- Reuse a conformed dimension if one already exists rather than creating a new one
Step 4: Identify the Facts
Facts are the numeric measurements at the declared grain. For each fact:

- Verify it is true to the grain (exactly one value per row)
- Classify it as additive, semi-additive, or non-additive
- Derive computed facts (extended price, gross profit) in the ETL so every user sees consistent values
If a measurement doesn't make sense at the declared grain, either the measurement is wrong or the grain needs reconsideration. Example: 'Order Total' doesn't belong in a line-item grain fact table—it would repeat for every line in the order, breaking additivity.
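The double-counting trap described above is easy to demonstrate. The following is a minimal sketch using Python's built-in sqlite3 module and hypothetical data: when an order-level total is stored on line-grain rows, any SUM over it silently inflates the result.

```python
import sqlite3

# Hypothetical miniature fact table at line-item grain, illustrating why an
# order-level total must not be stored on line-grain rows: it repeats per line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_order_line (
    order_number TEXT,
    line_number  INTEGER,
    net_revenue  REAL,   -- additive at line grain
    order_total  REAL    -- WRONG grain: repeats on every line of the order
);
INSERT INTO fact_order_line VALUES
    ('ORD-1', 1, 60.0, 100.0),
    ('ORD-1', 2, 40.0, 100.0),   -- order_total repeated for line 2
    ('ORD-2', 1, 50.0,  50.0);
""")

correct = conn.execute("SELECT SUM(net_revenue) FROM fact_order_line").fetchone()[0]
wrong   = conn.execute("SELECT SUM(order_total) FROM fact_order_line").fetchone()[0]
print(correct)  # 150.0 — the line-grain measure sums correctly
print(wrong)    # 250.0 — the repeated order_total double-counts ORD-1
```

The fix is either to drop `order_total` from the line-grain table (it can always be recomputed by grouping on `order_number`) or to allocate it down to the line level.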
Dimension attributes change over time. A customer moves to a new city. A product is recategorized. An employee is promoted. Slowly Changing Dimensions (SCD) define how to handle these changes while preserving analytical accuracy.
The Challenge:
When a customer moves from New York to Chicago:

- Should last year's orders report under New York (where they actually shipped) or Chicago (the customer's current city)?
- Should a "sales by region" trend change retroactively when the dimension is updated?
The answer depends on the business question being asked. SCD techniques provide options.
| Type | Behavior | History Preserved | Use Case |
|---|---|---|---|
| Type 0 | Retain original value forever | Original only | Fixed attributes (birth date, original signup date) |
| Type 1 | Overwrite with new value | None | Corrections, attributes where history doesn't matter |
| Type 2 | Add new row with new value | Full history | Attributes critical for historical accuracy |
| Type 3 | Add column for previous value | Limited (usually 1) | Need current and previous only |
| Type 4 | Separate history table | Full history | Mini-dimensions for rapidly changing attributes |
| Type 6 | Combination of 1, 2, 3 | Full + current | Need historical analysis AND current-state reporting |
```sql
-- SLOWLY CHANGING DIMENSION IMPLEMENTATIONS

-- ============================================
-- TYPE 1: Overwrite (No History)
-- ============================================
-- Customer email changed: just update
UPDATE dim_customer
SET email = 'new.email@domain.com',
    updated_at = CURRENT_TIMESTAMP
WHERE customer_key = 12345;

-- Historical orders will show new email
-- Simple but loses history

-- ============================================
-- TYPE 2: Add New Row (Full History)
-- ============================================
-- Customer moves from NYC to Chicago

-- Step 1: Expire the current record
UPDATE dim_customer
SET expiration_date = CURRENT_DATE - 1,
    is_current = false
WHERE customer_key = 12345
  AND is_current = true;

-- Step 2: Insert new version with new address
INSERT INTO dim_customer (
    customer_key,      -- NEW surrogate key
    customer_id,       -- Same natural key
    customer_name,
    city,              -- New value
    state,
    effective_date,
    expiration_date,
    is_current,
    version_number
)
SELECT
    NEXT VALUE FOR customer_key_seq,
    customer_id,
    customer_name,
    'Chicago',         -- New city
    'IL',              -- New state
    CURRENT_DATE,
    '9999-12-31',
    true,
    version_number + 1
FROM dim_customer
WHERE customer_key = 12345;

-- Historical orders keep old customer_key (NYC)
-- New orders get new customer_key (Chicago)

-- Query: Show sales by customer's HISTORICAL location
SELECT
    c.city,
    SUM(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_customer c ON f.customer_key = c.customer_key
GROUP BY c.city;
-- NYC and Chicago appear separately

-- ============================================
-- TYPE 3: Previous Value Column
-- ============================================
CREATE TABLE dim_customer_type3 (
    customer_key BIGINT PRIMARY KEY,
    customer_id VARCHAR(50),
    customer_name VARCHAR(255),
    -- Current and previous value
    current_city VARCHAR(100),
    previous_city VARCHAR(100),
    city_change_date DATE,
    -- Or for segment changes
    current_segment VARCHAR(50),
    previous_segment VARCHAR(50),
    segment_change_date DATE
);

-- Update when customer moves:
UPDATE dim_customer_type3
SET previous_city = current_city,
    current_city = 'Chicago',
    city_change_date = CURRENT_DATE
WHERE customer_key = 12345;

-- Can analyze: "Sales by current segment" AND "Sales by previous segment"
-- But only one level of history

-- ============================================
-- TYPE 6 (Hybrid): Combination
-- ============================================
CREATE TABLE dim_customer_type6 (
    customer_key BIGINT PRIMARY KEY,
    customer_id VARCHAR(50),
    customer_name VARCHAR(255),
    -- Type 2 fields (historical row versioning)
    historical_city VARCHAR(100),   -- Value when this version was created
    effective_date DATE,
    expiration_date DATE,
    is_current BOOLEAN,
    -- Type 1 field (current value on ALL rows)
    current_city VARCHAR(100),      -- Updated on ALL rows when customer moves
    -- Type 3 field (previous value)
    previous_city VARCHAR(100)
);

-- Benefits:
-- historical_city: Accurate point-in-time analysis
-- current_city: "All orders from currently-Chicago customers"
-- previous_city: "Customers who moved from NYC"
```

Choosing SCD Type:
The right SCD type depends on the attribute and business requirements:
Use Type 1 when:

- The change is a correction of bad data
- The attribute's history has no analytical value

Use Type 2 when:

- The attribute is critical for historically accurate reporting
- Facts must be analyzable in the context that existed when they occurred

Use Type 3 when:

- Only the current and immediately previous values matter
- Users need "before vs. after" comparisons around a single change, such as a territory realignment

Use Type 6 when:

- The same attribute must support both point-in-time historical analysis and current-state reporting
For most analytical warehouses, Type 2 SCD is the default choice for important dimension attributes. The storage overhead is manageable, and it provides the flexibility to answer both historical and current-state questions. Use Type 1 only for attributes where history truly doesn't matter.
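A practical consequence of Type 2 worth seeing end to end is the fact-load lookup: each incoming transaction must be assigned the surrogate key whose validity window contains the transaction date, not simply the current row. The following is a minimal sketch using Python's sqlite3 module, with hypothetical data and the column names used in this chapter's examples:

```python
import sqlite3

# Type 2 dimension with two versions of the same customer (natural key C-100):
# the NYC version, then the Chicago version after a move on 2024-03-15.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key    INTEGER PRIMARY KEY,   -- surrogate key (one per version)
    customer_id     TEXT,                  -- natural key (stable)
    city            TEXT,
    effective_date  TEXT,
    expiration_date TEXT,
    is_current      INTEGER
);
INSERT INTO dim_customer VALUES
    (12345, 'C-100', 'New York', '2020-01-01', '2024-03-14', 0),
    (12399, 'C-100', 'Chicago',  '2024-03-15', '9999-12-31', 1);
""")

def lookup_surrogate_key(customer_id: str, tx_date: str) -> int:
    """Return the surrogate key whose validity window contains tx_date."""
    row = conn.execute(
        """SELECT customer_key FROM dim_customer
           WHERE customer_id = ?
             AND effective_date <= ?
             AND expiration_date >= ?""",
        (customer_id, tx_date, tx_date),
    ).fetchone()
    return row[0]

print(lookup_surrogate_key('C-100', '2023-06-01'))  # 12345 (NYC version)
print(lookup_surrogate_key('C-100', '2024-06-01'))  # 12399 (Chicago version)
```

Because each fact row permanently carries the version-specific surrogate key assigned at load time, historical queries group correctly without any date arithmetic at query time.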
As organizations build multiple dimensional models for different business processes, integration becomes critical. The bus architecture provides a framework for building integrated, enterprise-wide analytical capability.
Conformed Dimensions:
A conformed dimension is a dimension shared across multiple fact tables with identical meaning. When dim_product is used by both fact_sales and fact_inventory, it must have:

- The same surrogate keys for the same products
- The same attribute names and definitions
- The same attribute values and hierarchies
Conformed dimensions enable drill-across queries—joining results from different fact tables.
```sql
-- DRILL-ACROSS QUERY: Compare metrics from different fact tables
-- Only possible with conformed dimensions

-- Compare sales quantity vs returns quantity by product category
WITH sales_by_category AS (
    SELECT
        p.category_name,
        SUM(fs.quantity) AS quantity_sold,
        SUM(fs.net_revenue) AS sales_revenue
    FROM fact_sales fs
    JOIN dim_product p ON fs.product_key = p.product_key
    JOIN dim_date d ON fs.date_key = d.date_key
    WHERE d.calendar_year = 2024
    GROUP BY p.category_name
),
returns_by_category AS (
    SELECT
        p.category_name,
        SUM(fr.quantity) AS quantity_returned,
        SUM(fr.refund_amount) AS refund_amount
    FROM fact_returns fr
    JOIN dim_product p ON fr.product_key = p.product_key   -- SAME dimension
    JOIN dim_date d ON fr.return_date_key = d.date_key     -- SAME dimension
    WHERE d.calendar_year = 2024
    GROUP BY p.category_name
)
SELECT
    COALESCE(s.category_name, r.category_name) AS category,
    s.quantity_sold,
    r.quantity_returned,
    ROUND(r.quantity_returned * 100.0
          / NULLIF(s.quantity_sold, 0), 2) AS return_rate_pct,
    s.sales_revenue,
    r.refund_amount,
    s.sales_revenue - COALESCE(r.refund_amount, 0) AS net_revenue
FROM sales_by_category s
FULL OUTER JOIN returns_by_category r
    ON s.category_name = r.category_name
ORDER BY return_rate_pct DESC;

-- This query is ONLY possible because:
-- 1. dim_product is conformed across fact_sales and fact_returns
-- 2. dim_date is conformed across both fact tables
-- 3. category_name means the same thing in both contexts
```

The Bus Matrix:
The bus matrix is a planning tool that documents which dimensions are used by which business processes/fact tables. It visualizes conformed dimension coverage across the enterprise.
| Business Process | Date | Product | Customer | Store | Employee | Promotion | Shipping |
|---|---|---|---|---|---|---|---|
| Retail Sales | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Inventory | ✓ | ✓ | | ✓ | | | |
| Order Fulfillment | ✓ | ✓ | ✓ | | | ✓ | ✓ |
| Returns | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Customer Support | ✓ | | ✓ | | ✓ | | |
| Marketing Campaigns | ✓ | ✓ | ✓ | | | ✓ | |
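One way to operationalize the matrix is to treat each row as a set of dimension names; the conformed dimensions shared by two processes are then simply the set intersection, and those are the only legal drill-across join keys. The sketch below uses illustrative dimension sets (your organization's actual bus matrix is the source of truth):

```python
# Hypothetical bus matrix: business process -> set of conformed dimensions.
# The sets here are illustrative, not a definitive enterprise design.
bus_matrix = {
    "Retail Sales": {"Date", "Product", "Customer", "Store", "Employee", "Promotion"},
    "Returns":      {"Date", "Product", "Customer", "Store", "Employee"},
    "Inventory":    {"Date", "Product", "Store"},
}

def drill_across_dims(process_a: str, process_b: str) -> set:
    """Dimensions on which the two fact tables can be legally compared."""
    return bus_matrix[process_a] & bus_matrix[process_b]

print(sorted(drill_across_dims("Retail Sales", "Inventory")))
# ['Date', 'Product', 'Store']
```

Checking proposed drill-across reports against this intersection during design review catches invalid comparisons before anyone writes SQL.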
Conformity Requirements:
Identical Dimensions:

- The same physical dimension table (or an exact replica) is shared by every fact table
Subset Conformity:
- dim_product includes all products; fact_inventory may only use active products

Shrunken Dimensions:

- dim_customer for sales includes full demographics
- dim_customer for support might be a shrunken version without purchase history

Conformed dimensions require governance. When marketing wants to add a 'campaign_type' hierarchy to dim_product, every fact table using that dimension must accommodate the change. A data governance team should approve dimension changes with the enterprise-wide impact in mind.
Beyond transaction facts and periodic snapshots, accumulating snapshot fact tables track the lifecycle of a process from start to finish. They're ideal for analyzing process efficiency, bottlenecks, and completion rates.
Use Cases:

- Order fulfillment (order → payment → pick → pack → ship → delivery)
- Any workflow with a defined sequence of milestones and a recognizable end state
Key Characteristics:

- One row per process instance (e.g., one row per order), not one row per event
- Multiple date foreign keys, one per milestone, filled in as the process progresses
- Rows are revisited and updated in place, unlike append-only transaction facts
- Lag measures capture the time spent between milestones
```sql
-- ACCUMULATING SNAPSHOT: Order Fulfillment Pipeline

CREATE TABLE fact_order_fulfillment (
    -- Grain: One row per order
    order_key BIGINT PRIMARY KEY,
    order_number VARCHAR(50) NOT NULL,

    -- Milestone Date Keys (filled as process progresses)
    order_date_key BIGINT NOT NULL REFERENCES dim_date(date_key),
    payment_date_key BIGINT REFERENCES dim_date(date_key),
    warehouse_receive_key BIGINT REFERENCES dim_date(date_key),
    pick_date_key BIGINT REFERENCES dim_date(date_key),
    pack_date_key BIGINT REFERENCES dim_date(date_key),
    ship_date_key BIGINT REFERENCES dim_date(date_key),
    delivery_date_key BIGINT REFERENCES dim_date(date_key),

    -- Other Dimensions
    customer_key BIGINT NOT NULL REFERENCES dim_customer(customer_key),
    ship_to_geography_key BIGINT NOT NULL REFERENCES dim_geography(geography_key),
    shipping_method_key BIGINT REFERENCES dim_shipping(shipping_key),

    -- Current Status
    current_status VARCHAR(50) NOT NULL,

    -- Measures: Lag times between milestones
    payment_lag_days INT,       -- order to payment
    warehouse_lag_days INT,     -- payment to warehouse
    pick_lag_days INT,          -- warehouse to pick
    pack_lag_days INT,          -- pick to pack
    ship_lag_days INT,          -- pack to ship
    delivery_lag_days INT,      -- ship to delivery
    total_lead_time_days INT,   -- order to delivery

    -- Value Measures
    order_amount DECIMAL(12,2),
    shipping_cost DECIMAL(10,2),

    -- Audit
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- LOADING PATTERN: Updates as order progresses
-- NOTE: subtracting YYYYMMDD smart keys, as below, only works within a
-- calendar month; production ETL should compute lags with DATEDIFF on the
-- actual dates.

-- Initial insert when order placed
INSERT INTO fact_order_fulfillment (
    order_key, order_number, order_date_key,
    customer_key, ship_to_geography_key,
    current_status, order_amount
)
VALUES (101, 'ORD-2024-001', 20240115, 5001, 3001, 'Ordered', 299.99);

-- Update when payment received
UPDATE fact_order_fulfillment
SET payment_date_key = 20240116,
    payment_lag_days = 20240116 - order_date_key,
    current_status = 'Paid',
    last_updated = CURRENT_TIMESTAMP
WHERE order_key = 101;

-- Update when shipped
UPDATE fact_order_fulfillment
SET ship_date_key = 20240118,
    ship_lag_days = 20240118 - pack_date_key,
    shipping_method_key = 2,
    current_status = 'Shipped',
    last_updated = CURRENT_TIMESTAMP
WHERE order_key = 101;

-- Update when delivered
UPDATE fact_order_fulfillment
SET delivery_date_key = 20240121,
    delivery_lag_days = 20240121 - ship_date_key,
    total_lead_time_days = 20240121 - order_date_key,
    current_status = 'Delivered',
    last_updated = CURRENT_TIMESTAMP
WHERE order_key = 101;

-- ANALYSIS QUERIES

-- Average time at each pipeline stage
SELECT
    AVG(payment_lag_days) AS avg_payment_days,
    AVG(pick_lag_days) AS avg_pick_days,
    AVG(pack_lag_days) AS avg_pack_days,
    AVG(ship_lag_days) AS avg_ship_days,
    AVG(delivery_lag_days) AS avg_delivery_days,
    AVG(total_lead_time_days) AS avg_total_days
FROM fact_order_fulfillment
WHERE delivery_date_key IS NOT NULL;  -- Completed orders only

-- Identify bottleneck stage by shipping method
SELECT
    sm.shipping_method_name,
    AVG(pick_lag_days) AS avg_pick_days,
    AVG(pack_lag_days) AS avg_pack_days,
    AVG(ship_lag_days) AS avg_ship_days,
    AVG(delivery_lag_days) AS avg_delivery_days
FROM fact_order_fulfillment f
JOIN dim_shipping sm ON f.shipping_method_key = sm.shipping_key
WHERE delivery_date_key IS NOT NULL
GROUP BY sm.shipping_method_name;
```

Like choosing the right SCD type, choosing between accumulating snapshots and transaction facts depends on requirements. Use accumulating snapshots when: (1) you care about process duration and bottlenecks, (2) the process has a defined end state, and (3) updates are infrequent. Use transaction facts when every event matters individually.
Complex business scenarios require specialized dimensional modeling techniques beyond the basics.
Factless Fact Tables:
Some business events have no numeric measures—just the occurrence of an event involving certain dimensions. Examples:
Factless fact tables contain only dimension foreign keys, possibly with a count column of 1.
```sql
-- FACTLESS FACT TABLE: Promotion Coverage

CREATE TABLE fact_promotion_coverage (
    date_key BIGINT NOT NULL REFERENCES dim_date(date_key),
    product_key BIGINT NOT NULL REFERENCES dim_product(product_key),
    store_key BIGINT NOT NULL REFERENCES dim_store(store_key),
    promotion_key BIGINT NOT NULL REFERENCES dim_promotion(promotion_key),
    -- No measures! Just the fact that this product was on this promotion
    -- at this store on this date
    PRIMARY KEY (date_key, product_key, store_key, promotion_key)
);

-- Query: What percentage of products in Electronics were on promotion?
SELECT
    all_p.category_name,
    COUNT(DISTINCT pc.product_key) AS products_on_promo,
    COUNT(DISTINCT all_p.product_key) AS total_products,
    ROUND(COUNT(DISTINCT pc.product_key) * 100.0
          / COUNT(DISTINCT all_p.product_key), 2) AS promo_coverage_pct
FROM dim_product all_p
LEFT JOIN fact_promotion_coverage pc
    ON all_p.product_key = pc.product_key
   -- Date filter lives in the join condition so unpromoted products
   -- survive the LEFT JOIN and still count toward total_products
   AND pc.date_key IN (SELECT date_key FROM dim_date
                       WHERE calendar_month = 'December 2024')
WHERE all_p.category_name = 'Electronics'
GROUP BY all_p.category_name;

-- BRIDGE TABLE: Many-to-Many Relationships

-- Customer with multiple accounts
CREATE TABLE bridge_customer_account (
    customer_key BIGINT REFERENCES dim_customer(customer_key),
    account_key BIGINT REFERENCES dim_account(account_key),
    allocation_factor DECIMAL(5,4) DEFAULT 1.0,  -- Weighting if needed
    is_primary BOOLEAN DEFAULT false,
    PRIMARY KEY (customer_key, account_key)
);

-- Usage in query: allocate revenue to customers
SELECT
    c.customer_name,
    SUM(f.revenue * b.allocation_factor) AS allocated_revenue
FROM fact_account_activity f
JOIN bridge_customer_account b ON f.account_key = b.account_key
JOIN dim_customer c ON b.customer_key = c.customer_key
GROUP BY c.customer_name;

-- MULTI-VALUED DIMENSIONS: Using Positional Weighting

-- Product with multiple colors (can't pick just one)
CREATE TABLE bridge_product_color (
    product_key BIGINT REFERENCES dim_product(product_key),
    color_key BIGINT REFERENCES dim_color(color_key),
    color_weight DECIMAL(5,4) DEFAULT 1.0,  -- Sum to 1.0 per product
    PRIMARY KEY (product_key, color_key)
);

-- Query: Sales by color (distributed across multi-color products)
SELECT
    col.color_name,
    SUM(f.revenue * bc.color_weight) AS weighted_revenue
FROM fact_sales f
JOIN bridge_product_color bc ON f.product_key = bc.product_key
JOIN dim_color col ON bc.color_key = col.color_key
GROUP BY col.color_name;
```

Every advanced technique adds complexity. Bridge tables require weighted aggregation. Factless facts require different query patterns. Use advanced techniques only when the basic star schema genuinely can't handle the requirement. Premature optimization of the model structure is just as dangerous as premature code optimization.
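The safety condition behind bridge-table weighting can be checked directly: as long as each product's weights sum to 1.0, distributing revenue across colors preserves the grand total. The following is a minimal sketch using Python's sqlite3 module with hypothetical data:

```python
import sqlite3

# Hypothetical data demonstrating the bridge-table invariant: per-product
# weights summing to 1.0 mean the weighted roll-up neither inflates nor
# loses revenue relative to the raw fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (product_key INTEGER, revenue REAL);
CREATE TABLE bridge_product_color (
    product_key INTEGER, color_key INTEGER, color_weight REAL
);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0);
-- Product 1 is two-tone (60/40 split); product 2 is a single color
INSERT INTO bridge_product_color VALUES
    (1, 10, 0.6), (1, 20, 0.4), (2, 10, 1.0);
""")

total = conn.execute("SELECT SUM(revenue) FROM fact_sales").fetchone()[0]
weighted_total = conn.execute("""
    SELECT SUM(f.revenue * b.color_weight)
    FROM fact_sales f
    JOIN bridge_product_color b ON f.product_key = b.product_key
""").fetchone()[0]
print(total, weighted_total)  # both 150.0 — totals are preserved
```

An ETL-time check that each product's weights sum to 1.0 (and that every fact product has bridge rows) prevents the silent over- or under-counting that broken bridge tables cause.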
Dimensional modeling provides the methodology that transforms business requirements into analytical database designs. Let's consolidate the key principles:

- Start from business processes, not data entities
- Declare the grain first, and keep every dimension and fact true to it
- Surround facts with conformed dimensions so models integrate across the enterprise
- Choose SCD techniques attribute by attribute, based on how history must be preserved
- Reach for advanced patterns (accumulating snapshots, factless facts, bridge tables) only when the basic star schema can't express the requirement
What's Next:
With OLTP normalization, OLAP denormalization, star schema patterns, and dimensional modeling methodology covered, we're prepared to explore hybrid approaches. Real-world systems increasingly require both transactional and analytical capabilities—HTAP (Hybrid Transaction/Analytical Processing) systems, lambda architectures, and real-time analytics platforms bridge the OLTP/OLAP divide.
The next page examines these hybrid strategies, helping you design systems that serve both operational and analytical needs.
You now understand dimensional modeling methodology—the complete process for designing analytical data warehouses. From the four-step design process through slowly changing dimensions to bus architecture and advanced patterns, you have the tools to design enterprise-scale analytical solutions. Next, we'll explore hybrid approaches that combine OLTP and OLAP characteristics.