Database Management SystemsStar Schema

Star Schema: The Foundation of Dimensional Modeling

LevelIntermediate

Duration75 mins

TopicStar Schema

2 / 5

Dimension Tables: Providing Context and Meaning

The Who, What, When, Where, and Why of Your Data

A fact table tells you that $89.97 was transacted—but by itself, this number is meaningless. Who made the purchase? What was bought? When did it happen? Where did the transaction occur? Why were they buying (response to a promotion? seasonal need?).

Dimension tables answer all of these questions. They are the structures that transform raw measurements into actionable business intelligence. While fact tables store the numbers, dimension tables store the words—the textual, descriptive attributes that give facts meaning.

In the star schema, dimension tables radiate outward from the central fact table like points on a star. Each dimension represents a perspective for analyzing the facts—a way of slicing, filtering, grouping, and labeling the data. Master dimension design, and you master the ability to answer any business question.

What You Will Learn

By the end of this page, you will understand dimension table anatomy, design denormalized dimension structures, implement slowly changing dimension (SCD) patterns to track historical changes, model dimensional hierarchies, and apply design patterns used in enterprise-scale data warehouses.

Understanding Dimension Tables

A dimension table stores descriptive, textual information about business entities. It provides the context necessary to analyze and aggregate fact table measurements. Where fact tables are numeric and narrow, dimension tables are textual and wide.

Fundamental Characteristics

1. Descriptive Attributes (Not Measurements)

Dimension tables contain textual, descriptive data—names, descriptions, categories, codes, dates, and hierarchical relationships. They do not contain numeric measurements that would be aggregated.

2. Wide and Shallow Structure

Unlike the narrow-and-deep fact tables (few columns, billions of rows), dimension tables are wide (many columns—often 50 to 100+ attributes) but shallow (relatively few rows—thousands to millions).

3. Single Primary Key

Each dimension table has a surrogate primary key that fact tables reference. This key has no business meaning—it's simply an integer that enables efficient joins.

4. Denormalized Design

Dimension tables typically violate third normal form intentionally. Attributes are denormalized into flat structures that eliminate joins and simplify queries.

Example Product Dimension Structure
Column	Example Value	Purpose
product_key	4521	Surrogate key (FK in fact tables)
product_code	SKU-A2847	Natural/business key
product_name	Premium Wireless Mouse	Display name
product_description	Ergonomic wireless mouse with...	Full description
brand_name	TechCorp	Brand (denormalized)
category_name	Computer Peripherals	Category (denormalized)
subcategory_name	Input Devices	Subcategory (denormalized)
department_name	Electronics	Department (denormalized)
unit_cost	18.00	Current cost
unit_list_price	29.99	Current list price
package_size	Standard	Package type
color	Black	Product color
weight_oz	3.5	Weight in ounces
is_active	TRUE	Currently available flag
introduction_date	2022-01-15	When introduced
discontinue_date	NULL	When discontinued

Notice the Denormalization

The product dimension includes brand_name, category_name, subcategory_name, and department_name directly—not as foreign keys to separate tables. This intentional denormalization simplifies queries and improves performance. A single join to the product dimension gives you all product context.

Surrogate Keys vs Natural Keys

One of the most important dimensional modeling decisions is the choice between natural keys (business identifiers like SKU codes, employee IDs, or social security numbers) and surrogate keys (meaningless integers generated by the data warehouse).

The Case for Surrogate Keys

Surrogate keys are strongly preferred in dimensional modeling, and nearly every production data warehouse uses them. Here's why:

1. Handle Slowly Changing Dimensions

When a customer's address changes, do you update the existing row or create a new one to track history? With natural keys, you're stuck with a single customer record. With surrogate keys, you can have multiple rows for the same customer, each with a different surrogate key representing a different version.

2. Integrate Multiple Source Systems

Your customer might be identified as C-1042 in the CRM, CUST_1042 in the ERP, and 1042@company.com in the marketing platform. Surrogate keys let you unify these into a single dimension while preserving the ability to trace back to source systems.

3. Handle NULL Foreign Keys

Some fact rows may have missing dimension values ("unknown customer"). With natural keys, you're forced to use NULL, which complicates joins and aggregations. With surrogate keys, you create a special "Unknown" dimension row with key -1 or 0.

4. Improve Join Performance

Integer surrogate keys are smaller (4 or 8 bytes) than character-based natural keys (often 20+ bytes). This reduces storage, improves cache efficiency, and speeds up joins.

Surrogate Key Benefits

•Enable SCD history tracking
•Support multiple source systems
•Handle late-arriving dimensions
•Efficient integer joins
•Warehouse independence from sources
•Consistent key format across dimensions

Natural Key Uses

•Preserved as alternate key
•Used for ETL matching
•Displayed in reports
•Required for source system lookups
•Part of unique constraints
•Useful for data lineage

dimension_key_structure.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
-- Dimension table with surrogate key and natural key
CREATE TABLE customer_dim (
    -- Surrogate key: Used in all fact table relationships
    customer_key        INT IDENTITY(1,1) PRIMARY KEY,
    
    -- Natural key(s): Business identifiers from source systems
    customer_id         VARCHAR(20) NOT NULL,       -- CRM system ID
    erp_account_number  VARCHAR(15),                -- ERP system ID
    
    -- SCD tracking columns
    effective_date      DATE NOT NULL,
    expiration_date     DATE,                       -- NULL = current
    is_current          BIT DEFAULT 1,
    
    -- Descriptive attributes
    customer_name       VARCHAR(100) NOT NULL,
    email               VARCHAR(100),
    phone               VARCHAR(20),
    customer_type       VARCHAR(20),                -- B2B, B2C
    credit_rating       VARCHAR(10),
    
    -- Denormalized geographic hierarchy
    address_line1       VARCHAR(100),
    city                VARCHAR(50),
    state_province      VARCHAR(50),
    postal_code         VARCHAR(20),
    country             VARCHAR(50),
    region              VARCHAR(20),                -- Derived: Northeast, West, etc.
    
    -- Analytics segments
    acquisition_channel VARCHAR(30),
    loyalty_tier        VARCHAR(20),
    lifetime_value_band VARCHAR(20),
    
    -- ETL metadata
    source_system       VARCHAR(20),
    etl_load_timestamp  DATETIME DEFAULT GETDATE()
);
 
-- Unique constraint on natural key + version
CREATE UNIQUE INDEX ix_customer_natural 
    ON customer_dim(customer_id, effective_date);

Slowly Changing Dimensions (SCD)

Dimension data changes over time. Customers move to new addresses. Products get reclassified into different categories. Employees change departments. The question is: how should the data warehouse handle these changes?

The answer depends on whether you need to track history. Slowly Changing Dimension (SCD) techniques provide standardized approaches, classified into types:

SCD Type 0: Retain Original

The attribute value is fixed at the first time the dimension member appears and never changes. Some attributes should never update—the original acquisition date of a customer, the original category under which a product was first sold.

Use Case: Audit attributes, original classification values, historical research.

SCD Type 1: Overwrite

When a change occurs, simply overwrite the old value with the new value. No history is maintained.

Characteristics:

Simplest to implement
Dimension table size stays constant
Historical facts are restated to reflect current values
Appropriate when history is irrelevant or when correcting data errors

scd_type_1_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- SCD Type 1: Overwrite
-- Customer changes email address: just update the row
 
-- Before update:
-- customer_key | customer_id | email
-- 1042         | CUST-A457   | john@oldmail.com
 
UPDATE customer_dim
SET email = 'john@newmail.com',
    etl_load_timestamp = GETDATE()
WHERE customer_key = 1042;
 
-- After update:
-- customer_key | customer_id | email
-- 1042         | CUST-A457   | john@newmail.com
 
-- All historical facts linked to customer_key 1042 now 
-- show the new email when joined.

SCD Type 2: Add New Row (Full History)

The most commonly used SCD type. When a change occurs, mark the current row as expired and insert a new row with the updated values. Each version of the dimension member gets a unique surrogate key.

Characteristics:

Maintains complete history
Creates multiple rows per natural key
Historical facts remain linked to the dimension version that existed at transaction time
Most accurate for historical analysis
Increases dimension table size over time

scd_type_2_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- SCD Type 2: Add New Row
-- Customer moves from Boston to Seattle
 
-- BEFORE: Single row
-- customer_key | customer_id | city    | effective_date | expiration_date | is_current
-- 1042         | CUST-A457   | Boston  | 2020-01-15     | NULL            | 1
 
-- Step 1: Expire the current row
UPDATE customer_dim
SET expiration_date = '2024-06-30',
    is_current = 0
WHERE customer_key = 1042;
 
-- Step 2: Insert new row with new surrogate key
INSERT INTO customer_dim (
    customer_id, customer_name, email, city, state_province,
    effective_date, expiration_date, is_current
)
VALUES (
    'CUST-A457', 'John Smith', 'john@email.com', 'Seattle', 'WA',
    '2024-07-01', NULL, 1
);
 
-- AFTER: Two rows
-- customer_key | customer_id | city    | effective_date | expiration_date | is_current
-- 1042         | CUST-A457   | Boston  | 2020-01-15     | 2024-06-30      | 0
-- 5831         | CUST-A457   | Seattle | 2024-07-01     | NULL            | 1
 
-- Historical facts joined on customer_key = 1042 show Boston
-- New facts joined on customer_key = 5831 show Seattle
-- Both are the same logical customer (CUST-A457)

SCD Type 3: Add New Column

Store both the current and previous value by adding a column for the old value. Limited history—typically only the immediately previous value.

Characteristics:

Fixed number of rows (no growth from changes)
Limited history (usually just current and previous)
Enables "current vs. prior" comparisons
Rarely used in practice due to inflexibility

scd_type_3_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- SCD Type 3: Add Column for Previous Value
-- Track current and previous sales territory
 
CREATE TABLE salesperson_dim (
    salesperson_key     INT PRIMARY KEY,
    salesperson_id      VARCHAR(20),
    salesperson_name    VARCHAR(100),
    
    -- Current value
    current_territory   VARCHAR(50),
    territory_change_date DATE,
    
    -- Previous value (Type 3)
    prior_territory     VARCHAR(50)
);
 
-- When territory changes: Seattle → Portland
UPDATE salesperson_dim
SET prior_territory = current_territory,
    current_territory = 'Portland',
    territory_change_date = '2024-07-01'
WHERE salesperson_key = 287;
 
-- Enables queries like:
-- "Sales by current territory vs prior territory"

SCD Type Comparison
Type	History	Complexity	Space Growth	Best Use Case
Type 0	None (fixed)	Trivial	None	Original values, audit data
Type 1	None (overwrite)	Simple	None	Corrections, name standardization
Type 2	Full	Complex	Linear with changes	Most business dimensions
Type 3	Limited (prev only)	Moderate	Fixed (extra columns)	Before/after comparisons

Hybrid Approaches

Real dimensions often use mixed SCD types for different attributes. Customer name might be Type 1 (corrections), address might be Type 2 (full history), and acquisition date might be Type 0 (never changes). The SCD type is an attribute-level decision, not a table-level decision.

Common Dimension Types

Certain dimensions appear in nearly every data warehouse. Understanding these standard patterns accelerates design and ensures you don't overlook critical functionality.

The Date Dimension

The date dimension is the most universal dimension—virtually every fact table includes a date foreign key. Unlike other dimensions that are populated from source system data, the date dimension is generated once and pre-populated with all needed dates.

Essential Date Dimension Attributes:

date_dimension.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
CREATE TABLE date_dim (
    -- Surrogate key (often formatted as YYYYMMDD for convenience)
    date_key                INT PRIMARY KEY,            -- 20240115
    
    -- Full date
    full_date               DATE NOT NULL UNIQUE,       -- 2024-01-15
    
    -- Calendar attributes
    calendar_year           INT NOT NULL,               -- 2024
    calendar_quarter        INT NOT NULL,               -- 1
    calendar_month          INT NOT NULL,               -- 1
    calendar_week           INT NOT NULL,               -- 3
    day_of_year             INT NOT NULL,               -- 15
    day_of_quarter          INT NOT NULL,               -- 15
    day_of_month            INT NOT NULL,               -- 15
    day_of_week             INT NOT NULL,               -- 2 (Monday)
    
    -- Display names
    day_name                VARCHAR(10) NOT NULL,       -- 'Monday'
    month_name              VARCHAR(10) NOT NULL,       -- 'January'
    quarter_name            VARCHAR(10) NOT NULL,       -- 'Q1'
    year_month              VARCHAR(7) NOT NULL,        -- '2024-01'
    year_quarter            VARCHAR(7) NOT NULL,        -- '2024-Q1'
    
    -- Fiscal calendar (company-specific)
    fiscal_year             INT NOT NULL,               -- 2024 (or offset)
    fiscal_quarter          INT NOT NULL,
    fiscal_month            INT NOT NULL,
    fiscal_week             INT NOT NULL,
    fiscal_period           VARCHAR(20),                -- 'FY24-P07'
    
    -- Flags
    is_weekend              BIT NOT NULL,               -- 1 for Sat/Sun
    is_holiday              BIT DEFAULT 0,              -- Company holidays
    is_business_day         BIT NOT NULL,               -- Workdays
    is_month_end            BIT NOT NULL,
    is_quarter_end          BIT NOT NULL,
    is_year_end             BIT NOT NULL,
    
    -- Holiday attributes
    holiday_name            VARCHAR(50),                -- 'Independence Day'
    
    -- Prior period references (for easy comparisons)
    prior_year_date_key     INT,                        -- Same date last year
    prior_month_date_key    INT,                        -- Same date last month
    prior_week_date_key     INT                         -- Same date last week
);
 
-- Pre-populate 20 years of dates
-- The date dimension is generated, not loaded from source systems

The Time Dimension

For fact tables that require intraday granularity, a time dimension (separate from the date dimension) provides time-of-day analysis.

Example Time Dimension Attributes:

time_key (e.g., 083000 for 8:30:00 AM)
full_time (08:30:00)
hour_of_day (8)
minute_of_hour (30)
am_pm ('AM')
day_part ('Morning', 'Afternoon', 'Evening', 'Night')
business_hour (TRUE for 9-5)

Junk Dimensions

Junk dimensions consolidate miscellaneous, low-cardinality flags and indicators into a single dimension rather than cluttering the fact table.

Instead of having five boolean columns in the fact table (is_returned, is_gift, is_rush, is_taxable, is_discounted), create a junk dimension with all combinations:

junk_dimension.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Junk dimension consolidating transaction flags
CREATE TABLE transaction_flags_dim (
    transaction_flags_key   INT PRIMARY KEY,
    is_returned             BIT NOT NULL,
    is_gift                 BIT NOT NULL,
    is_rush_order           BIT NOT NULL,
    is_taxable              BIT NOT NULL,
    is_discounted           BIT NOT NULL
);
 
-- Pre-populate all combinations: 2^5 = 32 rows
-- transaction_flags_key | is_returned | is_gift | is_rush | is_taxable | is_discounted
-- 1                     | 0           | 0       | 0       | 0          | 0
-- 2                     | 0           | 0       | 0       | 0          | 1
-- 3                     | 0           | 0       | 0       | 1          | 0
-- ... (32 combinations total)
 
-- Fact table references single junk dimension key
CREATE TABLE sales_fact (
    sale_key                BIGINT,
    date_key                INT,
    product_key             INT,
    customer_key            INT,
    transaction_flags_key   INT,    -- Single FK replaces 5 columns
    sales_amount            DECIMAL(12,2)
);

Role-Playing Dimensions

Role-playing dimensions occur when the same dimension table is joined to a fact table multiple times, each time playing a different role.

Classic example: A shipment fact table with order_date, ship_date, and delivery_date. All three reference the same date dimension, but each join plays a different analytical role.

role_playing_dimension.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Single date dimension used in three roles
SELECT 
    od.calendar_month AS order_month,
    sd.calendar_month AS ship_month,
    dd.calendar_month AS delivery_month,
    AVG(DATEDIFF(day, od.full_date, sd.full_date)) AS avg_order_to_ship,
    AVG(DATEDIFF(day, sd.full_date, dd.full_date)) AS avg_ship_to_delivery
FROM shipment_fact f
JOIN date_dim od ON f.order_date_key = od.date_key      -- Role: Order Date
JOIN date_dim sd ON f.ship_date_key = sd.date_key       -- Role: Ship Date  
JOIN date_dim dd ON f.delivery_date_key = dd.date_key   -- Role: Delivery Date
GROUP BY od.calendar_month, sd.calendar_month, dd.calendar_month;
 
-- Optionally create views for role clarity:
CREATE VIEW order_date AS SELECT * FROM date_dim;
CREATE VIEW ship_date AS SELECT * FROM date_dim;
CREATE VIEW delivery_date AS SELECT * FROM date_dim;

Dimensional Hierarchies

Hierarchies define drill-down relationships within dimensions—natural aggregation levels where data can be summarized. Understanding hierarchies is essential for designing dimensions that support interactive analysis.

Types of Hierarchies

Balanced Hierarchies

All branches of the hierarchy have the same depth, and every level is meaningful.

Example: Standard Date Hierarchy

Year → Quarter → Month → Week → Day
2024 → Q1 → January → Week 3 → January 15

Ragged (Unbalanced) Hierarchies

Different branches have different depths. Some paths skip levels.

Example: Geographic Hierarchy

World → Region → Country → State → City
                                    ↓
World → Region → Country → City (small countries skip state level)

Parent-Child Hierarchies

Self-referencing hierarchies of unknown depth, where each node points to its parent.

Example: Organizational Hierarchy

CEO → VP Sales → Regional Director → Territory Manager → Sales Rep
(Depth varies by department and organization structure)

denormalized_hierarchy.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
-- Denormalized hierarchy in product dimension
-- All hierarchy levels stored as attributes (the dimensional modeling way)
 
CREATE TABLE product_dim (
    product_key             INT PRIMARY KEY,
    product_code            VARCHAR(20),
    product_name            VARCHAR(100),
    
    -- Hierarchy Level 1: Department
    department_code         VARCHAR(10),
    department_name         VARCHAR(50),
    
    -- Hierarchy Level 2: Category
    category_code           VARCHAR(10),
    category_name           VARCHAR(50),
    
    -- Hierarchy Level 3: Subcategory
    subcategory_code        VARCHAR(10),
    subcategory_name        VARCHAR(50),
    
    -- Hierarchy Level 4: Brand
    brand_code              VARCHAR(10),
    brand_name              VARCHAR(50),
    
    -- Product is Level 5 (most granular)
    -- ...other product attributes
);
 
-- Query at any hierarchy level with simple GROUP BY
-- Sales by Department
SELECT department_name, SUM(sales_amount)
FROM sales_fact f JOIN product_dim p ON f.product_key = p.product_key
GROUP BY department_name;
 
-- Sales by Category within Department
SELECT department_name, category_name, SUM(sales_amount)
FROM sales_fact f JOIN product_dim p ON f.product_key = p.product_key
GROUP BY department_name, category_name;
 
-- Drill-down to product
SELECT department_name, category_name, subcategory_name, 
       brand_name, product_name, SUM(sales_amount)
FROM sales_fact f JOIN product_dim p ON f.product_key = p.product_key
WHERE department_name = 'Electronics'
GROUP BY department_name, category_name, subcategory_name, 
         brand_name, product_name;

Denormalization Is The Key

Notice that the hierarchy is completely flattened into single columns. There are no separate Department, Category, or Subcategory tables. This denormalization—which would be incorrect in OLTP design—is intentional for OLAP. It enables any level of drill-down with a single join and simple GROUP BY clauses.

Dimension Design Patterns

Expert dimension designers apply established patterns that solve common challenges. These patterns represent accumulated wisdom from thousands of data warehouse implementations.

Conformed Dimensions

Conformed dimensions are dimensions shared across multiple fact tables (and potentially multiple data marts or subject areas). They ensure consistent analysis and enable cross-fact-table reporting.

Example: The Customer dimension is used by Sales facts, Service Call facts, and Marketing Campaign facts. A single, shared customer dimension ensures that "customer" means the same thing everywhere.

Benefits of conformed dimensions:

Consistent business definitions
Cross-functional analysis (marketing campaigns impact on sales)
Reduced ETL complexity
Single source of truth

conformed_dimension.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Conformed customer dimension used across multiple fact tables
-- Same dimension, same meaning, everywhere
 
-- Sales Fact
SELECT c.customer_segment, SUM(sf.sales_amount) AS sales
FROM sales_fact sf
JOIN customer_dim c ON sf.customer_key = c.customer_key
GROUP BY c.customer_segment;
 
-- Service Calls Fact (same customer dimension)
SELECT c.customer_segment, COUNT(*) AS support_calls
FROM service_call_fact scf
JOIN customer_dim c ON scf.customer_key = c.customer_key
GROUP BY c.customer_segment;
 
-- Combined cross-fact analysis
SELECT 
    c.customer_segment,
    SUM(sf.sales_amount) AS sales,
    COUNT(DISTINCT scf.call_key) AS support_calls,
    SUM(sf.sales_amount) / NULLIF(COUNT(DISTINCT scf.call_key), 0) AS sales_per_call
FROM customer_dim c
LEFT JOIN sales_fact sf ON c.customer_key = sf.customer_key
LEFT JOIN service_call_fact scf ON c.customer_key = scf.customer_key
GROUP BY c.customer_segment;

Bridge Tables for Many-to-Many Relationships

Bridge tables (also called mapping tables) handle many-to-many relationships between fact tables and dimensions that cannot be modeled with a simple foreign key.

Classic example: A patient might have multiple diagnoses, and each diagnosis appears on multiple patients. A patient visit fact cannot have a single diagnosis foreign key.

bridge_table.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Bridge table pattern for many-to-many relationship
-- Patient visits can have multiple diagnoses
 
-- Diagnosis Bridge Table
CREATE TABLE patient_diagnosis_bridge (
    patient_diagnosis_group_key INT,    -- Group identifier
    diagnosis_key               INT,     -- FK to diagnosis_dim
    diagnosis_priority          INT,     -- Primary=1, Secondary=2, etc.
    weighting_factor            DECIMAL(5,4)  -- For weighted aggregation
);
 
-- Patient Visit Fact references the group, not individual diagnoses
CREATE TABLE patient_visit_fact (
    visit_key                   BIGINT PRIMARY KEY,
    patient_key                 INT,
    date_key                    INT,
    provider_key                INT,
    patient_diagnosis_group_key INT,     -- Links to bridge
    visit_charge                DECIMAL(10,2)
);
 
-- Query: Visits by diagnosis (through bridge)
-- Note: aggregates will double-count if patient has multiple diagnoses
SELECT 
    d.diagnosis_category,
    COUNT(*) * b.weighting_factor AS weighted_visit_count,
    SUM(f.visit_charge * b.weighting_factor) AS weighted_charges
FROM patient_visit_fact f
JOIN patient_diagnosis_bridge b 
    ON f.patient_diagnosis_group_key = b.patient_diagnosis_group_key
JOIN diagnosis_dim d 
    ON b.diagnosis_key = d.diagnosis_key
GROUP BY d.diagnosis_category;

Bridge Table Caution

Bridge tables introduce complexity and potential double-counting in aggregations. Use weighting factors when facts should be attributed proportionally across many-to-many relationships. Always document the intended aggregation semantics clearly.

Dimension Design Best Practices

Essential Dimension Design Practices

•Always use integer surrogate keys — Even if natural keys are unique and stable, surrogate keys enable SCD support and cross-system integration.
•Denormalize aggressively — Store all hierarchy levels as dimension attributes. One wide table, not snowflaked tables requiring multiple joins.
•Handle unknown/missing gracefully — Create an explicit row (key = -1) for unknown dimension members rather than using NULL foreign keys in facts.
•Include robust descriptive attributes — Dimensions should contain all the textual data users need for filtering and grouping. More attributes is usually better.
•Design for the questions, not the source data — Include derived, calculated, and banded attributes that match how business users think (age bands, value tiers, geographic regions).
•Implement appropriate SCD types per attribute — Different attributes within the same dimension may require different change handling.
•Conform dimensions across the enterprise — Shared business concepts (customer, product, date) should be single, shared dimensions.
•Include audit metadata — source_system, etl_load_timestamp, and effective dates support debugging and compliance.

Summary: Mastering Dimension Tables

Key Takeaways

•Dimensions provide context — They answer who, what, when, where, and why for fact measurements.
•Denormalization is intentional — Wide, flat dimension tables with embedded hierarchies optimize query performance.
•Surrogate keys enable flexibility — Integer keys support SCD, cross-system integration, and efficient joins.
•SCD types handle change — Type 1 overwrites, Type 2 preserves history, and hybrid approaches apply per attribute.
•Standard patterns accelerate design — Date dimensions, junk dimensions, role-playing dimensions, and bridge tables solve common challenges.
•Conformed dimensions unify analysis — Shared dimensions enable enterprise-wide reporting with consistent definitions.

What's Next:

With fact and dimension tables understood, we explore star joins—how queries combine these structures to answer business questions. You'll learn the join patterns that make star schemas so query-friendly and understand why this design dominates analytical database architecture.

Page Complete

You now understand dimension tables at the expert level—their structure, SCD handling, hierarchy design, and common patterns. Combined with your fact table knowledge, you have the foundation for designing enterprise-grade star schemas.

2 / 5

Loading learning content...

Database Management SystemsStar Schema

Star Schema: The Foundation of Dimensional Modeling

LevelIntermediate

Duration75 mins

TopicStar Schema

2 / 5

Dimension Tables: Providing Context and Meaning

The Who, What, When, Where, and Why of Your Data

What You Will Learn

Understanding Dimension Tables

Fundamental Characteristics

1. Descriptive Attributes (Not Measurements)

Dimension tables contain textual, descriptive data—names, descriptions, categories, codes, dates, and hierarchical relationships. They do not contain numeric measurements that would be aggregated.

2. Wide and Shallow Structure

3. Single Primary Key

Each dimension table has a surrogate primary key that fact tables reference. This key has no business meaning—it's simply an integer that enables efficient joins.

4. Denormalized Design

Dimension tables typically violate third normal form intentionally. Attributes are denormalized into flat structures that eliminate joins and simplify queries.

Example Product Dimension Structure
Column	Example Value	Purpose
product_key	4521	Surrogate key (FK in fact tables)
product_code	SKU-A2847	Natural/business key
product_name	Premium Wireless Mouse	Display name
product_description	Ergonomic wireless mouse with...	Full description
brand_name	TechCorp	Brand (denormalized)
category_name	Computer Peripherals	Category (denormalized)
subcategory_name	Input Devices	Subcategory (denormalized)
department_name	Electronics	Department (denormalized)
unit_cost	18.00	Current cost
unit_list_price	29.99	Current list price
package_size	Standard	Package type
color	Black	Product color
weight_oz	3.5	Weight in ounces
is_active	TRUE	Currently available flag
introduction_date	2022-01-15	When introduced
discontinue_date	NULL	When discontinued

Notice the Denormalization

Surrogate Keys vs Natural Keys

The Case for Surrogate Keys

Surrogate keys are strongly preferred in dimensional modeling, and nearly every production data warehouse uses them. Here's why:

1. Handle Slowly Changing Dimensions

2. Integrate Multiple Source Systems

3. Handle NULL Foreign Keys

4. Improve Join Performance

Integer surrogate keys are smaller (4 or 8 bytes) than character-based natural keys (often 20+ bytes). This reduces storage, improves cache efficiency, and speeds up joins.

Surrogate Key Benefits

•Enable SCD history tracking
•Support multiple source systems
•Handle late-arriving dimensions
•Efficient integer joins
•Warehouse independence from sources
•Consistent key format across dimensions

Natural Key Uses

•Preserved as alternate key
•Used for ETL matching
•Displayed in reports
•Required for source system lookups
•Part of unique constraints
•Useful for data lineage

dimension_key_structure.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
-- Dimension table with surrogate key and natural key
CREATE TABLE customer_dim (
    -- Surrogate key: Used in all fact table relationships
    customer_key        INT IDENTITY(1,1) PRIMARY KEY,
    
    -- Natural key(s): Business identifiers from source systems
    customer_id         VARCHAR(20) NOT NULL,       -- CRM system ID
    erp_account_number  VARCHAR(15),                -- ERP system ID
    
    -- SCD tracking columns
    effective_date      DATE NOT NULL,
    expiration_date     DATE,                       -- NULL = current
    is_current          BIT DEFAULT 1,
    
    -- Descriptive attributes
    customer_name       VARCHAR(100) NOT NULL,
    email               VARCHAR(100),
    phone               VARCHAR(20),
    customer_type       VARCHAR(20),                -- B2B, B2C
    credit_rating       VARCHAR(10),
    
    -- Denormalized geographic hierarchy
    address_line1       VARCHAR(100),
    city                VARCHAR(50),
    state_province      VARCHAR(50),
    postal_code         VARCHAR(20),
    country             VARCHAR(50),
    region              VARCHAR(20),                -- Derived: Northeast, West, etc.
    
    -- Analytics segments
    acquisition_channel VARCHAR(30),
    loyalty_tier        VARCHAR(20),
    lifetime_value_band VARCHAR(20),
    
    -- ETL metadata
    source_system       VARCHAR(20),
    etl_load_timestamp  DATETIME DEFAULT GETDATE()
);
 
-- Unique constraint on natural key + version
CREATE UNIQUE INDEX ix_customer_natural 
    ON customer_dim(customer_id, effective_date);

Slowly Changing Dimensions (SCD)

The answer depends on whether you need to track history. Slowly Changing Dimension (SCD) techniques provide standardized approaches, classified into types:

SCD Type 0: Retain Original

Use Case: Audit attributes, original classification values, historical research.

SCD Type 1: Overwrite

When a change occurs, simply overwrite the old value with the new value. No history is maintained.

Characteristics:

Simplest to implement
Dimension table size stays constant
Historical facts are restated to reflect current values
Appropriate when history is irrelevant or when correcting data errors

scd_type_1_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- SCD Type 1: Overwrite
-- Customer changes email address: just update the row
 
-- Before update:
-- customer_key | customer_id | email
-- 1042         | CUST-A457   | john@oldmail.com
 
UPDATE customer_dim
SET email = 'john@newmail.com',
    etl_load_timestamp = GETDATE()
WHERE customer_key = 1042;
 
-- After update:
-- customer_key | customer_id | email
-- 1042         | CUST-A457   | john@newmail.com
 
-- All historical facts linked to customer_key 1042 now 
-- show the new email when joined.

SCD Type 2: Add New Row (Full History)

Characteristics:

Maintains complete history
Creates multiple rows per natural key
Historical facts remain linked to the dimension version that existed at transaction time
Most accurate for historical analysis
Increases dimension table size over time

scd_type_2_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- SCD Type 2: Add New Row
-- Customer moves from Boston to Seattle
 
-- BEFORE: Single row
-- customer_key | customer_id | city    | effective_date | expiration_date | is_current
-- 1042         | CUST-A457   | Boston  | 2020-01-15     | NULL            | 1
 
-- Step 1: Expire the current row
UPDATE customer_dim
SET expiration_date = '2024-06-30',
    is_current = 0
WHERE customer_key = 1042;
 
-- Step 2: Insert new row with new surrogate key
INSERT INTO customer_dim (
    customer_id, customer_name, email, city, state_province,
    effective_date, expiration_date, is_current
)
VALUES (
    'CUST-A457', 'John Smith', 'john@email.com', 'Seattle', 'WA',
    '2024-07-01', NULL, 1
);
 
-- AFTER: Two rows
-- customer_key | customer_id | city    | effective_date | expiration_date | is_current
-- 1042         | CUST-A457   | Boston  | 2020-01-15     | 2024-06-30      | 0
-- 5831         | CUST-A457   | Seattle | 2024-07-01     | NULL            | 1
 
-- Historical facts joined on customer_key = 1042 show Boston
-- New facts joined on customer_key = 5831 show Seattle
-- Both are the same logical customer (CUST-A457)

SCD Type 3: Add New Column

Store both the current and previous value by adding a column for the old value. Limited history—typically only the immediately previous value.

Characteristics:

Fixed number of rows (no growth from changes)
Limited history (usually just current and previous)
Enables "current vs. prior" comparisons
Rarely used in practice due to inflexibility

scd_type_3_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- SCD Type 3: Add Column for Previous Value
-- Track current and previous sales territory
 
CREATE TABLE salesperson_dim (
    salesperson_key     INT PRIMARY KEY,
    salesperson_id      VARCHAR(20),
    salesperson_name    VARCHAR(100),
    
    -- Current value
    current_territory   VARCHAR(50),
    territory_change_date DATE,
    
    -- Previous value (Type 3)
    prior_territory     VARCHAR(50)
);
 
-- When territory changes: Seattle → Portland
UPDATE salesperson_dim
SET prior_territory = current_territory,
    current_territory = 'Portland',
    territory_change_date = '2024-07-01'
WHERE salesperson_key = 287;
 
-- Enables queries like:
-- "Sales by current territory vs prior territory"

SCD Type Comparison
Type	History	Complexity	Space Growth	Best Use Case
Type 0	None (fixed)	Trivial	None	Original values, audit data
Type 1	None (overwrite)	Simple	None	Corrections, name standardization
Type 2	Full	Complex	Linear with changes	Most business dimensions
Type 3	Limited (prev only)	Moderate	Fixed (extra columns)	Before/after comparisons

Hybrid Approaches

Common Dimension Types

Certain dimensions appear in nearly every data warehouse. Understanding these standard patterns accelerates design and ensures you don't overlook critical functionality.

The Date Dimension

Essential Date Dimension Attributes:

date_dimension.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
CREATE TABLE date_dim (
    -- Surrogate key (often formatted as YYYYMMDD for convenience)
    date_key                INT PRIMARY KEY,            -- 20240115
    
    -- Full date
    full_date               DATE NOT NULL UNIQUE,       -- 2024-01-15
    
    -- Calendar attributes
    calendar_year           INT NOT NULL,               -- 2024
    calendar_quarter        INT NOT NULL,               -- 1
    calendar_month          INT NOT NULL,               -- 1
    calendar_week           INT NOT NULL,               -- 3
    day_of_year             INT NOT NULL,               -- 15
    day_of_quarter          INT NOT NULL,               -- 15
    day_of_month            INT NOT NULL,               -- 15
    day_of_week             INT NOT NULL,               -- 2 (Monday)
    
    -- Display names
    day_name                VARCHAR(10) NOT NULL,       -- 'Monday'
    month_name              VARCHAR(10) NOT NULL,       -- 'January'
    quarter_name            VARCHAR(10) NOT NULL,       -- 'Q1'
    year_month              VARCHAR(7) NOT NULL,        -- '2024-01'
    year_quarter            VARCHAR(7) NOT NULL,        -- '2024-Q1'
    
    -- Fiscal calendar (company-specific)
    fiscal_year             INT NOT NULL,               -- 2024 (or offset)
    fiscal_quarter          INT NOT NULL,
    fiscal_month            INT NOT NULL,
    fiscal_week             INT NOT NULL,
    fiscal_period           VARCHAR(20),                -- 'FY24-P07'
    
    -- Flags
    is_weekend              BIT NOT NULL,               -- 1 for Sat/Sun
    is_holiday              BIT DEFAULT 0,              -- Company holidays
    is_business_day         BIT NOT NULL,               -- Workdays
    is_month_end            BIT NOT NULL,
    is_quarter_end          BIT NOT NULL,
    is_year_end             BIT NOT NULL,
    
    -- Holiday attributes
    holiday_name            VARCHAR(50),                -- 'Independence Day'
    
    -- Prior period references (for easy comparisons)
    prior_year_date_key     INT,                        -- Same date last year
    prior_month_date_key    INT,                        -- Same date last month
    prior_week_date_key     INT                         -- Same date last week
);
 
-- Pre-populate 20 years of dates
-- The date dimension is generated, not loaded from source systems

The Time Dimension

For fact tables that require intraday granularity, a time dimension (separate from the date dimension) provides time-of-day analysis.

Example Time Dimension Attributes:

time_key (e.g., 083000 for 8:30:00 AM)
full_time (08:30:00)
hour_of_day (8)
minute_of_hour (30)
am_pm ('AM')
day_part ('Morning', 'Afternoon', 'Evening', 'Night')
business_hour (TRUE for 9-5)

Junk Dimensions

Junk dimensions consolidate miscellaneous, low-cardinality flags and indicators into a single dimension rather than cluttering the fact table.

Instead of having five boolean columns in the fact table (is_returned, is_gift, is_rush, is_taxable, is_discounted), create a junk dimension with all combinations:

junk_dimension.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Junk dimension consolidating transaction flags
CREATE TABLE transaction_flags_dim (
    transaction_flags_key   INT PRIMARY KEY,
    is_returned             BIT NOT NULL,
    is_gift                 BIT NOT NULL,
    is_rush_order           BIT NOT NULL,
    is_taxable              BIT NOT NULL,
    is_discounted           BIT NOT NULL
);
 
-- Pre-populate all combinations: 2^5 = 32 rows
-- transaction_flags_key | is_returned | is_gift | is_rush | is_taxable | is_discounted
-- 1                     | 0           | 0       | 0       | 0          | 0
-- 2                     | 0           | 0       | 0       | 0          | 1
-- 3                     | 0           | 0       | 0       | 1          | 0
-- ... (32 combinations total)
 
-- Fact table references single junk dimension key
CREATE TABLE sales_fact (
    sale_key                BIGINT,
    date_key                INT,
    product_key             INT,
    customer_key            INT,
    transaction_flags_key   INT,    -- Single FK replaces 5 columns
    sales_amount            DECIMAL(12,2)
);

Role-Playing Dimensions

Role-playing dimensions occur when the same dimension table is joined to a fact table multiple times, each time playing a different role.

Classic example: A shipment fact table with order_date, ship_date, and delivery_date. All three reference the same date dimension, but each join plays a different analytical role.

role_playing_dimension.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Single date dimension used in three roles
SELECT 
    od.calendar_month AS order_month,
    sd.calendar_month AS ship_month,
    dd.calendar_month AS delivery_month,
    AVG(DATEDIFF(day, od.full_date, sd.full_date)) AS avg_order_to_ship,
    AVG(DATEDIFF(day, sd.full_date, dd.full_date)) AS avg_ship_to_delivery
FROM shipment_fact f
JOIN date_dim od ON f.order_date_key = od.date_key      -- Role: Order Date
JOIN date_dim sd ON f.ship_date_key = sd.date_key       -- Role: Ship Date  
JOIN date_dim dd ON f.delivery_date_key = dd.date_key   -- Role: Delivery Date
GROUP BY od.calendar_month, sd.calendar_month, dd.calendar_month;
 
-- Optionally create views for role clarity:
CREATE VIEW order_date AS SELECT * FROM date_dim;
CREATE VIEW ship_date AS SELECT * FROM date_dim;
CREATE VIEW delivery_date AS SELECT * FROM date_dim;

Dimensional Hierarchies

Types of Hierarchies

Balanced Hierarchies

All branches of the hierarchy have the same depth, and every level is meaningful.

Example: Standard Date Hierarchy

Year → Quarter → Month → Week → Day
2024 → Q1 → January → Week 3 → January 15

Ragged (Unbalanced) Hierarchies

Different branches have different depths. Some paths skip levels.

Example: Geographic Hierarchy

World → Region → Country → State → City
                                    ↓
World → Region → Country → City (small countries skip state level)

Parent-Child Hierarchies

Self-referencing hierarchies of unknown depth, where each node points to its parent.

Example: Organizational Hierarchy

CEO → VP Sales → Regional Director → Territory Manager → Sales Rep
(Depth varies by department and organization structure)

denormalized_hierarchy.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
-- Denormalized hierarchy in product dimension
-- All hierarchy levels stored as attributes (the dimensional modeling way)
 
CREATE TABLE product_dim (
    product_key             INT PRIMARY KEY,
    product_code            VARCHAR(20),
    product_name            VARCHAR(100),
    
    -- Hierarchy Level 1: Department
    department_code         VARCHAR(10),
    department_name         VARCHAR(50),
    
    -- Hierarchy Level 2: Category
    category_code           VARCHAR(10),
    category_name           VARCHAR(50),
    
    -- Hierarchy Level 3: Subcategory
    subcategory_code        VARCHAR(10),
    subcategory_name        VARCHAR(50),
    
    -- Hierarchy Level 4: Brand
    brand_code              VARCHAR(10),
    brand_name              VARCHAR(50),
    
    -- Product is Level 5 (most granular)
    -- ...other product attributes
);
 
-- Query at any hierarchy level with simple GROUP BY
-- Sales by Department
SELECT department_name, SUM(sales_amount)
FROM sales_fact f JOIN product_dim p ON f.product_key = p.product_key
GROUP BY department_name;
 
-- Sales by Category within Department
SELECT department_name, category_name, SUM(sales_amount)
FROM sales_fact f JOIN product_dim p ON f.product_key = p.product_key
GROUP BY department_name, category_name;
 
-- Drill-down to product
SELECT department_name, category_name, subcategory_name, 
       brand_name, product_name, SUM(sales_amount)
FROM sales_fact f JOIN product_dim p ON f.product_key = p.product_key
WHERE department_name = 'Electronics'
GROUP BY department_name, category_name, subcategory_name, 
         brand_name, product_name;

Denormalization Is The Key

Dimension Design Patterns

Expert dimension designers apply established patterns that solve common challenges. These patterns represent accumulated wisdom from thousands of data warehouse implementations.

Conformed Dimensions

Benefits of conformed dimensions:

Consistent business definitions
Cross-functional analysis (marketing campaigns impact on sales)
Reduced ETL complexity
Single source of truth

conformed_dimension.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- Conformed customer dimension used across multiple fact tables
-- Same dimension, same meaning, everywhere
 
-- Sales Fact
SELECT c.customer_segment, SUM(sf.sales_amount) AS sales
FROM sales_fact sf
JOIN customer_dim c ON sf.customer_key = c.customer_key
GROUP BY c.customer_segment;
 
-- Service Calls Fact (same customer dimension)
SELECT c.customer_segment, COUNT(*) AS support_calls
FROM service_call_fact scf
JOIN customer_dim c ON scf.customer_key = c.customer_key
GROUP BY c.customer_segment;
 
-- Combined cross-fact analysis
SELECT 
    c.customer_segment,
    SUM(sf.sales_amount) AS sales,
    COUNT(DISTINCT scf.call_key) AS support_calls,
    SUM(sf.sales_amount) / NULLIF(COUNT(DISTINCT scf.call_key), 0) AS sales_per_call
FROM customer_dim c
LEFT JOIN sales_fact sf ON c.customer_key = sf.customer_key
LEFT JOIN service_call_fact scf ON c.customer_key = scf.customer_key
GROUP BY c.customer_segment;

Bridge Tables for Many-to-Many Relationships

Bridge tables (also called mapping tables) handle many-to-many relationships between fact tables and dimensions that cannot be modeled with a simple foreign key.

Classic example: A patient might have multiple diagnoses, and each diagnosis appears on multiple patients. A patient visit fact cannot have a single diagnosis foreign key.

bridge_table.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Bridge table pattern for many-to-many relationship
-- Patient visits can have multiple diagnoses
 
-- Diagnosis Bridge Table
CREATE TABLE patient_diagnosis_bridge (
    patient_diagnosis_group_key INT,    -- Group identifier
    diagnosis_key               INT,     -- FK to diagnosis_dim
    diagnosis_priority          INT,     -- Primary=1, Secondary=2, etc.
    weighting_factor            DECIMAL(5,4)  -- For weighted aggregation
);
 
-- Patient Visit Fact references the group, not individual diagnoses
CREATE TABLE patient_visit_fact (
    visit_key                   BIGINT PRIMARY KEY,
    patient_key                 INT,
    date_key                    INT,
    provider_key                INT,
    patient_diagnosis_group_key INT,     -- Links to bridge
    visit_charge                DECIMAL(10,2)
);
 
-- Query: Visits by diagnosis (through bridge)
-- Note: aggregates will double-count if patient has multiple diagnoses
SELECT 
    d.diagnosis_category,
    COUNT(*) * b.weighting_factor AS weighted_visit_count,
    SUM(f.visit_charge * b.weighting_factor) AS weighted_charges
FROM patient_visit_fact f
JOIN patient_diagnosis_bridge b 
    ON f.patient_diagnosis_group_key = b.patient_diagnosis_group_key
JOIN diagnosis_dim d 
    ON b.diagnosis_key = d.diagnosis_key
GROUP BY d.diagnosis_category;

Bridge Table Caution

Dimension Design Best Practices

Essential Dimension Design Practices

•Always use integer surrogate keys — Even if natural keys are unique and stable, surrogate keys enable SCD support and cross-system integration.
•Denormalize aggressively — Store all hierarchy levels as dimension attributes. One wide table, not snowflaked tables requiring multiple joins.
•Handle unknown/missing gracefully — Create an explicit row (key = -1) for unknown dimension members rather than using NULL foreign keys in facts.
•Include robust descriptive attributes — Dimensions should contain all the textual data users need for filtering and grouping. More attributes is usually better.
•Design for the questions, not the source data — Include derived, calculated, and banded attributes that match how business users think (age bands, value tiers, geographic regions).
•Implement appropriate SCD types per attribute — Different attributes within the same dimension may require different change handling.
•Conform dimensions across the enterprise — Shared business concepts (customer, product, date) should be single, shared dimensions.
•Include audit metadata — source_system, etl_load_timestamp, and effective dates support debugging and compliance.

Summary: Mastering Dimension Tables

Key Takeaways

•Dimensions provide context — They answer who, what, when, where, and why for fact measurements.
•Denormalization is intentional — Wide, flat dimension tables with embedded hierarchies optimize query performance.
•Surrogate keys enable flexibility — Integer keys support SCD, cross-system integration, and efficient joins.
•SCD types handle change — Type 1 overwrites, Type 2 preserves history, and hybrid approaches apply per attribute.
•Standard patterns accelerate design — Date dimensions, junk dimensions, role-playing dimensions, and bridge tables solve common challenges.
•Conformed dimensions unify analysis — Shared dimensions enable enterprise-wide reporting with consistent definitions.

What's Next:

Page Complete

2 / 5