Loading learning content...
A fact table tells you that $89.97 was transacted—but by itself, this number is meaningless. Who made the purchase? What was bought? When did it happen? Where did the transaction occur? Why were they buying (response to a promotion? seasonal need?).
Dimension tables answer all of these questions. They are the structures that transform raw measurements into actionable business intelligence. While fact tables store the numbers, dimension tables store the words—the textual, descriptive attributes that give facts meaning.
In the star schema, dimension tables radiate outward from the central fact table like points on a star. Each dimension represents a perspective for analyzing the facts—a way of slicing, filtering, grouping, and labeling the data. Master dimension design, and you master the ability to answer any business question.
By the end of this page, you will understand dimension table anatomy, design denormalized dimension structures, implement slowly changing dimension (SCD) patterns to track historical changes, model dimensional hierarchies, and apply design patterns used in enterprise-scale data warehouses.
A dimension table stores descriptive, textual information about business entities. It provides the context necessary to analyze and aggregate fact table measurements. Where fact tables are numeric and narrow, dimension tables are textual and wide.
1. Descriptive Attributes (Not Measurements)
Dimension tables contain textual, descriptive data—names, descriptions, categories, codes, dates, and hierarchical relationships. They do not contain numeric measurements that would be aggregated.
2. Wide and Shallow Structure
Unlike the narrow-and-deep fact tables (few columns, billions of rows), dimension tables are wide (many columns—often 50 to 100+ attributes) but shallow (relatively few rows—thousands to millions).
3. Single Primary Key
Each dimension table has a surrogate primary key that fact tables reference. This key has no business meaning—it's simply an integer that enables efficient joins.
4. Denormalized Design
Dimension tables typically violate third normal form intentionally. Attributes are denormalized into flat structures that eliminate joins and simplify queries.
| Column | Example Value | Purpose |
|---|---|---|
| product_key | 4521 | Surrogate key (FK in fact tables) |
| product_code | SKU-A2847 | Natural/business key |
| product_name | Premium Wireless Mouse | Display name |
| product_description | Ergonomic wireless mouse with... | Full description |
| brand_name | TechCorp | Brand (denormalized) |
| category_name | Computer Peripherals | Category (denormalized) |
| subcategory_name | Input Devices | Subcategory (denormalized) |
| department_name | Electronics | Department (denormalized) |
| unit_cost | 18.00 | Current cost |
| unit_list_price | 29.99 | Current list price |
| package_size | Standard | Package type |
| color | Black | Product color |
| weight_oz | 3.5 | Weight in ounces |
| is_active | TRUE | Currently available flag |
| introduction_date | 2022-01-15 | When introduced |
| discontinue_date | NULL | When discontinued |
The product dimension includes brand_name, category_name, subcategory_name, and department_name directly—not as foreign keys to separate tables. This intentional denormalization simplifies queries and improves performance. A single join to the product dimension gives you all product context.
One of the most important dimensional modeling decisions is the choice between natural keys (business identifiers like SKU codes, employee IDs, or social security numbers) and surrogate keys (meaningless integers generated by the data warehouse).
Surrogate keys are strongly preferred in dimensional modeling, and nearly every production data warehouse uses them. Here's why:
1. Handle Slowly Changing Dimensions
When a customer's address changes, do you update the existing row or create a new one to track history? With natural keys, you're stuck with a single customer record. With surrogate keys, you can have multiple rows for the same customer, each with a different surrogate key representing a different version.
2. Integrate Multiple Source Systems
Your customer might be identified as C-1042 in the CRM, CUST_1042 in the ERP, and 1042@company.com in the marketing platform. Surrogate keys let you unify these into a single dimension while preserving the ability to trace back to source systems.
3. Handle NULL Foreign Keys
Some fact rows may have missing dimension values ("unknown customer"). With natural keys, you're forced to use NULL, which complicates joins and aggregations. With surrogate keys, you create a special "Unknown" dimension row with key -1 or 0.
4. Improve Join Performance
Integer surrogate keys are smaller (4 or 8 bytes) than character-based natural keys (often 20+ bytes). This reduces storage, improves cache efficiency, and speeds up joins.
123456789101112131415161718192021222324252627282930313233343536373839404142
-- Dimension table with surrogate key and natural keyCREATE TABLE customer_dim ( -- Surrogate key: Used in all fact table relationships customer_key INT IDENTITY(1,1) PRIMARY KEY, -- Natural key(s): Business identifiers from source systems customer_id VARCHAR(20) NOT NULL, -- CRM system ID erp_account_number VARCHAR(15), -- ERP system ID -- SCD tracking columns effective_date DATE NOT NULL, expiration_date DATE, -- NULL = current is_current BIT DEFAULT 1, -- Descriptive attributes customer_name VARCHAR(100) NOT NULL, email VARCHAR(100), phone VARCHAR(20), customer_type VARCHAR(20), -- B2B, B2C credit_rating VARCHAR(10), -- Denormalized geographic hierarchy address_line1 VARCHAR(100), city VARCHAR(50), state_province VARCHAR(50), postal_code VARCHAR(20), country VARCHAR(50), region VARCHAR(20), -- Derived: Northeast, West, etc. -- Analytics segments acquisition_channel VARCHAR(30), loyalty_tier VARCHAR(20), lifetime_value_band VARCHAR(20), -- ETL metadata source_system VARCHAR(20), etl_load_timestamp DATETIME DEFAULT GETDATE()); -- Unique constraint on natural key + versionCREATE UNIQUE INDEX ix_customer_natural ON customer_dim(customer_id, effective_date);Dimension data changes over time. Customers move to new addresses. Products get reclassified into different categories. Employees change departments. The question is: how should the data warehouse handle these changes?
The answer depends on whether you need to track history. Slowly Changing Dimension (SCD) techniques provide standardized approaches, classified into types:
The attribute value is fixed at the first time the dimension member appears and never changes. Some attributes should never update—the original acquisition date of a customer, the original category under which a product was first sold.
Use Case: Audit attributes, original classification values, historical research.
When a change occurs, simply overwrite the old value with the new value. No history is maintained.
Characteristics:
123456789101112131415161718
-- SCD Type 1: Overwrite-- Customer changes email address: just update the row -- Before update:-- customer_key | customer_id | email-- 1042 | CUST-A457 | john@oldmail.com UPDATE customer_dimSET email = 'john@newmail.com', etl_load_timestamp = GETDATE()WHERE customer_key = 1042; -- After update:-- customer_key | customer_id | email-- 1042 | CUST-A457 | john@newmail.com -- All historical facts linked to customer_key 1042 now -- show the new email when joined.The most commonly used SCD type. When a change occurs, mark the current row as expired and insert a new row with the updated values. Each version of the dimension member gets a unique surrogate key.
Characteristics:
12345678910111213141516171819202122232425262728293031
-- SCD Type 2: Add New Row-- Customer moves from Boston to Seattle -- BEFORE: Single row-- customer_key | customer_id | city | effective_date | expiration_date | is_current-- 1042 | CUST-A457 | Boston | 2020-01-15 | NULL | 1 -- Step 1: Expire the current rowUPDATE customer_dimSET expiration_date = '2024-06-30', is_current = 0WHERE customer_key = 1042; -- Step 2: Insert new row with new surrogate keyINSERT INTO customer_dim ( customer_id, customer_name, email, city, state_province, effective_date, expiration_date, is_current)VALUES ( 'CUST-A457', 'John Smith', 'john@email.com', 'Seattle', 'WA', '2024-07-01', NULL, 1); -- AFTER: Two rows-- customer_key | customer_id | city | effective_date | expiration_date | is_current-- 1042 | CUST-A457 | Boston | 2020-01-15 | 2024-06-30 | 0-- 5831 | CUST-A457 | Seattle | 2024-07-01 | NULL | 1 -- Historical facts joined on customer_key = 1042 show Boston-- New facts joined on customer_key = 5831 show Seattle-- Both are the same logical customer (CUST-A457)Store both the current and previous value by adding a column for the old value. Limited history—typically only the immediately previous value.
Characteristics:
12345678910111213141516171819202122232425
-- SCD Type 3: Add Column for Previous Value-- Track current and previous sales territory CREATE TABLE salesperson_dim ( salesperson_key INT PRIMARY KEY, salesperson_id VARCHAR(20), salesperson_name VARCHAR(100), -- Current value current_territory VARCHAR(50), territory_change_date DATE, -- Previous value (Type 3) prior_territory VARCHAR(50)); -- When territory changes: Seattle → PortlandUPDATE salesperson_dimSET prior_territory = current_territory, current_territory = 'Portland', territory_change_date = '2024-07-01'WHERE salesperson_key = 287; -- Enables queries like:-- "Sales by current territory vs prior territory"| Type | History | Complexity | Space Growth | Best Use Case |
|---|---|---|---|---|
| Type 0 | None (fixed) | Trivial | None | Original values, audit data |
| Type 1 | None (overwrite) | Simple | None | Corrections, name standardization |
| Type 2 | Full | Complex | Linear with changes | Most business dimensions |
| Type 3 | Limited (prev only) | Moderate | Fixed (extra columns) | Before/after comparisons |
Real dimensions often use mixed SCD types for different attributes. Customer name might be Type 1 (corrections), address might be Type 2 (full history), and acquisition date might be Type 0 (never changes). The SCD type is an attribute-level decision, not a table-level decision.
Certain dimensions appear in nearly every data warehouse. Understanding these standard patterns accelerates design and ensures you don't overlook critical functionality.
The date dimension is the most universal dimension—virtually every fact table includes a date foreign key. Unlike other dimensions that are populated from source system data, the date dimension is generated once and pre-populated with all needed dates.
Essential Date Dimension Attributes:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950
CREATE TABLE date_dim ( -- Surrogate key (often formatted as YYYYMMDD for convenience) date_key INT PRIMARY KEY, -- 20240115 -- Full date full_date DATE NOT NULL UNIQUE, -- 2024-01-15 -- Calendar attributes calendar_year INT NOT NULL, -- 2024 calendar_quarter INT NOT NULL, -- 1 calendar_month INT NOT NULL, -- 1 calendar_week INT NOT NULL, -- 3 day_of_year INT NOT NULL, -- 15 day_of_quarter INT NOT NULL, -- 15 day_of_month INT NOT NULL, -- 15 day_of_week INT NOT NULL, -- 2 (Monday) -- Display names day_name VARCHAR(10) NOT NULL, -- 'Monday' month_name VARCHAR(10) NOT NULL, -- 'January' quarter_name VARCHAR(10) NOT NULL, -- 'Q1' year_month VARCHAR(7) NOT NULL, -- '2024-01' year_quarter VARCHAR(7) NOT NULL, -- '2024-Q1' -- Fiscal calendar (company-specific) fiscal_year INT NOT NULL, -- 2024 (or offset) fiscal_quarter INT NOT NULL, fiscal_month INT NOT NULL, fiscal_week INT NOT NULL, fiscal_period VARCHAR(20), -- 'FY24-P07' -- Flags is_weekend BIT NOT NULL, -- 1 for Sat/Sun is_holiday BIT DEFAULT 0, -- Company holidays is_business_day BIT NOT NULL, -- Workdays is_month_end BIT NOT NULL, is_quarter_end BIT NOT NULL, is_year_end BIT NOT NULL, -- Holiday attributes holiday_name VARCHAR(50), -- 'Independence Day' -- Prior period references (for easy comparisons) prior_year_date_key INT, -- Same date last year prior_month_date_key INT, -- Same date last month prior_week_date_key INT -- Same date last week); -- Pre-populate 20 years of dates-- The date dimension is generated, not loaded from source systemsFor fact tables that require intraday granularity, a time dimension (separate from the date dimension) provides time-of-day analysis.
Example Time Dimension Attributes:
Junk dimensions consolidate miscellaneous, low-cardinality flags and indicators into a single dimension rather than cluttering the fact table.
Instead of having five boolean columns in the fact table (is_returned, is_gift, is_rush, is_taxable, is_discounted), create a junk dimension with all combinations:
1234567891011121314151617181920212223242526
-- Junk dimension consolidating transaction flagsCREATE TABLE transaction_flags_dim ( transaction_flags_key INT PRIMARY KEY, is_returned BIT NOT NULL, is_gift BIT NOT NULL, is_rush_order BIT NOT NULL, is_taxable BIT NOT NULL, is_discounted BIT NOT NULL); -- Pre-populate all combinations: 2^5 = 32 rows-- transaction_flags_key | is_returned | is_gift | is_rush | is_taxable | is_discounted-- 1 | 0 | 0 | 0 | 0 | 0-- 2 | 0 | 0 | 0 | 0 | 1-- 3 | 0 | 0 | 0 | 1 | 0-- ... (32 combinations total) -- Fact table references single junk dimension keyCREATE TABLE sales_fact ( sale_key BIGINT, date_key INT, product_key INT, customer_key INT, transaction_flags_key INT, -- Single FK replaces 5 columns sales_amount DECIMAL(12,2));Role-playing dimensions occur when the same dimension table is joined to a fact table multiple times, each time playing a different role.
Classic example: A shipment fact table with order_date, ship_date, and delivery_date. All three reference the same date dimension, but each join plays a different analytical role.
1234567891011121314151617
-- Single date dimension used in three rolesSELECT od.calendar_month AS order_month, sd.calendar_month AS ship_month, dd.calendar_month AS delivery_month, AVG(DATEDIFF(day, od.full_date, sd.full_date)) AS avg_order_to_ship, AVG(DATEDIFF(day, sd.full_date, dd.full_date)) AS avg_ship_to_deliveryFROM shipment_fact fJOIN date_dim od ON f.order_date_key = od.date_key -- Role: Order DateJOIN date_dim sd ON f.ship_date_key = sd.date_key -- Role: Ship Date JOIN date_dim dd ON f.delivery_date_key = dd.date_key -- Role: Delivery DateGROUP BY od.calendar_month, sd.calendar_month, dd.calendar_month; -- Optionally create views for role clarity:CREATE VIEW order_date AS SELECT * FROM date_dim;CREATE VIEW ship_date AS SELECT * FROM date_dim;CREATE VIEW delivery_date AS SELECT * FROM date_dim;Hierarchies define drill-down relationships within dimensions—natural aggregation levels where data can be summarized. Understanding hierarchies is essential for designing dimensions that support interactive analysis.
All branches of the hierarchy have the same depth, and every level is meaningful.
Example: Standard Date Hierarchy
Year → Quarter → Month → Week → Day
2024 → Q1 → January → Week 3 → January 15
Different branches have different depths. Some paths skip levels.
Example: Geographic Hierarchy
World → Region → Country → State → City
↓
World → Region → Country → City (small countries skip state level)
Self-referencing hierarchies of unknown depth, where each node points to its parent.
Example: Organizational Hierarchy
CEO → VP Sales → Regional Director → Territory Manager → Sales Rep
(Depth varies by department and organization structure)
12345678910111213141516171819202122232425262728293031323334353637383940414243444546
-- Denormalized hierarchy in product dimension-- All hierarchy levels stored as attributes (the dimensional modeling way) CREATE TABLE product_dim ( product_key INT PRIMARY KEY, product_code VARCHAR(20), product_name VARCHAR(100), -- Hierarchy Level 1: Department department_code VARCHAR(10), department_name VARCHAR(50), -- Hierarchy Level 2: Category category_code VARCHAR(10), category_name VARCHAR(50), -- Hierarchy Level 3: Subcategory subcategory_code VARCHAR(10), subcategory_name VARCHAR(50), -- Hierarchy Level 4: Brand brand_code VARCHAR(10), brand_name VARCHAR(50), -- Product is Level 5 (most granular) -- ...other product attributes); -- Query at any hierarchy level with simple GROUP BY-- Sales by DepartmentSELECT department_name, SUM(sales_amount)FROM sales_fact f JOIN product_dim p ON f.product_key = p.product_keyGROUP BY department_name; -- Sales by Category within DepartmentSELECT department_name, category_name, SUM(sales_amount)FROM sales_fact f JOIN product_dim p ON f.product_key = p.product_keyGROUP BY department_name, category_name; -- Drill-down to productSELECT department_name, category_name, subcategory_name, brand_name, product_name, SUM(sales_amount)FROM sales_fact f JOIN product_dim p ON f.product_key = p.product_keyWHERE department_name = 'Electronics'GROUP BY department_name, category_name, subcategory_name, brand_name, product_name;Notice that the hierarchy is completely flattened into single columns. There are no separate Department, Category, or Subcategory tables. This denormalization—which would be incorrect in OLTP design—is intentional for OLAP. It enables any level of drill-down with a single join and simple GROUP BY clauses.
Expert dimension designers apply established patterns that solve common challenges. These patterns represent accumulated wisdom from thousands of data warehouse implementations.
Conformed dimensions are dimensions shared across multiple fact tables (and potentially multiple data marts or subject areas). They ensure consistent analysis and enable cross-fact-table reporting.
Example: The Customer dimension is used by Sales facts, Service Call facts, and Marketing Campaign facts. A single, shared customer dimension ensures that "customer" means the same thing everywhere.
Benefits of conformed dimensions:
12345678910111213141516171819202122232425
-- Conformed customer dimension used across multiple fact tables-- Same dimension, same meaning, everywhere -- Sales FactSELECT c.customer_segment, SUM(sf.sales_amount) AS salesFROM sales_fact sfJOIN customer_dim c ON sf.customer_key = c.customer_keyGROUP BY c.customer_segment; -- Service Calls Fact (same customer dimension)SELECT c.customer_segment, COUNT(*) AS support_callsFROM service_call_fact scfJOIN customer_dim c ON scf.customer_key = c.customer_keyGROUP BY c.customer_segment; -- Combined cross-fact analysisSELECT c.customer_segment, SUM(sf.sales_amount) AS sales, COUNT(DISTINCT scf.call_key) AS support_calls, SUM(sf.sales_amount) / NULLIF(COUNT(DISTINCT scf.call_key), 0) AS sales_per_callFROM customer_dim cLEFT JOIN sales_fact sf ON c.customer_key = sf.customer_keyLEFT JOIN service_call_fact scf ON c.customer_key = scf.customer_keyGROUP BY c.customer_segment;Bridge tables (also called mapping tables) handle many-to-many relationships between fact tables and dimensions that cannot be modeled with a simple foreign key.
Classic example: A patient might have multiple diagnoses, and each diagnosis appears on multiple patients. A patient visit fact cannot have a single diagnosis foreign key.
123456789101112131415161718192021222324252627282930313233
-- Bridge table pattern for many-to-many relationship-- Patient visits can have multiple diagnoses -- Diagnosis Bridge TableCREATE TABLE patient_diagnosis_bridge ( patient_diagnosis_group_key INT, -- Group identifier diagnosis_key INT, -- FK to diagnosis_dim diagnosis_priority INT, -- Primary=1, Secondary=2, etc. weighting_factor DECIMAL(5,4) -- For weighted aggregation); -- Patient Visit Fact references the group, not individual diagnosesCREATE TABLE patient_visit_fact ( visit_key BIGINT PRIMARY KEY, patient_key INT, date_key INT, provider_key INT, patient_diagnosis_group_key INT, -- Links to bridge visit_charge DECIMAL(10,2)); -- Query: Visits by diagnosis (through bridge)-- Note: aggregates will double-count if patient has multiple diagnosesSELECT d.diagnosis_category, COUNT(*) * b.weighting_factor AS weighted_visit_count, SUM(f.visit_charge * b.weighting_factor) AS weighted_chargesFROM patient_visit_fact fJOIN patient_diagnosis_bridge b ON f.patient_diagnosis_group_key = b.patient_diagnosis_group_keyJOIN diagnosis_dim d ON b.diagnosis_key = d.diagnosis_keyGROUP BY d.diagnosis_category;Bridge tables introduce complexity and potential double-counting in aggregations. Use weighting factors when facts should be attributed proportionally across many-to-many relationships. Always document the intended aggregation semantics clearly.
What's Next:
With fact and dimension tables understood, we explore star joins—how queries combine these structures to answer business questions. You'll learn the join patterns that make star schemas so query-friendly and understand why this design dominates analytical database architecture.
You now understand dimension tables at the expert level—their structure, SCD handling, hierarchy design, and common patterns. Combined with your fact table knowledge, you have the foundation for designing enterprise-grade star schemas.