Database Management Systems > OLTP vs OLAP Considerations

OLTP vs OLAP Considerations

Level: Advanced

Duration: 75 mins

Topic: OLTP vs OLAP Considerations

Star Schema

The Blueprint for Analytical Data

The star schema is the most widely adopted pattern for organizing data in analytical databases and data warehouses. Named for its visual appearance—a central fact table surrounded by dimension tables forming a star shape—this schema pattern formalizes the principles of OLAP denormalization into a repeatable, maintainable architecture.

Popularized by Ralph Kimball in the 1990s, the star schema has become the de facto standard for dimensional modeling. Most major data warehousing technologies, BI tools, and analytics platforms work well with star schema access patterns, and many include optimizations aimed at them. Understanding star schema design is essential for any data professional working with analytical systems.

What You Will Master

By the end of this page, you will understand the complete anatomy of star schemas—fact tables, dimension tables, surrogate keys, and grain. You'll learn design principles that ensure query performance, maintainability, and flexibility. This knowledge enables you to design analytical schemas that serve business intelligence needs efficiently.

Star Schema Architecture

A star schema consists of two types of tables with a characteristic structure:

Fact Tables:

Contain measurements or metrics (quantities, amounts, counts)
Store foreign keys to dimension tables
Represent business events or transactions
Typically very large (billions of rows)
Narrow compared to fully denormalized tables

Dimension Tables:

Contain descriptive attributes (names, categories, hierarchies)
Store the context for facts
Represent business entities (customers, products, locations)
Typically smaller (thousands to millions of rows)
Wide with many descriptive columns

The "star" shape emerges when you visualize the schema: the fact table sits at the center with dimension tables radiating outward, connected by foreign key relationships.

[Diagram: star schema with a central fact table joined to surrounding dimension tables, including DIM_PRODUCT]

Why "Star" Not "Normalized":

Notice that dimension tables are denormalized within themselves. The DIM_PRODUCT table contains both product-level and category-level attributes in the same row. In a normalized design, categories would be a separate table with a foreign key. In star schema design, we flatten hierarchies into the dimension table.

This denormalization within dimensions eliminates additional JOINs during queries—a single JOIN from fact to dimension retrieves all hierarchical levels.

Star Schema vs. Fully Denormalized

Star schema is a structured form of denormalization. Unlike fully denormalizing everything into one massive table, star schema maintains dimension tables for manageability while still eliminating multi-level JOINs. It's a middle ground that balances query performance with practical maintainability.

Fact Table Design Principles

Fact tables are the heart of the star schema—they contain the actual business measurements that users analyze. Proper fact table design is critical for query performance and analytical flexibility.

Defining the Grain:

The grain (or granularity) is the fundamental design decision—it determines what each row in the fact table represents. Common grains include:

One row per order line item
One row per daily sales by product and store
One row per web page view
One row per call detail record

The grain should be the lowest level of detail that business users need. You can always aggregate from fine grain to coarse grain, but you cannot recover detail that wasn't captured.
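The one-way nature of aggregation can be sketched with a small in-memory SQLite database. The table name fact_order_line, its columns, and the sample rows are illustrative assumptions, not part of any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fine grain: one row per order line item (hypothetical sample data)
CREATE TABLE fact_order_line (
    date_key     INTEGER,
    product_key  INTEGER,
    order_number TEXT,
    quantity     INTEGER
);
INSERT INTO fact_order_line VALUES
    (1, 1, 'A-1', 2),
    (1, 1, 'A-2', 3),
    (1, 2, 'A-2', 1),
    (2, 1, 'A-3', 4);
""")

# Rolling up to the coarser grain "daily sales by product" is a simple GROUP BY.
daily_product_sales = conn.execute("""
    SELECT date_key, product_key, SUM(quantity)
    FROM fact_order_line
    GROUP BY date_key, product_key
    ORDER BY date_key, product_key
""").fetchall()
```

The rolled-up rows no longer say that order A-1 contributed 2 units and order A-2 contributed 3. If only the daily grain had been stored, that detail could not be reconstructed.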

Fact Table Types
Type	Grain	Content	Example
Transaction Fact	One row per event	Measurements at point of transaction	Each order item, call record, page view
Periodic Snapshot	One row per time period per entity	Measurements at end of period	Daily account balance, monthly inventory
Accumulating Snapshot	One row per process instance	Measurements across lifecycle stages	Order fulfillment pipeline from placement to delivery
Factless Fact	One row per event	Only dimension keys (no measures)	Student attendance (who attended what class)

Fact Table Columns:

Fact tables contain two types of columns:

1. Foreign Keys (Dimension References):

Surrogate keys linking to dimension tables
Typically BIGINT type for efficiency
Form composite key defining the grain
Enable filtering and grouping via dimension attributes

2. Measures (Metrics):

Numeric values being analyzed
Types: Additive, Semi-Additive, Non-Additive

Measure Types Explained:

Types of Measures

•Additive Measures — Can be summed across all dimensions. Revenue, quantity, cost can be summed by time, product, customer, etc. These are the most common and analytically powerful.
•Semi-Additive Measures — Can be summed across some dimensions but not others. Account balance can be summed across accounts but not across time (you'd average or take period-end).
•Non-Additive Measures — Cannot be meaningfully summed. Ratios, percentages, unit prices are non-additive. Average price across products requires weighted calculation, not simple sum.
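A semi-additive measure can be illustrated with a periodic snapshot of account balances. This is a minimal sketch in SQLite; the table fact_account_balance and its values are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical daily balance snapshot: two accounts over two days
CREATE TABLE fact_account_balance (
    date_key    INTEGER,
    account_key INTEGER,
    balance     REAL
);
INSERT INTO fact_account_balance VALUES
    (1, 100, 500.0), (1, 200, 300.0),
    (2, 100, 700.0), (2, 200, 100.0);
""")

# Valid: summing across accounts within a single day.
per_day = conn.execute("""
    SELECT date_key, SUM(balance)
    FROM fact_account_balance
    GROUP BY date_key
    ORDER BY date_key
""").fetchall()

# Misleading: summing across days counts the same money twice.
naive_total = conn.execute(
    "SELECT SUM(balance) FROM fact_account_balance"
).fetchone()[0]

# One common alternative: take the period-end balance (last day in the range).
period_end = conn.execute("""
    SELECT SUM(balance)
    FROM fact_account_balance
    WHERE date_key = (SELECT MAX(date_key) FROM fact_account_balance)
""").fetchone()[0]
```

Here the bank holds 800 on each day, yet the naive sum over time reports 1600. The period-end query returns the 800 a user would expect.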

fact_table_example.sql
SQL
-- Complete Fact Table Design Example
 
CREATE TABLE fact_sales (
    -- Surrogate Keys (Foreign Keys to Dimensions)
    date_key            BIGINT NOT NULL,
    product_key         BIGINT NOT NULL,
    customer_key        BIGINT NOT NULL,
    store_key           BIGINT NOT NULL,
    promotion_key       BIGINT NOT NULL,  -- Possibly "No Promotion" default
    
    -- Degenerate Dimensions (transaction identifiers, no dimension table)
    order_number        VARCHAR(50),
    line_item_number    INT,
    
    -- Additive Measures
    quantity_sold       INT NOT NULL,
    discount_amount     DECIMAL(10,2) DEFAULT 0,
    extended_price      DECIMAL(12,2) NOT NULL,  -- quantity * unit_price
    net_revenue         DECIMAL(12,2) NOT NULL,  -- extended - discount
    cost_of_goods       DECIMAL(12,2) NOT NULL,
    gross_profit        DECIMAL(12,2) NOT NULL,  -- revenue - cost
    
    -- Semi-Additive (careful with time aggregation)
    units_in_stock      INT,  -- Snapshot at time of sale
    
    -- Non-Additive (stored for reference, aggregate with care)
    unit_price          DECIMAL(10,2) NOT NULL,
    unit_cost           DECIMAL(10,2),
    margin_percentage   DECIMAL(5,2),
    
    -- Primary key matches the grain: one row per order line item.
    -- The dimension keys alone would collide if the same customer bought the
    -- same product at the same store twice in one day.
    PRIMARY KEY (order_number, line_item_number),
    
    -- Foreign Key Constraints
    FOREIGN KEY (date_key) REFERENCES dim_date(date_key),
    FOREIGN KEY (product_key) REFERENCES dim_product(product_key),
    FOREIGN KEY (customer_key) REFERENCES dim_customer(customer_key),
    FOREIGN KEY (store_key) REFERENCES dim_store(store_key),
    FOREIGN KEY (promotion_key) REFERENCES dim_promotion(promotion_key)
);
 
-- Indexes for common access patterns
CREATE INDEX idx_fact_sales_date ON fact_sales(date_key);
CREATE INDEX idx_fact_sales_product ON fact_sales(product_key);
CREATE INDEX idx_fact_sales_customer ON fact_sales(customer_key);

Beware Non-Additive Aggregation

Summing margin_percentage across products gives meaningless results. Users often request 'average margin' but need weighted average (total profit / total revenue), not simple average of percentages. Design BI reports to compute ratios from additive measures.
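The difference between the two calculations is easy to see with two rows of very different size. The sketch below uses SQLite and invented numbers; only the column names net_revenue, gross_profit, and margin_percentage come from the fact table example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    product_key       INTEGER,
    net_revenue       REAL,
    gross_profit      REAL,
    margin_percentage REAL
);
-- Hypothetical rows: a large low-margin sale and a small high-margin sale
INSERT INTO fact_sales VALUES
    (1, 1000.0, 100.0, 10.0),
    (2,  100.0,  50.0, 50.0);
""")

# Simple average of stored percentages: ignores how much revenue each row represents.
simple_avg = conn.execute(
    "SELECT AVG(margin_percentage) FROM fact_sales"
).fetchone()[0]

# Weighted margin: ratio of the additive measures.
weighted_margin = conn.execute("""
    SELECT 100.0 * SUM(gross_profit) / SUM(net_revenue)
    FROM fact_sales
""").fetchone()[0]
```

The simple average reports 30 percent, while the business actually earned 150 on 1100 of revenue, roughly 13.6 percent.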

Dimension Table Design Principles

Dimension tables provide the context for facts—the "who, what, when, where, why, how" of business events. Well-designed dimension tables enable rich analytical flexibility.

Key Dimension Characteristics:

1. Surrogate Keys: Dimension tables use system-generated surrogate keys (typically sequential integers) rather than natural business keys. This approach provides critical advantages:

Why Use Surrogate Keys

•Performance — Integer joins are faster than string comparisons. Fact tables with billions of rows benefit from compact surrogate keys.
•Source System Independence — If source systems change their key format, the data warehouse is unaffected.
•Handle Missing Values — A surrogate key can represent 'Unknown' or 'Not Applicable' without requiring valid natural keys.
•Support Slowly Changing Dimensions — Multiple surrogate keys can represent different historical versions of the same entity.
•Integrate Multiple Sources — Different source systems may use different keys for the same entity; surrogate keys provide a unified identifier.
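Two of these advantages, historical versions and an explicit 'Unknown' member, can be sketched together. The example assumes a hypothetical dim_customer in SQLite where natural key C-42 has two versions and surrogate key -1 is reserved for unknown customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,  -- surrogate key
    customer_id  TEXT,                 -- natural key from the source system
    city         TEXT,
    is_current   INTEGER
);
INSERT INTO dim_customer VALUES
    (-1, 'N/A',  'Unknown', 1),  -- placeholder member for missing customers
    ( 1, 'C-42', 'Boston',  0),  -- older version of customer C-42
    ( 2, 'C-42', 'Denver',  1);  -- current version after a move

CREATE TABLE fact_sales (customer_key INTEGER, net_revenue REAL);
INSERT INTO fact_sales VALUES
    ( 1, 100.0),  -- sold while the customer lived in Boston
    ( 2, 250.0),  -- sold after the move to Denver
    (-1,  40.0);  -- customer could not be identified
""")

revenue_by_city = conn.execute("""
    SELECT c.city, SUM(f.net_revenue)
    FROM fact_sales f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
```

Each fact row keeps pointing at the version that was valid when the sale happened, and the unidentified sale still appears in the totals instead of dropping out of an inner join.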

2. Denormalized Hierarchies:

Dimensions contain flattened hierarchies within a single table. Instead of normalizing:

product → subcategory → category → department

Star schema includes all levels as columns in one table:

product_name | subcategory | category | department

This enables queries at any hierarchy level without additional JOINs.
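As a rough illustration, the same single join can serve any level of the hierarchy. The sketch below uses SQLite with made-up products; revenue_by is a hypothetical helper written for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    subcategory  TEXT,
    category     TEXT,
    department   TEXT
);
CREATE TABLE fact_sales (product_key INTEGER, net_revenue REAL);
INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Computers', 'Electronics', 'Technology'),
    (2, 'Phone',  'Mobile',    'Electronics', 'Technology'),
    (3, 'Sofa',   'Seating',   'Furniture',   'Home');
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 500.0), (3, 300.0), (1, 200.0);
""")

HIERARCHY_LEVELS = {"subcategory", "category", "department"}

def revenue_by(level):
    """Total revenue grouped by one hierarchy level, always with a single join."""
    if level not in HIERARCHY_LEVELS:  # whitelist, since the column name is interpolated
        raise ValueError(f"unknown hierarchy level: {level}")
    sql = f"""
        SELECT p.{level}, SUM(f.net_revenue)
        FROM fact_sales f
        JOIN dim_product p ON f.product_key = p.product_key
        GROUP BY p.{level}
        ORDER BY p.{level}
    """
    return conn.execute(sql).fetchall()

by_category = revenue_by("category")
by_department = revenue_by("department")
```

Moving from category to department changes only the grouped column. In a normalized design the department query would need two more joins.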

3. Descriptive Attributes:

Dimensions should be "wide" with many descriptive columns. Include every attribute users might want to filter or group by:

Text descriptions (product name, customer name)
Categories and classifications (segment, region, product line)
Flags and indicators (is_active, is_promotional, is_premium)
Dates and ranges (effective_date, birth_date, customer_since)

dimension_table_example.sql
SQL
-- Complete Product Dimension Example
 
CREATE TABLE dim_product (
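    -- Surrogate Key (Primary Key); standard identity syntax, consistent with the
    -- PostgreSQL-style BOOLEAN columns and partial index used below
    product_key         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,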
    
    -- Natural Key (from source system)
    product_id          VARCHAR(50) NOT NULL,  -- Original SKU
    
    -- Product Attributes
    product_name        VARCHAR(255) NOT NULL,
    product_description TEXT,
    
    -- Flattened Category Hierarchy (denormalized)
    subcategory_name    VARCHAR(100),
    subcategory_code    VARCHAR(20),
    category_name       VARCHAR(100),
    category_code       VARCHAR(20),
    department_name     VARCHAR(100),
    department_code     VARCHAR(20),
    
    -- Brand Hierarchy
    brand_name          VARCHAR(100),
    brand_tier          VARCHAR(50),   -- 'Premium', 'Standard', 'Value'
    manufacturer_name   VARCHAR(100),
    
    -- Product Characteristics
    color               VARCHAR(50),
    size                VARCHAR(50),
    weight_kg           DECIMAL(10,2),
    package_type        VARCHAR(50),
    
    -- Pricing (reference, not for aggregation)
    unit_cost           DECIMAL(10,2),
    list_price          DECIMAL(10,2),
    
    -- Flags for Analysis
    is_active           BOOLEAN DEFAULT true,
    is_seasonal         BOOLEAN DEFAULT false,
    is_perishable       BOOLEAN DEFAULT false,
    requires_cold_chain BOOLEAN DEFAULT false,
    
    -- Slowly Changing Dimension Metadata
    effective_date      DATE NOT NULL,
    expiration_date     DATE DEFAULT '9999-12-31',
    is_current          BOOLEAN DEFAULT true,
    version_number      INT DEFAULT 1,
    
    -- Audit Columns
    source_system       VARCHAR(50),
    created_at          TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at          TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- Index on natural key for lookups during ETL
CREATE INDEX idx_dim_product_natural ON dim_product(product_id);
-- Partial index to find the current version of a product by its natural key
CREATE INDEX idx_dim_product_current ON dim_product(product_id) WHERE is_current = true;

The Date Dimension

The date dimension is special: it is pre-populated for every date the warehouse is expected to cover (often 20+ years) and carries precomputed attributes such as day_of_week, is_holiday, and fiscal_quarter. Fact tables should reference the date dimension by key rather than rely on raw date columns, so these attributes are available to every query.
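Pre-populating a date dimension is usually a small script. The following is a minimal sketch for one calendar year in SQLite. The column set is an assumption chosen for illustration, and attributes such as holidays or fiscal periods would need business-specific rules that are not shown.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,  -- surrogate key
        full_date     TEXT NOT NULL,        -- ISO date string
        calendar_year INTEGER,
        quarter       INTEGER,
        month_name    TEXT,
        day_of_week   TEXT,
        is_weekend    INTEGER
    )
""")

def date_rows(start, end):
    """Yield one dim_date row per calendar day from start to end inclusive."""
    day, key = start, 1
    while day <= end:
        yield (
            key,
            day.isoformat(),
            day.year,
            (day.month - 1) // 3 + 1,   # calendar quarter 1..4
            day.strftime("%B"),
            day.strftime("%A"),
            int(day.weekday() >= 5),    # Saturday=5, Sunday=6
        )
        day += timedelta(days=1)
        key += 1

conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?, ?, ?, ?, ?)",
    date_rows(date(2024, 1, 1), date(2024, 12, 31)),
)

row_count = conn.execute("SELECT COUNT(*) FROM dim_date").fetchone()[0]
weekend_days = conn.execute("SELECT SUM(is_weekend) FROM dim_date").fetchone()[0]
first_row = conn.execute("SELECT * FROM dim_date WHERE date_key = 1").fetchone()
```

A production version would extend the date range and add the fiscal and holiday attributes the business needs.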

Query Patterns and Performance

Star schemas are optimized for predictable analytical query patterns. Understanding these patterns reveals why the design is so effective.

The Fundamental Query Pattern:

Star schema queries follow a consistent structure:

JOIN fact table to required dimensions
FILTER on dimension attributes
GROUP BY dimension attributes
AGGREGATE fact measures

This pattern enables aggressive optimization by query planners.

star_query_patterns.sql
SQL
-- Typical Star Schema Query Pattern
 
-- Query: Total revenue and quantity by category and quarter for 2024
SELECT 
    d.calendar_year,
    d.quarter_name,
    p.category_name,
    SUM(f.net_revenue) AS total_revenue,
    SUM(f.quantity_sold) AS total_quantity,
    COUNT(*) AS transaction_count,
    SUM(f.net_revenue) / NULLIF(SUM(f.quantity_sold), 0) AS avg_price  -- weighted: total revenue / total units
FROM fact_sales f
-- Star JOINs: fact to each dimension
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
-- Filters on dimensions
WHERE d.calendar_year = 2024
  AND p.is_active = true
-- Group by dimension attributes
GROUP BY d.calendar_year, d.quarter_name, p.category_name
ORDER BY d.quarter_name, total_revenue DESC;
 
 
-- Drill-Down Query: From category to product level
SELECT 
    d.month_name,
    p.product_name,
    p.brand_name,
    SUM(f.net_revenue) AS revenue,
    SUM(f.quantity_sold) AS quantity
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.calendar_year = 2024
  AND d.quarter = 1
  AND p.category_name = 'Electronics'  -- Drill from category
GROUP BY d.month_name, p.product_name, p.brand_name
ORDER BY d.month_name, revenue DESC;
 
 
-- Slice and Dice: Multiple dimension filters
SELECT 
    c.customer_segment,
    s.region_name,
    SUM(f.net_revenue) AS revenue
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_customer c ON f.customer_key = c.customer_key
JOIN dim_store s ON f.store_key = s.store_key
WHERE d.calendar_year = 2024
  AND p.department_name = 'Home & Garden'
  AND c.customer_segment IN ('Premium', 'Gold')
  AND s.country = 'USA'
GROUP BY c.customer_segment, s.region_name;

Star Schema Query Optimization:

Database query optimizers recognize star schema patterns and apply specialized techniques:

1. Star Join Optimization: The optimizer identifies star joins (fact to multiple dimensions) and executes them efficiently using Bitmap Index Scan or Hash Join strategies.

2. Dimension Filtering First: Filters are applied to dimensions before joining to facts. If dim_date returns only 90 days and dim_product returns only 50 products, the fact table scan is limited to matching rows.

3. Aggregate Pushdown: Aggregations can sometimes be computed during the scan, avoiding full materialization of join results.

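4. Partition Pruning: Fact tables are often partitioned by date. When a query restricts the partitioning key, either directly or through date keys resolved from a date dimension filter, the planner can skip irrelevant partitions. Whether a filter on dimension attributes alone triggers pruning depends on the engine.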

Star Schema Query Performance Factors
Factor	Impact	Optimization Technique
Number of Dimensions Joined	Each adds JOIN cost	Join only needed dimensions; BI tools can prune unused joins
Filter Selectivity	High selectivity = fewer rows scanned	Apply filters on dimensions, not computed values
Aggregation Level	Finer grain = more rows to aggregate	Pre-aggregate common roll-ups in summary tables
Fact Table Size	Linear impact on scan time	Partition by date; consider columnar storage
Dimension Cardinality	Affects join hash table size	Lower cardinality dimensions join faster
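The summary-table technique from the table above can be sketched as follows. The name agg_daily_product_sales and the sample rows are assumptions for illustration, and a real warehouse would also need a refresh strategy, which is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, net_revenue REAL);
INSERT INTO fact_sales VALUES
    (1, 1, 10.0), (1, 1, 5.0), (1, 2, 7.0), (2, 1, 3.0);

-- Pre-aggregated roll-up at the coarser grain: one row per day and product
CREATE TABLE agg_daily_product_sales AS
SELECT date_key,
       product_key,
       SUM(net_revenue) AS net_revenue,
       COUNT(*)         AS line_count
FROM fact_sales
GROUP BY date_key, product_key;
""")

query = """
    SELECT product_key, SUM(net_revenue)
    FROM {table}
    GROUP BY product_key
    ORDER BY product_key
"""
from_fact = conn.execute(query.format(table="fact_sales")).fetchall()
from_summary = conn.execute(query.format(table="agg_daily_product_sales")).fetchall()
summary_rows = conn.execute(
    "SELECT COUNT(*) FROM agg_daily_product_sales"
).fetchone()[0]
```

Both queries return the same totals because net_revenue is additive, but the summary table has fewer rows to scan. The same shortcut would not be safe for non-additive measures.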

BI Tool Integration

Business Intelligence tools like Tableau, Power BI, and Looker work naturally with star schemas. They can generate star join queries from the model, support drill-down/roll-up navigation along dimension hierarchies, and often cache dimension data. Modeling the warehouse as star schemas keeps it compatible with this tooling.

Advanced Star Schema Patterns

Real-world star schema implementations encounter scenarios that require specialized patterns beyond the basic structure.

1. Conformed Dimensions:

Dimensions shared across multiple fact tables ensure consistency:

dim_date is used by fact_sales, fact_inventory, fact_shipments
dim_product is used by fact_sales, fact_returns, fact_promotions

Conformed dimensions enable cross-process analysis. Comparing sales to returns requires compatible product and date dimensions.

2. Role-Playing Dimensions:

The same dimension used multiple times in one fact table with different meanings:

fact_order has order_date_key and ship_date_key both referencing dim_date
Each is a different 'role' of the date dimension
Create view aliases: dim_order_date, dim_ship_date

advanced_star_patterns.sql
SQL
-- Role-Playing Dimensions
CREATE TABLE fact_order (
    order_key           BIGINT PRIMARY KEY,
    order_date_key      BIGINT REFERENCES dim_date(date_key),
    ship_date_key       BIGINT REFERENCES dim_date(date_key),
    delivery_date_key   BIGINT REFERENCES dim_date(date_key),
    customer_key        BIGINT REFERENCES dim_customer(customer_key),
    revenue             DECIMAL(12,2)
);
 
-- Query using role-playing dimensions
SELECT 
    od.month_name AS order_month,
    sd.month_name AS ship_month,
    SUM(f.revenue) AS total_revenue,
    -- Subtract calendar dates, not surrogate keys; assumes dim_date has a full_date DATE column
    AVG(dd.full_date - sd.full_date) AS avg_delivery_days
FROM fact_order f
JOIN dim_date od ON f.order_date_key = od.date_key
JOIN dim_date sd ON f.ship_date_key = sd.date_key  
JOIN dim_date dd ON f.delivery_date_key = dd.date_key
GROUP BY od.month_name, sd.month_name;
 
 
-- Junk Dimensions: Grouping low-cardinality flags
CREATE TABLE dim_transaction_profile (
    profile_key         BIGINT PRIMARY KEY,
    payment_type        VARCHAR(20),  -- 'Cash', 'Credit', 'Debit'
    delivery_type       VARCHAR(20),  -- 'Standard', 'Express', 'Pickup'
    order_source        VARCHAR(20),  -- 'Web', 'Mobile', 'Store'
    is_gift             BOOLEAN,
    is_business         BOOLEAN
);
-- Instead of 5 separate flag columns in the fact table, one profile_key
 
-- Outrigger Dimensions: Secondary dimension attached to dimension
CREATE TABLE dim_product (
    product_key         BIGINT PRIMARY KEY,
    product_name        VARCHAR(255),
    -- ... product attributes ...
    first_available_date_key BIGINT REFERENCES dim_date(date_key)  -- Outrigger
);
 
-- Degenerate Dimensions: Transaction identifiers in fact table
CREATE TABLE fact_order_line (
    date_key            BIGINT,
    product_key         BIGINT,
    -- Degenerate dimension: no separate table, just the value
    order_number        VARCHAR(50),  -- Natural key from source
    line_number         INT,
    quantity            INT,
    revenue             DECIMAL(12,2)
);
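The view aliases suggested for role-playing dimensions can be sketched like this. The example runs on SQLite, so dates are stored as ISO text and differenced with julianday. The full_date column and the sample rows are assumptions, since dim_date is not defined on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,
    full_date  TEXT,
    month_name TEXT
);
INSERT INTO dim_date VALUES
    (1, '2024-01-30', 'January'),
    (2, '2024-02-02', 'February'),
    (3, '2024-02-05', 'February');

-- One physical table, one view per role, with role-specific column names
CREATE VIEW dim_order_date AS
    SELECT date_key AS order_date_key, full_date AS order_date,
           month_name AS order_month
    FROM dim_date;
CREATE VIEW dim_ship_date AS
    SELECT date_key AS ship_date_key, full_date AS ship_date,
           month_name AS ship_month
    FROM dim_date;

CREATE TABLE fact_order (
    order_key      INTEGER PRIMARY KEY,
    order_date_key INTEGER,
    ship_date_key  INTEGER,
    revenue        REAL
);
INSERT INTO fact_order VALUES (1, 1, 2, 100.0), (2, 2, 3, 50.0);
""")

rows = conn.execute("""
    SELECT od.order_month,
           sd.ship_month,
           SUM(f.revenue),
           AVG(julianday(sd.ship_date) - julianday(od.order_date))
    FROM fact_order f
    JOIN dim_order_date od ON f.order_date_key = od.order_date_key
    JOIN dim_ship_date  sd ON f.ship_date_key  = sd.ship_date_key
    GROUP BY od.order_month, sd.ship_month
    ORDER BY od.order_month
""").fetchall()
```

The views make each role self-describing in BI tools, and the day difference is computed from calendar dates rather than from surrogate key values.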

Special Dimension Types

•Junk Dimension — Combines low-cardinality flags/indicators into one dimension. Instead of storing 5 boolean columns in the fact table, store one foreign key to a profile dimension.
•Degenerate Dimension — Dimension attribute stored directly in fact table without a separate dimension table. Transaction numbers, invoice numbers—identifiers that have no additional attributes.
•Outrigger Dimension — Secondary dimension attached to a dimension rather than the fact table. Example: product's first_available_date references the date dimension.
•Multi-Valued Dimension — Entity with multiple relationships (customer with multiple addresses). Requires bridge table between fact and dimension.
•Heterogeneous Dimension — Different types of entities in one dimension with shared attributes plus type-specific attributes.
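A bridge table for a multi-valued dimension might look like the sketch below. It assumes a hypothetical case where one sale can belong to a group of customers, with an allocation weight per member so totals are not double counted. All table names, column names, and values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
INSERT INTO dim_customer VALUES (10, 'Ana'), (11, 'Ben');

-- Bridge: one row per (group, member); weights within a group sum to 1
CREATE TABLE bridge_customer_group (
    customer_group_key INTEGER,
    customer_key       INTEGER,
    weight             REAL
);
INSERT INTO bridge_customer_group VALUES
    (1, 10, 0.5), (1, 11, 0.5),  -- group 1: Ana and Ben share equally
    (2, 10, 1.0);                -- group 2: Ana alone

CREATE TABLE fact_sales (customer_group_key INTEGER, net_revenue REAL);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 60.0);
""")

join_sql = """
    FROM fact_sales f
    JOIN bridge_customer_group b ON f.customer_group_key = b.customer_group_key
    JOIN dim_customer c ON b.customer_key = c.customer_key
"""

# Weighted allocation keeps the grand total equal to the fact table total.
weighted = conn.execute(
    "SELECT c.customer_name, SUM(f.net_revenue * b.weight)"
    + join_sql
    + "GROUP BY c.customer_name ORDER BY c.customer_name"
).fetchall()

# Without the weight, the shared sale is counted once per group member.
double_counted = conn.execute(
    "SELECT SUM(f.net_revenue)" + join_sql
).fetchone()[0]
```

The weighted result sums to 160, matching the fact table, while the unweighted join reports 260.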

Dimension Design Trade-offs

Each special dimension pattern solves a specific problem but adds complexity. Use junk dimensions when you have 5+ low-cardinality flags. Use degenerate dimensions for transaction identifiers that are never queried independently. Default to standard dimensions unless you have a compelling reason for alternatives.

Star Schema Best Practices

Implementing star schemas effectively requires adherence to proven design principles. These best practices ensure optimal performance and maintainability.

Design Principles:

Star Schema Design Principles

•Grain First — Define the fact table grain before anything else. The grain determines what can be analyzed. Changing grain later is extremely disruptive.
•Dimensional Richness — Include every attribute users might filter or group by. Adding a new attribute to a dimension is cheap; not having it when needed is expensive.
•Fact Table Focus — Fact tables should contain only foreign keys, measures, and degenerate dimensions such as order numbers. Move all other descriptive attributes to dimensions.
•Consistent Conformity — Shared dimensions must be identical across fact tables. Don't create multiple incompatible 'product' dimensions.
•Surrogate Keys Always — Use surrogate keys, not natural keys. This applies to every dimension including date.
•Handle Nulls Explicitly — Create 'Unknown' or 'Not Applicable' dimension rows rather than allowing null foreign keys in facts.
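Explicit null handling typically happens during the ETL key lookup. The sketch below assumes a hypothetical staging_sales table and a dim_product with a reserved -1 'Unknown' row. It maps every staging row to a valid surrogate key using LEFT JOIN and COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_id  TEXT,
    is_current  INTEGER
);
INSERT INTO dim_product VALUES
    (-1, 'UNKNOWN', 1),  -- reserved member for missing or unmatched products
    (10, 'SKU-1',   1);

-- Hypothetical staging rows arriving from a source system
CREATE TABLE staging_sales (product_id TEXT, amount REAL);
INSERT INTO staging_sales VALUES
    ('SKU-1', 20.0),  -- matches a dimension row
    ('SKU-9',  5.0),  -- natural key not yet in the dimension
    (NULL,     7.0);  -- natural key missing entirely
""")

fact_rows = conn.execute("""
    SELECT COALESCE(p.product_key, -1) AS product_key, s.amount
    FROM staging_sales s
    LEFT JOIN dim_product p
           ON p.product_id = s.product_id
          AND p.is_current = 1
    ORDER BY s.amount DESC
""").fetchall()
```

Every fact row ends up with a non-null key, so later inner joins to the dimension do not silently drop sales.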

Common Star Schema Mistakes and Solutions
Mistake	Problem	Solution
Snowflaking dimensions	Adds JOINs, negates star schema benefits	Flatten hierarchies into single dimension table
Storing non-additive derived values (ratios, percentages) in facts	Hard to aggregate correctly, inconsistency risk	Store the additive components; compute ratios at query time
Changing grain after deployment	Breaks existing reports, requires re-ETL	Define grain carefully upfront with business stakeholders
Null foreign keys in facts	Complicates queries, breaks aggregations	Use 'Unknown' dimension rows with key value -1 or 0
Using natural keys in facts	Performance degradation, source system coupling	Always use surrogate keys
Too few dimension attributes	Limits analytical flexibility	Include all relevant attributes; storage is cheap

Physical Design Considerations:

Partitioning: Fact tables should be partitioned by date (the most common filter). This enables partition pruning—queries for a specific year only scan that year's partition.

Indexing:

Bitmap indexes on fact table foreign keys (in engines that support them, such as Oracle)
B-tree indexes on dimension primary keys
Avoid over-indexing fact tables; columnar storage often outperforms indexes for analytical scans

Compression: Enable compression on fact tables. Repetitive foreign keys compress extremely well, and columnar databases commonly reach around 5-10x compression on typical fact tables, depending on the data.

Avoid Snowflaking

The temptation to normalize dimensions into 'snowflake schema' (multiple levels of dimension tables) should be resisted. While it reduces storage slightly, it reintroduces the JOIN overhead that star schemas eliminate. The storage savings rarely justify the performance cost.

Summary: The Star Schema Foundation

The star schema represents decades of refined best practices for analytical database design. Let's consolidate the key principles:

Key Takeaways

•Star schema is structured denormalization — It formalizes OLAP denormalization principles into a maintainable pattern: central fact tables surrounded by denormalized dimension tables.
•Fact tables contain measurements and foreign keys — They represent business events at a defined grain, containing only numeric measures and surrogate key references.
•Dimension tables contain context and attributes — They provide the who, what, when, where with flattened hierarchies and rich descriptive attributes.
•Surrogate keys provide flexibility — System-generated keys enable slowly changing dimensions, source system independence, and optimal join performance.
•Query patterns are predictable and optimizable — Star joins from fact to dimensions enable aggressive query optimization and BI tool integration.
•Best practices prevent common mistakes — Grain-first design, dimensional richness, avoiding snowflaking, and explicit null handling ensure successful implementations.

What's Next:

The star schema provides the structural foundation; dimensional modeling provides the design methodology. The next page explores dimensional modeling principles—how to identify facts and dimensions from business requirements, design for historical tracking, and build bus architectures that scale across the enterprise.

Dimensional modeling extends star schema concepts with a complete methodology for translating business questions into physical database structures.

Page Complete

You now understand the star schema pattern—the gold standard for analytical data modeling. Fact tables capture business events; dimension tables provide context; and the star shape enables efficient analytical queries. Next, we'll explore dimensional modeling methodology for designing these schemas from business requirements.

3 / 5

Loading learning content...

Database Management SystemsOLTP vs OLAP Considerations

OLTP vs OLAP Considerations

LevelAdvanced

Duration75 mins

TopicOLTP vs OLAP Considerations

3 / 5

Star Schema

The Blueprint for Analytical Data

What You Will Master

Star Schema Architecture

A star schema consists of two types of tables with a characteristic structure:

Fact Tables:

Contain measurements or metrics (quantities, amounts, counts)
Store foreign keys to dimension tables
Represent business events or transactions
Typically very large (billions of rows)
Narrow compared to fully denormalized tables

Dimension Tables:

Contain descriptive attributes (names, categories, hierarchies)
Store the context for facts
Represent business entities (customers, products, locations)
Typically smaller (thousands to millions of rows)
Wide with many descriptive columns

The "star" shape emerges when you visualize the schema: the fact table sits at the center with dimension tables radiating outward, connected by foreign key relationships.

Converting Mermaid diagram...

Why "Star" Not "Normalized":

This denormalization within dimensions eliminates additional JOINs during queries—a single JOIN from fact to dimension retrieves all hierarchical levels.

Star Schema vs. Fully Denormalized

Fact Table Design Principles

Fact tables are the heart of the star schema—they contain the actual business measurements that users analyze. Proper fact table design is critical for query performance and analytical flexibility.

Defining the Grain:

The grain (or granularity) is the fundamental design decision—it determines what each row in the fact table represents. Common grains include:

One row per order line item
One row per daily sales by product and store
One row per web page view
One row per call detail record

The grain should be the lowest level of detail that business users need. You can always aggregate from fine grain to coarse grain, but you cannot recover detail that wasn't captured.

Fact Table Types
Type	Grain	Content	Example
Transaction Fact	One row per event	Measurements at point of transaction	Each order item, call record, page view
Periodic Snapshot	One row per time period per entity	Measurements at end of period	Daily account balance, monthly inventory
Accumulating Snapshot	One row per process instance	Measurements across lifecycle stages	Order fulfillment pipeline from placement to delivery
Factless Fact	One row per event	Only dimension keys (no measures)	Student attendance (who attended what class)

Fact Table Columns:

Fact tables contain two types of columns:

1. Foreign Keys (Dimension References):

Surrogate keys linking to dimension tables
Typically BIGINT type for efficiency
Form composite key defining the grain
Enable filtering and grouping via dimension attributes

2. Measures (Metrics):

Numeric values being analyzed
Types: Additive, Semi-Additive, Non-Additive

Measure Types Explained:

Types of Measures

•Additive Measures — Can be summed across all dimensions. Revenue, quantity, cost can be summed by time, product, customer, etc. These are the most common and analytically powerful.
•Semi-Additive Measures — Can be summed across some dimensions but not others. Account balance can be summed across accounts but not across time (you'd average or take period-end).
•Non-Additive Measures — Cannot be meaningfully summed. Ratios, percentages, unit prices are non-additive. Average price across products requires weighted calculation, not simple sum.

fact_table_example.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
-- Complete Fact Table Design Example
 
CREATE TABLE fact_sales (
    -- Surrogate Keys (Foreign Keys to Dimensions)
    date_key            BIGINT NOT NULL,
    product_key         BIGINT NOT NULL,
    customer_key        BIGINT NOT NULL,
    store_key           BIGINT NOT NULL,
    promotion_key       BIGINT NOT NULL,  -- Possibly "No Promotion" default
    
    -- Degenerate Dimensions (transaction identifiers, no dimension table)
    order_number        VARCHAR(50),
    line_item_number    INT,
    
    -- Additive Measures
    quantity_sold       INT NOT NULL,
    unit_price          DECIMAL(10,2) NOT NULL,
    discount_amount     DECIMAL(10,2) DEFAULT 0,
    extended_price      DECIMAL(12,2) NOT NULL,  -- quantity * unit_price
    net_revenue         DECIMAL(12,2) NOT NULL,  -- extended - discount
    cost_of_goods       DECIMAL(12,2) NOT NULL,
    gross_profit        DECIMAL(12,2) NOT NULL,  -- revenue - cost
    
    -- Semi-Additive (careful with time aggregation)
    units_in_stock      INT,  -- Snapshot at time of sale
    
    -- Non-Additive (stored for reference, aggregate with care)
    unit_cost           DECIMAL(10,2),
    margin_percentage   DECIMAL(5,2),
    
    -- Composite Primary Key defines grain
    PRIMARY KEY (date_key, product_key, customer_key, store_key, line_item_number),
    
    -- Foreign Key Constraints
    FOREIGN KEY (date_key) REFERENCES dim_date(date_key),
    FOREIGN KEY (product_key) REFERENCES dim_product(product_key),
    FOREIGN KEY (customer_key) REFERENCES dim_customer(customer_key),
    FOREIGN KEY (store_key) REFERENCES dim_store(store_key),
    FOREIGN KEY (promotion_key) REFERENCES dim_promotion(promotion_key)
);
 
-- Indexes for common access patterns
CREATE INDEX idx_fact_sales_date ON fact_sales(date_key);
CREATE INDEX idx_fact_sales_product ON fact_sales(product_key);
CREATE INDEX idx_fact_sales_customer ON fact_sales(customer_key);

Beware Non-Additive Aggregation

Dimension Table Design Principles

Dimension tables provide the context for facts—the "who, what, when, where, why, how" of business events. Well-designed dimension tables enable rich analytical flexibility.

Key Dimension Characteristics:

1. Surrogate Keys: Dimension tables use system-generated surrogate keys (typically sequential integers) rather than natural business keys. This approach provides critical advantages:

Why Use Surrogate Keys

•Performance — Integer joins are faster than string comparisons. Fact tables with billions of rows benefit from compact surrogate keys.
•Source System Independence — If source systems change their key format, the data warehouse is unaffected.
•Handle Missing Values — A surrogate key can represent 'Unknown' or 'Not Applicable' without requiring valid natural keys.
•Support Slowly Changing Dimensions — Multiple surrogate keys can represent different historical versions of the same entity.
•Integrate Multiple Sources — Different source systems may use different keys for the same entity; surrogate keys provide a unified identifier.

2. Denormalized Hierarchies:

Dimensions contain flattened hierarchies within a single table. Instead of normalizing:

product → subcategory → category → department

Star schema includes all levels as columns in one table:

product_name | subcategory | category | department

This enables queries at any hierarchy level without additional JOINs.

3. Descriptive Attributes:

Dimensions should be "wide" with many descriptive columns. Include every attribute users might want to filter or group by:

Text descriptions (product name, customer name)
Categories and classifications (segment, region, product line)
Flags and indicators (is_active, is_promotional, is_premium)
Dates and ranges (effective_date, birth_date, customer_since)
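
To make the flattened hierarchy concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The column names follow the product dimension described above; the sample rows and the `revenue_by` helper are illustrative assumptions. The point it shows is that one fact-to-dimension join serves every hierarchy level, because only the GROUP BY column changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key      INTEGER PRIMARY KEY,
    product_name     TEXT,
    subcategory_name TEXT,
    category_name    TEXT,
    department_name  TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    net_revenue REAL
);
-- Illustrative sample data (not from the original text)
INSERT INTO dim_product VALUES
    (1, 'Laptop A',    'Laptops',     'Computers', 'Electronics'),
    (2, 'Laptop B',    'Laptops',     'Computers', 'Electronics'),
    (3, 'Phone X',     'Smartphones', 'Mobile',    'Electronics'),
    (4, 'Garden Hose', 'Watering',    'Garden',    'Home & Garden');
INSERT INTO fact_sales VALUES (1, 1000), (2, 1500), (3, 800), (4, 40), (4, 60);
""")

HIERARCHY_LEVELS = {"subcategory_name", "category_name", "department_name"}


def revenue_by(level):
    """Total revenue grouped by one hierarchy column of dim_product."""
    if level not in HIERARCHY_LEVELS:
        raise ValueError(f"unknown hierarchy level: {level}")
    # The join is identical for every level; only the grouping column differs.
    sql = f"""
        SELECT p.{level}, SUM(f.net_revenue)
        FROM fact_sales f
        JOIN dim_product p ON f.product_key = p.product_key
        GROUP BY p.{level}
        ORDER BY p.{level}
    """
    return conn.execute(sql).fetchall()


by_department = revenue_by("department_name")
by_category = revenue_by("category_name")
```

In a snowflaked design, the department-level query would need additional joins through subcategory and category tables; here it needs none.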

dimension_table_example.sql
SQL
-- Complete Product Dimension Example
 
CREATE TABLE dim_product (
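    -- Surrogate Key (Primary Key). PostgreSQL identity syntax, consistent with
    -- the BOOLEAN columns and partial index below; SQL Server would use IDENTITY(1,1)
    product_key         BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,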
    
    -- Natural Key (from source system)
    product_id          VARCHAR(50) NOT NULL,  -- Original SKU
    
    -- Product Attributes
    product_name        VARCHAR(255) NOT NULL,
    product_description TEXT,
    
    -- Flattened Category Hierarchy (denormalized)
    subcategory_name    VARCHAR(100),
    subcategory_code    VARCHAR(20),
    category_name       VARCHAR(100),
    category_code       VARCHAR(20),
    department_name     VARCHAR(100),
    department_code     VARCHAR(20),
    
    -- Brand Hierarchy
    brand_name          VARCHAR(100),
    brand_tier          VARCHAR(50),   -- 'Premium', 'Standard', 'Value'
    manufacturer_name   VARCHAR(100),
    
    -- Product Characteristics
    color               VARCHAR(50),
    size                VARCHAR(50),
    weight_kg           DECIMAL(10,2),
    package_type        VARCHAR(50),
    
    -- Pricing (reference, not for aggregation)
    unit_cost           DECIMAL(10,2),
    list_price          DECIMAL(10,2),
    
    -- Flags for Analysis
    is_active           BOOLEAN DEFAULT true,
    is_seasonal         BOOLEAN DEFAULT false,
    is_perishable       BOOLEAN DEFAULT false,
    requires_cold_chain BOOLEAN DEFAULT false,
    
    -- Slowly Changing Dimension Metadata
    effective_date      DATE NOT NULL,
    expiration_date     DATE DEFAULT '9999-12-31',
    is_current          BOOLEAN DEFAULT true,
    version_number      INT DEFAULT 1,
    
    -- Audit Columns
    source_system       VARCHAR(50),
    created_at          TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at          TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- Index on natural key for lookups during ETL
CREATE INDEX idx_dim_product_natural ON dim_product(product_id);
-- Partial index for resolving a natural key to its current version
CREATE INDEX idx_dim_product_current ON dim_product(product_id) WHERE is_current = true;
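
The slowly changing dimension columns are easier to follow with a small worked example. The sketch below uses Python's sqlite3 module with a cut-down version of dim_product. The sample rows, the reserved key -1 for the 'Unknown' row, and the `lookup_product_key` helper are assumptions made for illustration. It shows how one natural key maps to several surrogate keys, and how an ETL lookup might pick the version that was valid on the transaction date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key     INTEGER PRIMARY KEY,   -- surrogate key
    product_id      TEXT NOT NULL,         -- natural key (SKU)
    category_name   TEXT,
    effective_date  TEXT NOT NULL,         -- ISO dates compare correctly as text
    expiration_date TEXT NOT NULL DEFAULT '9999-12-31',
    is_current      INTEGER NOT NULL DEFAULT 1
);
-- Reserved row for facts whose product cannot be resolved
INSERT INTO dim_product VALUES
    (-1, 'UNKNOWN', 'Unknown', '1900-01-01', '9999-12-31', 1);
-- Two versions of the same SKU: it changed category on 2024-07-01
INSERT INTO dim_product VALUES
    (1, 'SKU-100', 'Audio',     '2023-01-01', '2024-06-30', 0),
    (2, 'SKU-100', 'Wearables', '2024-07-01', '9999-12-31', 1);
""")


def lookup_product_key(product_id, as_of):
    """Return the surrogate key of the version valid on as_of, or -1 if none."""
    row = conn.execute(
        """
        SELECT product_key
        FROM dim_product
        WHERE product_id = ?
          AND ? BETWEEN effective_date AND expiration_date
        """,
        (product_id, as_of),
    ).fetchone()
    return row[0] if row else -1


# Current version only, as the is_current flag is meant to be used
current_key = conn.execute(
    "SELECT product_key FROM dim_product WHERE product_id = ? AND is_current = 1",
    ("SKU-100",),
).fetchone()[0]
```

A sale dated 2024-03-15 resolves to key 1 and keeps reporting under 'Audio', while a sale dated 2024-08-01 resolves to key 2; an unrecognized SKU falls back to the 'Unknown' row instead of producing a null foreign key.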


Query Patterns and Performance

Star schemas are optimized for predictable analytical query patterns. Understanding these patterns reveals why the design is so effective.

The Fundamental Query Pattern:

Star schema queries follow a consistent structure:

JOIN fact table to required dimensions
FILTER on dimension attributes
GROUP BY dimension attributes
AGGREGATE fact measures

This pattern enables aggressive optimization by query planners.

star_query_patterns.sql
SQL
-- Typical Star Schema Query Pattern
 
-- Query: Total revenue and quantity by category and quarter for 2024
SELECT 
    d.calendar_year,
    d.quarter_name,
    p.category_name,
    SUM(f.net_revenue) AS total_revenue,
    SUM(f.quantity_sold) AS total_quantity,
    COUNT(*) AS transaction_count,
    -- Weighted average price: divide the summed measures rather than averaging per-row ratios
    SUM(f.net_revenue) / NULLIF(SUM(f.quantity_sold), 0) AS avg_price
FROM fact_sales f
-- Star JOINs: fact to each dimension
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
-- Filters on dimensions
WHERE d.calendar_year = 2024
  AND p.is_active = true
-- Group by dimension attributes
GROUP BY d.calendar_year, d.quarter_name, p.category_name
ORDER BY d.quarter_name, total_revenue DESC;
 
 
-- Drill-Down Query: From category to product level
SELECT 
    d.month_name,
    p.product_name,
    p.brand_name,
    SUM(f.net_revenue) AS revenue,
    SUM(f.quantity_sold) AS quantity
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.calendar_year = 2024
  AND d.quarter = 1
  AND p.category_name = 'Electronics'  -- Drill from category
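-- month_number (assumed numeric month column on dim_date) keeps months in
-- calendar order; sorting by month_name alone would be alphabetical
GROUP BY d.month_number, d.month_name, p.product_name, p.brand_name
ORDER BY d.month_number, revenue DESC;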
 
 
-- Slice and Dice: Multiple dimension filters
SELECT 
    c.customer_segment,
    s.region_name,
    SUM(f.net_revenue) AS revenue
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_customer c ON f.customer_key = c.customer_key
JOIN dim_store s ON f.store_key = s.store_key
WHERE d.calendar_year = 2024
  AND p.department_name = 'Home & Garden'
  AND c.customer_segment IN ('Premium', 'Gold')
  AND s.country = 'USA'
GROUP BY c.customer_segment, s.region_name;
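
Average unit price is a non-additive measure, so the way it is aggregated changes the answer. The following sketch, run with Python's sqlite3 module on two invented fact rows, contrasts averaging per-row ratios with dividing the summed measures. The data is an assumption chosen to make the gap obvious.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (net_revenue REAL, quantity_sold INTEGER);
INSERT INTO fact_sales VALUES
    (10.0, 1),      -- one unit sold at 10.00
    (500.0, 100);   -- bulk order: 100 units at 5.00 each
""")

avg_of_ratios, ratio_of_sums = conn.execute("""
    SELECT
        AVG(net_revenue / NULLIF(quantity_sold, 0)),      -- each row counts equally
        SUM(net_revenue) / NULLIF(SUM(quantity_sold), 0)  -- each unit counts equally
    FROM fact_sales
""").fetchone()
```

The first expression yields 7.5, because the single-unit sale weighs as much as the bulk order. The second yields about 5.05, the price actually paid per unit across all 101 units. Which one is wanted depends on the business question, but the ratio of sums is the one that stays consistent when results are rolled up further.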

Star Schema Query Optimization:

Database query optimizers recognize star schema patterns and apply specialized techniques:

1. Star Join Optimization: The optimizer identifies star joins (one fact table joined to multiple dimensions) and executes them efficiently using bitmap index scans or hash joins.

2. Aggregate Pushdown: Aggregations can sometimes be computed during the scan, avoiding full materialization of join results.

3. Partition Pruning: Fact tables are often partitioned by date. The query planner skips partitions that the query's date filters rule out.

Star Schema Query Performance Factors
Factor	Impact	Optimization Technique
Number of Dimensions Joined	Each adds JOIN cost	Join only needed dimensions; BI tools can prune unused joins
Filter Selectivity	High selectivity = fewer rows scanned	Apply filters on dimensions, not computed values
Aggregation Level	Finer grain = more rows to aggregate	Pre-aggregate common roll-ups in summary tables
Fact Table Size	Linear impact on scan time	Partition by date; consider columnar storage
Dimension Cardinality	Affects join hash table size	Lower cardinality dimensions join faster
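
The "pre-aggregate common roll-ups" technique from the table can be sketched as follows, again with Python's sqlite3 module. The summary table name `agg_sales_month_category`, the `month_number` column, and the sample rows are hypothetical. The sketch builds a month-by-category summary from the fact table and checks that a yearly total computed from the summary matches the one computed from the detailed facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, calendar_year INTEGER, month_number INTEGER
);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER, product_key INTEGER, net_revenue REAL, quantity_sold INTEGER
);
INSERT INTO dim_date VALUES
    (20240115, 2024, 1), (20240120, 2024, 1), (20240210, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Electronics'), (2, 'Garden');
INSERT INTO fact_sales VALUES
    (20240115, 1, 100.0, 1), (20240120, 1, 250.0, 2),
    (20240120, 2,  40.0, 4), (20240210, 1, 300.0, 3);

-- Hypothetical summary table at month x category grain.
-- Only additive measures are stored, so it can be rolled up further.
CREATE TABLE agg_sales_month_category AS
SELECT d.calendar_year, d.month_number, p.category_name,
       SUM(f.net_revenue)   AS net_revenue,
       SUM(f.quantity_sold) AS quantity_sold,
       COUNT(*)             AS transaction_count
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.calendar_year, d.month_number, p.category_name;
""")

summary_rows = conn.execute("""
    SELECT calendar_year, month_number, category_name,
           net_revenue, quantity_sold, transaction_count
    FROM agg_sales_month_category
    ORDER BY calendar_year, month_number, category_name
""").fetchall()

# The same yearly total, once from the summary and once from the detail rows
yearly_from_summary = conn.execute(
    "SELECT SUM(net_revenue) FROM agg_sales_month_category WHERE calendar_year = 2024"
).fetchone()[0]
yearly_from_fact = conn.execute("""
    SELECT SUM(f.net_revenue)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    WHERE d.calendar_year = 2024
""").fetchone()[0]
```

The summary answers month and year questions from three rows instead of four here; on a real fact table the reduction is what makes dashboards responsive. The trade-off is that the summary must be refreshed whenever the fact table is loaded.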


Advanced Star Schema Patterns

Real-world star schema implementations encounter scenarios that require specialized patterns beyond the basic structure.

1. Conformed Dimensions:

Dimensions shared across multiple fact tables ensure consistency:

dim_date is used by fact_sales, fact_inventory, fact_shipments
dim_product is used by fact_sales, fact_returns, fact_promotions

Conformed dimensions enable cross-process analysis. Comparing sales to returns requires compatible product and date dimensions.

2. Role-Playing Dimensions:

The same dimension used multiple times in one fact table with different meanings:

fact_order has order_date_key and ship_date_key both referencing dim_date
Each is a different 'role' of the date dimension
Create view aliases: dim_order_date, dim_ship_date

advanced_star_patterns.sql
SQL
-- Role-Playing Dimensions
CREATE TABLE fact_order (
    order_key           BIGINT PRIMARY KEY,
    order_date_key      BIGINT REFERENCES dim_date(date_key),
    ship_date_key       BIGINT REFERENCES dim_date(date_key),
    delivery_date_key   BIGINT REFERENCES dim_date(date_key),
    customer_key        BIGINT REFERENCES dim_customer(customer_key),
    revenue             DECIMAL(12,2)
);
 
-- Query using role-playing dimensions
SELECT 
    od.month_name AS order_month,
    sd.month_name AS ship_month,
    SUM(f.revenue) AS total_revenue,
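    -- Subtract calendar dates, not surrogate keys; full_date is the assumed
    -- name of the DATE column on dim_date
    AVG(dd.full_date - sd.full_date) AS avg_delivery_days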
FROM fact_order f
JOIN dim_date od ON f.order_date_key = od.date_key
JOIN dim_date sd ON f.ship_date_key = sd.date_key  
JOIN dim_date dd ON f.delivery_date_key = dd.date_key
GROUP BY od.month_name, sd.month_name;
 
 
-- Junk Dimensions: Grouping low-cardinality flags
CREATE TABLE dim_transaction_profile (
    profile_key         BIGINT PRIMARY KEY,
    payment_type        VARCHAR(20),  -- 'Cash', 'Credit', 'Debit'
    delivery_type       VARCHAR(20),  -- 'Standard', 'Express', 'Pickup'
    order_source        VARCHAR(20),  -- 'Web', 'Mobile', 'Store'
    is_gift             BOOLEAN,
    is_business         BOOLEAN
);
-- Instead of five separate flag/code columns in the fact table, store one profile_key
 
-- Outrigger Dimensions: Secondary dimension attached to dimension
CREATE TABLE dim_product (
    product_key         BIGINT PRIMARY KEY,
    product_name        VARCHAR(255),
    -- ... product attributes ...
    first_available_date_key BIGINT REFERENCES dim_date(date_key)  -- Outrigger
);
 
-- Degenerate Dimensions: Transaction identifiers in fact table
CREATE TABLE fact_order_line (
    date_key            BIGINT,
    product_key         BIGINT,
    -- Degenerate dimension: no separate table, just the value
    order_number        VARCHAR(50),  -- Natural key from source
    line_number         INT,
    quantity            INT,
    revenue             DECIMAL(12,2)
);
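
The view aliases mentioned for role-playing dimensions can be sketched like this, using Python's sqlite3 module. The `full_date` column, the YYYYMMDD key format, and the sample rows are assumptions for illustration. Each view exposes the single physical dim_date table under role-specific column names, and the query also shows why elapsed days should be computed from calendar dates rather than by subtracting keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- assumed YYYYMMDD format
    full_date TEXT,                 -- assumed calendar date column
    month_name TEXT
);
INSERT INTO dim_date VALUES
    (20240130, '2024-01-30', 'January'),
    (20240202, '2024-02-02', 'February'),
    (20240205, '2024-02-05', 'February');

-- One view per role, each with role-specific column names
CREATE VIEW dim_order_date AS
    SELECT date_key AS order_date_key, full_date AS order_date,
           month_name AS order_month
    FROM dim_date;
CREATE VIEW dim_ship_date AS
    SELECT date_key AS ship_date_key, full_date AS ship_date,
           month_name AS ship_month
    FROM dim_date;

CREATE TABLE fact_order (
    order_key INTEGER PRIMARY KEY,
    order_date_key INTEGER, ship_date_key INTEGER, revenue REAL
);
INSERT INTO fact_order VALUES
    (1, 20240130, 20240202, 120.0),
    (2, 20240202, 20240205, 80.0);
""")

rows = conn.execute("""
    SELECT od.order_month, sd.ship_month,
           SUM(f.revenue) AS total_revenue,
           AVG(julianday(sd.ship_date) - julianday(od.order_date)) AS avg_days_to_ship,
           AVG(f.ship_date_key - f.order_date_key) AS naive_key_difference
    FROM fact_order f
    JOIN dim_order_date od ON f.order_date_key = od.order_date_key
    JOIN dim_ship_date sd  ON f.ship_date_key = sd.ship_date_key
    GROUP BY od.order_month, sd.ship_month
    ORDER BY MIN(f.order_key)
""").fetchall()
```

Both orders shipped three days after they were placed, yet the key difference for the order that crosses a month boundary is 72. Key arithmetic only works when the date keys are consecutive day numbers, which the schema does not guarantee.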

Special Dimension Types

•Junk Dimension: Combines low-cardinality flags and indicators into one dimension. Instead of storing five separate flag or code columns in the fact table, store one foreign key to a profile dimension.
•Degenerate Dimension: A dimension attribute stored directly in the fact table without a separate dimension table. Typical examples are transaction and invoice numbers, identifiers that have no additional attributes.
•Outrigger Dimension: A secondary dimension attached to another dimension rather than to the fact table. Example: a product's first_available_date_key references the date dimension.
•Multi-Valued Dimension: An entity with multiple simultaneous relationships (a customer with multiple addresses). Requires a bridge table between the fact and the dimension.
•Heterogeneous Dimension: Different types of entities in one dimension, with shared attributes plus type-specific attributes.
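
A junk dimension is usually small enough to populate up front with every combination of its attribute values. The sketch below does this for the transaction profile table using Python's sqlite3 module. Pre-populating the full cross product and the `profile_key_for` helper are one possible approach, shown as an assumption rather than a prescribed method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_transaction_profile (
    profile_key   INTEGER PRIMARY KEY,   -- surrogate key, assigned automatically
    payment_type  TEXT,
    delivery_type TEXT,
    order_source  TEXT,
    is_gift       INTEGER,
    is_business   INTEGER,
    UNIQUE (payment_type, delivery_type, order_source, is_gift, is_business)
);

-- Every combination of the five attributes: 3 * 3 * 3 * 2 * 2 rows
INSERT INTO dim_transaction_profile
    (payment_type, delivery_type, order_source, is_gift, is_business)
SELECT p.v, d.v, s.v, g.v, b.v
FROM (SELECT 'Cash' AS v UNION ALL SELECT 'Credit' UNION ALL SELECT 'Debit') AS p
CROSS JOIN (SELECT 'Standard' AS v UNION ALL SELECT 'Express'
            UNION ALL SELECT 'Pickup') AS d
CROSS JOIN (SELECT 'Web' AS v UNION ALL SELECT 'Mobile'
            UNION ALL SELECT 'Store') AS s
CROSS JOIN (SELECT 0 AS v UNION ALL SELECT 1) AS g
CROSS JOIN (SELECT 0 AS v UNION ALL SELECT 1) AS b;
""")

profile_count = conn.execute(
    "SELECT COUNT(*) FROM dim_transaction_profile"
).fetchone()[0]


def profile_key_for(payment_type, delivery_type, order_source, is_gift, is_business):
    """Resolve one combination of flags to its single profile_key during ETL."""
    row = conn.execute(
        """
        SELECT profile_key FROM dim_transaction_profile
        WHERE payment_type = ? AND delivery_type = ? AND order_source = ?
          AND is_gift = ? AND is_business = ?
        """,
        (payment_type, delivery_type, order_source, int(is_gift), int(is_business)),
    ).fetchone()
    return row[0] if row else None


gift_key = profile_key_for("Credit", "Express", "Mobile", True, False)
non_gift_key = profile_key_for("Credit", "Express", "Mobile", False, False)
```

The whole dimension has 108 rows regardless of how many billions of fact rows reference it, and each fact row carries one integer in place of five descriptive columns.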


Star Schema Best Practices

Implementing star schemas effectively requires adherence to proven design principles. These best practices ensure optimal performance and maintainability.

Design Principles:

Star Schema Design Principles

•Grain First: Define the fact table grain before anything else. The grain determines what can be analyzed, and changing it later is extremely disruptive.
•Dimensional Richness: Include every attribute users might filter or group by. Adding a new attribute to a dimension is cheap; not having it when needed is expensive.
•Fact Table Focus: Fact tables should contain only foreign keys, measures, and degenerate dimension identifiers such as order numbers. Move all other descriptive attributes to dimensions.
•Consistent Conformity: Shared dimensions must be identical across fact tables. Don't create multiple incompatible 'product' dimensions.
•Surrogate Keys Always: Use surrogate keys, not natural keys, in every dimension, including date.
•Handle Nulls Explicitly: Create 'Unknown' or 'Not Applicable' dimension rows rather than allowing null foreign keys in facts.
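
Conformity is what makes it possible to compare two business processes in one report. The sketch below, using Python's sqlite3 module, shows one common way to do this: aggregate each fact table separately to the shared dimension attribute, then join the two results. The `fact_returns` columns, the `refund_amount` measure, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE fact_sales   (product_key INTEGER, net_revenue REAL);
CREATE TABLE fact_returns (product_key INTEGER, refund_amount REAL);

INSERT INTO dim_product VALUES (1, 'Electronics'), (2, 'Electronics'), (3, 'Garden');
INSERT INTO fact_sales   VALUES (1, 1000), (2, 500), (3, 200), (3, 300);
INSERT INTO fact_returns VALUES (1, 100), (1, 50);
""")

# Aggregate each process to the conformed attribute first, then combine.
drill_across = conn.execute("""
    WITH sales_by_cat AS (
        SELECT p.category_name, SUM(f.net_revenue) AS revenue
        FROM fact_sales f
        JOIN dim_product p ON f.product_key = p.product_key
        GROUP BY p.category_name
    ),
    returns_by_cat AS (
        SELECT p.category_name, SUM(r.refund_amount) AS refunds
        FROM fact_returns r
        JOIN dim_product p ON r.product_key = p.product_key
        GROUP BY p.category_name
    )
    SELECT s.category_name, s.revenue, COALESCE(r.refunds, 0.0)
    FROM sales_by_cat s
    LEFT JOIN returns_by_cat r ON r.category_name = s.category_name
    ORDER BY s.category_name
""").fetchall()

# Joining the two fact tables directly repeats each sale once per matching return.
naive_revenue = conn.execute("""
    SELECT SUM(s.net_revenue)
    FROM fact_sales s
    JOIN fact_returns r ON r.product_key = s.product_key
""").fetchone()[0]
```

Product 1 has a single 1000 sale and two returns, so the direct fact-to-fact join reports 2000 in revenue for it. Aggregating each fact table separately avoids that fan-out, and it only works because both facts reference the same dim_product with the same keys and category values.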

Common Star Schema Mistakes and Solutions
Mistake	Problem	Solution
Snowflaking dimensions	Adds JOINs, negates star schema benefits	Flatten hierarchies into single dimension table
Storing derived values in facts	Maintenance burden, inconsistency risk	Compute derived values in ETL or at query time
Changing grain after deployment	Breaks existing reports, requires re-ETL	Define grain carefully upfront with business stakeholders
Null foreign keys in facts	Complicates queries, breaks aggregations	Use 'Unknown' dimension rows with key value -1 or 0
Using natural keys in facts	Performance degradation, source system coupling	Always use surrogate keys
Too few dimension attributes	Limits analytical flexibility	Include all relevant attributes; storage is cheap
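
The null foreign key row in the table above is worth seeing in action. This sketch, run with Python's sqlite3 module, loads the same two sales into a fact table that allows null customer keys and into one that uses an 'Unknown' dimension row with key -1. The table names `fact_null` and `fact_unknown` and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_segment TEXT);
INSERT INTO dim_customer VALUES (-1, 'Unknown'), (1, 'Premium');

-- Variant A: missing customer stored as NULL
CREATE TABLE fact_null (customer_key INTEGER, net_revenue REAL);
INSERT INTO fact_null VALUES (1, 100.0), (NULL, 50.0);

-- Variant B: missing customer points at the 'Unknown' row
CREATE TABLE fact_unknown (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    net_revenue REAL
);
INSERT INTO fact_unknown VALUES (1, 100.0), (-1, 50.0);
""")

FACT_TABLES = {"fact_null", "fact_unknown"}


def revenue_by_segment(fact_table):
    """Standard star join: revenue per customer segment for the given fact table."""
    if fact_table not in FACT_TABLES:
        raise ValueError(f"unknown fact table: {fact_table}")
    return conn.execute(f"""
        SELECT c.customer_segment, SUM(f.net_revenue)
        FROM {fact_table} f
        JOIN dim_customer c ON f.customer_key = c.customer_key
        GROUP BY c.customer_segment
        ORDER BY c.customer_segment
    """).fetchall()


with_nulls = revenue_by_segment("fact_null")
with_unknown_row = revenue_by_segment("fact_unknown")
```

With null keys, the inner join silently drops the 50.0 sale, so segment totals no longer add up to total revenue. With the 'Unknown' row, the same query returns both segments and the totals reconcile.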

Physical Design Considerations:

Partitioning: Fact tables should be partitioned by date (the most common filter). This enables partition pruning—queries for a specific year only scan that year's partition.

Indexing:

Bitmap indexes on fact table foreign key columns (in databases that support them, such as Oracle)
B-tree indexes on dimension primary keys
Avoid over-indexing fact tables; columnar storage often outperforms indexes for analytical scans

Compression: Enable compression on fact tables. Repetitive foreign keys compress extremely well, and columnar databases commonly reach compression ratios of roughly 5-10x on typical fact tables, depending on the data.


Summary: The Star Schema Foundation

The star schema represents decades of refined best practices for analytical database design. Let's consolidate the key principles:

Key Takeaways

•Star schema is structured denormalization — It formalizes OLAP denormalization principles into a maintainable pattern: central fact tables surrounded by denormalized dimension tables.
•Fact tables contain measurements and foreign keys — They represent business events at a defined grain, containing only numeric measures and surrogate key references.
•Dimension tables contain context and attributes — They provide the who, what, when, where with flattened hierarchies and rich descriptive attributes.
•Surrogate keys provide flexibility — System-generated keys enable slowly changing dimensions, source system independence, and optimal join performance.
•Query patterns are predictable and optimizable — Star joins from fact to dimensions enable aggressive query optimization and BI tool integration.
•Best practices prevent common mistakes — Grain-first design, dimensional richness, avoiding snowflaking, and explicit null handling ensure successful implementations.

What's Next:

Dimensional modeling extends star schema concepts with a complete methodology for translating business questions into physical database structures.
