Loading learning content...
Every business generates measurements—transactions, events, metrics, quantities. Somewhere in your organization right now, orders are being placed, calls are being logged, website pages are being viewed, and products are being shipped. Each of these events represents a fact—a quantifiable occurrence in your business that, when aggregated and analyzed, tells the story of how your organization operates.
The fact table stands at the center of the star schema, serving as the repository for these business measurements. It is the single most important table in any data warehouse, representing the primary subject of analysis around which all other structures orbit.
Understanding fact tables is not merely an academic exercise—it is the gateway to designing analytical systems that can answer the questions executives ask: How much did we sell last quarter? Which products are trending? Where are we losing customers? The quality of your fact table design directly determines whether these questions can be answered in seconds or never at all.
By the end of this page, you will understand the anatomy of fact tables, distinguish between different fact types (transactional, periodic, accumulating), master granularity decisions that make or break warehouse designs, and apply expert-level patterns for designing fact tables that scale to billions of rows while remaining query-performant.
A fact table is a database table that stores measurements, metrics, or facts about a business process. Unlike operational tables that capture complete entity states, fact tables are specifically designed to record what happened—the events, transactions, or periodic snapshots that constitute your business activity.
Fact tables possess several distinguishing characteristics that separate them from other database tables:
1. Numeric Measurements (Facts)
The primary content of a fact table consists of numeric, quantitative values called facts or measures. These are the numbers that analysts aggregate, summarize, and compare. Examples include:
2. Foreign Keys to Dimensions
Fact tables contain foreign keys that reference dimension tables, establishing the context for each measurement. A sales fact row might contain keys pointing to:
3. Narrow and Deep Structure
Fact tables are typically narrow (relatively few columns—mostly foreign keys and a handful of fact columns) but deep (containing millions or billions of rows). This contrasts sharply with dimension tables, which are wide and shallow.
| Column | Type | Example Value | Purpose |
|---|---|---|---|
| sale_key | Surrogate Key | 7492031 | Unique identifier for this fact row |
| date_key | Foreign Key | 20240115 | Links to date dimension |
| product_key | Foreign Key | 4521 | Links to product dimension |
| customer_key | Foreign Key | 89234 | Links to customer dimension |
| store_key | Foreign Key | 127 | Links to store dimension |
| promotion_key | Foreign Key | 18 | Links to promotion dimension |
| quantity_sold | Fact (Additive) | 3 | Units purchased in this transaction |
| unit_price | Fact (Non-Additive) | 29.99 | Price per unit at time of sale |
| sales_amount | Fact (Additive) | 89.97 | Total revenue = quantity × price |
| cost_amount | Fact (Additive) | 54.00 | Total cost of goods sold |
| profit_amount | Fact (Derived) | 35.97 | Revenue minus cost |
Notice that each row represents a single line item in a single transaction. This is the grain of the fact table—the level of detail at which facts are recorded. The grain decision is the most important design choice you will make, as we explore in detail later in this page.
Not all numeric measures behave the same way when aggregated. Understanding additivity—which facts can be meaningfully summed across which dimensions—is crucial for designing useful analytical systems.
Additive facts can be summed across all dimensions without producing meaningless results. These are the most useful facts because they support the full range of analytical operations.
Examples of Additive Facts:
The mathematical property that makes these facts additive is that the same unit of measure applies regardless of the dimension. A dollar is a dollar whether we're counting across days or across products.
When designing fact tables, prioritize additive facts wherever possible. They provide the maximum analytical flexibility. If you have a choice between storing a ratio or storing the numerator and denominator separately, store both components—the ratio can always be calculated, but component values cannot be recovered from stored ratios.
Semi-additive facts can be summed across some dimensions but not others—typically not across time. These usually represent balance or level measurements at a point in time.
Examples of Semi-Additive Facts:
For semi-additive facts, typical analytical operations use AVG, MIN, MAX, or last value across the time dimension rather than SUM.
| Fact Type | Sum Across Products | Sum Across Customers | Sum Across Time | Typical Operations |
|---|---|---|---|---|
| Sales Revenue (Additive) | ✓ Valid | ✓ Valid | ✓ Valid | SUM, AVG, COUNT |
| Account Balance (Semi-Additive) | ✓ Valid | ✓ Valid | ✗ Invalid | AVG, MIN, MAX, LAST |
| Unit Price (Non-Additive) | ✗ Invalid | ✗ Invalid | ✗ Invalid | AVG, MIN, MAX, weighted calculations |
| Percentage (Non-Additive) | ✗ Invalid | ✗ Invalid | ✗ Invalid | Recalculate from components |
Non-Additive facts cannot be meaningfully summed across any dimension. These are typically ratios, percentages, averages, or other derived calculations.
Examples of Non-Additive Facts:
The Solution: Store Components, Not Ratios
The expert approach to handling non-additive facts is to store the components rather than the calculated ratio. Instead of storing profit_margin_percent, store profit_amount and revenue_amount. The percentage can be calculated at query time:
SELECT
product_name,
SUM(profit_amount) / SUM(revenue_amount) * 100 AS profit_margin_percent
FROM sales_fact f
JOIN product_dim p ON f.product_key = p.product_key
GROUP BY product_name;
This approach enables correct weighted averages across any dimension combination.
Different business processes require different fact table structures. The three fundamental types—transactional, periodic snapshot, and accumulating snapshot—each serve distinct analytical needs.
Transaction fact tables (also called transaction grain fact tables) record individual events at the most atomic level. Each row represents a single, instantaneous occurrence in the business process.
Characteristics:
Examples:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546
-- Transaction Fact Table: Retail Sales-- Grain: One row per line item per transaction CREATE TABLE sales_transaction_fact ( -- Surrogate key transaction_line_key BIGINT PRIMARY KEY, -- Degenerate dimension (transaction header info) transaction_number VARCHAR(20) NOT NULL, line_number INT NOT NULL, -- Foreign keys to dimensions date_key INT NOT NULL, -- When time_key INT NOT NULL, -- What time of day product_key INT NOT NULL, -- What customer_key INT NOT NULL, -- Who store_key INT NOT NULL, -- Where cashier_key INT NOT NULL, -- By whom promotion_key INT NOT NULL, -- Under what promotion payment_method_key INT NOT NULL, -- How paid -- Additive facts quantity_sold INT NOT NULL, unit_list_price DECIMAL(10,2), unit_discount_amount DECIMAL(10,2), extended_sales_amount DECIMAL(12,2) NOT NULL, extended_cost_amount DECIMAL(12,2) NOT NULL, extended_profit_amount DECIMAL(12,2) NOT NULL, -- Timestamps for ETL tracking etl_load_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP); -- Example query: Daily sales by product categorySELECT d.calendar_date, p.category_name, SUM(f.quantity_sold) AS total_units, SUM(f.extended_sales_amount) AS total_revenue, SUM(f.extended_profit_amount) AS total_profitFROM sales_transaction_fact fJOIN date_dim d ON f.date_key = d.date_keyJOIN product_dim p ON f.product_key = p.product_keyWHERE d.calendar_year = 2024GROUP BY d.calendar_date, p.category_nameORDER BY d.calendar_date, total_revenue DESC;Periodic snapshot fact tables capture the state of a measurement at regular, predictable intervals. Unlike transaction tables that record events as they occur, periodic snapshots take a "photograph" of cumulative metrics at the end of each period.
Characteristics:
Examples:
1234567891011121314151617181920212223242526272829303132333435363738394041
-- Periodic Snapshot Fact Table: Daily Inventory-- Grain: One row per product per warehouse per day CREATE TABLE inventory_daily_snapshot_fact ( -- Composite key (could also use surrogate) date_key INT NOT NULL, product_key INT NOT NULL, warehouse_key INT NOT NULL, -- Semi-additive facts (point-in-time levels) quantity_on_hand INT NOT NULL, -- Current stock quantity_on_order INT NOT NULL, -- Pending receipts quantity_reserved INT NOT NULL, -- Allocated to orders quantity_available INT AS (quantity_on_hand + quantity_on_order - quantity_reserved), -- Additive facts (activity during the day) quantity_received INT DEFAULT 0, -- Units received today quantity_shipped INT DEFAULT 0, -- Units shipped today quantity_returned INT DEFAULT 0, -- Units returned today quantity_adjusted INT DEFAULT 0, -- Inventory adjustments -- Value facts unit_cost DECIMAL(10,2), total_inventory_cost DECIMAL(14,2), -- on_hand × unit_cost PRIMARY KEY (date_key, product_key, warehouse_key)); -- Example query: Average inventory levels over timeSELECT p.category_name, d.month_name, AVG(f.quantity_on_hand) AS avg_daily_inventory, MAX(f.quantity_on_hand) AS peak_inventory, MIN(f.quantity_on_hand) AS trough_inventoryFROM inventory_daily_snapshot_fact fJOIN date_dim d ON f.date_key = d.date_keyJOIN product_dim p ON f.product_key = p.product_keyWHERE d.calendar_year = 2024GROUP BY p.category_name, d.month_name, d.month_numberORDER BY p.category_name, d.month_number;Accumulating snapshot fact tables track the complete lifecycle of a process that has a well-defined beginning, intermediate steps, and end. Unlike transaction tables (which create new rows for each event) or periodic snapshots (which are replaced each period), accumulating snapshots have rows that are updated as the process progresses.
Characteristics:
Examples:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960
-- Accumulating Snapshot Fact Table: Order Fulfillment-- Grain: One row per order (updated as order progresses) CREATE TABLE order_fulfillment_fact ( order_key BIGINT PRIMARY KEY, order_number VARCHAR(20) NOT NULL, -- Degenerate dimension -- Foreign keys customer_key INT NOT NULL, product_key INT NOT NULL, source_warehouse_key INT, shipping_carrier_key INT, -- Milestone date keys (updated as process progresses) order_date_key INT NOT NULL, -- Order placed warehouse_assign_date_key INT, -- Warehouse assigned pick_start_date_key INT, -- Picking began pick_complete_date_key INT, -- Picking finished pack_date_key INT, -- Packed ship_date_key INT, -- Shipped delivery_date_key INT, -- Delivered -- Lag measurements (in days or hours) order_to_assign_days INT, -- Assignment delay pick_duration_hours DECIMAL(6,2), -- Time to pick assign_to_ship_days INT, -- Warehouse processing time ship_to_delivery_days INT, -- Transit time total_fulfillment_days INT, -- End-to-end cycle time -- Facts quantity_ordered INT NOT NULL, quantity_shipped INT, quantity_delivered INT, extended_amount DECIMAL(12,2), shipping_cost DECIMAL(10,2), -- Status tracking current_status VARCHAR(20), -- 'ordered','picking','shipped','delivered' is_late BOOLEAN DEFAULT FALSE, promised_delivery_date_key INT, -- ETL metadata last_update_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP); -- Example query: Order fulfillment performance analysisSELECT d.month_name, AVG(total_fulfillment_days) AS avg_cycle_time, AVG(assign_to_ship_days) AS avg_warehouse_time, AVG(ship_to_delivery_days) AS avg_transit_time, COUNT(*) AS total_orders, SUM(CASE WHEN is_late THEN 1 ELSE 0 END) AS late_orders, SUM(CASE WHEN is_late THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS late_pctFROM order_fulfillment_fact fJOIN date_dim d ON f.order_date_key = d.date_keyWHERE f.delivery_date_key IS NOT NULL -- Only delivered orders AND d.calendar_year = 2024GROUP BY d.month_name, d.month_numberORDER BY d.month_number;Accumulating snapshot tables require UPDATE operations during ETL, which is more complex than the INSERT-only pattern of transaction and periodic snapshot tables. The operational overhead is justified when analyzing pipeline efficiency, cycle times, and bottleneck detection are critical business needs.
Granularity (also called grain) defines the level of detail stored in a fact table. It answers the question: What does a single row in this fact table represent?
The grain declaration is the most critical decision in fact table design. Every other design choice—which dimensions are needed, which facts make sense, how large the table will grow—follows directly from this single decision.
Every fact table should have a clear, unambiguous grain statement. This statement should be understandable to business users, not just technical staff:
Good grain statements:
Poor (ambiguous) grain statements:
Never, under any circumstances, mix grains in a single fact table. A table where some rows represent individual transactions while other rows represent daily summaries corrupts algebraic properties. SUM, AVG, and COUNT become unreliable. This mistake is nearly impossible to correct without completely rebuilding the table.
The grain choice involves balancing several competing concerns:
The atomic grain is the lowest level of detail captured by a business process. For a retail point-of-sale system, this is typically the individual line item (one product, one quantity, one price) within a transaction.
Why atomic grain is preferred:
Maximum analytical flexibility: You can always aggregate atomic data upward (daily sales from transactions), but you cannot decompose aggregated data downward (transactions from daily summaries).
Handles unforeseen questions: Business users will inevitably ask questions you didn't anticipate. Atomic grain data lets you answer them.
Supports drill-through: When investigating anomalies, analysts need to see the underlying transactions.
The grain declaration determines which dimensions are applicable. If the grain is "one row per line item on a retail transaction," then valid dimensions include:
However, a dimension like "supplier" might not be directly applicable—while products have suppliers, the transaction itself doesn't have a single supplier if multiple products from different suppliers were purchased.
The test: Ask yourself, "For a single fact row, is there exactly one value for this dimension?" If yes, the dimension is compatible with the grain. If no, either the dimension doesn't belong, or you need a bridge table or different grain.
Degenerate dimensions are dimension keys that exist in the fact table but have no corresponding dimension table. They are "degenerate" because they have been stripped of any descriptive attributes—only the key value itself remains meaningful.
Consider a retail fact table at the line-item grain. Each line item belongs to a transaction, identified by a transaction number. But what attributes would a "transaction dimension" contain?
After referencing all the "real" dimensions, the transaction number itself has no remaining descriptive attributes. It's just an identifier. Creating a dimension table with a single column (the transaction number) wastes space and adds join overhead for no benefit.
Solution: Store the transaction number directly in the fact table as a degenerate dimension.
123456789101112131415161718192021222324252627282930313233343536373839404142434445
-- Fact table with degenerate dimensionsCREATE TABLE retail_sales_fact ( sale_line_key BIGINT PRIMARY KEY, -- Degenerate dimensions (stored in fact, no separate dimension table) invoice_number VARCHAR(15) NOT NULL, -- Transaction identifier line_number INT NOT NULL, -- Line within transaction po_number VARCHAR(20), -- Purchase order reference -- Regular foreign keys to dimensions date_key INT NOT NULL, customer_key INT NOT NULL, product_key INT NOT NULL, store_key INT NOT NULL, -- Facts quantity INT NOT NULL, sales_amount DECIMAL(12,2) NOT NULL, cost_amount DECIMAL(12,2) NOT NULL); -- Query using degenerate dimension for drill-through-- "Show me all line items from invoice ABC-2024-00042"SELECT f.invoice_number, f.line_number, p.product_name, f.quantity, f.sales_amountFROM retail_sales_fact fJOIN product_dim p ON f.product_key = p.product_keyWHERE f.invoice_number = 'ABC-2024-00042'ORDER BY f.line_number; -- Transaction-level aggregation using degenerate dimension-- "Average items per transaction"SELECT d.calendar_month, COUNT(DISTINCT f.invoice_number) AS transaction_count, COUNT(*) AS line_item_count, COUNT(*) * 1.0 / COUNT(DISTINCT f.invoice_number) AS avg_items_per_transactionFROM retail_sales_fact fJOIN date_dim d ON f.date_key = d.date_keyGROUP BY d.calendar_monthORDER BY d.calendar_month;Invoice numbers, order numbers, confirmation codes, ticket numbers, claim numbers, and transaction IDs are all common candidates for degenerate dimensions. Any operational system identifier that groups fact rows together but has no additional attributes can be stored directly in the fact table.
A factless fact table is a fact table that contains only dimension foreign keys—no numeric fact columns at all. This might seem paradoxical (a fact table without facts?), but it serves crucial analytical purposes.
Some business events are meaningful simply because they occurred, even if there's no associated measurement. The fact worth recording is the intersection of dimensions—that these particular dimension values occurred together.
Examples:
123456789101112131415161718192021222324252627282930
-- Factless Fact Table: Student Attendance-- Grain: One row per student per class per day attended CREATE TABLE student_attendance_fact ( date_key INT NOT NULL, student_key INT NOT NULL, class_key INT NOT NULL, instructor_key INT NOT NULL, room_key INT NOT NULL, -- Optionally include a dummy fact for counting attendance_count INT DEFAULT 1, PRIMARY KEY (date_key, student_key, class_key)); -- Query: Attendance rate by classSELECT c.class_name, COUNT(*) AS days_attended, (SELECT COUNT(DISTINCT date_key) FROM date_dim WHERE calendar_year = 2024 AND is_school_day = TRUE) AS school_days, COUNT(*) * 100.0 / (SELECT COUNT(DISTINCT date_key) FROM date_dim WHERE calendar_year = 2024 AND is_school_day = TRUE) AS attendance_rateFROM student_attendance_fact fJOIN class_dim c ON f.class_key = c.class_keyJOIN date_dim d ON f.date_key = d.date_keyWHERE d.calendar_year = 2024GROUP BY c.class_name;The second use of factless fact tables is to answer "what could have happened but didn't?" questions. These are coverage or eligibility tables that define all possible combinations.
The problem:
Your sales fact table records what products were sold at each store. But you also want to analyze what products were available at each store but weren't sold. The absence of a fact row means "no event," but you need a baseline to know what was possible.
1234567891011121314151617181920212223242526272829303132
-- Factless Coverage Table: Product-Store-Promotion Coverage-- "What promotions were available for what products at what stores?" CREATE TABLE promotion_coverage_fact ( date_key INT NOT NULL, product_key INT NOT NULL, store_key INT NOT NULL, promotion_key INT NOT NULL, PRIMARY KEY (date_key, product_key, store_key, promotion_key)); -- Query: Products with promotions that had ZERO sales-- (Identify underperforming promotions)SELECT prom.promotion_name, p.product_name, s.store_name, d.calendar_dateFROM promotion_coverage_fact covJOIN product_dim p ON cov.product_key = p.product_keyJOIN store_dim s ON cov.store_key = s.store_keyJOIN promotion_dim prom ON cov.promotion_key = prom.promotion_keyJOIN date_dim d ON cov.date_key = d.date_key-- LEFT JOIN to sales fact to find non-matchesLEFT JOIN sales_fact sf ON cov.date_key = sf.date_key AND cov.product_key = sf.product_key AND cov.store_key = sf.store_key AND cov.promotion_key = sf.promotion_keyWHERE sf.sale_key IS NULL -- No matching saleORDER BY d.calendar_date, prom.promotion_name;Some factless fact tables include a constant value column (always 1) to simplify COUNT operations. This is a stylistic choice. With or without it, COUNT(*) works the same, but some query tools work better when at least one fact column exists.
After decades of data warehouse implementations, a set of proven design practices has emerged. Following these guidelines leads to fact tables that are performant, maintainable, and analytically powerful.
Fact tables dominate warehouse storage. A retail organization with 1,000 stores, 50,000 products, and 5 years of history might have:
At 100 bytes per row (typical for a fact table with 5-10 foreign keys and 4-6 fact columns), this equals:
Add indexes and you might reach 2-3 TB for this single table. This scale is routine for enterprise warehouses, but it demands:
The fact table is the nucleus of dimensional modeling. Every analytical query ultimately aggregates or filters fact table data. Mastering fact table design means understanding not just the technical structure, but the business process it represents.
What's Next:
With fact tables understood, we turn to dimension tables—the structures that give meaning to fact table measurements. Where fact tables answer "how much?" and "how many?", dimensions answer "who?", "what?", "when?", "where?", and "why?" Understanding dimensions completes your foundation for star schema design.
You now understand fact tables at the expert level—their structure, types, granularity, and design patterns. This foundation enables you to design analytical databases that support the complex queries business intelligence demands.