Operational databases are in constant motion. Every second brings new orders, updated customer profiles, changed inventory levels, deleted records. This volatility is essential—it reflects the living, changing state of business operations.
Now imagine trying to analyze a moving target:
You start building a monthly revenue report at 2:00 PM. By 3:00 PM, when you run the query again to verify, the numbers have changed—new transactions arrived, adjustments were posted, records were deleted. Which version is 'correct'? Neither—both were true at their moment of execution.
This is the volatility problem in analytical systems. When data changes during analysis, results become inconsistent, irreproducible, and untrustworthy.
Non-volatility is the data warehouse characteristic that addresses this fundamental challenge. Warehouse data, once loaded, is stable. It doesn't change during analytical windows. Reports run today and tomorrow on the same data produce identical results. The analytical foundation is solid.
By the end of this page, you will deeply understand non-volatility: what it means operationally, why it's essential for trustworthy analytics, how it's implemented through load patterns and isolation mechanisms, and how it differs from operational ACID properties. You'll learn to design warehouse load strategies that maintain stability while enabling fresh data.
Non-volatility means that once data is correctly loaded into the data warehouse, it is not updated or deleted as part of normal operations. The warehouse is primarily read-only between load operations, and loads themselves are controlled, scheduled events—not continuous streams.
"Data in the operational environment is regularly updated, changed, and deleted. Data in the data warehouse is loaded and accessed, but is not updated (in the general sense of the word update) in the normal course of processing." — Bill Inmon
What Non-Volatility Provides:

- Consistency: the same query returns the same result at any point within a stable window.
- Reproducibility: a report run today and re-run tomorrow against the same loaded data matches exactly.
- Auditability: because existing records are never overwritten, the loaded history doubles as an audit trail.
- Optimization freedom: read-only data can be indexed and pre-aggregated far more aggressively than volatile data.
Non-volatility is implemented through the load-only pattern: data enters the warehouse through controlled load processes, and once loaded, existing data is not modified.
The Three Operations:

- Load: controlled insertion of new data during scheduled load windows.
- Access: unrestricted reads, the dominant operation during stable windows.
- Update/Delete: prohibited in normal processing, permitted only as governed exceptions.
Load Windows and Stable Windows:
The warehouse alternates between two states:
Loading Window — The controlled period when ETL processes run, inserting new data and (rarely) correcting errors. This window is typically scheduled (nightly, hourly) and brief relative to the stable window.
Stable Window — The analytical window between loads. Data is read-only. All queries see consistent data. This is when users interact with the warehouse.
The clear separation between loading and querying enables both data freshness and query stability—a balance operational systems cannot achieve.
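One way to picture this alternation is a published-snapshot swap: analysts always query an immutable published snapshot, while the load process accumulates new rows off to the side and swaps them in atomically at the end of the loading window. A minimal Python sketch (class and method names are illustrative, not from any specific platform):

```python
class SnapshotWarehouse:
    """Toy model: queries read an immutable published snapshot; loads build the next one."""

    def __init__(self):
        self._published = ()   # immutable tuple = contents of the current stable window
        self._staging = []     # loading-window workspace, invisible to queries

    def load(self, rows):
        """Loading window: accumulate new rows in staging; readers are unaffected."""
        self._staging.extend(rows)

    def publish(self):
        """End of loading window: atomically swap in the new snapshot."""
        self._published = self._published + tuple(self._staging)
        self._staging = []

    def query(self, predicate=lambda r: True):
        """Stable window: every query sees the same published snapshot."""
        return [r for r in self._published if predicate(r)]


wh = SnapshotWarehouse()
wh.load([{"sale": 1, "revenue": 100}, {"sale": 2, "revenue": 250}])
assert wh.query() == []                  # still in the load window: nothing published yet
wh.publish()
before = wh.query()
wh.load([{"sale": 3, "revenue": 75}])    # next load in progress...
after = wh.query()
assert before == after                   # ...but queries stay stable until publish()
```

The design choice worth noting is that "stability" here costs nothing at query time: readers never coordinate with the loader, because they only ever touch the published, immutable snapshot.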
The simplest implementation of non-volatility is append-only design: never update or delete existing rows. New records are inserted; corrections are handled by inserting correction/adjustment records. This pattern naturally preserves audit trails and maximizes query stability. Many modern data platforms optimize heavily for append-only workloads.
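The append-only idea fits in a few lines: corrections are new rows, and "current" values are derived by aggregation rather than by UPDATE. A hypothetical sketch (record fields are invented for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)               # frozen: a row is immutable once created
class SaleRecord:
    sale_id: int
    quantity: int
    record_type: str = "ORIGINAL"     # or "ADJUSTMENT"

facts = []                            # append-only: no update or delete, ever

def record_sale(sale_id, quantity):
    facts.append(SaleRecord(sale_id, quantity))

def adjust_sale(sale_id, delta):
    """Corrections are inserted as adjustment rows, preserving the audit trail."""
    facts.append(SaleRecord(sale_id, delta, "ADJUSTMENT"))

def net_quantity(sale_id):
    """Current value = sum of original plus adjustments."""
    return sum(r.quantity for r in facts if r.sale_id == sale_id)

record_sale(1, 10)
adjust_sale(1, -2)                    # correct an over-count without touching row 1
assert net_quantity(1) == 8
assert len(facts) == 2                # both rows survive as the audit trail
```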
While non-volatility is the default, certain legitimate scenarios require controlled modifications to warehouse data. These are exceptions, not the norm:
| Exception Type | Description | Handling Approach |
|---|---|---|
| Error Correction | Source data was loaded incorrectly due to ETL bugs or source system errors | Controlled correction process with audit logging; may require fact table updates or dimension version corrections |
| Late-Arriving Dimensions | Dimension information arrives after related facts were loaded | Update dimension records to fill missing attributes; may trigger reprocessing of affected aggregations |
| Late-Arriving Facts | Transaction data arrives days or weeks after the event occurred | Insert with original transaction date (not load date); handle in aggregation refresh |
| SCD Type 1 Corrections | Attribute corrections that don't require history (e.g., fixing typos) | Direct update to current dimension row; by definition, these don't require historical tracking |
| Regulatory Deletion | GDPR 'right to be forgotten' or similar compliance requirements | Controlled deletion or anonymization with audit trail; may require cascading to all related records |
| Aggregation Refreshes | Pre-computed aggregates need recalculation when source data changes | Scheduled or triggered rebuild of affected aggregate tables/materialized views |
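The late-arriving-facts row in the table above hinges on the distinction between transaction date and load date: the row is inserted under its original event date, and only the aggregates for that (possibly old) period are refreshed. A hypothetical sketch of that flow:

```python
from datetime import date
from collections import defaultdict

facts = []
daily_revenue = defaultdict(float)    # pre-computed aggregate, keyed by event date

def load_fact(transaction_date, revenue, load_date):
    """Insert with the ORIGINAL transaction date, not the load date."""
    facts.append({"transaction_date": transaction_date,
                  "revenue": revenue,
                  "load_date": load_date})
    return transaction_date           # the period whose aggregate must be refreshed

def refresh_period(period):
    """Rebuild the aggregate for one affected period only."""
    daily_revenue[period] = sum(f["revenue"] for f in facts
                                if f["transaction_date"] == period)

# Normal load on Jan 2 for a Jan 1 sale, then a late arrival weeks afterward
refresh_period(load_fact(date(2024, 1, 1), 100.0, load_date=date(2024, 1, 2)))
refresh_period(load_fact(date(2024, 1, 1), 40.0, load_date=date(2024, 1, 25)))

assert daily_revenue[date(2024, 1, 1)] == 140.0   # late fact lands in its true period
```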
Every exception to non-volatility should be governed. Who can approve data corrections? What audit trail is created? How are downstream consumers notified? When is it safe to make changes? Without governance, exceptions become backdoors that undermine warehouse reliability.
Correction Patterns:
When corrections are necessary, several patterns preserve as much non-volatility benefit as possible:
Reversal Records — Instead of updating a fact row, insert a reversing entry (negative quantity) followed by a corrected entry. The audit trail is preserved.
Logical Deletion — Instead of physical DELETE, set an 'is_deleted' flag. Row remains for historical queries but is excluded from current views.
Version Increment — For dimensions, create a new version (Type 2) rather than updating in place, backfilling the effective date.
Restatement Tables — Maintain separate 'restated' fact tables for corrected data while preserving original tables for audit.
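Two of these patterns, logical deletion and version increment, can be sketched together. All column names here are illustrative, and the "tables" are plain Python lists standing in for warehouse tables:

```python
from datetime import date

# Logical deletion: rows carry an is_deleted flag instead of being removed.
customers = [
    {"customer_id": 1, "name": "Acme", "is_deleted": False},
    {"customer_id": 2, "name": "Beta", "is_deleted": False},
]

def logical_delete(customer_id):
    for row in customers:
        if row["customer_id"] == customer_id:
            row["is_deleted"] = True        # row survives for historical queries

def current_view():
    return [r for r in customers if not r["is_deleted"]]

logical_delete(2)
assert len(customers) == 2                  # history preserved
assert len(current_view()) == 1             # current view excludes the deleted row

# Version increment (Type 2): a correction becomes a new dimension version.
dim_store = [
    {"store_key": 100, "store_id": "S1", "region": "Wst",
     "effective_from": date(2023, 1, 1), "is_current": True},
]

def correct_region(store_id, new_region, effective_from):
    for row in dim_store:
        if row["store_id"] == store_id and row["is_current"]:
            row["is_current"] = False       # close the old version instead of updating it
    dim_store.append({"store_key": max(r["store_key"] for r in dim_store) + 1,
                      "store_id": store_id, "region": new_region,
                      "effective_from": effective_from, "is_current": True})

correct_region("S1", "West", date(2024, 1, 1))
assert [r["region"] for r in dim_store] == ["Wst", "West"]   # both versions retained
assert sum(r["is_current"] for r in dim_store) == 1          # exactly one current row
```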
Non-volatility isn't just about data integrity—it enables significant performance optimizations that would be impossible in volatile, update-heavy environments:
```sql
-- ==============================================
-- EXAMPLE: Aggressive Indexing in Non-Volatile Warehouse
-- ==============================================

-- In a volatile OLTP system, this many indexes would kill write performance.
-- In a non-volatile warehouse, they optimize diverse query patterns with minimal cost.

CREATE TABLE fact_sales (
    sale_key         BIGINT PRIMARY KEY,
    customer_key     INT,
    product_key      INT,
    store_key        INT,
    date_key         INT,
    customer_segment VARCHAR(20),  -- denormalized low-cardinality attribute
    quantity         INT,
    revenue          DECIMAL(12,2),
    cost             DECIMAL(12,2),
    load_date        DATE
);

-- Indexes for common access patterns
CREATE INDEX idx_sales_customer ON fact_sales(customer_key);
CREATE INDEX idx_sales_product  ON fact_sales(product_key);
CREATE INDEX idx_sales_store    ON fact_sales(store_key);
CREATE INDEX idx_sales_date     ON fact_sales(date_key);

-- Composite indexes for common query combinations
CREATE INDEX idx_sales_date_store    ON fact_sales(date_key, store_key);
CREATE INDEX idx_sales_customer_date ON fact_sales(customer_key, date_key);
CREATE INDEX idx_sales_product_date  ON fact_sales(product_key, date_key);

-- Covering index for frequent aggregation queries (SQL Server / PostgreSQL 11+ syntax)
CREATE INDEX idx_sales_date_revenue ON fact_sales(date_key) INCLUDE (revenue, quantity);

-- Bitmap index (in databases that support them, e.g. Oracle) for low-cardinality columns
CREATE BITMAP INDEX idx_sales_customer_seg ON fact_sales(customer_segment);

-- ==============================================
-- MATERIALIZED VIEW: Pre-computed daily aggregates
-- Only refreshed on load; not real-time maintained
-- ==============================================

CREATE MATERIALIZED VIEW mv_daily_sales_summary AS
SELECT
    d.date_key,
    d.calendar_date,
    d.year_month,
    st.store_key,
    st.region,
    SUM(f.revenue)                 AS total_revenue,
    SUM(f.quantity)                AS total_units,
    COUNT(DISTINCT f.customer_key) AS unique_customers
FROM fact_sales f
JOIN dim_date  d  ON f.date_key  = d.date_key
JOIN dim_store st ON f.store_key = st.store_key
GROUP BY d.date_key, d.calendar_date, d.year_month, st.store_key, st.region;

-- Refresh after each load cycle, not continuously
-- REFRESH MATERIALIZED VIEW mv_daily_sales_summary;
```

A common tension in warehouse design is balancing non-volatility (stability) with the desire for real-time or near-real-time data. How do we maintain analytical consistency while reducing latency?
The Spectrum of Latency:
| Pattern | Latency | Non-Volatility Approach | Use Case |
|---|---|---|---|
| Batch/Nightly | 12-24 hours | Full non-volatility. Load once per day. Stable for 24 hours. | Traditional BI, historical analysis, classic reporting |
| Micro-Batch | 5-60 minutes | Near-non-volatile. Frequent small loads. Short stable windows. | Operational reporting, dashboards requiring freshness |
| Near-Real-Time | Seconds-minutes | Streaming ingestion with periodic snapshots for stability. | Live dashboards, operational monitoring |
| Real-Time | Sub-second | Hybrid architecture: streaming layer + stable warehouse. | Fraud detection, real-time personalization, alerting |
Lambda and Kappa Architectures:
Two architectural patterns address real-time requirements while preserving non-volatility benefits:
Lambda Architecture: maintains two parallel paths. A batch layer periodically loads an immutable, non-volatile store that serves as the system of record, while a speed layer processes recent events with low latency; a serving layer merges both to answer queries.

Kappa Architecture: treats an append-only event log as the single source of truth and derives all views through stream processing. Reprocessing means replaying the log, so the historical record itself is never mutated.
Both architectures recognize that real-time freshness and stable analysis serve different needs—and sometimes both are required.
Many organizations discover that 15-minute latency satisfies most 'real-time' requirements. True sub-second needs are rarer than assumed. Design for the latency actually needed—not imagined. Over-engineering for real-time often sacrifices the reliability benefits of non-volatility without delivering proportional value.
Non-volatility is closely related to—but distinct from—snapshot isolation in database systems. Understanding their relationship clarifies how warehouse consistency is achieved:
Snapshot Isolation (Transaction Level)
A database isolation level where each transaction sees a consistent snapshot of data as of transaction start. Changes made by other concurrent transactions are invisible.
Non-Volatility (Warehouse Level)
An architectural property where data is loaded periodically and remains stable between loads. The warehouse as a whole has extended stability windows.
Practical Implications:
In a non-volatile warehouse:

- A query run at 2:00 PM and re-run at 3:00 PM within the same stable window returns identical results.
- Long-running analytical queries need no special isolation machinery, because the data beneath them does not change.
- Analysts querying at different times within a window all work from the same consistent picture.
Non-volatility provides stronger guarantees than snapshot isolation for analytical workloads, with simpler implementation.
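The distinction can be made concrete with a toy model (not real database code): snapshot isolation pins a snapshot per transaction at its start time, while non-volatility pins one snapshot per stable window, because nothing changes between loads.

```python
# Toy model: (load batch, total revenue after that load)
versions = [("v1", 100), ("v2", 140)]

def snapshot_isolation_reads(start_versions):
    """OLTP: each transaction sees whatever version was current when IT started."""
    return [dict(versions)[v] for v in start_versions]

def non_volatile_reads(window_version, n_queries):
    """Warehouse: every query in the stable window sees the SAME version."""
    return [dict(versions)[window_version]] * n_queries

# Two transactions starting around a concurrent load may see different totals:
assert snapshot_isolation_reads(["v1", "v2"]) == [100, 140]

# Any number of warehouse queries in one stable window always agree:
assert non_volatile_reads("v1", 2) == [100, 100]
```

Both behaviors are "consistent", but only the warehouse's window-level stability makes two reports run an hour apart comparable.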
Modern cloud warehouses (Snowflake, BigQuery, Databricks) combine both: they use snapshot isolation for transaction consistency AND encourage non-volatile loading patterns for analytical consistency. Continuous streaming ingest is supported, but best practices still recommend controlled load windows for stable analytical workloads.
Implementing non-volatility requires deliberate architectural and operational choices:
```sql
-- ==============================================
-- AUDIT TABLE: Track all data modifications
-- ==============================================

CREATE TABLE etl_data_modification_log (
    modification_id    BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    modification_time  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    table_name         VARCHAR(128) NOT NULL,
    operation          VARCHAR(10)  NOT NULL,  -- INSERT, UPDATE, DELETE
    row_count          INT          NOT NULL,
    performed_by       VARCHAR(100) NOT NULL,
    -- For approved exceptions
    is_standard_load   BOOLEAN DEFAULT TRUE,
    exception_type     VARCHAR(50),            -- 'ERROR_CORRECTION', 'COMPLIANCE', etc.
    approval_ticket    VARCHAR(50),
    -- For troubleshooting
    batch_id           VARCHAR(100),
    source_description VARCHAR(500)
);

-- ==============================================
-- TRIGGER: Alert on unexpected modifications
-- ==============================================

-- Example: trigger to log and alert on fact table updates (PostgreSQL syntax)
CREATE OR REPLACE FUNCTION log_unexpected_modification()
RETURNS TRIGGER AS $$
BEGIN
    -- Only INSERT is expected for fact tables
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        INSERT INTO etl_data_modification_log (
            table_name, operation, row_count,
            performed_by, is_standard_load, exception_type
        ) VALUES (
            TG_TABLE_NAME, TG_OP, 1,
            CURRENT_USER, FALSE, 'UNEXPECTED_MODIFICATION'
        );
        -- In production: also send alert to operations team
        -- PERFORM send_alert('Unexpected ' || TG_OP || ' on ' || TG_TABLE_NAME);
    END IF;
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_fact_sales_modification
BEFORE UPDATE OR DELETE ON fact_sales
FOR EACH ROW EXECUTE FUNCTION log_unexpected_modification();
```

Non-volatility completes the four defining characteristics of a data warehouse. Let's consolidate what we've learned about this final pillar:
| Characteristic | Core Meaning | Key Implementation |
|---|---|---|
| Subject-Oriented | Organized around business subjects, not applications | Dimensional modeling, fact-dimension schemas, business naming |
| Integrated | Unified from heterogeneous sources with consistent semantics | ETL standardization, entity resolution, conformed dimensions |
| Time-Variant | Historical depth preserved across extended time horizons | Slowly Changing Dimensions, temporal keys, date dimensions |
| Non-Volatile | Stable between loads; append-dominant operations | Load windows, read-only periods, change governance |
Module Complete: Data Warehouse Concepts
You have now mastered the foundational concepts that define what a data warehouse is and why each characteristic matters. These aren't abstract principles: they're design requirements that shape every decision in warehouse architecture.
With these concepts internalized, you're prepared to explore the practical implementation: star schemas, snowflake schemas, ETL processes, and OLAP operations in the modules ahead.
You now understand all four defining characteristics of a data warehouse as articulated by Bill Inmon. Subject-Orientation organizes data around business subjects. Integration unifies heterogeneous sources. Time-Variance preserves historical depth. Non-Volatility ensures analytical stability. These pillars provide the foundation for all warehouse design decisions ahead.