The traditional separation between OLTP (normalized, transactional) and OLAP (denormalized, analytical) systems served organizations well for decades. OLTP systems processed business operations during the day; nightly ETL jobs loaded data into OLAP warehouses for next-day analytics. This architecture was clean, well-understood, and effective.
But modern business demands more: real-time analytics on current data, operational reporting that reflects the state of the business right now, and machine learning features computed from fresh data rather than yesterday's snapshot.
These requirements drive the convergence of transactional and analytical processing into hybrid architectures that provide both capabilities—sometimes in the same system, sometimes through carefully integrated separate systems.
By the end of this page, you will understand the spectrum of hybrid approaches from HTAP databases to lambda architectures. You'll learn to evaluate when hybrid systems make sense versus maintaining separation, understand the emerging data lakehouse pattern, and design architectures that balance transactional integrity with analytical speed.
Understanding hybrid architectures requires understanding how we arrived at the OLTP/OLAP split—and why market forces now push toward convergence.
Historical Context:
In the 1980s-1990s, database technology couldn't efficiently serve both transactional and analytical workloads. The fundamental differences in access patterns (point lookups vs. scans), data characteristics (current vs. historical), and user expectations (thousands of concurrent users vs. few heavy users) required specialized systems.
The data warehouse emerged as the solution: a separate analytical system fed by ETL from operational sources. This architecture dominated for three decades.
| Era | Architecture | Analytics Latency | Limitations |
|---|---|---|---|
| 1980s-1990s | Separate OLTP + Data Warehouse | Days to weeks | Data staleness, ETL complexity |
| 2000s-2010s | Enterprise Data Warehouse (EDW) | Hours to days | Cost, rigidity, still batch-oriented |
| 2010s | Data Lake + Warehouse | Hours | Complexity, data swamp risk |
| 2020s | HTAP / Data Lakehouse | Minutes to seconds | Emerging, complexity varies |
Drivers of Convergence:
Several forces push these worlds together: business demand for analytics on current rather than day-old data, cheaper memory and columnar storage that let a single engine serve both access patterns, streaming platforms that make continuous data movement practical, and cloud object storage that decouples storage from compute.
Hybrid architectures don't eliminate the OLTP/OLAP distinction—they manage it differently. The fundamental differences remain; hybrid approaches provide ways to serve both needs without complete separation.
HTAP (Hybrid Transaction/Analytical Processing) databases attempt to serve both transactional and analytical workloads from a single system. Rather than replicating data to a separate warehouse, HTAP runs analytics against the same data that serves transactions.
How HTAP Works:
HTAP databases typically employ dual storage engines or innovative architectures that handle both access patterns:
1. Row + Column Storage: Maintain a row-oriented store for transactional access and a columnar store (or columnar replica) for analytical scans, kept in sync automatically; TiDB takes this approach with TiKV and TiFlash (see the sketch after the table below).
2. In-Memory with Disk Persistence: Keep hot data in memory in formats suited to both workloads, persisting to disk for durability; SAP HANA takes this approach.
3. Read Replicas with Analytical Optimization: Route analytical queries to replicas tuned or stored differently for large scans, leaving the primary free to serve transactions.
| Database | HTAP Approach | Strengths | Considerations |
|---|---|---|---|
| TiDB | Separate TiKV (row) + TiFlash (column) | Strong consistency, MySQL compatible | Complex deployment |
| SingleStore (MemSQL) | Universal storage (row + column) | Fast ingestion, real-time analytics | Memory-intensive |
| SAP HANA | In-memory with columnar + row tables | Proven enterprise scale | Cost, vendor lock-in |
| SQL Server (Operational Analytics) | Columnstore indexes on OLTP tables | No data movement | Index overhead on writes |
| PostgreSQL + Citus | Distributed with columnar extension | Open source, familiar | Requires tuning |
| CockroachDB | Distributed SQL with analytical capabilities | Geo-distributed, strong consistency | Analytical features emerging |
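To make the first pattern (separate row and column engines) concrete, here is a minimal TiDB sketch. It assumes a TiDB cluster that includes TiFlash nodes and a `sales` table similar to the one in the SQL Server example below; the specific queries are illustrative, not taken from the text.

```sql
-- TiDB: add a columnar TiFlash replica to an existing row-store table.
-- TiKV keeps serving transactional reads/writes; TiFlash maintains a
-- consistent columnar copy for analytical scans.
ALTER TABLE sales SET TIFLASH REPLICA 1;

-- OLTP point lookup: the optimizer routes this to TiKV (row storage)
SELECT * FROM sales WHERE sale_id = 42;

-- OLAP aggregation: eligible to be pushed down to the TiFlash columnar replica
SELECT product_id, SUM(total_amount) AS revenue
FROM sales
GROUP BY product_id;
```

The fuller SQL Server example that follows shows the other common pattern: a columnstore index added directly to the transactional table.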
```sql
-- HTAP EXAMPLE: SQL Server Operational Analytics
-- Same table serves OLTP transactions AND analytical queries

CREATE TABLE sales (
    sale_id BIGINT IDENTITY(1,1) PRIMARY KEY,
    sale_date DATE NOT NULL,
    customer_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    total_amount AS (quantity * unit_price) PERSISTED,

    -- B-tree indexes for OLTP point lookups
    INDEX idx_customer NONCLUSTERED (customer_id),
    INDEX idx_product NONCLUSTERED (product_id),

    -- Columnstore index for OLAP scans and aggregations
    INDEX idx_columnstore NONCLUSTERED COLUMNSTORE
        (sale_date, customer_id, product_id, quantity, unit_price, total_amount)
);

-- OLTP Transaction: Insert new sale
-- Uses row storage, B-tree indexes
INSERT INTO sales (sale_date, customer_id, product_id, quantity, unit_price)
VALUES ('2024-01-15', 1001, 5023, 3, 29.99);

-- OLTP Query: Get customer's recent orders
-- Uses row storage, B-tree index on customer_id
SELECT * FROM sales
WHERE customer_id = 1001
ORDER BY sale_date DESC;

-- OLAP Query: Analyze sales by product category (joins to products table)
-- Uses columnstore index for scan and aggregation
SELECT
    p.category,
    YEAR(s.sale_date) AS year,
    SUM(s.total_amount) AS revenue,
    COUNT(*) AS transaction_count
FROM sales s
JOIN products p ON s.product_id = p.product_id
WHERE s.sale_date >= '2023-01-01'
GROUP BY p.category, YEAR(s.sale_date)
ORDER BY year, revenue DESC;

-- Both queries execute against the SAME table
-- Query optimizer automatically chooses appropriate storage/index
```

HTAP Trade-offs:
Advantages: analytics run against current, committed data with no ETL pipelines or data copies to maintain, and there is one system to secure and operate instead of two.
Challenges: analytical scans compete with transactions for CPU, memory, and I/O; columnar structures add overhead to writes; and no single engine yet matches dedicated systems at the extremes of either workload.
HTAP works well for moderate-scale mixed workloads. High-volume OLTP (millions of TPS) or massive-scale OLAP (petabytes of historical data) still often require specialized systems. Evaluate your actual workload requirements before committing to HTAP.
For large-scale systems requiring both real-time and historical analytics, Lambda and Kappa architectures provide frameworks for managing the complexity.
Lambda Architecture:
Proposed by Nathan Marz, Lambda architecture divides processing into three layers: a batch layer that periodically reprocesses the complete dataset, a speed layer that processes events as they arrive, and a serving layer that merges the two into query results.
The architecture acknowledges that batch processing is more accurate (reprocesses everything) while streaming processing is more timely (but may have gaps or approximations). By combining both, you get the best of both worlds.
Lambda Architecture Components:
| Layer | Purpose | Latency | Typical Technologies |
|---|---|---|---|
| Speed Layer | Process real-time events, low-latency views | Seconds | Kafka, Flink, Spark Streaming, Druid |
| Batch Layer | Complete reprocessing, highest accuracy | Hours | Spark, Hadoop, BigQuery, Redshift |
| Serving Layer | Merge views, serve queries | Milliseconds | Cassandra, Redis, Elasticsearch, Druid |
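To illustrate the serving layer's job, here is a hedged SQL sketch that merges a batch view with a speed-layer view. The table names (`batch_view.daily_orders`, `speed_view.recent_orders`) and the cutoff date are hypothetical; the cutoff stands in for the timestamp of the last completed batch run.

```sql
-- Hypothetical serving-layer merge: the batch view is authoritative up to the
-- last completed batch run; the speed-layer view covers everything since.
SELECT product_id, SUM(order_count) AS order_count
FROM (
    SELECT product_id, order_count
    FROM batch_view.daily_orders            -- recomputed in full by the batch layer
    WHERE order_date <  DATE '2024-01-15'   -- last batch cutoff (assumed)
    UNION ALL
    SELECT product_id, order_count
    FROM speed_view.recent_orders           -- maintained incrementally by the stream job
    WHERE order_date >= DATE '2024-01-15'
) combined
GROUP BY product_id;
```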
Kappa Architecture:
Jay Kreps proposed Kappa as a simplification of Lambda. Instead of maintaining separate batch and speed layers, Kappa treats all data as streams: a single stream-processing pipeline handles both real-time and historical processing, and "batch" recomputation is simply replaying the retained event log through an updated version of the same job (a replay sketch follows the lists below).
Kappa Advantages: one implementation of the business logic instead of two, simpler operations and deployment, and reprocessing that reuses the same code with a different starting offset.
Kappa Challenges: the event log must retain complete history, which can be expensive at large scale, and some complex aggregations over very large historical datasets fit streaming poorly.
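As a concrete illustration of replay-based reprocessing, here is a minimal Flink SQL sketch. The topic name, fields, and connector properties are assumptions for illustration, not taken from the text.

```sql
-- Flink SQL source over a long-retention Kafka topic (names are hypothetical).
-- Reprocessing the full history means restarting the same job with
-- 'scan.startup.mode' = 'earliest-offset' instead of reading only new events.
CREATE TABLE page_events (
    user_id    BIGINT,
    url        STRING,
    event_time TIMESTAMP(3),
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'page-events',
    'properties.bootstrap.servers' = 'kafka:9092',
    'format' = 'json',
    'scan.startup.mode' = 'earliest-offset'   -- replay from the beginning
);

-- The same aggregation logic serves both live processing and reprocessing runs.
SELECT
    url,
    TUMBLE_START(event_time, INTERVAL '1' HOUR) AS window_start,
    COUNT(*) AS views
FROM page_events
GROUP BY url, TUMBLE(event_time, INTERVAL '1' HOUR);
```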
```text
## Lambda Architecture

### Workflow:
1. Events → Kafka → Speed Layer (Flink) → Real-time view (Redis)
2. Events → S3 (raw) → Batch job (Spark) → Batch view (Redshift)
3. Query merges Redis (recent) + Redshift (historical)

### Code complexity:
- streaming_process.scala (speed layer logic)
- batch_process.scala (batch layer logic)
- merge_views.scala (serving layer logic)
- Two different implementations for same business logic!

## Kappa Architecture

### Workflow:
1. Events → Kafka (long-term retention)
2. Kafka → Stream processor (Flink) → Real-time view
3. To reprocess: replay Kafka from beginning through updated Flink job

### Code complexity:
- stream_process.scala (one implementation)
- Reprocessing uses same code, different Kafka offset

## When to Choose

### Lambda:
- Very large historical datasets (PB+)
- Complex aggregations that don't fit streaming well
- Need guaranteed complete accuracy on historical data
- Already have batch infrastructure

### Kappa:
- Can store complete event history in Kafka
- Streaming logic can handle all processing needs
- Want simpler operations and deployment
- New system without batch legacy
```

Most production systems end up with hybrid approaches that don't strictly follow either architecture. The key insight from both is treating events as the source of truth and being explicit about latency-accuracy trade-offs.
The Operational Data Store (ODS) is a hybrid that sits between OLTP sources and the data warehouse. It provides near-real-time integrated data for operational reporting without the full transformation to dimensional models.
ODS Characteristics: normalized (typically 3NF) like its OLTP sources, integrated across multiple source systems, refreshed in near real time, and holding current state rather than history. It answers what is true right now, not what was true last quarter.
```sql
-- OPERATIONAL DATA STORE: Normalized, Near-Real-Time, Current State

-- ODS is typically normalized (3NF) like OLTP sources
-- But integrates data from multiple source systems

CREATE TABLE ods_customer (
    customer_id VARCHAR(50) PRIMARY KEY,
    source_system VARCHAR(20) NOT NULL,  -- 'CRM', 'ECOM', 'POS'

    -- Integrated/reconciled customer data
    customer_name VARCHAR(255) NOT NULL,
    email VARCHAR(255),
    phone VARCHAR(50),

    -- Best address from multiple sources
    street_address VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50),
    postal_code VARCHAR(20),
    country VARCHAR(2),

    -- Lifecycle tracking (current state, not historical)
    customer_status VARCHAR(20),
    customer_segment VARCHAR(50),
    lifetime_value DECIMAL(12,2),

    -- Audit/integration metadata
    crm_customer_id VARCHAR(50),   -- FK to source CRM
    ecom_customer_id VARCHAR(50),  -- FK to source E-commerce
    pos_customer_id VARCHAR(50),   -- FK to source POS
    last_activity_date TIMESTAMP,
    last_sync_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE ods_order (
    order_id VARCHAR(50) PRIMARY KEY,
    source_system VARCHAR(20) NOT NULL,
    customer_id VARCHAR(50) REFERENCES ods_customer(customer_id),

    -- Current order state
    order_status VARCHAR(30),
    order_total DECIMAL(12,2),
    order_date TIMESTAMP,
    last_status_change TIMESTAMP,
    expected_delivery TIMESTAMP,
    last_sync_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- ODS Query: Real-time Customer 360
SELECT
    c.customer_id,
    c.customer_name,
    c.email,
    c.customer_segment,
    c.lifetime_value,
    COUNT(o.order_id) AS total_orders,
    SUM(o.order_total) AS total_revenue,
    MAX(o.order_date) AS last_order_date,
    COUNT(CASE WHEN o.order_status = 'Pending' THEN 1 END) AS pending_orders
FROM ods_customer c
LEFT JOIN ods_order o ON c.customer_id = o.customer_id
WHERE c.customer_id = 'CUST-12345'
GROUP BY c.customer_id, c.customer_name, c.email, c.customer_segment, c.lifetime_value;

-- ODS to Data Warehouse: Feed dimensional load
INSERT INTO dw_staging.dim_customer_staging (
    customer_id, customer_name, email, segment, city, state, country
)
SELECT
    customer_id, customer_name, email, customer_segment, city, state, country
FROM ods_customer
WHERE last_sync_date > (SELECT max_load_date FROM dw_control.load_log);
```

The traditional ODS concept has evolved. Modern equivalents include CDC-fed real-time data hubs, API-first data platforms, and streaming state stores. The core concept—integrated, current-state, operational data—remains valuable.
The data lakehouse represents the latest evolution in hybrid architectures, combining the best features of data lakes and data warehouses while minimizing their weaknesses.
The Problem It Solves:
Data Lake Issues: cheap, flexible storage, but no ACID transactions, weak schema enforcement and governance, and a tendency to decay into a "data swamp" of undocumented files.
Data Warehouse Issues: strong SQL and governance, but high cost at scale, proprietary formats, rigidity, and weak support for machine learning and unstructured data.
The Lakehouse Solution:
Lakehouse adds a metadata layer and transactional capabilities on top of data lake storage (typically cloud object storage like S3). This enables warehouse-like features without warehouse limitations.
| Technology | Foundation | Key Features | Ecosystem |
|---|---|---|---|
| Delta Lake | Apache Spark + Parquet | ACID, time travel, schema evolution, Z-ordering | Databricks, Azure, AWS |
| Apache Iceberg | Multiple engines + Parquet/ORC | Hidden partitioning, schema evolution, time travel | Netflix, Apple, Snowflake |
| Apache Hudi | Spark/Flink + Parquet | Upserts, incremental processing, CDC | Uber, AWS |
| Snowflake External Tables | Snowflake + S3/GCS/Azure | SQL on lake data, managed metadata | Snowflake ecosystem |
```sql
-- DATA LAKEHOUSE: Delta Lake Example (Databricks/Spark SQL)

-- Create lakehouse table with ACID support on object storage
CREATE TABLE gold.sales_fact (
    sale_id BIGINT,
    sale_date DATE,
    customer_id BIGINT,
    product_id BIGINT,
    quantity INT,
    unit_price DECIMAL(10,2),
    total_amount DECIMAL(12,2),
    region STRING,
    channel STRING
)
USING DELTA
PARTITIONED BY (sale_date)
LOCATION 's3://my-lakehouse/gold/sales_fact';

-- ACID Transactions: MERGE for upserts (like warehouse)
MERGE INTO gold.sales_fact AS target
USING staging.new_sales AS source
ON target.sale_id = source.sale_id
WHEN MATCHED THEN UPDATE SET
    quantity = source.quantity,
    total_amount = source.total_amount
WHEN NOT MATCHED THEN INSERT *;

-- Time Travel: Query historical data versions
-- What did the data look like at table version 7?
SELECT * FROM gold.sales_fact VERSION AS OF 7
WHERE sale_date = '2024-01-01';

-- Or by timestamp
SELECT * FROM gold.sales_fact TIMESTAMP AS OF '2024-01-08 10:00:00';

-- Schema Evolution: Add columns without breaking existing queries
ALTER TABLE gold.sales_fact ADD COLUMN discount_code STRING;

-- Z-Order: Optimize for analytical query patterns
OPTIMIZE gold.sales_fact ZORDER BY (region, customer_id);
-- Data physically reorganized for faster filtering on these columns

-- Analytical Query: Same syntax as data warehouse
SELECT
    region,
    DATE_TRUNC('month', sale_date) AS month,
    SUM(total_amount) AS revenue,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM gold.sales_fact
WHERE sale_date >= '2024-01-01'
GROUP BY region, DATE_TRUNC('month', sale_date)
ORDER BY month, revenue DESC;

-- ML Integration: Delta table directly usable in ML pipelines
-- (From same lakehouse, no data movement)
/*
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

df = spark.read.format("delta").load("s3://my-lakehouse/gold/sales_fact")
# Build ML model directly on lakehouse data
*/
```

Lakehouses commonly use a "medallion" (bronze/silver/gold) architecture: Bronze = raw ingested data, Silver = cleaned/enriched, Gold = business-level aggregations/dimensions. This provides clear data quality tiers while maintaining lakehouse benefits at each level.
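As a sketch of that medallion flow, the silver and gold tiers can be derived with ordinary Spark SQL over Delta tables. The schemas and table names below (`bronze.sales_raw`, `silver.sales`, `gold.daily_revenue`) are illustrative assumptions, not part of the example above.

```sql
-- Hypothetical bronze -> silver -> gold derivation on Delta tables.

-- Silver: typed, deduplicated rows derived from the raw bronze landing table
CREATE TABLE silver.sales USING DELTA AS
SELECT DISTINCT
    CAST(sale_id AS BIGINT)             AS sale_id,
    CAST(sale_date AS DATE)             AS sale_date,
    CAST(total_amount AS DECIMAL(12,2)) AS total_amount,
    region
FROM bronze.sales_raw
WHERE sale_id IS NOT NULL;

-- Gold: business-level aggregate ready for BI dashboards
CREATE TABLE gold.daily_revenue USING DELTA AS
SELECT sale_date, region, SUM(total_amount) AS revenue
FROM silver.sales
GROUP BY sale_date, region;
```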
With multiple hybrid patterns available, selecting the right architecture requires systematic evaluation of requirements.
| Requirement | Traditional (Separate Systems) | HTAP | Lambda/Kappa | Lakehouse |
|---|---|---|---|---|
| Analytics latency | Hours-Days | Seconds | Seconds-Minutes | Minutes-Hours |
| OLTP performance | Excellent | Good | Excellent | N/A (separate) |
| Historical analysis | Excellent | Limited | Excellent | Excellent |
| ML/Data Science | Separate tools | Limited | Good | Excellent |
| Operational complexity | Medium | Low-Medium | High | Medium |
| Cost (at scale) | High | Medium | Medium-High | Low-Medium |
| Maturity | Battle-tested | Emerging | Proven at scale | Rapidly maturing |
Decision Questions: How fresh does analytical data need to be (seconds, minutes, or hours)? How large is the transaction volume, and can analytics safely share those resources? How much history must remain queryable? Who consumes the data (dashboards, data scientists, downstream applications)? And how much operational complexity can the team realistically support?
```text
## Architecture Recommendations by Scenario

### Startup / New Product
**Recommended: Lakehouse or Simple Warehouse**
- Start with Snowflake/BigQuery/Databricks
- Add real-time when you actually need it
- Don't over-engineer for scale you don't have

### Mid-Size Company, Mixed Workloads
**Recommended: HTAP or Warehouse + CDC Streaming**
- Consider TiDB, SingleStore for unified approach
- Or PostgreSQL + Debezium/Kafka for streaming integration
- Focus on common use cases, add complexity only when needed

### Enterprise, High-Volume OLTP + Analytics
**Recommended: Lambda/Kappa + Data Lakehouse**
- Dedicated OLTP systems (Oracle, SQL Server, PostgreSQL)
- Kafka for event streaming backbone
- Lakehouse (Delta/Iceberg) for analytical storage
- Connect Spark/Flink for stream processing

### Real-Time Analytics at Scale (Uber, Netflix scale)
**Recommended: Purpose-Built Stack**
- Specialized OLTP (Vitess, CockroachDB, Spanner)
- Kafka/Pulsar for event backbone
- Flink for complex stream processing
- Druid/ClickHouse for real-time OLAP
- Delta Lake for historical analytics
- Each component optimized for its role
```

The most common mistake is building complex hybrid architectures before they're needed. Start simple. Separate OLTP and OLAP is proven and maintainable. Add hybrid elements when you have concrete requirements that justify complexity.
We've explored the spectrum of hybrid approaches that bridge OLTP and OLAP paradigms. The key principles: hybrid architectures manage the OLTP/OLAP distinction rather than eliminate it; treating events as the source of truth makes latency-accuracy trade-offs explicit; and the simplest architecture that meets real requirements beats a sophisticated one built for scale you don't yet have.
Module Complete:
With this page, you've completed the OLTP vs OLAP Considerations module. You now understand the characteristics that distinguish transactional and analytical workloads, how normalization and denormalization serve each, and the hybrid architectures (HTAP, Lambda/Kappa, the ODS, and the lakehouse) that bridge them.
This knowledge enables you to design database schemas appropriate for any workload—transactional, analytical, or hybrid. You can evaluate architectural options and make informed decisions about normalization vs. denormalization based on actual system requirements.
Congratulations! You've mastered the OLTP vs OLAP considerations for database design. From normalized transactional systems to denormalized analytical warehouses to modern hybrid architectures, you now have a complete framework for evaluating and designing schemas appropriate for any workload pattern. This knowledge is essential for any senior database architect or data engineer.