In traditional architectures, there's always a gap between operational data and analytical insights.
Your transactional database captures every customer action, every order, every payment—but querying it for analytics would crush your production system. So you build a data warehouse. You create ETL pipelines to copy data nightly. You hire data engineers to maintain the pipeline. You train analysts to understand that yesterday's data is the best they'll get.
This gap has consequences: insights always lag reality by hours or a day, a second stack of infrastructure must be built and maintained, and analytics can never feed back into decisions as they happen.
TiDB's HTAP (Hybrid Transactional/Analytical Processing) architecture eliminates this gap. Through TiFlash—a columnar storage engine that replicates data in real-time from TiKV—you can run complex analytical queries directly on live transactional data without impacting OLTP performance.
This isn't a compromise. It's a fundamental rethinking of how databases can serve both workloads simultaneously.
By the end of this page, you will understand the architectural differences between OLTP and OLAP workloads, how TiFlash provides columnar storage alongside TiKV's row storage, how real-time replication keeps analytical data current, and how TiDB's optimizer chooses between row and columnar storage for each query.
Before diving into TiDB's HTAP architecture, we must understand why traditional systems separate OLTP and OLAP workloads—and why unifying them is technically challenging.
OLTP (Online Transaction Processing): the workload your application generates. It is characterized by point lookups and narrow range scans, small frequent writes, thousands of concurrent users, and single-digit-millisecond latency targets.

OLAP (Online Analytical Processing): the workload your analysts and dashboards generate. It is characterized by full scans and complex joins, aggregations over millions of rows, tens of concurrent analysts, and latency measured in seconds to minutes.

Side by side, the two workloads differ along almost every dimension:
| Characteristic | OLTP | OLAP |
|---|---|---|
| Query Pattern | Point lookups, narrow ranges | Full scans, complex joins |
| Rows Accessed | 1-100 per query | Millions per query |
| Columns Accessed | All columns (SELECT *) | Few columns (aggregations) |
| Latency Requirement | <10ms | Seconds to minutes |
| Concurrency | Thousands of users | Tens of analysts |
| Write Pattern | Frequent, small writes | Rare bulk loads |
| Data Freshness | Must be real-time | Can tolerate delay |
| Optimal Storage | Row-oriented | Column-oriented |
Why Storage Layout Matters:
The fundamental tension is storage layout. Consider a table with 1 million rows and 20 columns:
Row-Oriented Storage (TiKV):

```
Row 1: [col1, col2, col3, ..., col20]
Row 2: [col1, col2, col3, ..., col20]
...
```

Column-Oriented Storage (TiFlash):

```
Column 1: [row1_val, row2_val, ..., row1M_val]
Column 2: [row1_val, row2_val, ..., row1M_val]
...
```
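To see why this matters, compare two queries against the hypothetical 20-column table above (illustrative names such as `orders`, `id`, and `amount` are assumptions, not a schema defined on this page):

```sql
-- OLTP-style point lookup: one row, all 20 columns.
-- Row storage reads that one row contiguously; ideal for TiKV.
SELECT * FROM orders WHERE id = 12345;

-- OLAP-style aggregate: every row, but only 2 of the 20 columns.
-- Columnar storage reads just `amount` and `order_date` (~10% of the data); ideal for TiFlash.
SELECT SUM(amount)
FROM orders
WHERE order_date >= '2024-01-01';
```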
The Traditional Solution:
Because of this tension, organizations traditionally maintained separate systems:
TiDB's HTAP architecture challenges this by maintaining both storage layouts simultaneously, with real-time replication keeping them in sync.
You might wonder: can't we use indexes for analytics? Indexes help with point lookups, but aggregate queries (SUM, COUNT) must still scan all matching rows. A well-designed columnar store with compression can be 10-100x more efficient for analytical queries than row stores with indexes.
TiFlash is TiDB's columnar storage engine, designed specifically for analytical workloads. It runs alongside TiKV, maintaining a columnar replica of data that's optimized for aggregations, scans, and complex queries.
Key Design Principles:
Columnar Organization: Data is stored by column, not by row, enabling efficient compression and vectorized processing.
Learner Replicas: TiFlash nodes join Raft groups as learner replicas—they receive data updates but don't participate in leader election or write consensus.
Asynchronous Replication: TiFlash replicas lag slightly behind TiKV (typically milliseconds to seconds), but TiDB ensures read consistency when needed.
Delta-Main Architecture: Recent changes are stored in a row-oriented delta layer, periodically merged into the columnar main layer.
Storage Architecture:
```
TIFLASH STORAGE ARCHITECTURE
════════════════════════════════════════════════════════════

TIDB CLUSTER WITH TIFLASH
─────────────────────────

┌──────────────────────────────────────────────────────────┐
│                       TiDB Servers                       │
│   (SQL layer - decides whether to use TiKV or TiFlash)   │
└──────────────────────────────────────────────────────────┘
              │                              │
              ▼                              ▼
┌───────────────────────────┐  ┌───────────────────────────┐
│ TiKV CLUSTER              │  │ TiFlash CLUSTER           │
│ (Row Storage)             │  │ (Columnar Storage)        │
├───────────────────────────┤  ├───────────────────────────┤
│ Region 1 [L]  Region 2 [F]│  │ Region 1 [Learner]        │
│ Region 3 [F]  Region 4 [L]│  │ Region 2 [Learner]        │
│ ...                       │  │ Region 3 [Learner] ...    │
│                           │  │                           │
│ • Point lookups           │─►│ • Analytical queries      │
│ • Small transactions      │  │ • Full scans              │
│ • Raft leaders/followers  │  │ • Aggregations            │
│                           │  │ • Raft learner only       │
└───────────────────────────┘  └───────────────────────────┘
             Raft Replication (async, learner)

TIFLASH NODE INTERNAL STRUCTURE
───────────────────────────────

TiFlash Node
  Delta Layer (Row Format)
    Recent writes, stored in row format for fast ingestion:
      INSERT row1: (id=1001, name="Alice", amount=150.00)
      INSERT row2: (id=1002, name="Bob",   amount=200.00)
      UPDATE row 1001: amount = 175.00
            │
            │  Periodic merge
            ▼
  Main Layer (Columnar Format)
    Historical data, stored in columnar format with encoding:
      Column: id        [1, 2, 3, 4, ..., 1000000]      (compressed)
      Column: name      ["Alice", "Bob", "Carol", ...]  (dictionary)
      Column: amount    [95.0, 150.0, 200.0, ...]       (run-length)
      Column: timestamp [2024-01-01, 2024-01-01, ...]   (delta)

    Compression: LZ4, ZSTD, Dictionary, Run-Length, Delta
    Typical compression ratio: 5-20x
```

Columnar Advantages for Analytics:
I/O Efficiency: Only read columns needed for query (often 10-20% of data)
Compression: Same-type values compress much better (10-20x is common)
Vectorized Execution: Process batches of column values using SIMD instructions
Enabling TiFlash Replicas:
```sql
-- Enable TiFlash replica for a table
ALTER TABLE orders SET TIFLASH REPLICA 1;

-- Enable TiFlash replica for partition tables
ALTER TABLE events SET TIFLASH REPLICA 2;  -- 2 replicas for HA

-- Check TiFlash replica status
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, LOCATION_LABELS, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica;
/*
+--------------+------------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+---------------+-----------------+-----------+----------+
| mydb         | orders     |             1 | NULL            |         1 |      1.0 |
| mydb         | events     |             2 | NULL            |         1 |      1.0 |
+--------------+------------+---------------+-----------------+-----------+----------+
*/

-- PROGRESS = 1.0 means replication is complete
-- AVAILABLE = 1 means TiFlash can serve queries

-- Remove TiFlash replica (rare, usually for cost reduction)
ALTER TABLE logs SET TIFLASH REPLICA 0;
```

TiFlash uses Raft learner replicas, which receive data but don't participate in write consensus. This means adding TiFlash doesn't increase write latency—TiKV writes complete without waiting for TiFlash acknowledgment. TiFlash catches up asynchronously.
The magic of TiDB's HTAP lies in how TiFlash maintains a consistent view of data without impacting OLTP performance. This is achieved through Raft Learner replicas.
What is a Raft Learner?
In the Raft consensus protocol, learners are special replicas: they receive the replicated log like any follower, but they do not vote in leader elections and do not count toward the write quorum.

This makes learners perfect for TiFlash: replication to TiFlash adds no write latency, a slow or offline TiFlash node never blocks OLTP traffic, and TiFlash still receives every committed change.
```
RAFT LEARNER REPLICATION FLOW
════════════════════════════════════════════════════════════

WRITE: INSERT INTO orders VALUES (1001, 'Alice', 150.00, NOW());

STEP 1: Normal Raft Commit (TiKV only)
──────────────────────────────────────
Client
  └──► TiKV Leader (Region 42)
         ├──► TiKV Follower 1: AppendLog ─► ACK
         ├──► TiKV Follower 2: AppendLog ─► ACK
         │       Majority achieved (2 of 2 followers)
         └──► Commit & Apply ─► Response to Client (SUCCESS)
                 (Client returns here, write complete)

STEP 2: Async Learner Replication (TiFlash)
───────────────────────────────────────────
... milliseconds later ...

TiKV Leader (Region 42)
  └──► TiFlash Learner: AppendLog (async)
          │
          ▼
        Raft log received
          │
          ▼
        Apply to Delta Layer    (row format for fast apply)
          │
          ▼  (background, periodic)
        Merge to Main Layer     (columnar format)

REPLICATION LAG CHARACTERISTICS
───────────────────────────────
Typical lag:
  - Normal operations:                  10-100ms
  - Under heavy write load:             100ms - 1s
  - Catching up after TiFlash restart:  seconds to minutes

Query implications:
  - Strong reads (read at specific timestamp): may wait for TiFlash to catch up
  - Weak reads (read latest in TiFlash):       slightly stale, no waiting
  - Default: TiDB uses strong reads for consistency
```

Consistency Guarantees:
TiDB provides different consistency options for TiFlash reads:
1. Strong Consistency (Default): TiDB reads at a specific timestamp and waits until the TiFlash replica has caught up to that timestamp, so the result matches what a TiKV read would return.

2. Weak Consistency: TiDB reads whatever the TiFlash replica currently has, or data as of an earlier timestamp, trading a few seconds of staleness for lower latency.
Read Consistency in Practice:
```sql
-- Check TiFlash replication lag
SELECT ADDRESS, STORE_ID, REGION_COUNT,
       -- Learner safe TS indicates how far TiFlash has caught up
       TIFLASH_VERSION
FROM information_schema.tiflash_cluster_info;

-- Default: Strong consistency (waits for TiFlash to catch up)
SELECT COUNT(*), SUM(amount) FROM orders WHERE order_date = CURDATE();
-- This will wait if TiFlash hasn't received today's latest writes

-- Force TiFlash execution with hint
SELECT /*+ READ_FROM_STORAGE(TIFLASH[orders]) */
    customer_id,
    COUNT(*) as order_count,
    SUM(amount) as total_spent
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY customer_id
HAVING total_spent > 1000;

-- For dashboards where slight staleness is acceptable:
-- allow stale TiFlash reads for the session
SET SESSION tiflash_allow_stale_read = ON;

-- Or use AS OF TIMESTAMP for point-in-time reads
SELECT COUNT(*) FROM orders AS OF TIMESTAMP TIMESTAMPADD(SECOND, -60, NOW());
-- Reads data as of 60 seconds ago (definitely available in TiFlash)
```

For financial reporting or compliance, use strong consistency to ensure accuracy. For real-time dashboards, stale reads (10-30 seconds) are often acceptable and reduce query latency. Match consistency mode to business requirements.
One of TiDB's most powerful features is its ability to automatically choose between TiKV (row storage) and TiFlash (columnar storage) based on query characteristics. The optimizer analyzes each query and selects the optimal engine—or even uses both engines in a single query.
How the Optimizer Decides:
TiDB's cost-based optimizer estimates the cost of executing each query plan on TiKV vs TiFlash:
Factors Favoring TiKV: point lookups by primary key or unique index, queries that touch a handful of rows, and queries that read most columns of those rows.

Factors Favoring TiFlash: full table scans, aggregations and GROUP BY over large row counts, queries that read only a few columns of a wide table, and large joins.

The optimizer considers: estimated row counts from table statistics, which columns and indexes the query touches, and the resulting I/O and CPU cost on each engine; the cheaper plan wins.
```sql
-- EXPLAIN shows which engine TiDB will use

-- Example 1: Point lookup → TiKV
EXPLAIN SELECT * FROM orders WHERE id = 12345;
/*
+-----------+---------+------+---------------+---------------+
| id        | estRows | task | access object | operator info |
+-----------+---------+------+---------------+---------------+
| Point_Get | 1.00    | root | table:orders  | handle:12345  |
+-----------+---------+------+---------------+---------------+
task=root means TiKV (point get is always TiKV)
*/

-- Example 2: Aggregation → TiFlash (automatic)
EXPLAIN SELECT
    customer_id,
    COUNT(*) as order_count,
    SUM(amount) as total_spent
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY customer_id;
/*
+------------------------+------------+--------------+---------------+-----------------------------+
| id                     | estRows    | task         | access object | operator info               |
+------------------------+------------+--------------+---------------+-----------------------------+
| HashAgg                | 50000.00   | root         |               | group by:orders.customer_id |
| └─TableReader          | 1000000.00 | root         |               | data:ExchangeSender         |
|   └─ExchangeSender     | 1000000.00 | mpp[tiflash] |               | ExchangeType: PassThrough   |
|     └─HashAgg          | 50000.00   | mpp[tiflash] |               | group by:orders.customer_id |
|       └─TableFullScan  | 1000000.00 | mpp[tiflash] | table:orders  | keep order:false            |
+------------------------+------------+--------------+---------------+-----------------------------+
task=mpp[tiflash] indicates TiFlash execution with MPP (Massively Parallel Processing)
*/

-- Example 3: Hybrid - join a TiFlash aggregation with a TiKV lookup
EXPLAIN SELECT
    c.name,
    c.email,
    stats.order_count,
    stats.total_spent
FROM customers c
INNER JOIN (
    SELECT customer_id, COUNT(*) as order_count, SUM(amount) as total_spent
    FROM orders
    WHERE created_at >= '2024-01-01'
    GROUP BY customer_id
    HAVING COUNT(*) > 10
) stats ON c.id = stats.customer_id;
/*
+------------------------+---------+--------------+-----------------+------------------------+
| id                     | estRows | task         | access object   | operator info          |
+------------------------+---------+--------------+-----------------+------------------------+
| Projection             | 1000.00 | root         |                 | ...                    |
| └─HashJoin             | 1000.00 | root         |                 | inner join             |
|   ├─TableReader(Build) | 1000.00 | root         |                 | table:customers        |
|   │ └─TableFullScan    | 1000.00 | cop[tikv]    | table:c         |                        |
|   └─TableReader(Probe) | 1000.00 | root         |                 | data:ExchangeSender    |
|     └─...              |         | mpp[tiflash] |                 | aggregation on TiFlash |
+------------------------+---------+--------------+-----------------+------------------------+
cop[tikv] for the customer lookup, mpp[tiflash] for the order aggregation
*/

-- Force a specific engine with hints
-- Force TiFlash
SELECT /*+ READ_FROM_STORAGE(TIFLASH[orders]) */ SUM(amount) FROM orders;

-- Force TiKV
SELECT /*+ READ_FROM_STORAGE(TIKV[orders]) */ SUM(amount) FROM orders;

-- For most queries, let the optimizer decide (it's usually right)
```

MPP (Massively Parallel Processing):
For complex analytical queries, TiFlash doesn't just store data in columnar format—it also processes queries using MPP (Massively Parallel Processing).
MPP distributes query execution across all TiFlash nodes: each node scans and pre-aggregates its local Regions in parallel, exchange operators shuffle rows between nodes for joins and GROUP BY, and partial results are combined before being returned to TiDB.
This enables complex queries to scale horizontally—adding TiFlash nodes increases analytical query capacity.
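As a small sketch of how you might confirm MPP is in play, assuming the `tidb_allow_mpp` and `tidb_enforce_mpp` session variables available in recent TiDB versions:

```sql
-- Let the optimizer choose MPP when TiFlash replicas exist (typically the default)
SET SESSION tidb_allow_mpp = ON;

-- Optionally insist on MPP for eligible queries, e.g. when benchmarking TiFlash capacity
SET SESSION tidb_enforce_mpp = ON;

-- An MPP plan shows mpp[tiflash] tasks with Exchange operators moving data between nodes
EXPLAIN
SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;
```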
Analytical queries that take minutes on TiKV often complete in seconds on TiFlash/MPP. Organizations report 10-100x speedups for dashboard queries after enabling TiFlash, with no changes to their SQL.
HTAP architecture enables use cases that were previously impractical or required complex infrastructure. Let's explore scenarios where TiDB's unified OLTP/OLAP capability provides significant value.
1. Real-Time Dashboards on Transactional Data:
Traditional approach: copy orders into a separate analytics database on a schedule, maintain the ETL pipeline, and accept that dashboards lag production by hours or a day.

With TiDB HTAP: the dashboard queries the same cluster that records the orders. Aggregations run on TiFlash against data that is seconds old, with no ETL and no impact on OLTP latency. For example:
```sql
-- Real-time revenue dashboard (runs every minute)
SELECT
    DATE_FORMAT(ordered_at, '%Y-%m-%d %H:00') as hour,
    COUNT(*) as orders,
    SUM(total_amount) as revenue,
    AVG(total_amount) as avg_order_value,
    COUNT(DISTINCT customer_id) as unique_customers
FROM orders
WHERE ordered_at >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY hour
ORDER BY hour DESC;

-- Query hits TiFlash (automatic), returns in ~100ms
-- Same query on TiKV would take 5-10 seconds (full scan)

-- Customer segmentation (would traditionally require a data warehouse)
WITH customer_metrics AS (
    SELECT
        customer_id,
        COUNT(*) as order_count,
        SUM(total_amount) as lifetime_value,
        DATEDIFF(NOW(), MAX(ordered_at)) as days_since_last_order,
        DATEDIFF(NOW(), MIN(ordered_at)) as customer_age_days
    FROM orders
    GROUP BY customer_id
)
SELECT
    CASE
        WHEN lifetime_value > 10000 AND days_since_last_order < 30 THEN 'VIP Active'
        WHEN lifetime_value > 10000 THEN 'VIP At-Risk'
        WHEN lifetime_value > 1000 AND days_since_last_order < 60 THEN 'Regular Active'
        WHEN lifetime_value > 1000 THEN 'Regular At-Risk'
        ELSE 'Occasional'
    END as segment,
    COUNT(*) as customer_count,
    SUM(lifetime_value) as segment_value,
    AVG(order_count) as avg_orders
FROM customer_metrics
GROUP BY segment;
```

2. Anti-Fraud and Anomaly Detection:
Fraud detection requires analyzing patterns across large datasets in real-time. With HTAP:
```sql
-- Fraud detection: find accounts with unusual activity in the last hour
-- This needs OLAP (aggregate analysis) + OLTP (immediate action)

WITH hourly_patterns AS (
    -- TiFlash: analyze all transactions in the last hour
    SELECT
        account_id,
        COUNT(*) as txn_count,
        SUM(amount) as total_amount,
        COUNT(DISTINCT merchant_id) as unique_merchants,
        COUNT(DISTINCT country_code) as countries_used
    FROM transactions
    WHERE created_at >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
    GROUP BY account_id
),
historical_patterns AS (
    -- TiFlash: get 30-day averages for comparison
    SELECT
        account_id,
        AVG(daily_txns) as avg_daily_txns,
        STDDEV(daily_txns) as stddev_txns,
        AVG(daily_amount) as avg_daily_amount
    FROM (
        SELECT
            account_id,
            DATE(created_at) as day,
            COUNT(*) as daily_txns,
            SUM(amount) as daily_amount
        FROM transactions
        WHERE created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
        GROUP BY account_id, day
    ) daily
    GROUP BY account_id
)
SELECT
    h.account_id,
    h.txn_count as hourly_txns,
    h.total_amount as hourly_amount,
    h.countries_used,
    p.avg_daily_txns,
    -- Flag if hourly activity exceeds 50% of the daily average
    CASE WHEN h.txn_count > p.avg_daily_txns * 0.5
         THEN 'HIGH_FREQUENCY' ELSE 'NORMAL' END as frequency_flag,
    -- Flag if multiple countries were used in one hour
    CASE WHEN h.countries_used > 2
         THEN 'MULTI_GEO' ELSE 'NORMAL' END as geo_flag
FROM hourly_patterns h
JOIN historical_patterns p ON h.account_id = p.account_id
WHERE h.txn_count > p.avg_daily_txns * 0.5  -- Anomaly threshold
   OR h.countries_used > 2;                 -- Geographic anomaly

-- Results feed into a TiKV-based blocking/alerting system
```

3. Personalization and Recommendation:
Recommendation engines need to: aggregate behavior across many users to learn what is popular and what correlates (OLAP), and serve per-user recommendations in milliseconds during the session (OLTP).
With separate systems, recommendations are based on stale data. With HTAP, recommendations can consider the user's last-minute activity.
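A minimal sketch of that pattern in SQL, assuming hypothetical `page_views` and `orders` tables (not defined on this page): the session lookup is TiKV-friendly, while the popularity ranking is a TiFlash-friendly aggregate.

```sql
-- What has this customer viewed in the current session? (small indexed lookup, TiKV-friendly)
SELECT product_id
FROM page_views
WHERE customer_id = 42
  AND viewed_at >= DATE_SUB(NOW(), INTERVAL 30 MINUTE);

-- Which products are selling right now? (large aggregate, TiFlash-friendly)
-- Combine with the session's viewed products to rank fresh recommendations.
SELECT product_id, COUNT(*) AS recent_orders
FROM orders
WHERE ordered_at >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY product_id
ORDER BY recent_orders DESC
LIMIT 10;
```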
4. Operational Intelligence: operations teams can query live inventory, fulfillment, and support data to spot problems as they happen, instead of reviewing yesterday's warehouse load.
The Common Thread:
All these use cases share a pattern: analytics that inform real-time operations. When analytics and transactions are in separate systems with ETL between them, there's always a gap. HTAP closes that gap.
| Use Case | OLAP Component | OLTP Component | Traditional Challenge | HTAP Benefit |
|---|---|---|---|---|
| Real-time Dashboards | Aggregate metrics | Record transactions | Stale data (T+1) | Live data (seconds) |
| Fraud Detection | Pattern analysis | Transaction blocking | Detection delay | Real-time detection |
| Personalization | User behavior aggregation | Serve recommendations | Stale recommendations | Current-session aware |
| Inventory Optimization | Demand forecasting | Stock updates | Delayed reordering | Dynamic optimization |
| Customer 360 | Aggregate history | Serve to agents | Incomplete view | Complete, current view |
You don't need TiFlash on every table. Start with the 3-5 largest tables that drive analytical queries (orders, transactions, events). The cost-benefit is highest for large, frequently-aggregated tables.
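One way to build that shortlist is to rank tables by size in `information_schema` (a rough heuristic; cross-check it against the tables your dashboards actually aggregate):

```sql
-- Largest tables are usually the best TIFLASH REPLICA candidates
SELECT
    table_schema,
    table_name,
    ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'INFORMATION_SCHEMA', 'PERFORMANCE_SCHEMA', 'METRICS_SCHEMA')
ORDER BY data_length + index_length DESC
LIMIT 5;
```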
Deploying TiFlash effectively requires understanding its resource requirements and architectural implications.
Resource Requirements:
TiFlash nodes have different resource profiles than TiKV nodes:
CPU: analytical scans, aggregations, and MPP joins are CPU-intensive; more cores translate directly into faster vectorized execution.

Memory: generous RAM helps cache hot columnar data and hold intermediate results for large hash joins and aggregations.

Storage: fast NVMe SSDs are strongly recommended; thanks to columnar compression, the TiFlash footprint is usually a fraction of the row-store copy.

Network: replication from TiKV and MPP data exchange between TiFlash nodes benefit from high-bandwidth networking.
| Scale | Nodes | CPU | Memory | Storage |
|---|---|---|---|---|
| Small (< 1TB) | 2-3 | 16 cores | 64 GB | 500 GB NVMe |
| Medium (1-10TB) | 3-5 | 32 cores | 128 GB | 2 TB NVMe |
| Large (10-100TB) | 5-10+ | 48 cores | 256 GB | 4 TB NVMe |
Deployment Patterns:
1. Dedicated TiFlash Nodes (Recommended): run TiFlash on its own machines so heavy analytical queries compete only with each other, never with TiKV's OLTP traffic.

2. Shared Development/Test: for non-production clusters, TiFlash can share hosts with TiKV to save hardware, accepting that a large scan may slow transactional work.
TiFlash High Availability:
TiFlash replicas work like TiKV: data is replicated per Region, and if a TiFlash node fails, PD re-creates its Regions on the remaining TiFlash nodes. With two replicas, analytical queries keep running during the rebuild.
Recommendation: 2 TiFlash replicas for HA, 1 for non-critical analytics
```sql
-- Monitor TiFlash status and performance

-- TiFlash cluster overview
SELECT * FROM information_schema.tiflash_cluster_info;

-- Tables with TiFlash replicas
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE REPLICA_COUNT > 0;

-- Check if TiFlash is being used (query plan)
EXPLAIN ANALYZE
SELECT customer_id, COUNT(*), SUM(amount)
FROM orders
GROUP BY customer_id;

-- Look for "tiflash" in the task column:
/*
+----------------------+----------+---------+--------------+----------------+
| id                   | estRows  | actRows | task         | access object  |
+----------------------+----------+---------+--------------+----------------+
| HashAgg              | 50000.00 | 48721   | root         |                |
| └─TableReader        | 50000.00 | 48721   | root         |                |
|   └─ExchangeSender   | 50000.00 | 48721   | mpp[tiflash] | ← TiFlash/MPP  |
+----------------------+----------+---------+--------------+----------------+
*/

-- TiFlash-specific metrics (in Prometheus/Grafana)
-- tiflash_proxy_apply_log_duration_seconds: replication lag
-- tiflash_system_current_metric_Region_Count: Regions in TiFlash
-- tiflash_coprocessor_executor_count: query processing stats
```

When you first enable TiFlash replicas on a large table, initial sync can take hours depending on data size and network bandwidth. Plan for this during off-peak hours. Monitor PROGRESS in tiflash_replica until it reaches 1.0.
We've explored TiDB's HTAP capabilities—how TiFlash enables real-time analytics on transactional data. The key principles: TiFlash maintains a columnar copy of your data as a Raft learner, so analytics never slow down writes; replication is asynchronous, yet reads can remain strongly consistent; the optimizer routes each query, or each part of a query, to the cheaper engine; and MPP lets analytical capacity scale out by adding TiFlash nodes.
What's Next:
We've covered MySQL compatibility, horizontal scalability, and HTAP capabilities. The final piece is understanding when to choose TiDB—the decision framework for evaluating TiDB against alternatives like MySQL, PostgreSQL, CockroachDB, and other distributed databases. We'll examine the tradeoffs, sweet spots, and anti-patterns to help you make the right choice for your specific requirements.
You now understand TiDB's HTAP architecture—how TiFlash provides columnar storage for analytics, how real-time replication keeps data current, and how the optimizer routes queries to the appropriate engine. Next, we'll explore when TiDB is the right choice for your system.