Loading learning content...
Imagine this scenario: Your company's primary MySQL database is approaching its limits. Traffic has grown 10x in two years. You're sharding manually, your application code is riddled with shard-routing logic, cross-shard queries are painfully slow, and your operations team is drowning in complexity. Every scaling solution you've considered requires rewriting your application—a multi-year project no one has the appetite for.
Then someone suggests TiDB.
"Just point your application at TiDB instead of MySQL," they say. "Same protocol, same SQL, same drivers. But now you have a distributed database that scales horizontally across hundreds of nodes."
It sounds too good to be true. And yet, this is exactly what TiDB was designed to deliver—and it's why companies like Pinterest, Square, PayPal, Zhihu, and JD.com have adopted it for their most demanding workloads. TiDB represents a fundamental shift in how we think about database migration and scaling: you don't have to choose between the familiarity of SQL and the scalability of distributed systems.
By the end of this page, you will understand what MySQL compatibility truly means in a distributed context, how TiDB achieves protocol-level and SQL-level compatibility, what the architectural innovations are that make this possible, and the practical implications for migrating from MySQL to TiDB. You'll be equipped to evaluate whether TiDB's compatibility model addresses your organization's needs.
Before we can appreciate TiDB's achievement, we must understand why MySQL compatibility in a distributed database is genuinely difficult. It's not simply about supporting the same SQL syntax—it's about providing behavioral equivalence at multiple layers of the database stack.
The Layers of Compatibility:
Database compatibility operates at several distinct layers, each presenting unique challenges:
1. Wire Protocol Compatibility:
MySQL clients communicate with the server using a specific binary protocol. This protocol defines how connections are established, how authentication works, how queries are sent and results received, how transactions are managed, and how errors are reported. True wire protocol compatibility means existing MySQL client libraries, connection pools, ORMs, and tools can connect to TiDB without modification.
2. SQL Dialect Compatibility:
MySQL's SQL dialect has numerous extensions, quirks, and behaviors that developers rely upon. This includes specific type handling (unsigned integers, zero dates, strict/non-strict modes), MySQL-specific functions (GROUP_CONCAT, IFNULL, DATE_FORMAT with MySQL patterns), indexing behaviors (index hints, invisible indexes, descending indexes), and stored procedures, triggers, and views.
3. Execution Semantics:
Beyond syntax, compatibility requires matching MySQL's behavior in edge cases. This includes how NULLs are compared and sorted, implicit type conversions, character set and collation handling, transaction isolation levels, and locking behavior (for UPDATE, FOR SHARE, lock wait timeouts).
| Layer | What It Covers | Why It's Difficult in Distributed Systems |
|---|---|---|
| Wire Protocol | Connection handling, query/result encoding, authentication | Must handle distributed connection routing, node failures during connection |
| SQL Syntax | MySQL-specific keywords, functions, operators | Large surface area; MySQL has 500+ functions, many with subtle behaviors |
| Data Types | MySQL-specific types (YEAR, SET, ENUM, unsigned integers) | Must maintain type semantics across distributed storage and computation |
| Execution Semantics | Transaction behavior, isolation levels, locking | Distributed transactions behave differently; locking is non-trivial |
| Error Handling | MySQL error codes, SQL states, error messages | New distributed error scenarios must map to familiar MySQL errors |
| System Variables | Session and global variables (sql_mode, autocommit, etc.) | Must maintain variable semantics in distributed context |
Why Other Distributed Databases Didn't Try:
Most distributed databases took a different path. Google Spanner invented its own SQL dialect. CockroachDB chose PostgreSQL compatibility (a smaller, more standardized surface area). Cassandra abandoned SQL entirely for CQL. Even cloud-native databases like Aurora keep MySQL single-node, just with better storage.
The reason is pragmatic: MySQL compatibility is an enormous undertaking. MySQL has 20+ years of features, edge cases, and behaviors that applications depend upon. Matching all of them while fundamentally changing the underlying architecture is like rebuilding a car engine while the car is driving—and ensuring the driver doesn't notice.
TiDB committed to this challenge because of a critical insight: the majority of MySQL workloads don't use the entire feature set. By carefully analyzing real-world usage patterns and implementing the commonly-used 90% with high fidelity, TiDB provides practical compatibility that serves most migration scenarios.
TiDB is MySQL-compatible, not MySQL-identical. There are differences in optimizer behavior, some edge cases in SQL semantics, and certain features that are not implemented. Most applications work without modification, but thorough testing is essential before migration. TiDB maintains a detailed compatibility matrix documenting known differences.
TiDB's architecture is fundamentally designed to separate SQL processing from data storage, enabling MySQL-compatible SQL while using an entirely different storage system. This separation is the key to achieving distributed scale without breaking application compatibility.
The Three-Tier Architecture:
TiDB is built on three core components that work together:
1. TiDB Server (SQL Layer):
2. TiKV (Storage Layer):
3. PD (Placement Driver):
TiDB ARCHITECTURE OVERVIEW════════════════════════════════════════════════════════════════════ ┌─────────────────────────────────────┐ │ CLIENT APPLICATIONS │ │ (MySQL drivers, ORMs, tools) │ └──────────────────┬──────────────────┘ │ MySQL Protocol ▼┌──────────────────────────────────────────────────────────────────┐│ TiDB SERVERS (SQL Layer) ││ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ││ │ TiDB-1 │ │ TiDB-2 │ │ TiDB-N │ STATELESS ││ │ │ │ │ │ │ ││ │ • Parser │ │ • Parser │ │ • Parser │ Each TiDB ││ │ • Optimizer │ │ • Optimizer │ │ • Optimizer │ is identical││ │ • Executor │ │ • Executor │ │ • Executor │ and can ││ │ • TXN Coord │ │ • TXN Coord │ │ • TXN Coord │ handle any ││ └─────────────┘ └─────────────┘ └─────────────┘ query │└──────────────────────────────────────────────────────────────────┘ │ TiKV Client Protocol (gRPC) ▼┌──────────────────────────────────────────────────────────────────┐│ TiKV CLUSTER (Storage Layer) ││ ││ Region 1 Region 2 Region 3 Region N ││ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐││ │Raft │ │Raft │ │Raft │ │Raft │││ │Leader │ │Leader │ │Leader │ │Leader │││ │ │ │ │ │ │ │ │││ │[Data] │ │[Data] │ │[Data] │ │[Data] │││ └────┬───┘ └────┬───┘ └────┬───┘ └────┬───┘││ │ │ │ │ ││ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐││ │Follw 1│ │Follw 1│ │Follw 1│ │Follw 1│││ │Follw 2│ │Follw 2│ │Follw 2│ │Follw 2│││ └───────┘ └───────┘ └───────┘ └───────┘││ ││ Each region (default 96MB) is a Raft group with 3 replicas ││ Regions are automatically split/merged based on size+load │└──────────────────────────────────────────────────────────────────┘ │ Metadata & Schedules ▼┌──────────────────────────────────────────────────────────────────┐│ PD CLUSTER (Placement Driver) ││ ││ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ││ │ PD-1 │ │ PD-2 │ │ PD-3 │ ││ │ (Leader) │ │ (Follower) │ │ (Follower) │ ││ └─────────────┘ └─────────────┘ └─────────────┘ ││ ││ Responsibilities: ││ • Cluster metadata (schema, region locations) ││ • Timestamp Oracle (TSO) for transactions ││ • Scheduling (replica placement, load balancing) ││ • Cluster-wide coordination and decision-making │└──────────────────────────────────────────────────────────────────┘Why This Architecture Enables Compatibility:
The separation of SQL processing from storage is crucial for MySQL compatibility:
SQL Layer is Purpose-Built for MySQL: The TiDB server was written from scratch in Go specifically to implement MySQL's SQL dialect. It's not adapting an existing database—it's built for this purpose.
Storage Layer is Transparent: Applications interact only with the SQL layer, which speaks MySQL protocol. The distributed storage (TiKV) is completely hidden from the application.
Stateless SQL Nodes Enable Familiar Patterns: Like MySQL with ProxySQL or MaxScale, TiDB's SQL nodes are stateless. Existing patterns for connection pooling, load balancing, and failover work unchanged.
Transaction Coordination is Internal: Distributed transaction complexity is handled within TiDB. Applications issue BEGIN/COMMIT as they always have; TiDB coordinates the distributed commit behind the scenes.
The Parser and Optimizer:
TiDB's parser is designed to accept MySQL syntax directly. When a query arrives:
TiDB's parser is maintained as a separate open-source project (github.com/pingcap/tidb/parser) and is used by other tools in the ecosystem. This parser is continuously updated to track MySQL's evolving syntax, ensuring TiDB stays compatible with newer MySQL features as they're released.
TiDB implements the MySQL wire protocol—the binary communication format between MySQL clients and servers. This is what enables existing MySQL drivers, connection pools, and tools to work with TiDB without modification.
What the MySQL Protocol Defines:
The MySQL client-server protocol is a binary protocol with several phases:
Connection Phase:
Command Phase:
TiDB implements all of this:
1234567891011121314151617181920212223242526272829303132
MySQL PROTOCOL - CONNECTION HANDSHAKE (from TiDB perspective)═══════════════════════════════════════════════════════════════════ CLIENT TiDB SERVER────── ─────────── │ │ │ ────── TCP Connect to port 4000 ─────────────► │ │ │ │ ◄───── Handshake Packet ────────────────────── │ │ • Protocol version: 10 │ │ • Server version: "5.7.25-TiDB-v7.1.0" │ TiDB reports │ • Thread ID: 12345 │ as MySQL 5.7.25 │ • Capabilities: 0xF7DF │ │ • Auth plugin: caching_sha2_password │ │ │ │ ────── Handshake Response ──────────────────► │ │ • Username: "app_user" │ │ • Auth response: <scrambled password> │ │ • Database: "myapp_production" │ │ │ │ ◄───── OK Packet (Auth successful) ─────────── │ │ │ │ ────── COM_QUERY ───────────────────────────► │ │ "SELECT * FROM users WHERE id = 1" │ │ │ │ ◄───── Column Definition Packets ───────────── │ │ ◄───── Row Data Packets ────────────────────── │ │ ◄───── EOF Packet ──────────────────────────── │ │ │ From the client's perspective, this is identical to MySQL.The client has no knowledge it's talking to a distributed database.Protocol Features Supported:
TiDB supports the core protocol features that applications depend on:
Version String Strategy:
TiDB reports itself as MySQL 5.7.25 (or configurable) in the handshake:
"5.7.25-TiDB-v7.1.0"
This is intentional. Many client libraries and ORMs check the version string to determine feature availability. By reporting as MySQL 5.7, TiDB ensures clients use compatible code paths. The "-TiDB" suffix allows TiDB-aware tools to detect the actual database when needed.
ORM Compatibility:
Because ORMs use these drivers, they also work with TiDB:
This compatibility extends to database tools:
TiDB's stateless design means connections can be load-balanced across TiDB servers. However, applications with session-level state (temporary tables, user variables, prepared statements) should use sticky sessions or connection affinity to ensure consistent behavior within a session.
Beyond protocol compatibility, TiDB implements comprehensive MySQL SQL syntax. This includes not just standard SQL, but MySQL's specific extensions and behaviors that applications rely upon.
DDL Compatibility:
TiDB supports MySQL's DDL statements with distributed extensions:
123456789101112131415161718192021222324252627282930313233343536373839404142434445
-- Standard MySQL CREATE TABLE syntax works directlyCREATE TABLE users ( id BIGINT PRIMARY KEY AUTO_INCREMENT, email VARCHAR(255) NOT NULL UNIQUE, username VARCHAR(100) NOT NULL, password_hash CHAR(60) NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, status ENUM('active', 'inactive', 'pending') DEFAULT 'pending', metadata JSON, INDEX idx_username (username), INDEX idx_created_at (created_at), FULLTEXT INDEX idx_email_ft (email) -- Supported in newer TiDB versions) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci; -- TiDB-specific: Shard Row ID to distribute AUTO_INCREMENTCREATE TABLE orders ( id BIGINT PRIMARY KEY AUTO_RANDOM, -- AUTO_RANDOM for distributed workloads user_id BIGINT NOT NULL, total_amount DECIMAL(10, 2), ordered_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, FOREIGN KEY (user_id) REFERENCES users(id)) SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS = 4;-- This pre-splits the table into 16 regions for better write distribution -- ALTER TABLE works as expectedALTER TABLE users ADD COLUMN phone VARCHAR(20), ADD INDEX idx_phone (phone), MODIFY COLUMN metadata JSON COMMENT 'User metadata in JSON format'; -- Partitioning (MySQL syntax)CREATE TABLE events ( id BIGINT PRIMARY KEY AUTO_INCREMENT, event_type VARCHAR(50), occurred_at DATETIME NOT NULL, payload JSON) PARTITION BY RANGE (YEAR(occurred_at)) ( PARTITION p2022 VALUES LESS THAN (2023), PARTITION p2023 VALUES LESS THAN (2024), PARTITION p2024 VALUES LESS THAN (2025), PARTITION pmax VALUES LESS THAN MAXVALUE);DML and Query Compatibility:
TiDB supports the full range of MySQL DML statements and query features:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253
-- Standard INSERT with various formsINSERT INTO users (email, username, password_hash) VALUES ('alice@example.com', 'alice', '$2a$10$...'); INSERT INTO users (email, username, password_hash) VALUES ('bob@example.com', 'bob', '$2a$10$...'), ('carol@example.com', 'carol', '$2a$10$...'); -- REPLACE and INSERT ... ON DUPLICATE KEY UPDATE (MySQL-specific)REPLACE INTO users (id, email, username, password_hash) VALUES (1, 'alice_new@example.com', 'alice', '$2a$10$...'); INSERT INTO users (email, username, password_hash)VALUES ('alice@example.com', 'alice', '$2a$10$...')ON DUPLICATE KEY UPDATE username = VALUES(username), updated_at = CURRENT_TIMESTAMP; -- Complex SELECT with MySQL-specific featuresSELECT u.id, u.username, COUNT(o.id) AS order_count, COALESCE(SUM(o.total_amount), 0) AS total_spent, GROUP_CONCAT(o.id ORDER BY o.ordered_at DESC SEPARATOR ',') AS recent_orders, IF(COUNT(o.id) > 10, 'VIP', 'Regular') AS customer_tierFROM users uLEFT JOIN orders o ON u.id = o.user_idWHERE u.status = 'active' AND u.created_at >= DATE_SUB(NOW(), INTERVAL 1 YEAR)GROUP BY u.id, u.usernameHAVING total_spent > 100ORDER BY total_spent DESCLIMIT 100 OFFSET 0; -- MySQL-specific functionsSELECT IFNULL(nickname, username) AS display_name, DATE_FORMAT(created_at, '%Y-%m-%d %H:%i:%s') AS formatted_date, UNIX_TIMESTAMP(created_at) AS created_timestamp, JSON_EXTRACT(metadata, '$.preferences.theme') AS theme_preference, REGEXP_REPLACE(email, '@.*', '@***') AS masked_emailFROM users; -- Window functions (MySQL 8.0+ syntax)SELECT id, username, created_at, ROW_NUMBER() OVER (ORDER BY created_at) AS signup_rank, DENSE_RANK() OVER (PARTITION BY status ORDER BY created_at) AS rank_in_status, LAG(username) OVER (ORDER BY created_at) AS previous_signupFROM users;Transactional SQL:
TiDB supports MySQL's transactional statements with distributed semantics:
123456789101112131415161718192021222324252627
-- Standard transaction syntaxSTART TRANSACTION; SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;-- Acquired distributed pessimistic lock on row UPDATE accounts SET balance = balance - 100 WHERE id = 1;UPDATE accounts SET balance = balance + 100 WHERE id = 2; COMMIT;-- 2PC across all affected TiKV regions -- Savepoints (supported)BEGIN;UPDATE accounts SET balance = balance - 50 WHERE id = 1;SAVEPOINT before_second_update;UPDATE accounts SET balance = balance + 50 WHERE id = 2;ROLLBACK TO SAVEPOINT before_second_update;COMMIT; -- Transaction isolation levelsSET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- DefaultSET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED; -- TiDB-specific: Optimistic vs Pessimistic transactionsSET tidb_txn_mode = 'optimistic'; -- Detect conflicts at commit timeSET tidb_txn_mode = 'pessimistic'; -- Acquire locks at statement execution (default since TiDB 4.0)TiDB defaults to pessimistic transactions (since v4.0) for maximum MySQL compatibility. Optimistic mode offers better performance for low-contention workloads but requires applications to handle commit conflicts. Most MySQL applications expect pessimistic behavior, so the default is usually correct.
Despite extensive compatibility, TiDB has differences from MySQL that you must understand before migration. These stem from fundamental architectural differences—TiDB is a distributed system, while MySQL is designed for single-node operation.
AUTO_INCREMENT Behavior:
In MySQL, AUTO_INCREMENT generates strictly sequential integers. In TiDB, strict sequentiality would require global coordination (a bottleneck), so AUTO_INCREMENT provides:
For applications that require sequential IDs for ordering (bad practice but common), TiDB offers:
AUTO_INCREMENT with AUTO_ID_CACHE=1 (slower but more sequential)AUTO_RANDOM for better distribution (recommended for high-concurrency)| Feature | MySQL Behavior | TiDB Behavior | Recommendation |
|---|---|---|---|
| AUTO_INCREMENT | Strictly sequential | Unique, mostly monotonic | Use AUTO_RANDOM for new tables; don't rely on ID ordering |
| Storage Engine | InnoDB, MyISAM, etc. | TiKV only (InnoDB syntax accepted, ignored) | No action needed; ENGINE clause is accepted for compatibility |
| Stored Procedures | Full support | Experimental/Limited | Move logic to application layer or use simpler procedures |
| Triggers | Full support | Not supported | Implement in application layer |
| Foreign Keys | Enforced | Parsed but not enforced (before v6.6); enforced in v6.6+ | Upgrade to v6.6+ if needed; otherwise enforce in application |
| Query Optimizer | Cost-based, MySQL hints | Cost-based, different cost model | May need query tuning; use TiDB-specific hints |
| Full-Text Search | InnoDB fulltext | Limited support | Consider external search (Elasticsearch) |
| Spatial Data | Geometry types, indexes | Types supported; indexes limited | Evaluate requirements; may need external spatial DB |
Transaction Handling Differences:
While TiDB supports MySQL's transaction syntax, the distributed nature affects behavior:
Lock Wait Behavior: In MySQL, row locks wait until timeout or the lock is released. In TiDB, the default deadlock detection is more aggressive, which may surface deadlocks faster.
Large Transactions: TiDB has limits on transaction size (default 100MB) because large transactions affect cluster stability. Very large batch operations should be broken into smaller transactions.
External Consistency: TiDB provides external consistency (linearizability) which is stronger than MySQL's default. This is usually an improvement but may cause causal delays for reads immediately after writes in different sessions.
Features Not Supported:
Some MySQL features are not available in TiDB:
Before migrating any production workload, run comprehensive integration tests with your actual queries and access patterns. Most applications work without changes, but edge cases in your specific SQL usage may differ. TiDB provides tools like 'SHOW WARNINGS' and compatibility reports to identify potential issues.
TiDB's MySQL compatibility isn't just technical—it's designed to enable practical migration paths. Organizations can choose migration strategies based on their risk tolerance and timeline.
Migration Strategy Options:
1. Full Cutover Migration:
2. DM (Data Migration) with Syncer:
3. Dual-Write with Shadow Traffic:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849
# TiDB Data Migration (DM) configuration example# This enables real-time replication from MySQL to TiDB name: "mysql-to-tidb-migration"task-mode: "all" # full + incremental sync target-database: host: "tidb-cluster.internal" port: 4000 user: "migration_user" password: "${TIDB_PASSWORD}" mysql-instances: - source-id: "mysql-primary-1" block-allow-list: "production-dbs" mydumper-config-name: "global" loader-config-name: "global" syncer-config-name: "global" block-allow-list: production-dbs: do-dbs: ["myapp_production", "myapp_analytics"] do-tables: - db-name: "myapp_production" tbl-name: "~.*" # All tables ignore-tables: - db-name: "myapp_production" tbl-name: "sessions" # Skip session table mydumpers: global: threads: 8 chunk-filesize: 64 # MB loaders: global: pool-size: 32 dir: "/tmp/dm-data" syncers: global: worker-count: 64 batch: 100 enable-gtid: true safe-mode: true # Idempotent mode for safety # Migration maintains binlog position for resumability# After full sync, incremental sync keeps TiDB in sync with MySQL# Cutover: Stop MySQL writes → verify sync → redirect to TiDBApplication Changes Required:
For most applications, the migration requires only connection string changes:
Before (MySQL):
host=mysql.internal port=3306 dbname=myapp user=app password=secret
After (TiDB):
host=tidb.internal port=4000 dbname=myapp user=app password=secret
Additional considerations:
Connection Pool Sizing: TiDB handles more concurrent connections efficiently (it's distributed), but you may want to adjust pool sizes
Retry Logic: Add retry logic for transactional conflicts (especially if using optimistic transactions)
Large Transactions: Break batch operations into smaller chunks (<100MB per transaction by default)
Monitoring Integration: Update dashboards to use TiDB's Prometheus metrics
Companies like Zhihu (Chinese Q&A platform similar to Quora) migrated from MySQL to TiDB to serve 500+ million users. They reported that 90% of their SQL queries required zero modification. The remaining 10% needed minor optimizations, primarily around query plan hints and transaction sizing.
We've explored TiDB's approach to MySQL compatibility—what it means, how it's achieved, and what limitations exist. Let's consolidate the key insights:
What's Next:
MySQL compatibility solves the migration problem, but it doesn't explain why you'd want TiDB in the first place. In the next page, we'll explore Horizontal Scalability—how TiDB scales to handle data and traffic that would crush a single MySQL instance. We'll examine the distributed architecture in detail and understand how TiDB achieves scale while maintaining the SQL semantics we've discussed.
You now understand TiDB's approach to MySQL compatibility—the protocol, SQL dialect, and behavioral compatibility that enables migration from MySQL to a distributed database. Next, we'll explore how TiDB scales horizontally to handle workloads that would exceed any single MySQL instance.