Every database, no matter how powerful, eventually hits a wall. This wall isn't a bug to fix or an optimization to discover—it's a fundamental limitation of running on a single machine. The most expensive server money can buy has finite CPU cores, finite memory, finite disk I/O, and finite network bandwidth. When your data grows beyond what one machine can handle, you face a critical architectural decision.
Sharding—also called horizontal partitioning—is the strategy of splitting a database across multiple machines, where each machine holds a subset of the data. Unlike vertical scaling (buying bigger hardware) or replication (copying data for redundancy), sharding actually divides your dataset, allowing you to scale storage and throughput almost linearly by adding more nodes.
By the end of this page, you will understand why sharding becomes necessary, the specific limitations it addresses, and the fundamental tradeoffs you accept when moving from a single database to a sharded architecture. This understanding is essential before diving into specific sharding strategies.
To understand why sharding is necessary, let's trace the journey of a growing application—a story that plays out in thousands of companies every year.
Phase 1: The Single Database (0 to 1 million users)
Your startup launches with a single PostgreSQL or MySQL instance. Everything works beautifully. Queries are fast, joins are simple, transactions are straightforward. You focus on building features, not infrastructure. This is the golden era.
Phase 2: Read Scaling with Replicas (1 to 10 million users)
As traffic grows, read queries start competing with write queries. You add read replicas. Now reads fan out across multiple replicas while writes go to a single primary. This buys time, but only for read-heavy workloads.
Phase 3: Vertical Scaling (10 to 50 million users)
Write traffic increases. You upgrade to increasingly expensive hardware—more RAM, faster SSDs, more CPU cores. Each upgrade is costly and provides diminishing returns. You're now paying $50,000/month for a database server.
Phase 4: The Breaking Point (50+ million users)
You hit hard limits that no amount of money can solve: storage beyond what any single machine can attach, write throughput beyond what one primary can absorb, and working sets too large to fit in any amount of RAM.
Many teams fall into the trap of continuously buying bigger servers, hoping the next upgrade will 'hold for another year.' This is expensive and ultimately futile. The largest available servers have known limits, and cloud providers charge premium prices at the top tier. Worse, you're building on a foundation that cannot support your future growth.
Let's put concrete numbers to these limits. Understanding the math helps you predict when sharding becomes necessary for your system.
Storage Growth Analysis
Consider a system that stores user activity data. Each user generates 10 events per day, each averaging 1KB. With 10 million daily active users, that's 100GB of new raw data every day, or roughly 36TB per year.
With retention requirements and indexes (which often double or triple your data size), you're looking at 100TB+ within two years. No single database instance handles this gracefully.
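The storage arithmetic above can be sketched in a few lines. This is a back-of-envelope sketch using the figures from the text (10M DAU, 10 events/day, 1KB/event, 2-3x index overhead); the constants are illustrative, not measurements.

```python
# Storage growth sketch. Assumptions (from the text): 10M DAU, 10 events
# per user per day, 1 KB per event, indexes/overhead multiply raw data
# by 2-3x, and a two-year retention window.
DAU = 10_000_000
EVENTS_PER_DAY = 10
EVENT_SIZE_KB = 1

raw_gb_per_day = DAU * EVENTS_PER_DAY * EVENT_SIZE_KB / 1_000_000  # KB -> GB
raw_tb_per_year = raw_gb_per_day * 365 / 1_000                     # GB -> TB

print(f"raw data: {raw_gb_per_day:,.0f} GB/day, {raw_tb_per_year:.1f} TB/year")
for overhead in (2, 3):
    total_tb = raw_tb_per_year * 2 * overhead  # two years, with index overhead
    print(f"{overhead}x overhead: {total_tb:,.0f} TB after two years")
```

Even the optimistic 2x-overhead case lands well past 100TB, which is why the two-year horizon matters more than the day-one number.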
Write Throughput Analysis
A well-tuned PostgreSQL instance on high-end hardware might sustain 30,000-50,000 write transactions per second. Sounds like a lot? Let's check: at 10 million DAU generating 10 events each, that's 100 million writes per day, an average of only about 1,200 writes per second.
This seems safe, but consider growth. At 100M DAU, the average climbs to roughly 11,500 writes per second, and daily traffic patterns push peaks several times higher.
You're now at the edge. Factor in spikes (viral content, Black Friday) that can 10x traffic, and you've exceeded capacity.
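This throughput check is easy to reproduce. A minimal sketch, assuming 10 events per user per day and a peak factor of roughly 3x the daily average (the peak factor is an assumption; real diurnal patterns vary):

```python
# Back-of-envelope write throughput check. Assumptions: 10 events per
# user per day (from the text) and peak traffic ~3x the daily average
# (a common rule of thumb, not a measured value).
SECONDS_PER_DAY = 86_400
EVENTS_PER_USER_PER_DAY = 10

def write_tps(daily_active_users: int, peak_factor: float = 3.0):
    """Return (average TPS, estimated peak TPS) for a given DAU."""
    avg = daily_active_users * EVENTS_PER_USER_PER_DAY / SECONDS_PER_DAY
    return avg, avg * peak_factor

for dau in (10_000_000, 100_000_000):
    avg, peak = write_tps(dau)
    print(f"{dau:>11,} DAU: avg {avg:>8,.0f} TPS, peak {peak:>8,.0f} TPS")
```

At 100M DAU the estimated peak sits right inside the 30,000-50,000 TPS band a single well-tuned instance can sustain, before accounting for any viral or seasonal spike.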
| Resource | Practical Limit | Hard Limit | Consequence of Exceeding |
|---|---|---|---|
| Storage | 20-50TB | ~100TB | Cannot store more data; system halts |
| Write TPS | 10K-50K | ~100K | Writes queue indefinitely; timeouts cascade |
| Working Set (RAM) | 256GB-1TB | ~4TB | Every query hits disk; latency explodes |
| Connections | 1,000-5,000 | ~10,000 | New connections refused; cascading failures |
| Index Size | 10-50GB/table | RAM limit | Index scans become table scans; queries time out |
The Latency Tax
As you approach these limits, performance degrades non-linearly. A database at 50% capacity might have p99 latency of 50ms. At 80%, it might be 200ms. At 95%, you're looking at multi-second latencies and timeout storms.
This is why experienced architects plan for sharding before hitting limits—the transition is much smoother when you're not fighting fires.
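The non-linear degradation has a textbook explanation: in a simple M/M/1 queueing model, mean response time grows as 1/(1 - utilization) times the base service time. A toy illustration of the shape of the curve (the text's p99 figures will differ, since tail latencies degrade even faster than the mean):

```python
# M/M/1 queueing intuition for why latency explodes near capacity:
# mean response time scales as 1 / (1 - utilization) times service time.
def latency_multiplier(utilization: float) -> float:
    """Mean response-time multiplier for an M/M/1 queue at given utilization."""
    return 1.0 / (1.0 - utilization)

for u in (0.50, 0.80, 0.95):
    print(f"{u:.0%} utilized -> {latency_multiplier(u):.0f}x base latency")
```

Going from 50% to 95% utilization is not a 2x slowdown but a 10x one, which is why "we still have headroom" at 80% is usually an illusion.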
Sharding addresses the fundamental bottlenecks we've discussed by distributing data across multiple independent database instances. Each shard is a complete database that handles a subset of your data. Let's examine how sharding solves each limit:
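The core mechanism can be sketched in a few lines: a deterministic function maps each record's key to one shard. This is a minimal hash-based sketch (`shard_for` and `NUM_SHARDS` are hypothetical names, not any specific library's API; production systems layer on consistent hashing, directories, or virtual shards):

```python
import hashlib

NUM_SHARDS = 8  # each shard is an independent database instance

def shard_for(user_id: int) -> int:
    """Deterministically map a user_id to one of NUM_SHARDS shards."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Every read and write for a given user routes to exactly one shard,
# so each shard only sees ~1/NUM_SHARDS of the data and write traffic.
print(f"user 42 lives on shard {shard_for(42)}")
```

Because the mapping is deterministic, the application (or a routing tier) can compute the target shard without any lookup, and adding shards divides both storage and write load.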
Sharding is not a silver bullet. It introduces significant complexity and constraints that must be understood before adoption. Many teams have sharded prematurely and regretted it. Others have sharded too late and suffered outages. Understanding tradeoffs helps you make the right decision.
Fundamental Tradeoffs:
| Aspect | Single Database | Sharded Database |
|---|---|---|
| Cross-entity Queries | Join any tables freely | Cross-shard joins are expensive or impossible |
| Transactions | ACID across all data | ACID within shard; distributed transactions complex |
| Schema Changes | Single migration | Coordinate across all shards |
| Operational Complexity | One database to manage | N databases to manage |
| Application Complexity | Query any data | Determine correct shard for every query |
| Cost Model | One expensive server | Many cheaper servers (often lower total cost) |
| Failure Domain | Total outage if down | Partial outage (only affected shard) |
The Cross-Shard Query Problem
This is often the most painful tradeoff. In a single database, you can join users with orders with products with inventory in a single query. With sharding, if users and orders are on different shards (or different rows are on different shards), you need to query each shard involved, pull intermediate results over the network, and perform the join or merge in application code.
This is why shard key selection (covered later in this module) is so critical. A good shard key minimizes cross-shard queries. A bad shard key turns every query into an expensive scatter-gather operation.
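A scatter-gather query looks roughly like this sketch, using in-memory lists to stand in for real shards (all names are hypothetical; in production each `query_shard` call is a network round-trip to a separate database):

```python
from concurrent.futures import ThreadPoolExecutor

# Fake in-memory "shards": order rows partitioned by user_id.
shards = [
    [{"user_id": 3, "total": 120}, {"user_id": 6, "total": 80}],
    [{"user_id": 1, "total": 50}],
    [{"user_id": 2, "total": 200}, {"user_id": 5, "total": 30}],
]

def query_shard(rows, min_total):
    # In a real system this is a network round-trip to one database.
    return [r for r in rows if r["total"] >= min_total]

def scatter_gather(min_total):
    # A query that lacks the shard key must fan out to EVERY shard;
    # overall latency is that of the slowest shard, and the application
    # must merge (and here, sort) the partial results itself.
    with ThreadPoolExecutor() as pool:
        partials = pool.map(query_shard, shards, [min_total] * len(shards))
    return sorted((r for part in partials for r in part),
                  key=lambda r: r["total"], reverse=True)

print(scatter_gather(80))  # rows with total >= 80, merged across all shards
```

Contrast this with the single-shard case: a query keyed by the shard key touches one node, while this one touches all of them, which is exactly what a good shard key is chosen to avoid.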
A well-designed sharding strategy routes 80% of queries to a single shard. The remaining 20% may require cross-shard coordination, but these are typically background jobs, analytics, or rare user flows. If your access patterns require constant cross-shard queries, you've chosen the wrong shard key—or sharding may not be the right solution for your use case.
Sharding is a major architectural decision. Implementing it too early adds unnecessary complexity. Implementing it too late leads to painful migrations under pressure. Here's a framework for making this decision:
I've seen teams shard at 1TB 'for future scale' and spend years managing complexity they didn't need. I've also seen teams refuse to shard at 50TB and suffer constant outages. The key is honest assessment of current limits and growth trajectory. Plan for sharding when you're 12-18 months from hitting limits—enough time to implement well, but not so early that you're solving imaginary problems.
Every major internet company uses sharding. Understanding how they approach it illuminates the patterns and challenges you'll face.
Facebook/Meta: User-Based Sharding
Facebook shards primarily by user_id. Each user's data—posts, photos, messages, friendships—lives together on a shard. This makes the common case (show me my news feed) hit a single shard. Cross-shard queries are needed for interactions between users on different shards, but these are handled asynchronously.
Stripe: Customer-Based Sharding
Stripe shards by merchant (customer_id). All of a merchant's transactions, subscriptions, and payment methods live on the same shard. This ensures transactional integrity for the operations that matter most—processing payments for a single merchant.
Instagram: User and Media Sharding
Instagram uses a combination approach. User data is sharded by user_id, but media (photos/videos) is sharded separately using a different strategy optimized for large binary storage.
Uber: Geographic Sharding
Uber shards by city/region. A ride in San Francisco only needs data from the SF shard. This also provides data locality—EU data in EU shards, addressing regulatory requirements.
| Company | Primary Shard Key | Rationale | Challenges Addressed |
|---|---|---|---|
| Facebook/Meta | user_id | User operations access their own data | News feed, timeline, notifications |
| Stripe | customer_id | Payment operations are per-merchant | Transaction integrity, reporting |
| Slack | workspace_id | All channel data by workspace | Message history, search within workspace |
| Discord | guild_id | Server-centric access patterns | Messages, roles, member data |
| Shopify | shop_id | Merchant operations isolated | Product catalog, orders, customers |
Notice how many successful sharding implementations use a tenant identifier (customer_id, workspace_id, shop_id). Multi-tenant SaaS naturally aligns with sharding—each tenant's data is isolated, very few operations cross tenant boundaries, and the shard key is obvious. If you're building multi-tenant SaaS, sharding by tenant_id is often the right answer.
Here's a practical framework for deciding when and how to approach sharding in your organization:
If you're at 30% of single-node capacity and growing 10%+ monthly, start planning sharding now. You won't implement immediately, but you'll design your schema, identify shard keys, and instrument for the transition. When you hit 70% capacity, you'll execute a well-planned migration instead of a panicked one.
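The timing in that rule of thumb follows from compound growth. A small sketch (the 30%/70%/10% figures come from the text; the growth model assumes steady month-over-month compounding):

```python
import math

def months_until(current: float, target: float, monthly_growth: float) -> float:
    """Months until a system at `current` fraction of capacity, compounding
    at `monthly_growth` per month, reaches the `target` fraction."""
    return math.log(target / current) / math.log(1 + monthly_growth)

# At 30% of capacity and 10% monthly growth, 70% is under a year away:
print(f"{months_until(0.30, 0.70, 0.10):.1f} months")  # ≈ 8.9 months
```

Under nine months from "comfortable" to the 70% execution threshold is why the planning has to start at 30%: design, shard-key selection, and migration tooling easily consume that window.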
We've covered the fundamental case for sharding: single machines hit hard storage and throughput limits, sharding divides data across independent nodes to scale past them, and the price is cross-shard queries, distributed transactions, and operational complexity.
What's Next:
Now that you understand why sharding is necessary, the limits of single-node databases, and the framework for deciding when to shard, we'll explore horizontal partitioning: how data is actually divided across shards. This builds the conceptual foundation for specific sharding strategies like range-based and hash-based sharding.