If there is one decision in distributed database design that deserves obsessive attention, it is shard key selection. The shard key is not merely a technical configuration—it is the architectural backbone that determines how every query, every write, and every operation in your system executes for years to come.
A well-chosen shard key creates a system that scales linearly, handles growth gracefully, and delivers consistent performance. A poorly chosen shard key creates hotspots, forces expensive cross-shard operations, and ultimately becomes a technical debt that haunts the organization through multiple painful resharding efforts.
The stakes are high because shard keys are extraordinarily difficult to change. Unlike an index (which can be added or dropped with relative ease), changing a shard key requires physically relocating data across the cluster—an operation that at scale involves downtime, risk, and potentially months of planning and execution.
By the end of this page, you will understand how to analyze workloads to identify shard key candidates, the critical properties that make shard keys effective, common shard key patterns for different application types, and the anti-patterns that lead to operational nightmares. You'll develop the analytical framework to make shard key decisions with confidence.
Selecting a shard key is not guesswork—it is a systematic analysis that considers data characteristics, access patterns, growth trajectories, and business requirements. Here is the rigorous process used by experienced database architects:
Phase 1: Query Pattern Analysis
Before selecting a shard key, you must deeply understand how data is accessed. For each major query pattern, document:
- Which columns appear in WHERE clauses?
- Which columns appear in JOIN conditions?
- How frequently does the pattern run, and what latency does it require?

Phase 2: Data Characteristics Assessment
Examine the data itself:
-- Phase 1: Query Pattern Analysis
-- Examine your query logs or application code to identify patterns

-- Example query pattern documentation:
-- Pattern 1: User lookup (99% of traffic, <10ms latency required)
SELECT * FROM orders WHERE user_id = ?;

-- Pattern 2: Order lookup by ID (5% of traffic, <10ms required)
SELECT * FROM orders WHERE order_id = ?;

-- Pattern 3: Daily sales report (0.1% of traffic, 30s acceptable)
SELECT DATE(created_at), SUM(amount)
FROM orders
WHERE created_at > ?
GROUP BY DATE(created_at);

-- Phase 2: Cardinality Analysis
-- Analyze potential shard key columns

-- Check cardinality of user_id
SELECT
    COUNT(DISTINCT user_id) AS cardinality,
    COUNT(*) AS total_rows,
    COUNT(DISTINCT user_id) * 1.0 / COUNT(*) AS selectivity
FROM orders;
-- Result: 1,000,000 distinct users, 50,000,000 orders
-- Excellent cardinality for sharding

-- Check distribution uniformity
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
ORDER BY order_count DESC
LIMIT 20;
-- Check: Are some users disproportionately active?
-- If top user has 1% of all orders: potential hotspot risk

-- Check data growth pattern
SELECT
    DATE_TRUNC('month', created_at) AS month,
    COUNT(DISTINCT user_id) AS new_users,
    COUNT(*) AS new_orders
FROM orders
GROUP BY month
ORDER BY month;
-- Analyze: Is growth uniform across users or concentrated?

Phase 3: Candidate Evaluation
For each potential shard key, evaluate against the three critical properties:
| Candidate Key | Cardinality | Distribution | Query Affinity | Verdict |
|---|---|---|---|---|
| user_id | 1M+ (excellent) | Somewhat skewed (VIP users) | High (99% of queries include it) | Strong candidate |
| order_id | Equal to rows (excellent) | Perfectly uniform | Medium (only direct lookups) | Possible, but poor for user-centric queries |
| created_at | Continuous (good) | Monotonic (terrible) | Low (only analytics) | Reject—hotspot guaranteed |
| (user_id, order_id) | Maximum (excellent) | Good | High | Good compound key |
Phase 4: Future-Proofing
Consider not just current patterns but anticipated evolution: expected data growth, shifts in the query mix as features ship, and changes in how activity is distributed across users or tenants.
The shard key decision should be formally documented with rationale. This documentation becomes invaluable during future capacity planning, debugging performance issues, or evaluating whether resharding is necessary. Include: candidates considered, analysis results, tradeoffs accepted, and assumptions made.
Every effective shard key exhibits three properties: high cardinality, uniform distribution, and query affinity. Understanding these properties deeply is essential for making sound decisions.
Property 1: High Cardinality
Cardinality refers to the number of distinct values in the shard key. The shard key's cardinality sets an upper bound on the number of shards you can ever have.
- If country (cardinality ~200) is your shard key, you can never have more than 200 shards.
- If user_id (cardinality in the millions) is your shard key, you can scale to millions of shards.

Cardinality requirement: the shard key should have orders of magnitude more distinct values than the largest number of shards you ever expect to run, leaving headroom for rebalancing.
Property 2: Uniform Distribution
Even with high cardinality, data distribution matters. If 50% of your users are in one city, and you shard by user_id with hash partitioning, distribution is fine. But if you shard by city, that one city becomes a massive hotspot.
Distribution Analysis:
-- Analyze value frequency distribution
SELECT
    decile,
    MIN(cnt) AS min_count,
    MAX(cnt) AS max_count,
    AVG(cnt) AS avg_count
FROM (
    SELECT
        user_id,
        COUNT(*) AS cnt,
        NTILE(10) OVER (ORDER BY COUNT(*)) AS decile
    FROM orders
    GROUP BY user_id
) AS user_counts
GROUP BY decile
ORDER BY decile;
If the top decile has 100x more records than the bottom decile, you have significant skew that will translate to shard imbalance.
Property 3: Query Affinity
The shard key must align with how data is accessed. If 95% of queries filter by user_id, user_id should be the shard key or a prefix of the shard key.
Query Affinity Analysis:
| Query Pattern | Frequency | Shard Key: user_id | Shard Key: order_id |
|---|---|---|---|
| WHERE user_id = ? | 80% | ✅ Single shard | ❌ All shards (scatter) |
| WHERE order_id = ? | 15% | ❌ All shards (scatter) | ✅ Single shard |
| WHERE user_id = ? AND order_id = ? | 4% | ✅ Single shard | ✅ Single shard |
| Aggregate over all orders | 1% | ❌ All shards | ❌ All shards |
With user_id as the shard key, 84% of queries hit a single shard. With order_id as the shard key, only 19% do. Choose user_id.
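This arithmetic generalizes into a quick scoring step you can run against any documented workload. A minimal sketch, assuming pattern frequencies like those in the table above (the numbers and candidate keys are illustrative, not measured):

# Score shard key candidates by the fraction of queries they can serve
# from a single shard. Frequencies and patterns are illustrative.
QUERY_PATTERNS = [
    # (columns available in the WHERE clause, fraction of traffic)
    ({"user_id"}, 0.80),
    ({"order_id"}, 0.15),
    ({"user_id", "order_id"}, 0.04),
    (set(), 0.01),  # full-table aggregates
]

def single_shard_rate(shard_key: set, patterns) -> float:
    # A query is single-shard only if it supplies every column of the shard key
    return sum(freq for cols, freq in patterns if shard_key <= cols)

for candidate in ({"user_id"}, {"order_id"}, {"user_id", "order_id"}):
    rate = single_shard_rate(candidate, QUERY_PATTERNS)
    print(f"{sorted(candidate)}: {rate:.0%} of queries hit a single shard")
# ['user_id']: 84%   ['order_id']: 19%   ['order_id', 'user_id']: 4%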
Aim for a shard key where at least 80% of queries can be satisfied by a single shard. If your access patterns are diverse and no single shard key achieves this, consider compound shard keys, secondary sharding strategies, or evaluating whether this data should be sharded at all.
When no single column satisfies all shard key requirements, compound shard keys combine multiple columns to achieve better overall properties. This is particularly common in multi-tenant systems, hierarchical data, and time-series applications.
Compound Key Structure:
A compound shard key is ordered, and the first column is the primary prefix. The partition function typically hashes or ranges on the combined value:
// Compound key: (tenant_id, user_id)
shard_id = hash(concat(tenant_id, user_id)) % num_shards;
// Or hierarchical: tenant_id selects a shard group, user_id selects a shard within it
shard_id = (tenant_id % num_groups) * shards_per_group + hash(user_id) % shards_per_group;
Common Compound Key Patterns:
| Pattern | Compound Key | Use Case | Benefit |
|---|---|---|---|
| Multi-tenant | (tenant_id, entity_id) | SaaS applications | Tenant isolation + per-tenant scalability |
| Hierarchical | (parent_id, child_id) | Organizational data | Parent-child colocation for efficient joins |
| Time-bucketed | (time_bucket, entity_id) | Time-series with entity queries | Balances time-based queries with entity lookups |
| Geographic entity | (region, entity_id) | Geo-distributed applications | Regional data locality + entity queries |
| Prefixed random | (category, uuid) | Categorical data with hotspots | Spreads hot categories across shards |
# Multi-tenant SaaS with compound shard key
# Shard key: (tenant_id, user_id)

import hashlib

class MultiTenantShardRouter:
    def __init__(self, num_shards: int):
        self.num_shards = num_shards

    def get_shard_id(self, tenant_id: str, user_id: str) -> int:
        """
        Compound key sharding for a multi-tenant application.

        Properties:
        - All data for a tenant is NOT necessarily on one shard
          (unless the tenant has few users)
        - User data within a tenant IS on one shard (colocation)
        - Distributes large tenants across multiple shards
        """
        compound_key = f"{tenant_id}:{user_id}"
        return self.hash_to_shard(compound_key)

    def hash_to_shard(self, key: str) -> int:
        # Stable hash so routing is identical across processes and restarts
        hash_value = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return hash_value % self.num_shards

    def route_query(self, tenant_id: str, user_id: str = None):
        """
        Query routing based on available key components.
        """
        if user_id is not None:
            # Full compound key available - single-shard query
            shard = self.get_shard_id(tenant_id, user_id)
            return [shard]
        else:
            # Only tenant_id - must scatter to all shards
            # (unless using a tenant-first sharding strategy)
            return list(range(self.num_shards))

# Query patterns and their routing:
router = MultiTenantShardRouter(num_shards=64)

# Pattern 1: User-specific query -> single shard
# "Get all orders for user X in tenant Y"
shards = router.route_query(tenant_id="acme", user_id="user123")
# Result: [42] - single shard

# Pattern 2: Tenant-wide query -> all shards (scatter-gather)
# "Get all users in tenant Y"
shards = router.route_query(tenant_id="acme", user_id=None)
# Result: [0, 1, 2, ..., 63] - all shards

Compound Key Prefix Queries:
With compound keys, queries that include the prefix can be routed efficiently. Queries that skip the prefix require scatter-gather:
| Query Filter | Compound Key (A, B) | Routing |
|---|---|---|
| WHERE A = ? AND B = ? | Full key match | Single shard |
| WHERE A = ? | Prefix match | Single shard (or subset) |
| WHERE B = ? | Skips prefix | All shards |
| No filter | Full scan | All shards |
Design implication: Order compound key columns by query frequency. The most-frequently-filtered column should be first.
For multi-tenant systems, a key decision is whether small tenants should be isolated per-shard (simpler isolation, potential waste) or distributed across shards (better utilization, cross-shard tenant queries). Large tenants typically MUST be distributed to avoid single-shard bottlenecks. This often requires hybrid strategies.
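One way to realize the "single shard (or subset)" prefix routing above is a tenant-first layout: the tenant prefix selects a small shard group, and the second key column selects a shard within that group. The sketch below is one possible arrangement, not a fixed recipe; the group size and the MD5-based hash are assumptions you would tune per deployment.

import hashlib

def stable_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class TenantFirstRouter:
    def __init__(self, num_shards: int, shards_per_tenant: int = 4):
        # Each tenant's data is confined to a group of shards_per_tenant shards
        assert num_shards % shards_per_tenant == 0
        self.group_size = shards_per_tenant
        self.num_groups = num_shards // shards_per_tenant

    def shard_for(self, tenant_id: str, user_id: str) -> int:
        # Full compound key: single shard
        group = stable_hash(tenant_id) % self.num_groups
        offset = stable_hash(user_id) % self.group_size
        return group * self.group_size + offset

    def shards_for_tenant(self, tenant_id: str) -> list:
        # Prefix-only query: fan out to the tenant's group, not the whole cluster
        group = stable_hash(tenant_id) % self.num_groups
        return [group * self.group_size + i for i in range(self.group_size)]

router = TenantFirstRouter(num_shards=64, shards_per_tenant=4)
print(router.shard_for("acme", "user123"))   # one shard for full-key queries
print(router.shards_for_tenant("acme"))      # 4 shards instead of 64 for tenant-wide queries

Compared with hashing the concatenated key, as in the router earlier, this trades some global balance for bounded fan-out on tenant-wide queries.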
Learning from common mistakes is as valuable as understanding best practices. These anti-patterns have caused production outages, forced emergency resharding projects, and cost companies millions in engineering time and infrastructure.
Anti-Pattern 1: Monotonically Increasing Keys
Using auto-increment IDs or timestamps as shard keys with range-based sharding creates a fatal hotspot: all new data goes to the last shard.
Time 1: Shard 1 (IDs 1-1M) ← All historical data
Shard 2 (IDs 1M-2M) ← Recent data
Shard 3 (IDs 2M-3M) ← All new writes! HOTSPOT!
This is the single most common sharding mistake. The 'newest' shard becomes overwhelmed while older shards sit idle.
How to avoid it:
- Don't range-shard on a monotonically increasing value such as an auto-increment ID or created_at
- Use hash partitioning on a high-cardinality key instead, e.g. shard = hash(order_id) % num_shards, so new writes spread across every shard (see the sketch below)
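The difference is easy to demonstrate with a toy router. In the sketch below, the shard count, range width, and ID values are made-up example numbers:

import hashlib

NUM_SHARDS = 3
RANGE_WIDTH = 1_000_000  # each range shard owns 1M consecutive IDs

def range_shard(order_id: int) -> int:
    # Range sharding: consecutive new IDs all land on the newest shard
    return min(order_id // RANGE_WIDTH, NUM_SHARDS - 1)

def hash_shard(order_id: int) -> int:
    # Hash sharding: consecutive IDs scatter across all shards
    digest = hashlib.md5(str(order_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

new_ids = range(2_500_000, 2_500_010)  # a burst of brand-new orders
print([range_shard(i) for i in new_ids])  # [2, 2, 2, ...]  -> hotspot on one shard
print([hash_shard(i) for i in new_ids])   # mixed shard ids -> writes spread out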
Anti-Pattern 2: Low Cardinality Keys

Choosing a shard key with limited distinct values caps your scalability:
- status (5 values): maximum of 5 shards, forever
- country (~200 values): limited to ~200 shards, with severe skew
- day_of_week (7 values): cannot scale beyond 7 shards

Anti-Pattern 3: Mutable Shard Keys
If the shard key value can change, records must be physically moved between shards when updates occur:
-- If sharded by 'status':
UPDATE orders SET status = 'shipped' WHERE id = 123;
-- This requires: DELETE from shard 1, INSERT into shard 2!
This creates massive overhead, requires distributed transactions for atomicity, and is a recipe for data inconsistency.
Anti-Pattern 4: Query Pattern Mismatch
Choosing a shard key that doesn't align with access patterns forces expensive scatter-gather:
-- Sharded by: user_id
-- Common query: SELECT * FROM products WHERE category = 'electronics';
-- Result: This query must hit ALL shards because category
-- has no relationship to user_id shard key!
Never use as shard keys: auto-increment IDs with range sharding, timestamps with range sharding, low-cardinality columns (status, country, category), mutable columns, columns not present in most query filters. These choices create hotspots, limit scalability, or force expensive cross-shard operations.
Effective shard key selection requires rigorous workload analysis. This section provides techniques for gathering and analyzing the data needed to make informed decisions.
Technique 1: Query Log Analysis
Most databases provide query logging. Analyze logs to understand access patterns:
-- PostgreSQL: Enable and analyze query stats

-- Step 1: Enable pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Step 2: Analyze query patterns
SELECT
    query,
    calls,
    total_exec_time / calls AS avg_time_ms,
    rows / calls AS avg_rows
FROM pg_stat_statements
WHERE query LIKE '%orders%'
ORDER BY calls DESC
LIMIT 20;

-- Step 3: Identify which columns appear in WHERE clauses
-- Manual analysis or use query parsing tools

-- Look for patterns like:
-- 42% of queries: WHERE user_id = ?
-- 31% of queries: WHERE user_id = ? AND order_id = ?
-- 15% of queries: WHERE order_id = ?
--  8% of queries: WHERE created_at BETWEEN ? AND ?
--  4% of queries: Full table operations

-- Conclusion: user_id is present in 73% of queries
-- Strong shard key candidate

Technique 2: Data Distribution Profiling
Analyze how data is distributed across potential shard key values:
-- Distribution Analysis for Shard Key Candidates

-- 1. Cardinality check
SELECT
    'user_id' AS candidate,
    COUNT(DISTINCT user_id) AS cardinality,
    COUNT(*) AS total_rows
FROM orders

UNION ALL

SELECT
    'merchant_id',
    COUNT(DISTINCT merchant_id),
    COUNT(*)
FROM orders;

-- 2. Hotspot detection: find the most concentrated values
WITH value_counts AS (
    SELECT user_id, COUNT(*) AS cnt
    FROM orders
    GROUP BY user_id
)
SELECT
    'user_id' AS candidate,
    MAX(cnt) AS max_per_value,
    AVG(cnt) AS avg_per_value,
    MAX(cnt) / AVG(cnt) AS hotspot_ratio,
    STDDEV(cnt) / AVG(cnt) AS coefficient_of_variation
FROM value_counts;

-- Interpretation:
-- hotspot_ratio < 10: Good distribution
-- hotspot_ratio 10-50: Moderate skew, may be acceptable
-- hotspot_ratio > 50: Severe skew, consider alternatives

-- 3. Percentile analysis for skew
WITH value_counts AS (
    SELECT user_id, COUNT(*) AS cnt
    FROM orders
    GROUP BY user_id
)
SELECT
    percentile_cont(0.50) WITHIN GROUP (ORDER BY cnt) AS p50,
    percentile_cont(0.90) WITHIN GROUP (ORDER BY cnt) AS p90,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY cnt) AS p99,
    MAX(cnt) AS max_count
FROM value_counts;

-- If p99 is 10x p50, top 1% of users have 10x average data

Technique 3: Growth Pattern Analysis
Understand how data grows over time to predict future distribution:
-- Monthly growth by shard key values
SELECT
DATE_TRUNC('month', created_at) AS month,
COUNT(DISTINCT user_id) AS new_users_active,
COUNT(*) AS new_orders
FROM orders
WHERE created_at > CURRENT_DATE - INTERVAL '1 year'
GROUP BY month
ORDER BY month;
-- Are new users creating more data than existing users?
-- Is growth uniform or concentrated in specific segments?
Technique 4: Simulated Sharding Analysis
Before committing to a shard key, simulate the distribution:
-- Simulate shard distribution with proposed key
-- Using hash(user_id) % 16 for 16 shards

WITH shard_assignment AS (
    SELECT
        id,
        user_id,
        -- Simulate hash function (PostgreSQL)
        ABS(hashtext(user_id::text)) % 16 AS shard_id
    FROM orders
)
SELECT
    shard_id,
    COUNT(*) AS record_count,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS percentage,
    COUNT(*) * 1.0 / (SUM(COUNT(*)) OVER () / 16) AS relative_to_ideal
FROM shard_assignment
GROUP BY shard_id
ORDER BY shard_id;

-- Ideal: Each shard has ~6.25% (100%/16) of data
-- Acceptable: All shards between 4% and 10%
-- Problem: Any shard > 15% or < 2%

-- Also check shard sizes in bytes if row sizes vary significantly

Create reusable scripts that analyze any table for sharding suitability. Input: table name and candidate columns. Output: cardinality, distribution metrics, and simulated shard balance. This enables rapid evaluation of multiple candidates and supports ongoing monitoring.
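A minimal sketch of such a script, assuming a PostgreSQL database reached through psycopg2; the table and column names are supplied by the caller, and the connection string is a placeholder:

import psycopg2
from psycopg2 import sql

def analyze_candidate(conn, table: str, column: str, num_shards: int = 16) -> dict:
    """Return cardinality, skew, and simulated shard balance for one candidate key."""
    with conn.cursor() as cur:
        # Cardinality and hotspot ratio
        cur.execute(sql.SQL("""
            WITH value_counts AS (
                SELECT {col} AS v, COUNT(*) AS cnt FROM {tbl} GROUP BY {col}
            )
            SELECT COUNT(*), SUM(cnt), MAX(cnt) / AVG(cnt)
            FROM value_counts
        """).format(col=sql.Identifier(column), tbl=sql.Identifier(table)))
        cardinality, total_rows, hotspot_ratio = cur.fetchone()

        # Simulated shard balance: largest shard relative to the ideal share
        cur.execute(sql.SQL("""
            WITH shards AS (
                SELECT ABS(hashtext({col}::text)) %% %s AS shard_id, COUNT(*) AS cnt
                FROM {tbl} GROUP BY 1
            )
            SELECT MAX(cnt) * %s::float / SUM(cnt) FROM shards
        """).format(col=sql.Identifier(column), tbl=sql.Identifier(table)),
            (num_shards, num_shards))
        max_shard_vs_ideal = cur.fetchone()[0]

    return {
        "candidate": column,
        "cardinality": cardinality,
        "total_rows": total_rows,
        "hotspot_ratio": float(hotspot_ratio),
        "max_shard_vs_ideal": float(max_shard_vs_ideal),  # 1.0 = perfectly balanced
    }

# Example usage (connection string, table, and columns are placeholders):
# conn = psycopg2.connect("dbname=shop")
# for col in ("user_id", "merchant_id"):
#     print(analyze_candidate(conn, "orders", col))

Run it whenever you evaluate a new candidate, and periodically afterward, so distribution drift shows up before it becomes shard imbalance.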
Different application domains have characteristic shard key patterns that have proven effective across the industry. While each system is unique, these patterns provide strong starting points for shard key design.
Pattern 1: Consumer Applications (B2C)
Applications serving individual users (social networks, e-commerce, streaming):
| Application Type | Primary Shard Key | Colocated Data | Cross-Shard Operations |
|---|---|---|---|
| Social Network | user_id | Posts, comments, likes, friendships | Feed aggregation, global search |
| E-commerce | customer_id | Orders, cart, preferences | Product catalog, inventory |
| Streaming Service | user_id | Watch history, preferences, queue | Content catalog, recommendations aggregation |
| Messaging App | user_id | Messages, contacts, settings | Group chats (may need special handling) |
Pattern 2: Enterprise/SaaS Applications (B2B)
Multi-tenant applications serving business customers:
| Application Type | Primary Shard Key | Rationale | Considerations |
|---|---|---|---|
| CRM (small tenants) | tenant_id | Tenant isolation, simple routing | Large tenants may need distribution |
| CRM (mixed sizes) | (tenant_id, entity_id) | Distributes large tenants | Tenant-wide queries require scatter |
| Project Management | workspace_id | Workspace is unit of collaboration | Cross-workspace features need care |
| Analytics Platform | (tenant_id, time_bucket) | Per-tenant time-series queries | Balances time range and entity queries |
Pattern 3: Geographic and IoT Applications
Applications with inherent geographic or device locality:
| Application Type | Primary Shard Key | Rationale | Trade-offs |
|---|---|---|---|
| Ride-sharing | (region, trip_id) | Trips are region-local | Cross-region trips rare, handled specially |
| IoT Fleet | device_id | Device data always queried together | Fleet-wide aggregation expensive |
| Mapping/GIS | geo_hash_prefix | Spatial locality | Varies by zoom level and use case |
| Delivery/Logistics | (zone, order_id) | Delivery zones are operational units | Zone-crossing orders need handling |
Pattern 4: Financial Applications
Applications with strict consistency and audit requirements:
| Application Type | Primary Shard Key | Rationale | Special Considerations |
|---|---|---|---|
| Banking | account_id | Account is unit of balance integrity | Transfers between accounts are cross-shard |
| Trading Platform | (market, symbol, time_bucket) | Market data is symbol-focused | Cross-symbol analytics require aggregation |
| Payment Processing | (merchant_id, transaction_id) | Merchant settlement isolation | Consumer queries may cross merchants |
These patterns reflect common practice but are not prescriptive. Your specific access patterns, data distribution, and business requirements may warrant different choices. Always validate patterns against your actual workload analysis.
Real-world systems rarely have perfectly uniform access patterns. Certain users, entities, or data categories require special handling to prevent them from creating bottlenecks.
Case 1: VIP / Power Users
Some users generate orders of magnitude more data than typical users. A social media influencer with 10M followers will have vastly more activity than an average user.
Strategies:
Sub-sharding: Large users are internally distributed across multiple shards
For user_id with > 10M records:
shard = hash(user_id, record_id) % dedicated_user_shards
Dedicated shards: Extreme cases get isolated infrastructure
If user_id in VIP_LIST: route to vip_shard_pool
Rate limiting at shard level: Accept that some shards are hotter, but bound the impact
Case 2: Global / Shared Data
Some data doesn't belong to any shard key value—it's global: reference tables such as countries, shared product catalogs, and configuration used by every tenant.
Strategies:
Read replicas: Replicate global data to all nodes as read-only
global_data_cache = replicated to all shards
Dedicated global database: Separate database for shared data
SELECT * FROM global_db.countries;
JOIN with shard-local data at the application layer (see the sketch after this list)
Full replication to each shard: Each shard has a copy (for small datasets)
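The application-layer join is straightforward when the global dataset is small enough to replicate or cache in every application instance. A minimal sketch; the CountryCache name, the row shapes, and the refresh policy are assumptions for illustration:

class CountryCache:
    """Replicated, read-only copy of the global countries reference table."""
    def __init__(self, rows):
        # rows: iterable of (country_code, country_name), e.g. from global_db.countries
        self.by_code = dict(rows)

    def name(self, code: str) -> str:
        return self.by_code.get(code, "unknown")

def enrich_orders(orders, countries: CountryCache):
    """Attach country names to orders already fetched from a single shard."""
    return [
        {**order, "country_name": countries.name(order["country_code"])}
        for order in orders
    ]

# Usage: refresh the cache periodically, then join in memory
countries = CountryCache([("US", "United States"), ("DE", "Germany")])
orders = [{"order_id": 1, "country_code": "DE", "amount": 42}]
print(enrich_orders(orders, countries))
# [{'order_id': 1, 'country_code': 'DE', 'amount': 42, 'country_name': 'Germany'}]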
Case 3: Temporal Hotspots
Some applications have time-based access patterns that create temporary hotspots: most reads and writes concentrate on the most recent time window.
Strategies:
Time-bucketed compound keys: Spread temporal access across shards (see the sketch after this list)
shard_key = (minute_bucket, entity_id)
-- Distributes real-time ingestion across all shards
Hot/Cold tiering: Recent data on fast storage, historical on cheaper storage
Caching layers: Absorb read spikes with caching
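A minimal sketch of time-bucketed routing on the ingestion path; the bucket width, shard count, and MD5 hash are assumed values rather than a prescription:

import hashlib
import time

NUM_SHARDS = 32
BUCKET_SECONDS = 60  # one-minute buckets

def shard_for_event(entity_id: str, event_ts: float) -> int:
    # Compound key (minute_bucket, entity_id): same-minute events still spread
    # across shards because entity_id varies even though the bucket does not
    minute_bucket = int(event_ts // BUCKET_SECONDS)
    digest = hashlib.md5(f"{minute_bucket}:{entity_id}".encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def shards_for_entity_range(entity_id: str, start_ts: float, end_ts: float) -> set:
    # Entity lookups over a time window enumerate buckets instead of hitting all shards
    first, last = int(start_ts // BUCKET_SECONDS), int(end_ts // BUCKET_SECONDS)
    return {
        shard_for_event(entity_id, bucket * BUCKET_SECONDS)
        for bucket in range(first, last + 1)
    }

now = time.time()
print({shard_for_event(f"sensor-{i}", now) for i in range(10)})  # several shards, same minute
print(shards_for_entity_range("sensor-1", now - 600, now))       # bounded fan-out for one entity

The code block that follows returns to Case 1 and applies the same routing-layer idea to VIP users via sub-sharding.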
# VIP User Handling with Sub-Sharding

import hashlib

class SmartShardRouter:
    def __init__(self, num_shards: int, vip_threshold: int):
        self.num_shards = num_shards
        self.vip_threshold = vip_threshold
        self.vip_users = set()      # Populated from metrics
        self.vip_sub_shards = 4     # VIPs spread across 4 sub-shards

    def update_vip_list(self, user_metrics: dict):
        """
        Periodically update VIP list based on activity metrics.
        VIP = users with > vip_threshold records.
        """
        self.vip_users = {
            user_id
            for user_id, count in user_metrics.items()
            if count > self.vip_threshold
        }

    def hash_to_shard(self, key: str) -> int:
        # Stable hash so routing is identical across processes and restarts
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % self.num_shards

    def get_shard(self, user_id: str, record_id: str = None) -> int:
        """
        Route with VIP sub-sharding.

        Regular users: hash(user_id) % num_shards
        VIP users:     base shard expanded into vip_sub_shards slots, chosen by record_id
        """
        base_shard = self.hash_to_shard(user_id)

        if user_id in self.vip_users and record_id:
            # VIP: distribute records across a sub-shard range
            sub_shard = hash(record_id) % self.vip_sub_shards
            return (base_shard * self.vip_sub_shards + sub_shard) % self.num_shards
        else:
            return base_shard

    def get_shards_for_user(self, user_id: str) -> list:
        """
        For queries that need ALL data for a user.
        """
        if user_id in self.vip_users:
            base = self.hash_to_shard(user_id)
            return [
                (base * self.vip_sub_shards + i) % self.num_shards
                for i in range(self.vip_sub_shards)
            ]
        else:
            return [self.hash_to_shard(user_id)]

Special case handling adds significant complexity. Only implement these patterns when data or access analysis proves they're necessary. Premature optimization for hypothetical VIP users can create maintenance burden without corresponding benefit.
Shard key selection is the most consequential decision in a sharding implementation. The frameworks covered here (workload analysis, the three critical properties of cardinality, distribution, and query affinity, compound keys, anti-patterns, and special-case handling) provide a systematic basis for making it.
What's Next:
With shard key selection principles established, we'll explore how data is actually distributed across shards. The next page covers range sharding—a partitioning strategy that assigns data to shards based on contiguous ranges of the shard key, enabling range queries but requiring careful management to avoid hotspots.
You now possess a comprehensive framework for shard key selection—the systematic process, critical properties, common patterns, dangerous anti-patterns, and analytical techniques. Apply this framework rigorously before committing to any sharding implementation.