Loading learning content...
As databases grow, even well-indexed tables can become management nightmares. A 10-billion-row table with perfect indexes still requires massive index structures, lengthy backup times, slow DDL operations, and challenging maintenance windows. Partitioning addresses these challenges by dividing a logical table into smaller, more manageable physical pieces.
Partitioning is the divide-and-conquer strategy for data management. It transforms a monolithic table into a collection of smaller partitions that can be:
The key insight: Most queries don't need all data. They filter by date range, customer region, product category, or similar criteria. If data is partitioned along these natural boundaries, queries only touch relevant partitions.
This page covers horizontal and vertical partitioning, major partitioning schemes (range, list, hash, composite), partition pruning mechanics, and practical guidance for implementing partitioning in production systems. You'll understand when partitioning helps, when it hurts, and how to design effective partitioning strategies.
Partitioning divides tables in two fundamental ways: horizontal partitioning (dividing rows) and vertical partitioning (dividing columns).
Horizontal Partitioning (Sharding):
Each partition contains a subset of rows with identical schemas:
Original Table:
| id | name | date | amount |
| 1 | Alice | 2024-01 | 100 |
| 2 | Bob | 2024-02 | 200 |
| 3 | Carol | 2024-03 | 150 |
| 4 | Dave | 2024-01 | 300 |
Horizontally Partitioned by Month:
Partition 2024-01: Partition 2024-02: Partition 2024-03:
| id | name | date |amt| | id |name |date |amt| | id |name |date |amt|
| 1 | Alice | 2024-01 |100| | 2 | Bob | 2024-02|200| | 3 |Carol | 2024-03|150|
| 4 | Dave | 2024-01 |300| | | | | | | | | | |
Each partition is a complete table with all columns but only specific rows.
Vertical Partitioning:
Each partition contains all rows but a subset of columns:
Original Table:
| id | name | email | bio (large text) | preferences (JSON) |
| 1 | Alice | alice@mail.com | Long biography... | {theme: 'dark'} |
| 2 | Bob | bob@mail.com | Another long bio... | {theme: 'light'} |
Vertically Partitioned:
Frequent Access Partition: Rare Access Partition:
| id | name | email | | id | bio (large text) | preferences |
| 1 | Alice | alice@mail.com | | 1 | Long biography... | {theme:'dark'} |
| 2 | Bob | bob@mail.com | | 2 | Another long bio..| {theme:'light'}|
Vertical partitioning separates:
This improves I/O efficiency—queries touching only frequent columns read smaller blocks.
| Aspect | Horizontal Partitioning | Vertical Partitioning |
|---|---|---|
| Division basis | Rows (by partition key value) | Columns (by access pattern) |
| Partition schema | All partitions have identical schema | Each partition has different columns |
| Primary benefit | Query pruning, parallel processing | Reduced I/O, column-level caching |
| Implementation | Native DBMS partitioning | Multiple tables with joins or specialized engines |
| Common use | Time-series, multi-tenant, geographic | Wide tables with mixed access patterns |
In distributed database contexts, horizontal partitioning is often called 'sharding' when partitions are distributed across different physical nodes. The concepts are similar, but sharding implies distribution across machines while partitioning may occur within a single database instance.
Range partitioning assigns rows to partitions based on whether the partition key falls within a specified range of values. This is the most common partitioning strategy, particularly for time-series data.
Structure:
Example: Partitioning orders by order_date:
Partition p_2023_q1: order_date >= '2023-01-01' AND order_date < '2023-04-01'
Partition p_2023_q2: order_date >= '2023-04-01' AND order_date < '2023-07-01'
Partition p_2023_q3: order_date >= '2023-07-01' AND order_date < '2023-10-01'
Partition p_2023_q4: order_date >= '2023-10-01' AND order_date < '2024-01-01'
1234567891011121314151617181920212223242526
-- Create a range-partitioned tableCREATE TABLE orders ( order_id SERIAL, customer_id INTEGER NOT NULL, order_date DATE NOT NULL, total_amount DECIMAL(10,2), status VARCHAR(20)) PARTITION BY RANGE (order_date); -- Create partitions for each quarterCREATE TABLE orders_2024_q1 PARTITION OF orders FOR VALUES FROM ('2024-01-01') TO ('2024-04-01'); CREATE TABLE orders_2024_q2 PARTITION OF orders FOR VALUES FROM ('2024-04-01') TO ('2024-07-01'); CREATE TABLE orders_2024_q3 PARTITION OF orders FOR VALUES FROM ('2024-07-01') TO ('2024-10-01'); CREATE TABLE orders_2024_q4 PARTITION OF orders FOR VALUES FROM ('2024-10-01') TO ('2025-01-01'); -- Queries automatically benefit from partition pruningEXPLAIN ANALYZESELECT * FROM orders WHERE order_date = '2024-05-15';-- Only scans orders_2024_q2 partitionRange partitioning can create 'hot' partitions if data distribution is skewed. If 80% of queries target the current month's partition, that partition becomes a bottleneck while others sit idle. Consider finer granularity (daily vs monthly) for high-volume periods, or combine with hash sub-partitioning to spread load.
List partitioning assigns rows to partitions based on discrete values from a predefined list. This is ideal for categorical data with a known, finite set of values.
Structure:
Example: Partitioning by geographic region:
Partition p_americas: region IN ('US', 'CA', 'MX', 'BR', 'AR')
Partition p_europe: region IN ('UK', 'DE', 'FR', 'ES', 'IT')
Partition p_asia: region IN ('JP', 'CN', 'IN', 'KR', 'SG')
Partition p_other: region IN ('AU', 'NZ', 'ZA') -- or DEFAULT
12345678910111213141516171819202122232425262728293031323334
-- List partitioning by regionCREATE TABLE customers ( customer_id SERIAL, name VARCHAR(100), email VARCHAR(255), region VARCHAR(10) NOT NULL, created_at TIMESTAMP DEFAULT NOW()) PARTITION BY LIST (region); -- Create partitions for each region groupCREATE TABLE customers_americas PARTITION OF customers FOR VALUES IN ('US', 'CA', 'MX', 'BR', 'AR'); CREATE TABLE customers_europe PARTITION OF customers FOR VALUES IN ('UK', 'DE', 'FR', 'ES', 'IT', 'NL', 'BE'); CREATE TABLE customers_asia PARTITION OF customers FOR VALUES IN ('JP', 'CN', 'IN', 'KR', 'SG', 'TW', 'HK'); -- Default partition for any other valuesCREATE TABLE customers_other PARTITION OF customers DEFAULT; -- Multi-tenant partitioning (isolation by tenant)CREATE TABLE tenant_data ( id SERIAL, tenant_id VARCHAR(50) NOT NULL, data JSONB) PARTITION BY LIST (tenant_id); CREATE TABLE tenant_acme PARTITION OF tenant_data FOR VALUES IN ('acme');CREATE TABLE tenant_globex PARTITION OF tenant_data FOR VALUES IN ('globex');-- Each tenant's data is physically isolatedHash partitioning distributes rows across partitions using a hash function on the partition key. This ensures even data distribution regardless of key value patterns.
Structure:
partition_number = hash(key) MOD num_partitionsWhy hash partitioning?
Range and list partitioning can create uneven partitions:
Hash partitioning solves this by spreading data evenly across all partitions, ensuring balanced storage and parallel query execution.
12345678910111213141516171819202122232425262728293031323334
-- Hash partitioning for even distributionCREATE TABLE events ( event_id BIGSERIAL, user_id INTEGER NOT NULL, event_type VARCHAR(50), event_data JSONB, created_at TIMESTAMP DEFAULT NOW()) PARTITION BY HASH (user_id); -- Create hash partitions (typically power of 2)CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 8, REMAINDER 0);CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 8, REMAINDER 1);CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 8, REMAINDER 2);CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 8, REMAINDER 3);CREATE TABLE events_p4 PARTITION OF events FOR VALUES WITH (MODULUS 8, REMAINDER 4);CREATE TABLE events_p5 PARTITION OF events FOR VALUES WITH (MODULUS 8, REMAINDER 5);CREATE TABLE events_p6 PARTITION OF events FOR VALUES WITH (MODULUS 8, REMAINDER 6);CREATE TABLE events_p7 PARTITION OF events FOR VALUES WITH (MODULUS 8, REMAINDER 7); -- Queries with equality on partition key benefit from pruningEXPLAIN SELECT * FROM events WHERE user_id = 12345;-- Only one hash partition is scanned -- Range queries on partition key CANNOT pruneEXPLAIN SELECT * FROM events WHERE user_id BETWEEN 1000 AND 2000;-- All partitions must be scanned (hash provides no ordering)| Advantage | Disadvantage |
|---|---|
| Even data distribution | No partition pruning for range queries |
| Balanced parallel execution | Changing partition count is expensive |
| No hot partitions | No semantic meaning to partitions |
| Simple partition management | Cannot drop partitions by date/category |
| Good for point lookups | Full scan for non-key predicates |
For hash partitioning, choose a partition count that: (1) Is a power of 2 for optimal modulus operation, (2) Matches or exceeds your parallel execution capacity, (3) Results in appropriately sized partitions (not too small, not too large), (4) Allows for future growth. A common starting point is 8-32 partitions for moderate-sized tables.
Composite partitioning (or sub-partitioning) combines two partitioning strategies: a primary partition scheme divided further by a secondary scheme. This enables fine-grained data organization for complex access patterns.
Common combinations:
Example use case:
A global e-commerce platform needs to:
Solution: Range partition by order_date, sub-partition by region, with hash sub-sub-partitioning for load distribution.
1234567891011121314151617181920212223242526272829303132333435363738
-- Composite partitioning in PostgreSQL (multi-level)-- First, create the parent table with range partitioningCREATE TABLE sales ( sale_id BIGSERIAL, region VARCHAR(10) NOT NULL, sale_date DATE NOT NULL, amount DECIMAL(10,2)) PARTITION BY RANGE (sale_date); -- Create range partitions that are themselves partitioned by listCREATE TABLE sales_2024_q1 PARTITION OF sales FOR VALUES FROM ('2024-01-01') TO ('2024-04-01') PARTITION BY LIST (region); CREATE TABLE sales_2024_q2 PARTITION OF sales FOR VALUES FROM ('2024-04-01') TO ('2024-07-01') PARTITION BY LIST (region); -- Create sub-partitions for each quarterCREATE TABLE sales_2024_q1_us PARTITION OF sales_2024_q1 FOR VALUES IN ('US');CREATE TABLE sales_2024_q1_eu PARTITION OF sales_2024_q1 FOR VALUES IN ('UK', 'DE', 'FR');CREATE TABLE sales_2024_q1_asia PARTITION OF sales_2024_q1 FOR VALUES IN ('JP', 'CN', 'IN'); CREATE TABLE sales_2024_q2_us PARTITION OF sales_2024_q2 FOR VALUES IN ('US');CREATE TABLE sales_2024_q2_eu PARTITION OF sales_2024_q2 FOR VALUES IN ('UK', 'DE', 'FR');CREATE TABLE sales_2024_q2_asia PARTITION OF sales_2024_q2 FOR VALUES IN ('JP', 'CN', 'IN'); -- Query with both predicates prunes to single sub-partitionEXPLAIN ANALYZESELECT * FROM sales WHERE sale_date = '2024-02-15' AND region = 'US';-- Scans only: sales_2024_q1_usComposite partitioning increases management complexity exponentially. With 12 monthly partitions and 5 regional sub-partitions, you have 60 physical tables to manage. Each requires index creation, statistics gathering, backup coordination, and retention policy application. Use composite partitioning only when the query patterns genuinely benefit from both dimensions.
Partition pruning is the mechanism by which the query optimizer eliminates partitions from consideration when they cannot contain relevant data. This is the primary performance benefit of partitioning.
How pruning works:
WHERE order_date = '2024-05-15'Pruning requirements:
| Query Predicate | Pruning Behavior | Why |
|---|---|---|
WHERE date = '2024-05-15' | Single partition | Exact match on partition key |
WHERE date BETWEEN '2024-01-01' AND '2024-03-31' | One quarter | Range covers specific partitions |
WHERE date > NOW() - INTERVAL '30 days' | Recent partitions only | Bounds computable at plan time |
WHERE customer_id = 123 | All partitions | Non-partition-key column |
WHERE date_func(order_date) = 2024 | All partitions | Function on key prevents pruning |
WHERE order_date = (SELECT MAX(date) FROM ...) | All partitions | Subquery not evaluable at plan time |
123456789101112131415161718192021222324252627282930
-- Analyzing partition pruning with EXPLAIN-- Good: Direct predicate on partition keyEXPLAIN (ANALYZE, BUFFERS)SELECT * FROM orders WHERE order_date = '2024-05-15';-- Shows: Append with only 1 child partition scanned -- Good: Range scan touching few partitionsEXPLAIN (ANALYZE, BUFFERS)SELECT * FROM orders WHERE order_date >= '2024-04-01' AND order_date < '2024-05-01';-- Shows: Only Q2 partitions scanned -- Bad: Function on partition keyEXPLAIN (ANALYZE, BUFFERS)SELECT * FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2024;-- Shows: ALL partitions scanned (cannot prune)-- Fix: Use range instead: WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01' -- Track partition statisticsSELECT parent.relname AS parent_table, child.relname AS partition, pg_size_pretty(pg_relation_size(child.oid)) AS size, child.n_live_tup AS rowsFROM pg_inheritsJOIN pg_class parent ON pg_inherits.inhparent = parent.oidJOIN pg_class child ON pg_inherits.inhrelid = child.oidJOIN pg_stat_user_tables child_stat ON child.oid = child_stat.relidWHERE parent.relname = 'orders'ORDER BY child.relname;Always verify pruning with EXPLAIN. Common pruning failures: (1) Functions applied to partition key (DATE_TRUNC(order_date)), (2) Implicit type casts (WHERE integer_date = '2024-05-15'), (3) OR conditions spanning partitions, (4) JOIN conditions where optimizer can't infer bounds, (5) Dynamic conditions not evaluable at plan time.
Effective partitioning requires ongoing management—creating new partitions, archiving old ones, maintaining indexes, and ensuring statistics remain accurate.
Common partition lifecycle operations:
12345678910111213141516171819202122232425262728293031323334353637383940414243
-- Add new partition for upcoming quarterCREATE TABLE orders_2025_q1 PARTITION OF orders FOR VALUES FROM ('2025-01-01') TO ('2025-04-01'); -- Drop old partition (instant delete of all data in that partition)DROP TABLE orders_2022_q1;-- Alternative: Detach first for safetyALTER TABLE orders DETACH PARTITION orders_2022_q1;-- Now orders_2022_q1 is a standalone table, can archive then drop -- Attach an existing table as partition-- Useful for bulk loading: load into staging table, then attachCREATE TABLE orders_2025_q2 (LIKE orders INCLUDING ALL);-- ... bulk load data into orders_2025_q2 ... -- Run constraint check before attachingALTER TABLE orders_2025_q2 ADD CONSTRAINT date_check CHECK (order_date >= '2025-04-01' AND order_date < '2025-07-01'); -- Attach as partitionALTER TABLE orders ATTACH PARTITION orders_2025_q2 FOR VALUES FROM ('2025-04-01') TO ('2025-07-01'); -- Automate partition creation (example procedure)CREATE OR REPLACE PROCEDURE create_next_month_partition()LANGUAGE plpgsql AS $$DECLARE next_month DATE := DATE_TRUNC('month', NOW() + INTERVAL '1 month'); partition_name TEXT := 'orders_' || TO_CHAR(next_month, 'YYYY_MM'); start_date DATE := next_month; end_date DATE := next_month + INTERVAL '1 month';BEGIN EXECUTE format( 'CREATE TABLE IF NOT EXISTS %I PARTITION OF orders FOR VALUES FROM (%L) TO (%L)', partition_name, start_date, end_date );END;$$; -- Schedule with pg_cron or external scheduler-- SELECT cron.schedule('create_partition', '0 0 25 * *', 'CALL create_next_month_partition()');Partitioning transforms table management for large datasets, enabling performance, maintenance, and lifecycle benefits that are impossible with monolithic tables.
What's next:
With storage, indexing, and partitioning covered, we turn to denormalization decisions—deliberately violating normalization rules to improve query performance. This controversial but essential technique requires understanding when the benefits outweigh the costs.
You now understand partitioning strategies, their trade-offs, and implementation approaches. You can analyze workloads to select appropriate partition schemes, ensure partition pruning works effectively, and manage partition lifecycles for large-scale systems. Next: strategic denormalization for performance optimization.