Database Management SystemPhysical Design

Physical Design

LevelIntermediate

Duration90 mins

TopicPhysical Design

3 / 5

Partitioning

Divide and Conquer: The Power of Partitioning

As databases grow, even well-indexed tables can become management nightmares. A 10-billion-row table with perfect indexes still requires massive index structures, lengthy backup times, slow DDL operations, and challenging maintenance windows. Partitioning addresses these challenges by dividing a logical table into smaller, more manageable physical pieces.

Partitioning is the divide-and-conquer strategy for data management. It transforms a monolithic table into a collection of smaller partitions that can be:

Queried independently (partition pruning)
Maintained independently (backup, rebuild, archive)
Stored differently (different tablespaces, storage tiers)
Managed independently (different retention policies)

The key insight: Most queries don't need all data. They filter by date range, customer region, product category, or similar criteria. If data is partitioned along these natural boundaries, queries only touch relevant partitions.

What You Will Learn

This page covers horizontal and vertical partitioning, major partitioning schemes (range, list, hash, composite), partition pruning mechanics, and practical guidance for implementing partitioning in production systems. You'll understand when partitioning helps, when it hurts, and how to design effective partitioning strategies.

Types of Partitioning

Partitioning divides tables in two fundamental ways: horizontal partitioning (dividing rows) and vertical partitioning (dividing columns).

Horizontal Partitioning (Sharding):

Each partition contains a subset of rows with identical schemas:

Original Table:
| id | name   | date       | amount |
| 1  | Alice  | 2024-01    | 100    |
| 2  | Bob    | 2024-02    | 200    |
| 3  | Carol  | 2024-03    | 150    |
| 4  | Dave   | 2024-01    | 300    |

Horizontally Partitioned by Month:

Partition 2024-01:           Partition 2024-02:       Partition 2024-03:
| id | name  | date    |amt| | id |name |date    |amt| | id |name  |date    |amt|
| 1  | Alice | 2024-01 |100| | 2  | Bob | 2024-02|200| | 3  |Carol | 2024-03|150|
| 4  | Dave  | 2024-01 |300| |    |     |        |   | |    |      |        |   |

Each partition is a complete table with all columns but only specific rows.

Vertical Partitioning:

Each partition contains all rows but a subset of columns:

Original Table:
| id | name  | email           | bio (large text)       | preferences (JSON) |
| 1  | Alice | alice@mail.com  | Long biography...      | {theme: 'dark'}    |
| 2  | Bob   | bob@mail.com    | Another long bio...    | {theme: 'light'}   |

Vertically Partitioned:

Frequent Access Partition:           Rare Access Partition:
| id | name  | email          |       | id | bio (large text)  | preferences    |
| 1  | Alice | alice@mail.com |       | 1  | Long biography... | {theme:'dark'} |
| 2  | Bob   | bob@mail.com   |       | 2  | Another long bio..| {theme:'light'}|

Vertical partitioning separates:

Frequently accessed columns from rarely accessed ones
Small, fixed-size columns from large, variable-size columns
Hot data (updated often) from cold data (read-only)

This improves I/O efficiency—queries touching only frequent columns read smaller blocks.

Horizontal vs Vertical Partitioning Comparison
Aspect	Horizontal Partitioning	Vertical Partitioning
Division basis	Rows (by partition key value)	Columns (by access pattern)
Partition schema	All partitions have identical schema	Each partition has different columns
Primary benefit	Query pruning, parallel processing	Reduced I/O, column-level caching
Implementation	Native DBMS partitioning	Multiple tables with joins or specialized engines
Common use	Time-series, multi-tenant, geographic	Wide tables with mixed access patterns

Terminology Note

In distributed database contexts, horizontal partitioning is often called 'sharding' when partitions are distributed across different physical nodes. The concepts are similar, but sharding implies distribution across machines while partitioning may occur within a single database instance.

Range Partitioning

Range partitioning assigns rows to partitions based on whether the partition key falls within a specified range of values. This is the most common partitioning strategy, particularly for time-series data.

Structure:

Define non-overlapping ranges for the partition key
Each row is assigned to exactly one partition based on its key value
Ranges are typically contiguous but can have gaps

Example: Partitioning orders by order_date:

Partition p_2023_q1: order_date >= '2023-01-01' AND order_date < '2023-04-01'
Partition p_2023_q2: order_date >= '2023-04-01' AND order_date < '2023-07-01'
Partition p_2023_q3: order_date >= '2023-07-01' AND order_date < '2023-10-01'
Partition p_2023_q4: order_date >= '2023-10-01' AND order_date < '2024-01-01'

range_partitioning_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Create a range-partitioned table
CREATE TABLE orders (
    order_id SERIAL,
    customer_id INTEGER NOT NULL,
    order_date DATE NOT NULL,
    total_amount DECIMAL(10,2),
    status VARCHAR(20)
) PARTITION BY RANGE (order_date);
 
-- Create partitions for each quarter
CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
 
CREATE TABLE orders_2024_q2 PARTITION OF orders
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
 
CREATE TABLE orders_2024_q3 PARTITION OF orders
    FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');
 
CREATE TABLE orders_2024_q4 PARTITION OF orders
    FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');
 
-- Queries automatically benefit from partition pruning
EXPLAIN ANALYZE
SELECT * FROM orders WHERE order_date = '2024-05-15';
-- Only scans orders_2024_q2 partition

Range Partitioning Benefits

•Partition pruning — Queries with range predicates scan only relevant partitions
•Easy maintenance — Drop old partitions for data retention (instant DELETE)
•Parallel query execution — Scan multiple partitions simultaneously
•Tiered storage — Move old partitions to cheaper storage
•Focused indexing — Create indexes only on recent, active partitions

Range Partitioning Pitfall: Uneven Distribution

Range partitioning can create 'hot' partitions if data distribution is skewed. If 80% of queries target the current month's partition, that partition becomes a bottleneck while others sit idle. Consider finer granularity (daily vs monthly) for high-volume periods, or combine with hash sub-partitioning to spread load.

List Partitioning

List partitioning assigns rows to partitions based on discrete values from a predefined list. This is ideal for categorical data with a known, finite set of values.

Structure:

Define explicit lists of values for each partition
Each value belongs to exactly one partition
Commonly used for status codes, regions, categories, or tenants

Example: Partitioning by geographic region:

Partition p_americas: region IN ('US', 'CA', 'MX', 'BR', 'AR')
Partition p_europe:   region IN ('UK', 'DE', 'FR', 'ES', 'IT')
Partition p_asia:     region IN ('JP', 'CN', 'IN', 'KR', 'SG')
Partition p_other:    region IN ('AU', 'NZ', 'ZA')  -- or DEFAULT

list_partitioning_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- List partitioning by region
CREATE TABLE customers (
    customer_id SERIAL,
    name VARCHAR(100),
    email VARCHAR(255),
    region VARCHAR(10) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY LIST (region);
 
-- Create partitions for each region group
CREATE TABLE customers_americas PARTITION OF customers
    FOR VALUES IN ('US', 'CA', 'MX', 'BR', 'AR');
 
CREATE TABLE customers_europe PARTITION OF customers
    FOR VALUES IN ('UK', 'DE', 'FR', 'ES', 'IT', 'NL', 'BE');
 
CREATE TABLE customers_asia PARTITION OF customers
    FOR VALUES IN ('JP', 'CN', 'IN', 'KR', 'SG', 'TW', 'HK');
 
-- Default partition for any other values
CREATE TABLE customers_other PARTITION OF customers DEFAULT;
 
-- Multi-tenant partitioning (isolation by tenant)
CREATE TABLE tenant_data (
    id SERIAL,
    tenant_id VARCHAR(50) NOT NULL,
    data JSONB
) PARTITION BY LIST (tenant_id);
 
CREATE TABLE tenant_acme PARTITION OF tenant_data 
    FOR VALUES IN ('acme');
CREATE TABLE tenant_globex PARTITION OF tenant_data 
    FOR VALUES IN ('globex');
-- Each tenant's data is physically isolated

List Partitioning Ideal For

•Multi-tenant SaaS applications
•Geographic segmentation
•Product/service categories
•Status-based data lifecycle
•Regulatory data isolation

List Partitioning Challenges

•Must predefine all values (or use DEFAULT)
•Adding new values may require DDL
•Uneven distribution across partitions
•Not suitable for high-cardinality keys
•Cross-partition queries still scan all

Hash Partitioning

Hash partitioning distributes rows across partitions using a hash function on the partition key. This ensures even data distribution regardless of key value patterns.

Structure:

Hash function computes: partition_number = hash(key) MOD num_partitions
Each row deterministically assigned to one partition
Distribution is uniform (assuming good hash function)
No control over which specific values go to which partition

Why hash partitioning?

Range and list partitioning can create uneven partitions:

Range: Recent dates get all the load
List: Popular categories have more data

Hash partitioning solves this by spreading data evenly across all partitions, ensuring balanced storage and parallel query execution.

hash_partitioning_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Hash partitioning for even distribution
CREATE TABLE events (
    event_id BIGSERIAL,
    user_id INTEGER NOT NULL,
    event_type VARCHAR(50),
    event_data JSONB,
    created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY HASH (user_id);
 
-- Create hash partitions (typically power of 2)
CREATE TABLE events_p0 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 3);
CREATE TABLE events_p4 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 4);
CREATE TABLE events_p5 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 5);
CREATE TABLE events_p6 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 6);
CREATE TABLE events_p7 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 7);
 
-- Queries with equality on partition key benefit from pruning
EXPLAIN SELECT * FROM events WHERE user_id = 12345;
-- Only one hash partition is scanned
 
-- Range queries on partition key CANNOT prune
EXPLAIN SELECT * FROM events WHERE user_id BETWEEN 1000 AND 2000;
-- All partitions must be scanned (hash provides no ordering)

Hash Partitioning Trade-offs
Advantage	Disadvantage
Even data distribution	No partition pruning for range queries
Balanced parallel execution	Changing partition count is expensive
No hot partitions	No semantic meaning to partitions
Simple partition management	Cannot drop partitions by date/category
Good for point lookups	Full scan for non-key predicates

Choosing Partition Count

For hash partitioning, choose a partition count that: (1) Is a power of 2 for optimal modulus operation, (2) Matches or exceeds your parallel execution capacity, (3) Results in appropriately sized partitions (not too small, not too large), (4) Allows for future growth. A common starting point is 8-32 partitions for moderate-sized tables.

Composite (Sub-)Partitioning

Composite partitioning (or sub-partitioning) combines two partitioning strategies: a primary partition scheme divided further by a secondary scheme. This enables fine-grained data organization for complex access patterns.

Common combinations:

Range-Hash: Range by date, hash within each range by customer
Range-List: Range by date, list by region within each range
List-Hash: List by region, hash by user_id within each region
Range-Range: Range by year, sub-range by month

Example use case:

A global e-commerce platform needs to:

Query orders by date range (for reporting)
Process orders by region (for compliance)
Distribute load evenly across servers

Solution: Range partition by order_date, sub-partition by region, with hash sub-sub-partitioning for load distribution.

composite_partitioning_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Composite partitioning in PostgreSQL (multi-level)
-- First, create the parent table with range partitioning
CREATE TABLE sales (
    sale_id BIGSERIAL,
    region VARCHAR(10) NOT NULL,
    sale_date DATE NOT NULL,
    amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);
 
-- Create range partitions that are themselves partitioned by list
CREATE TABLE sales_2024_q1 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01')
    PARTITION BY LIST (region);
 
CREATE TABLE sales_2024_q2 PARTITION OF sales
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01')
    PARTITION BY LIST (region);
 
-- Create sub-partitions for each quarter
CREATE TABLE sales_2024_q1_us PARTITION OF sales_2024_q1 
    FOR VALUES IN ('US');
CREATE TABLE sales_2024_q1_eu PARTITION OF sales_2024_q1 
    FOR VALUES IN ('UK', 'DE', 'FR');
CREATE TABLE sales_2024_q1_asia PARTITION OF sales_2024_q1 
    FOR VALUES IN ('JP', 'CN', 'IN');
 
CREATE TABLE sales_2024_q2_us PARTITION OF sales_2024_q2 
    FOR VALUES IN ('US');
CREATE TABLE sales_2024_q2_eu PARTITION OF sales_2024_q2 
    FOR VALUES IN ('UK', 'DE', 'FR');
CREATE TABLE sales_2024_q2_asia PARTITION OF sales_2024_q2 
    FOR VALUES IN ('JP', 'CN', 'IN');
 
-- Query with both predicates prunes to single sub-partition
EXPLAIN ANALYZE
SELECT * FROM sales 
WHERE sale_date = '2024-02-15' AND region = 'US';
-- Scans only: sales_2024_q1_us

Composite Partition Complexity

Composite partitioning increases management complexity exponentially. With 12 monthly partitions and 5 regional sub-partitions, you have 60 physical tables to manage. Each requires index creation, statistics gathering, backup coordination, and retention policy application. Use composite partitioning only when the query patterns genuinely benefit from both dimensions.

Partition Pruning and Query Optimization

Partition pruning is the mechanism by which the query optimizer eliminates partitions from consideration when they cannot contain relevant data. This is the primary performance benefit of partitioning.

How pruning works:

Query contains predicate on partition key: WHERE order_date = '2024-05-15'
Optimizer identifies which partitions could contain matching rows
Only those partitions are scanned; others are skipped entirely
Query cost drops proportionally to partitions eliminated

Pruning requirements:

Predicate must reference the partition key
Predicate must be evaluable at planning time (constants, bound parameters)
Predicate must constrain the key sufficiently to eliminate partitions

Partition Pruning Effectiveness
Query Predicate	Pruning Behavior	Why
`WHERE date = '2024-05-15'`	Single partition	Exact match on partition key
`WHERE date BETWEEN '2024-01-01' AND '2024-03-31'`	One quarter	Range covers specific partitions
`WHERE date > NOW() - INTERVAL '30 days'`	Recent partitions only	Bounds computable at plan time
`WHERE customer_id = 123`	All partitions	Non-partition-key column
`WHERE date_func(order_date) = 2024`	All partitions	Function on key prevents pruning
`WHERE order_date = (SELECT MAX(date) FROM ...)`	All partitions	Subquery not evaluable at plan time

partition_pruning_analysis.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Analyzing partition pruning with EXPLAIN
-- Good: Direct predicate on partition key
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE order_date = '2024-05-15';
-- Shows: Append with only 1 child partition scanned
 
-- Good: Range scan touching few partitions
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders 
WHERE order_date >= '2024-04-01' AND order_date < '2024-05-01';
-- Shows: Only Q2 partitions scanned
 
-- Bad: Function on partition key
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2024;
-- Shows: ALL partitions scanned (cannot prune)
-- Fix: Use range instead: WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'
 
-- Track partition statistics
SELECT
    parent.relname AS parent_table,
    child.relname AS partition,
    pg_size_pretty(pg_relation_size(child.oid)) AS size,
    child.n_live_tup AS rows
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_stat_user_tables child_stat ON child.oid = child_stat.relid
WHERE parent.relname = 'orders'
ORDER BY child.relname;

Ensure Pruning is Happening

Always verify pruning with EXPLAIN. Common pruning failures: (1) Functions applied to partition key (DATE_TRUNC(order_date)), (2) Implicit type casts (WHERE integer_date = '2024-05-15'), (3) OR conditions spanning partitions, (4) JOIN conditions where optimizer can't infer bounds, (5) Dynamic conditions not evaluable at plan time.

Partition Management and Lifecycle

Effective partitioning requires ongoing management—creating new partitions, archiving old ones, maintaining indexes, and ensuring statistics remain accurate.

Common partition lifecycle operations:

Partition Lifecycle Operations

•Add partition — Create new partition for incoming data (future dates, new tenants)
•Drop partition — Remove entire partition instantly (vs. DELETE which is logged)
•Truncate partition — Empty partition keeping structure (for reload scenarios)
•Detach partition — Remove from table without dropping (for archival/analysis)
•Attach partition — Add existing table as partition (for data loading)
•Split partition — Divide one partition into multiple (rebalancing)
•Merge partitions — Combine multiple partitions into one (reducing count)
•Move partition — Transfer partition to different tablespace/storage tier

partition_lifecycle_operations.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Add new partition for upcoming quarter
CREATE TABLE orders_2025_q1 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
 
-- Drop old partition (instant delete of all data in that partition)
DROP TABLE orders_2022_q1;
-- Alternative: Detach first for safety
ALTER TABLE orders DETACH PARTITION orders_2022_q1;
-- Now orders_2022_q1 is a standalone table, can archive then drop
 
-- Attach an existing table as partition
-- Useful for bulk loading: load into staging table, then attach
CREATE TABLE orders_2025_q2 (LIKE orders INCLUDING ALL);
-- ... bulk load data into orders_2025_q2 ...
 
-- Run constraint check before attaching
ALTER TABLE orders_2025_q2 
    ADD CONSTRAINT date_check 
    CHECK (order_date >= '2025-04-01' AND order_date < '2025-07-01');
 
-- Attach as partition
ALTER TABLE orders ATTACH PARTITION orders_2025_q2
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
 
-- Automate partition creation (example procedure)
CREATE OR REPLACE PROCEDURE create_next_month_partition()
LANGUAGE plpgsql AS $$
DECLARE
    next_month DATE := DATE_TRUNC('month', NOW() + INTERVAL '1 month');
    partition_name TEXT := 'orders_' || TO_CHAR(next_month, 'YYYY_MM');
    start_date DATE := next_month;
    end_date DATE := next_month + INTERVAL '1 month';
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF orders
         FOR VALUES FROM (%L) TO (%L)',
        partition_name, start_date, end_date
    );
END;
$$;
 
-- Schedule with pg_cron or external scheduler
-- SELECT cron.schedule('create_partition', '0 0 25 * *', 'CALL create_next_month_partition()');

Partition Maintenance Best Practices

Automate partition creation—never let data fail insert because partition doesn't exist. 2) Create partitions ahead of time (e.g., next month's partition on the 25th). 3) Use DETACH before DROP to allow verification. 4) Maintain separate retention policies per partition age. 5) Analyze partitions after bulk operations. 6) Monitor partition sizes for imbalance.

Summary: Partitioning

Partitioning transforms table management for large datasets, enabling performance, maintenance, and lifecycle benefits that are impossible with monolithic tables.

Key Takeaways

•Horizontal partitioning divides rows; vertical partitioning divides columns—both reduce I/O for targeted queries
•Range partitioning — Ideal for time-series and naturally ordered data; enables date-based retention
•List partitioning — Perfect for categorical data, multi-tenant isolation, geographic segmentation
•Hash partitioning — Ensures even distribution; excellent for parallel processing; no range pruning
•Composite partitioning — Combines strategies for complex access patterns; increases management complexity
•Partition pruning — The primary performance benefit; verify with EXPLAIN; avoid functions on partition keys
•Lifecycle management — Automate partition creation, leverage instant DROP for retention, use DETACH for archival
•Choose partition key wisely — Must align with query patterns; cannot be changed without rebuilding

What's next:

With storage, indexing, and partitioning covered, we turn to denormalization decisions—deliberately violating normalization rules to improve query performance. This controversial but essential technique requires understanding when the benefits outweigh the costs.

Page Complete

You now understand partitioning strategies, their trade-offs, and implementation approaches. You can analyze workloads to select appropriate partition schemes, ensure partition pruning works effectively, and manage partition lifecycles for large-scale systems. Next: strategic denormalization for performance optimization.

3 / 5

Loading learning content...

Database Management SystemPhysical Design

Physical Design

LevelIntermediate

Duration90 mins

TopicPhysical Design

3 / 5

Partitioning

Divide and Conquer: The Power of Partitioning

Partitioning is the divide-and-conquer strategy for data management. It transforms a monolithic table into a collection of smaller partitions that can be:

Queried independently (partition pruning)
Maintained independently (backup, rebuild, archive)
Stored differently (different tablespaces, storage tiers)
Managed independently (different retention policies)

What You Will Learn

Types of Partitioning

Partitioning divides tables in two fundamental ways: horizontal partitioning (dividing rows) and vertical partitioning (dividing columns).

Horizontal Partitioning (Sharding):

Each partition contains a subset of rows with identical schemas:

Original Table:
| id | name   | date       | amount |
| 1  | Alice  | 2024-01    | 100    |
| 2  | Bob    | 2024-02    | 200    |
| 3  | Carol  | 2024-03    | 150    |
| 4  | Dave   | 2024-01    | 300    |

Horizontally Partitioned by Month:

Partition 2024-01:           Partition 2024-02:       Partition 2024-03:
| id | name  | date    |amt| | id |name |date    |amt| | id |name  |date    |amt|
| 1  | Alice | 2024-01 |100| | 2  | Bob | 2024-02|200| | 3  |Carol | 2024-03|150|
| 4  | Dave  | 2024-01 |300| |    |     |        |   | |    |      |        |   |

Each partition is a complete table with all columns but only specific rows.

Vertical Partitioning:

Each partition contains all rows but a subset of columns:

Original Table:
| id | name  | email           | bio (large text)       | preferences (JSON) |
| 1  | Alice | alice@mail.com  | Long biography...      | {theme: 'dark'}    |
| 2  | Bob   | bob@mail.com    | Another long bio...    | {theme: 'light'}   |

Vertically Partitioned:

Frequent Access Partition:           Rare Access Partition:
| id | name  | email          |       | id | bio (large text)  | preferences    |
| 1  | Alice | alice@mail.com |       | 1  | Long biography... | {theme:'dark'} |
| 2  | Bob   | bob@mail.com   |       | 2  | Another long bio..| {theme:'light'}|

Vertical partitioning separates:

Frequently accessed columns from rarely accessed ones
Small, fixed-size columns from large, variable-size columns
Hot data (updated often) from cold data (read-only)

This improves I/O efficiency—queries touching only frequent columns read smaller blocks.

Horizontal vs Vertical Partitioning Comparison
Aspect	Horizontal Partitioning	Vertical Partitioning
Division basis	Rows (by partition key value)	Columns (by access pattern)
Partition schema	All partitions have identical schema	Each partition has different columns
Primary benefit	Query pruning, parallel processing	Reduced I/O, column-level caching
Implementation	Native DBMS partitioning	Multiple tables with joins or specialized engines
Common use	Time-series, multi-tenant, geographic	Wide tables with mixed access patterns

Terminology Note

Range Partitioning

Structure:

Define non-overlapping ranges for the partition key
Each row is assigned to exactly one partition based on its key value
Ranges are typically contiguous but can have gaps

Example: Partitioning orders by order_date:

Partition p_2023_q1: order_date >= '2023-01-01' AND order_date < '2023-04-01'
Partition p_2023_q2: order_date >= '2023-04-01' AND order_date < '2023-07-01'
Partition p_2023_q3: order_date >= '2023-07-01' AND order_date < '2023-10-01'
Partition p_2023_q4: order_date >= '2023-10-01' AND order_date < '2024-01-01'

range_partitioning_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Create a range-partitioned table
CREATE TABLE orders (
    order_id SERIAL,
    customer_id INTEGER NOT NULL,
    order_date DATE NOT NULL,
    total_amount DECIMAL(10,2),
    status VARCHAR(20)
) PARTITION BY RANGE (order_date);
 
-- Create partitions for each quarter
CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
 
CREATE TABLE orders_2024_q2 PARTITION OF orders
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
 
CREATE TABLE orders_2024_q3 PARTITION OF orders
    FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');
 
CREATE TABLE orders_2024_q4 PARTITION OF orders
    FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');
 
-- Queries automatically benefit from partition pruning
EXPLAIN ANALYZE
SELECT * FROM orders WHERE order_date = '2024-05-15';
-- Only scans orders_2024_q2 partition

Range Partitioning Benefits

•Partition pruning — Queries with range predicates scan only relevant partitions
•Easy maintenance — Drop old partitions for data retention (instant DELETE)
•Parallel query execution — Scan multiple partitions simultaneously
•Tiered storage — Move old partitions to cheaper storage
•Focused indexing — Create indexes only on recent, active partitions

Range Partitioning Pitfall: Uneven Distribution

List Partitioning

List partitioning assigns rows to partitions based on discrete values from a predefined list. This is ideal for categorical data with a known, finite set of values.

Structure:

Define explicit lists of values for each partition
Each value belongs to exactly one partition
Commonly used for status codes, regions, categories, or tenants

Example: Partitioning by geographic region:

Partition p_americas: region IN ('US', 'CA', 'MX', 'BR', 'AR')
Partition p_europe:   region IN ('UK', 'DE', 'FR', 'ES', 'IT')
Partition p_asia:     region IN ('JP', 'CN', 'IN', 'KR', 'SG')
Partition p_other:    region IN ('AU', 'NZ', 'ZA')  -- or DEFAULT

list_partitioning_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- List partitioning by region
CREATE TABLE customers (
    customer_id SERIAL,
    name VARCHAR(100),
    email VARCHAR(255),
    region VARCHAR(10) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY LIST (region);
 
-- Create partitions for each region group
CREATE TABLE customers_americas PARTITION OF customers
    FOR VALUES IN ('US', 'CA', 'MX', 'BR', 'AR');
 
CREATE TABLE customers_europe PARTITION OF customers
    FOR VALUES IN ('UK', 'DE', 'FR', 'ES', 'IT', 'NL', 'BE');
 
CREATE TABLE customers_asia PARTITION OF customers
    FOR VALUES IN ('JP', 'CN', 'IN', 'KR', 'SG', 'TW', 'HK');
 
-- Default partition for any other values
CREATE TABLE customers_other PARTITION OF customers DEFAULT;
 
-- Multi-tenant partitioning (isolation by tenant)
CREATE TABLE tenant_data (
    id SERIAL,
    tenant_id VARCHAR(50) NOT NULL,
    data JSONB
) PARTITION BY LIST (tenant_id);
 
CREATE TABLE tenant_acme PARTITION OF tenant_data 
    FOR VALUES IN ('acme');
CREATE TABLE tenant_globex PARTITION OF tenant_data 
    FOR VALUES IN ('globex');
-- Each tenant's data is physically isolated

List Partitioning Ideal For

•Multi-tenant SaaS applications
•Geographic segmentation
•Product/service categories
•Status-based data lifecycle
•Regulatory data isolation

List Partitioning Challenges

•Must predefine all values (or use DEFAULT)
•Adding new values may require DDL
•Uneven distribution across partitions
•Not suitable for high-cardinality keys
•Cross-partition queries still scan all

Hash Partitioning

Hash partitioning distributes rows across partitions using a hash function on the partition key. This ensures even data distribution regardless of key value patterns.

Structure:

Hash function computes: partition_number = hash(key) MOD num_partitions
Each row deterministically assigned to one partition
Distribution is uniform (assuming good hash function)
No control over which specific values go to which partition

Why hash partitioning?

Range and list partitioning can create uneven partitions:

Range: Recent dates get all the load
List: Popular categories have more data

Hash partitioning solves this by spreading data evenly across all partitions, ensuring balanced storage and parallel query execution.

hash_partitioning_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Hash partitioning for even distribution
CREATE TABLE events (
    event_id BIGSERIAL,
    user_id INTEGER NOT NULL,
    event_type VARCHAR(50),
    event_data JSONB,
    created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY HASH (user_id);
 
-- Create hash partitions (typically power of 2)
CREATE TABLE events_p0 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 3);
CREATE TABLE events_p4 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 4);
CREATE TABLE events_p5 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 5);
CREATE TABLE events_p6 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 6);
CREATE TABLE events_p7 PARTITION OF events 
    FOR VALUES WITH (MODULUS 8, REMAINDER 7);
 
-- Queries with equality on partition key benefit from pruning
EXPLAIN SELECT * FROM events WHERE user_id = 12345;
-- Only one hash partition is scanned
 
-- Range queries on partition key CANNOT prune
EXPLAIN SELECT * FROM events WHERE user_id BETWEEN 1000 AND 2000;
-- All partitions must be scanned (hash provides no ordering)

Hash Partitioning Trade-offs
Advantage	Disadvantage
Even data distribution	No partition pruning for range queries
Balanced parallel execution	Changing partition count is expensive
No hot partitions	No semantic meaning to partitions
Simple partition management	Cannot drop partitions by date/category
Good for point lookups	Full scan for non-key predicates

Choosing Partition Count

Composite (Sub-)Partitioning

Common combinations:

Range-Hash: Range by date, hash within each range by customer
Range-List: Range by date, list by region within each range
List-Hash: List by region, hash by user_id within each region
Range-Range: Range by year, sub-range by month

Example use case:

A global e-commerce platform needs to:

Query orders by date range (for reporting)
Process orders by region (for compliance)
Distribute load evenly across servers

Solution: Range partition by order_date, sub-partition by region, with hash sub-sub-partitioning for load distribution.

composite_partitioning_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Composite partitioning in PostgreSQL (multi-level)
-- First, create the parent table with range partitioning
CREATE TABLE sales (
    sale_id BIGSERIAL,
    region VARCHAR(10) NOT NULL,
    sale_date DATE NOT NULL,
    amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);
 
-- Create range partitions that are themselves partitioned by list
CREATE TABLE sales_2024_q1 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01')
    PARTITION BY LIST (region);
 
CREATE TABLE sales_2024_q2 PARTITION OF sales
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01')
    PARTITION BY LIST (region);
 
-- Create sub-partitions for each quarter
CREATE TABLE sales_2024_q1_us PARTITION OF sales_2024_q1 
    FOR VALUES IN ('US');
CREATE TABLE sales_2024_q1_eu PARTITION OF sales_2024_q1 
    FOR VALUES IN ('UK', 'DE', 'FR');
CREATE TABLE sales_2024_q1_asia PARTITION OF sales_2024_q1 
    FOR VALUES IN ('JP', 'CN', 'IN');
 
CREATE TABLE sales_2024_q2_us PARTITION OF sales_2024_q2 
    FOR VALUES IN ('US');
CREATE TABLE sales_2024_q2_eu PARTITION OF sales_2024_q2 
    FOR VALUES IN ('UK', 'DE', 'FR');
CREATE TABLE sales_2024_q2_asia PARTITION OF sales_2024_q2 
    FOR VALUES IN ('JP', 'CN', 'IN');
 
-- Query with both predicates prunes to single sub-partition
EXPLAIN ANALYZE
SELECT * FROM sales 
WHERE sale_date = '2024-02-15' AND region = 'US';
-- Scans only: sales_2024_q1_us

Composite Partition Complexity

Partition Pruning and Query Optimization

How pruning works:

Query contains predicate on partition key: WHERE order_date = '2024-05-15'
Optimizer identifies which partitions could contain matching rows
Only those partitions are scanned; others are skipped entirely
Query cost drops proportionally to partitions eliminated

Pruning requirements:

Predicate must reference the partition key
Predicate must be evaluable at planning time (constants, bound parameters)
Predicate must constrain the key sufficiently to eliminate partitions

Partition Pruning Effectiveness
Query Predicate	Pruning Behavior	Why
`WHERE date = '2024-05-15'`	Single partition	Exact match on partition key
`WHERE date BETWEEN '2024-01-01' AND '2024-03-31'`	One quarter	Range covers specific partitions
`WHERE date > NOW() - INTERVAL '30 days'`	Recent partitions only	Bounds computable at plan time
`WHERE customer_id = 123`	All partitions	Non-partition-key column
`WHERE date_func(order_date) = 2024`	All partitions	Function on key prevents pruning
`WHERE order_date = (SELECT MAX(date) FROM ...)`	All partitions	Subquery not evaluable at plan time

partition_pruning_analysis.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Analyzing partition pruning with EXPLAIN
-- Good: Direct predicate on partition key
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE order_date = '2024-05-15';
-- Shows: Append with only 1 child partition scanned
 
-- Good: Range scan touching few partitions
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders 
WHERE order_date >= '2024-04-01' AND order_date < '2024-05-01';
-- Shows: Only Q2 partitions scanned
 
-- Bad: Function on partition key
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2024;
-- Shows: ALL partitions scanned (cannot prune)
-- Fix: Use range instead: WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'
 
-- Track partition statistics
SELECT
    parent.relname AS parent_table,
    child.relname AS partition,
    pg_size_pretty(pg_relation_size(child.oid)) AS size,
    child.n_live_tup AS rows
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_stat_user_tables child_stat ON child.oid = child_stat.relid
WHERE parent.relname = 'orders'
ORDER BY child.relname;

Ensure Pruning is Happening

Partition Management and Lifecycle

Effective partitioning requires ongoing management—creating new partitions, archiving old ones, maintaining indexes, and ensuring statistics remain accurate.

Common partition lifecycle operations:

Partition Lifecycle Operations

•Add partition — Create new partition for incoming data (future dates, new tenants)
•Drop partition — Remove entire partition instantly (vs. DELETE which is logged)
•Truncate partition — Empty partition keeping structure (for reload scenarios)
•Detach partition — Remove from table without dropping (for archival/analysis)
•Attach partition — Add existing table as partition (for data loading)
•Split partition — Divide one partition into multiple (rebalancing)
•Merge partitions — Combine multiple partitions into one (reducing count)
•Move partition — Transfer partition to different tablespace/storage tier

partition_lifecycle_operations.sql
PostgreSQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Add new partition for upcoming quarter
CREATE TABLE orders_2025_q1 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
 
-- Drop old partition (instant delete of all data in that partition)
DROP TABLE orders_2022_q1;
-- Alternative: Detach first for safety
ALTER TABLE orders DETACH PARTITION orders_2022_q1;
-- Now orders_2022_q1 is a standalone table, can archive then drop
 
-- Attach an existing table as partition
-- Useful for bulk loading: load into staging table, then attach
CREATE TABLE orders_2025_q2 (LIKE orders INCLUDING ALL);
-- ... bulk load data into orders_2025_q2 ...
 
-- Run constraint check before attaching
ALTER TABLE orders_2025_q2 
    ADD CONSTRAINT date_check 
    CHECK (order_date >= '2025-04-01' AND order_date < '2025-07-01');
 
-- Attach as partition
ALTER TABLE orders ATTACH PARTITION orders_2025_q2
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
 
-- Automate partition creation (example procedure)
CREATE OR REPLACE PROCEDURE create_next_month_partition()
LANGUAGE plpgsql AS $$
DECLARE
    next_month DATE := DATE_TRUNC('month', NOW() + INTERVAL '1 month');
    partition_name TEXT := 'orders_' || TO_CHAR(next_month, 'YYYY_MM');
    start_date DATE := next_month;
    end_date DATE := next_month + INTERVAL '1 month';
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF orders
         FOR VALUES FROM (%L) TO (%L)',
        partition_name, start_date, end_date
    );
END;
$$;
 
-- Schedule with pg_cron or external scheduler
-- SELECT cron.schedule('create_partition', '0 0 25 * *', 'CALL create_next_month_partition()');

Partition Maintenance Best Practices

Automate partition creation—never let data fail insert because partition doesn't exist. 2) Create partitions ahead of time (e.g., next month's partition on the 25th). 3) Use DETACH before DROP to allow verification. 4) Maintain separate retention policies per partition age. 5) Analyze partitions after bulk operations. 6) Monitor partition sizes for imbalance.

Summary: Partitioning

Partitioning transforms table management for large datasets, enabling performance, maintenance, and lifecycle benefits that are impossible with monolithic tables.

Key Takeaways

•Horizontal partitioning divides rows; vertical partitioning divides columns—both reduce I/O for targeted queries
•Range partitioning — Ideal for time-series and naturally ordered data; enables date-based retention
•List partitioning — Perfect for categorical data, multi-tenant isolation, geographic segmentation
•Hash partitioning — Ensures even distribution; excellent for parallel processing; no range pruning
•Composite partitioning — Combines strategies for complex access patterns; increases management complexity
•Partition pruning — The primary performance benefit; verify with EXPLAIN; avoid functions on partition keys
•Lifecycle management — Automate partition creation, leverage instant DROP for retention, use DETACH for archival
•Choose partition key wisely — Must align with query patterns; cannot be changed without rebuilding

What's next:

Page Complete

3 / 5