The PARTITION BY clause is the window function equivalent of GROUP BY, but with a crucial difference: instead of collapsing rows into summary groups, it divides them into independent analytical windows while preserving every individual row.
When you write OVER (PARTITION BY department), you're telling the database: "Run this window function separately for each department, as if each department's rows were in their own isolated table."
This partitioning capability unlocks sophisticated analytical patterns: per-group rankings, running totals that reset for each group, and group-level aggregates displayed alongside row-level detail.
Mastering PARTITION BY is essential for leveraging window functions in real-world data analysis.
By the end of this page, you will understand how PARTITION BY divides data, how to partition by multiple columns, how NULL values are handled, performance implications of partitioning strategies, and advanced patterns using partitions for complex analytical queries.
At its core, PARTITION BY creates logical groupings (partitions) of rows within which the window function operates independently. Each partition is processed as if it were a separate dataset.
Syntax:

```sql
OVER (PARTITION BY expression1, expression2, ...)
```
Key characteristics:

- Every input row appears in the output; nothing is collapsed.
- The window function is computed independently within each partition.
- Rows belong to the same partition when their partition expressions are equal.
- Omitting PARTITION BY treats the entire result set as a single partition.
```sql
-- Sample dataset: Sales by region and product
-- | sale_id | region | product | amount |
-- |---------|--------|---------|--------|
-- | 1       | North  | Widget  | 100    |
-- | 2       | North  | Gadget  | 150    |
-- | 3       | North  | Widget  | 200    |
-- | 4       | South  | Widget  | 120    |
-- | 5       | South  | Gadget  | 180    |
-- | 6       | East   | Widget  | 90     |

-- Without PARTITION BY: global calculations
SELECT
    sale_id,
    region,
    amount,
    SUM(amount) OVER () as grand_total,  -- All rows: 840
    AVG(amount) OVER () as grand_avg     -- All rows: 140
FROM sales;

-- With PARTITION BY region: regional calculations
SELECT
    sale_id,
    region,
    amount,
    SUM(amount) OVER (PARTITION BY region) as region_total,
    AVG(amount) OVER (PARTITION BY region) as region_avg
FROM sales;

-- Result:
-- | sale_id | region | amount | region_total | region_avg |
-- |---------|--------|--------|--------------|------------|
-- | 1       | North  | 100    | 450          | 150        |  -- North partition
-- | 2       | North  | 150    | 450          | 150        |
-- | 3       | North  | 200    | 450          | 150        |
-- | 4       | South  | 120    | 300          | 150        |  -- South partition
-- | 5       | South  | 180    | 300          | 150        |
-- | 6       | East   | 90     | 90           | 90         |  -- East partition
```

Think of each partition as a separate window pane through which rows can see only other rows in the same partition. North region rows see only North region data through their window. South region rows see only South data through theirs. They're completely isolated from each other's calculations.
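This behavior is easy to check end to end. The sketch below uses Python's built-in sqlite3 module purely as a dependency-free way to run the query (SQLite supports window functions from version 3.25; the chapter's SQL otherwise leans toward PostgreSQL syntax):

```python
import sqlite3

# Build the sample dataset from the text in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INT, region TEXT, product TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "North", "Widget", 100), (2, "North", "Gadget", 150),
    (3, "North", "Widget", 200), (4, "South", "Widget", 120),
    (5, "South", "Gadget", 180), (6, "East", "Widget", 90),
])

# Every row is preserved, but each row's regional total comes only from its own region.
rows = conn.execute("""
    SELECT sale_id, region, amount,
           SUM(amount) OVER ()                    AS grand_total,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY sale_id
""").fetchall()

for row in rows:
    print(row)
# Row 1 prints (1, 'North', 100, 840, 450): grand total 840, North total 450
```

All six input rows come back, matching the result table above: North rows carry 450, South rows 300, and the lone East row 90.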
Understanding the difference between partitioned and non-partitioned window functions is crucial for selecting the right approach for your analysis.
```sql
-- Employee salary data
-- | emp_id | name  | dept | salary |
-- |--------|-------|------|--------|
-- | 1      | Alice | IT   | 90000  |
-- | 2      | Bob   | IT   | 80000  |
-- | 3      | Carol | HR   | 60000  |
-- | 4      | David | HR   | 65000  |

-- Side-by-side comparison
SELECT
    name,
    dept,
    salary,
    -- Global context
    AVG(salary) OVER () as company_avg,
    RANK() OVER (ORDER BY salary DESC) as company_rank,
    -- Department context
    AVG(salary) OVER (PARTITION BY dept) as dept_avg,
    RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as dept_rank
FROM employees;

-- Result:
-- | name  | dept | salary | company_avg | company_rank | dept_avg | dept_rank |
-- |-------|------|--------|-------------|--------------|----------|-----------|
-- | Alice | IT   | 90000  | 73750       | 1            | 85000    | 1         |
-- | Bob   | IT   | 80000  | 73750       | 2            | 85000    | 2         |
-- | David | HR   | 65000  | 73750       | 3            | 62500    | 1         |
-- | Carol | HR   | 60000  | 73750       | 4            | 62500    | 2         |

-- Observations:
-- company_avg is 73750 for ALL rows (global)
-- dept_avg differs: IT=85000, HR=62500 (partition-specific)
-- company_rank: 1-4 across all employees
-- dept_rank: 1-2 within each department (resets per partition)
```

A powerful pattern is computing both global and partition-level metrics in the same query, as shown above. This enables analysis like 'Alice earns more than the company average (90K vs. 73.75K) and ranks #1 both in her department and company-wide, in part because IT salaries are higher overall.'
You can partition by multiple columns to create more granular groupings. When multiple columns are specified, rows must match on ALL columns to be in the same partition.
Syntax:

```sql
OVER (PARTITION BY column1, column2, column3, ...)
```
Think of it as creating a composite partition key, similar to a composite primary key in a table.
```sql
-- Sales data with multiple dimensions
-- | sale_id | year | quarter | region | amount |
-- |---------|------|---------|--------|--------|
-- | 1       | 2023 | Q1      | North  | 1000   |
-- | 2       | 2023 | Q1      | North  | 1200   |
-- | 3       | 2023 | Q1      | South  | 800    |
-- | 4       | 2023 | Q2      | North  | 1500   |
-- | 5       | 2024 | Q1      | North  | 1100   |
-- | 6       | 2024 | Q1      | North  | 1300   |

-- Partition by year AND quarter AND region
SELECT
    sale_id,
    year,
    quarter,
    region,
    amount,
    SUM(amount) OVER (PARTITION BY year, quarter, region) as period_region_total,
    AVG(amount) OVER (PARTITION BY year, quarter, region) as period_region_avg
FROM sales;

-- Result:
-- | sale_id | year | quarter | region | amount | period_region_total | period_region_avg |
-- |---------|------|---------|--------|--------|---------------------|-------------------|
-- | 1       | 2023 | Q1      | North  | 1000   | 2200                | 1100              |  -- 2023-Q1-North
-- | 2       | 2023 | Q1      | North  | 1200   | 2200                | 1100              |  -- 2023-Q1-North
-- | 3       | 2023 | Q1      | South  | 800    | 800                 | 800               |  -- 2023-Q1-South
-- | 4       | 2023 | Q2      | North  | 1500   | 1500                | 1500              |  -- 2023-Q2-North
-- | 5       | 2024 | Q1      | North  | 1100   | 2400                | 1200              |  -- 2024-Q1-North
-- | 6       | 2024 | Q1      | North  | 1300   | 2400                | 1200              |  -- 2024-Q1-North

-- Each unique (year, quarter, region) combination = one partition
```

Hierarchical analysis using different partition granularities:
A powerful technique is computing metrics at different levels of granularity in the same query:
```sql
-- Multiple granularity levels in one query
SELECT
    year,
    quarter,
    region,
    amount,
    -- Finest granularity: year + quarter + region
    SUM(amount) OVER (PARTITION BY year, quarter, region) as qtr_region_total,
    -- Medium granularity: year + region
    SUM(amount) OVER (PARTITION BY year, region) as year_region_total,
    -- Coarse granularity: region only
    SUM(amount) OVER (PARTITION BY region) as all_time_region_total,
    -- No partition: grand total
    SUM(amount) OVER () as grand_total
FROM sales;

-- This gives each row context at multiple levels:
-- "This Q1 2023 North sale of $1000 is part of:
--  - Q1 2023 North total: $2200
--  - 2023 North total: $3700 (Q1 + Q2)
--  - All-time North total: $6100
--  - Grand total: $6900"
```

This pattern is especially useful for percentage calculations: 'What percentage of quarterly regional sales does this transaction represent? What percentage of annual regional sales? What percentage of all sales ever?'
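The per-level totals quoted in the comments can be verified with another small sqlite3 sketch (same multi-dimensional sample data; SQLite 3.25+ required for window functions):

```python
import sqlite3

# Recreate the multi-dimensional sales data from the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INT, year INT, quarter TEXT, region TEXT, amount INT)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    (1, 2023, "Q1", "North", 1000), (2, 2023, "Q1", "North", 1200),
    (3, 2023, "Q1", "South", 800),  (4, 2023, "Q2", "North", 1500),
    (5, 2024, "Q1", "North", 1100), (6, 2024, "Q1", "North", 1300),
])

# Four granularities of SUM computed over the same rows in one query.
rows = conn.execute("""
    SELECT sale_id,
           SUM(amount) OVER (PARTITION BY year, quarter, region) AS qtr_region_total,
           SUM(amount) OVER (PARTITION BY year, region)          AS year_region_total,
           SUM(amount) OVER (PARTITION BY region)                AS all_time_region_total,
           SUM(amount) OVER ()                                   AS grand_total
    FROM sales
    ORDER BY sale_id
""").fetchall()

print(rows[0])  # sale 1 (2023-Q1-North): (1, 2200, 3700, 6100, 6900)
```

Sale 1 reports exactly the four context totals from the comment block: $2200 for its quarter-region, $3700 for its year-region, $6100 all-time for North, and $6900 overall.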
NULL values in PARTITION BY columns receive special treatment that differs from standard SQL NULL semantics.
The key rule: For PARTITION BY purposes, NULL values are considered equal to each other. All rows with NULL in the partition column(s) are grouped into the same partition.
This is a notable exception to the typical SQL rule where NULL = NULL evaluates to UNKNOWN rather than TRUE.
```sql
-- Employees with some NULL department values
-- | emp_id | name  | department | salary |
-- |--------|-------|------------|--------|
-- | 1      | Alice | IT         | 90000  |
-- | 2      | Bob   | IT         | 85000  |
-- | 3      | Carol | NULL       | 70000  |
-- | 4      | David | NULL       | 65000  |
-- | 5      | Eve   | HR         | 60000  |

SELECT
    name,
    department,
    salary,
    COUNT(*) OVER (PARTITION BY department) as dept_count,
    SUM(salary) OVER (PARTITION BY department) as dept_total
FROM employees;

-- Result:
-- | name  | department | salary | dept_count | dept_total |
-- |-------|------------|--------|------------|------------|
-- | Alice | IT         | 90000  | 2          | 175000     |
-- | Bob   | IT         | 85000  | 2          | 175000     |
-- | Carol | NULL       | 70000  | 2          | 135000     |  -- NULL partition
-- | David | NULL       | 65000  | 2          | 135000     |  -- NULL partition (together!)
-- | Eve   | HR         | 60000  | 1          | 60000      |

-- Carol and David are in the SAME partition despite NULL department
-- This is because NULL = NULL is treated as TRUE for PARTITION BY
```

Be aware of this behavior when your partition columns contain NULLs. If you don't want NULLs grouped together, you may need to use COALESCE or CASE to convert NULLs to distinct placeholder values, or filter them out before the window function executes.
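The NULL-grouping rule is easy to confirm for yourself. This sqlite3 sketch (SQLite 3.25+) rebuilds the table above and shows Carol and David sharing one partition:

```python
import sqlite3

# Employees table with NULL departments, as in the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, name TEXT, department TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, "Alice", "IT", 90000), (2, "Bob", "IT", 85000),
    (3, "Carol", None, 70000), (4, "David", None, 65000),
    (5, "Eve", "HR", 60000),
])

# NULL departments land in one shared partition, not one partition per row.
rows = conn.execute("""
    SELECT name, department,
           COUNT(*)    OVER (PARTITION BY department) AS dept_count,
           SUM(salary) OVER (PARTITION BY department) AS dept_total
    FROM employees
    ORDER BY emp_id
""").fetchall()

for row in rows:
    print(row)
# Carol and David both show dept_count 2 and dept_total 135000
```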
```sql
-- Option 1: Exclude NULL rows
SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees
WHERE department IS NOT NULL;

-- Option 2: Replace NULL with a placeholder (each NULL becomes unique)
SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (
        PARTITION BY COALESCE(department, 'Unknown-' || emp_id::text)
    ) as dept_avg
FROM employees;
-- Each NULL department gets a unique partition

-- Option 3: Replace NULL with a single known value
SELECT
    name,
    COALESCE(department, 'Unassigned') as department,
    salary,
    AVG(salary) OVER (
        PARTITION BY COALESCE(department, 'Unassigned')
    ) as dept_avg
FROM employees;
-- All NULL departments grouped as 'Unassigned' partition
```

Multiple columns with NULLs:
When partitioning by multiple columns, the same NULL-equals-NULL rule applies to each column independently. Rows match if all corresponding partition column values are equal (treating NULLs as equal).
```sql
-- Multi-column with NULLs
-- | id | region | category | value |
-- |----|--------|----------|-------|
-- | 1  | North  | A        | 100   |
-- | 2  | North  | NULL     | 200   |
-- | 3  | NULL   | NULL     | 300   |
-- | 4  | NULL   | NULL     | 400   |

SELECT
    id,
    region,
    category,
    value,
    COUNT(*) OVER (PARTITION BY region, category) as partition_count
FROM data;

-- Result:
-- | id | region | category | value | partition_count |
-- |----|--------|----------|-------|-----------------|
-- | 1  | North  | A        | 100   | 1               |  -- (North, A) partition
-- | 2  | North  | NULL     | 200   | 1               |  -- (North, NULL) partition
-- | 3  | NULL   | NULL     | 300   | 2               |  -- (NULL, NULL) partition
-- | 4  | NULL   | NULL     | 400   | 2               |  -- (NULL, NULL) partition

-- Rows 3 and 4 share a partition because (NULL, NULL) = (NULL, NULL) for partitioning
```

PARTITION BY is not limited to simple column references. You can use expressions—computed values, function calls, CASE statements—to define partitions based on derived criteria.
This enables powerful patterns like partitioning by date parts (year, month, quarter), by value bands defined with CASE expressions, or by computed flags.
```sql
-- Partition by year extracted from a date column
SELECT
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY EXTRACT(YEAR FROM order_date)) as year_total,
    SUM(amount) OVER (
        PARTITION BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
    ) as month_total
FROM orders;

-- Partition by salary bands using CASE
SELECT
    name,
    salary,
    CASE
        WHEN salary < 50000 THEN 'Entry'
        WHEN salary < 80000 THEN 'Mid'
        WHEN salary < 120000 THEN 'Senior'
        ELSE 'Executive'
    END as salary_band,
    AVG(salary) OVER (
        PARTITION BY CASE
            WHEN salary < 50000 THEN 'Entry'
            WHEN salary < 80000 THEN 'Mid'
            WHEN salary < 120000 THEN 'Senior'
            ELSE 'Executive'
        END
    ) as band_avg_salary,
    COUNT(*) OVER (
        PARTITION BY CASE
            WHEN salary < 50000 THEN 'Entry'
            WHEN salary < 80000 THEN 'Mid'
            WHEN salary < 120000 THEN 'Senior'
            ELSE 'Executive'
        END
    ) as employees_in_band
FROM employees;
```

Partition by computed flags:
```sql
-- Partition by whether value exceeds threshold
SELECT
    name,
    sales,
    sales > 10000 as is_high_performer,
    AVG(sales) OVER (PARTITION BY sales > 10000) as group_avg,
    COUNT(*) OVER (PARTITION BY sales > 10000) as group_count
FROM salespeople;

-- Partition by calculated quarter
SELECT
    order_date,
    amount,
    'Q' || CEIL(EXTRACT(MONTH FROM order_date) / 3.0)::int as quarter,
    SUM(amount) OVER (
        PARTITION BY CEIL(EXTRACT(MONTH FROM order_date) / 3.0)
    ) as quarter_total
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024;

-- Partition by first letter of name (alphabetic grouping)
SELECT
    name,
    salary,
    LEFT(name, 1) as first_letter,
    COUNT(*) OVER (PARTITION BY LEFT(name, 1)) as names_in_letter_group
FROM employees;
```

When using complex expressions in PARTITION BY, the database computes them for every row. For better performance with repeated expressions (like CASE statements), consider using a CTE or subquery to compute the derived value once, then partition by the result column.
```sql
-- More efficient: compute expression once in CTE
WITH categorized AS (
    SELECT
        *,
        CASE
            WHEN salary < 50000 THEN 'Entry'
            WHEN salary < 80000 THEN 'Mid'
            WHEN salary < 120000 THEN 'Senior'
            ELSE 'Executive'
        END as salary_band
    FROM employees
)
SELECT
    name,
    salary,
    salary_band,
    AVG(salary) OVER (PARTITION BY salary_band) as band_avg,
    RANK() OVER (PARTITION BY salary_band ORDER BY salary DESC) as rank_in_band
FROM categorized;

-- Expression computed once per row, then reused for multiple window functions
```

When PARTITION BY and ORDER BY are used together, they work in sequence: PARTITION BY first divides the rows into independent groups, then ORDER BY sorts the rows within each partition before the window function is applied.
This combination enables some of the most powerful window function patterns.
```sql
-- Pattern 1: Running totals per group
-- Sales over time, by region
SELECT
    sale_date,
    region,
    amount,
    SUM(amount) OVER (
        PARTITION BY region
        ORDER BY sale_date
    ) as region_ytd
FROM sales;

-- Result: Each region has its own running total that resets
-- North: 100 → 250 → 400 → ...
-- South: 80 → 180 → 270 → ...

-- Pattern 2: Ranking within groups
SELECT
    name,
    department,
    salary,
    ROW_NUMBER() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) as dept_salary_rank
FROM employees;

-- Each department has ranks 1, 2, 3, ... independently

-- Pattern 3: Compare to previous value within group
SELECT
    month,
    region,
    revenue,
    LAG(revenue) OVER (
        PARTITION BY region
        ORDER BY month
    ) as prev_month_revenue,
    revenue - LAG(revenue) OVER (
        PARTITION BY region
        ORDER BY month
    ) as month_over_month_change
FROM monthly_revenue;

-- LAG looks at previous row WITHIN THE SAME REGION only
```

The "Top N per Group" pattern:
One of the most common and powerful uses of PARTITION BY + ORDER BY is selecting the top N rows from each group:
```sql
-- Find top 3 highest-paid employees per department
WITH ranked AS (
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) as rn
    FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn <= 3;

-- Result shows only the top 3 earners from each department

-- Variation: Use RANK() if you want ties included
WITH ranked AS (
    SELECT
        name,
        department,
        salary,
        RANK() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) as rank
    FROM employees
)
SELECT name, department, salary, rank
FROM ranked
WHERE rank <= 3;

-- If 3 people tie for rank 1, all 3 are included (ranks 1,1,1 then 4...)
```

Remember: When ORDER BY is present in OVER, the default frame changes from 'all rows in partition' to 'rows from start of partition up to current row'. This is why SUM() OVER (PARTITION BY x ORDER BY y) gives running totals, not partition totals.
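The frame change is easiest to see with both forms side by side. This sqlite3 sketch (hypothetical figures; SQLite 3.25+) computes the same SUM with and without ORDER BY in the window:

```python
import sqlite3

# Hypothetical dated sales for two regions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-01-01", "North", 100), ("2024-01-02", "North", 150),
    ("2024-01-01", "South", 80),  ("2024-01-02", "South", 100),
])

# Same PARTITION BY; adding ORDER BY switches the default frame to a running one.
rows = conn.execute("""
    SELECT region, sale_date, amount,
           SUM(amount) OVER (PARTITION BY region)                    AS partition_total,
           SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) AS running_total
    FROM sales
    ORDER BY region, sale_date
""").fetchall()

for row in rows:
    print(row)
# partition_total repeats per region (250, 180); running_total accumulates
```

Each North row shows partition_total 250, but running_total climbs 100 then 250; South shows 180 versus 80 then 180, and neither region's totals leak into the other.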
While window functions are generally efficient, PARTITION BY choices can significantly impact query performance. Understanding these considerations helps you write queries that perform well at scale.
```sql
-- TIP 1: Reuse the same PARTITION BY specification
-- BAD: Different specs force separate computation passes
SELECT
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg,
    SUM(salary) OVER (PARTITION BY LOWER(department)) as dept_sum  -- Different!
FROM employees;

-- GOOD: Same spec allows single-pass optimization
SELECT
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg,
    SUM(salary) OVER (PARTITION BY department) as dept_sum
FROM employees;

-- TIP 2: Use named window for consistency and clarity
SELECT
    department,
    salary,
    AVG(salary) OVER dept_window as dept_avg,
    SUM(salary) OVER dept_window as dept_sum,
    COUNT(*) OVER dept_window as dept_count
FROM employees
WINDOW dept_window AS (PARTITION BY department);

-- TIP 3: Pre-compute complex partition keys
-- Instead of partitioning by complex expression multiple times:
WITH prepped AS (
    SELECT
        *,
        CASE WHEN amount > 1000 THEN 'Large' ELSE 'Small' END as size_category
    FROM orders
)
SELECT
    *,
    COUNT(*) OVER (PARTITION BY size_category) as category_count,
    AVG(amount) OVER (PARTITION BY size_category) as category_avg
FROM prepped;
```

| Scenario | Impact | Mitigation |
|---|---|---|
| High cardinality partition key (e.g., user_id) | Many small partitions; high memory for partition tracking | Filter data first; consider sampling for analytics |
| Low cardinality partition key (e.g., status) | Few large partitions; single partitions may be massive | Often fine; watch for memory on very large tables |
| Skewed distribution (one huge partition) | Processing imbalance; one partition dominates runtime | Consider bucketing or filtering outliers |
| Complex expression in PARTITION BY | Expression evaluated per row | Pre-compute in CTE; use indexed computed columns |
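TIP 2's named-window syntax is widely supported; as a quick sketch, it works even in SQLite (3.25+), where one WINDOW definition feeds several functions:

```python
import sqlite3

# Small employees table for demonstrating a named window.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("Alice", "IT", 90000), ("Bob", "IT", 85000), ("Eve", "HR", 60000),
])

# One named window reused by several window functions (TIP 2 above).
rows = conn.execute("""
    SELECT name, department,
           AVG(salary) OVER dept_window AS dept_avg,
           COUNT(*)    OVER dept_window AS dept_count
    FROM employees
    WINDOW dept_window AS (PARTITION BY department)
    ORDER BY name
""").fetchall()

print(rows)
# IT rows share avg 87500.0 and count 2; HR has 60000.0 and count 1
```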
Modern database optimizers are sophisticated about window function execution. They may automatically combine multiple window functions with the same PARTITION BY and ORDER BY into a single pass. Check your database's EXPLAIN output to see how window functions are actually executed.
We've explored PARTITION BY comprehensively. Let's consolidate the key takeaways:

- PARTITION BY divides rows into independent analytical windows while preserving every row, unlike GROUP BY.
- Multiple columns form a composite partition key; each unique combination of values is its own partition.
- NULL values are treated as equal to each other for partitioning and share a single partition.
- Partitions can be defined by expressions (date parts, CASE bands, computed flags); pre-compute complex expressions in a CTE for clarity and performance.
- Combined with ORDER BY, partitions enable running totals, per-group rankings, LAG/LEAD comparisons, and top-N-per-group queries, and the default frame becomes a running one.
What's next:
Now that we've mastered partitioning data into windows, we'll explore ORDER BY in windows—understanding how row ordering within partitions affects calculations, enables running computations, and makes ranking functions meaningful.
You now have a comprehensive understanding of PARTITION BY—the mechanism that divides data into independent analytical windows. You can partition by columns, expressions, and multiple keys while understanding NULL handling and performance implications. Next, we'll dive deep into ORDER BY in windows.