The PARTITION BY clause is the window function equivalent of GROUP BY, but with a crucial difference: instead of collapsing rows into summary groups, it divides them into independent analytical windows while preserving every individual row.
When you write OVER (PARTITION BY department), you're telling the database: "Run this window function separately for each department, as if each department's rows were in their own isolated table."
This partitioning capability unlocks sophisticated analytical patterns: per-group rankings, running totals that reset for each group, and group-level aggregates displayed alongside row-level detail.
Mastering PARTITION BY is essential for leveraging window functions in real-world data analysis.
By the end of this page, you will understand how PARTITION BY divides data, how to partition by multiple columns, how NULL values are handled, performance implications of partitioning strategies, and advanced patterns using partitions for complex analytical queries.
At its core, PARTITION BY creates logical groupings (partitions) of rows within which the window function operates independently. Each partition is processed as if it were a separate dataset.
Syntax:

```sql
OVER (PARTITION BY expression1, expression2, ...)
```
Key characteristics:

- Every input row appears in the output; nothing is collapsed.
- The window function is computed independently within each partition.
- Rows belong to the same partition when their partition expressions are equal.
- Omitting PARTITION BY treats the entire result set as a single partition.
```sql
-- Sample dataset: Sales by region and product
-- | sale_id | region | product | amount |
-- |---------|--------|---------|--------|
-- | 1       | North  | Widget  | 100    |
-- | 2       | North  | Gadget  | 150    |
-- | 3       | North  | Widget  | 200    |
-- | 4       | South  | Widget  | 120    |
-- | 5       | South  | Gadget  | 180    |
-- | 6       | East   | Widget  | 90     |

-- Without PARTITION BY: global calculations
SELECT
    sale_id,
    region,
    amount,
    SUM(amount) OVER () as grand_total,  -- All rows: 840
    AVG(amount) OVER () as grand_avg     -- All rows: 140
FROM sales;

-- With PARTITION BY region: regional calculations
SELECT
    sale_id,
    region,
    amount,
    SUM(amount) OVER (PARTITION BY region) as region_total,
    AVG(amount) OVER (PARTITION BY region) as region_avg
FROM sales;

-- Result:
-- | sale_id | region | amount | region_total | region_avg |
-- |---------|--------|--------|--------------|------------|
-- | 1       | North  | 100    | 450          | 150        |  -- North partition
-- | 2       | North  | 150    | 450          | 150        |
-- | 3       | North  | 200    | 450          | 150        |
-- | 4       | South  | 120    | 300          | 150        |  -- South partition
-- | 5       | South  | 180    | 300          | 150        |
-- | 6       | East   | 90     | 90           | 90         |  -- East partition
```

Think of each partition as a separate window pane through which rows can see only other rows in the same partition. North region rows see only North region data through their window. South region rows see only South data through theirs. They're completely isolated from each other's calculations.
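This behavior is easy to check end to end. The sketch below uses Python's built-in sqlite3 module purely as a dependency-free way to run the query (SQLite supports window functions from version 3.25; the chapter's SQL otherwise leans toward PostgreSQL syntax):

```python
import sqlite3

# Build the sample dataset from the text in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INT, region TEXT, product TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "North", "Widget", 100), (2, "North", "Gadget", 150),
    (3, "North", "Widget", 200), (4, "South", "Widget", 120),
    (5, "South", "Gadget", 180), (6, "East", "Widget", 90),
])

# Every row is preserved, but each row's regional total comes only from its own region.
rows = conn.execute("""
    SELECT sale_id, region, amount,
           SUM(amount) OVER ()                    AS grand_total,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY sale_id
""").fetchall()

for row in rows:
    print(row)
# Row 1 prints (1, 'North', 100, 840, 450): grand total 840, North total 450
```

All six input rows come back, matching the result table above: North rows carry 450, South rows 300, and the lone East row 90.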
Understanding the difference between partitioned and non-partitioned window functions is crucial for selecting the right approach for your analysis.
```sql
-- Employee salary data
-- | emp_id | name  | dept | salary |
-- |--------|-------|------|--------|
-- | 1      | Alice | IT   | 90000  |
-- | 2      | Bob   | IT   | 80000  |
-- | 3      | Carol | HR   | 60000  |
-- | 4      | David | HR   | 65000  |

-- Side-by-side comparison
SELECT
    name,
    dept,
    salary,
    -- Global context
    AVG(salary) OVER () as company_avg,
    RANK() OVER (ORDER BY salary DESC) as company_rank,
    -- Department context
    AVG(salary) OVER (PARTITION BY dept) as dept_avg,
    RANK() OVER (PARTITION BY dept ORDER BY salary DESC) as dept_rank
FROM employees;

-- Result:
-- | name  | dept | salary | company_avg | company_rank | dept_avg | dept_rank |
-- |-------|------|--------|-------------|--------------|----------|-----------|
-- | Alice | IT   | 90000  | 73750       | 1            | 85000    | 1         |
-- | Bob   | IT   | 80000  | 73750       | 2            | 85000    | 2         |
-- | David | HR   | 65000  | 73750       | 3            | 62500    | 1         |
-- | Carol | HR   | 60000  | 73750       | 4            | 62500    | 2         |

-- Observations:
-- company_avg is 73750 for ALL rows (global)
-- dept_avg differs: IT=85000, HR=62500 (partition-specific)
-- company_rank: 1-4 across all employees
-- dept_rank: 1-2 within each department (resets per partition)
```

A powerful pattern is computing both global and partition-level metrics in the same query, as shown above. This enables analysis like 'Alice earns more than the company average (90K vs. 73.75K) and ranks #1 both in her department and company-wide, in part because IT salaries are higher overall.'
You can partition by multiple columns to create more granular groupings. When multiple columns are specified, rows must match on ALL columns to be in the same partition.
Syntax:

```sql
OVER (PARTITION BY column1, column2, column3, ...)
```
Think of it as creating a composite partition key, similar to a composite primary key in a table.
```sql
-- Sales data with multiple dimensions
-- | sale_id | year | quarter | region | amount |
-- |---------|------|---------|--------|--------|
-- | 1       | 2023 | Q1      | North  | 1000   |
-- | 2       | 2023 | Q1      | North  | 1200   |
-- | 3       | 2023 | Q1      | South  | 800    |
-- | 4       | 2023 | Q2      | North  | 1500   |
-- | 5       | 2024 | Q1      | North  | 1100   |
-- | 6       | 2024 | Q1      | North  | 1300   |

-- Partition by year AND quarter AND region
SELECT
    sale_id,
    year,
    quarter,
    region,
    amount,
    SUM(amount) OVER (PARTITION BY year, quarter, region) as period_region_total,
    AVG(amount) OVER (PARTITION BY year, quarter, region) as period_region_avg
FROM sales;

-- Result:
-- | sale_id | year | quarter | region | amount | period_region_total | period_region_avg |
-- |---------|------|---------|--------|--------|---------------------|-------------------|
-- | 1       | 2023 | Q1      | North  | 1000   | 2200                | 1100              |  -- 2023-Q1-North
-- | 2       | 2023 | Q1      | North  | 1200   | 2200                | 1100              |  -- 2023-Q1-North
-- | 3       | 2023 | Q1      | South  | 800    | 800                 | 800               |  -- 2023-Q1-South
-- | 4       | 2023 | Q2      | North  | 1500   | 1500                | 1500              |  -- 2023-Q2-North
-- | 5       | 2024 | Q1      | North  | 1100   | 2400                | 1200              |  -- 2024-Q1-North
-- | 6       | 2024 | Q1      | North  | 1300   | 2400                | 1200              |  -- 2024-Q1-North

-- Each unique (year, quarter, region) combination = one partition
```

Hierarchical analysis using different partition granularities:
A powerful technique is computing metrics at different levels of granularity in the same query:
```sql
-- Multiple granularity levels in one query
SELECT
    year,
    quarter,
    region,
    amount,
    -- Finest granularity: year + quarter + region
    SUM(amount) OVER (PARTITION BY year, quarter, region) as qtr_region_total,
    -- Medium granularity: year + region
    SUM(amount) OVER (PARTITION BY year, region) as year_region_total,
    -- Coarse granularity: region only
    SUM(amount) OVER (PARTITION BY region) as all_time_region_total,
    -- No partition: grand total
    SUM(amount) OVER () as grand_total
FROM sales;

-- This gives each row context at multiple levels:
-- "This Q1 2023 North sale of $1000 is part of:
--  - Q1 2023 North total: $2200
--  - 2023 North total: $3700 (Q1 + Q2)
--  - All-time North total: $6100
--  - Grand total: $6900"
```

This pattern is especially useful for percentage calculations: 'What percentage of quarterly regional sales does this transaction represent? What percentage of annual regional sales? What percentage of all sales ever?'
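The per-level totals quoted in the comments can be verified with another small sqlite3 sketch (same multi-dimensional sample data; SQLite 3.25+ required for window functions):

```python
import sqlite3

# Recreate the multi-dimensional sales data from the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INT, year INT, quarter TEXT, region TEXT, amount INT)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    (1, 2023, "Q1", "North", 1000), (2, 2023, "Q1", "North", 1200),
    (3, 2023, "Q1", "South", 800),  (4, 2023, "Q2", "North", 1500),
    (5, 2024, "Q1", "North", 1100), (6, 2024, "Q1", "North", 1300),
])

# Four granularities of SUM computed over the same rows in one query.
rows = conn.execute("""
    SELECT sale_id,
           SUM(amount) OVER (PARTITION BY year, quarter, region) AS qtr_region_total,
           SUM(amount) OVER (PARTITION BY year, region)          AS year_region_total,
           SUM(amount) OVER (PARTITION BY region)                AS all_time_region_total,
           SUM(amount) OVER ()                                   AS grand_total
    FROM sales
    ORDER BY sale_id
""").fetchall()

print(rows[0])  # sale 1 (2023-Q1-North): (1, 2200, 3700, 6100, 6900)
```

Sale 1 reports exactly the four context totals from the comment block: $2200 for its quarter-region, $3700 for its year-region, $6100 all-time for North, and $6900 overall.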
NULL values in PARTITION BY columns receive special treatment that differs from standard SQL NULL semantics.
The key rule: For PARTITION BY purposes, NULL values are considered equal to each other. All rows with NULL in the partition column(s) are grouped into the same partition.
This is a notable exception to the typical SQL rule where NULL = NULL evaluates to UNKNOWN rather than TRUE.
```sql
-- Employees with some NULL department values
-- | emp_id | name  | department | salary |
-- |--------|-------|------------|--------|
-- | 1      | Alice | IT         | 90000  |
-- | 2      | Bob   | IT         | 85000  |
-- | 3      | Carol | NULL       | 70000  |
-- | 4      | David | NULL       | 65000  |
-- | 5      | Eve   | HR         | 60000  |

SELECT
    name,
    department,
    salary,
    COUNT(*) OVER (PARTITION BY department) as dept_count,
    SUM(salary) OVER (PARTITION BY department) as dept_total
FROM employees;

-- Result:
-- | name  | department | salary | dept_count | dept_total |
-- |-------|------------|--------|------------|------------|
-- | Alice | IT         | 90000  | 2          | 175000     |
-- | Bob   | IT         | 85000  | 2          | 175000     |
-- | Carol | NULL       | 70000  | 2          | 135000     |  -- NULL partition
-- | David | NULL       | 65000  | 2          | 135000     |  -- NULL partition (together!)
-- | Eve   | HR         | 60000  | 1          | 60000      |

-- Carol and David are in the SAME partition despite NULL department
-- This is because NULL = NULL is treated as TRUE for PARTITION BY
```

Be aware of this behavior when your partition columns contain NULLs. If you don't want NULLs grouped together, you may need to use COALESCE or CASE to convert NULLs to distinct placeholder values, or filter them out before the window function executes.
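The NULL-grouping rule is easy to confirm for yourself. This sqlite3 sketch (SQLite 3.25+) rebuilds the table above and shows Carol and David sharing one partition:

```python
import sqlite3

# Employees table with NULL departments, as in the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INT, name TEXT, department TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, "Alice", "IT", 90000), (2, "Bob", "IT", 85000),
    (3, "Carol", None, 70000), (4, "David", None, 65000),
    (5, "Eve", "HR", 60000),
])

# NULL departments land in one shared partition, not one partition per row.
rows = conn.execute("""
    SELECT name, department,
           COUNT(*)    OVER (PARTITION BY department) AS dept_count,
           SUM(salary) OVER (PARTITION BY department) AS dept_total
    FROM employees
    ORDER BY emp_id
""").fetchall()

for row in rows:
    print(row)
# Carol and David both show dept_count 2 and dept_total 135000
```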
```sql
-- Option 1: Exclude NULL rows
SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees
WHERE department IS NOT NULL;

-- Option 2: Replace NULL with a placeholder (each NULL becomes unique)
SELECT
    name,
    department,
    salary,
    AVG(salary) OVER (
        PARTITION BY COALESCE(department, 'Unknown-' || emp_id::text)
    ) as dept_avg
FROM employees;
-- Each NULL department gets a unique partition

-- Option 3: Replace NULL with a single known value
SELECT
    name,
    COALESCE(department, 'Unassigned') as department,
    salary,
    AVG(salary) OVER (
        PARTITION BY COALESCE(department, 'Unassigned')
    ) as dept_avg
FROM employees;
-- All NULL departments grouped as 'Unassigned' partition
```

Multiple columns with NULLs:
When partitioning by multiple columns, the same NULL-equals-NULL rule applies to each column independently. Rows match if all corresponding partition column values are equal (treating NULLs as equal).
```sql
-- Multi-column with NULLs
-- | id | region | category | value |
-- |----|--------|----------|-------|
-- | 1  | North  | A        | 100   |
-- | 2  | North  | NULL     | 200   |
-- | 3  | NULL   | NULL     | 300   |
-- | 4  | NULL   | NULL     | 400   |

SELECT
    id,
    region,
    category,
    value,
    COUNT(*) OVER (PARTITION BY region, category) as partition_count
FROM data;

-- Result:
-- | id | region | category | value | partition_count |
-- |----|--------|----------|-------|-----------------|
-- | 1  | North  | A        | 100   | 1               |  -- (North, A) partition
-- | 2  | North  | NULL     | 200   | 1               |  -- (North, NULL) partition
-- | 3  | NULL   | NULL     | 300   | 2               |  -- (NULL, NULL) partition
-- | 4  | NULL   | NULL     | 400   | 2               |  -- (NULL, NULL) partition

-- Rows 3 and 4 share a partition because (NULL, NULL) = (NULL, NULL) for partitioning
```

PARTITION BY is not limited to simple column references. You can use expressions—computed values, function calls, CASE statements—to define partitions based on derived criteria.
This enables powerful patterns like partitioning by date parts (year, month, quarter), by value bands defined with CASE expressions, or by computed flags.
```sql
-- Partition by year extracted from a date column
SELECT
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY EXTRACT(YEAR FROM order_date)) as year_total,
    SUM(amount) OVER (
        PARTITION BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
    ) as month_total
FROM orders;

-- Partition by salary bands using CASE
SELECT
    name,
    salary,
    CASE
        WHEN salary < 50000 THEN 'Entry'
        WHEN salary < 80000 THEN 'Mid'
        WHEN salary < 120000 THEN 'Senior'
        ELSE 'Executive'
    END as salary_band,
    AVG(salary) OVER (
        PARTITION BY CASE
            WHEN salary < 50000 THEN 'Entry'
            WHEN salary < 80000 THEN 'Mid'
            WHEN salary < 120000 THEN 'Senior'
            ELSE 'Executive'
        END
    ) as band_avg_salary,
    COUNT(*) OVER (
        PARTITION BY CASE
            WHEN salary < 50000 THEN 'Entry'
            WHEN salary < 80000 THEN 'Mid'
            WHEN salary < 120000 THEN 'Senior'
            ELSE 'Executive'
        END
    ) as employees_in_band
FROM employees;
```

Partition by computed flags:
```sql
-- Partition by whether value exceeds threshold
SELECT
    name,
    sales,
    sales > 10000 as is_high_performer,
    AVG(sales) OVER (PARTITION BY sales > 10000) as group_avg,
    COUNT(*) OVER (PARTITION BY sales > 10000) as group_count
FROM salespeople;

-- Partition by calculated quarter
SELECT
    order_date,
    amount,
    'Q' || CEIL(EXTRACT(MONTH FROM order_date) / 3.0)::int as quarter,
    SUM(amount) OVER (
        PARTITION BY CEIL(EXTRACT(MONTH FROM order_date) / 3.0)
    ) as quarter_total
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024;

-- Partition by first letter of name (alphabetic grouping)
SELECT
    name,
    salary,
    LEFT(name, 1) as first_letter,
    COUNT(*) OVER (PARTITION BY LEFT(name, 1)) as names_in_letter_group
FROM employees;
```

When using complex expressions in PARTITION BY, the database computes them for every row. For better performance with repeated expressions (like CASE statements), consider using a CTE or subquery to compute the derived value once, then partition by the result column.
```sql
-- More efficient: compute expression once in CTE
WITH categorized AS (
    SELECT
        *,
        CASE
            WHEN salary < 50000 THEN 'Entry'
            WHEN salary < 80000 THEN 'Mid'
            WHEN salary < 120000 THEN 'Senior'
            ELSE 'Executive'
        END as salary_band
    FROM employees
)
SELECT
    name,
    salary,
    salary_band,
    AVG(salary) OVER (PARTITION BY salary_band) as band_avg,
    RANK() OVER (PARTITION BY salary_band ORDER BY salary DESC) as rank_in_band
FROM categorized;

-- Expression computed once per row, then reused for multiple window functions
```

When PARTITION BY and ORDER BY are used together, they work in sequence: PARTITION BY first divides the rows into independent groups, then ORDER BY sorts the rows within each partition before the window function is applied.
This combination enables some of the most powerful window function patterns.
```sql
-- Pattern 1: Running totals per group
-- Sales over time, by region
SELECT
    sale_date,
    region,
    amount,
    SUM(amount) OVER (
        PARTITION BY region
        ORDER BY sale_date
    ) as region_ytd
FROM sales;

-- Result: Each region has its own running total that resets
-- North: 100 → 250 → 400 → ...
-- South: 80 → 180 → 270 → ...

-- Pattern 2: Ranking within groups
SELECT
    name,
    department,
    salary,
    ROW_NUMBER() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) as dept_salary_rank
FROM employees;

-- Each department has ranks 1, 2, 3, ... independently

-- Pattern 3: Compare to previous value within group
SELECT
    month,
    region,
    revenue,
    LAG(revenue) OVER (
        PARTITION BY region
        ORDER BY month
    ) as prev_month_revenue,
    revenue - LAG(revenue) OVER (
        PARTITION BY region
        ORDER BY month
    ) as month_over_month_change
FROM monthly_revenue;

-- LAG looks at previous row WITHIN THE SAME REGION only
```

The "Top N per Group" pattern:
One of the most common and powerful uses of PARTITION BY + ORDER BY is selecting the top N rows from each group:
```sql
-- Find top 3 highest-paid employees per department
WITH ranked AS (
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) as rn
    FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rn <= 3;

-- Result shows only the top 3 earners from each department

-- Variation: Use RANK() if you want ties included
WITH ranked AS (
    SELECT
        name,
        department,
        salary,
        RANK() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) as rank
    FROM employees
)
SELECT name, department, salary, rank
FROM ranked
WHERE rank <= 3;

-- If 3 people tie for rank 1, all 3 are included (ranks 1,1,1 then 4...)
```

Remember: When ORDER BY is present in OVER, the default frame changes from 'all rows in partition' to 'rows from start of partition up to current row'. This is why SUM() OVER (PARTITION BY x ORDER BY y) gives running totals, not partition totals.
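The frame change is easiest to see with both forms side by side. This sqlite3 sketch (hypothetical figures; SQLite 3.25+) computes the same SUM with and without ORDER BY in the window:

```python
import sqlite3

# Hypothetical dated sales for two regions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-01-01", "North", 100), ("2024-01-02", "North", 150),
    ("2024-01-01", "South", 80),  ("2024-01-02", "South", 100),
])

# Same PARTITION BY; adding ORDER BY switches the default frame to a running one.
rows = conn.execute("""
    SELECT region, sale_date, amount,
           SUM(amount) OVER (PARTITION BY region)                    AS partition_total,
           SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) AS running_total
    FROM sales
    ORDER BY region, sale_date
""").fetchall()

for row in rows:
    print(row)
# partition_total repeats per region (250, 180); running_total accumulates
```

Each North row shows partition_total 250, but running_total climbs 100 then 250; South shows 180 versus 80 then 180, and neither region's totals leak into the other.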
While window functions are generally efficient, PARTITION BY choices can significantly impact query performance. Understanding these considerations helps you write queries that perform well at scale.
```sql
-- TIP 1: Reuse the same PARTITION BY specification
-- BAD: Different specs force separate computation passes
SELECT
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg,
    SUM(salary) OVER (PARTITION BY LOWER(department)) as dept_sum  -- Different!
FROM employees;

-- GOOD: Same spec allows single-pass optimization
SELECT
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg,
    SUM(salary) OVER (PARTITION BY department) as dept_sum
FROM employees;

-- TIP 2: Use named window for consistency and clarity
SELECT
    department,
    salary,
    AVG(salary) OVER dept_window as dept_avg,
    SUM(salary) OVER dept_window as dept_sum,
    COUNT(*) OVER dept_window as dept_count
FROM employees
WINDOW dept_window AS (PARTITION BY department);

-- TIP 3: Pre-compute complex partition keys
-- Instead of partitioning by complex expression multiple times:
WITH prepped AS (
    SELECT
        *,
        CASE WHEN amount > 1000 THEN 'Large' ELSE 'Small' END as size_category
    FROM orders
)
SELECT
    *,
    COUNT(*) OVER (PARTITION BY size_category) as category_count,
    AVG(amount) OVER (PARTITION BY size_category) as category_avg
FROM prepped;
```

| Scenario | Impact | Mitigation |
|---|---|---|
| High cardinality partition key (e.g., user_id) | Many small partitions; high memory for partition tracking | Filter data first; consider sampling for analytics |
| Low cardinality partition key (e.g., status) | Few large partitions; single partitions may be massive | Often fine; watch for memory on very large tables |
| Skewed distribution (one huge partition) | Processing imbalance; one partition dominates runtime | Consider bucketing or filtering outliers |
| Complex expression in PARTITION BY | Expression evaluated per row | Pre-compute in CTE; use indexed computed columns |
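TIP 2's named-window syntax is widely supported; as a quick sketch, it works even in SQLite (3.25+), where one WINDOW definition feeds several functions:

```python
import sqlite3

# Small employees table for demonstrating a named window.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("Alice", "IT", 90000), ("Bob", "IT", 85000), ("Eve", "HR", 60000),
])

# One named window reused by several window functions (TIP 2 above).
rows = conn.execute("""
    SELECT name, department,
           AVG(salary) OVER dept_window AS dept_avg,
           COUNT(*)    OVER dept_window AS dept_count
    FROM employees
    WINDOW dept_window AS (PARTITION BY department)
    ORDER BY name
""").fetchall()

print(rows)
# IT rows share avg 87500.0 and count 2; HR has 60000.0 and count 1
```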
Modern database optimizers are sophisticated about window function execution. They may automatically combine multiple window functions with the same PARTITION BY and ORDER BY into a single pass. Check your database's EXPLAIN output to see how window functions are actually executed.
We've explored PARTITION BY comprehensively. Let's consolidate the key takeaways:

- PARTITION BY divides rows into independent analytical windows while preserving every row, unlike GROUP BY.
- Multiple columns form a composite partition key; each unique combination of values is its own partition.
- NULL values are treated as equal to each other for partitioning and share a single partition.
- Partitions can be defined by expressions (date parts, CASE bands, computed flags); pre-compute complex expressions in a CTE for clarity and performance.
- Combined with ORDER BY, partitions enable running totals, per-group rankings, LAG/LEAD comparisons, and top-N-per-group queries, and the default frame becomes a running one.
What's next:
Now that we've mastered partitioning data into windows, we'll explore ORDER BY in windows—understanding how row ordering within partitions affects calculations, enables running computations, and makes ranking functions meaningful.
You now have a comprehensive understanding of PARTITION BY—the mechanism that divides data into independent analytical windows. You can partition by columns, expressions, and multiple keys while understanding NULL handling and performance implications. Next, we'll dive deep into ORDER BY in windows.