Window Functions Basics - Learning Module

ORDER BY in Windows

The Critical Role of Order in Window Functions

When you write OVER (ORDER BY sale_date), you're doing far more than sorting output—you're fundamentally changing how the window function calculates its result.

The ORDER BY clause within OVER serves three critical purposes:

Enables meaningful ranking — Functions like ROW_NUMBER, RANK, and DENSE_RANK require an ordering to assign positions
Defines row relationships — Offset functions like LAG and LEAD need to know what "previous" and "next" mean
Changes the default frame — Perhaps most subtly, ORDER BY changes which rows aggregate functions see

Understanding ORDER BY in window functions is essential because it behaves differently than the ORDER BY at the end of your query. The outer ORDER BY determines output sequence; ORDER BY in OVER determines calculation logic.

What You Will Learn

By the end of this page, you will understand how ORDER BY in OVER defines calculation order, why it's mandatory for certain functions, how it changes default frame behavior, how to handle ties and NULLs, and how to use multiple ORDER BY columns effectively.

ORDER BY in OVER: Fundamentals

The ORDER BY clause within OVER specifies the logical ordering of rows within each partition (or within the entire result set if no PARTITION BY is present).

Syntax:

OVER (
    [PARTITION BY ...]
    ORDER BY column1 [ASC|DESC] [NULLS FIRST|LAST],
             column2 [ASC|DESC] [NULLS FIRST|LAST],
             ...
)

Key points:

ORDER BY in OVER is independent of ORDER BY in the outer query
It defines the logical sequence for window function calculation
Multiple columns can be specified, with later columns breaking ties
Sort direction (ASC/DESC) and NULL handling can be specified per column

order_by_basics.sql
-- Key distinction: OVER ORDER BY vs outer ORDER BY
SELECT 
    name,
    hire_date,
    salary,
    -- OVER ORDER BY: determines how ROW_NUMBER is calculated
    ROW_NUMBER() OVER (ORDER BY salary DESC) as salary_rank
FROM employees
-- Outer ORDER BY: determines output row sequence
ORDER BY name ASC;
 
-- Result:
-- | name   | hire_date  | salary | salary_rank |
-- |--------|------------|--------|-------------|
-- | Alice  | 2022-03-15 | 90000  | 1           |  -- Output ordered by name
-- | Bob    | 2023-01-10 | 85000  | 2           |
-- | Carol  | 2021-06-01 | 70000  | 4           |
-- | David  | 2020-11-20 | 75000  | 3           |
 
-- salary_rank is based on salary DESC (highest = 1)
-- Output order is name ASC (alphabetical)
-- These are completely independent orderings

Don't Confuse the Two ORDER BYs

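The ORDER BY inside OVER controls CALCULATION order. The ORDER BY at the end of the query controls OUTPUT order. They serve different purposes and are applied at different stages: window functions are evaluated after FROM, WHERE, GROUP BY, and HAVING, while the outer ORDER BY sorts the finished result.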
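To see the two orderings side by side, here is a small runnable sketch. It swaps in SQLite through Python's standard sqlite3 module (assuming SQLite 3.25 or later, which added window functions) and reuses the four employees from the example above; the table layout is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, hire_date TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", "2022-03-15", 90000), ("Bob", "2023-01-10", 85000),
     ("Carol", "2021-06-01", 70000), ("David", "2020-11-20", 75000)],
)

# OVER (ORDER BY salary DESC) drives the calculation;
# the outer ORDER BY name only controls the sequence of output rows.
rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees
    ORDER BY name ASC
""").fetchall()

for name, salary_rank in rows:
    print(name, salary_rank)
```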

Functions That Require ORDER BY

While ORDER BY is technically optional for aggregate window functions, several window-only functions require it for meaningful results. Without ORDER BY, these functions have no defined row sequence to work with.

ORDER BY Requirements by Function
Category	Functions	ORDER BY Required?	Why?
Ranking	ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()	Logically required	Ranks need defined order to assign positions
Offset	LAG(), LEAD()	Logically required	'Previous' and 'next' need defined sequence
Value	FIRST_VALUE(), LAST_VALUE(), NTH_VALUE()	Logically required	'First', 'last', and 'Nth' need defined order
Distribution	PERCENT_RANK(), CUME_DIST()	Logically required	Percentile calculations need ordered values
Aggregate	SUM(), AVG(), COUNT(), MIN(), MAX()	Optional	Affects frame default; enables running totals

order_by_requirements.sql
-- Ranking functions NEED ORDER BY to be meaningful
-- Without it, the database picks an arbitrary order
 
SELECT 
    name,
    salary,
    -- BAD: ORDER BY omitted - arbitrary ranking
    ROW_NUMBER() OVER () as arbitrary_rank,
    -- GOOD: ORDER BY specified - meaningful ranking
    ROW_NUMBER() OVER (ORDER BY salary DESC) as salary_rank
FROM employees;
 
-- The arbitrary_rank values are unpredictable and may change between executions
 
-- LAG/LEAD need ORDER BY to define "previous" and "next"
SELECT 
    name,
    hire_date,
    salary,
    -- BAD: What does "previous" mean without order?
    LAG(salary) OVER () as arbitrary_prev,
    -- GOOD: "Previous by hire date"
    LAG(salary) OVER (ORDER BY hire_date) as prev_by_date,
    -- ALSO GOOD: "Previous by salary rank"
    LAG(salary) OVER (ORDER BY salary DESC) as prev_by_salary
FROM employees;
 
-- FIRST_VALUE/LAST_VALUE need ORDER BY
SELECT 
    month,
    revenue,
    -- Earliest month's revenue (no PARTITION BY, so this is not per year)
    FIRST_VALUE(revenue) OVER (ORDER BY month) as first_month_rev,
    -- "Highest revenue month" (first after DESC sort)
    FIRST_VALUE(revenue) OVER (ORDER BY revenue DESC) as highest_rev
FROM monthly_revenue;
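A runnable sketch of the LAG example, again assuming SQLite 3.25+ via Python's sqlite3 module and the same illustrative employees table. The two LAG columns use different OVER (ORDER BY ...) keys, so "previous" points at different rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, hire_date TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", "2022-03-15", 90000), ("Bob", "2023-01-10", 85000),
     ("Carol", "2021-06-01", 70000), ("David", "2020-11-20", 75000)],
)

rows = conn.execute("""
    SELECT name,
           LAG(salary) OVER (ORDER BY hire_date)   AS prev_by_date,   -- previous hire
           LAG(salary) OVER (ORDER BY salary DESC) AS prev_by_salary  -- next-higher salary
    FROM employees
    ORDER BY name
""").fetchall()

for row in rows:
    print(row)  # LAG returns None (SQL NULL) when there is no previous row
```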

Syntactically Optional ≠ Semantically Optional

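Some databases (PostgreSQL, MySQL, and SQLite, for example) accept ROW_NUMBER() OVER () or LAG(salary) OVER () without an error, but the results are meaningless or non-deterministic. Others, such as SQL Server and Oracle, reject a missing ORDER BY for these functions. Either way, always include ORDER BY for ranking, offset, and value functions.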
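SQLite is one of the engines that accepts the omission. In this sketch (SQLite 3.25+ via Python's sqlite3, illustrative data), RANK() OVER () treats every row as a peer and returns 1 for all of them, which is valid SQL but useless as a ranking:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", 90000), ("Bob", 85000), ("Carol", 70000)],
)

rows = conn.execute("""
    SELECT name,
           RANK() OVER ()                     AS no_order_rank,  -- accepted, every row is a peer
           RANK() OVER (ORDER BY salary DESC) AS salary_rank     -- meaningful ranking
    FROM employees
    ORDER BY name
""").fetchall()
print(rows)
```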

How ORDER BY Changes Frame Defaults

One of the most critical—and often surprising—effects of ORDER BY in OVER is how it changes the default window frame for aggregate functions.

The rule:

Without ORDER BY: Default frame is all rows in partition
With ORDER BY: Default frame is partition start up to the current row, plus any peers (rows with the same ORDER BY value)

This distinction explains why SUM(amount) OVER () gives the same total for every row, but SUM(amount) OVER (ORDER BY date) gives a running total.

Without ORDER BY

•Frame: All rows in partition
•Equivalent to: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
•SUM() returns same total for every row
•AVG() returns same average for every row
•Use case: Compare each row to group total

With ORDER BY

•Frame: Start through current row (and its peers)
•Equivalent to: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
•SUM() returns running total
•AVG() returns running average
•Use case: Cumulative calculations
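The two "Equivalent to" claims can be checked directly. This sketch (SQLite 3.25+ via Python's sqlite3, using the sample transactions from this page) compares each default frame with its explicit spelling:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("2024-01-01", 100), ("2024-01-02", 200),
     ("2024-01-03", 150), ("2024-01-04", 300)],
)

rows = conn.execute("""
    SELECT date,
           SUM(amount) OVER () AS total_default,
           SUM(amount) OVER (
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS total_explicit,
           SUM(amount) OVER (ORDER BY date) AS running_default,
           SUM(amount) OVER (
               ORDER BY date
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_explicit
    FROM transactions
    ORDER BY date
""").fetchall()

for row in rows:
    print(row)
```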

frame_default_comparison.sql
-- Sample data:
-- | date       | amount |
-- |------------|--------|
-- | 2024-01-01 | 100    |
-- | 2024-01-02 | 200    |
-- | 2024-01-03 | 150    |
-- | 2024-01-04 | 300    |
 
-- WITHOUT ORDER BY: all rows in partition
SELECT 
    date,
    amount,
    SUM(amount) OVER () as total               -- Same for all rows
FROM transactions;
 
-- Result:
-- | date       | amount | total |
-- |------------|--------|-------|
-- | 2024-01-01 | 100    | 750   |  -- Total of ALL rows
-- | 2024-01-02 | 200    | 750   |
-- | 2024-01-03 | 150    | 750   |
-- | 2024-01-04 | 300    | 750   |
 
-- WITH ORDER BY: start to current row (running total)
SELECT 
    date,
    amount,
    SUM(amount) OVER (ORDER BY date) as running_total
FROM transactions;
 
-- Result:
-- | date       | amount | running_total |
-- |------------|--------|---------------|
-- | 2024-01-01 | 100    | 100           |  -- Just row 1
-- | 2024-01-02 | 200    | 300           |  -- Rows 1+2
-- | 2024-01-03 | 150    | 450           |  -- Rows 1+2+3
-- | 2024-01-04 | 300    | 750           |  -- All rows

This Catches Many Developers Off Guard

Many developers are surprised when adding ORDER BY to an aggregate window function changes the results from 'same value for all rows' to 'running total'. This is by design and is one of the most useful features of window functions, but you must be aware of it.

explicit_frame_override.sql
-- If you want ORDER BY but NOT a running total, specify frame explicitly
SELECT 
    date,
    amount,
    -- Running total (default frame with ORDER BY)
    SUM(amount) OVER (ORDER BY date) as running_total,
    -- Full partition total (explicit frame overrides default)
    SUM(amount) OVER (
        ORDER BY date 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as total
FROM transactions;
 
-- Result:
-- | date       | amount | running_total | total |
-- |------------|--------|---------------|-------|
-- | 2024-01-01 | 100    | 100           | 750   |
-- | 2024-01-02 | 200    | 300           | 750   |
-- | 2024-01-03 | 150    | 450           | 750   |
-- | 2024-01-04 | 300    | 750           | 750   |
 
-- The explicit frame gives same total for all rows, 
-- even though ORDER BY is specified

Multiple ORDER BY Columns

When you specify multiple columns in ORDER BY, they work hierarchically:

First column is the primary sort key
Second column breaks ties in the first
Third column breaks ties in the first two
And so on...

This is essential for creating deterministic rankings when primary sort keys have duplicates.

multiple_order_columns.sql
-- Problem: Salary ties cause non-deterministic row numbers
-- | name   | salary |
-- |--------|--------|
-- | Alice  | 90000  |
-- | Bob    | 85000  |
-- | Carol  | 85000  |  -- Tied with Bob
-- | David  | 80000  |
 
-- Single column: non-deterministic for ties
SELECT 
    name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) as rank
FROM employees;
 
-- Result: Either Bob=2, Carol=3 OR Carol=2, Bob=3
-- The database picks arbitrarily!
 
-- Multiple columns: deterministic tiebreaker
SELECT 
    name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC, name ASC) as rank
FROM employees;
 
-- Result (always the same):
-- | name  | salary | rank |
-- |-------|--------|------|
-- | Alice | 90000  | 1    |
-- | Bob   | 85000  | 2    |  -- Bob before Carol (B < C alphabetically)
-- | Carol | 85000  | 3    |
-- | David | 80000  | 4    |

Different directions per column:

Each column can have its own ASC/DESC direction:

mixed_directions.sql
-- Rank by salary descending, but for ties, by tenure ascending
-- (highest paid first, and for same salary, newest employees first)
SELECT 
    name,
    salary,
    hire_date,
    ROW_NUMBER() OVER (
        ORDER BY salary DESC,      -- Highest salary first
                 hire_date DESC    -- Most recent hire first (for ties)
    ) as rank
FROM employees;
 
-- Another example: Department A-Z, then salary high-to-low within each
SELECT 
    name,
    department,
    salary,
    ROW_NUMBER() OVER (
        ORDER BY department ASC,
                 salary DESC
    ) as org_chart_order
FROM employees;
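A small runnable check of mixed directions, assuming SQLite 3.25+ via Python's sqlite3; the names, salaries, and hire dates are made up so that two employees tie on salary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER, hire_date TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", 85000, "2021-05-01"), ("Ben", 85000, "2023-02-01"),
     ("Cy", 90000, "2020-01-15"), ("Di", 70000, "2022-07-01")],
)

# salary DESC is the primary key; hire_date DESC breaks the Ann/Ben tie
# in favor of the more recent hire.
rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY salary DESC, hire_date DESC) AS pos
    FROM employees
    ORDER BY pos
""").fetchall()
print(rows)
```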

Best Practice: Always Break Ties

For deterministic results, always include enough ORDER BY columns to eliminate ties. If your primary sort key can have duplicates, add a secondary key. In the worst case, include a unique identifier (like primary key or row ID) as the final tiebreaker.
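A sketch of the unique-identifier tiebreaker, assuming SQLite 3.25+ via Python's sqlite3 and a hypothetical emp_id primary key. Bob and Carol tie on salary, and emp_id settles the tie the same way on every run (here Carol wins because her emp_id is lower, not because of her name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Alice", 90000), (2, "Carol", 85000), (3, "Bob", 85000), (4, "David", 80000)],
)

# emp_id is unique, so after salary DESC no two rows can still be tied
rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY salary DESC, emp_id ASC) AS pos
    FROM employees
    ORDER BY pos
""").fetchall()
print(rows)
```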

Handling NULLs in ORDER BY

NULL values in ORDER BY columns require special consideration. Different databases have different defaults for where NULLs appear in the sort order, and the SQL standard provides explicit control through NULLS FIRST and NULLS LAST.

Default NULL ordering by database:

Default NULL Ordering by Database
Database	ASC Default	DESC Default
PostgreSQL	NULLS LAST	NULLS FIRST
Oracle	NULLS LAST	NULLS FIRST
SQL Server	NULLS FIRST (smallest)	NULLS LAST (smallest)
MySQL	NULLS FIRST (smallest)	NULLS LAST (smallest)
SQLite	NULLS FIRST (smallest)	NULLS LAST (smallest)
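The SQLite row of the table can be checked with a short sketch (Python's sqlite3 module, SQLite 3.25+, illustrative bonus data). A name tiebreaker is added so the two NULL rows come back in a fixed order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, bonus INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", 5000), ("Bob", None), ("Carol", 3000), ("David", None), ("Eve", 7000)],
)

def ordered_names(order_by):
    """Names in ROW_NUMBER order for a fixed, trusted ORDER BY string."""
    sql = (
        f"SELECT name, ROW_NUMBER() OVER (ORDER BY {order_by}) AS pos "
        "FROM employees ORDER BY pos"
    )
    return [name for name, _pos in conn.execute(sql)]

# SQLite treats NULL as smaller than any value
asc_order = ordered_names("bonus ASC, name")    # NULLs come first
desc_order = ordered_names("bonus DESC, name")  # NULLs come last
print(asc_order)
print(desc_order)
```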

null_ordering.sql
-- Data with NULL values:
-- | name   | bonus  |
-- |--------|--------|
-- | Alice  | 5000   |
-- | Bob    | NULL   |
-- | Carol  | 3000   |
-- | David  | NULL   |
-- | Eve    | 7000   |
 
-- Default behavior (PostgreSQL: NULLS LAST for ASC)
SELECT 
    name,
    bonus,
    ROW_NUMBER() OVER (ORDER BY bonus ASC) as rank
FROM employees;
 
-- Result (PostgreSQL):
-- | name  | bonus | rank |
-- |-------|-------|------|
-- | Carol | 3000  | 1    |
-- | Alice | 5000  | 2    |
-- | Eve   | 7000  | 3    |
-- | Bob   | NULL  | 4    |  -- NULLs sorted last (Bob/David order is not guaranteed)
-- | David | NULL  | 5    |
 
-- Explicit NULLS FIRST
SELECT 
    name,
    bonus,
    ROW_NUMBER() OVER (ORDER BY bonus ASC NULLS FIRST) as rank
FROM employees;
 
-- Result:
-- | name  | bonus | rank |
-- |-------|-------|------|
-- | Bob   | NULL  | 1    |  -- NULLs now first (Bob/David order is not guaranteed)
-- | David | NULL  | 2    |
-- | Carol | 3000  | 3    |
-- | Alice | 5000  | 4    |
-- | Eve   | 7000  | 5    |

Explicit NULL handling is recommended for portable code:

explicit_null_handling.sql
-- Explicit control gives consistent behavior on every database that supports NULLS FIRST/LAST
SELECT 
    name,
    completion_date,
    RANK() OVER (
        ORDER BY completion_date ASC NULLS LAST  -- Incomplete items rank last
    ) as completion_order
FROM tasks;
 
-- For descending sorts, you often want NULLS LAST as well
SELECT 
    name,
    score,
    RANK() OVER (
        ORDER BY score DESC NULLS LAST  -- No score = bottom of rankings
    ) as leaderboard_rank
FROM players;
 
-- NULLS FIRST for DESC when NULLs represent "not yet evaluated"
SELECT 
    application_id,
    review_score,
    ROW_NUMBER() OVER (
        ORDER BY review_score DESC NULLS FIRST  -- Unreviewed first (priority)
    ) as review_queue
FROM applications;

SQL Server and MySQL Alternative

SQL Server and MySQL don't support the NULLS FIRST/LAST syntax. Emulate it with a CASE expression as the leading sort key: ORDER BY CASE WHEN column IS NULL THEN 0 ELSE 1 END, column puts NULLs first, and swapping to THEN 1 ELSE 0 puts them last.
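The CASE workaround relies only on standard CASE and ORDER BY, so its logic can be exercised in any engine. This sketch runs it in SQLite via Python's sqlite3 (3.25+ assumed) on the bonus data from above, with name added as a final tiebreaker:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, bonus INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Alice", 5000), ("Bob", None), ("Carol", 3000), ("David", None), ("Eve", 7000)],
)

def ordered_names(order_by):
    """Names in ROW_NUMBER order for a fixed, trusted ORDER BY string."""
    sql = (
        f"SELECT name, ROW_NUMBER() OVER (ORDER BY {order_by}) AS pos "
        "FROM employees ORDER BY pos"
    )
    return [name for name, _pos in conn.execute(sql)]

# Emulated NULLS LAST on an ascending sort: non-NULL rows sort key 0, NULL rows 1
nulls_last_asc = ordered_names(
    "CASE WHEN bonus IS NULL THEN 1 ELSE 0 END, bonus ASC, name")
# Emulated NULLS FIRST on a descending sort: NULL rows sort key 0, so they lead
nulls_first_desc = ordered_names(
    "CASE WHEN bonus IS NULL THEN 0 ELSE 1 END, bonus DESC, name")
print(nulls_last_asc)
print(nulls_first_desc)
```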

Understanding Peers and Ties

When ORDER BY doesn't uniquely order rows—because multiple rows have the same value(s) in the ORDER BY column(s)—those rows are called peers. How window functions handle peers varies by function and frame type.

Key behaviors:

ROW_NUMBER() — Always assigns unique numbers; peer order is arbitrary
RANK() — Gives peers the same rank; next rank skips numbers
DENSE_RANK() — Gives peers the same rank; next rank is consecutive
Aggregates with RANGE frame — Include all peers in the frame together
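A runnable check of these behaviors on the same six scores as the peers_comparison.sql example, assuming SQLite 3.25+ via Python's sqlite3. Only RANK and DENSE_RANK are checked value by value, because ROW_NUMBER's order among peers is arbitrary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Alice", 100), ("Bob", 95), ("Carol", 95),
     ("David", 90), ("Eve", 90), ("Frank", 85)],
)

rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,   -- unique, peers arbitrary
           RANK()       OVER (ORDER BY score DESC) AS rnk,       -- peers share, gaps follow
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk  -- peers share, no gaps
    FROM students
    ORDER BY score DESC, name
""").fetchall()

for row in rows:
    print(row)
```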

peers_comparison.sql
-- Data with ties:
-- | name   | score |
-- |--------|-------|
-- | Alice  | 100   |
-- | Bob    | 95    |
-- | Carol  | 95    |  -- Tied with Bob
-- | David  | 90    |
-- | Eve    | 90    |  -- Tied with David
-- | Frank  | 85    |
 
-- Comparing ranking functions for ties
SELECT 
    name,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) as row_num,   -- Unique, arbitrary for peers
    RANK() OVER (ORDER BY score DESC) as rank,            -- Same for peers, gaps
    DENSE_RANK() OVER (ORDER BY score DESC) as dense_rank -- Same for peers, no gaps
FROM students
ORDER BY score DESC;
 
-- Result:
-- | name  | score | row_num | rank | dense_rank |
-- |-------|-------|---------|------|------------|
-- | Alice | 100   | 1       | 1    | 1          |
-- | Bob   | 95    | 2       | 2    | 2          |  -- Bob and Carol: rank 2
-- | Carol | 95    | 3       | 2    | 2          |  -- (row_num differs)
-- | David | 90    | 4       | 4    | 3          |  -- RANK skips 3, DENSE_RANK doesn't
-- | Eve   | 90    | 5       | 4    | 3          |
-- | Frank | 85    | 6       | 6    | 4          |

RANGE vs ROWS frame with peers:

The default frame when ORDER BY is specified is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With RANGE, all peers are included together:

range_vs_rows_peers.sql
-- Data with ties in ORDER BY column:
-- | date       | amount |
-- |------------|--------|
-- | 2024-01-01 | 100    |
-- | 2024-01-02 | 200    |
-- | 2024-01-02 | 150    |  -- Same date (peers)
-- | 2024-01-03 | 300    |
 
-- Default frame (RANGE): includes all peers together
SELECT 
    date,
    amount,
    SUM(amount) OVER (ORDER BY date) as range_sum  -- Default is RANGE
FROM transactions;
 
-- Result:
-- | date       | amount | range_sum |
-- |------------|--------|-----------|
-- | 2024-01-01 | 100    | 100       |
-- | 2024-01-02 | 200    | 450       |  -- Includes BOTH Jan 2 rows (200+150+100)
-- | 2024-01-02 | 150    | 450       |  -- Same! Both peers see all Jan 2 data
-- | 2024-01-03 | 300    | 750       |
 
-- Explicit ROWS frame: physical row order, not peers
SELECT 
    date,
    amount,
    SUM(amount) OVER (
        ORDER BY date 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as rows_sum
FROM transactions;
 
-- Result:
-- | date       | amount | rows_sum |
-- |------------|--------|----------|
-- | 2024-01-01 | 100    | 100      |
-- | 2024-01-02 | 200    | 300      |  -- Different from next row!
-- | 2024-01-02 | 150    | 450      |  -- Depends on physical row position
-- | 2024-01-03 | 300    | 750      |

RANGE Frame Behavior with Peers

The default RANGE frame runs from the partition start through the current row and all of its peers, so rows with the same ORDER BY value get the same aggregate result. If you want a strictly row-by-row running total, use an explicit ROWS frame, and add a unique tiebreaker column to ORDER BY so the order among the former peers is deterministic.
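A sketch contrasting the two frames on the tied-date data above, assuming SQLite 3.25+ via Python's sqlite3 and a hypothetical id column used as a unique tiebreaker for the ROWS version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "2024-01-01", 100), (2, "2024-01-02", 200),
     (3, "2024-01-02", 150), (4, "2024-01-03", 300)],
)

rows = conn.execute("""
    SELECT id,
           -- Default RANGE frame: both 2024-01-02 rows are peers and share one sum
           SUM(amount) OVER (ORDER BY date) AS range_sum,
           -- ROWS frame plus the unique id tiebreaker: row by row, deterministic
           SUM(amount) OVER (
               ORDER BY date, id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS rows_sum
    FROM transactions
    ORDER BY id
""").fetchall()
print(rows)
```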

ORDER BY with Expressions

Just like PARTITION BY, ORDER BY can use expressions instead of simple column references. This enables powerful patterns like ordering by computed values, conditional sorting, and derived metrics.

order_by_expressions.sql
-- Order by approximate age (year difference only; ignores whether the birthday has passed)
SELECT 
    name,
    birth_date,
    ROW_NUMBER() OVER (
        ORDER BY EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM birth_date) DESC
    ) as age_rank
FROM employees;
 
-- Order by absolute value (distance from zero)
SELECT 
    account_id,
    balance,
    RANK() OVER (ORDER BY ABS(balance) DESC) as magnitude_rank
FROM accounts;
 
-- Order by derived category, then by value
SELECT 
    name,
    salary,
    ROW_NUMBER() OVER (
        ORDER BY 
            CASE 
                WHEN salary >= 100000 THEN 1
                WHEN salary >= 70000 THEN 2
                ELSE 3
            END,
            salary DESC
    ) as tiered_rank
FROM employees;
 
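-- Fiscal year-to-date (April start): the fiscal-year expression goes in
-- PARTITION BY so the total resets each year; ordering by it alone
-- would produce one running total that never resets
SELECT 
    transaction_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY 
            CASE 
                WHEN EXTRACT(MONTH FROM transaction_date) >= 4 
                THEN EXTRACT(YEAR FROM transaction_date)
                ELSE EXTRACT(YEAR FROM transaction_date) - 1
            END
        ORDER BY transaction_date
    ) as fiscal_ytd
FROM transactions;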
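To get a total that resets each fiscal year (April start), the fiscal-year expression has to go in PARTITION BY; ordering by it alone produces one running total across all years. This sketch assumes SQLite 3.25+ via Python's sqlite3, and since SQLite has no EXTRACT, it swaps in strftime; the transactions are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transaction_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("2023-03-15", 100), ("2023-04-10", 200), ("2023-12-01", 50),
     ("2024-03-31", 25), ("2024-04-01", 400)],
)

rows = conn.execute("""
    SELECT transaction_date,
           -- Fiscal year starts in April: Jan-Mar belong to the previous year
           SUM(amount) OVER (
               PARTITION BY CASE
                   WHEN CAST(strftime('%m', transaction_date) AS INTEGER) >= 4
                   THEN CAST(strftime('%Y', transaction_date) AS INTEGER)
                   ELSE CAST(strftime('%Y', transaction_date) AS INTEGER) - 1
               END
               ORDER BY transaction_date
           ) AS fiscal_ytd,
           -- No PARTITION BY: one running total that never resets
           SUM(amount) OVER (ORDER BY transaction_date) AS all_time_running
    FROM transactions
    ORDER BY transaction_date
""").fetchall()

for row in rows:
    print(row)
```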

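Ordering by aggregates:

Window functions are evaluated after GROUP BY, so an aggregate such as SUM(sales) can appear in OVER (ORDER BY ...) as long as the query groups correctly. The commented query below fails only because it has no GROUP BY. Aggregating first in a subquery or CTE is an equally valid, often more readable alternative: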

order_by_aggregate.sql
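-- WRONG: no GROUP BY, so name is neither grouped nor aggregated
-- SELECT name, SUM(sales), 
--        RANK() OVER (ORDER BY SUM(sales) DESC)
-- FROM salespeople;
 
-- CORRECT: with GROUP BY, the aggregate can go directly in the window ORDER BY
SELECT 
    name,
    SUM(sales) as total_sales,
    RANK() OVER (ORDER BY SUM(sales) DESC) as rank
FROM salespeople
GROUP BY name;
 
-- ALSO CORRECT: Aggregate first in a CTE, then rank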
WITH aggregated AS (
    SELECT name, SUM(sales) as total_sales
    FROM salespeople
    GROUP BY name
)
SELECT 
    name,
    total_sales,
    RANK() OVER (ORDER BY total_sales DESC) as rank
FROM aggregated;
 
-- Or inline subquery:
SELECT 
    name,
    total_sales,
    RANK() OVER (ORDER BY total_sales DESC) as rank
FROM (
    SELECT name, SUM(sales) as total_sales
    FROM salespeople
    GROUP BY name
) aggregated;
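Because window functions are evaluated after GROUP BY, the aggregate can also appear directly in OVER (ORDER BY ...) when the query groups by name. This sketch (SQLite 3.25+ via Python's sqlite3, made-up sales rows) checks that the direct form and the CTE form agree, including a tie:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salespeople (name TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO salespeople VALUES (?, ?)",
    [("Ann", 100), ("Ann", 50), ("Ben", 120), ("Cy", 80), ("Cy", 70)],
)

# Direct form: the window runs over the grouped rows, so SUM(sales) is allowed here
direct = conn.execute("""
    SELECT name, SUM(sales) AS total_sales,
           RANK() OVER (ORDER BY SUM(sales) DESC) AS sales_rank
    FROM salespeople
    GROUP BY name
    ORDER BY name
""").fetchall()

# CTE form: aggregate first, then rank the aggregated rows
via_cte = conn.execute("""
    WITH aggregated AS (
        SELECT name, SUM(sales) AS total_sales
        FROM salespeople
        GROUP BY name
    )
    SELECT name, total_sales,
           RANK() OVER (ORDER BY total_sales DESC) AS sales_rank
    FROM aggregated
    ORDER BY name
""").fetchall()

print(direct)
print(via_cte)
```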

Summary: ORDER BY in Windows

We've explored ORDER BY in window functions comprehensively. Let's consolidate the key takeaways:

Key Takeaways

•ORDER BY in OVER defines calculation order, not output order — It's independent of the outer query's ORDER BY.
•Ranking and offset functions require ORDER BY — Without it, ROW_NUMBER, RANK, LAG, LEAD produce arbitrary or meaningless results.
•ORDER BY changes the default frame — Without ORDER BY: all rows. With ORDER BY: start to current row (running calculation).
•Multiple columns create deterministic tiebreakers — Secondary columns break ties in primary columns; include unique IDs for fully deterministic ordering.
•NULLS FIRST/LAST controls NULL positioning — Database defaults vary; explicit specification ensures portable code.
•Peers (ties) behave differently by function — ROW_NUMBER gives unique values; RANK/DENSE_RANK give same values to peers; RANGE frame includes all peers.
•Expressions are valid ORDER BY keys — Use CASE, functions, and computed values for flexible ordering logic.

What's next:

Now that we understand how partitioning and ordering work together, we'll explore the final piece of the window function puzzle: the frame specification. Frames give you precise control over exactly which rows around the current row participate in the window function calculation.
