Ranking Functions - Learning Module

Loading content...

0/241

RANK() Function

Ranking with Recognition for Ties

While ROW_NUMBER() answers "what is the sequential position of each row?", it glosses over a subtle but important detail: what happens when multiple rows deserve the same position?

Consider a race where two runners cross the finish line simultaneously. They both earned first place—but ROW_NUMBER() would arbitrarily assign one runner as #1 and the other as #2. This isn't fair, and it doesn't reflect reality.

RANK() solves this problem by recognizing ties: rows with equal ORDER BY values receive the same rank. But it goes further—after tied rows, it leaves gaps in the sequence to preserve the ordinal meaning of the numbers. If three runners tie for first, the next runner is ranked 4th, not 2nd.

What You Will Master

By the end of this page, you will understand RANK()'s complete semantics: how it handles ties by producing gaps, why gaps preserve ordinal meaning, how it differs from ROW_NUMBER() and DENSE_RANK(), and how to apply it correctly in scenarios where tie recognition matters.

Fundamental Definition and Syntax

RANK() is a window function that assigns a rank to each row within a partition. Rows with equal ORDER BY values receive the same rank, and the next rank value accounts for all preceding rows (including ties).

Formal Definition:

RANK() assigns to each row a number equal to 1 plus the count of rows that strictly precede it in the ordering. Rows with identical ORDER BY values receive identical ranks. After a group of N tied rows at rank R, the next distinct value receives rank R + N (not R + 1).

The Gap Formula:

Rank of current row = 1 + (number of rows with strictly smaller ORDER BY values)

This formula produces gaps after ties because tied rows don't "count" toward advancing the rank—but subsequent rows must account for all prior rows regardless of ties.

RANK() Syntax
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Basic Syntax
RANK() OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression [ASC | DESC], ...
)
 
-- Minimal form
RANK() OVER (ORDER BY column_name)
 
-- Full form with partitioning
RANK() OVER (
    PARTITION BY category_column
    ORDER BY score DESC
)
 
-- RANK() syntax is identical to ROW_NUMBER()
-- The difference is purely semantic

Key Semantic Properties

•Tie Recognition — Rows with identical ORDER BY values receive identical rank values
•Gap Production — After N tied rows at rank R, next rank is R + N (gaps appear)
•Count-Based — Rank = 1 + count of rows strictly preceding in order
•Partition Independence — Ranking restarts within each partition
•Determinism — With the same ORDER BY values, the same ranks are always produced (unlike ROW_NUMBER with ties)

Gap Mechanics: Understanding the Numbers

The defining characteristic of RANK() is its gap-producing behavior. Let's see this in action with a concrete example.

Example Scenario:

Consider exam scores where multiple students achieved the same score:

RANK() with Ties
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Student exam scores with duplicates
CREATE TABLE exam_scores (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(100),
    subject VARCHAR(50),
    score INT
);
 
INSERT INTO exam_scores VALUES
(1, 'Alice', 'Math', 95),
(2, 'Bob', 'Math', 88),
(3, 'Carol', 'Math', 95),    -- Ties with Alice
(4, 'David', 'Math', 82),
(5, 'Emma', 'Math', 88),     -- Ties with Bob
(6, 'Frank', 'Math', 88),    -- Three-way tie!
(7, 'Grace', 'Math', 75),
(8, 'Henry', 'Math', 82);    -- Ties with David
 
-- Compare ROW_NUMBER vs RANK
SELECT 
    student_name,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
    RANK() OVER (ORDER BY score DESC) AS rank_with_gaps
FROM exam_scores
ORDER BY score DESC, student_name;

Comparison: ROW_NUMBER vs RANK
student_name	score	row_num	rank_with_gaps
Alice	95	1	1
Carol	95	2	1
Bob	88	3	3
Emma	88	4	3
Frank	88	5	3
David	82	6	6
Henry	82	7	6
Grace	75	8	8

Analysis of the Results:

Score	Students	ROW_NUMBER	RANK
95	Alice, Carol	1, 2	1, 1
88	Bob, Emma, Frank	3, 4, 5	3, 3, 3
82	David, Henry	6, 7	6, 6
75	Grace	8	8

Key Observations:

Score 95 (2 students): ROW_NUMBER arbitrarily assigns 1 and 2. RANK gives both 1 (they tied for first).
Score 88 (3 students): ROW_NUMBER assigns 3, 4, 5. RANK gives all three rank 3 (tied for third).
Score 82 (2 students): The gap appears! RANK jumps from 3 to 6, not 4. Why? Because 5 students have higher scores (2 at 95 + 3 at 88), so the 6th and 7th students get rank 6.
Score 75 (1 student): Grace gets rank 8, not 7, because 7 students scored higher.

The Gap Rule Illustrated:

After 2 students tied at rank 1, next rank = 1 + 2 = 3
After 3 students tied at rank 3, next rank = 3 + 3 = 6
After 2 students tied at rank 6, next rank = 6 + 2 = 8

Ordinal Meaning Preserved

RANK() preserves ordinal meaning: rank 6 means '6th place' in the traditional sense—5 distinct positions exist above you. Grace at rank 8 truly has 7 students ahead of her. This semantic accuracy matters for official rankings, competitions, and anywhere position numbers carry real-world meaning.

RANK vs ROW_NUMBER: Critical Differences

Understanding when to use RANK() versus ROW_NUMBER() is essential. They answer subtly different questions and serve different purposes.

RANK vs ROW_NUMBER Feature Comparison
Feature	ROW_NUMBER()	RANK()
Tie handling	Assigns distinct arbitrary numbers	Assigns identical numbers to ties
Gaps in sequence	Never (always 1, 2, 3, ...)	Always after ties (1, 1, 3, ...)
Maximum value	Always equals row count	May be less than row count
Uniqueness	Every row gets unique number	Tied rows share the same number
Determinism with ties	Non-deterministic (arbitrary)	Deterministic (same ties = same rank)
Best for	Pagination, deduplication, sequencing	Competitions, leaderboards, ordinal ranking

Decision Framework:

Use ROW_NUMBER() when you need unique identifiers — Pagination (each page needs distinct rows), deduplication (picking one row among duplicates), sequential numbering for reports
Use RANK() when ties should be recognized with ordinal accuracy — Competition results (tied for 1st place), academic ranking (same GPA = same class rank), performance evaluations where equal performers should rank equally

Choosing the Right Function
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Scenario: Top 3 salespeople per region
 
-- Using ROW_NUMBER: Get exactly 3 rows per region
-- (If 2 tie for 3rd, arbitrarily picks one)
WITH rn_ranked AS (
    SELECT 
        region,
        salesperson,
        total_sales,
        ROW_NUMBER() OVER (
            PARTITION BY region 
            ORDER BY total_sales DESC
        ) AS position
    FROM sales_summary
)
SELECT * FROM rn_ranked WHERE position <= 3;
-- Guaranteed exactly 3 rows per region
 
-- Using RANK: Get all who placed in top 3 positions
-- (If 2 tie for 3rd, both are included)
WITH rank_based AS (
    SELECT 
        region,
        salesperson,
        total_sales,
        RANK() OVER (
            PARTITION BY region 
            ORDER BY total_sales DESC
        ) AS position
    FROM sales_summary
)
SELECT * FROM rank_based WHERE position <= 3;
-- May return more than 3 rows per region if ties exist

The Count Implication

When filtering with RANK() <= N, you may get more than N rows due to ties. If you need exactly N rows regardless of ties, use ROW_NUMBER(). If you need all rows that ranked in the top N positions (including ties), use RANK().

Partitioned Ranking with Gaps

Like ROW_NUMBER(), RANK() operates independently within each partition. The gap mechanics apply separately within each group.

Partitioned RANK() Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Sales rankings by product category with ties
CREATE TABLE product_sales (
    product_id INT,
    product_name VARCHAR(100),
    category VARCHAR(50),
    quarterly_revenue DECIMAL(12,2)
);
 
INSERT INTO product_sales VALUES
(1, 'Widget Pro', 'Electronics', 150000),
(2, 'Widget Basic', 'Electronics', 120000),
(3, 'Gadget X', 'Electronics', 150000),   -- Ties with Widget Pro
(4, 'Office Chair', 'Furniture', 80000),
(5, 'Standing Desk', 'Furniture', 95000),
(6, 'Monitor Stand', 'Furniture', 95000), -- Ties with Standing Desk
(7, 'Laptop Bag', 'Accessories', 45000),
(8, 'USB Hub', 'Accessories', 45000),     -- Ties with Laptop Bag
(9, 'Webcam', 'Accessories', 45000);      -- Three-way tie!
 
SELECT 
    category,
    product_name,
    quarterly_revenue,
    RANK() OVER (
        PARTITION BY category 
        ORDER BY quarterly_revenue DESC
    ) AS category_rank
FROM product_sales
ORDER BY category, category_rank;

Partitioned RANK Results
category	product_name	quarterly_revenue	category_rank
Accessories	Laptop Bag	45000.00	1
Accessories	USB Hub	45000.00	1
Accessories	Webcam	45000.00	1
Electronics	Widget Pro	150000.00	1
Electronics	Gadget X	150000.00	1
Electronics	Widget Basic	120000.00	3
Furniture	Standing Desk	95000.00	1
Furniture	Monitor Stand	95000.00	1
Furniture	Office Chair	80000.00	3

Observations:

Electronics: Two products tie for #1 at $150K revenue. Widget Basic gets rank 3 (gap over 2).
Furniture: Two products tie for #1 at $95K. Office Chair gets rank 3.
Accessories: Three products have identical revenue! All three share rank 1. There's no rank 2 or 3 in this partition—and no higher revenue products exist.

Critical Insight for Accessories:

The three-way tie at rank 1 in Accessories means there is no rank 2 or rank 3 within that partition. If you filter for category_rank <= 3, you'd get all 3 Accessories products—but they're all rank 1, not ranks 1, 2, and 3.

Gaps Can Be Large

In extreme cases, gaps can be significant. If 100 products tie for rank 1, the next product would be rank 101. This is mathematically correct—100 products rank higher—but can be surprising if you expect to see rank 2.

Mathematical Foundations of RANK()

Understanding the mathematical definition of RANK() helps clarify its behavior and enables you to reason about edge cases.

Formal Definition:

For a row r with ORDER BY value v within partition P:

RANK(r) = 1 + |{ r' ∈ P : ORDER_BY(r') > ORDER_BY(r) }|

In words: the rank equals 1 plus the count of rows in the partition whose ORDER BY value is strictly greater (for descending) or strictly less (for ascending) than the current row's value.

Alternative Formulation:

RANK(r) = 1 + (position of r in sorted list, using 0-based tie grouping)

Mathematical Properties:

Mathematical Properties of RANK()

•Minimum value: Always 1 (at least one row is highest)
•Maximum value: N (total rows in partition), only if all values are unique
•Sum invariant: Sum of all distinct ranks × their multiplicity = N × (N+1) / 2 (only when no ties)
•Gap formula: After k rows tied at rank r, next rank = r + k
•Uniqueness within value: All rows with identical ORDER BY values have identical ranks
•Ordering preservation: If ORDER_BY(a) > ORDER_BY(b), then RANK(a) < RANK(b) (for descending order)

Verifying Mathematical Properties
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Verify: rank + row_count at that rank = next_rank
WITH ranked_data AS (
    SELECT 
        score,
        RANK() OVER (ORDER BY score DESC) AS rnk
    FROM exam_scores
),
grouped AS (
    SELECT 
        score,
        rnk,
        COUNT(*) OVER (PARTITION BY rnk) AS ties_at_rank,
        LEAD(rnk) OVER (ORDER BY rnk) AS next_rank
    FROM ranked_data
)
SELECT DISTINCT
    rnk,
    ties_at_rank,
    rnk + ties_at_rank AS expected_next,
    next_rank AS actual_next,
    CASE WHEN rnk + ties_at_rank = next_rank 
         THEN 'Formula verified' 
         ELSE 'Final group (no next)' 
    END AS verification
FROM grouped
ORDER BY rnk;

The Statistical Interpretation:

RANK() has a natural statistical interpretation: it represents the percentile position from above. A rank of 1 means no one is ahead; a rank of 10 means 9 are ahead. This maps directly to concepts like:

Class rank in education
Competitive standings in sports
Performance percentiles in business

Unlike percentile ranks (which are relative to N and range 0-100%), RANK() produces absolute counts that directly answer: How many others are ahead of me?

Computing Percentile Rank

To convert RANK() to a percentile: (N - RANK + 1) / N * 100. For rank 3 out of 100 rows: (100 - 3 + 1) / 100 * 100 = 98th percentile. This means 98% of rows are at or below this position.

Common Use Cases for RANK()

RANK() shines in specific scenarios where tie recognition with ordinal accuracy matters. Let's explore the most common applications.

Use Case 1: Competition Leaderboards

An competition where tied scores should share positions:

Competition Leaderboard
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- Gaming leaderboard with proper tie handling
SELECT 
    player_name,
    region,
    total_points,
    RANK() OVER (
        PARTITION BY region 
        ORDER BY total_points DESC
    ) AS regional_rank,
    RANK() OVER (
        ORDER BY total_points DESC
    ) AS global_rank
FROM players
WHERE season = 'current'
ORDER BY global_rank, regional_rank;
 
-- Results show:
-- * Players with same points get same rank
-- * Gaps accurately reflect how many players are ahead
-- * Works correctly for both regional and global rankings

Use Case 2: Academic Class Ranking

Students with identical GPAs receive identical class ranks:

Academic Class Ranking
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Class rank based on cumulative GPA
WITH student_rankings AS (
    SELECT 
        student_id,
        student_name,
        graduation_year,
        cumulative_gpa,
        RANK() OVER (
            PARTITION BY graduation_year 
            ORDER BY cumulative_gpa DESC
        ) AS class_rank,
        COUNT(*) OVER (PARTITION BY graduation_year) AS class_size
    FROM students
    WHERE enrollment_status = 'active'
)
SELECT 
    student_name,
    graduation_year,
    ROUND(cumulative_gpa, 3) AS gpa,
    class_rank,
    class_size,
    ROUND(class_rank * 100.0 / class_size, 1) AS percentile_rank
FROM student_rankings
ORDER BY graduation_year, class_rank;

Use Case 3: Top-N Including Ties

When you want "top 3" to include everyone who placed in the top 3 positions:

Top-N Including Ties
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Find all winners (1st, 2nd, 3rd place including ties)
WITH ranked_contestants AS (
    SELECT 
        contestant_name,
        category,
        score,
        RANK() OVER (
            PARTITION BY category 
            ORDER BY score DESC
        ) AS placement
    FROM contest_results
)
SELECT 
    category,
    placement,
    contestant_name,
    score,
    CASE placement
        WHEN 1 THEN 'Gold'
        WHEN 2 THEN 'Silver'
        WHEN 3 THEN 'Bronze'
    END AS medal
FROM ranked_contestants
WHERE placement <= 3
ORDER BY category, placement;
 
-- Note: If 3 contestants tie for 2nd, you get:
-- 1st place (1 winner)
-- 2nd place (3 winners, all get Silver)
-- No 3rd place (it would be rank 5)

Ties Consume Ranks

With RANK(), ties for 2nd place can eliminate 3rd place entirely! If filtering for placement <= 3, you might get only ranks 1 and 2 if ties push the next rank to 4+. This accurately reflects competition semantics but may surprise you if you expect exactly 3 rankings.

Use Case 4: Finding Rank Percentiles

Identifying where entries fall relative to the group:

Rank-Based Percentiles
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Identify top performers (top 10% by rank)
WITH ranked_sales AS (
    SELECT 
        salesperson_id,
        salesperson_name,
        total_sales,
        RANK() OVER (ORDER BY total_sales DESC) AS sales_rank,
        COUNT(*) OVER () AS total_salespeople
    FROM quarterly_sales
)
SELECT 
    salesperson_name,
    total_sales,
    sales_rank,
    ROUND(sales_rank * 100.0 / total_salespeople, 1) AS percentile_from_top,
    CASE 
        WHEN sales_rank <= CEIL(total_salespeople * 0.10) THEN 'Top 10%'
        WHEN sales_rank <= CEIL(total_salespeople * 0.25) THEN 'Top 25%'
        WHEN sales_rank <= CEIL(total_salespeople * 0.50) THEN 'Top 50%'
        ELSE 'Bottom 50%'
    END AS performance_tier
FROM ranked_sales
ORDER BY sales_rank;

RANK in Complex Analytical Queries

RANK() becomes particularly powerful when combined with other SQL features for sophisticated analysis.

Pattern 1: Rank Changes Over Time

Tracking how ranks evolve across periods:

Tracking Rank Changes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Track how product rankings change month-over-month
WITH monthly_ranks AS (
    SELECT 
        product_id,
        product_name,
        sale_month,
        monthly_revenue,
        RANK() OVER (
            PARTITION BY sale_month 
            ORDER BY monthly_revenue DESC
        ) AS monthly_rank
    FROM monthly_product_sales
),
rank_changes AS (
    SELECT 
        curr.product_name,
        curr.sale_month,
        curr.monthly_revenue,
        curr.monthly_rank AS current_rank,
        LAG(curr.monthly_rank) OVER (
            PARTITION BY curr.product_id 
            ORDER BY curr.sale_month
        ) AS previous_rank
    FROM monthly_ranks curr
)
SELECT 
    product_name,
    sale_month,
    monthly_revenue,
    current_rank,
    previous_rank,
    COALESCE(previous_rank - current_rank, 0) AS rank_improvement,
    CASE 
        WHEN previous_rank IS NULL THEN 'New Entry'
        WHEN current_rank < previous_rank THEN '↑ Improved'
        WHEN current_rank > previous_rank THEN '↓ Declined'
        ELSE '→ Unchanged'
    END AS trend
FROM rank_changes
ORDER BY sale_month, current_rank;

Pattern 2: Multi-Criteria Ranking

Ranking by primary and secondary criteria with RANK():

Multi-Criteria Ranking
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Rank employees by performance, with ties broken by tenure
SELECT 
    employee_name,
    department,
    performance_score,
    years_of_service,
    -- Primary rank: by performance only
    RANK() OVER (
        PARTITION BY department 
        ORDER BY performance_score DESC
    ) AS performance_rank,
    -- Secondary rank: performance first, then tenure
    RANK() OVER (
        PARTITION BY department 
        ORDER BY performance_score DESC, years_of_service DESC
    ) AS composite_rank,
    -- Demonstrate difference: same performance = same rank
    -- OR same performance + same tenure = same rank
    CASE 
        WHEN RANK() OVER (PARTITION BY department ORDER BY performance_score DESC) 
             = RANK() OVER (PARTITION BY department ORDER BY performance_score DESC, years_of_service DESC)
        THEN 'Primary tie-breaker: tenure'
        ELSE 'Ranks differ due to tie resolution'
    END AS note
FROM employees
ORDER BY department, composite_rank;

Pattern 3: Rank-Based Grouping

Grouping data based on rank ranges:

Rank-Based Grouping
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Group products into rank-based tiers
WITH ranked_products AS (
    SELECT 
        product_id,
        product_name,
        category,
        monthly_sales,
        RANK() OVER (
            PARTITION BY category 
            ORDER BY monthly_sales DESC
        ) AS category_rank,
        COUNT(*) OVER (PARTITION BY category) AS products_in_category
    FROM product_performance
)
SELECT 
    category,
    CASE 
        WHEN category_rank = 1 THEN 'Category Leader'
        WHEN category_rank <= 3 THEN 'Top Performers'
        WHEN category_rank <= CEIL(products_in_category * 0.25) THEN 'Upper Tier'
        WHEN category_rank <= CEIL(products_in_category * 0.75) THEN 'Mid Tier'
        ELSE 'Lower Tier'
    END AS product_tier,
    COUNT(*) AS product_count,
    SUM(monthly_sales) AS tier_revenue,
    AVG(monthly_sales) AS avg_sales
FROM ranked_products
GROUP BY category, 
    CASE 
        WHEN category_rank = 1 THEN 'Category Leader'
        WHEN category_rank <= 3 THEN 'Top Performers'
        WHEN category_rank <= CEIL(products_in_category * 0.25) THEN 'Upper Tier'
        WHEN category_rank <= CEIL(products_in_category * 0.75) THEN 'Mid Tier'
        ELSE 'Lower Tier'
    END
ORDER BY category, 
    CASE product_tier
        WHEN 'Category Leader' THEN 1
        WHEN 'Top Performers' THEN 2
        WHEN 'Upper Tier' THEN 3
        WHEN 'Mid Tier' THEN 4
        ELSE 5
    END;

Performance Considerations

RANK() shares the same execution characteristics as ROW_NUMBER() but with an additional comparison step for detecting ties.

Execution Model:

Partition the data (if PARTITION BY specified)
Sort within each partition (ORDER BY)
Stream through sorted data, assigning ranks
Compare each row's ORDER BY value to the previous row
If equal: assign same rank; if different: assign 1 + row position

Cost Factors:

Performance Impact Factors

•Sorting — The primary cost; ensure indexes support the ORDER BY clause
•Partition size — Large partitions require more memory for sorting
•Tie detection — Minimal overhead; simple comparison during streaming
•Multiple RANK() calls — If using same PARTITION BY and ORDER BY, optimizer may share the sort
•Filtering on RANK() — Requires CTE/subquery; may prevent some optimizations

Index Optimization for RANK()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
-- For query: RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
-- Optimal index:
CREATE INDEX idx_dept_salary ON employees (department, salary DESC);
 
-- This allows:
-- 1. Seek to each department partition
-- 2. Read rows in salary order (no sort needed)
-- 3. Calculate RANK() in single pass
 
-- For multiple partitioned rankings, compound indexes help:
CREATE INDEX idx_product_ranking ON products (category, monthly_sales DESC);
CREATE INDEX idx_region_ranking ON products (region, monthly_sales DESC);
 
-- Be mindful of index size vs query frequency tradeoff

Shared Window Specifications

When using multiple ranking functions with identical PARTITION BY and ORDER BY, define a named window: WINDOW w AS (PARTITION BY dept ORDER BY salary DESC), then use RANK() OVER w, ROW_NUMBER() OVER w. Some databases optimize to sort only once.

Summary: Mastering RANK()

RANK() provides tie-aware ranking with ordinal accuracy through its gap-producing behavior. Let's consolidate the essential knowledge:

Key Takeaways

•Definition — RANK() assigns the same rank to tied rows and creates gaps after ties to preserve ordinal meaning
•Gap Formula — After N tied rows at rank R, the next distinct value receives rank R + N
•Ordinal Accuracy — Rank R means exactly R-1 rows precede you, regardless of ties among them
•vs ROW_NUMBER() — Use RANK() when ties should be recognized; use ROW_NUMBER() when unique numbers are needed
•COUNT Implication — Filtering on RANK() <= N may return more than N rows if ties exist
•Mathematical Basis — RANK(r) = 1 + count of rows strictly preceding r in order
•Performance — Same as ROW_NUMBER(); optimize with indexes matching PARTITION BY + ORDER BY

What's Next:

With RANK()'s gap-producing behavior understood, we'll explore DENSE_RANK() in the next page. DENSE_RANK() addresses a limitation of RANK(): its gaps can be inconvenient when you want consecutive rank numbers despite ties. DENSE_RANK() produces ranks 1, 1, 2 (no gap) instead of 1, 1, 3 (RANK's behavior). Understanding when each is appropriate completes your ranking function toolkit.

Page Complete

You now possess comprehensive knowledge of the RANK() function—from its mathematical foundation through complex analytical patterns. Combined with ROW_NUMBER(), you can handle most ranking scenarios. DENSE_RANK() adds the final nuance for gap-free ranking.

RANK() Function

Ranking with Recognition for Ties

While ROW_NUMBER() answers "what is the sequential position of each row?", it glosses over a subtle but important detail: what happens when multiple rows deserve the same position?

What You Will Master

Fundamental Definition and Syntax

Formal Definition:

RANK() assigns to each row a number equal to 1 plus the count of rows that strictly precede it in the ordering. Rows with identical ORDER BY values receive identical ranks. After a group of N tied rows at rank R, the next distinct value receives rank R + N (not R + 1).

The Gap Formula:

Rank of current row = 1 + (number of rows with strictly smaller ORDER BY values)

This formula produces gaps after ties because tied rows don't "count" toward advancing the rank—but subsequent rows must account for all prior rows regardless of ties.

RANK() Syntax
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Basic Syntax
RANK() OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression [ASC | DESC], ...
)
 
-- Minimal form
RANK() OVER (ORDER BY column_name)
 
-- Full form with partitioning
RANK() OVER (
    PARTITION BY category_column
    ORDER BY score DESC
)
 
-- RANK() syntax is identical to ROW_NUMBER()
-- The difference is purely semantic

Key Semantic Properties

•Tie Recognition — Rows with identical ORDER BY values receive identical rank values
•Gap Production — After N tied rows at rank R, next rank is R + N (gaps appear)
•Count-Based — Rank = 1 + count of rows strictly preceding in order
•Partition Independence — Ranking restarts within each partition
•Determinism — With the same ORDER BY values, the same ranks are always produced (unlike ROW_NUMBER with ties)

Gap Mechanics: Understanding the Numbers

The defining characteristic of RANK() is its gap-producing behavior. Let's see this in action with a concrete example.

Example Scenario:

Consider exam scores where multiple students achieved the same score:

RANK() with Ties
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Student exam scores with duplicates
CREATE TABLE exam_scores (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(100),
    subject VARCHAR(50),
    score INT
);
 
INSERT INTO exam_scores VALUES
(1, 'Alice', 'Math', 95),
(2, 'Bob', 'Math', 88),
(3, 'Carol', 'Math', 95),    -- Ties with Alice
(4, 'David', 'Math', 82),
(5, 'Emma', 'Math', 88),     -- Ties with Bob
(6, 'Frank', 'Math', 88),    -- Three-way tie!
(7, 'Grace', 'Math', 75),
(8, 'Henry', 'Math', 82);    -- Ties with David
 
-- Compare ROW_NUMBER vs RANK
SELECT 
    student_name,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
    RANK() OVER (ORDER BY score DESC) AS rank_with_gaps
FROM exam_scores
ORDER BY score DESC, student_name;

Comparison: ROW_NUMBER vs RANK
student_name	score	row_num	rank_with_gaps
Alice	95	1	1
Carol	95	2	1
Bob	88	3	3
Emma	88	4	3
Frank	88	5	3
David	82	6	6
Henry	82	7	6
Grace	75	8	8

Analysis of the Results:

Score	Students	ROW_NUMBER	RANK
95	Alice, Carol	1, 2	1, 1
88	Bob, Emma, Frank	3, 4, 5	3, 3, 3
82	David, Henry	6, 7	6, 6
75	Grace	8	8

Key Observations:

Score 95 (2 students): ROW_NUMBER arbitrarily assigns 1 and 2. RANK gives both 1 (they tied for first).
Score 88 (3 students): ROW_NUMBER assigns 3, 4, 5. RANK gives all three rank 3 (tied for third).
Score 82 (2 students): The gap appears! RANK jumps from 3 to 6, not 4. Why? Because 5 students have higher scores (2 at 95 + 3 at 88), so the 6th and 7th students get rank 6.
Score 75 (1 student): Grace gets rank 8, not 7, because 7 students scored higher.

The Gap Rule Illustrated:

After 2 students tied at rank 1, next rank = 1 + 2 = 3
After 3 students tied at rank 3, next rank = 3 + 3 = 6
After 2 students tied at rank 6, next rank = 6 + 2 = 8

Ordinal Meaning Preserved

RANK vs ROW_NUMBER: Critical Differences

Understanding when to use RANK() versus ROW_NUMBER() is essential. They answer subtly different questions and serve different purposes.

RANK vs ROW_NUMBER Feature Comparison
Feature	ROW_NUMBER()	RANK()
Tie handling	Assigns distinct arbitrary numbers	Assigns identical numbers to ties
Gaps in sequence	Never (always 1, 2, 3, ...)	Always after ties (1, 1, 3, ...)
Maximum value	Always equals row count	May be less than row count
Uniqueness	Every row gets unique number	Tied rows share the same number
Determinism with ties	Non-deterministic (arbitrary)	Deterministic (same ties = same rank)
Best for	Pagination, deduplication, sequencing	Competitions, leaderboards, ordinal ranking

Decision Framework:

Use ROW_NUMBER() when you need unique identifiers — Pagination (each page needs distinct rows), deduplication (picking one row among duplicates), sequential numbering for reports
Use RANK() when ties should be recognized with ordinal accuracy — Competition results (tied for 1st place), academic ranking (same GPA = same class rank), performance evaluations where equal performers should rank equally

Choosing the Right Function
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Scenario: Top 3 salespeople per region
 
-- Using ROW_NUMBER: Get exactly 3 rows per region
-- (If 2 tie for 3rd, arbitrarily picks one)
WITH rn_ranked AS (
    SELECT 
        region,
        salesperson,
        total_sales,
        ROW_NUMBER() OVER (
            PARTITION BY region 
            ORDER BY total_sales DESC
        ) AS position
    FROM sales_summary
)
SELECT * FROM rn_ranked WHERE position <= 3;
-- Guaranteed exactly 3 rows per region
 
-- Using RANK: Get all who placed in top 3 positions
-- (If 2 tie for 3rd, both are included)
WITH rank_based AS (
    SELECT 
        region,
        salesperson,
        total_sales,
        RANK() OVER (
            PARTITION BY region 
            ORDER BY total_sales DESC
        ) AS position
    FROM sales_summary
)
SELECT * FROM rank_based WHERE position <= 3;
-- May return more than 3 rows per region if ties exist

The Count Implication

Partitioned Ranking with Gaps

Like ROW_NUMBER(), RANK() operates independently within each partition. The gap mechanics apply separately within each group.

Partitioned RANK() Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Sales rankings by product category with ties
CREATE TABLE product_sales (
    product_id INT,
    product_name VARCHAR(100),
    category VARCHAR(50),
    quarterly_revenue DECIMAL(12,2)
);
 
INSERT INTO product_sales VALUES
(1, 'Widget Pro', 'Electronics', 150000),
(2, 'Widget Basic', 'Electronics', 120000),
(3, 'Gadget X', 'Electronics', 150000),   -- Ties with Widget Pro
(4, 'Office Chair', 'Furniture', 80000),
(5, 'Standing Desk', 'Furniture', 95000),
(6, 'Monitor Stand', 'Furniture', 95000), -- Ties with Standing Desk
(7, 'Laptop Bag', 'Accessories', 45000),
(8, 'USB Hub', 'Accessories', 45000),     -- Ties with Laptop Bag
(9, 'Webcam', 'Accessories', 45000);      -- Three-way tie!
 
SELECT 
    category,
    product_name,
    quarterly_revenue,
    RANK() OVER (
        PARTITION BY category 
        ORDER BY quarterly_revenue DESC
    ) AS category_rank
FROM product_sales
ORDER BY category, category_rank;

Partitioned RANK Results
category	product_name	quarterly_revenue	category_rank
Accessories	Laptop Bag	45000.00	1
Accessories	USB Hub	45000.00	1
Accessories	Webcam	45000.00	1
Electronics	Widget Pro	150000.00	1
Electronics	Gadget X	150000.00	1
Electronics	Widget Basic	120000.00	3
Furniture	Standing Desk	95000.00	1
Furniture	Monitor Stand	95000.00	1
Furniture	Office Chair	80000.00	3

Observations:

Electronics: Two products tie for #1 at $150K revenue. Widget Basic gets rank 3 (gap over 2).
Furniture: Two products tie for #1 at $95K. Office Chair gets rank 3.
Accessories: Three products have identical revenue! All three share rank 1. There's no rank 2 or 3 in this partition—and no higher revenue products exist.

Critical Insight for Accessories:

Gaps Can Be Large

Mathematical Foundations of RANK()

Understanding the mathematical definition of RANK() helps clarify its behavior and enables you to reason about edge cases.

Formal Definition:

For a row r with ORDER BY value v within partition P:

RANK(r) = 1 + |{ r' ∈ P : ORDER_BY(r') > ORDER_BY(r) }|

In words: the rank equals 1 plus the count of rows in the partition whose ORDER BY value is strictly greater (for descending) or strictly less (for ascending) than the current row's value.

Alternative Formulation:

RANK(r) = 1 + (position of r in sorted list, using 0-based tie grouping)

Mathematical Properties:

Mathematical Properties of RANK()

•Minimum value: Always 1 (at least one row is highest)
•Maximum value: N (total rows in partition), only if all values are unique
•Sum invariant: Sum of all distinct ranks × their multiplicity = N × (N+1) / 2 (only when no ties)
•Gap formula: After k rows tied at rank r, next rank = r + k
•Uniqueness within value: All rows with identical ORDER BY values have identical ranks
•Ordering preservation: If ORDER_BY(a) > ORDER_BY(b), then RANK(a) < RANK(b) (for descending order)

Verifying Mathematical Properties
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Verify: rank + row_count at that rank = next_rank
WITH ranked_data AS (
    SELECT 
        score,
        RANK() OVER (ORDER BY score DESC) AS rnk
    FROM exam_scores
),
grouped AS (
    SELECT 
        score,
        rnk,
        COUNT(*) OVER (PARTITION BY rnk) AS ties_at_rank,
        LEAD(rnk) OVER (ORDER BY rnk) AS next_rank
    FROM ranked_data
)
SELECT DISTINCT
    rnk,
    ties_at_rank,
    rnk + ties_at_rank AS expected_next,
    next_rank AS actual_next,
    CASE WHEN rnk + ties_at_rank = next_rank 
         THEN 'Formula verified' 
         ELSE 'Final group (no next)' 
    END AS verification
FROM grouped
ORDER BY rnk;

The Statistical Interpretation:

Class rank in education
Competitive standings in sports
Performance percentiles in business

Unlike percentile ranks (which are relative to N and range 0-100%), RANK() produces absolute counts that directly answer: How many others are ahead of me?

Computing Percentile Rank

To convert RANK() to a percentile: (N - RANK + 1) / N * 100. For rank 3 out of 100 rows: (100 - 3 + 1) / 100 * 100 = 98th percentile. This means 98% of rows are at or below this position.

Common Use Cases for RANK()

RANK() shines in specific scenarios where tie recognition with ordinal accuracy matters. Let's explore the most common applications.

Use Case 1: Competition Leaderboards

An competition where tied scores should share positions:

Competition Leaderboard
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- Gaming leaderboard with proper tie handling
SELECT 
    player_name,
    region,
    total_points,
    RANK() OVER (
        PARTITION BY region 
        ORDER BY total_points DESC
    ) AS regional_rank,
    RANK() OVER (
        ORDER BY total_points DESC
    ) AS global_rank
FROM players
WHERE season = 'current'
ORDER BY global_rank, regional_rank;
 
-- Results show:
-- * Players with same points get same rank
-- * Gaps accurately reflect how many players are ahead
-- * Works correctly for both regional and global rankings

Use Case 2: Academic Class Ranking

Students with identical GPAs receive identical class ranks:

Academic Class Ranking
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Class rank based on cumulative GPA
WITH student_rankings AS (
    SELECT 
        student_id,
        student_name,
        graduation_year,
        cumulative_gpa,
        RANK() OVER (
            PARTITION BY graduation_year 
            ORDER BY cumulative_gpa DESC
        ) AS class_rank,
        COUNT(*) OVER (PARTITION BY graduation_year) AS class_size
    FROM students
    WHERE enrollment_status = 'active'
)
SELECT 
    student_name,
    graduation_year,
    ROUND(cumulative_gpa, 3) AS gpa,
    class_rank,
    class_size,
    ROUND(class_rank * 100.0 / class_size, 1) AS percentile_rank
FROM student_rankings
ORDER BY graduation_year, class_rank;

Use Case 3: Top-N Including Ties

When you want "top 3" to include everyone who placed in the top 3 positions:

Top-N Including Ties
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Find all winners (1st, 2nd, 3rd place including ties)
WITH ranked_contestants AS (
    SELECT 
        contestant_name,
        category,
        score,
        RANK() OVER (
            PARTITION BY category 
            ORDER BY score DESC
        ) AS placement
    FROM contest_results
)
SELECT 
    category,
    placement,
    contestant_name,
    score,
    CASE placement
        WHEN 1 THEN 'Gold'
        WHEN 2 THEN 'Silver'
        WHEN 3 THEN 'Bronze'
    END AS medal
FROM ranked_contestants
WHERE placement <= 3
ORDER BY category, placement;
 
-- Note: If 3 contestants tie for 2nd, you get:
-- 1st place (1 winner)
-- 2nd place (3 winners, all get Silver)
-- No 3rd place (it would be rank 5)

Ties Consume Ranks

Use Case 4: Finding Rank Percentiles

Identifying where entries fall relative to the group:

Rank-Based Percentiles
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Identify top performers (top 10% by rank)
WITH ranked_sales AS (
    SELECT 
        salesperson_id,
        salesperson_name,
        total_sales,
        RANK() OVER (ORDER BY total_sales DESC) AS sales_rank,
        COUNT(*) OVER () AS total_salespeople
    FROM quarterly_sales
)
SELECT 
    salesperson_name,
    total_sales,
    sales_rank,
    ROUND(sales_rank * 100.0 / total_salespeople, 1) AS percentile_from_top,
    CASE 
        WHEN sales_rank <= CEIL(total_salespeople * 0.10) THEN 'Top 10%'
        WHEN sales_rank <= CEIL(total_salespeople * 0.25) THEN 'Top 25%'
        WHEN sales_rank <= CEIL(total_salespeople * 0.50) THEN 'Top 50%'
        ELSE 'Bottom 50%'
    END AS performance_tier
FROM ranked_sales
ORDER BY sales_rank;

RANK in Complex Analytical Queries

RANK() becomes particularly powerful when combined with other SQL features for sophisticated analysis.

Pattern 1: Rank Changes Over Time

Tracking how ranks evolve across periods:

Tracking Rank Changes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Track how product rankings change month-over-month
WITH monthly_ranks AS (
    SELECT 
        product_id,
        product_name,
        sale_month,
        monthly_revenue,
        RANK() OVER (
            PARTITION BY sale_month 
            ORDER BY monthly_revenue DESC
        ) AS monthly_rank
    FROM monthly_product_sales
),
rank_changes AS (
    SELECT 
        curr.product_name,
        curr.sale_month,
        curr.monthly_revenue,
        curr.monthly_rank AS current_rank,
        LAG(curr.monthly_rank) OVER (
            PARTITION BY curr.product_id 
            ORDER BY curr.sale_month
        ) AS previous_rank
    FROM monthly_ranks curr
)
SELECT 
    product_name,
    sale_month,
    monthly_revenue,
    current_rank,
    previous_rank,
    COALESCE(previous_rank - current_rank, 0) AS rank_improvement,
    CASE 
        WHEN previous_rank IS NULL THEN 'New Entry'
        WHEN current_rank < previous_rank THEN '↑ Improved'
        WHEN current_rank > previous_rank THEN '↓ Declined'
        ELSE '→ Unchanged'
    END AS trend
FROM rank_changes
ORDER BY sale_month, current_rank;

Pattern 2: Multi-Criteria Ranking

Ranking by primary and secondary criteria with RANK():

Multi-Criteria Ranking
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Rank employees by performance, with ties broken by tenure
SELECT 
    employee_name,
    department,
    performance_score,
    years_of_service,
    -- Primary rank: by performance only
    RANK() OVER (
        PARTITION BY department 
        ORDER BY performance_score DESC
    ) AS performance_rank,
    -- Secondary rank: performance first, then tenure
    RANK() OVER (
        PARTITION BY department 
        ORDER BY performance_score DESC, years_of_service DESC
    ) AS composite_rank,
    -- Demonstrate difference: same performance = same rank
    -- OR same performance + same tenure = same rank
    CASE 
        WHEN RANK() OVER (PARTITION BY department ORDER BY performance_score DESC) 
             = RANK() OVER (PARTITION BY department ORDER BY performance_score DESC, years_of_service DESC)
        THEN 'Primary tie-breaker: tenure'
        ELSE 'Ranks differ due to tie resolution'
    END AS note
FROM employees
ORDER BY department, composite_rank;

Pattern 3: Rank-Based Grouping

Grouping data based on rank ranges:

Rank-Based Grouping
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Group products into rank-based tiers
WITH ranked_products AS (
    SELECT 
        product_id,
        product_name,
        category,
        monthly_sales,
        RANK() OVER (
            PARTITION BY category 
            ORDER BY monthly_sales DESC
        ) AS category_rank,
        COUNT(*) OVER (PARTITION BY category) AS products_in_category
    FROM product_performance
)
SELECT 
    category,
    CASE 
        WHEN category_rank = 1 THEN 'Category Leader'
        WHEN category_rank <= 3 THEN 'Top Performers'
        WHEN category_rank <= CEIL(products_in_category * 0.25) THEN 'Upper Tier'
        WHEN category_rank <= CEIL(products_in_category * 0.75) THEN 'Mid Tier'
        ELSE 'Lower Tier'
    END AS product_tier,
    COUNT(*) AS product_count,
    SUM(monthly_sales) AS tier_revenue,
    AVG(monthly_sales) AS avg_sales
FROM ranked_products
GROUP BY category, 
    CASE 
        WHEN category_rank = 1 THEN 'Category Leader'
        WHEN category_rank <= 3 THEN 'Top Performers'
        WHEN category_rank <= CEIL(products_in_category * 0.25) THEN 'Upper Tier'
        WHEN category_rank <= CEIL(products_in_category * 0.75) THEN 'Mid Tier'
        ELSE 'Lower Tier'
    END
ORDER BY category, 
    CASE product_tier
        WHEN 'Category Leader' THEN 1
        WHEN 'Top Performers' THEN 2
        WHEN 'Upper Tier' THEN 3
        WHEN 'Mid Tier' THEN 4
        ELSE 5
    END;

Performance Considerations

RANK() shares the same execution characteristics as ROW_NUMBER() but with an additional comparison step for detecting ties.

Execution Model:

Partition the data (if PARTITION BY specified)
Sort within each partition (ORDER BY)
Stream through sorted data, assigning ranks
Compare each row's ORDER BY value to the previous row
If equal: assign same rank; if different: assign 1 + row position

Cost Factors:

Performance Impact Factors

•Sorting — The primary cost; ensure indexes support the ORDER BY clause
•Partition size — Large partitions require more memory for sorting
•Tie detection — Minimal overhead; simple comparison during streaming
•Multiple RANK() calls — If using same PARTITION BY and ORDER BY, optimizer may share the sort
•Filtering on RANK() — Requires CTE/subquery; may prevent some optimizations

Index Optimization for RANK()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
-- For query: RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
-- Optimal index:
CREATE INDEX idx_dept_salary ON employees (department, salary DESC);
 
-- This allows:
-- 1. Seek to each department partition
-- 2. Read rows in salary order (no sort needed)
-- 3. Calculate RANK() in single pass
 
-- For multiple partitioned rankings, compound indexes help:
CREATE INDEX idx_product_ranking ON products (category, monthly_sales DESC);
CREATE INDEX idx_region_ranking ON products (region, monthly_sales DESC);
 
-- Be mindful of index size vs query frequency tradeoff

Shared Window Specifications

Summary: Mastering RANK()

RANK() provides tie-aware ranking with ordinal accuracy through its gap-producing behavior. Let's consolidate the essential knowledge:

Key Takeaways

•Definition — RANK() assigns the same rank to tied rows and creates gaps after ties to preserve ordinal meaning
•Gap Formula — After N tied rows at rank R, the next distinct value receives rank R + N
•Ordinal Accuracy — Rank R means exactly R-1 rows precede you, regardless of ties among them
•vs ROW_NUMBER() — Use RANK() when ties should be recognized; use ROW_NUMBER() when unique numbers are needed
•COUNT Implication — Filtering on RANK() <= N may return more than N rows if ties exist
•Mathematical Basis — RANK(r) = 1 + count of rows strictly preceding r in order
•Performance — Same as ROW_NUMBER(); optimize with indexes matching PARTITION BY + ORDER BY

What's Next:

Page Complete