Loading content...
Among SQL window functions, ROW_NUMBER() stands as the foundational ranking function—the simplest in concept, yet among the most powerful and universally applicable. It answers a deceptively simple question: What is the sequential position of each row within a defined ordering?
Unlike aggregate functions that collapse multiple rows into a single result, ROW_NUMBER() preserves every row while enriching each with positional metadata. This capability unlocks entire categories of analytical operations that would otherwise require complex procedural logic or multiple query passes.
Whether you're implementing pagination for a web application, selecting the top N items per category, eliminating duplicates based on business rules, or constructing running sequences for audit trails, ROW_NUMBER() provides the essential building block.
By the end of this page, you will understand ROW_NUMBER()'s complete mechanics: how it assigns sequential integers, how PARTITION BY and ORDER BY shape its behavior, how the database engine executes it internally, and how to apply it to solve real-world ranking and selection problems with precision.
ROW_NUMBER() is a window function that assigns a unique, sequential integer to each row within a partition of a result set, starting from 1. The assignment follows the order specified in the ORDER BY clause of the OVER() clause.
Formal Definition:
ROW_NUMBER() assigns to each row in the partition a unique number from 1 to N, where N is the total number of rows in that partition. The specific number assigned to each row depends on the ordering specified; ties (rows with identical ORDER BY values) receive arbitrary but deterministic assignments.
The function signature is elegantly simple:
1234567891011121314
-- Basic SyntaxROW_NUMBER() OVER ( [PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC | DESC], ...) -- Minimal form (entire result set as one partition)ROW_NUMBER() OVER (ORDER BY column_name) -- Full form with partitioningROW_NUMBER() OVER ( PARTITION BY category_column ORDER BY sorting_column DESC)While syntactically optional in some database systems, omitting ORDER BY makes ROW_NUMBER() assign numbers in an undefined sequence—typically the physical storage order or query execution order. This produces unpredictable, non-reproducible results. Always specify ORDER BY explicitly for deterministic behavior.
Understanding ROW_NUMBER() at a deep level requires examining its mechanics from multiple angles: the logical assignment process, the handling of partitions, and the treatment of ties.
The Assignment Algorithm:
Conceptually, ROW_NUMBER() operates through the following process:
1234567891011121314151617181920212223242526
-- Sample employees tableCREATE TABLE employees ( employee_id INT PRIMARY KEY, name VARCHAR(100), department VARCHAR(50), salary DECIMAL(10, 2), hire_date DATE); INSERT INTO employees VALUES(1, 'Alice Chen', 'Engineering', 95000, '2019-03-15'),(2, 'Bob Martinez', 'Engineering', 87000, '2020-07-22'),(3, 'Carol White', 'Sales', 72000, '2018-11-01'),(4, 'David Kim', 'Engineering', 92000, '2021-01-10'),(5, 'Emma Brown', 'Sales', 68000, '2019-06-18'),(6, 'Frank Jones', 'Marketing', 75000, '2020-02-28'),(7, 'Grace Lee', 'Sales', 78000, '2017-09-05'),(8, 'Henry Wilson', 'Marketing', 71000, '2022-04-12'); -- Assign row numbers by salary (highest first)SELECT name, department, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rankFROM employees;| name | department | salary | salary_rank |
|---|---|---|---|
| Alice Chen | Engineering | 95000.00 | 1 |
| David Kim | Engineering | 92000.00 | 2 |
| Bob Martinez | Engineering | 87000.00 | 3 |
| Grace Lee | Sales | 78000.00 | 4 |
| Frank Jones | Marketing | 75000.00 | 5 |
| Carol White | Sales | 72000.00 | 6 |
| Henry Wilson | Marketing | 71000.00 | 7 |
| Emma Brown | Sales | 68000.00 | 8 |
Key Observations:
This global ranking answers: Where does each employee stand in the company-wide salary ordering?
PARTITION BY fundamentally changes how ROW_NUMBER() operates. Instead of treating the entire result set as a single sequence, it creates independent sequences within each partition.
Mental Model:
Think of PARTITION BY as creating invisible walls that separate your data into isolated groups. ROW_NUMBER() operates entirely within each group, unaware of rows in other groups. The numbering resets to 1 for each partition.
This is analogous to running separate ROW_NUMBER() queries for each department, then combining the results—but accomplished in a single, efficient pass through the data.
1234567891011
-- Row numbers within each department, by salarySELECT name, department, salary, ROW_NUMBER() OVER ( PARTITION BY department ORDER BY salary DESC ) AS dept_salary_rankFROM employeesORDER BY department, dept_salary_rank;| name | department | salary | dept_salary_rank |
|---|---|---|---|
| Alice Chen | Engineering | 95000.00 | 1 |
| David Kim | Engineering | 92000.00 | 2 |
| Bob Martinez | Engineering | 87000.00 | 3 |
| Frank Jones | Marketing | 75000.00 | 1 |
| Henry Wilson | Marketing | 71000.00 | 2 |
| Grace Lee | Sales | 78000.00 | 1 |
| Carol White | Sales | 72000.00 | 2 |
| Emma Brown | Sales | 68000.00 | 3 |
Critical Differences from Global Ranking:
Each department becomes its own independent numbering universe. Alice is rank 1 in Engineering; Frank is rank 1 in Marketing; Grace is rank 1 in Sales. The function answers: Where does each employee rank within their own department?
PARTITION BY can accept multiple columns: PARTITION BY department, hire_year. This creates finer-grained partitions—one sequence per unique combination of department and hire year. Use this for multi-dimensional analysis like 'rank within department within quarter.'
The ORDER BY clause within the OVER() specification determines how ROW_NUMBER() sequences rows. Understanding its nuances—particularly how ties are handled—is crucial for correct results.
The Tie Problem:
What happens when two rows have identical ORDER BY values? ROW_NUMBER() must still assign distinct integers—it cannot give two rows the same number. The solution: it assigns numbers arbitrarily among tied rows.
"Arbitrary but Deterministic":
While the assignment among ties is arbitrary (not based on any specified criteria), it is typically deterministic within a single query execution—the same query against unchanged data usually produces the same result. However, this behavior is implementation-dependent and should never be relied upon.
123456789101112131415161718
-- Add employees with identical salariesINSERT INTO employees VALUES(9, 'Ivy Clark', 'Engineering', 92000, '2020-05-20'),(10, 'Jack Taylor', 'Sales', 72000, '2021-08-15'); -- Ties in salary - which row gets which number?SELECT name, department, salary, hire_date, ROW_NUMBER() OVER ( PARTITION BY department ORDER BY salary DESC ) AS rankFROM employeesWHERE department = 'Engineering'ORDER BY rank;| name | department | salary | hire_date | rank |
|---|---|---|---|---|
| Alice Chen | Engineering | 95000.00 | 2019-03-15 | 1 |
| David Kim | Engineering | 92000.00 | 2021-01-10 | 2 |
| Ivy Clark | Engineering | 92000.00 | 2020-05-20 | 3 |
| Bob Martinez | Engineering | 87000.00 | 2020-07-22 | 4 |
The Problem: David Kim and Ivy Clark both earn $92,000. Who should be rank 2 versus rank 3? With only ORDER BY salary DESC, the database chooses arbitrarily. This might vary across:
The Solution: Secondary Sort Keys
To ensure deterministic, reproducible results, add secondary sort columns that break ties:
12345678910111213
-- Deterministic ranking with tie-breakerSELECT name, department, salary, hire_date, ROW_NUMBER() OVER ( PARTITION BY department ORDER BY salary DESC, hire_date ASC -- Earlier hire wins ties ) AS rankFROM employeesWHERE department = 'Engineering'ORDER BY rank;| name | department | salary | hire_date | rank |
|---|---|---|---|---|
| Alice Chen | Engineering | 95000.00 | 2019-03-15 | 1 |
| Ivy Clark | Engineering | 92000.00 | 2020-05-20 | 2 |
| David Kim | Engineering | 92000.00 | 2021-01-10 | 3 |
| Bob Martinez | Engineering | 87000.00 | 2020-07-22 | 4 |
In production systems, always include enough ORDER BY columns to guarantee uniqueness. A common pattern: ORDER BY main_criteria, secondary_criteria, primary_key. The primary key, being unique, guarantees no remaining ties.
Understanding how the database engine executes ROW_NUMBER() helps you write efficient queries and anticipate performance characteristics.
Execution Phases:
Window functions, including ROW_NUMBER(), execute in a specific phase of query processing—after FROM, WHERE, GROUP BY, and HAVING, but before ORDER BY and LIMIT. This timing has important implications:
12345678910111213
-- Conceptual execution order1. FROM -- Identify source tables2. WHERE -- Filter rows3. GROUP BY -- Group rows (if aggregating)4. HAVING -- Filter groups5. SELECT -- Evaluate expressions, including window functions └── Window Functions execute here6. DISTINCT -- Remove duplicates7. ORDER BY -- Sort final output8. LIMIT/OFFSET -- Restrict result count -- Window functions see the pre-final result set-- They cannot see rows removed by DISTINCT, ORDER BY sorting, or LIMITPhysical Execution Strategy:
For ROW_NUMBER() specifically, the database typically:
Performance Implications:
For frequently-run ROW_NUMBER() queries, consider creating an index on (partition_columns, order_columns). This allows the database to read rows in the exact order needed, eliminating the sort step entirely.
123456
-- For the query: ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)-- Create this index:CREATE INDEX idx_dept_salary ON employees (department, salary DESC); -- The database can now scan this index in order-- No additional sorting required for the window functionROW_NUMBER() appears in numerous practical scenarios. Let's examine the most common patterns with detailed implementations.
Pattern 1: Top-N Per Group
One of the most frequent applications—selecting the top N items within each category. This cannot be efficiently achieved with GROUP BY alone.
12345678910111213141516171819202122
-- Find the 2 highest-paid employees in each departmentWITH ranked_employees AS ( SELECT employee_id, name, department, salary, ROW_NUMBER() OVER ( PARTITION BY department ORDER BY salary DESC ) AS rank_in_dept FROM employees)SELECT employee_id, name, department, salary, rank_in_deptFROM ranked_employeesWHERE rank_in_dept <= 2ORDER BY department, rank_in_dept;Why this pattern is powerful:
Pattern 2: Pagination
For APIs and web applications that display results page by page:
1234567891011121314151617181920212223
-- Pagination: Get page 3 with 10 items per page-- (items 21-30)WITH numbered AS ( SELECT *, ROW_NUMBER() OVER (ORDER BY created_at DESC, id) AS row_num FROM products WHERE category = 'Electronics')SELECT *FROM numberedWHERE row_num BETWEEN 21 AND 30; -- Alternative using subquery (same logic)SELECT *FROM ( SELECT *, ROW_NUMBER() OVER (ORDER BY created_at DESC, id) AS row_num FROM products WHERE category = 'Electronics') AS numberedWHERE row_num BETWEEN 21 AND 30;While LIMIT/OFFSET is simpler, ROW_NUMBER() provides flexibility (filter on row_num in complex ways) and consistency (stable pagination even with concurrent inserts). For keyset pagination (seeking by last-seen ID), ROW_NUMBER() combined with proper indexing outperforms OFFSET for large datasets.
Pattern 3: Deduplication
Removing duplicate rows based on business logic—keeping the most recent, the first, or the one with the highest priority:
1234567891011121314151617181920
-- Remove duplicate orders, keeping only the most recent per customerWITH ranked_orders AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY customer_id, product_id ORDER BY order_date DESC, order_id DESC ) AS dup_rank FROM orders)-- Keep only the first (most recent) occurrenceDELETE FROM ordersWHERE order_id IN ( SELECT order_id FROM ranked_orders WHERE dup_rank > 1); -- Or, for a SELECT that shows only deduplicated data:SELECT *FROM ranked_ordersWHERE dup_rank = 1;Pattern 4: Sequential Numbering for Reports
Generating line numbers, invoice item numbers, or sequence identifiers:
12345678910111213
-- Generate invoice line numbersSELECT invoice_id, ROW_NUMBER() OVER ( PARTITION BY invoice_id ORDER BY line_item_id ) AS line_number, product_name, quantity, unit_price, quantity * unit_price AS line_totalFROM invoice_itemsORDER BY invoice_id, line_number;Beyond basic patterns, ROW_NUMBER() enables sophisticated analytical queries through creative combinations with other SQL features.
Technique 1: Detecting Consecutive Sequences
Identifying runs of consecutive values (dates without gaps, sequential IDs, etc.):
12345678910111213141516171819202122232425262728
-- Find consecutive login days for each user-- Uses the "islands and gaps" techniqueWITH daily_logins AS ( SELECT DISTINCT user_id, DATE(login_time) AS login_date FROM user_sessions),numbered AS ( SELECT user_id, login_date, login_date - INTERVAL ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY login_date ) DAY AS group_id FROM daily_logins)SELECT user_id, MIN(login_date) AS streak_start, MAX(login_date) AS streak_end, COUNT(*) AS streak_lengthFROM numberedGROUP BY user_id, group_idHAVING COUNT(*) >= 3 -- Only streaks of 3+ daysORDER BY user_id, streak_start; -- The "group_id" technique: subtracting row_number from the date-- creates a constant value for consecutive datesThis classic technique leverages the fact that for consecutive values, subtracting row_number produces a constant. Non-consecutive values create different constants, naturally grouping consecutive runs together. It's invaluable for temporal analysis, sequence detection, and session identification.
Technique 2: Conditional Top-N
Combining filtering logic with ranking for complex selection criteria:
123456789101112131415161718192021222324
-- Top 3 recent orders per customer, but only orders over $100-- and only for active customersWITH filtered_ranked AS ( SELECT o.*, c.customer_status, ROW_NUMBER() OVER ( PARTITION BY o.customer_id ORDER BY o.order_date DESC, o.order_id DESC ) AS recency_rank FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id WHERE o.total_amount > 100 AND c.customer_status = 'active')SELECT customer_id, order_id, order_date, total_amount, recency_rankFROM filtered_rankedWHERE recency_rank <= 3ORDER BY customer_id, recency_rank;Technique 3: Multiple Independent Rankings
Computing several different rankings simultaneously for multi-faceted analysis:
12345678910111213141516171819202122
-- Rank products by multiple criteria simultaneouslySELECT product_id, product_name, price, units_sold, average_rating, -- Global rankings ROW_NUMBER() OVER (ORDER BY price ASC) AS price_rank_asc, ROW_NUMBER() OVER (ORDER BY units_sold DESC) AS sales_rank, ROW_NUMBER() OVER (ORDER BY average_rating DESC) AS rating_rank, -- Department-scoped rankings ROW_NUMBER() OVER ( PARTITION BY department_id ORDER BY units_sold DESC ) AS dept_sales_rank, ROW_NUMBER() OVER ( PARTITION BY department_id ORDER BY average_rating DESC ) AS dept_rating_rankFROM productsORDER BY sales_rank;Even experienced developers encounter these pitfalls when working with ROW_NUMBER(). Understanding them helps you write correct queries from the start.
WHERE ROW_NUMBER() OVER ... = 1 is a syntax error.1234567891011121314151617181920212223242526
-- ❌ WRONG: Cannot use window function in WHERESELECT *FROM employeesWHERE ROW_NUMBER() OVER (ORDER BY salary DESC) <= 5;-- Error: Window functions are not allowed in WHERE -- ✅ CORRECT: Use CTE or subqueryWITH ranked AS ( SELECT *, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn FROM employees)SELECT * FROM ranked WHERE rn <= 5; -- ❌ WRONG: Assuming result is ordered by row_numberSELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rankFROM employees;-- Results may appear in any order! -- ✅ CORRECT: Add explicit ORDER BY for output orderSELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rankFROM employeesORDER BY rank;Because window functions execute after WHERE/HAVING but during SELECT, you cannot filter on window function results in the same query level. Always wrap window function queries in a CTE or subquery when filtering on the calculated rank.
ROW_NUMBER() is the foundational ranking function—deceptively simple in syntax, extraordinarily powerful in application. Let's consolidate the essential knowledge:
What's Next:
With ROW_NUMBER() thoroughly understood, we'll explore RANK() in the next page. RANK() addresses the limitation of ROW_NUMBER() with ties—instead of assigning arbitrary unique numbers, it gives tied rows the same rank and adjusts subsequent numbers accordingly. Understanding when to choose ROW_NUMBER() versus RANK() is essential for correct ranking semantics.
You now possess comprehensive knowledge of the ROW_NUMBER() function—from basic mechanics through advanced techniques. This foundational understanding prepares you for RANK(), DENSE_RANK(), and NTILE(), which build upon these same concepts with different ranking semantics.