Loading learning content...
Consider a common business question: "For each sale, show the sale amount alongside the department's total and the sale's percentage of that total."
With traditional SQL, you face an uncomfortable choice. You can use GROUP BY to compute department totals, but then you lose individual sale rows. Or you can keep individual rows, but then you need subqueries or self-joins to bring in aggregate values. Either way, the query becomes convoluted, inefficient, or both.
This limitation persists across countless analytical scenarios:
Window functions solve this fundamental problem. Introduced in SQL:2003 and now supported by all major databases, window functions perform calculations across sets of table rows that are somehow related to the current row—all while preserving the individual row context that GROUP BY destroys.
By the end of this page, you will understand what window functions are, why they're necessary, how they fundamentally differ from aggregate functions, and the conceptual model of 'windows' that gives them their name. This foundation is essential before diving into specific syntax and clauses.
To truly appreciate window functions, we must first understand the limitation they address. Let's examine a concrete scenario that illustrates the gap in traditional SQL.
Scenario: Employee Salary Analysis
Suppose we have an employees table:
| emp_id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Engineering | 95000 |
| 2 | Bob | Engineering | 85000 |
| 3 | Carol | Engineering | 90000 |
| 4 | David | Sales | 70000 |
| 5 | Eve | Sales | 75000 |
| 6 | Frank | HR | 60000 |
The business asks: "For each employee, show their salary, their department's average salary, and how their salary compares to that average."
This requires two things simultaneously:
Traditional SQL forces a choice between these requirements.
1234567891011121314
-- GROUP BY gives us department averages...SELECT department, AVG(salary) as avg_salaryFROM employeesGROUP BY department; -- Result:-- department | avg_salary-- ------------|------------ Engineering | 90000-- Sales | 72500-- HR | 60000 -- But we've LOST individual employee rows!-- We can no longer see Alice, Bob, Carol, etc.GROUP BY collapses rows into groups. We get one row per department, not one row per employee. The individual employee context is destroyed—exactly what we don't want.
A window function performs a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions with GROUP BY, window functions do not collapse rows—they compute a value for each row based on a "window" of rows related to it.
The formal definition:
A window function computes a value for each row in the query result based on a subset of rows (the window) defined by the
OVERclause. The window may include all rows, rows in the same partition, or a frame of rows relative to the current row.
The key insight:
Think of a window function as giving each row "visibility" into other rows around it. Instead of each row being isolated, it can see and perform calculations using values from related rows—without merging with them.
The term 'window' comes from the concept of a sliding window or frame through which each row views a subset of data. Imagine looking through a window that shows you surrounding rows—that's the conceptual model. The window can show your entire partition, or just nearby rows, or even a custom frame.
The anatomy of a window function call:
12345678910111213141516171819202122232425262728
-- General syntax:function_name(arguments) OVER (window_specification) -- The window_specification can include:-- 1. PARTITION BY - defines how to group rows into partitions-- 2. ORDER BY - defines row ordering within partitions -- 3. Frame clause - defines which rows around current row to include -- Full syntax:function_name(arguments) OVER ( PARTITION BY column1, column2, ... ORDER BY column3, column4, ... frame_clause) -- Examples of increasing complexity: -- Example 1: All rows in entire result setAVG(salary) OVER () -- Example 2: All rows in same departmentAVG(salary) OVER (PARTITION BY department) -- Example 3: Running total within departmentSUM(sales) OVER (PARTITION BY department ORDER BY sale_date) -- Example 4: Moving average of last 3 rowsAVG(price) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)Understanding the distinction between window functions and aggregate functions is fundamental. They share some functions (like SUM, AVG, COUNT) but operate on fundamentally different paradigms.
The key difference:
| Aspect | Aggregate Function (GROUP BY) | Window Function (OVER) |
|---|---|---|
| Row Output | One row per group | One row per input row |
| Row Identity | Lost (rows collapsed) | Preserved |
| Calculation Scope | Entire group | Window per row |
| Use Case | Summary reports | Analytical queries |
Visual illustration:
Consider three sales rows for the 'North' region:
12345678910111213141516171819202122232425262728293031323334353637
-- Sample data:-- | sale_id | region | amount |-- |---------|--------|--------|-- | 1 | North | 100 |-- | 2 | North | 200 |-- | 3 | North | 150 | -- AGGREGATE with GROUP BY:SELECT region, SUM(amount) as totalFROM salesGROUP BY region; -- Result: -- | region | total |-- |--------|-------|-- | North | 450 |-- -- 3 rows → 1 row (collapsed)-- Individual sales are GONE -- WINDOW FUNCTION:SELECT sale_id, region, amount, SUM(amount) OVER (PARTITION BY region) as region_totalFROM sales; -- Result:-- | sale_id | region | amount | region_total |-- |---------|--------|--------|--------------|-- | 1 | North | 100 | 450 |-- | 2 | North | 200 | 450 |-- | 3 | North | 150 | 450 |---- 3 rows → 3 rows (preserved)-- Each sale visible WITH its group totalNotice that SUM() is the same function in both cases. The difference is not in the function itself—it's in how it's invoked. With GROUP BY, SUM() is an aggregate function that collapses rows. With OVER, the same SUM() becomes a window function that preserves rows while adding aggregate context to each.
Window functions fall into several categories based on their purpose. Understanding these categories helps you match the right function to your analytical need.
| Category | Functions | Purpose | ORDER BY Required? |
|---|---|---|---|
| Aggregate | SUM, AVG, COUNT, MIN, MAX | Apply traditional aggregates over window | Optional (affects frame) |
| Ranking | ROW_NUMBER, RANK, DENSE_RANK, NTILE | Assign position/rank to rows | Required |
| Value/Offset | LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE | Access values from other rows | Required |
| Distribution | PERCENT_RANK, CUME_DIST, PERCENTILE_CONT, PERCENTILE_DISC | Calculate statistical distributions | Required |
Category 1: Aggregate Window Functions
These are the familiar aggregate functions (SUM, AVG, COUNT, MIN, MAX) used with OVER instead of GROUP BY. They compute aggregates while preserving individual rows.
123456789101112131415161718192021222324
-- Running totalSELECT order_date, amount, SUM(amount) OVER (ORDER BY order_date) as running_totalFROM orders; -- Percentage of category totalSELECT product_name, category, sales, sales * 100.0 / SUM(sales) OVER (PARTITION BY category) as pct_of_categoryFROM products; -- Moving averageSELECT date, price, AVG(price) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) as seven_day_avgFROM stock_prices;Category 2: Ranking Functions
These assign a numeric rank or position to each row based on ORDER BY criteria. Essential for "top N" queries and competitive rankings.
1234567891011121314151617181920212223242526272829
-- Row number (unique, sequential)SELECT name, score, ROW_NUMBER() OVER (ORDER BY score DESC) as positionFROM contestants; -- Rank with gaps (ties get same rank, next rank skipped)SELECT name, score, RANK() OVER (ORDER BY score DESC) as rankFROM contestants;-- If two people tie for 1st, next is 3rd -- Dense rank (no gaps)SELECT name, score, DENSE_RANK() OVER (ORDER BY score DESC) as dense_rankFROM contestants;-- If two people tie for 1st, next is 2nd -- NTILE (divide into buckets)SELECT student_name, score, NTILE(4) OVER (ORDER BY score DESC) as quartileFROM exam_results;Category 3: Value/Offset Functions
These access values from rows other than the current row—either by relative position (LAG, LEAD) or absolute position (FIRST_VALUE, LAST_VALUE, NTH_VALUE).
12345678910111213141516171819202122232425
-- LAG: access previous row's valueSELECT date, revenue, LAG(revenue, 1, 0) OVER (ORDER BY date) as prev_revenue, revenue - LAG(revenue, 1, 0) OVER (ORDER BY date) as revenue_changeFROM monthly_revenue; -- LEAD: access next row's value SELECT flight_id, departure_time, LEAD(departure_time) OVER (ORDER BY departure_time) as next_departureFROM flights; -- FIRST_VALUE / LAST_VALUESELECT product, category, price, FIRST_VALUE(product) OVER ( PARTITION BY category ORDER BY price DESC ) as most_expensive_in_categoryFROM products;ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, FIRST_VALUE, LAST_VALUE, and NTH_VALUE are window-only functions. They ONLY work with OVER—they cannot be used as aggregate functions with GROUP BY. This is because their semantics depend on row-level positioning that GROUP BY destroys.
Understanding how window functions execute helps you reason about their behavior and optimize their use. Here's the conceptual processing model:
Step 1: The query runs normally up through GROUP BY and HAVING
Window functions operate on the result set after WHERE, GROUP BY, and HAVING filters have been applied. They see the post-aggregation rows (if any aggregation occurred).
Step 2: For each row in the result, the database:
Step 3: ORDER BY (if present in outer query) sorts final output
Window function calculations are complete before the final ORDER BY clause executes.
Because window functions execute during SELECT (step 5), you cannot use window function results in WHERE, GROUP BY, or HAVING clauses. If you need to filter on a window function result, you must wrap the query in a subquery or CTE and filter in the outer query.
123456789101112131415161718192021222324
-- WRONG: Cannot use window function in WHERESELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rankFROM employeesWHERE RANK() OVER (ORDER BY salary DESC) <= 5; -- ERROR! -- CORRECT: Use subquery or CTESELECT * FROM ( SELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rank FROM employees) rankedWHERE rank <= 5; -- Or with CTE (cleaner):WITH ranked_employees AS ( SELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rank FROM employees)SELECT * FROM ranked_employees WHERE rank <= 5;The "Sliding Window" Mental Model
For each row being processed, imagine a window—a frame—that shows a subset of rows in the partition. The window function looks through this window and computes its value.
Window functions shine in analytical scenarios where you need both detail and context. Here are the canonical use cases:
Decision Framework: Aggregate vs Window Function
| If You Need... | Use | Example |
|---|---|---|
| One row per group (summary only) | GROUP BY + Aggregate | Total sales per region |
| All rows plus group context | Window Function | Each sale with region total |
| To filter groups | GROUP BY + HAVING | Regions with sales > 1M |
| To filter on computed position | Window + Subquery | Top 3 per region |
| Previous/next row values | Window (LAG/LEAD) | Change from last month |
| Ranking with ties handled | Window (RANK/DENSE_RANK) | Competition standings |
Window functions can be computationally expensive for large datasets, especially with complex frame specifications. However, they're typically far more efficient than equivalent self-join or correlated subquery approaches. Modern query optimizers handle window functions well, often computing multiple window functions in a single pass through the data.
Window functions are part of the SQL:2003 standard and are now supported by all major relational databases. However, there are some differences in specific function availability and syntax:
| Feature | PostgreSQL | MySQL 8.0+ | SQL Server | Oracle | SQLite 3.25+ |
|---|---|---|---|---|---|
| Core window functions | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| ROW_NUMBER, RANK, DENSE_RANK | ✅ | ✅ | ✅ | ✅ | ✅ |
| LAG, LEAD | ✅ | ✅ | ✅ | ✅ | ✅ |
| FIRST_VALUE, LAST_VALUE | ✅ | ✅ | ✅ | ✅ | ✅ |
| NTH_VALUE | ✅ | ✅ | ✅ | ✅ | ✅ |
| NTILE | ✅ | ✅ | ✅ | ✅ | ✅ |
| Frame clause (ROWS/RANGE) | ✅ | ✅ | ✅ | ✅ | ✅ |
| GROUPS frame | ✅ | ❌ | ❌ | ✅ | ✅ |
| EXCLUDE clause | ✅ | ❌ | ❌ | ❌ | ✅ |
MySQL only added window function support in version 8.0 (released 2018). If you're working with MySQL 5.7 or earlier, window functions are not available—you'll need to use the workarounds (self-joins, correlated subqueries) discussed earlier.
Syntax variations to be aware of:
While the core syntax is standardized, databases have minor variations:
WINDOW clauseWe've established the foundational concepts of window functions. Let's consolidate the key takeaways:
What's next:
Now that we understand what window functions are and why they exist, we'll dive into the specific clauses that define window behavior. The next page covers the OVER clause in depth—the core syntax element that transforms an ordinary function into a window function.
You now understand the fundamental concept of window functions—one of SQL's most powerful analytical features. You can articulate why they exist, how they differ from aggregate functions, and when to use them. Next, we'll master the OVER clause that brings window functions to life.