Loading learning content...
Imagine you're analyzing sales data and need to answer: How does today's revenue compare to yesterday's? Or perhaps: What's the percentage change in stock price from the previous trading day? Or even: What will be the next delivery date after the current order?
Traditionally, answering these questions required complex self-joins—joining a table to itself with carefully crafted offset conditions. This approach is not only syntactically cumbersome but also computationally expensive, often resulting in quadratic complexity as the database must match each row against potentially all other rows.
LAG and LEAD are SQL window functions that elegantly solve this problem. They allow you to access data from previous rows (LAG) or subsequent rows (LEAD) relative to the current row, all within a single query pass. This capability transforms how we approach time-series analysis, change detection, gap analysis, and countless other analytical patterns.
Understanding LAG and LEAD is fundamental to mastering SQL analytics because they represent a paradigm shift: instead of thinking in terms of set operations and joins, you learn to think in terms of ordered sequences and row-to-row relationships—a mental model that unlocks powerful analytical capabilities.
By the end of this page, you will understand the complete syntax and semantics of LAG and LEAD functions, including offset parameters, default values, and NULL handling. You'll master partitioning and ordering for complex multi-group analysis, explore real-world use cases from trend detection to gap analysis, and learn performance optimization strategies for production systems.
Before diving into syntax, let's establish a solid conceptual foundation. LAG and LEAD are offset functions within the broader category of window functions (also called analytic functions or OLAP functions).
The Window Function Paradigm:
Unlike aggregate functions that collapse multiple rows into a single result, window functions compute a value for each row based on a window or frame of related rows. The key insight is that the original row is preserved—the computation adds information without reducing the row count.
The Ordered Sequence Model:
LAG and LEAD specifically operate on an ordered sequence of rows. Within this sequence:
Key Terminology:
Partition: A logical grouping of rows over which the window function operates. LAG and LEAD compute values independently within each partition. If no partition is specified, the entire result set is one partition.
Order: The sequence in which rows are arranged within each partition. This ordering is critical for LAG and LEAD—without order, 'previous' and 'next' have no meaning.
Offset: How many rows backward (LAG) or forward (LEAD) to look. An offset of 1 means the immediately preceding/following row; 2 means two rows back/forward.
Default Value: What to return when the offset goes beyond the partition boundary (e.g., LAG on the first row has no previous row).
Think of LAG and LEAD as a pointer that slides through your ordered data. At each row position, LAG peeks backwards and LEAD peeks forwards. The offset parameter controls how far the peek extends. This pointer-based mental model helps when debugging complex window expressions.
Let's examine the complete syntax for both LAG and LEAD functions, including all optional parameters.
12345678910111213
-- Full LAG syntaxLAG(expression [, offset [, default_value]])OVER ( [PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC | DESC] [NULLS FIRST | NULLS LAST], ...) -- Full LEAD syntax LEAD(expression [, offset [, default_value]])OVER ( [PARTITION BY partition_expression, ...] ORDER BY sort_expression [ASC | DESC] [NULLS FIRST | NULLS LAST], ...)| Parameter | Required | Default | Description |
|---|---|---|---|
| expression | Yes | — | The column or expression whose value to retrieve from the offset row |
| offset | No | 1 | Number of rows backward (LAG) or forward (LEAD) from current row |
| default_value | No | NULL | Value returned when offset goes beyond partition boundaries |
| PARTITION BY | No | Entire result set | Divides rows into groups; LAG/LEAD reset at partition boundaries |
| ORDER BY | Yes* | — | Defines row sequence within partition (*required for meaningful results) |
| NULLS FIRST/LAST | No | Implementation-dependent | Where NULLs appear in the ordering |
Important Syntax Notes:
1. ORDER BY is Semantically Required:
While some databases may not syntactically require ORDER BY, omitting it produces nondeterministic results. Without ordering, 'previous' and 'next' are undefined—the database may return any arbitrary row.
2. Offset Must Be Non-Negative:
The offset parameter must be 0 or a positive integer. Zero returns the current row's value (rarely useful). Negative offsets are not supported.
3. Default Value Type Compatibility:
The default value must be implicitly convertible to the expression's data type. Mismatched types cause runtime errors.
123456789101112131415161718192021222324252627282930313233343536373839
-- Example table: daily_sales-- | date | product | revenue |-- |------------|---------|---------|-- | 2024-01-01 | A | 1000 |-- | 2024-01-02 | A | 1200 |-- | 2024-01-03 | A | 950 |-- | 2024-01-04 | A | 1100 | -- Basic LAG: Get previous day's revenueSELECT date, product, revenue, LAG(revenue) OVER (ORDER BY date) AS prev_day_revenueFROM daily_sales; -- Result:-- | date | product | revenue | prev_day_revenue |-- |------------|---------|---------|------------------|-- | 2024-01-01 | A | 1000 | NULL |-- | 2024-01-02 | A | 1200 | 1000 |-- | 2024-01-03 | A | 950 | 1200 |-- | 2024-01-04 | A | 1100 | 950 | -- Basic LEAD: Get next day's revenueSELECT date, product, revenue, LEAD(revenue) OVER (ORDER BY date) AS next_day_revenueFROM daily_sales; -- Result:-- | date | product | revenue | next_day_revenue |-- |------------|---------|---------|------------------|-- | 2024-01-01 | A | 1000 | 1200 |-- | 2024-01-02 | A | 1200 | 950 |-- | 2024-01-03 | A | 950 | 1100 |-- | 2024-01-04 | A | 1100 | NULL |Notice that LAG returns NULL for the first row (no previous row exists) and LEAD returns NULL for the last row (no next row exists). This boundary behavior is critical to understand—it often requires special handling with COALESCE or default values.
The offset parameter extends LAG and LEAD beyond immediate neighbors, while the default value parameter handles boundary conditions gracefully. Let's explore these mechanics in depth.
123456789101112131415161718192021222324252627282930
-- Multi-offset example: Compare current value to multiple previous periodsSELECT date, revenue, LAG(revenue, 1) OVER (ORDER BY date) AS prev_1_day, LAG(revenue, 7) OVER (ORDER BY date) AS prev_7_days, LAG(revenue, 30) OVER (ORDER BY date) AS prev_30_daysFROM daily_sales; -- This enables week-over-week and month-over-month comparisons-- in a single query without self-joins -- Using default values to handle boundariesSELECT date, revenue, LAG(revenue, 1, 0) OVER (ORDER BY date) AS prev_revenue, revenue - LAG(revenue, 1, revenue) OVER (ORDER BY date) AS change_from_prevFROM daily_sales; -- With default = 0: First row shows 0 for prev_revenue-- With default = revenue: First row shows 0 change (compares to itself) -- Expression as default valueSELECT date, revenue, LAG(revenue, 1, revenue * 0.9) OVER (ORDER BY date) AS prev_or_estimateFROM daily_sales;-- First row uses 90% of current revenue as an estimateDynamic Offset Considerations:
The offset parameter must typically be a constant or scalar value—it cannot be a column reference that varies per row. If you need variable offsets, you'll need alternative approaches:
Default Value Strategies:
| Scenario | Default Value | Rationale |
|---|---|---|
| Percentage change | NULL | NULL propagates through division, avoiding division by zero |
| Running difference | 0 | First row shows full value as the initial change |
| Same as current | expression | Boundary rows compare to themselves, showing zero change |
| Conservative estimate | MIN(expression) | Use known minimum when previous is unavailable |
| Flag for handling | -99999 | Sentinel value that application code recognizes and handles specially |
1234567891011121314151617181920212223242526272829303132
-- Pattern 1: Using COALESCE for boundary handlingSELECT date, revenue, COALESCE( LAG(revenue) OVER (ORDER BY date), revenue -- Use current value if no previous exists ) AS prev_or_currentFROM daily_sales; -- Pattern 2: Using CASE for complex boundary logicSELECT date, revenue, CASE WHEN LAG(revenue) OVER (ORDER BY date) IS NULL THEN 'First Record' WHEN revenue > LAG(revenue) OVER (ORDER BY date) THEN 'Increased' WHEN revenue < LAG(revenue) OVER (ORDER BY date) THEN 'Decreased' ELSE 'Unchanged' END AS trendFROM daily_sales; -- Pattern 3: Separate flag for boundary rowsSELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prev_revenue, CASE WHEN ROW_NUMBER() OVER (ORDER BY date) = 1 THEN TRUE ELSE FALSE END AS is_first_rowFROM daily_sales;A NULL from LAG/LEAD means 'no row at that offset exists.' A default value substitutes for this NULL. Be careful: if your data legitimately contains NULL values, you can't distinguish 'no previous row' from 'previous row has NULL value' unless you use a sentinel default that can't occur naturally in your data.
PARTITION BY is the key to multi-group analysis. It divides the dataset into independent groups, with LAG and LEAD operating separately within each partition. This enables powerful comparative analytics across categories, time periods, or any grouping dimension.
The Partition Boundary Effect:
When rows are partitioned, LAG and LEAD cannot 'see' across partition boundaries. The first row of each partition has no LAG predecessor; the last row has no LEAD successor—even if rows from other partitions physically exist before/after.
123456789101112131415161718192021222324252627282930313233343536373839404142434445
-- Example: Daily sales by product, compare to previous day within same productSELECT date, product, revenue, LAG(revenue) OVER (PARTITION BY product ORDER BY date) AS prev_day_revenue, revenue - LAG(revenue, 1, revenue) OVER (PARTITION BY product ORDER BY date) AS daily_changeFROM daily_salesORDER BY product, date; -- Result:-- | date | product | revenue | prev_day_revenue | daily_change |-- |------------|---------|---------|------------------|--------------|-- | 2024-01-01 | A | 1000 | NULL | 0 |-- | 2024-01-02 | A | 1200 | 1000 | 200 |-- | 2024-01-03 | A | 950 | 1200 | -250 |-- | 2024-01-01 | B | 500 | NULL | 0 | <-- Partition reset-- | 2024-01-02 | B | 600 | 500 | 100 |-- | 2024-01-03 | B | 550 | 600 | -50 | -- Multi-column partition: Compare within product AND regionSELECT date, product, region, revenue, LAG(revenue) OVER ( PARTITION BY product, region ORDER BY date ) AS prev_day_region_revenueFROM regional_salesORDER BY product, region, date; -- Nested analysis: Year-over-year comparison by monthSELECT year, month, product, revenue, LAG(revenue) OVER ( PARTITION BY month, product ORDER BY year ) AS prev_year_same_monthFROM monthly_salesORDER BY product, year, month;Strategic Partitioning Patterns:
High-cardinality partitions (many small groups) are generally more efficient than low-cardinality partitions (few large groups). With billions of rows partitioned by customer_id (millions of customers), each partition is small and fast to process. Be cautious with partitions that create very uneven group sizes.
LAG and LEAD unlock a rich set of analytical patterns that would otherwise require complex self-joins or procedural code. Let's explore the most important real-world applications.
Calculating Changes Over Time:
The most common use case is detecting and quantifying changes between consecutive records.
123456789101112131415161718192021222324252627282930
-- Absolute and percentage change in stock pricesSELECT trade_date, symbol, close_price, LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date) AS prev_close, close_price - LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date) AS price_change, ROUND( 100.0 * (close_price - LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date)) / NULLIF(LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date), 0), 2 ) AS pct_changeFROM stock_pricesWHERE symbol = 'AAPL'ORDER BY trade_date; -- Detecting status changes in order workflowSELECT order_id, status, status_timestamp, LAG(status) OVER (PARTITION BY order_id ORDER BY status_timestamp) AS prev_status, CASE WHEN LAG(status) OVER (PARTITION BY order_id ORDER BY status_timestamp) IS NULL THEN 'Created' WHEN status <> LAG(status) OVER (PARTITION BY order_id ORDER BY status_timestamp) THEN 'Changed' ELSE 'Unchanged' END AS change_type, status_timestamp - LAG(status_timestamp) OVER (PARTITION BY order_id ORDER BY status_timestamp) AS time_in_prev_statusFROM order_status_historyORDER BY order_id, status_timestamp;These patterns often combine. Session analysis might include gap detection (to define session boundaries), change detection (to track state changes within sessions), and trend analysis (to identify behavioral patterns). LAG and LEAD are foundational building blocks for complex analytical logic.
LAG and LEAD are computationally efficient compared to self-join alternatives, but understanding their performance characteristics helps you use them optimally in production systems.
Execution Model:
Internally, LAG and LEAD operate using a single-pass algorithm with buffering. The database:
This is fundamentally more efficient than a self-join, which might require nested loops or hash matching:
Self-Join: O(n²) in worst case, O(n log n) with proper indexing LAG/LEAD: O(n log n) for sorting + O(n) for scan = O(n log n) overall
| Aspect | LAG/LEAD | Self-Join Equivalent |
|---|---|---|
| Complexity | O(n log n) | O(n log n) to O(n²) |
| Table Scans | 1 scan + sort | 2+ scans + join |
| Index Usage | Benefits from order-matching index | Requires join-key index |
| Memory | Buffer for offset rows | Full join hash/sort |
| Parallelization | Partition-level parallel | More limited parallel options |
1234567891011121314151617181920212223242526
-- Inefficient: Self-join approach to get previous valueSELECT t1.date, t1.revenue, t2.revenue AS prev_revenueFROM daily_sales t1LEFT JOIN daily_sales t2 ON t2.date = t1.date - INTERVAL '1 day';-- Requires joining every row; O(n²) without index -- Efficient: LAG approachSELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prev_revenueFROM daily_sales;-- Single pass with sort; O(n log n) -- Even worse: Correlated subquery approachSELECT date, revenue, (SELECT revenue FROM daily_sales t2 WHERE t2.date = (SELECT MAX(date) FROM daily_sales t3 WHERE t3.date < t1.date)) AS prev_revenueFROM daily_sales t1;-- Multiple subqueries per row; extremely slowPARTITION BY customer_id ORDER BY order_date, create an index on (customer_id, order_date).123456789101112131415161718192021222324252627282930313233
-- Optimization: Index recommendationCREATE INDEX idx_sales_date ON daily_sales(date);-- Or for partitioned queries:CREATE INDEX idx_sales_product_date ON daily_sales(product, date); -- Optimization: Consolidate multiple window calculations-- Instead of computing the same window multiple times:SELECT date, revenue, LAG(revenue, 1) OVER w AS prev_1, LAG(revenue, 7) OVER w AS prev_7, LAG(revenue, 30) OVER w AS prev_30, LEAD(revenue, 1) OVER w AS next_1FROM daily_salesWINDOW w AS (ORDER BY date); -- Define window once, reuse -- Optimization: Filter early-- Less efficient: windowing then filteringSELECT * FROM ( SELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prev FROM daily_sales) tWHERE date >= '2024-01-01'; -- More efficient: filtering then windowingWITH filtered AS ( SELECT date, revenue FROM daily_sales WHERE date >= '2024-01-01')SELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prevFROM filtered;If your data isn't already sorted (no matching index), the database must perform a full sort—O(n log n). For very large tables, this sort can dominate execution time and require significant temporary storage. Always check EXPLAIN plans for 'Sort' operations and consider indexes.
Beyond basic usage, LAG and LEAD enable sophisticated analytical patterns. Let's explore advanced techniques and handle edge cases that arise in production environments.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647
-- Pattern: Find first/last occurrence with LAG-- Identify the first purchase date for each customer sessionWITH purchase_events AS ( SELECT customer_id, event_date, event_type, LAG(event_type) OVER (PARTITION BY customer_id ORDER BY event_date) AS prev_event FROM customer_events)SELECT *FROM purchase_eventsWHERE event_type = 'purchase' AND (prev_event IS NULL OR prev_event != 'purchase'); -- Pattern: Interleaved LAG and LEAD for bidirectional analysisSELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prev_revenue, LEAD(revenue) OVER (ORDER BY date) AS next_revenue, CASE WHEN revenue > LAG(revenue) OVER (ORDER BY date) AND revenue > LEAD(revenue) OVER (ORDER BY date) THEN 'Local Maximum' WHEN revenue < LAG(revenue) OVER (ORDER BY date) AND revenue < LEAD(revenue) OVER (ORDER BY date) THEN 'Local Minimum' ELSE 'Neither' END AS peak_classificationFROM daily_sales; -- Pattern: Accessing non-adjacent offset correlation-- Compare current value to value 3 periods ago AND 3 periods aheadSELECT date, revenue, LAG(revenue, 3) OVER (ORDER BY date) AS revenue_3_ago, LEAD(revenue, 3) OVER (ORDER BY date) AS revenue_3_ahead, CASE WHEN LAG(revenue, 3) OVER (ORDER BY date) IS NOT NULL AND LEAD(revenue, 3) OVER (ORDER BY date) IS NOT NULL THEN (LAG(revenue, 3) OVER (ORDER BY date) + revenue + LEAD(revenue, 3) OVER (ORDER BY date)) / 3.0 ELSE NULL END AS symmetric_avgFROM daily_sales;Handling NULL Values in Source Data:
NULL values in your data require careful handling. LAG will return a NULL if the previous row's value is NULL—this is different from LAG returning NULL because there's no previous row.
1234567891011121314151617181920212223242526272829303132333435363738394041424344
-- Problem: Distinguishing 'no previous row' from 'previous row is NULL'-- Solution: Use additional context -- Method 1: Combine with ROW_NUMBERSELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prev_revenue, ROW_NUMBER() OVER (ORDER BY date) AS row_num, CASE WHEN ROW_NUMBER() OVER (ORDER BY date) = 1 THEN 'First Row (No Previous)' WHEN LAG(revenue) OVER (ORDER BY date) IS NULL THEN 'Previous Value is NULL' ELSE 'Has Previous Value' END AS lag_statusFROM daily_sales; -- Method 2: Use COALESCE with sentinel valueSELECT date, revenue, LAG(COALESCE(revenue, -1)) OVER (ORDER BY date) AS prev_with_sentinel, CASE WHEN LAG(COALESCE(revenue, -1)) OVER (ORDER BY date) IS NULL THEN 'First Row' WHEN LAG(COALESCE(revenue, -1)) OVER (ORDER BY date) = -1 THEN 'Previous was NULL' ELSE 'Normal Previous' END AS lag_interpretationFROM daily_sales; -- Pattern: IGNORE NULLS functionality (available in some databases)-- Oracle/PostgreSQL 14+:SELECT date, revenue, LAG(revenue IGNORE NULLS) OVER (ORDER BY date) AS last_non_null_revenueFROM daily_sales; -- Alternative for databases without IGNORE NULLS:SELECT date, revenue, (SELECT revenue FROM daily_sales s2 WHERE s2.date < s1.date AND s2.revenue IS NOT NULL ORDER BY s2.date DESC LIMIT 1) AS last_non_null_revenueFROM daily_sales s1;Some databases offer IGNORE NULLS with LAG/LEAD, which skips NULL values when looking for the previous/next value. Oracle has long supported this; PostgreSQL added it in version 14. Check your database documentation for availability—it simplifies many NULL-handling patterns significantly.
12345678910111213141516171819202122232425262728
-- Edge Case: Empty partitions-- When a partition has only one row, LAG and LEAD both return NULLSELECT customer_id, order_date, revenue, LAG(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev, LEAD(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS nextFROM orders;-- Single-order customers have NULL for both prev and next -- Edge Case: Ties in ORDER BY-- When order is not unique, behavior is nondeterministic-- Always add a tiebreaker columnSELECT date, transaction_id, -- tiebreaker amount, LAG(amount) OVER (ORDER BY date, transaction_id) AS prev_amount -- deterministicFROM transactions; -- Edge Case: Offset larger than partition sizeSELECT date, revenue, LAG(revenue, 100) OVER (ORDER BY date) AS prev_100, -- likely NULL for most rows LAG(revenue, 100, -1) OVER (ORDER BY date) AS prev_100_with_default -- returns -1 insteadFROM daily_sales; -- assuming less than 100 rowsLAG and LEAD are foundational analytic functions that transform how you approach row-to-row analysis. Let's consolidate the key concepts from this comprehensive exploration.
What's Next:
Now that you've mastered LAG and LEAD, we'll explore FIRST_VALUE and LAST_VALUE—functions that access the first or last value within a window frame. While LAG and LEAD use fixed offsets, FIRST_VALUE and LAST_VALUE work with window frame boundaries, enabling different but complementary analytical patterns.
You now possess deep understanding of LAG and LEAD functions—from basic syntax through advanced patterns and performance optimization. These functions eliminate the need for complex self-joins and open up powerful analytical capabilities. Practice applying these patterns to your own data to solidify your mastery.