Database Management SystemsAnalytic Functions

Analytic Functions

LevelAdvanced

Duration120 mins

TopicAnalytic Functions

1 / 5

LAG and LEAD

Accessing Neighboring Rows Without Self-Joins

Imagine you're analyzing sales data and need to answer: How does today's revenue compare to yesterday's? Or perhaps: What's the percentage change in stock price from the previous trading day? Or even: What will be the next delivery date after the current order?

Traditionally, answering these questions required complex self-joins—joining a table to itself with carefully crafted offset conditions. This approach is not only syntactically cumbersome but also computationally expensive, often resulting in quadratic complexity as the database must match each row against potentially all other rows.

LAG and LEAD are SQL window functions that elegantly solve this problem. They allow you to access data from previous rows (LAG) or subsequent rows (LEAD) relative to the current row, all within a single query pass. This capability transforms how we approach time-series analysis, change detection, gap analysis, and countless other analytical patterns.

Understanding LAG and LEAD is fundamental to mastering SQL analytics because they represent a paradigm shift: instead of thinking in terms of set operations and joins, you learn to think in terms of ordered sequences and row-to-row relationships—a mental model that unlocks powerful analytical capabilities.

What You Will Learn

By the end of this page, you will understand the complete syntax and semantics of LAG and LEAD functions, including offset parameters, default values, and NULL handling. You'll master partitioning and ordering for complex multi-group analysis, explore real-world use cases from trend detection to gap analysis, and learn performance optimization strategies for production systems.

Understanding the Conceptual Model

Before diving into syntax, let's establish a solid conceptual foundation. LAG and LEAD are offset functions within the broader category of window functions (also called analytic functions or OLAP functions).

The Window Function Paradigm:

Unlike aggregate functions that collapse multiple rows into a single result, window functions compute a value for each row based on a window or frame of related rows. The key insight is that the original row is preserved—the computation adds information without reducing the row count.

The Ordered Sequence Model:

LAG and LEAD specifically operate on an ordered sequence of rows. Within this sequence:

LAG looks backward to access values from rows that come before the current row
LEAD looks forward to access values from rows that come after the current row

Converting Mermaid diagram...

Key Terminology:

Partition: A logical grouping of rows over which the window function operates. LAG and LEAD compute values independently within each partition. If no partition is specified, the entire result set is one partition.

Order: The sequence in which rows are arranged within each partition. This ordering is critical for LAG and LEAD—without order, 'previous' and 'next' have no meaning.

Offset: How many rows backward (LAG) or forward (LEAD) to look. An offset of 1 means the immediately preceding/following row; 2 means two rows back/forward.

Default Value: What to return when the offset goes beyond the partition boundary (e.g., LAG on the first row has no previous row).

Mental Model: Sliding Window Pointer

Think of LAG and LEAD as a pointer that slides through your ordered data. At each row position, LAG peeks backwards and LEAD peeks forwards. The offset parameter controls how far the peek extends. This pointer-based mental model helps when debugging complex window expressions.

Complete Syntax Reference

Let's examine the complete syntax for both LAG and LEAD functions, including all optional parameters.

lag_lead_syntax.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
-- Full LAG syntax
LAG(expression [, offset [, default_value]])
OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression [ASC | DESC] [NULLS FIRST | NULLS LAST], ...
)
 
-- Full LEAD syntax  
LEAD(expression [, offset [, default_value]])
OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression [ASC | DESC] [NULLS FIRST | NULLS LAST], ...
)

LAG and LEAD Parameters Explained
Parameter	Required	Default	Description
expression	Yes	—	The column or expression whose value to retrieve from the offset row
offset	No	1	Number of rows backward (LAG) or forward (LEAD) from current row
default_value	No	NULL	Value returned when offset goes beyond partition boundaries
PARTITION BY	No	Entire result set	Divides rows into groups; LAG/LEAD reset at partition boundaries
ORDER BY	Yes*	—	Defines row sequence within partition (*required for meaningful results)
NULLS FIRST/LAST	No	Implementation-dependent	Where NULLs appear in the ordering

Important Syntax Notes:

1. ORDER BY is Semantically Required:

While some databases may not syntactically require ORDER BY, omitting it produces nondeterministic results. Without ordering, 'previous' and 'next' are undefined—the database may return any arbitrary row.

2. Offset Must Be Non-Negative:

The offset parameter must be 0 or a positive integer. Zero returns the current row's value (rarely useful). Negative offsets are not supported.

3. Default Value Type Compatibility:

The default value must be implicitly convertible to the expression's data type. Mismatched types cause runtime errors.

basic_lag_lead_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- Example table: daily_sales
-- | date       | product | revenue |
-- |------------|---------|---------|
-- | 2024-01-01 | A       | 1000    |
-- | 2024-01-02 | A       | 1200    |
-- | 2024-01-03 | A       | 950     |
-- | 2024-01-04 | A       | 1100    |
 
-- Basic LAG: Get previous day's revenue
SELECT 
    date,
    product,
    revenue,
    LAG(revenue) OVER (ORDER BY date) AS prev_day_revenue
FROM daily_sales;
 
-- Result:
-- | date       | product | revenue | prev_day_revenue |
-- |------------|---------|---------|------------------|
-- | 2024-01-01 | A       | 1000    | NULL             |
-- | 2024-01-02 | A       | 1200    | 1000             |
-- | 2024-01-03 | A       | 950     | 1200             |
-- | 2024-01-04 | A       | 1100    | 950              |
 
-- Basic LEAD: Get next day's revenue
SELECT 
    date,
    product,
    revenue,
    LEAD(revenue) OVER (ORDER BY date) AS next_day_revenue
FROM daily_sales;
 
-- Result:
-- | date       | product | revenue | next_day_revenue |
-- |------------|---------|---------|------------------|
-- | 2024-01-01 | A       | 1000    | 1200             |
-- | 2024-01-02 | A       | 1200    | 950              |
-- | 2024-01-03 | A       | 950     | 1100             |
-- | 2024-01-04 | A       | 1100    | NULL             |

NULL in First/Last Rows

Notice that LAG returns NULL for the first row (no previous row exists) and LEAD returns NULL for the last row (no next row exists). This boundary behavior is critical to understand—it often requires special handling with COALESCE or default values.

Offset and Default Value Mechanics

The offset parameter extends LAG and LEAD beyond immediate neighbors, while the default value parameter handles boundary conditions gracefully. Let's explore these mechanics in depth.

offset_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Multi-offset example: Compare current value to multiple previous periods
SELECT 
    date,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY date) AS prev_1_day,
    LAG(revenue, 7) OVER (ORDER BY date) AS prev_7_days,
    LAG(revenue, 30) OVER (ORDER BY date) AS prev_30_days
FROM daily_sales;
 
-- This enables week-over-week and month-over-month comparisons
-- in a single query without self-joins
 
-- Using default values to handle boundaries
SELECT 
    date,
    revenue,
    LAG(revenue, 1, 0) OVER (ORDER BY date) AS prev_revenue,
    revenue - LAG(revenue, 1, revenue) OVER (ORDER BY date) AS change_from_prev
FROM daily_sales;
 
-- With default = 0: First row shows 0 for prev_revenue
-- With default = revenue: First row shows 0 change (compares to itself)
 
-- Expression as default value
SELECT 
    date,
    revenue,
    LAG(revenue, 1, revenue * 0.9) OVER (ORDER BY date) AS prev_or_estimate
FROM daily_sales;
-- First row uses 90% of current revenue as an estimate

Dynamic Offset Considerations:

The offset parameter must typically be a constant or scalar value—it cannot be a column reference that varies per row. If you need variable offsets, you'll need alternative approaches:

Self-joins with calculated offsets
ROWS/RANGE frame specifications (covered in Module 4)
Stored procedure logic

Default Value Strategies:

Common Default Value Patterns
Scenario	Default Value	Rationale
Percentage change	`NULL`	NULL propagates through division, avoiding division by zero
Running difference	`0`	First row shows full value as the initial change
Same as current	`expression`	Boundary rows compare to themselves, showing zero change
Conservative estimate	`MIN(expression)`	Use known minimum when previous is unavailable
Flag for handling	`-99999`	Sentinel value that application code recognizes and handles specially

null_handling_patterns.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- Pattern 1: Using COALESCE for boundary handling
SELECT 
    date,
    revenue,
    COALESCE(
        LAG(revenue) OVER (ORDER BY date),
        revenue  -- Use current value if no previous exists
    ) AS prev_or_current
FROM daily_sales;
 
-- Pattern 2: Using CASE for complex boundary logic
SELECT 
    date,
    revenue,
    CASE 
        WHEN LAG(revenue) OVER (ORDER BY date) IS NULL THEN 'First Record'
        WHEN revenue > LAG(revenue) OVER (ORDER BY date) THEN 'Increased'
        WHEN revenue < LAG(revenue) OVER (ORDER BY date) THEN 'Decreased'
        ELSE 'Unchanged'
    END AS trend
FROM daily_sales;
 
-- Pattern 3: Separate flag for boundary rows
SELECT 
    date,
    revenue,
    LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
    CASE 
        WHEN ROW_NUMBER() OVER (ORDER BY date) = 1 THEN TRUE 
        ELSE FALSE 
    END AS is_first_row
FROM daily_sales;

NULL vs Default: Different Semantics

A NULL from LAG/LEAD means 'no row at that offset exists.' A default value substitutes for this NULL. Be careful: if your data legitimately contains NULL values, you can't distinguish 'no previous row' from 'previous row has NULL value' unless you use a sentinel default that can't occur naturally in your data.

Partitioning for Grouped Analysis

PARTITION BY is the key to multi-group analysis. It divides the dataset into independent groups, with LAG and LEAD operating separately within each partition. This enables powerful comparative analytics across categories, time periods, or any grouping dimension.

The Partition Boundary Effect:

When rows are partitioned, LAG and LEAD cannot 'see' across partition boundaries. The first row of each partition has no LAG predecessor; the last row has no LEAD successor—even if rows from other partitions physically exist before/after.

Converting Mermaid diagram...

partition_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
-- Example: Daily sales by product, compare to previous day within same product
SELECT 
    date,
    product,
    revenue,
    LAG(revenue) OVER (PARTITION BY product ORDER BY date) AS prev_day_revenue,
    revenue - LAG(revenue, 1, revenue) OVER (PARTITION BY product ORDER BY date) AS daily_change
FROM daily_sales
ORDER BY product, date;
 
-- Result:
-- | date       | product | revenue | prev_day_revenue | daily_change |
-- |------------|---------|---------|------------------|--------------|
-- | 2024-01-01 | A       | 1000    | NULL             | 0            |
-- | 2024-01-02 | A       | 1200    | 1000             | 200          |
-- | 2024-01-03 | A       | 950     | 1200             | -250         |
-- | 2024-01-01 | B       | 500     | NULL             | 0            | <-- Partition reset
-- | 2024-01-02 | B       | 600     | 500              | 100          |
-- | 2024-01-03 | B       | 550     | 600              | -50          |
 
-- Multi-column partition: Compare within product AND region
SELECT 
    date,
    product,
    region,
    revenue,
    LAG(revenue) OVER (
        PARTITION BY product, region 
        ORDER BY date
    ) AS prev_day_region_revenue
FROM regional_sales
ORDER BY product, region, date;
 
-- Nested analysis: Year-over-year comparison by month
SELECT 
    year,
    month,
    product,
    revenue,
    LAG(revenue) OVER (
        PARTITION BY month, product 
        ORDER BY year
    ) AS prev_year_same_month
FROM monthly_sales
ORDER BY product, year, month;

Strategic Partitioning Patterns:

Common Partitioning Strategies

•Time-series by entity: PARTITION BY customer_id ORDER BY transaction_date — Compare customer's current purchase to their previous purchase
•Geographic rollups: PARTITION BY state ORDER BY date — Track trends within each state independently
•Hierarchical analysis: PARTITION BY department, team ORDER BY hire_date — Compare employee tenure within organizational units
•Periodic comparison: PARTITION BY day_of_week ORDER BY date — Compare each Monday to the previous Monday
•Category comparison: PARTITION BY product_category ORDER BY revenue DESC — Rank products within their category and compare to next-ranked

Partition Cardinality Matters

High-cardinality partitions (many small groups) are generally more efficient than low-cardinality partitions (few large groups). With billions of rows partitioned by customer_id (millions of customers), each partition is small and fast to process. Be cautious with partitions that create very uneven group sizes.

Real-World Use Cases

LAG and LEAD unlock a rich set of analytical patterns that would otherwise require complex self-joins or procedural code. Let's explore the most important real-world applications.

Calculating Changes Over Time:

The most common use case is detecting and quantifying changes between consecutive records.

change_detection.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Absolute and percentage change in stock prices
SELECT 
    trade_date,
    symbol,
    close_price,
    LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date) AS prev_close,
    close_price - LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date) AS price_change,
    ROUND(
        100.0 * (close_price - LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date)) 
        / NULLIF(LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date), 0),
        2
    ) AS pct_change
FROM stock_prices
WHERE symbol = 'AAPL'
ORDER BY trade_date;
 
-- Detecting status changes in order workflow
SELECT 
    order_id,
    status,
    status_timestamp,
    LAG(status) OVER (PARTITION BY order_id ORDER BY status_timestamp) AS prev_status,
    CASE 
        WHEN LAG(status) OVER (PARTITION BY order_id ORDER BY status_timestamp) IS NULL THEN 'Created'
        WHEN status <> LAG(status) OVER (PARTITION BY order_id ORDER BY status_timestamp) THEN 'Changed'
        ELSE 'Unchanged'
    END AS change_type,
    status_timestamp - LAG(status_timestamp) OVER (PARTITION BY order_id ORDER BY status_timestamp) AS time_in_prev_status
FROM order_status_history
ORDER BY order_id, status_timestamp;

Combining Use Cases

These patterns often combine. Session analysis might include gap detection (to define session boundaries), change detection (to track state changes within sessions), and trend analysis (to identify behavioral patterns). LAG and LEAD are foundational building blocks for complex analytical logic.

Performance Considerations

LAG and LEAD are computationally efficient compared to self-join alternatives, but understanding their performance characteristics helps you use them optimally in production systems.

Execution Model:

Internally, LAG and LEAD operate using a single-pass algorithm with buffering. The database:

Sorts (or uses existing index order for) the partition data
Maintains a buffer of recent rows (for LAG) or reads ahead (for LEAD)
Processes each row once, looking up values from the buffer

This is fundamentally more efficient than a self-join, which might require nested loops or hash matching:

Self-Join: O(n²) in worst case, O(n log n) with proper indexing LAG/LEAD: O(n log n) for sorting + O(n) for scan = O(n log n) overall

Performance: LAG/LEAD vs Self-Join
Aspect	LAG/LEAD	Self-Join Equivalent
Complexity	O(n log n)	O(n log n) to O(n²)
Table Scans	1 scan + sort	2+ scans + join
Index Usage	Benefits from order-matching index	Requires join-key index
Memory	Buffer for offset rows	Full join hash/sort
Parallelization	Partition-level parallel	More limited parallel options

performance_comparison.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Inefficient: Self-join approach to get previous value
SELECT 
    t1.date,
    t1.revenue,
    t2.revenue AS prev_revenue
FROM daily_sales t1
LEFT JOIN daily_sales t2 
    ON t2.date = t1.date - INTERVAL '1 day';
-- Requires joining every row; O(n²) without index
 
-- Efficient: LAG approach
SELECT 
    date,
    revenue,
    LAG(revenue) OVER (ORDER BY date) AS prev_revenue
FROM daily_sales;
-- Single pass with sort; O(n log n)
 
-- Even worse: Correlated subquery approach
SELECT 
    date,
    revenue,
    (SELECT revenue FROM daily_sales t2 
     WHERE t2.date = (SELECT MAX(date) FROM daily_sales t3 WHERE t3.date < t1.date)) AS prev_revenue
FROM daily_sales t1;
-- Multiple subqueries per row; extremely slow

Performance Optimization Strategies

•Create indexes matching window ORDER BY: An index on (partition_columns, order_column) eliminates sorting overhead. For PARTITION BY customer_id ORDER BY order_date, create an index on (customer_id, order_date).
•Minimize window function calls: Multiple LAG calls with the same window can often be consolidated. Use a CTE to compute once and reference multiple times.
•Filter before windowing: Apply WHERE clauses before computing window functions to reduce the window function's input size.
•Be cautious with large offsets: While LAG(value, 365) is logically valid, it requires buffering 365 rows per partition. Consider alternative approaches for very large offsets.
•Partition for parallelism: Partitioned window functions can process each partition in parallel. Ensure partitioning aligns with your parallel execution capabilities.

optimization_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Optimization: Index recommendation
CREATE INDEX idx_sales_date ON daily_sales(date);
-- Or for partitioned queries:
CREATE INDEX idx_sales_product_date ON daily_sales(product, date);
 
-- Optimization: Consolidate multiple window calculations
-- Instead of computing the same window multiple times:
SELECT 
    date,
    revenue,
    LAG(revenue, 1) OVER w AS prev_1,
    LAG(revenue, 7) OVER w AS prev_7,
    LAG(revenue, 30) OVER w AS prev_30,
    LEAD(revenue, 1) OVER w AS next_1
FROM daily_sales
WINDOW w AS (ORDER BY date);  -- Define window once, reuse
 
-- Optimization: Filter early
-- Less efficient: windowing then filtering
SELECT * FROM (
    SELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prev
    FROM daily_sales
) t
WHERE date >= '2024-01-01';
 
-- More efficient: filtering then windowing
WITH filtered AS (
    SELECT date, revenue
    FROM daily_sales
    WHERE date >= '2024-01-01'
)
SELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prev
FROM filtered;

Sorting Cost

If your data isn't already sorted (no matching index), the database must perform a full sort—O(n log n). For very large tables, this sort can dominate execution time and require significant temporary storage. Always check EXPLAIN plans for 'Sort' operations and consider indexes.

Advanced Patterns and Edge Cases

Beyond basic usage, LAG and LEAD enable sophisticated analytical patterns. Let's explore advanced techniques and handle edge cases that arise in production environments.

advanced_patterns.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
-- Pattern: Find first/last occurrence with LAG
-- Identify the first purchase date for each customer session
WITH purchase_events AS (
    SELECT 
        customer_id,
        event_date,
        event_type,
        LAG(event_type) OVER (PARTITION BY customer_id ORDER BY event_date) AS prev_event
    FROM customer_events
)
SELECT *
FROM purchase_events
WHERE event_type = 'purchase' AND (prev_event IS NULL OR prev_event != 'purchase');
 
-- Pattern: Interleaved LAG and LEAD for bidirectional analysis
SELECT 
    date,
    revenue,
    LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
    LEAD(revenue) OVER (ORDER BY date) AS next_revenue,
    CASE 
        WHEN revenue > LAG(revenue) OVER (ORDER BY date) 
         AND revenue > LEAD(revenue) OVER (ORDER BY date) 
        THEN 'Local Maximum'
        WHEN revenue < LAG(revenue) OVER (ORDER BY date) 
         AND revenue < LEAD(revenue) OVER (ORDER BY date) 
        THEN 'Local Minimum'
        ELSE 'Neither'
    END AS peak_classification
FROM daily_sales;
 
-- Pattern: Accessing non-adjacent offset correlation
-- Compare current value to value 3 periods ago AND 3 periods ahead
SELECT 
    date,
    revenue,
    LAG(revenue, 3) OVER (ORDER BY date) AS revenue_3_ago,
    LEAD(revenue, 3) OVER (ORDER BY date) AS revenue_3_ahead,
    CASE 
        WHEN LAG(revenue, 3) OVER (ORDER BY date) IS NOT NULL 
         AND LEAD(revenue, 3) OVER (ORDER BY date) IS NOT NULL
        THEN (LAG(revenue, 3) OVER (ORDER BY date) + 
              revenue + 
              LEAD(revenue, 3) OVER (ORDER BY date)) / 3.0
        ELSE NULL 
    END AS symmetric_avg
FROM daily_sales;

Handling NULL Values in Source Data:

NULL values in your data require careful handling. LAG will return a NULL if the previous row's value is NULL—this is different from LAG returning NULL because there's no previous row.

null_handling_advanced.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- Problem: Distinguishing 'no previous row' from 'previous row is NULL'
-- Solution: Use additional context
 
-- Method 1: Combine with ROW_NUMBER
SELECT 
    date,
    revenue,
    LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
    ROW_NUMBER() OVER (ORDER BY date) AS row_num,
    CASE 
        WHEN ROW_NUMBER() OVER (ORDER BY date) = 1 THEN 'First Row (No Previous)'
        WHEN LAG(revenue) OVER (ORDER BY date) IS NULL THEN 'Previous Value is NULL'
        ELSE 'Has Previous Value'
    END AS lag_status
FROM daily_sales;
 
-- Method 2: Use COALESCE with sentinel value
SELECT 
    date,
    revenue,
    LAG(COALESCE(revenue, -1)) OVER (ORDER BY date) AS prev_with_sentinel,
    CASE 
        WHEN LAG(COALESCE(revenue, -1)) OVER (ORDER BY date) IS NULL THEN 'First Row'
        WHEN LAG(COALESCE(revenue, -1)) OVER (ORDER BY date) = -1 THEN 'Previous was NULL'
        ELSE 'Normal Previous'
    END AS lag_interpretation
FROM daily_sales;
 
-- Pattern: IGNORE NULLS functionality (available in some databases)
-- Oracle/PostgreSQL 14+:
SELECT 
    date,
    revenue,
    LAG(revenue IGNORE NULLS) OVER (ORDER BY date) AS last_non_null_revenue
FROM daily_sales;
 
-- Alternative for databases without IGNORE NULLS:
SELECT 
    date,
    revenue,
    (SELECT revenue FROM daily_sales s2 
     WHERE s2.date < s1.date AND s2.revenue IS NOT NULL
     ORDER BY s2.date DESC LIMIT 1) AS last_non_null_revenue
FROM daily_sales s1;

Database-Specific Features

Some databases offer IGNORE NULLS with LAG/LEAD, which skips NULL values when looking for the previous/next value. Oracle has long supported this; PostgreSQL added it in version 14. Check your database documentation for availability—it simplifies many NULL-handling patterns significantly.

edge_cases.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Edge Case: Empty partitions
-- When a partition has only one row, LAG and LEAD both return NULL
SELECT 
    customer_id,
    order_date,
    revenue,
    LAG(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev,
    LEAD(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS next
FROM orders;
-- Single-order customers have NULL for both prev and next
 
-- Edge Case: Ties in ORDER BY
-- When order is not unique, behavior is nondeterministic
-- Always add a tiebreaker column
SELECT 
    date,
    transaction_id,  -- tiebreaker
    amount,
    LAG(amount) OVER (ORDER BY date, transaction_id) AS prev_amount  -- deterministic
FROM transactions;
 
-- Edge Case: Offset larger than partition size
SELECT 
    date,
    revenue,
    LAG(revenue, 100) OVER (ORDER BY date) AS prev_100,  -- likely NULL for most rows
    LAG(revenue, 100, -1) OVER (ORDER BY date) AS prev_100_with_default  -- returns -1 instead
FROM daily_sales;  -- assuming less than 100 rows

Summary: LAG and LEAD

LAG and LEAD are foundational analytic functions that transform how you approach row-to-row analysis. Let's consolidate the key concepts from this comprehensive exploration.

Key Takeaways

•LAG looks backward, LEAD looks forward: LAG accesses values from previous rows; LEAD accesses values from subsequent rows in an ordered sequence.
•Offset controls distance: Default offset is 1 (immediate predecessor/successor); larger offsets look further into the sequence.
•Default values handle boundaries: When offset exceeds partition bounds, default value (NULL if unspecified) is returned.
•PARTITION BY creates independent groups: LAG/LEAD operate separately within each partition; boundaries reset at partition changes.
•ORDER BY is essential: Without ordering, 'previous' and 'next' are undefined; always specify ORDER BY for deterministic results.
•Far more efficient than self-joins: Single-pass algorithm with buffering vs. potentially quadratic join operations.
•Enables powerful analytics: Change detection, trend analysis, gap finding, session analysis, and countless other patterns.
•Watch for NULL complexity: Distinguish between NULL from offset exceeding bounds vs. NULL in the source data.

What's Next:

Now that you've mastered LAG and LEAD, we'll explore FIRST_VALUE and LAST_VALUE—functions that access the first or last value within a window frame. While LAG and LEAD use fixed offsets, FIRST_VALUE and LAST_VALUE work with window frame boundaries, enabling different but complementary analytical patterns.

Page Complete

You now possess deep understanding of LAG and LEAD functions—from basic syntax through advanced patterns and performance optimization. These functions eliminate the need for complex self-joins and open up powerful analytical capabilities. Practice applying these patterns to your own data to solidify your mastery.

1 / 5

Loading learning content...

Database Management SystemsAnalytic Functions

Analytic Functions

LevelAdvanced

Duration120 mins

TopicAnalytic Functions

1 / 5

LAG and LEAD

Accessing Neighboring Rows Without Self-Joins

What You Will Learn

Understanding the Conceptual Model

The Window Function Paradigm:

The Ordered Sequence Model:

LAG and LEAD specifically operate on an ordered sequence of rows. Within this sequence:

LAG looks backward to access values from rows that come before the current row
LEAD looks forward to access values from rows that come after the current row

Converting Mermaid diagram...

Key Terminology:

Order: The sequence in which rows are arranged within each partition. This ordering is critical for LAG and LEAD—without order, 'previous' and 'next' have no meaning.

Offset: How many rows backward (LAG) or forward (LEAD) to look. An offset of 1 means the immediately preceding/following row; 2 means two rows back/forward.

Default Value: What to return when the offset goes beyond the partition boundary (e.g., LAG on the first row has no previous row).

Mental Model: Sliding Window Pointer

Complete Syntax Reference

Let's examine the complete syntax for both LAG and LEAD functions, including all optional parameters.

lag_lead_syntax.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
-- Full LAG syntax
LAG(expression [, offset [, default_value]])
OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression [ASC | DESC] [NULLS FIRST | NULLS LAST], ...
)
 
-- Full LEAD syntax  
LEAD(expression [, offset [, default_value]])
OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression [ASC | DESC] [NULLS FIRST | NULLS LAST], ...
)

LAG and LEAD Parameters Explained
Parameter	Required	Default	Description
expression	Yes	—	The column or expression whose value to retrieve from the offset row
offset	No	1	Number of rows backward (LAG) or forward (LEAD) from current row
default_value	No	NULL	Value returned when offset goes beyond partition boundaries
PARTITION BY	No	Entire result set	Divides rows into groups; LAG/LEAD reset at partition boundaries
ORDER BY	Yes*	—	Defines row sequence within partition (*required for meaningful results)
NULLS FIRST/LAST	No	Implementation-dependent	Where NULLs appear in the ordering

Important Syntax Notes:

1. ORDER BY is Semantically Required:

2. Offset Must Be Non-Negative:

The offset parameter must be 0 or a positive integer. Zero returns the current row's value (rarely useful). Negative offsets are not supported.

3. Default Value Type Compatibility:

The default value must be implicitly convertible to the expression's data type. Mismatched types cause runtime errors.

basic_lag_lead_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- Example table: daily_sales
-- | date       | product | revenue |
-- |------------|---------|---------|
-- | 2024-01-01 | A       | 1000    |
-- | 2024-01-02 | A       | 1200    |
-- | 2024-01-03 | A       | 950     |
-- | 2024-01-04 | A       | 1100    |
 
-- Basic LAG: Get previous day's revenue
SELECT 
    date,
    product,
    revenue,
    LAG(revenue) OVER (ORDER BY date) AS prev_day_revenue
FROM daily_sales;
 
-- Result:
-- | date       | product | revenue | prev_day_revenue |
-- |------------|---------|---------|------------------|
-- | 2024-01-01 | A       | 1000    | NULL             |
-- | 2024-01-02 | A       | 1200    | 1000             |
-- | 2024-01-03 | A       | 950     | 1200             |
-- | 2024-01-04 | A       | 1100    | 950              |
 
-- Basic LEAD: Get next day's revenue
SELECT 
    date,
    product,
    revenue,
    LEAD(revenue) OVER (ORDER BY date) AS next_day_revenue
FROM daily_sales;
 
-- Result:
-- | date       | product | revenue | next_day_revenue |
-- |------------|---------|---------|------------------|
-- | 2024-01-01 | A       | 1000    | 1200             |
-- | 2024-01-02 | A       | 1200    | 950              |
-- | 2024-01-03 | A       | 950     | 1100             |
-- | 2024-01-04 | A       | 1100    | NULL             |

NULL in First/Last Rows

Offset and Default Value Mechanics

The offset parameter extends LAG and LEAD beyond immediate neighbors, while the default value parameter handles boundary conditions gracefully. Let's explore these mechanics in depth.

offset_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Multi-offset example: Compare current value to multiple previous periods
SELECT 
    date,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY date) AS prev_1_day,
    LAG(revenue, 7) OVER (ORDER BY date) AS prev_7_days,
    LAG(revenue, 30) OVER (ORDER BY date) AS prev_30_days
FROM daily_sales;
 
-- This enables week-over-week and month-over-month comparisons
-- in a single query without self-joins
 
-- Using default values to handle boundaries
SELECT 
    date,
    revenue,
    LAG(revenue, 1, 0) OVER (ORDER BY date) AS prev_revenue,
    revenue - LAG(revenue, 1, revenue) OVER (ORDER BY date) AS change_from_prev
FROM daily_sales;
 
-- With default = 0: First row shows 0 for prev_revenue
-- With default = revenue: First row shows 0 change (compares to itself)
 
-- Expression as default value
SELECT 
    date,
    revenue,
    LAG(revenue, 1, revenue * 0.9) OVER (ORDER BY date) AS prev_or_estimate
FROM daily_sales;
-- First row uses 90% of current revenue as an estimate

Dynamic Offset Considerations:

The offset parameter must typically be a constant or scalar value—it cannot be a column reference that varies per row. If you need variable offsets, you'll need alternative approaches:

Self-joins with calculated offsets
ROWS/RANGE frame specifications (covered in Module 4)
Stored procedure logic

Default Value Strategies:

Common Default Value Patterns
Scenario	Default Value	Rationale
Percentage change	`NULL`	NULL propagates through division, avoiding division by zero
Running difference	`0`	First row shows full value as the initial change
Same as current	`expression`	Boundary rows compare to themselves, showing zero change
Conservative estimate	`MIN(expression)`	Use known minimum when previous is unavailable
Flag for handling	`-99999`	Sentinel value that application code recognizes and handles specially

null_handling_patterns.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- Pattern 1: Using COALESCE for boundary handling
SELECT 
    date,
    revenue,
    COALESCE(
        LAG(revenue) OVER (ORDER BY date),
        revenue  -- Use current value if no previous exists
    ) AS prev_or_current
FROM daily_sales;
 
-- Pattern 2: Using CASE for complex boundary logic
SELECT 
    date,
    revenue,
    CASE 
        WHEN LAG(revenue) OVER (ORDER BY date) IS NULL THEN 'First Record'
        WHEN revenue > LAG(revenue) OVER (ORDER BY date) THEN 'Increased'
        WHEN revenue < LAG(revenue) OVER (ORDER BY date) THEN 'Decreased'
        ELSE 'Unchanged'
    END AS trend
FROM daily_sales;
 
-- Pattern 3: Separate flag for boundary rows
SELECT 
    date,
    revenue,
    LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
    CASE 
        WHEN ROW_NUMBER() OVER (ORDER BY date) = 1 THEN TRUE 
        ELSE FALSE 
    END AS is_first_row
FROM daily_sales;

NULL vs Default: Different Semantics

Partitioning for Grouped Analysis

The Partition Boundary Effect:

Converting Mermaid diagram...

partition_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
-- Example: Daily sales by product, compare to previous day within same product
SELECT 
    date,
    product,
    revenue,
    LAG(revenue) OVER (PARTITION BY product ORDER BY date) AS prev_day_revenue,
    revenue - LAG(revenue, 1, revenue) OVER (PARTITION BY product ORDER BY date) AS daily_change
FROM daily_sales
ORDER BY product, date;
 
-- Result:
-- | date       | product | revenue | prev_day_revenue | daily_change |
-- |------------|---------|---------|------------------|--------------|
-- | 2024-01-01 | A       | 1000    | NULL             | 0            |
-- | 2024-01-02 | A       | 1200    | 1000             | 200          |
-- | 2024-01-03 | A       | 950     | 1200             | -250         |
-- | 2024-01-01 | B       | 500     | NULL             | 0            | <-- Partition reset
-- | 2024-01-02 | B       | 600     | 500              | 100          |
-- | 2024-01-03 | B       | 550     | 600              | -50          |
 
-- Multi-column partition: Compare within product AND region
SELECT 
    date,
    product,
    region,
    revenue,
    LAG(revenue) OVER (
        PARTITION BY product, region 
        ORDER BY date
    ) AS prev_day_region_revenue
FROM regional_sales
ORDER BY product, region, date;
 
-- Nested analysis: Year-over-year comparison by month
SELECT 
    year,
    month,
    product,
    revenue,
    LAG(revenue) OVER (
        PARTITION BY month, product 
        ORDER BY year
    ) AS prev_year_same_month
FROM monthly_sales
ORDER BY product, year, month;

Strategic Partitioning Patterns:

Common Partitioning Strategies

•Time-series by entity: PARTITION BY customer_id ORDER BY transaction_date — Compare customer's current purchase to their previous purchase
•Geographic rollups: PARTITION BY state ORDER BY date — Track trends within each state independently
•Hierarchical analysis: PARTITION BY department, team ORDER BY hire_date — Compare employee tenure within organizational units
•Periodic comparison: PARTITION BY day_of_week ORDER BY date — Compare each Monday to the previous Monday
•Category comparison: PARTITION BY product_category ORDER BY revenue DESC — Rank products within their category and compare to next-ranked

Partition Cardinality Matters

Real-World Use Cases

LAG and LEAD unlock a rich set of analytical patterns that would otherwise require complex self-joins or procedural code. Let's explore the most important real-world applications.

Calculating Changes Over Time:

The most common use case is detecting and quantifying changes between consecutive records.

change_detection.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Absolute and percentage change in stock prices
SELECT 
    trade_date,
    symbol,
    close_price,
    LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date) AS prev_close,
    close_price - LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date) AS price_change,
    ROUND(
        100.0 * (close_price - LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date)) 
        / NULLIF(LAG(close_price) OVER (PARTITION BY symbol ORDER BY trade_date), 0),
        2
    ) AS pct_change
FROM stock_prices
WHERE symbol = 'AAPL'
ORDER BY trade_date;
 
-- Detecting status changes in order workflow
SELECT 
    order_id,
    status,
    status_timestamp,
    LAG(status) OVER (PARTITION BY order_id ORDER BY status_timestamp) AS prev_status,
    CASE 
        WHEN LAG(status) OVER (PARTITION BY order_id ORDER BY status_timestamp) IS NULL THEN 'Created'
        WHEN status <> LAG(status) OVER (PARTITION BY order_id ORDER BY status_timestamp) THEN 'Changed'
        ELSE 'Unchanged'
    END AS change_type,
    status_timestamp - LAG(status_timestamp) OVER (PARTITION BY order_id ORDER BY status_timestamp) AS time_in_prev_status
FROM order_status_history
ORDER BY order_id, status_timestamp;

Combining Use Cases

Performance Considerations

LAG and LEAD are computationally efficient compared to self-join alternatives, but understanding their performance characteristics helps you use them optimally in production systems.

Execution Model:

Internally, LAG and LEAD operate using a single-pass algorithm with buffering. The database:

Sorts (or uses existing index order for) the partition data
Maintains a buffer of recent rows (for LAG) or reads ahead (for LEAD)
Processes each row once, looking up values from the buffer

This is fundamentally more efficient than a self-join, which might require nested loops or hash matching:

Self-Join: O(n²) in worst case, O(n log n) with proper indexing LAG/LEAD: O(n log n) for sorting + O(n) for scan = O(n log n) overall

Performance: LAG/LEAD vs Self-Join
Aspect	LAG/LEAD	Self-Join Equivalent
Complexity	O(n log n)	O(n log n) to O(n²)
Table Scans	1 scan + sort	2+ scans + join
Index Usage	Benefits from order-matching index	Requires join-key index
Memory	Buffer for offset rows	Full join hash/sort
Parallelization	Partition-level parallel	More limited parallel options

performance_comparison.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Inefficient: Self-join approach to get previous value
SELECT 
    t1.date,
    t1.revenue,
    t2.revenue AS prev_revenue
FROM daily_sales t1
LEFT JOIN daily_sales t2 
    ON t2.date = t1.date - INTERVAL '1 day';
-- Requires joining every row; O(n²) without index
 
-- Efficient: LAG approach
SELECT 
    date,
    revenue,
    LAG(revenue) OVER (ORDER BY date) AS prev_revenue
FROM daily_sales;
-- Single pass with sort; O(n log n)
 
-- Even worse: Correlated subquery approach
SELECT 
    date,
    revenue,
    (SELECT revenue FROM daily_sales t2 
     WHERE t2.date = (SELECT MAX(date) FROM daily_sales t3 WHERE t3.date < t1.date)) AS prev_revenue
FROM daily_sales t1;
-- Multiple subqueries per row; extremely slow

Performance Optimization Strategies

•Create indexes matching window ORDER BY: An index on (partition_columns, order_column) eliminates sorting overhead. For PARTITION BY customer_id ORDER BY order_date, create an index on (customer_id, order_date).
•Minimize window function calls: Multiple LAG calls with the same window can often be consolidated. Use a CTE to compute once and reference multiple times.
•Filter before windowing: Apply WHERE clauses before computing window functions to reduce the window function's input size.
•Be cautious with large offsets: While LAG(value, 365) is logically valid, it requires buffering 365 rows per partition. Consider alternative approaches for very large offsets.
•Partition for parallelism: Partitioned window functions can process each partition in parallel. Ensure partitioning aligns with your parallel execution capabilities.

optimization_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Optimization: Index recommendation
CREATE INDEX idx_sales_date ON daily_sales(date);
-- Or for partitioned queries:
CREATE INDEX idx_sales_product_date ON daily_sales(product, date);
 
-- Optimization: Consolidate multiple window calculations
-- Instead of computing the same window multiple times:
SELECT 
    date,
    revenue,
    LAG(revenue, 1) OVER w AS prev_1,
    LAG(revenue, 7) OVER w AS prev_7,
    LAG(revenue, 30) OVER w AS prev_30,
    LEAD(revenue, 1) OVER w AS next_1
FROM daily_sales
WINDOW w AS (ORDER BY date);  -- Define window once, reuse
 
-- Optimization: Filter early
-- Less efficient: windowing then filtering
SELECT * FROM (
    SELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prev
    FROM daily_sales
) t
WHERE date >= '2024-01-01';
 
-- More efficient: filtering then windowing
WITH filtered AS (
    SELECT date, revenue
    FROM daily_sales
    WHERE date >= '2024-01-01'
)
SELECT date, revenue, LAG(revenue) OVER (ORDER BY date) AS prev
FROM filtered;

Sorting Cost

Advanced Patterns and Edge Cases

Beyond basic usage, LAG and LEAD enable sophisticated analytical patterns. Let's explore advanced techniques and handle edge cases that arise in production environments.

advanced_patterns.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
-- Pattern: Find first/last occurrence with LAG
-- Identify the first purchase date for each customer session
WITH purchase_events AS (
    SELECT 
        customer_id,
        event_date,
        event_type,
        LAG(event_type) OVER (PARTITION BY customer_id ORDER BY event_date) AS prev_event
    FROM customer_events
)
SELECT *
FROM purchase_events
WHERE event_type = 'purchase' AND (prev_event IS NULL OR prev_event != 'purchase');
 
-- Pattern: Interleaved LAG and LEAD for bidirectional analysis
SELECT 
    date,
    revenue,
    LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
    LEAD(revenue) OVER (ORDER BY date) AS next_revenue,
    CASE 
        WHEN revenue > LAG(revenue) OVER (ORDER BY date) 
         AND revenue > LEAD(revenue) OVER (ORDER BY date) 
        THEN 'Local Maximum'
        WHEN revenue < LAG(revenue) OVER (ORDER BY date) 
         AND revenue < LEAD(revenue) OVER (ORDER BY date) 
        THEN 'Local Minimum'
        ELSE 'Neither'
    END AS peak_classification
FROM daily_sales;
 
-- Pattern: Accessing non-adjacent offset correlation
-- Compare current value to value 3 periods ago AND 3 periods ahead
SELECT 
    date,
    revenue,
    LAG(revenue, 3) OVER (ORDER BY date) AS revenue_3_ago,
    LEAD(revenue, 3) OVER (ORDER BY date) AS revenue_3_ahead,
    CASE 
        WHEN LAG(revenue, 3) OVER (ORDER BY date) IS NOT NULL 
         AND LEAD(revenue, 3) OVER (ORDER BY date) IS NOT NULL
        THEN (LAG(revenue, 3) OVER (ORDER BY date) + 
              revenue + 
              LEAD(revenue, 3) OVER (ORDER BY date)) / 3.0
        ELSE NULL 
    END AS symmetric_avg
FROM daily_sales;

Handling NULL Values in Source Data:

NULL values in your data require careful handling. LAG will return a NULL if the previous row's value is NULL—this is different from LAG returning NULL because there's no previous row.

null_handling_advanced.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- Problem: Distinguishing 'no previous row' from 'previous row is NULL'
-- Solution: Use additional context
 
-- Method 1: Combine with ROW_NUMBER
SELECT 
    date,
    revenue,
    LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
    ROW_NUMBER() OVER (ORDER BY date) AS row_num,
    CASE 
        WHEN ROW_NUMBER() OVER (ORDER BY date) = 1 THEN 'First Row (No Previous)'
        WHEN LAG(revenue) OVER (ORDER BY date) IS NULL THEN 'Previous Value is NULL'
        ELSE 'Has Previous Value'
    END AS lag_status
FROM daily_sales;
 
-- Method 2: Use COALESCE with sentinel value
SELECT 
    date,
    revenue,
    LAG(COALESCE(revenue, -1)) OVER (ORDER BY date) AS prev_with_sentinel,
    CASE 
        WHEN LAG(COALESCE(revenue, -1)) OVER (ORDER BY date) IS NULL THEN 'First Row'
        WHEN LAG(COALESCE(revenue, -1)) OVER (ORDER BY date) = -1 THEN 'Previous was NULL'
        ELSE 'Normal Previous'
    END AS lag_interpretation
FROM daily_sales;
 
-- Pattern: IGNORE NULLS functionality (available in some databases)
-- Oracle/PostgreSQL 14+:
SELECT 
    date,
    revenue,
    LAG(revenue IGNORE NULLS) OVER (ORDER BY date) AS last_non_null_revenue
FROM daily_sales;
 
-- Alternative for databases without IGNORE NULLS:
SELECT 
    date,
    revenue,
    (SELECT revenue FROM daily_sales s2 
     WHERE s2.date < s1.date AND s2.revenue IS NOT NULL
     ORDER BY s2.date DESC LIMIT 1) AS last_non_null_revenue
FROM daily_sales s1;

Database-Specific Features

edge_cases.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Edge Case: Empty partitions
-- When a partition has only one row, LAG and LEAD both return NULL
SELECT 
    customer_id,
    order_date,
    revenue,
    LAG(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev,
    LEAD(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS next
FROM orders;
-- Single-order customers have NULL for both prev and next
 
-- Edge Case: Ties in ORDER BY
-- When order is not unique, behavior is nondeterministic
-- Always add a tiebreaker column
SELECT 
    date,
    transaction_id,  -- tiebreaker
    amount,
    LAG(amount) OVER (ORDER BY date, transaction_id) AS prev_amount  -- deterministic
FROM transactions;
 
-- Edge Case: Offset larger than partition size
SELECT 
    date,
    revenue,
    LAG(revenue, 100) OVER (ORDER BY date) AS prev_100,  -- likely NULL for most rows
    LAG(revenue, 100, -1) OVER (ORDER BY date) AS prev_100_with_default  -- returns -1 instead
FROM daily_sales;  -- assuming less than 100 rows

Summary: LAG and LEAD

LAG and LEAD are foundational analytic functions that transform how you approach row-to-row analysis. Let's consolidate the key concepts from this comprehensive exploration.

Key Takeaways

•LAG looks backward, LEAD looks forward: LAG accesses values from previous rows; LEAD accesses values from subsequent rows in an ordered sequence.
•Offset controls distance: Default offset is 1 (immediate predecessor/successor); larger offsets look further into the sequence.
•Default values handle boundaries: When offset exceeds partition bounds, default value (NULL if unspecified) is returned.
•PARTITION BY creates independent groups: LAG/LEAD operate separately within each partition; boundaries reset at partition changes.
•ORDER BY is essential: Without ordering, 'previous' and 'next' are undefined; always specify ORDER BY for deterministic results.
•Far more efficient than self-joins: Single-pass algorithm with buffering vs. potentially quadratic join operations.
•Enables powerful analytics: Change detection, trend analysis, gap finding, session analysis, and countless other patterns.
•Watch for NULL complexity: Distinguish between NULL from offset exceeding bounds vs. NULL in the source data.

What's Next:

Page Complete

1 / 5