Raw time-series data is often noisy. Daily sales fluctuate due to weather, day-of-week effects, and random variation. Stock prices jump on news. Sensor readings contain measurement error. To see the underlying trend beneath this noise, we need smoothing.
Moving averages are the workhorse of time-series smoothing. Instead of using all historical data (like a running average), a moving average considers only a recent window of values—the last 7 days, the last 20 data points, the last hour. As new data arrives, old data drops out; the window "moves" forward.
This sliding window approach has a profound advantage: the average tracks recent conditions instead of being dominated by stale history, while still averaging away short-term noise.
SQL window functions make moving averages elegant to express. By specifying frame boundaries like ROWS BETWEEN 6 PRECEDING AND CURRENT ROW, you create a 7-row sliding window that the database efficiently maintains as it scans your data.
By the end of this page, you will:
- understand the theory behind simple, weighted, and exponential moving averages
- master SQL frame specifications for sliding window calculations
- handle edge cases at data boundaries
- apply moving averages to trend detection and anomaly identification
- optimize window computations for production performance
A moving average (MA), also called a rolling average or sliding window average, is the average of the most recent n values in an ordered sequence. As you progress through the sequence, the window slides forward, dropping the oldest value and incorporating the newest.
Mathematical Definition:
For a sequence of values v₁, v₂, ..., vₘ and window size n, the n-period moving average at position i (where i ≥ n) is:
MA(i, n) = (vᵢ₋ₙ₊₁ + vᵢ₋ₙ₊₂ + ... + vᵢ) / n
Example Calculation:
| Day | Sales | Window Values | 3-Day MA |
|---|---|---|---|
| 1 | 100 | — | (insufficient data) |
| 2 | 150 | — | (insufficient data) |
| 3 | 80 | [100, 150, 80] | (100+150+80)/3 = 110 |
| 4 | 200 | [150, 80, 200] | (150+80+200)/3 = 143.3 |
| 5 | 120 | [80, 200, 120] | (80+200+120)/3 = 133.3 |
| 6 | 180 | [200, 120, 180] | (200+120+180)/3 = 166.7 |
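The worked table above can be reproduced in a few lines of Python (a quick sketch of the same arithmetic, outside the SQL workflow):

```python
# Reproduce the 3-day moving-average table above in plain Python.
sales = [100, 150, 80, 200, 120, 180]
n = 3

# A full window exists only from position n onward (day n, 1-indexed).
ma = [round(sum(sales[i - n + 1:i + 1]) / n, 1) if i >= n - 1 else None
      for i in range(len(sales))]
print(ma)  # [None, None, 110.0, 143.3, 133.3, 166.7]
```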
Key Properties of Moving Averages:
1. Smoothing Effect: Random fluctuations average out; larger windows produce smoother results but respond more slowly to changes.
2. Lag: The moving average lags behind the actual data because it incorporates historical values. A 7-day MA effectively represents conditions from 3-4 days ago.
3. Edge Behavior: At the start of data, there aren't enough values to fill the window. Different strategies handle this (partial windows, NULL, or waiting for sufficient data).
4. Window Size Trade-off: Larger windows smooth more noise but react slowly to real changes. Smaller windows are more responsive but noisier.
Window size should reflect the time scale of the patterns you want to retain vs. remove. For weekly patterns, use 7-day windows. For monthly trends, use 20-30 day windows. For quarterly analysis, use 60-90 day windows. There's no 'correct' size—it depends on your analytical goals.
Moving averages in SQL use aggregate functions with carefully specified frame boundaries. The frame defines which rows constitute the sliding window.
```sql
-- Basic moving average syntax
AVG(expression) OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression
    ROWS BETWEEN n PRECEDING AND CURRENT ROW
)

-- 7-day moving average of daily sales
SELECT
    sale_date,
    daily_revenue,
    ROUND(AVG(daily_revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ), 2) AS moving_avg_7day
FROM daily_sales
ORDER BY sale_date;

-- Result:
-- | sale_date  | daily_revenue | moving_avg_7day |
-- |------------|---------------|-----------------|
-- | 2024-01-01 | 1000          | 1000.00         | (only 1 value)
-- | 2024-01-02 | 1200          | 1100.00         | (2 values)
-- | 2024-01-03 | 950           | 1050.00         | (3 values)
-- | 2024-01-04 | 1100          | 1062.50         | (4 values)
-- | 2024-01-05 | 1300          | 1110.00         | (5 values)
-- | 2024-01-06 | 900           | 1075.00         | (6 values)
-- | 2024-01-07 | 1150          | 1085.71         | (7 values - full window)
-- | 2024-01-08 | 1250          | 1121.43         | (7 values, Jan 1 dropped)
```

Understanding Frame Boundaries:
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means the frame starts 6 rows before the current row and ends at the current row, so each average covers at most 7 rows.
Why 6 PRECEDING for 7-day? Because the window includes the current row plus the 6 preceding rows = 7 total rows.
| Desired Window | Frame Specification | Total Rows |
|---|---|---|
| 3-period MA | ROWS BETWEEN 2 PRECEDING AND CURRENT ROW | 3 |
| 7-day MA | ROWS BETWEEN 6 PRECEDING AND CURRENT ROW | 7 |
| 20-day MA | ROWS BETWEEN 19 PRECEDING AND CURRENT ROW | 20 |
| 52-week MA | ROWS BETWEEN 51 PRECEDING AND CURRENT ROW | 52 |
| Centered 5-period | ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING | 5 |
```sql
-- Multiple moving averages for trend comparison
SELECT
    trade_date,
    close_price,
    ROUND(AVG(close_price) OVER (
        ORDER BY trade_date
        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ), 2) AS ma_5day,
    ROUND(AVG(close_price) OVER (
        ORDER BY trade_date
        ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
    ), 2) AS ma_20day,
    ROUND(AVG(close_price) OVER (
        ORDER BY trade_date
        ROWS BETWEEN 49 PRECEDING AND CURRENT ROW
    ), 2) AS ma_50day
FROM stock_prices
WHERE symbol = 'AAPL'
ORDER BY trade_date;

-- Using WINDOW clause for clean syntax
SELECT
    trade_date,
    close_price,
    ROUND(AVG(close_price) OVER w5, 2) AS ma_5day,
    ROUND(AVG(close_price) OVER w20, 2) AS ma_20day,
    ROUND(AVG(close_price) OVER w50, 2) AS ma_50day
FROM stock_prices
WHERE symbol = 'AAPL'
WINDOW
    w5 AS (ORDER BY trade_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW),
    w20 AS (ORDER BY trade_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW),
    w50 AS (ORDER BY trade_date ROWS BETWEEN 49 PRECEDING AND CURRENT ROW)
ORDER BY trade_date;
```

Notice how the first 6 days show averages of fewer than 7 values. SQL computes the average of whatever rows exist in the frame. If you need strictly 7-day averages (NULL for insufficient data), add a CASE expression checking row count.
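The row-count check can be demonstrated end to end with SQLite (3.25+ supports window functions and the WINDOW clause). This sketch mirrors the daily_sales example above; `ma_strict` stays NULL until a full 7-row window exists:

```python
import sqlite3

# Demo: partial windows vs. a strict 7-row window (NULL until enough history).
# Table name and data mirror the daily_sales example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, daily_revenue REAL)")
revenues = [1000, 1200, 950, 1100, 1300, 900, 1150, 1250]
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(f"2024-01-{d:02d}", v) for d, v in enumerate(revenues, start=1)],
)

rows = conn.execute("""
    SELECT sale_date,
           ROUND(AVG(daily_revenue) OVER w, 2) AS ma_partial,
           CASE WHEN COUNT(*) OVER w = 7
                THEN ROUND(AVG(daily_revenue) OVER w, 2)
           END AS ma_strict
    FROM daily_sales
    WINDOW w AS (ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
    ORDER BY sale_date
""").fetchall()

for r in rows:
    print(r)
# Days 1-6: ma_strict is None; day 7: 1085.71; day 8: 1121.43
```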
Moving averages can be computed using different window alignments relative to the current row. Each alignment has distinct properties and use cases.
Trailing (Backward-Looking) Window:
Includes the current row and preceding rows. Most common for real-time analysis because it only uses data available at each point in time.
ROWS BETWEEN n PRECEDING AND CURRENT ROW
Leading (Forward-Looking) Window:
Includes the current row and following rows. Useful for planning scenarios or when analyzing historical data where future values are known.
ROWS BETWEEN CURRENT ROW AND n FOLLOWING
Centered Window:
Includes rows before and after the current row. Minimizes lag by centering the window, but cannot be used for real-time prediction (requires future data).
ROWS BETWEEN n PRECEDING AND n FOLLOWING
```sql
-- Comparison of window alignments
SELECT
    date,
    value,
    -- Trailing: uses past data only (real-time applicable)
    ROUND(AVG(value) OVER (
        ORDER BY date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS trailing_ma_3,
    -- Leading: uses future data only (planning perspective)
    ROUND(AVG(value) OVER (
        ORDER BY date
        ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
    ), 2) AS leading_ma_3,
    -- Centered: uses both past and future (minimal lag, not real-time)
    ROUND(AVG(value) OVER (
        ORDER BY date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ), 2) AS centered_ma_3
FROM daily_values
ORDER BY date;

-- Example results (5-row dataset; windows are partial at the edges):
-- | date       | value | trailing_ma_3 | leading_ma_3 | centered_ma_3 |
-- |------------|-------|---------------|--------------|---------------|
-- | 2024-01-01 | 100   | 100.00        | 110.00       | 125.00        |
-- | 2024-01-02 | 150   | 125.00        | 143.33       | 110.00        |
-- | 2024-01-03 | 80    | 110.00        | 126.67       | 143.33        |
-- | 2024-01-04 | 200   | 143.33        | 150.00       | 126.67        |
-- | 2024-01-05 | 100   | 126.67        | 100.00       | 150.00        |
```

| Alignment | Lag | Real-time Use | Best For |
|---|---|---|---|
| Trailing | High (centered on past) | Yes | Dashboards, trading signals, real-time monitoring |
| Leading | High (centered on future) | No | Planning, what-if analysis, retrospective smoothing |
| Centered | Minimal | No | Historical analysis, trend identification, anomaly detection |
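The alignment difference is easy to see in a minimal Python sketch (partial windows are clamped at the sequence edges, matching SQL's behavior of averaging whatever rows exist in the frame):

```python
# Compare trailing vs. centered alignment on a short series.
vals = [100, 150, 80, 200, 100]

def window_avg(seq, i, before, after):
    # Clamp the window at the boundaries (partial windows at the edges).
    lo, hi = max(0, i - before), min(len(seq), i + after + 1)
    return round(sum(seq[lo:hi]) / (hi - lo), 2)

trailing = [window_avg(vals, i, 2, 0) for i in range(len(vals))]
centered = [window_avg(vals, i, 1, 1) for i in range(len(vals))]
print(trailing)  # [100.0, 125.0, 110.0, 143.33, 126.67]
print(centered)  # [125.0, 110.0, 143.33, 126.67, 150.0]
```

Notice that the centered series reaches each turning point one row earlier than the trailing series, which is the reduced lag described above.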
In predictive modeling and backtesting, using leading or centered windows creates 'data leakage'—your model sees future information that wouldn't be available in real-time. Always use trailing windows for models that will make real-time predictions.
When computing moving averages over time, you have two approaches: counting physical rows or measuring time ranges. The choice matters significantly when data has gaps.
```sql
-- Data with missing days (market closed on weekends, holidays)
-- | trade_date | price |
-- |------------|-------|
-- | 2024-01-02 | 100   | (Tuesday - market open)
-- | 2024-01-03 | 105   | (Wednesday)
-- | 2024-01-04 | 102   | (Thursday)
-- | 2024-01-05 | 108   | (Friday)
-- | 2024-01-08 | 110   | (Monday - weekend gap!)
-- | 2024-01-09 | 107   | (Tuesday)

-- ROWS: 3-row moving average (ignores time gaps)
SELECT
    trade_date,
    price,
    ROUND(AVG(price) OVER (
        ORDER BY trade_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS ma_3_rows
FROM stock_prices;
-- Jan 8 includes: Jan 4, Jan 5, Jan 8 (ignores the weekend gap)

-- RANGE: 3-DAY moving average (respects actual time)
SELECT
    trade_date,
    price,
    ROUND(AVG(price) OVER (
        ORDER BY trade_date
        RANGE BETWEEN INTERVAL '3 days' PRECEDING AND CURRENT ROW
    ), 2) AS ma_3_days
FROM stock_prices;
-- Jan 8 includes: Jan 5, Jan 8 (Jan 6-7 don't exist; Jan 4 is > 3 days ago)
```

| Aspect | ROWS | RANGE |
|---|---|---|
| Boundary meaning | Physical row count | Actual time/value distance |
| With data gaps | Includes n rows regardless of time spans | Respects actual time distances |
| Window size | Always exactly n rows (if available) | Variable based on data density |
| Performance | Often faster (simpler computation) | May be slower (value comparisons) |
| Use when | Regular, evenly-spaced data | Irregular data with gaps |
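The divergence at the weekend gap can be verified with SQLite. SQLite has no INTERVAL frame offsets, so this sketch approximates the 3-day RANGE window by ordering on `julianday()` (day numbers) with a numeric offset, which requires SQLite 3.28+ for RANGE offsets:

```python
import sqlite3

# ROWS vs. RANGE over gapped data. The RANGE frame orders by julianday()
# (a numeric day count) with a 3-day numeric offset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_prices (trade_date TEXT, price REAL)")
conn.executemany("INSERT INTO stock_prices VALUES (?, ?)", [
    ("2024-01-02", 100), ("2024-01-03", 105), ("2024-01-04", 102),
    ("2024-01-05", 108), ("2024-01-08", 110), ("2024-01-09", 107),
])

rows = conn.execute("""
    SELECT trade_date,
           ROUND(AVG(price) OVER (
               ORDER BY trade_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS ma_rows,
           ROUND(AVG(price) OVER (
               ORDER BY julianday(trade_date)
               RANGE BETWEEN 3 PRECEDING AND CURRENT ROW), 2) AS ma_range
    FROM stock_prices
    ORDER BY trade_date
""").fetchall()

print(rows[4])  # Jan 8: ROWS spans the gap (3 rows); RANGE sees only Jan 5, Jan 8
```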
```sql
-- Various RANGE specifications for time-based windows

-- Last 7 calendar days (not 7 data points)
AVG(value) OVER (
    ORDER BY event_time
    RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW
)

-- Last 1 hour for high-frequency data
AVG(value) OVER (
    ORDER BY event_time
    RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
)

-- Last 30 minutes
AVG(sensor_reading) OVER (
    ORDER BY reading_time
    RANGE BETWEEN INTERVAL '30 minutes' PRECEDING AND CURRENT ROW
)

-- Note: RANGE with intervals requires the ORDER BY column to be a date/time
-- type and database support for interval frame offsets (e.g., PostgreSQL 11+,
-- Oracle)

-- For databases without RANGE interval support, generate a complete date series:
WITH date_series AS (
    SELECT generate_series(
        '2024-01-01'::date,
        '2024-01-31'::date,
        '1 day'
    ) AS date
),
data_with_gaps AS (
    SELECT ds.date, d.value
    FROM date_series ds
    LEFT JOIN daily_data d ON ds.date = d.date
)
SELECT
    date,
    value,
    AVG(value) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS ma_7_day
FROM data_with_gaps;
```

For clean moving averages with missing dates, consider generating a complete date series and LEFT JOIN your data. This ensures ROWS-based windows represent consistent time periods. Alternatively, use RANGE with intervals if your database supports it.
Simple Moving Average (SMA) treats all values in the window equally. But recent values are often more relevant than older ones. Weighted Moving Average (WMA) and Exponential Moving Average (EMA) address this by weighting recent values more heavily.
Weighted Moving Average (WMA):
Assigns linearly increasing weights to more recent values. For a 3-period WMA: oldest gets weight 1, middle gets weight 2, newest gets weight 3.
WMA = (1×v₁ + 2×v₂ + 3×v₃) / (1+2+3)
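Concretely, the formula works out like this in Python (a quick sketch, with the newest value weighted most heavily):

```python
# 3-period WMA from the formula above: newest value gets weight 3.
def wma3(window):
    oldest, middle, newest = window
    return round((1 * oldest + 2 * middle + 3 * newest) / 6.0, 2)

vals = [100, 150, 80, 200]
print([wma3(vals[i - 2:i + 1]) for i in range(2, len(vals))])
# [106.67, 151.67]
```

Compare the second value with the simple MA of the same window, (150+80+200)/3 = 143.33: the WMA of 151.67 sits closer to the latest value of 200 because recent data is weighted more.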
```sql
-- Weighted Moving Average (3-period) using manual weight assignment
-- This is complex in SQL because standard aggregates don't support custom weights

-- Method 1: Using LAG to access individual values for manual weighting
SELECT
    date,
    value,
    (
        1 * LAG(value, 2) OVER (ORDER BY date) +
        2 * LAG(value, 1) OVER (ORDER BY date) +
        3 * value
    ) / 6.0 AS wma_3
FROM daily_values;

-- Method 2: Positional weighting with ROW_NUMBER in a subquery
WITH numbered AS (
    SELECT
        date,
        value,
        ROW_NUMBER() OVER (ORDER BY date) AS global_rn
    FROM daily_values
),
windowed AS (
    SELECT
        n1.date,
        n1.value,
        n2.value AS window_value,
        -- Newest row in the window (n2 = n1) gets weight 3, oldest gets 1
        n2.global_rn - n1.global_rn + 3 AS weight
    FROM numbered n1
    JOIN numbered n2
        ON n2.global_rn BETWEEN n1.global_rn - 2 AND n1.global_rn
)
SELECT
    date,
    MAX(value) AS value,
    SUM(window_value * weight) / SUM(weight) AS wma_3
FROM windowed
GROUP BY date
ORDER BY date;
```

Exponential Moving Average (EMA):
The EMA applies exponentially decreasing weights. More recent values have exponentially more influence. The formula is recursive:
EMA_today = α × Value_today + (1-α) × EMA_yesterday
Where α (alpha) is the smoothing factor, typically 2/(n+1) for an n-period EMA.
```sql
-- Exponential Moving Average using recursive CTE
-- α = 2/(n+1) for n-period EMA; for 10-period: α = 2/11 ≈ 0.1818

WITH RECURSIVE ema_calc AS (
    -- Base case: first row, EMA = value itself
    SELECT date, value, value AS ema_10, 1 AS rn
    FROM (
        SELECT date, value, ROW_NUMBER() OVER (ORDER BY date) AS rn
        FROM daily_values
    ) t
    WHERE rn = 1

    UNION ALL

    -- Recursive case: EMA = α * value + (1-α) * previous_EMA
    SELECT
        curr.date,
        curr.value,
        (2.0/11.0) * curr.value + (1 - 2.0/11.0) * prev.ema_10,
        curr.rn
    FROM (
        SELECT date, value, ROW_NUMBER() OVER (ORDER BY date) AS rn
        FROM daily_values
    ) curr
    JOIN ema_calc prev ON curr.rn = prev.rn + 1
)
SELECT date, value, ROUND(ema_10, 2) AS ema_10
FROM ema_calc
ORDER BY date;

-- Note: Recursive CTEs can be slow for large datasets
-- Consider application-layer computation or database-specific extensions
```

| Type | Weighting | Responsiveness | Smoothness | SQL Complexity |
|---|---|---|---|---|
| SMA | Equal | Slow (lag) | Smooth | Simple (AVG OVER) |
| WMA | Linear | Moderate | Moderate | Moderate (manual) |
| EMA | Exponential | Fast | Less smooth | Complex (recursive CTE) |
Use SMA for general smoothing and when simplicity matters. Use WMA when recent data should matter more but you want interpretable weights. Use EMA for technical analysis (trading) and when you need quick response to trends. Many production systems compute EMA at the application layer due to SQL complexity.
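An application-layer EMA is only a few lines. This sketch seeds the series with the first value, one common convention (seeding with an initial SMA is another):

```python
# Application-layer EMA: alpha = 2/(n+1), seeded with the first value.
def ema(values, n):
    alpha = 2.0 / (n + 1)
    out = [float(values[0])]
    for v in values[1:]:
        # Recursive step: EMA_today = alpha*value + (1-alpha)*EMA_yesterday
        out.append(alpha * v + (1 - alpha) * out[-1])
    return [round(x, 2) for x in out]

print(ema([100, 150, 80, 200], 3))  # [100.0, 125.0, 102.5, 151.25]
```

With n = 3, alpha = 0.5, so each step simply averages the new value with yesterday's EMA, which makes the recursion easy to check by hand.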
Moving averages are fundamental tools across many domains. Let's explore the most impactful applications with production-ready SQL.
Identifying Trends with Moving Average Crossovers:
Comparing short-term and long-term moving averages reveals trend changes. When a short MA crosses above a long MA, it signals an uptrend (bullish crossover); crossing below signals a downtrend (bearish crossover).
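The crossover logic itself is independent of SQL. This toy Python sketch uses very short 2- and 4-period windows (chosen only to keep the example small) on a made-up price series:

```python
# Toy crossover detector using 2- and 4-period trailing MAs.
def trailing_ma(vals, n):
    # Partial windows at the start, matching SQL's default behavior.
    return [sum(vals[max(0, i - n + 1):i + 1]) / min(i + 1, n)
            for i in range(len(vals))]

prices = [10, 11, 12, 11, 10, 9, 10, 12, 14]
ma_short, ma_long = trailing_ma(prices, 2), trailing_ma(prices, 4)

signals = []
for i in range(1, len(prices)):
    if ma_short[i] > ma_long[i] and ma_short[i - 1] <= ma_long[i - 1]:
        signals.append((i, "BULLISH CROSSOVER"))
    elif ma_short[i] < ma_long[i] and ma_short[i - 1] >= ma_long[i - 1]:
        signals.append((i, "BEARISH CROSSOVER"))
print(signals)
# [(2, 'BULLISH CROSSOVER'), (4, 'BEARISH CROSSOVER'), (7, 'BULLISH CROSSOVER')]
```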
```sql
-- Detect trend crossovers using 5-day and 20-day moving averages
-- (Window functions cannot be nested, so compute the MAs in one CTE,
--  then apply LAG in a second)
WITH ma_data AS (
    SELECT
        trade_date,
        close_price,
        AVG(close_price) OVER (ORDER BY trade_date ROWS 4 PRECEDING) AS ma_5,
        AVG(close_price) OVER (ORDER BY trade_date ROWS 19 PRECEDING) AS ma_20,
        COUNT(*) OVER (ORDER BY trade_date ROWS 19 PRECEDING) AS n_20
    FROM stock_prices
    WHERE symbol = 'AAPL'
),
ma_lagged AS (
    SELECT
        trade_date,
        close_price,
        ma_5,
        ma_20,
        n_20,
        LAG(ma_5) OVER (ORDER BY trade_date) AS prev_ma_5,
        LAG(ma_20) OVER (ORDER BY trade_date) AS prev_ma_20
    FROM ma_data
)
SELECT
    trade_date,
    close_price,
    ROUND(ma_5, 2) AS ma_5,
    ROUND(ma_20, 2) AS ma_20,
    CASE
        WHEN ma_5 > ma_20 AND prev_ma_5 <= prev_ma_20 THEN 'BULLISH CROSSOVER'
        WHEN ma_5 < ma_20 AND prev_ma_5 >= prev_ma_20 THEN 'BEARISH CROSSOVER'
        WHEN ma_5 > ma_20 THEN 'Uptrend'
        ELSE 'Downtrend'
    END AS trend_signal
FROM ma_lagged
WHERE n_20 = 20  -- Only rows with a full 20-day window
ORDER BY trade_date;
```

Moving averages with bounded windows are inherently efficient, but there are still important optimization considerations.
Algorithmic Complexity:
Bounded windows (ROWS n PRECEDING): O(n) where the database maintains a sliding window and incrementally updates the aggregate. Very efficient for fixed-size windows.
Unbounded windows (UNBOUNDED PRECEDING): O(n) total, but each row's computation may be O(1) or O(window_size) depending on the aggregate and database implementation.
Sort overhead: O(n log n) if no matching index. With an appropriate index, sorting is eliminated.
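The incremental-update idea behind the O(n) bound for bounded windows can be sketched directly: keep a running sum, add each new value, and subtract the value that falls out of the window.

```python
from collections import deque

# Sliding-window average with O(1) work per new value: maintain the window
# sum incrementally instead of re-summing the whole window each time.
class SlidingAverage:
    def __init__(self, n):
        self.n = n
        self.window = deque()
        self.total = 0.0

    def push(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.n:
            self.total -= self.window.popleft()  # drop the oldest value
        return self.total / len(self.window)

ma = SlidingAverage(3)
print([round(ma.push(v), 2) for v in [100, 150, 80, 200]])
# [100.0, 125.0, 110.0, 143.33]
```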
```sql
-- Index recommendation for moving average queries
-- For: AVG(value) OVER (PARTITION BY customer ORDER BY date ...)
CREATE INDEX idx_customer_date ON transactions(customer_id, transaction_date);

-- Filter before windowing (crucial for performance)
-- GOOD: Filter eliminates rows before expensive window computation
SELECT
    date,
    AVG(value) OVER (ORDER BY date ROWS 29 PRECEDING) AS ma_30
FROM daily_values
WHERE date >= '2024-01-01'  -- Filters FIRST
ORDER BY date;

-- LESS GOOD: Filtering after windowing
SELECT * FROM (
    SELECT
        date,
        AVG(value) OVER (ORDER BY date ROWS 29 PRECEDING) AS ma_30
    FROM daily_values
) t
WHERE date >= '2024-01-01';  -- Computes MA for ALL rows, then filters
-- Caveat: the two queries differ near the boundary. Filtering first means
-- the first 29 rows of 2024 have no pre-2024 history in their windows.

-- Consolidate window definitions
SELECT
    date,
    value,
    AVG(value) OVER w AS ma_30,
    STDDEV(value) OVER w AS std_30,
    MIN(value) OVER w AS min_30,
    MAX(value) OVER w AS max_30,
    COUNT(*) OVER w AS count_30
FROM daily_values
WINDOW w AS (ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW);
-- All five aggregates computed in single window pass
```

| Window Size | Memory Impact | Computation Cost | Smoothing Quality |
|---|---|---|---|
| Small (3-7) | Minimal | Fast | Less smooth, more responsive |
| Medium (20-50) | Moderate | Fast | Good balance |
| Large (100-365) | Higher | Still O(n) but more per-row work | Very smooth, slow response |
For dashboards that always show the same moving averages, consider storing pre-computed values. Update them incrementally as new data arrives. This trades computation for storage and dramatically improves query response times.
Moving averages are the foundation of time-series analysis, enabling smoothing, trend detection, and anomaly identification. Let's consolidate the essential concepts.
ROWS BETWEEN n PRECEDING AND CURRENT ROW creates an (n+1)-row sliding window.

What's Next:
Building on moving averages, we'll explore cumulative functions—a broader category of analytical computations that accumulate, distribute, or normalize values across window frames. This includes cumulative distribution functions, percentile calculations, and more advanced analytical patterns.
You now have deep understanding of moving averages—from simple trailing averages to complex weighted and exponential variants. These techniques form the backbone of time-series analysis across finance, operations, and countless other domains. Experiment with different window sizes on your own data to develop intuition for the smoothing trade-offs.