Raw time-series data is often noisy. Daily sales fluctuate due to weather, day-of-week effects, and random variation. Stock prices jump on news. Sensor readings contain measurement error. To see the underlying trend beneath this noise, we need smoothing.
Moving averages are the workhorse of time-series smoothing. Instead of using all historical data (like a running average), a moving average considers only a recent window of values—the last 7 days, the last 20 data points, the last hour. As new data arrives, old data drops out; the window "moves" forward.
This sliding window approach has a profound advantage: the average tracks recent conditions instead of being dominated by stale history, while still averaging away short-term noise.
SQL window functions make moving averages elegant to express. By specifying frame boundaries like ROWS BETWEEN 6 PRECEDING AND CURRENT ROW, you create a 7-row sliding window that the database efficiently maintains as it scans your data.
By the end of this page, you will:
- understand the theory behind simple, weighted, and exponential moving averages
- master SQL frame specifications for sliding window calculations
- handle edge cases at data boundaries
- apply moving averages to trend detection and anomaly identification
- optimize window computations for production performance
A moving average (MA), also called a rolling average or sliding window average, is the average of the most recent n values in an ordered sequence. As you progress through the sequence, the window slides forward, dropping the oldest value and incorporating the newest.
Mathematical Definition:
For a sequence of values v₁, v₂, ..., vₘ and window size n, the n-period moving average at position i (where i ≥ n) is:
MA(i, n) = (vᵢ₋ₙ₊₁ + vᵢ₋ₙ₊₂ + ... + vᵢ) / n
Example Calculation:
| Day | Sales | Window Values | 3-Day MA |
|---|---|---|---|
| 1 | 100 | — | (insufficient data) |
| 2 | 150 | — | (insufficient data) |
| 3 | 80 | [100, 150, 80] | (100+150+80)/3 = 110 |
| 4 | 200 | [150, 80, 200] | (150+80+200)/3 = 143.3 |
| 5 | 120 | [80, 200, 120] | (80+200+120)/3 = 133.3 |
| 6 | 180 | [200, 120, 180] | (200+120+180)/3 = 166.7 |
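The worked table above can be reproduced in a few lines of Python (a quick sketch of the same arithmetic, outside the SQL workflow):

```python
# Reproduce the 3-day moving-average table above in plain Python.
sales = [100, 150, 80, 200, 120, 180]
n = 3

# A full window exists only from position n onward (day n, 1-indexed).
ma = [round(sum(sales[i - n + 1:i + 1]) / n, 1) if i >= n - 1 else None
      for i in range(len(sales))]
print(ma)  # [None, None, 110.0, 143.3, 133.3, 166.7]
```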
Key Properties of Moving Averages:
1. Smoothing Effect: Random fluctuations average out; larger windows produce smoother results but respond more slowly to changes.
2. Lag: The moving average lags behind the actual data because it incorporates historical values. A 7-day MA effectively represents conditions from 3-4 days ago.
3. Edge Behavior: At the start of data, there aren't enough values to fill the window. Different strategies handle this (partial windows, NULL, or waiting for sufficient data).
4. Window Size Trade-off: Larger windows smooth more noise but react slowly to real changes. Smaller windows are more responsive but noisier.
Window size should reflect the time scale of the patterns you want to retain vs. remove. For weekly patterns, use 7-day windows. For monthly trends, use 20-30 day windows. For quarterly analysis, use 60-90 day windows. There's no 'correct' size—it depends on your analytical goals.
Moving averages in SQL use aggregate functions with carefully specified frame boundaries. The frame defines which rows constitute the sliding window.
```sql
-- Basic moving average syntax
AVG(expression) OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression
    ROWS BETWEEN n PRECEDING AND CURRENT ROW
)

-- 7-day moving average of daily sales
SELECT
    sale_date,
    daily_revenue,
    ROUND(AVG(daily_revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ), 2) AS moving_avg_7day
FROM daily_sales
ORDER BY sale_date;

-- Result:
-- | sale_date  | daily_revenue | moving_avg_7day |
-- |------------|---------------|-----------------|
-- | 2024-01-01 | 1000          | 1000.00         | (only 1 value)
-- | 2024-01-02 | 1200          | 1100.00         | (2 values)
-- | 2024-01-03 | 950           | 1050.00         | (3 values)
-- | 2024-01-04 | 1100          | 1062.50         | (4 values)
-- | 2024-01-05 | 1300          | 1110.00         | (5 values)
-- | 2024-01-06 | 900           | 1075.00         | (6 values)
-- | 2024-01-07 | 1150          | 1085.71         | (7 values - full window)
-- | 2024-01-08 | 1250          | 1121.43         | (7 values, Jan 1 dropped)
```

Understanding Frame Boundaries:
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means the frame starts 6 rows before the current row and ends at the current row, so each average covers at most 7 rows.
Why 6 PRECEDING for 7-day? Because the window includes the current row plus the 6 preceding rows = 7 total rows.
| Desired Window | Frame Specification | Total Rows |
|---|---|---|
| 3-period MA | ROWS BETWEEN 2 PRECEDING AND CURRENT ROW | 3 |
| 7-day MA | ROWS BETWEEN 6 PRECEDING AND CURRENT ROW | 7 |
| 20-day MA | ROWS BETWEEN 19 PRECEDING AND CURRENT ROW | 20 |
| 52-week MA | ROWS BETWEEN 51 PRECEDING AND CURRENT ROW | 52 |
| Centered 5-period | ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING | 5 |
```sql
-- Multiple moving averages for trend comparison
SELECT
    trade_date,
    close_price,
    ROUND(AVG(close_price) OVER (
        ORDER BY trade_date
        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ), 2) AS ma_5day,
    ROUND(AVG(close_price) OVER (
        ORDER BY trade_date
        ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
    ), 2) AS ma_20day,
    ROUND(AVG(close_price) OVER (
        ORDER BY trade_date
        ROWS BETWEEN 49 PRECEDING AND CURRENT ROW
    ), 2) AS ma_50day
FROM stock_prices
WHERE symbol = 'AAPL'
ORDER BY trade_date;

-- Using WINDOW clause for clean syntax
SELECT
    trade_date,
    close_price,
    ROUND(AVG(close_price) OVER w5, 2) AS ma_5day,
    ROUND(AVG(close_price) OVER w20, 2) AS ma_20day,
    ROUND(AVG(close_price) OVER w50, 2) AS ma_50day
FROM stock_prices
WHERE symbol = 'AAPL'
WINDOW
    w5 AS (ORDER BY trade_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW),
    w20 AS (ORDER BY trade_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW),
    w50 AS (ORDER BY trade_date ROWS BETWEEN 49 PRECEDING AND CURRENT ROW)
ORDER BY trade_date;
```

Notice how the first 6 days show averages of fewer than 7 values. SQL computes the average of whatever rows exist in the frame. If you need strictly 7-day averages (NULL for insufficient data), add a CASE expression checking row count.
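The row-count check can be demonstrated end to end with SQLite (3.25+ supports window functions and the WINDOW clause). This sketch mirrors the daily_sales example above; `ma_strict` stays NULL until a full 7-row window exists:

```python
import sqlite3

# Demo: partial windows vs. a strict 7-row window (NULL until enough history).
# Table name and data mirror the daily_sales example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (sale_date TEXT, daily_revenue REAL)")
revenues = [1000, 1200, 950, 1100, 1300, 900, 1150, 1250]
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(f"2024-01-{d:02d}", v) for d, v in enumerate(revenues, start=1)],
)

rows = conn.execute("""
    SELECT sale_date,
           ROUND(AVG(daily_revenue) OVER w, 2) AS ma_partial,
           CASE WHEN COUNT(*) OVER w = 7
                THEN ROUND(AVG(daily_revenue) OVER w, 2)
           END AS ma_strict
    FROM daily_sales
    WINDOW w AS (ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
    ORDER BY sale_date
""").fetchall()

for r in rows:
    print(r)
# Days 1-6: ma_strict is None; day 7: 1085.71; day 8: 1121.43
```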
Moving averages can be computed using different window alignments relative to the current row. Each alignment has distinct properties and use cases.
Trailing (Backward-Looking) Window:
Includes the current row and preceding rows. Most common for real-time analysis because it only uses data available at each point in time.
ROWS BETWEEN n PRECEDING AND CURRENT ROW
Leading (Forward-Looking) Window:
Includes the current row and following rows. Useful for planning scenarios or when analyzing historical data where future values are known.
ROWS BETWEEN CURRENT ROW AND n FOLLOWING
Centered Window:
Includes rows before and after the current row. Minimizes lag by centering the window, but cannot be used for real-time prediction (requires future data).
ROWS BETWEEN n PRECEDING AND n FOLLOWING
```sql
-- Comparison of window alignments
SELECT
    date,
    value,
    -- Trailing: uses past data only (real-time applicable)
    ROUND(AVG(value) OVER (
        ORDER BY date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS trailing_ma_3,
    -- Leading: uses future data only (planning perspective)
    ROUND(AVG(value) OVER (
        ORDER BY date
        ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
    ), 2) AS leading_ma_3,
    -- Centered: uses both past and future (minimal lag, not real-time)
    ROUND(AVG(value) OVER (
        ORDER BY date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ), 2) AS centered_ma_3
FROM daily_values
ORDER BY date;

-- Example results (5-row dataset; windows are partial at the edges):
-- | date       | value | trailing_ma_3 | leading_ma_3 | centered_ma_3 |
-- |------------|-------|---------------|--------------|---------------|
-- | 2024-01-01 | 100   | 100.00        | 110.00       | 125.00        |
-- | 2024-01-02 | 150   | 125.00        | 143.33       | 110.00        |
-- | 2024-01-03 | 80    | 110.00        | 126.67       | 143.33        |
-- | 2024-01-04 | 200   | 143.33        | 150.00       | 126.67        |
-- | 2024-01-05 | 100   | 126.67        | 100.00       | 150.00        |
```

| Alignment | Lag | Real-time Use | Best For |
|---|---|---|---|
| Trailing | High (centered on past) | Yes | Dashboards, trading signals, real-time monitoring |
| Leading | High (centered on future) | No | Planning, what-if analysis, retrospective smoothing |
| Centered | Minimal | No | Historical analysis, trend identification, anomaly detection |
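The alignment difference is easy to see in a minimal Python sketch (partial windows are clamped at the sequence edges, matching SQL's behavior of averaging whatever rows exist in the frame):

```python
# Compare trailing vs. centered alignment on a short series.
vals = [100, 150, 80, 200, 100]

def window_avg(seq, i, before, after):
    # Clamp the window at the boundaries (partial windows at the edges).
    lo, hi = max(0, i - before), min(len(seq), i + after + 1)
    return round(sum(seq[lo:hi]) / (hi - lo), 2)

trailing = [window_avg(vals, i, 2, 0) for i in range(len(vals))]
centered = [window_avg(vals, i, 1, 1) for i in range(len(vals))]
print(trailing)  # [100.0, 125.0, 110.0, 143.33, 126.67]
print(centered)  # [125.0, 110.0, 143.33, 126.67, 150.0]
```

Notice that the centered series reaches each turning point one row earlier than the trailing series, which is the reduced lag described above.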
In predictive modeling and backtesting, using leading or centered windows creates 'data leakage'—your model sees future information that wouldn't be available in real-time. Always use trailing windows for models that will make real-time predictions.
When computing moving averages over time, you have two approaches: counting physical rows or measuring time ranges. The choice matters significantly when data has gaps.
```sql
-- Data with missing days (market closed on weekends, holidays)
-- | trade_date | price |
-- |------------|-------|
-- | 2024-01-02 | 100   | (Tuesday - market open)
-- | 2024-01-03 | 105   | (Wednesday)
-- | 2024-01-04 | 102   | (Thursday)
-- | 2024-01-05 | 108   | (Friday)
-- | 2024-01-08 | 110   | (Monday - weekend gap!)
-- | 2024-01-09 | 107   | (Tuesday)

-- ROWS: 3-row moving average (ignores time gaps)
SELECT
    trade_date,
    price,
    ROUND(AVG(price) OVER (
        ORDER BY trade_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS ma_3_rows
FROM stock_prices;
-- Jan 8 includes: Jan 4, Jan 5, Jan 8 (ignores the weekend gap)

-- RANGE: 3-DAY moving average (respects actual time)
SELECT
    trade_date,
    price,
    ROUND(AVG(price) OVER (
        ORDER BY trade_date
        RANGE BETWEEN INTERVAL '3 days' PRECEDING AND CURRENT ROW
    ), 2) AS ma_3_days
FROM stock_prices;
-- Jan 8 includes: Jan 5, Jan 8 (Jan 6-7 don't exist; Jan 4 is > 3 days ago)
```

| Aspect | ROWS | RANGE |
|---|---|---|
| Boundary meaning | Physical row count | Actual time/value distance |
| With data gaps | Includes n rows regardless of time spans | Respects actual time distances |
| Window size | Always exactly n rows (if available) | Variable based on data density |
| Performance | Often faster (simpler computation) | May be slower (value comparisons) |
| Use when | Regular, evenly-spaced data | Irregular data with gaps |
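The divergence at the weekend gap can be verified with SQLite. SQLite has no INTERVAL frame offsets, so this sketch approximates the 3-day RANGE window by ordering on `julianday()` (day numbers) with a numeric offset, which requires SQLite 3.28+ for RANGE offsets:

```python
import sqlite3

# ROWS vs. RANGE over gapped data. The RANGE frame orders by julianday()
# (a numeric day count) with a 3-day numeric offset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_prices (trade_date TEXT, price REAL)")
conn.executemany("INSERT INTO stock_prices VALUES (?, ?)", [
    ("2024-01-02", 100), ("2024-01-03", 105), ("2024-01-04", 102),
    ("2024-01-05", 108), ("2024-01-08", 110), ("2024-01-09", 107),
])

rows = conn.execute("""
    SELECT trade_date,
           ROUND(AVG(price) OVER (
               ORDER BY trade_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS ma_rows,
           ROUND(AVG(price) OVER (
               ORDER BY julianday(trade_date)
               RANGE BETWEEN 3 PRECEDING AND CURRENT ROW), 2) AS ma_range
    FROM stock_prices
    ORDER BY trade_date
""").fetchall()

print(rows[4])  # Jan 8: ROWS spans the gap (3 rows); RANGE sees only Jan 5, Jan 8
```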
```sql
-- Various RANGE specifications for time-based windows

-- Last 7 calendar days (not 7 data points)
AVG(value) OVER (
    ORDER BY event_time
    RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW
)

-- Last 1 hour for high-frequency data
AVG(value) OVER (
    ORDER BY event_time
    RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
)

-- Last 30 minutes
AVG(sensor_reading) OVER (
    ORDER BY reading_time
    RANGE BETWEEN INTERVAL '30 minutes' PRECEDING AND CURRENT ROW
)

-- Note: RANGE with intervals requires the ORDER BY column to be a date/time
-- type and database support for interval frame offsets (e.g., PostgreSQL 11+,
-- Oracle)

-- For databases without RANGE interval support, generate a complete date series:
WITH date_series AS (
    SELECT generate_series(
        '2024-01-01'::date,
        '2024-01-31'::date,
        '1 day'
    ) AS date
),
data_with_gaps AS (
    SELECT ds.date, d.value
    FROM date_series ds
    LEFT JOIN daily_data d ON ds.date = d.date
)
SELECT
    date,
    value,
    AVG(value) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS ma_7_day
FROM data_with_gaps;
```

For clean moving averages with missing dates, consider generating a complete date series and LEFT JOIN your data. This ensures ROWS-based windows represent consistent time periods. Alternatively, use RANGE with intervals if your database supports it.
Simple Moving Average (SMA) treats all values in the window equally. But recent values are often more relevant than older ones. Weighted Moving Average (WMA) and Exponential Moving Average (EMA) address this by weighting recent values more heavily.
Weighted Moving Average (WMA):
Assigns linearly increasing weights to more recent values. For a 3-period WMA: oldest gets weight 1, middle gets weight 2, newest gets weight 3.
WMA = (1×v₁ + 2×v₂ + 3×v₃) / (1+2+3)
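Concretely, the formula works out like this in Python (a quick sketch, with the newest value weighted most heavily):

```python
# 3-period WMA from the formula above: newest value gets weight 3.
def wma3(window):
    oldest, middle, newest = window
    return round((1 * oldest + 2 * middle + 3 * newest) / 6.0, 2)

vals = [100, 150, 80, 200]
print([wma3(vals[i - 2:i + 1]) for i in range(2, len(vals))])
# [106.67, 151.67]
```

Compare the second value with the simple MA of the same window, (150+80+200)/3 = 143.33: the WMA of 151.67 sits closer to the latest value of 200 because recent data is weighted more.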
```sql
-- Weighted Moving Average (3-period) using manual weight assignment
-- This is complex in SQL because standard aggregates don't support custom weights

-- Method 1: Using LAG to access individual values for manual weighting
SELECT
    date,
    value,
    (
        1 * LAG(value, 2) OVER (ORDER BY date) +
        2 * LAG(value, 1) OVER (ORDER BY date) +
        3 * value
    ) / 6.0 AS wma_3
FROM daily_values;

-- Method 2: Positional weighting with ROW_NUMBER in a subquery
WITH numbered AS (
    SELECT
        date,
        value,
        ROW_NUMBER() OVER (ORDER BY date) AS global_rn
    FROM daily_values
),
windowed AS (
    SELECT
        n1.date,
        n1.value,
        n2.value AS window_value,
        -- Newest row in the window (n2 = n1) gets weight 3, oldest gets 1
        n2.global_rn - n1.global_rn + 3 AS weight
    FROM numbered n1
    JOIN numbered n2
        ON n2.global_rn BETWEEN n1.global_rn - 2 AND n1.global_rn
)
SELECT
    date,
    MAX(value) AS value,
    SUM(window_value * weight) / SUM(weight) AS wma_3
FROM windowed
GROUP BY date
ORDER BY date;
```

Exponential Moving Average (EMA):
The EMA applies exponentially decreasing weights. More recent values have exponentially more influence. The formula is recursive:
EMA_today = α × Value_today + (1-α) × EMA_yesterday
Where α (alpha) is the smoothing factor, typically 2/(n+1) for an n-period EMA.
```sql
-- Exponential Moving Average using recursive CTE
-- α = 2/(n+1) for n-period EMA; for 10-period: α = 2/11 ≈ 0.1818

WITH RECURSIVE ema_calc AS (
    -- Base case: first row, EMA = value itself
    SELECT date, value, value AS ema_10, 1 AS rn
    FROM (
        SELECT date, value, ROW_NUMBER() OVER (ORDER BY date) AS rn
        FROM daily_values
    ) t
    WHERE rn = 1

    UNION ALL

    -- Recursive case: EMA = α * value + (1-α) * previous_EMA
    SELECT
        curr.date,
        curr.value,
        (2.0/11.0) * curr.value + (1 - 2.0/11.0) * prev.ema_10,
        curr.rn
    FROM (
        SELECT date, value, ROW_NUMBER() OVER (ORDER BY date) AS rn
        FROM daily_values
    ) curr
    JOIN ema_calc prev ON curr.rn = prev.rn + 1
)
SELECT date, value, ROUND(ema_10, 2) AS ema_10
FROM ema_calc
ORDER BY date;

-- Note: Recursive CTEs can be slow for large datasets
-- Consider application-layer computation or database-specific extensions
```

| Type | Weighting | Responsiveness | Smoothness | SQL Complexity |
|---|---|---|---|---|
| SMA | Equal | Slow (lag) | Smooth | Simple (AVG OVER) |
| WMA | Linear | Moderate | Moderate | Moderate (manual) |
| EMA | Exponential | Fast | Less smooth | Complex (recursive CTE) |
Use SMA for general smoothing and when simplicity matters. Use WMA when recent data should matter more but you want interpretable weights. Use EMA for technical analysis (trading) and when you need quick response to trends. Many production systems compute EMA at the application layer due to SQL complexity.
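An application-layer EMA is only a few lines. This sketch seeds the series with the first value, one common convention (seeding with an initial SMA is another):

```python
# Application-layer EMA: alpha = 2/(n+1), seeded with the first value.
def ema(values, n):
    alpha = 2.0 / (n + 1)
    out = [float(values[0])]
    for v in values[1:]:
        # Recursive step: EMA_today = alpha*value + (1-alpha)*EMA_yesterday
        out.append(alpha * v + (1 - alpha) * out[-1])
    return [round(x, 2) for x in out]

print(ema([100, 150, 80, 200], 3))  # [100.0, 125.0, 102.5, 151.25]
```

With n = 3, alpha = 0.5, so each step simply averages the new value with yesterday's EMA, which makes the recursion easy to check by hand.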
Moving averages are fundamental tools across many domains. Let's explore the most impactful applications with production-ready SQL.
Identifying Trends with Moving Average Crossovers:
Comparing short-term and long-term moving averages reveals trend changes. When a short MA crosses above a long MA, it signals an uptrend (bullish crossover); crossing below signals a downtrend (bearish crossover).
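The crossover logic itself is independent of SQL. This toy Python sketch uses very short 2- and 4-period windows (chosen only to keep the example small) on a made-up price series:

```python
# Toy crossover detector using 2- and 4-period trailing MAs.
def trailing_ma(vals, n):
    # Partial windows at the start, matching SQL's default behavior.
    return [sum(vals[max(0, i - n + 1):i + 1]) / min(i + 1, n)
            for i in range(len(vals))]

prices = [10, 11, 12, 11, 10, 9, 10, 12, 14]
ma_short, ma_long = trailing_ma(prices, 2), trailing_ma(prices, 4)

signals = []
for i in range(1, len(prices)):
    if ma_short[i] > ma_long[i] and ma_short[i - 1] <= ma_long[i - 1]:
        signals.append((i, "BULLISH CROSSOVER"))
    elif ma_short[i] < ma_long[i] and ma_short[i - 1] >= ma_long[i - 1]:
        signals.append((i, "BEARISH CROSSOVER"))
print(signals)
# [(2, 'BULLISH CROSSOVER'), (4, 'BEARISH CROSSOVER'), (7, 'BULLISH CROSSOVER')]
```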
```sql
-- Detect trend crossovers using 5-day and 20-day moving averages
-- (Window functions cannot be nested, so compute the MAs in one CTE,
--  then apply LAG in a second)
WITH ma_data AS (
    SELECT
        trade_date,
        close_price,
        AVG(close_price) OVER (ORDER BY trade_date ROWS 4 PRECEDING) AS ma_5,
        AVG(close_price) OVER (ORDER BY trade_date ROWS 19 PRECEDING) AS ma_20,
        COUNT(*) OVER (ORDER BY trade_date ROWS 19 PRECEDING) AS n_20
    FROM stock_prices
    WHERE symbol = 'AAPL'
),
ma_lagged AS (
    SELECT
        trade_date,
        close_price,
        ma_5,
        ma_20,
        n_20,
        LAG(ma_5) OVER (ORDER BY trade_date) AS prev_ma_5,
        LAG(ma_20) OVER (ORDER BY trade_date) AS prev_ma_20
    FROM ma_data
)
SELECT
    trade_date,
    close_price,
    ROUND(ma_5, 2) AS ma_5,
    ROUND(ma_20, 2) AS ma_20,
    CASE
        WHEN ma_5 > ma_20 AND prev_ma_5 <= prev_ma_20 THEN 'BULLISH CROSSOVER'
        WHEN ma_5 < ma_20 AND prev_ma_5 >= prev_ma_20 THEN 'BEARISH CROSSOVER'
        WHEN ma_5 > ma_20 THEN 'Uptrend'
        ELSE 'Downtrend'
    END AS trend_signal
FROM ma_lagged
WHERE n_20 = 20  -- Only rows with a full 20-day window
ORDER BY trade_date;
```

Moving averages with bounded windows are inherently efficient, but there are still important optimization considerations.
Algorithmic Complexity:
Bounded windows (ROWS n PRECEDING): O(n) where the database maintains a sliding window and incrementally updates the aggregate. Very efficient for fixed-size windows.
Unbounded windows (UNBOUNDED PRECEDING): O(n) total, but each row's computation may be O(1) or O(window_size) depending on the aggregate and database implementation.
Sort overhead: O(n log n) if no matching index. With an appropriate index, sorting is eliminated.
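The incremental-update idea behind the O(n) bound for bounded windows can be sketched directly: keep a running sum, add each new value, and subtract the value that falls out of the window.

```python
from collections import deque

# Sliding-window average with O(1) work per new value: maintain the window
# sum incrementally instead of re-summing the whole window each time.
class SlidingAverage:
    def __init__(self, n):
        self.n = n
        self.window = deque()
        self.total = 0.0

    def push(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.n:
            self.total -= self.window.popleft()  # drop the oldest value
        return self.total / len(self.window)

ma = SlidingAverage(3)
print([round(ma.push(v), 2) for v in [100, 150, 80, 200]])
# [100.0, 125.0, 110.0, 143.33]
```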
```sql
-- Index recommendation for moving average queries
-- For: AVG(value) OVER (PARTITION BY customer ORDER BY date ...)
CREATE INDEX idx_customer_date ON transactions(customer_id, transaction_date);

-- Filter before windowing (crucial for performance)
-- GOOD: Filter eliminates rows before expensive window computation
SELECT
    date,
    AVG(value) OVER (ORDER BY date ROWS 29 PRECEDING) AS ma_30
FROM daily_values
WHERE date >= '2024-01-01'  -- Filters FIRST
ORDER BY date;

-- LESS GOOD: Filtering after windowing
SELECT * FROM (
    SELECT
        date,
        AVG(value) OVER (ORDER BY date ROWS 29 PRECEDING) AS ma_30
    FROM daily_values
) t
WHERE date >= '2024-01-01';  -- Computes MA for ALL rows, then filters
-- Caveat: the two queries differ near the boundary. Filtering first means
-- the first 29 rows of 2024 have no pre-2024 history in their windows.

-- Consolidate window definitions
SELECT
    date,
    value,
    AVG(value) OVER w AS ma_30,
    STDDEV(value) OVER w AS std_30,
    MIN(value) OVER w AS min_30,
    MAX(value) OVER w AS max_30,
    COUNT(*) OVER w AS count_30
FROM daily_values
WINDOW w AS (ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW);
-- All five aggregates computed in single window pass
```

| Window Size | Memory Impact | Computation Cost | Smoothing Quality |
|---|---|---|---|
| Small (3-7) | Minimal | Fast | Less smooth, more responsive |
| Medium (20-50) | Moderate | Fast | Good balance |
| Large (100-365) | Higher | Still O(n) but more per-row work | Very smooth, slow response |
For dashboards that always show the same moving averages, consider storing pre-computed values. Update them incrementally as new data arrives. This trades computation for storage and dramatically improves query response times.
Moving averages are the foundation of time-series analysis, enabling smoothing, trend detection, and anomaly identification. Let's consolidate the essential concepts.
ROWS BETWEEN n PRECEDING AND CURRENT ROW creates an (n+1)-row sliding window.

What's Next:
Building on moving averages, we'll explore cumulative functions—a broader category of analytical computations that accumulate, distribute, or normalize values across window frames. This includes cumulative distribution functions, percentile calculations, and more advanced analytical patterns.
You now have deep understanding of moving averages—from simple trailing averages to complex weighted and exponential variants. These techniques form the backbone of time-series analysis across finance, operations, and countless other domains. Experiment with different window sizes on your own data to develop intuition for the smoothing trade-offs.