While forward chaining (expanding window) grows the training set with each fold, sliding window cross-validation maintains a fixed training set size that moves forward through time. This approach trades training data volume for recency—prioritizing recent observations over distant history.
Sliding window validation is particularly powerful for non-stationary time series where older data may be misleading: financial markets that undergo regime changes, user behavior that evolves with platform updates, or sensor data from systems that degrade over time.
This page provides a comprehensive treatment of sliding window cross-validation—when to prefer it over expanding windows, how to configure window sizes, and how to diagnose whether your data favors recency over volume.
By the end of this page, you will master sliding window mechanics, understand the recency vs. volume tradeoff, learn optimal window sizing strategies, and know how to compare sliding vs. expanding windows empirically.
The Sliding Window Algorithm:
Unlike expanding window where the training start remains fixed at the beginning of the series, sliding window moves both the start and end of the training window forward:
Key Difference from Expanding Window:
| Aspect | Expanding Window | Sliding Window |
|---|---|---|
| Training start | Fixed at beginning | Moves forward |
| Training size | Grows each fold | Fixed at W |
| Early data | Always included | Eventually dropped |
| Computational cost | Increases with folds | Constant per fold |
```python
import numpy as np
from typing import Generator, Tuple, List


def sliding_window_splits(
    n_samples: int,
    window_size: int,
    test_size: int,
    step_size: int = None,
    gap: int = 0
) -> Generator[Tuple[np.ndarray, np.ndarray], None, None]:
    """
    Generate sliding window cross-validation splits.

    Parameters:
    -----------
    n_samples : int
        Total number of observations
    window_size : int
        Fixed size of each training window
    test_size : int
        Size of each validation set
    step_size : int, optional
        How far to slide between folds.
        Defaults to test_size (non-overlapping validation sets)
    gap : int
        Embargo period between train and test

    Yields:
    -------
    train_indices, test_indices : tuple of np.ndarray
    """
    if step_size is None:
        step_size = test_size

    # Validate parameters
    if window_size + gap + test_size > n_samples:
        raise ValueError(
            f"Window ({window_size}) + gap ({gap}) + test ({test_size}) "
            f"exceeds data size ({n_samples})"
        )

    indices = np.arange(n_samples)
    train_start = 0

    while train_start + window_size + gap + test_size <= n_samples:
        train_end = train_start + window_size
        test_start = train_end + gap
        test_end = test_start + test_size

        yield (
            indices[train_start:train_end],
            indices[test_start:test_end]
        )

        train_start += step_size


def visualize_sliding_vs_expanding(n_samples: int = 100):
    """Compare sliding and expanding window visually."""
    window_size = 20
    test_size = 10

    print("SLIDING WINDOW (fixed training size)")
    print("=" * 50)

    for fold, (train, test) in enumerate(
        sliding_window_splits(n_samples, window_size, test_size), 1
    ):
        if fold > 5:
            print("...")
            break
        bar = ['.'] * 50  # Compressed view
        scale = n_samples / 50
        for i in train:
            bar[int(i / scale)] = '#'
        for i in test:
            bar[int(i / scale)] = '*'
        print(f"Fold {fold}: {''.join(bar[:40])}")

    print("\nEXPANDING WINDOW (growing training size)")
    print("=" * 50)

    # Show first 5 folds of expanding for comparison
    from page_0 import forward_chain_splits  # Assuming previous implementation

    for fold, (train, test) in enumerate(
        forward_chain_splits(n_samples, 5, window_size, test_size), 1
    ):
        bar = ['.'] * 50
        scale = n_samples / 50
        for i in train:
            bar[int(i / scale)] = '#'
        for i in test:
            bar[int(i / scale)] = '*'
        print(f"Fold {fold}: {''.join(bar[:40])}")


# Demonstration output (schematic):
# SLIDING WINDOW (fixed training size)
# ==================================================
# Fold 1: ##########**........
# Fold 2: ..########**........
# Fold 3: ....########**......
# Fold 4: ......########**....
# Fold 5: ........########**..
#
# EXPANDING WINDOW (growing training size)
# ==================================================
# Fold 1: ##########**........
# Fold 2: ##############**....
# Fold 3: ##################**
# Fold 4: ######################**
# Fold 5: ##########################**
```

The choice between sliding and expanding windows is fundamentally a bet about data relevance over time. Sliding window assumes older data is less valuable—or even harmful—for predicting current outcomes.
Sliding Window is Preferred When:
- The series is non-stationary and older observations no longer reflect current dynamics (regime changes, evolving user behavior, degrading sensors).
- CV performance plateaus or degrades once the training window grows beyond a certain size.
- You want constant per-fold training cost, mirroring a production system that retrains on a fixed lookback.
Expanding Window is Preferred When:
- The series is close to stationary, so older observations remain representative and more history genuinely helps.
- Data is scarce and the model cannot afford to discard any observations.
- CV performance keeps improving as the training set grows.
Don't guess—test both approaches. Plot CV performance against training window size. If performance plateaus or degrades beyond a certain window size, sliding window is indicated. If performance consistently improves with more data, use expanding window.
Window size (W) is the critical hyperparameter for sliding window CV. Too small, and the model lacks sufficient data to learn patterns. Too large, and you lose the recency benefits that motivated sliding window in the first place.
Factors Influencing Optimal Window Size:
1. Seasonality Period: the window must include at least 1-2 complete seasonal cycles to capture repeating patterns.
2. Regime/Concept Drift Speed: faster drift calls for smaller windows, so that training data stays within the current regime.
3. Model Complexity: models with more parameters need more training data, which sets a floor on the window size.
4. Feature Engineering Requirements: technical indicators and lagged features consume observations at the start of each window, so the window must also cover their lookback. The sketch after this list combines these factors into a rough lower bound.
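These factors can be folded into a crude starting point for the window-size search that follows. This is a sketch only: `heuristic_window_size`, the roughly 20-observations-per-parameter rule of thumb, and the example numbers are illustrative assumptions, not prescriptions from this page; drift speed (factor 2) is best handled by capping the largest candidate window rather than by a formula.

```python
def heuristic_window_size(
    seasonal_period: int,        # length of one seasonal cycle in observations (0 if none)
    n_parameters: int,           # approximate number of model parameters
    max_feature_lookback: int,   # longest lag / rolling window used by features
    obs_per_parameter: int = 20  # assumed rule of thumb: ~20 observations per parameter
) -> int:
    """Rough lower bound on the sliding window size W from the factors above."""
    # Need 1-2 full seasonal cycles AND enough observations for the model...
    core = max(2 * seasonal_period, obs_per_parameter * n_parameters)
    # ...plus the lookback consumed by lagged/rolling features at the window start.
    return core + max_feature_lookback


# Example: daily data with weekly seasonality, ~10 parameters, 30-day feature lookback
print(heuristic_window_size(seasonal_period=7, n_parameters=10, max_feature_lookback=30))
# -> 230; use this as the smallest candidate window size, and let expected drift
#    speed cap the largest candidate you are willing to test.
```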
```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.metrics import mean_squared_error
from typing import List, Dict, Tuple
import matplotlib.pyplot as plt


def find_optimal_window_size(
    X: np.ndarray,
    y: np.ndarray,
    model: BaseEstimator,
    window_sizes: List[int],
    test_size: int,
    min_folds: int = 3
) -> Dict:
    """
    Find optimal sliding window size by testing multiple sizes.

    Parameters:
    -----------
    X : np.ndarray
        Feature matrix (time-ordered)
    y : np.ndarray
        Target variable
    model : BaseEstimator
        Model to evaluate
    window_sizes : List[int]
        Window sizes to test
    test_size : int
        Validation set size for each fold
    min_folds : int
        Minimum number of folds required for valid comparison

    Returns:
    --------
    Dict with optimal window size and performance by window size
    """
    n_samples = len(X)
    results = {}

    for W in window_sizes:
        # Check if this window size allows enough folds
        max_folds = (n_samples - W) // test_size
        if max_folds < min_folds:
            print(f"Window {W}: Skipped (only {max_folds} folds possible)")
            continue

        fold_scores = []

        # Run sliding window CV
        train_start = 0
        while train_start + W + test_size <= n_samples:
            train_end = train_start + W
            test_start = train_end
            test_end = test_start + test_size

            X_train = X[train_start:train_end]
            y_train = y[train_start:train_end]
            X_test = X[test_start:test_end]
            y_test = y[test_start:test_end]

            fold_model = clone(model)
            fold_model.fit(X_train, y_train)
            predictions = fold_model.predict(X_test)

            rmse = np.sqrt(mean_squared_error(y_test, predictions))
            fold_scores.append(rmse)

            train_start += test_size

        results[W] = {
            'n_folds': len(fold_scores),
            'mean_rmse': np.mean(fold_scores),
            'std_rmse': np.std(fold_scores),
            'fold_scores': fold_scores
        }

    # Find optimal window (lowest mean RMSE)
    optimal_W = min(results.keys(), key=lambda w: results[w]['mean_rmse'])

    return {
        'optimal_window_size': optimal_W,
        'optimal_rmse': results[optimal_W]['mean_rmse'],
        'all_results': results
    }


def plot_window_size_analysis(results: Dict):
    """Visualize performance vs window size."""
    window_sizes = sorted(results['all_results'].keys())
    means = [results['all_results'][w]['mean_rmse'] for w in window_sizes]
    stds = [results['all_results'][w]['std_rmse'] for w in window_sizes]

    plt.figure(figsize=(10, 6))
    plt.errorbar(window_sizes, means, yerr=stds, marker='o', capsize=5)
    plt.axvline(
        results['optimal_window_size'],
        color='red', linestyle='--',
        label=f"Optimal: {results['optimal_window_size']}"
    )
    plt.xlabel('Window Size')
    plt.ylabel('RMSE')
    plt.title('Sliding Window CV: Performance vs Window Size')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()


# Example usage
from sklearn.linear_model import Ridge

# Generate sample non-stationary data
np.random.seed(42)
n = 500
t = np.arange(n)

# Regime change at t=250
X = np.random.randn(n, 3)
y = np.where(
    t < 250,
    X @ np.array([1, 0.5, 0.2]) + 0.1 * np.random.randn(n),
    X @ np.array([0.2, 0.5, 1.5]) + 0.1 * np.random.randn(n)  # Changed coefficients
)

# Find optimal window
results = find_optimal_window_size(
    X, y,
    model=Ridge(alpha=1.0),
    window_sizes=[50, 100, 150, 200, 300, 400],
    test_size=25,
    min_folds=5
)

print(f"Optimal window size: {results['optimal_window_size']}")
print(f"RMSE at optimal: {results['optimal_rmse']:.4f}")

# With regime change, smaller windows should perform better
# because they don't mix data from different regimes
```

Optimizing window size on the same data you'll later evaluate creates a subtle form of leakage. Ideally, use a holdout test set that wasn't used in window selection, or use nested CV where the inner loop selects window size.
The step size (S) determines how far the window slides between consecutive folds. This parameter controls the overlap between successive training sets and affects both the number of folds and the diversity of validation scenarios.
Step Size Options:
1. Non-overlapping (S = test_size): each test point appears in exactly one validation fold.
2. Sliding by 1 (S = 1): maximum overlap between consecutive training sets, producing many highly correlated folds.
3. Partial overlap (S = test_size / 2): a middle ground between fold count and fold independence.
| Step Size | Number of Folds | Fold Independence | Use Case |
|---|---|---|---|
| S = test_size | Low (~5-10 typical) | High (no overlap) | Standard validation |
| S = 1 | Very high (~n) | Very low (adjacent folds nearly identical) | Temporal analysis, concept drift detection |
| S = test_size/k | Moderate | Moderate | Variance reduction, ensemble training |
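As a quick sanity check on the table above, the snippet below uses the `sliding_window_splits` generator defined earlier on this page to count folds and measure training-set overlap for a few step sizes; the specific sizes (n=200, W=60, test=20) are made-up illustrative values, not taken from the examples above.

```python
import numpy as np
# Assumes sliding_window_splits from the first code block on this page is in scope.

n, W, test = 200, 60, 20  # illustrative sizes

for step in (20, 10, 1):
    folds = list(sliding_window_splits(n, W, test, step_size=step))
    first_train, _ = folds[0]
    second_train, _ = folds[1]
    # Fraction of the first training window reused by the next fold
    overlap = len(np.intersect1d(first_train, second_train)) / W
    print(f"step={step:>2}: {len(folds):>3} folds, consecutive train overlap = {overlap:.0%}")

# step=20:   7 folds, consecutive train overlap = 67%
# step=10:  13 folds, consecutive train overlap = 83%
# step= 1: 121 folds, consecutive train overlap = 98%
```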
```python
import numpy as np
from typing import Dict, List
from scipy import stats


def analyze_step_size_impact(
    n_samples: int,
    window_size: int,
    test_size: int,
    step_sizes: List[int]
) -> None:
    """Analyze impact of different step sizes."""

    print("Step Size Impact Analysis")
    print("=" * 60)
    print(f"Data size: {n_samples}, Window: {window_size}, Test: {test_size}")
    print("=" * 60)

    for step in step_sizes:
        n_folds = (n_samples - window_size - test_size) // step + 1

        # Calculate training set overlap between consecutive folds
        train_overlap = max(0, window_size - step) / window_size

        # Calculate test set overlap
        test_overlap = max(0, test_size - step) / test_size if step < test_size else 0

        print(f"\nStep size = {step}:")
        print(f"  Number of folds: {n_folds}")
        print(f"  Training overlap: {train_overlap:.1%}")
        print(f"  Test overlap: {test_overlap:.1%}")
        print(f"  Effective sample size: {n_folds * test_size * (1 - test_overlap):.0f}")


def compute_fold_correlation(
    fold_scores: List[float],
    step_size: int,
    window_size: int
) -> Dict:
    """
    Estimate correlation between adjacent fold scores.

    High correlation indicates fold scores are not independent
    and simple averaging may underestimate variance.
    """
    if len(fold_scores) < 3:
        return np.nan

    scores = np.array(fold_scores)

    # Lag-1 autocorrelation of fold scores
    autocorr = np.corrcoef(scores[:-1], scores[1:])[0, 1]

    # Theoretical overlap-based correlation
    training_overlap = max(0, (window_size - step_size) / window_size)

    return {
        'empirical_autocorr': autocorr,
        'training_overlap': training_overlap,
        'expected_corr_lower': training_overlap * 0.5,  # Rough lower bound
        'effective_n_folds': len(fold_scores) * (1 - abs(autocorr))
    }


def adjusted_standard_error(
    fold_scores: List[float],
    autocorrelation: float
) -> float:
    """
    Compute standard error adjusted for fold correlation.

    When folds are correlated, naive SE understates uncertainty.
    This applies a Newey-West style correction.
    """
    n = len(fold_scores)
    naive_se = np.std(fold_scores) / np.sqrt(n)

    # Correction factor for lag-1 autocorrelation
    # More sophisticated methods use HAC estimators
    correction = np.sqrt(1 + 2 * autocorrelation) if autocorrelation > 0 else 1

    return naive_se * correction


# Example
analyze_step_size_impact(
    n_samples=500,
    window_size=100,
    test_size=20,
    step_sizes=[20, 10, 5, 1]
)

# Example output:
# Step size = 20:
#   Number of folds: 20
#   Training overlap: 80.0%
#   Test overlap: 0.0%
#
# Step size = 1:
#   Number of folds: 381
#   Training overlap: 99.0%
#   Test overlap: 95.0%
```

Anchored walk-forward is a hybrid between pure sliding window and expanding window. Instead of a fixed start that expands or a sliding start with fixed width, anchored walk-forward uses a rolling anchor point that periodically resets.
The Algorithm:
1. Place an anchor at the start of the data and train on a window that expands forward from the anchor, validating on the block immediately after it.
2. When the training window reaches a maximum size (or a fixed anchor period elapses), move the anchor forward and restart from the minimum training size.
3. Repeat until the end of the series is reached.
Use Cases:
- Production pipelines that periodically retrain from scratch on a bounded history rather than updating incrementally.
- Series where a moderate amount of history helps but very old data eventually degrades performance.
```python
import numpy as np
from typing import Generator, Tuple


def anchored_walk_forward(
    n_samples: int,
    anchor_period: int,
    min_train_size: int,
    max_train_size: int,
    test_size: int,
    gap: int = 0
) -> Generator[Tuple[np.ndarray, np.ndarray], None, None]:
    """
    Anchored walk-forward with periodic anchor resets.

    Parameters:
    -----------
    n_samples : int
        Total observations
    anchor_period : int
        How often to reset the anchor point (in observations)
    min_train_size : int
        Minimum training size after anchor reset
    max_train_size : int
        Maximum training size before anchor resets
    test_size : int
        Validation set size
    gap : int
        Embargo/gap between train and test

    Yields:
    -------
    train_indices, test_indices for each fold
    """
    indices = np.arange(n_samples)
    anchor = 0

    while anchor + min_train_size + gap + test_size <= n_samples:
        # Expansion phase: grow from anchor
        train_end = anchor + min_train_size

        while train_end + gap + test_size <= n_samples:
            # Check if we've hit max training size
            current_train_size = train_end - anchor
            if current_train_size > max_train_size:
                break

            test_start = train_end + gap
            test_end = test_start + test_size

            yield (
                indices[anchor:train_end],
                indices[test_start:test_end]
            )

            train_end += test_size

        # Reset anchor forward
        anchor += anchor_period


def visualize_anchored_walk_forward():
    """Visualize anchored walk-forward pattern."""
    n = 120

    print("Anchored Walk-Forward Visualization")
    print("=" * 60)
    print("Each '|' marks an anchor reset")
    print()

    folds = list(anchored_walk_forward(
        n_samples=n,
        anchor_period=40,
        min_train_size=20,
        max_train_size=40,
        test_size=10
    ))

    current_anchor = 0
    for fold_idx, (train, test) in enumerate(folds):
        # Detect anchor reset
        new_anchor = train[0]
        reset_marker = " |RESET|" if new_anchor != current_anchor else ""
        current_anchor = new_anchor

        visual = ['.'] * (n // 2)
        for i in train:
            visual[i // 2] = '#'
        for i in test:
            visual[i // 2] = '*'

        print(f"Fold {fold_idx+1}: {''.join(visual[:50])}{reset_marker}")
        print(f"  Train[{train[0]}:{train[-1]+1}] Test[{test[0]}:{test[-1]+1}]")


visualize_anchored_walk_forward()

# Output shows training windows that grow from each anchor, then reset:
# Fold 1: Train[0:20]  Test[20:30]
# Fold 2: Train[0:30]  Test[30:40]
# Fold 3: Train[0:40]  Test[40:50]
# Fold 4: Train[40:60] Test[60:70]   |RESET|
# ...
```

Anchored walk-forward is less common than pure sliding or expanding windows but valuable in production scenarios where you periodically retrain from scratch (e.g., monthly complete retraining) rather than incremental updates. It also helps when you suspect that too much old data would swamp recent patterns and hold the model in an obsolete regime.
Rather than guessing which approach suits your data, run a head-to-head comparison. This empirical approach reveals whether your time series benefits from recency (sliding) or volume (expanding).
Comparison Protocol:
1. Fix a common set of validation blocks so both strategies are scored on exactly the same time points.
2. For each block, train one model on the expanding window (all data from the start) and one on the sliding window (the most recent W observations), and score both.
3. Compare the matched fold scores with a paired t-test and an effect size (Cohen's d) to judge whether the difference is real and meaningful.
```python
import numpy as np
from sklearn.base import clone, BaseEstimator
from sklearn.metrics import mean_squared_error
from scipy import stats
from typing import Dict, List


def compare_window_strategies(
    X: np.ndarray,
    y: np.ndarray,
    model: BaseEstimator,
    window_size: int,  # Used as initial/fixed size
    test_size: int,
    n_folds: int = 5
) -> Dict:
    """
    Compare sliding vs expanding window strategies.

    Both strategies validate on the same time points
    for fair comparison.
    """
    n_samples = len(X)

    sliding_scores = []
    expanding_scores = []
    sliding_preds_all = []
    expanding_preds_all = []
    actuals_all = []

    # Generate matched fold positions
    fold_positions = []
    for i in range(n_folds):
        train_end_expanding = window_size + i * test_size
        test_start = train_end_expanding
        test_end = test_start + test_size

        if test_end > n_samples:
            break

        fold_positions.append({
            'expanding_train': (0, train_end_expanding),
            'sliding_train': (train_end_expanding - window_size, train_end_expanding),
            'test': (test_start, test_end)
        })

    for fold in fold_positions:
        test_start, test_end = fold['test']
        X_test = X[test_start:test_end]
        y_test = y[test_start:test_end]
        actuals_all.extend(y_test)

        # Expanding window
        exp_start, exp_end = fold['expanding_train']
        X_train_exp = X[exp_start:exp_end]
        y_train_exp = y[exp_start:exp_end]

        model_exp = clone(model)
        model_exp.fit(X_train_exp, y_train_exp)
        preds_exp = model_exp.predict(X_test)
        expanding_scores.append(mean_squared_error(y_test, preds_exp))
        expanding_preds_all.extend(preds_exp)

        # Sliding window
        slide_start, slide_end = fold['sliding_train']
        X_train_slide = X[slide_start:slide_end]
        y_train_slide = y[slide_start:slide_end]

        model_slide = clone(model)
        model_slide.fit(X_train_slide, y_train_slide)
        preds_slide = model_slide.predict(X_test)
        sliding_scores.append(mean_squared_error(y_test, preds_slide))
        sliding_preds_all.extend(preds_slide)

    # Statistical comparison
    sliding_mean = np.mean(sliding_scores)
    expanding_mean = np.mean(expanding_scores)

    # Paired t-test (folds are matched)
    t_stat, p_value = stats.ttest_rel(sliding_scores, expanding_scores)

    # Effect size (Cohen's d)
    diff = np.array(sliding_scores) - np.array(expanding_scores)
    cohens_d = np.mean(diff) / np.std(diff) if np.std(diff) > 0 else 0

    return {
        'sliding': {
            'mean_mse': sliding_mean,
            'std_mse': np.std(sliding_scores),
            'rmse': np.sqrt(sliding_mean),
            'fold_scores': sliding_scores
        },
        'expanding': {
            'mean_mse': expanding_mean,
            'std_mse': np.std(expanding_scores),
            'rmse': np.sqrt(expanding_mean),
            'fold_scores': expanding_scores
        },
        'comparison': {
            'difference_mse': sliding_mean - expanding_mean,
            'winner': 'sliding' if sliding_mean < expanding_mean else 'expanding',
            'relative_improvement': abs(sliding_mean - expanding_mean) / max(sliding_mean, expanding_mean),
            'p_value': p_value,
            'is_significant': p_value < 0.05,
            'cohens_d': cohens_d,
            'interpretation': interpret_comparison(cohens_d, p_value)
        }
    }


def interpret_comparison(cohens_d: float, p_value: float) -> str:
    """Provide interpretation of statistical comparison."""
    if p_value >= 0.05:
        return "No significant difference - either approach acceptable"

    effect = abs(cohens_d)
    if effect < 0.2:
        magnitude = "negligible"
    elif effect < 0.5:
        magnitude = "small"
    elif effect < 0.8:
        magnitude = "medium"
    else:
        magnitude = "large"

    better = "sliding" if cohens_d < 0 else "expanding"
    return f"{better} window is significantly better (p={p_value:.4f}, {magnitude} effect)"


# Example with regime change (should favor sliding)
np.random.seed(42)
n = 500
t = np.arange(n)
X = np.random.randn(n, 3)
y = np.where(
    t < 250,
    X @ [1, 0.5, 0.2],
    X @ [0.2, 0.5, 1.5]  # Regime change
) + 0.2 * np.random.randn(n)

from sklearn.linear_model import Ridge

results = compare_window_strategies(
    X, y, Ridge(alpha=1.0),
    window_size=100,
    test_size=30,
    n_folds=8
)

print("Window Strategy Comparison")
print("=" * 50)
print(f"Sliding Window RMSE: {results['sliding']['rmse']:.4f}")
print(f"Expanding Window RMSE: {results['expanding']['rmse']:.4f}")
print(f"\nStatistical Test: {results['comparison']['interpretation']}")
```

Sliding window cross-validation provides the recency-focused alternative to expanding window approaches, trading training data volume for temporal relevance.
The next page explores expanding window cross-validation in greater depth, including strategies for handling the growing training set's computational and statistical implications. We'll also cover when to combine sliding and expanding approaches in a multi-resolution validation strategy.