What if we could combine the variance reduction benefits of bootstrap sampling with the diversity advantages of feature subsampling—and do so more aggressively than either Random Forests or Bagging? This is precisely the idea behind Random Patches, introduced by Gilles Louppe and Pierre Geurts in 2012.
Random Patches builds on a simple but powerful insight: by simultaneously subsampling both rows (instances) and columns (features) for each base estimator, we can create ensembles with remarkable diversity while maintaining computational efficiency. The name "patches" refers to the rectangular subsets of the data matrix that each tree sees—literally patches of the full data.
This approach generalizes both Bagging (which subsamples rows) and Random Subspace (which subsamples columns), offering a unified framework with greater flexibility and often improved performance.
By the end of this page, you will deeply understand: (1) The relationship between Random Patches and related methods, (2) How double subsampling affects bias-variance, (3) Optimal sampling strategies for different scenarios, (4) Memory and computational advantages, and (5) When Random Patches outperforms standard Random Forests.
Random Patches can be understood as a generalization that encompasses several well-known ensemble methods as special cases. Understanding these relationships clarifies when and why Random Patches offers advantages.
The Subsampling Framework:
Consider a training dataset as a matrix with $n$ rows (samples) and $d$ columns (features). Any ensemble method that trains individual models on subsets of this matrix can be characterized by two parameters: $\theta_{\text{sample}}$, the fraction of rows each base estimator sees, and $\theta_{\text{feature}}$, the fraction of columns it sees.
Different methods correspond to different choices of these parameters:
| Method | θ_sample | θ_feature | Description |
|---|---|---|---|
| Full Decision Tree | 1.0 | 1.0 | No subsampling; single tree on all data |
| Bagging | ~0.632 (with replacement) | 1.0 | Bootstrap rows, use all features |
| Random Subspace | 1.0 | sqrt(d)/d or similar | All samples, subset of features |
| Random Forest | ~0.632 (bootstrap) | sqrt(d)/d per split | Bootstrap + feature sampling at splits |
| Random Patches | < 1.0 | < 1.0 | Subsample both rows AND columns |
| Extra-Trees | 1.0 (no bootstrap) | sqrt(d)/d + random threshold | Random thresholds for diversity |
The Key Insight:
Random Patches recognizes that subsampling happens before tree construction, not just at each split. This means each tree's feature set is fixed up front, so every tree can be trained on a physically smaller data matrix.
In Random Forests, feature subsampling happens at each split—a tree can potentially use all features across its full depth. In Random Patches, each tree is restricted to a fixed feature subset from the start. This seemingly subtle difference has significant implications for diversity and computational efficiency.
Let's formalize the Random Patches approach and analyze its theoretical properties.
Formal Definition:
Given a training set $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$ with feature dimensionality $d$, a Random Patches ensemble constructs $T$ base estimators, where each estimator $h_t$ is trained on a subset:
$$\mathcal{D}_t = \{(x_{i,S_t^{\text{feat}}}, y_i) : i \in S_t^{\text{sample}}\}$$
where $S_t^{\text{sample}} \subseteq \{1, \dots, n\}$ is the sample-index subset drawn for estimator $t$ (of size $\approx \theta_{\text{sample}} \cdot n$), $S_t^{\text{feat}} \subseteq \{1, \dots, d\}$ is its feature-index subset (of size $\approx \theta_{\text{feat}} \cdot d$), and $x_{i,S_t^{\text{feat}}}$ denotes $x_i$ restricted to those features.
The ensemble prediction is: $$\hat{f}(x) = \frac{1}{T} \sum_{t=1}^{T} h_t(x_{S_t^{\text{feat}}})$$
Bias-Variance Analysis:
The expected error of the ensemble can be decomposed as:
$$\text{Error} = \text{Bias}^2 + \text{Variance} + \text{Noise}$$
where: $$\text{Variance} \approx \frac{1}{T}\sigma^2 + \left(1 - \frac{1}{T}\right)\rho\sigma^2 = \rho\sigma^2 + \frac{(1-\rho)\sigma^2}{T}$$
Here, $\rho$ is the average pairwise correlation between tree predictions and $\sigma^2$ is the variance of individual trees.
Effect of Subsampling on Each Component:
Sample Subsampling ($\theta_{\text{sample}} < 1$): each tree sees fewer training points, which slightly raises the variance of individual trees but decorrelates them, lowering $\rho$.

Feature Subsampling ($\theta_{\text{feat}} < 1$): each tree may miss some informative features, which can raise bias slightly, but forces trees to exploit different feature sets, again lowering $\rho$.
Combined Effect:
The interaction is multiplicative—correlation drops faster than with either subsampling alone:
$$\rho_{\text{patches}} \approx \rho_{\text{bagging}} \cdot \rho_{\text{subspace}}$$
This multiplicative reduction in correlation is the key to Random Patches' effectiveness.
With double subsampling, you can achieve the same diversity (low ρ) as aggressive single-dimension subsampling, but with smaller increases in individual error. For example, 80% samples × 80% features gives similar diversity to 64% samples × 100% features, but often with lower bias.
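To see what the variance decomposition above implies numerically, here is a minimal sketch with assumed values for $\rho$ and $\sigma^2$ (illustrative numbers, not measurements from the text):

```python
# Ensemble variance from the decomposition above:
# rho*sigma^2 + (1 - rho)*sigma^2 / T
def ensemble_variance(rho: float, T: int, sigma2: float = 1.0) -> float:
    return rho * sigma2 + (1 - rho) * sigma2 / T

# Assumed correlations each mechanism would produce on its own:
rho_bagging = 0.6
rho_subspace = 0.5
rho_patches = rho_bagging * rho_subspace  # multiplicative approximation -> 0.30

T = 100
for name, rho in [("bagging", rho_bagging),
                  ("subspace", rho_subspace),
                  ("patches", rho_patches)]:
    print(f"{name:8s} rho={rho:.2f} -> ensemble variance "
          f"{ensemble_variance(rho, T):.3f}")

# As T grows, the variance approaches rho*sigma^2, so lowering the
# correlation floor (0.60 or 0.50 alone vs 0.30 combined) is what
# double subsampling buys.
```

With these assumed inputs, bagging alone bottoms out near $0.6\sigma^2$ and subspace alone near $0.5\sigma^2$, while the combined patches correlation gives a floor near $0.3\sigma^2$.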
Let's implement the Random Patches algorithm from scratch, highlighting its elegant simplicity.
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.base import BaseEstimator, ClassifierMixin
from typing import List, Tuple, Optional, Union
from dataclasses import dataclass


@dataclass
class PatchModel:
    """Stores a tree and its associated data patch indices."""
    estimator: Union[DecisionTreeClassifier, DecisionTreeRegressor]
    sample_indices: np.ndarray
    feature_indices: np.ndarray


class RandomPatchesClassifier(BaseEstimator, ClassifierMixin):
    """
    Random Patches Ensemble Classifier.

    Combines sample subsampling and feature subsampling to create
    diverse ensembles of decision trees.

    Reference: Louppe, G., & Geurts, P. (2012). "Ensembles on Random Patches."
    """

    def __init__(
        self,
        n_estimators: int = 100,
        max_samples: float = 0.8,
        max_features: float = 0.8,
        bootstrap_samples: bool = True,
        bootstrap_features: bool = False,
        base_estimator: str = 'decision_tree',
        max_depth: Optional[int] = None,
        min_samples_leaf: int = 1,
        random_state: Optional[int] = None,
        n_jobs: int = 1
    ):
        """
        Initialize Random Patches Classifier.

        Args:
            n_estimators: Number of trees in the ensemble
            max_samples: Fraction (0, 1] or count of samples per tree
            max_features: Fraction (0, 1] or count of features per tree
            bootstrap_samples: If True, sample rows with replacement; else without
            bootstrap_features: If True, sample features with replacement
            base_estimator: Type of base estimator
            max_depth: Maximum depth of trees (None = unlimited)
            min_samples_leaf: Minimum samples per leaf
            random_state: Random seed
            n_jobs: Parallel jobs (-1 for all cores)
        """
        self.n_estimators = n_estimators
        self.max_samples = max_samples
        self.max_features = max_features
        self.bootstrap_samples = bootstrap_samples
        self.bootstrap_features = bootstrap_features
        self.base_estimator = base_estimator
        self.max_depth = max_depth
        self.min_samples_leaf = min_samples_leaf
        self.random_state = random_state
        self.n_jobs = n_jobs

        self.estimators_: List[PatchModel] = []
        self.classes_ = None
        self.n_features_in_ = None
        self.n_samples_in_ = None
        self.rng_ = None

    def _get_sample_size(self, n_samples: int) -> int:
        """Compute number of samples per estimator."""
        if isinstance(self.max_samples, float):
            return max(1, int(self.max_samples * n_samples))
        return min(self.max_samples, n_samples)

    def _get_feature_size(self, n_features: int) -> int:
        """Compute number of features per estimator."""
        if isinstance(self.max_features, float):
            return max(1, int(self.max_features * n_features))
        return min(self.max_features, n_features)

    def _draw_patch(
        self,
        n_samples: int,
        n_features: int
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Draw a random patch (sample indices, feature indices).

        This is the core of Random Patches: simultaneously sampling
        both dimensions of the data matrix.
        """
        n_samples_patch = self._get_sample_size(n_samples)
        sample_indices = self.rng_.choice(
            n_samples, size=n_samples_patch, replace=self.bootstrap_samples
        )

        n_features_patch = self._get_feature_size(n_features)
        feature_indices = self.rng_.choice(
            n_features, size=n_features_patch, replace=self.bootstrap_features
        )

        return sample_indices, feature_indices

    def _train_single_estimator(
        self,
        X: np.ndarray,
        y: np.ndarray,
        sample_indices: np.ndarray,
        feature_indices: np.ndarray
    ) -> PatchModel:
        """Train a single estimator on a data patch."""
        # Extract the patch
        X_patch = X[sample_indices][:, feature_indices]
        y_patch = y[sample_indices]

        # Create and train estimator
        estimator = DecisionTreeClassifier(
            max_depth=self.max_depth,
            min_samples_leaf=self.min_samples_leaf,
            random_state=self.rng_.randint(0, 2**31)
        )
        estimator.fit(X_patch, y_patch)

        return PatchModel(
            estimator=estimator,
            sample_indices=sample_indices,
            feature_indices=feature_indices
        )

    def fit(self, X: np.ndarray, y: np.ndarray) -> 'RandomPatchesClassifier':
        """
        Fit the Random Patches ensemble.

        For each estimator:
        1. Draw a random patch (sample + feature subset)
        2. Train a decision tree on the patch
        3. Store the tree with its patch indices
        """
        self.rng_ = np.random.RandomState(self.random_state)
        n_samples, n_features = X.shape
        self.n_samples_in_ = n_samples
        self.n_features_in_ = n_features
        self.classes_ = np.unique(y)
        self.estimators_ = []

        for _ in range(self.n_estimators):
            # Draw random patch
            sample_indices, feature_indices = self._draw_patch(n_samples, n_features)

            # Train estimator on patch
            patch_model = self._train_single_estimator(
                X, y, sample_indices, feature_indices
            )
            self.estimators_.append(patch_model)

        return self

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        """
        Predict class probabilities.

        For each estimator, use only the features that were in its patch.
        """
        n_samples = X.shape[0]
        n_classes = len(self.classes_)
        probas = np.zeros((n_samples, n_classes))

        for patch_model in self.estimators_:
            # Extract relevant features for this estimator
            X_subset = X[:, patch_model.feature_indices]

            # Get predictions
            proba = patch_model.estimator.predict_proba(X_subset)

            # Handle case where some classes weren't seen during training
            tree_classes = patch_model.estimator.classes_
            for i, cls in enumerate(tree_classes):
                cls_idx = np.where(self.classes_ == cls)[0][0]
                probas[:, cls_idx] += proba[:, i]

        # Average across estimators
        probas /= len(self.estimators_)
        return probas

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict class labels."""
        proba = self.predict_proba(X)
        return self.classes_[np.argmax(proba, axis=1)]

    def get_patch_statistics(self) -> dict:
        """Get statistics about the patches used."""
        n_samples_list = [len(pm.sample_indices) for pm in self.estimators_]
        n_features_list = [len(pm.feature_indices) for pm in self.estimators_]

        return {
            'avg_samples_per_tree': np.mean(n_samples_list),
            'avg_features_per_tree': np.mean(n_features_list),
            # Coverage: fraction of rows/columns used by at least one tree
            'sample_coverage': len(set.union(
                *[set(pm.sample_indices) for pm in self.estimators_]
            )) / self.n_samples_in_,
            'feature_coverage': len(set.union(
                *[set(pm.feature_indices) for pm in self.estimators_]
            )) / self.n_features_in_,
        }


# Example usage and comparison
if __name__ == "__main__":
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestClassifier, BaggingClassifier

    # Generate dataset
    X, y = make_classification(
        n_samples=1000, n_features=50, n_informative=25,
        n_redundant=10, random_state=42
    )

    # Compare methods
    models = {
        'Random Patches (80%/80%)': RandomPatchesClassifier(
            n_estimators=100, max_samples=0.8, max_features=0.8, random_state=42
        ),
        'Random Patches (60%/60%)': RandomPatchesClassifier(
            n_estimators=100, max_samples=0.6, max_features=0.6, random_state=42
        ),
        'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
        'Bagging': BaggingClassifier(n_estimators=100, random_state=42),
    }

    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name:30s}: {scores.mean():.4f} (+/- {scores.std():.4f})")
```

Choosing the right sampling fractions ($\theta_{\text{sample}}$, $\theta_{\text{feat}}$) is crucial for Random Patches performance. Let's explore the design space and practical guidelines.
| θ_sample | θ_feat | Individual Error | Diversity | Best Use Case |
|---|---|---|---|---|
| High (>0.9) | High (>0.9) | Low | Low | When base trees are stable |
| High (>0.9) | Low (<0.5) | Moderate (bias) | High | Many irrelevant features |
| Low (<0.5) | High (>0.9) | Moderate (variance) | High | Large sample size, few features |
| Moderate (0.7-0.9) | Moderate (0.7-0.9) | Balanced | Balanced | General purpose (recommended) |
| Low (<0.5) | Low (<0.5) | High | Very High | Very large data, computational limits |
Design Principles:
1. Balance Individual Error and Diversity:
The optimal point is where the diversity gain from subsampling exactly balances the increased individual error. This typically occurs when:
$$0.5 \;\lesssim\; \theta_{\text{sample}} \cdot \theta_{\text{feat}} \;\lesssim\; 0.7$$
For example, (0.8, 0.8), (0.9, 0.7), or (0.7, 0.85) are all reasonable choices.
2. Consider Data Characteristics: with many irrelevant or redundant features, lower $\theta_{\text{feat}}$ so trees are forced onto different feature subsets; with small sample sizes, keep $\theta_{\text{sample}}$ high so individual trees remain accurate.
3. Computational Constraints:
Patch size directly affects training time. With $\theta_{\text{sample}} = \theta_{\text{feat}} = 0.5$, each tree sees only 25% of the data matrix, providing ~4x speedup.
Start with max_samples=0.8 and max_features=0.8. If accuracy is too low, increase both. If computational resources are limited or you need more diversity, decrease both. Generally, keep both above 0.5 unless you have specific reasons (extreme high dimensionality, very large data).
```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt


def search_optimal_sampling(X, y, sample_fracs, feature_fracs, n_estimators=50, cv=5):
    """
    Grid search over sampling fractions to find optimal configuration.

    Returns a 2D array of cross-validation scores.
    """
    results = np.zeros((len(sample_fracs), len(feature_fracs)))

    for i, sample_frac in enumerate(sample_fracs):
        for j, feature_frac in enumerate(feature_fracs):
            # Use BaggingClassifier with max_samples and max_features
            model = BaggingClassifier(
                estimator=DecisionTreeClassifier(),
                n_estimators=n_estimators,
                max_samples=sample_frac,
                max_features=feature_frac,
                bootstrap=True,
                bootstrap_features=False,
                random_state=42,
                n_jobs=-1
            )
            scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
            results[i, j] = scores.mean()
            print(f"θ_sample={sample_frac:.1f}, θ_feat={feature_frac:.1f}: "
                  f"{scores.mean():.4f}")

    return results


def plot_sampling_heatmap(results, sample_fracs, feature_fracs):
    """Visualize the performance landscape."""
    plt.figure(figsize=(10, 8))
    plt.imshow(results, cmap='RdYlGn', interpolation='nearest',
               origin='lower', aspect='auto')
    plt.colorbar(label='CV Accuracy')

    # Add labels
    plt.xticks(range(len(feature_fracs)), [f'{f:.1f}' for f in feature_fracs])
    plt.yticks(range(len(sample_fracs)), [f'{s:.1f}' for s in sample_fracs])
    plt.xlabel('Feature Sampling Fraction (θ_feat)')
    plt.ylabel('Sample Sampling Fraction (θ_sample)')
    plt.title('Random Patches Performance Landscape')

    # Mark optimal point
    best_idx = np.unravel_index(np.argmax(results), results.shape)
    plt.scatter(best_idx[1], best_idx[0], marker='*', s=300, c='black',
                label=f'Best: ({sample_fracs[best_idx[0]]:.1f}, '
                      f'{feature_fracs[best_idx[1]]:.1f})')
    plt.legend()
    plt.tight_layout()
    return plt.gcf()


# Example usage:
# sample_fracs = [0.3, 0.5, 0.7, 0.8, 0.9, 1.0]
# feature_fracs = [0.3, 0.5, 0.7, 0.8, 0.9, 1.0]
# results = search_optimal_sampling(X, y, sample_fracs, feature_fracs)
# plot_sampling_heatmap(results, sample_fracs, feature_fracs)
```

One of the most compelling practical advantages of Random Patches is its computational efficiency, especially for large-scale datasets.
| Method | Samples per Tree | Feature Sampling | Memory per Tree | Training Time per Tree |
|---|---|---|---|---|
| Full Tree | n | d | O(n·d) | O(n·d·log n) |
| Bagging | ~0.63n | d | O(n·d) | O(n·d·log n) |
| Random Subspace | n | ~√d | O(n·d) | O(n·√d·log n) |
| Random Forest | ~0.63n | ~√d per split | O(n·d) | O(n·√d·log n) |
| Random Patches (0.7, 0.7) | 0.7n | 0.7d entire tree | O(0.49·n·d) | O(0.49·n·d·log n) |
| Random Patches (0.5, 0.5) | 0.5n | 0.5d entire tree | O(0.25·n·d) | O(0.25·n·d·log n) |
Key Efficiency Insights:
1. Memory Reduction:
With $\theta_{\text{sample}} = \theta_{\text{feat}} = 0.5$, each tree requires only 25% of the memory needed for a full tree. This is transformative for memory-constrained environments, for datasets that barely fit in RAM, and for training many trees in parallel on a single machine.
2. Training Speed:
Training complexity scales with patch size: $$\text{Time per tree} \propto (\theta_{\text{sample}} \cdot n) \cdot (\theta_{\text{feat}} \cdot d) \cdot \log(\theta_{\text{sample}} \cdot n)$$
With 50% subsampling on each dimension, training is roughly 4-5x faster per tree.
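The proportionality above can be evaluated directly. This sketch (dataset dimensions are assumed for illustration) computes the cost of one patch tree relative to one full tree:

```python
import math


def relative_cost(theta_s: float, theta_f: float, n: int, d: int) -> float:
    """Training cost of a patch tree relative to a full tree, using the
    proportionality (theta_s*n) * (theta_f*d) * log(theta_s*n)."""
    full = n * d * math.log(n)
    patch = (theta_s * n) * (theta_f * d) * math.log(theta_s * n)
    return patch / full


n, d = 100_000, 200  # assumed dataset size for illustration
print(f"{relative_cost(0.5, 0.5, n, d):.3f}")  # 0.235 -> roughly 4-5x faster
print(f"{relative_cost(0.7, 0.7, n, d):.3f}")  # 0.475 -> roughly 2x faster
```

Note the cost ratio is slightly *below* the naive patch-area product (0.25 and 0.49) because the logarithmic depth factor also shrinks with the sample count.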
3. Parallelization:
Smaller patches mean lower memory per worker, better cache locality during tree construction, and cheaper data transfer when distributing training across processes or machines.
Random Patches offers a unique scalability advantage: you can train MORE trees in the same time/memory budget, often compensating for the slightly higher error per tree. An ensemble of 500 Random Patches trees with 50% subsampling can outperform 100 Random Forest trees while training in similar time.
```python
import time

from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def compare_memory_and_time(X, y, configs):
    """
    Compare memory usage and training time for different configurations.

    configs: list of (max_samples, max_features, n_estimators) tuples
    """
    results = []

    for max_samples, max_features, n_estimators in configs:
        model = BaggingClassifier(
            estimator=DecisionTreeClassifier(max_depth=20),
            n_estimators=n_estimators,
            max_samples=max_samples,
            max_features=max_features,
            bootstrap=True,
            n_jobs=1,  # Single threaded for fair comparison
            random_state=42
        )

        # Measure training time
        start = time.time()
        model.fit(X, y)
        train_time = time.time() - start

        # Estimate memory (simplified - actual measurement requires memory_profiler)
        samples_per_tree = int(max_samples * len(y)) if max_samples <= 1 else max_samples
        features_per_tree = int(max_features * X.shape[1]) if max_features <= 1 else max_features
        relative_memory = (samples_per_tree * features_per_tree) / (len(y) * X.shape[1])

        # Measure accuracy
        scores = cross_val_score(model, X, y, cv=3)

        results.append({
            'config': f'({max_samples}, {max_features}) x {n_estimators}',
            'train_time': train_time,
            'relative_memory': relative_memory,
            'accuracy': scores.mean(),
            'n_estimators': n_estimators
        })

        print(f"Config {max_samples:.1f}/{max_features:.1f} x {n_estimators}: "
              f"Time={train_time:.2f}s, RelMem={relative_memory:.2f}, "
              f"Acc={scores.mean():.4f}")

    return results


# Example: Compare equal-time configurations
# In the same training time as 100 full trees, we can train ~400 half-size patches
configs = [
    (1.0, 1.0, 100),  # Full bagging, 100 trees
    (0.5, 0.5, 400),  # Random Patches, 4x more trees
    (0.7, 0.7, 200),  # Moderate patches, 2x trees
    (0.8, 0.8, 150),  # Light patches, 1.5x trees
]

# Conceptual output:
# (1.0, 1.0) x 100: Time=12.34s, RelMem=1.00, Acc=0.8756
# (0.5, 0.5) x 400: Time=12.45s, RelMem=0.25, Acc=0.8892  <- Often better!
# (0.7, 0.7) x 200: Time=11.89s, RelMem=0.49, Acc=0.8834
# (0.8, 0.8) x 150: Time=12.12s, RelMem=0.64, Acc=0.8801
```

Random Patches excels in specific scenarios. Understanding when to choose this method is key to effective ensemble learning.
Decision Framework:
Is data very large (n > 100K or d > 1000)?
├── Yes → Consider Random Patches for efficiency
│ ├── Memory constrained? → Aggressive patches (0.5, 0.5)
│ └── Time constrained? → Moderate patches (0.7, 0.7)
└── No → Random Patches optional
├── Many redundant features? → Try feature subsampling
├── Need high diversity? → Try Random Patches
└── Otherwise → Standard Random Forest may suffice
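The decision framework above can be condensed into a small helper. This is a hypothetical sketch (the function name, signature, and thresholds are mine, taken from the framework), not a library API:

```python
def suggest_patch_config(n_samples: int, n_features: int,
                         memory_constrained: bool = False,
                         time_constrained: bool = False):
    """Suggest (max_samples, max_features) following the decision
    framework above. Thresholds are the rules of thumb from the text."""
    large = n_samples > 100_000 or n_features > 1000
    if large:
        if memory_constrained:
            return (0.5, 0.5)  # aggressive patches
        return (0.7, 0.7)      # moderate patches (also the time-constrained pick)
    # Smaller data: patches optional; general-purpose starting point
    return (0.8, 0.8)


print(suggest_patch_config(500_000, 300, memory_constrained=True))  # (0.5, 0.5)
print(suggest_patch_config(5_000, 50))                              # (0.8, 0.8)
```

Treat the returned pair as a starting point for cross-validation, not a final answer.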
While these guidelines are based on theoretical analysis and empirical studies, the optimal approach depends on your specific data. Always validate with cross-validation on your actual problem. Random Patches is particularly worth trying when standard approaches hit computational limits.
Scikit-Learn's BaggingClassifier and BaggingRegressor directly support Random Patches through their max_samples and max_features parameters. Here's how to use them effectively.
```python
from sklearn.ensemble import BaggingClassifier, BaggingRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.datasets import make_classification


# ============================================
# Classification with Random Patches
# ============================================

def random_patches_classifier(
    n_estimators: int = 100,
    max_samples: float = 0.8,
    max_features: float = 0.8,
    max_depth: int = None,
    bootstrap: bool = True,
    random_state: int = None
) -> BaggingClassifier:
    """
    Create a Random Patches classifier using sklearn's BaggingClassifier.

    Key parameters:
    - max_samples: Fraction of samples per tree (row dimension)
    - max_features: Fraction of features per tree (column dimension)
    - bootstrap: Whether to sample rows with replacement
    """
    base_tree = DecisionTreeClassifier(
        max_depth=max_depth,
        random_state=random_state
    )

    return BaggingClassifier(
        estimator=base_tree,
        n_estimators=n_estimators,
        max_samples=max_samples,
        max_features=max_features,
        bootstrap=bootstrap,
        bootstrap_features=False,  # Usually sample features without replacement
        random_state=random_state,
        n_jobs=-1  # Use all cores
    )


# ============================================
# Regression with Random Patches
# ============================================

def random_patches_regressor(
    n_estimators: int = 100,
    max_samples: float = 0.8,
    max_features: float = 0.8,
    max_depth: int = None,
    bootstrap: bool = True,
    random_state: int = None
) -> BaggingRegressor:
    """Create a Random Patches regressor."""
    base_tree = DecisionTreeRegressor(
        max_depth=max_depth,
        random_state=random_state
    )

    return BaggingRegressor(
        estimator=base_tree,
        n_estimators=n_estimators,
        max_samples=max_samples,
        max_features=max_features,
        bootstrap=bootstrap,
        bootstrap_features=False,
        random_state=random_state,
        n_jobs=-1
    )


# ============================================
# Hyperparameter Tuning
# ============================================

def tune_random_patches(X, y, task='classification'):
    """
    Hyperparameter tuning specifically for Random Patches.

    Key insight: the product max_samples * max_features determines
    the diversity vs accuracy tradeoff.
    """
    if task == 'classification':
        estimator = BaggingClassifier(
            estimator=DecisionTreeClassifier(),
            n_jobs=-1,
            random_state=42
        )
        scoring = 'accuracy'
    else:
        estimator = BaggingRegressor(
            estimator=DecisionTreeRegressor(),
            n_jobs=-1,
            random_state=42
        )
        scoring = 'neg_mean_squared_error'

    param_grid = {
        'n_estimators': [50, 100, 200],
        'max_samples': [0.5, 0.7, 0.8, 0.9],
        'max_features': [0.5, 0.7, 0.8, 0.9],
        'bootstrap': [True, False],
    }

    grid_search = GridSearchCV(
        estimator, param_grid, cv=5, scoring=scoring, n_jobs=-1, verbose=1
    )
    grid_search.fit(X, y)

    print(f"Best parameters: {grid_search.best_params_}")
    print(f"Best score: {grid_search.best_score_:.4f}")
    return grid_search.best_estimator_


# ============================================
# Example: Large-Scale Data
# ============================================

def large_scale_example():
    """
    Demonstrate Random Patches on a large dataset
    where computational efficiency matters.
    """
    # Simulate large dataset
    print("Generating large dataset...")
    X, y = make_classification(
        n_samples=100000, n_features=200, n_informative=50,
        n_redundant=100, random_state=42
    )

    # Standard approach: fewer trees, full data
    from sklearn.ensemble import RandomForestClassifier
    import time

    print("Training Standard Random Forest...")
    start = time.time()
    rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
    rf.fit(X, y)
    rf_time = time.time() - start
    rf_score = cross_val_score(rf, X, y, cv=3).mean()
    print(f"  Time: {rf_time:.2f}s, Accuracy: {rf_score:.4f}")

    # Random Patches: more trees, smaller patches
    print("Training Random Patches (0.5, 0.5, 400 trees)...")
    start = time.time()
    rp = random_patches_classifier(
        n_estimators=400, max_samples=0.5, max_features=0.5, random_state=42
    )
    rp.fit(X, y)
    rp_time = time.time() - start
    rp_score = cross_val_score(rp, X, y, cv=3).mean()
    print(f"  Time: {rp_time:.2f}s, Accuracy: {rp_score:.4f}")

    print(f"Random Patches speedup: {rf_time/rp_time:.2f}x")


if __name__ == "__main__":
    large_scale_example()
```

Let's consolidate the essential knowledge about Random Patches:
- Use scikit-learn's BaggingClassifier with the max_samples and max_features parameters for production-ready Random Patches

You now have a comprehensive understanding of Random Patches, from its theoretical foundations to practical implementation. Next, we'll explore Subspace Forests—a method that focuses specifically on feature subspace sampling without sample subsampling.