In high-dimensional data, a fascinating phenomenon emerges: different subsets of features can provide entirely different perspectives on the same classification problem. This observation is the foundation of the Random Subspace Method, introduced by Tin Kam Ho in 1998—predating Random Forests by three years.
The core insight is elegant: rather than having all ensemble members examine the same features (potentially discovering the same patterns and making correlated errors), we train each member on a randomly selected subspace of the original feature space. The result is an ensemble where each member has literally a different "view" of the data.
Subspace Forests—ensembles of decision trees trained using the Random Subspace Method—offer unique advantages for high-dimensional problems, including robustness to irrelevant features, reduced overfitting, and the ability to capture complementary patterns from different feature combinations.
By the end of this page, you will deeply understand: (1) The theoretical foundations of the Random Subspace Method, (2) Why feature subspacing creates effective diversity, (3) The relationship to Random Forests and other methods, (4) Optimal subspace size selection, and (5) When Subspace Forests outperform alternatives.
The Random Subspace Method (RSM) creates ensemble diversity through a fundamentally different mechanism than bootstrap sampling. Instead of varying which samples each model sees, RSM varies which features each model can access.
Formal Definition:
Given a dataset with $n$ samples and $d$ features, the Random Subspace Method: (1) samples, for each of the $T$ ensemble members, a random subset of $k < d$ features without replacement; (2) trains that member on all $n$ samples restricted to its $k$ features; and (3) aggregates predictions by majority vote (or by averaging class probabilities).
Key Difference from Random Forests:
In Random Forests, feature selection happens at each split—a tree can potentially use all features across its full depth. In RSM, the subspace is fixed once before training, so a tree never has access to the excluded features:
| Method | When Features Selected | Features per Split | Total Features per Tree |
|---|---|---|---|
| Full Tree | Never (all used) | All d | All d |
| Random Forest | At each split | sqrt(d) candidates | Potentially all d |
| Random Subspace | Once per tree | All k (from subset) | Exactly k < d |
| Extra-Trees | At each split | sqrt(d) candidates + random threshold | Potentially all d |
Think of each subspace tree as an expert that knows only certain "dimensions" of the problem. Tree A might excel at patterns visible in features {1,3,7}, while Tree B captures patterns in features {2,4,5}. Together, they cover the full feature space through their collective expertise.
The effectiveness of the Random Subspace Method rests on several theoretical pillars. Let's examine each in detail.
1. Dimension Reduction and the Curse of Dimensionality:
In high-dimensional spaces, decision trees face the curse of dimensionality: samples become sparse, so each split is supported by little data; the chance that some noise feature spuriously looks discriminative grows with $d$; and deep trees overfit these spurious patterns.
By training each tree on a $k$-dimensional subspace where $k \ll d$, individual trees operate in a more manageable space:
$$\text{Effective density per tree} \propto n^{1/k} \gg n^{1/d}$$
This higher effective density allows trees to make more reliable splits.
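To make the density argument concrete, here is a quick numeric check (the specific values of $n$, $d$, and $k$ are illustrative, not from the text):

```python
# Per-dimension sample density scales like n**(1/dims): with n samples
# spread over a unit hypercube, fewer dimensions means far more samples
# "per direction" for a tree to split on.
n = 1000       # number of training samples
d = 100        # full feature dimensionality
k = 10         # subspace dimensionality per tree

density_full = n ** (1 / d)   # barely above 1: essentially no coverage
density_sub = n ** (1 / k)    # roughly double: noticeably denser

print(f"density in full space (d={d}): {density_full:.3f}")
print(f"density in subspace  (k={k}): {density_sub:.3f}")
```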
2. Error Decorrelation:
The ensemble error bound depends on the correlation $\rho$ between tree predictions:
$$\text{Ensemble Variance} = \rho \cdot \sigma^2 + \frac{(1-\rho)\sigma^2}{T}$$
For RSM, trees trained on non-overlapping features have predictions that are approximately independent for regions of the feature space where the excluded features provide discriminative power.
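The variance formula above is easy to explore numerically; this sketch simply evaluates it for a few correlation levels:

```python
# Ensemble variance of an average of T equally correlated predictors:
# Var = rho * sigma^2 + (1 - rho) * sigma^2 / T
def ensemble_variance(rho: float, sigma2: float, T: int) -> float:
    """Variance of the average of T predictors with pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / T

sigma2 = 1.0
for T in (1, 10, 100, 1000):
    print(f"T={T:4d}: rho=0.6 -> {ensemble_variance(0.6, sigma2, T):.3f}, "
          f"rho=0.1 -> {ensemble_variance(0.1, sigma2, T):.3f}")
# More trees cannot push variance below rho * sigma^2; lowering the
# correlation rho (which subspacing does) lowers that floor directly.
```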
3. Feature Importance Distribution:
Let $I_j$ denote the importance of feature $j$. In RSM, each feature appears in roughly $kT/d$ of the $T$ trees, so its aggregate influence on the ensemble is proportional both to its intrinsic importance $I_j$ and to its inclusion frequency $k/d$.
This creates an implicit soft feature selection: important features influence predictions often, while noise features rarely get to dominate any tree.
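A small simulation (with hypothetical values of $d$, $k$, and $T$) confirms the inclusion frequency behind this soft selection—each feature lands in a given tree with probability $k/d$:

```python
# With k of d features sampled uniformly per tree, each feature lands
# in a given tree with probability k/d, so across T trees it is used
# in roughly T*k/d trees (Binomial(T, k/d) in expectation).
import numpy as np

rng = np.random.default_rng(0)
d, k, T = 100, 50, 200

counts = np.zeros(d)
for _ in range(T):
    counts[rng.choice(d, size=k, replace=False)] += 1

print(f"expected trees per feature: {T * k / d:.0f}")
print(f"empirical mean: {counts.mean():.1f}, "
      f"min: {counts.min():.0f}, max: {counts.max():.0f}")
```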
4. Bias-Variance Trade-off:
RSM affects bias and variance differently: individual-tree bias typically increases, because each tree is denied access to some potentially useful features, while ensemble variance decreases through error decorrelation. The net effect is often positive, especially when the information lost per tree is small relative to the variance reduction gained.
RSM works best when multiple subsets of features each contain enough information to approximate the target function. In domains with high feature redundancy (genomics, image features, text), this assumption typically holds. In domains with few, non-redundant features, RSM may hurt performance.
Let's implement a Subspace Forest from scratch, highlighting the key algorithmic decisions.
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin
from typing import List, Tuple, Optional, Union
from dataclasses import dataclass


@dataclass
class SubspaceTree:
    """A tree trained on a feature subspace."""
    tree: Union[DecisionTreeClassifier, DecisionTreeRegressor]
    feature_indices: np.ndarray  # Which features this tree uses


class SubspaceForestClassifier(BaseEstimator, ClassifierMixin):
    """
    Subspace Forest (Random Subspace Method with Decision Trees).

    Each tree is trained on a random subset of features, with all
    training samples. This differs from Random Forest where feature
    selection happens at each split.

    Reference: Ho, T.K. (1998). "The Random Subspace Method for
    Constructing Decision Forests."
    """

    def __init__(
        self,
        n_estimators: int = 100,
        max_features: Union[int, float, str] = 0.5,
        max_depth: Optional[int] = None,
        min_samples_split: int = 2,
        min_samples_leaf: int = 1,
        random_state: Optional[int] = None,
        n_jobs: int = 1
    ):
        """
        Initialize Subspace Forest.

        Args:
            n_estimators: Number of trees
            max_features: Number of features per subspace
                - int: exact number
                - float (0,1): fraction of total features
                - 'sqrt': sqrt(n_features)
                - 'log2': log2(n_features)
            max_depth: Maximum tree depth
            min_samples_split: Minimum samples to split a node
            min_samples_leaf: Minimum samples per leaf
            random_state: Random seed
            n_jobs: Number of parallel jobs
        """
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.min_samples_leaf = min_samples_leaf
        self.random_state = random_state
        self.n_jobs = n_jobs
        self.trees_: List[SubspaceTree] = []
        self.classes_ = None
        self.n_features_in_ = None

    def _get_subspace_size(self, n_features: int) -> int:
        """Compute the subspace dimensionality."""
        if isinstance(self.max_features, int):
            return min(self.max_features, n_features)
        elif isinstance(self.max_features, float):
            return max(1, int(self.max_features * n_features))
        elif self.max_features == 'sqrt':
            return max(1, int(np.sqrt(n_features)))
        elif self.max_features == 'log2':
            return max(1, int(np.log2(n_features)))
        else:
            return n_features

    def _sample_subspace(
        self,
        n_features: int,
        rng: np.random.RandomState
    ) -> np.ndarray:
        """
        Sample a random feature subspace.

        Features are sampled WITHOUT replacement to ensure
        a proper k-dimensional subspace.
        """
        k = self._get_subspace_size(n_features)
        return rng.choice(n_features, size=k, replace=False)

    def fit(self, X: np.ndarray, y: np.ndarray) -> 'SubspaceForestClassifier':
        """
        Fit the Subspace Forest.

        Unlike Random Forest, each tree is trained on ALL samples
        but only a SUBSET of features.
        """
        rng = np.random.RandomState(self.random_state)
        n_samples, n_features = X.shape
        self.n_features_in_ = n_features
        self.classes_ = np.unique(y)
        self.trees_ = []

        for i in range(self.n_estimators):
            # Sample feature subspace
            feature_indices = self._sample_subspace(n_features, rng)

            # Extract subspace data (ALL samples, SUBSET of features)
            X_subspace = X[:, feature_indices]

            # Train tree on subspace
            tree = DecisionTreeClassifier(
                max_depth=self.max_depth,
                min_samples_split=self.min_samples_split,
                min_samples_leaf=self.min_samples_leaf,
                random_state=rng.randint(0, 2**31)
            )
            tree.fit(X_subspace, y)

            self.trees_.append(SubspaceTree(
                tree=tree,
                feature_indices=feature_indices
            ))

        return self

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        """
        Predict class probabilities by averaging tree predictions.

        Each tree predicts using only its subspace features.
        """
        n_samples = X.shape[0]
        n_classes = len(self.classes_)
        probas = np.zeros((n_samples, n_classes))

        for subspace_tree in self.trees_:
            # Extract this tree's features
            X_subspace = X[:, subspace_tree.feature_indices]

            # Get tree's predictions
            tree_proba = subspace_tree.tree.predict_proba(X_subspace)

            # Handle potential class mismatch
            tree_classes = subspace_tree.tree.classes_
            for i, cls in enumerate(tree_classes):
                cls_idx = np.where(self.classes_ == cls)[0]
                if len(cls_idx) > 0:
                    probas[:, cls_idx[0]] += tree_proba[:, i]

        # Average
        probas /= len(self.trees_)
        return probas

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict class labels."""
        proba = self.predict_proba(X)
        return self.classes_[np.argmax(proba, axis=1)]

    def get_feature_coverage(self) -> dict:
        """
        Analyze feature coverage across the ensemble.

        Returns statistics about how features are distributed
        across trees.
        """
        n_features = self.n_features_in_
        feature_counts = np.zeros(n_features)

        for subspace_tree in self.trees_:
            feature_counts[subspace_tree.feature_indices] += 1

        return {
            'feature_counts': feature_counts,
            'coverage_ratio': (feature_counts > 0).sum() / n_features,
            'avg_trees_per_feature': feature_counts.mean(),
            'min_coverage': feature_counts.min(),
            'max_coverage': feature_counts.max(),
        }

    def get_subspace_feature_importance(self) -> np.ndarray:
        """
        Compute feature importance aggregated across subspaces.

        Note: Unlike RF, features not in a subspace get 0 importance
        from that tree. We aggregate by averaging only over trees
        that include each feature.
        """
        n_features = self.n_features_in_
        importance_sum = np.zeros(n_features)
        feature_counts = np.zeros(n_features)

        for subspace_tree in self.trees_:
            tree_importance = subspace_tree.tree.feature_importances_
            for i, feat_idx in enumerate(subspace_tree.feature_indices):
                importance_sum[feat_idx] += tree_importance[i]
                feature_counts[feat_idx] += 1

        # Average importance where feature was included
        with np.errstate(divide='ignore', invalid='ignore'):
            importance = np.where(
                feature_counts > 0,
                importance_sum / feature_counts,
                0
            )

        return importance


# Demonstration
if __name__ == "__main__":
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestClassifier

    # High-dimensional dataset
    X, y = make_classification(
        n_samples=1000,
        n_features=100,
        n_informative=20,
        n_redundant=30,
        n_clusters_per_class=2,
        random_state=42
    )

    # Compare methods
    sf = SubspaceForestClassifier(
        n_estimators=100,
        max_features=0.5,
        random_state=42
    )
    rf = RandomForestClassifier(
        n_estimators=100,
        random_state=42
    )

    sf_scores = cross_val_score(sf, X, y, cv=5)
    rf_scores = cross_val_score(rf, X, y, cv=5)

    print(f"Subspace Forest: {sf_scores.mean():.4f} (+/- {sf_scores.std():.4f})")
    print(f"Random Forest: {rf_scores.mean():.4f} (+/- {rf_scores.std():.4f})")

    # Feature coverage analysis
    sf.fit(X, y)
    coverage = sf.get_feature_coverage()
    print(f"Feature Coverage: {coverage['coverage_ratio']:.2%}")
    print(f"Avg trees per feature: {coverage['avg_trees_per_feature']:.1f}")
```

The subspace dimensionality $k$ is the critical hyperparameter in the Random Subspace Method. Choosing $k$ involves a fundamental trade-off.
The Trade-off: a small $k$ maximizes diversity but weakens individual trees (a subspace may contain few or no informative features), while a large $k$ strengthens individual trees but makes their errors more correlated.
Theoretical Guidance:
For problems where a subset of size $r$ features is sufficient for accurate classification:
$$k \geq r \cdot \left(1 + \log\frac{d}{r}\right)$$
ensures high probability that each subspace contains at least some informative features.
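As a sketch (assuming the natural logarithm, which the text does not specify), the bound can be evaluated for a few hypothetical $(d, r)$ pairs:

```python
# Evaluating the coverage bound k >= r * (1 + log(d/r)) from the text
# for a few (d, r) pairs; natural log is assumed here.
import math

def min_subspace_size(d: int, r: int) -> int:
    """Smallest integer k satisfying k >= r * (1 + ln(d / r))."""
    return math.ceil(r * (1 + math.log(d / r)))

for d, r in [(100, 5), (1000, 20), (10000, 50)]:
    k = min_subspace_size(d, r)
    print(f"d={d:5d}, r={r:2d} -> k >= {k} ({k / d:.1%} of features)")
```

Note how the required fraction $k/d$ shrinks as $d$ grows: in very high dimensions, comparatively small subspaces already cover the informative features with high probability.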
Practical Guidelines:
| Scenario | Recommended k | Rationale |
|---|---|---|
| High redundancy (genomics, images) | 0.3d - 0.5d | Each subspace likely captures patterns |
| Moderate redundancy | 0.5d - 0.7d | Balance diversity and information |
| Low redundancy | 0.7d - 0.9d | Need most features for accuracy |
| Unknown structure | 0.5d (default) | Robust starting point |
| Very high d (>1000) | sqrt(d) or log(d) | Computational efficiency |
```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt


def analyze_subspace_size(X, y, subspace_fractions, n_estimators=100, cv=5):
    """
    Analyze performance across different subspace sizes.

    Returns data for plotting the bias-variance trade-off
    as subspace size varies.
    """
    results = []
    n_features = X.shape[1]

    for frac in subspace_fractions:
        k = max(1, int(frac * n_features))

        # Use BaggingClassifier with bootstrap=False to simulate RSM
        model = BaggingClassifier(
            estimator=DecisionTreeClassifier(),
            n_estimators=n_estimators,
            max_samples=1.0,           # All samples
            max_features=k,            # k features per tree
            bootstrap=False,           # No sample bootstrapping
            bootstrap_features=False,
            random_state=42,
            n_jobs=-1
        )

        scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
        results.append({
            'fraction': frac,
            'k': k,
            'mean_accuracy': scores.mean(),
            'std_accuracy': scores.std(),
        })
        print(f"k={k:3d} ({frac:.0%}): {scores.mean():.4f} (+/- {scores.std():.4f})")

    return results


def plot_subspace_analysis(results):
    """Visualize the subspace size impact."""
    fractions = [r['fraction'] for r in results]
    means = [r['mean_accuracy'] for r in results]
    stds = [r['std_accuracy'] for r in results]

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.errorbar(fractions, means, yerr=stds, marker='o',
                capsize=5, linewidth=2, markersize=8)
    ax.set_xlabel('Subspace Fraction (k/d)', fontsize=12)
    ax.set_ylabel('Cross-Validation Accuracy', fontsize=12)
    ax.set_title('Subspace Forest: Effect of Subspace Size', fontsize=14)
    ax.grid(True, alpha=0.3)

    # Annotate optimal
    best_idx = np.argmax(means)
    ax.annotate(f'Optimal: {fractions[best_idx]:.0%}',
                xy=(fractions[best_idx], means[best_idx]),
                xytext=(fractions[best_idx] + 0.1, means[best_idx] + 0.02),
                arrowprops=dict(arrowstyle='->', color='red'),
                fontsize=11, color='red')

    plt.tight_layout()
    return fig


def estimate_optimal_subspace_size(X, y, cv=5, n_estimators=50):
    """
    Estimate optimal subspace size using coarse-to-fine search.

    More efficient than full grid search for production use.
    """
    n_features = X.shape[1]

    # Coarse search
    coarse_fractions = [0.1, 0.3, 0.5, 0.7, 0.9]
    coarse_results = analyze_subspace_size(
        X, y, coarse_fractions, n_estimators=n_estimators, cv=cv
    )

    # Find best region
    best_idx = np.argmax([r['mean_accuracy'] for r in coarse_results])
    best_frac = coarse_fractions[best_idx]

    # Fine search around best
    low = max(0.05, best_frac - 0.15)
    high = min(0.95, best_frac + 0.15)
    fine_fractions = np.linspace(low, high, 5)
    fine_results = analyze_subspace_size(
        X, y, fine_fractions, n_estimators=n_estimators, cv=cv
    )

    final_best = max(fine_results, key=lambda x: x['mean_accuracy'])
    print(f"Optimal subspace size: {final_best['k']} features "
          f"({final_best['fraction']:.1%} of {n_features})")
    return final_best


# Example typical output:
# k= 10 (10%): 0.8234 (+/- 0.0312)
# k= 30 (30%): 0.8567 (+/- 0.0267)
# k= 50 (50%): 0.8712 (+/- 0.0234)  <- Often optimal
# k= 70 (70%): 0.8689 (+/- 0.0245)
# k= 90 (90%): 0.8543 (+/- 0.0289)  <- Declining (less diversity)
```

For many problems, starting with k = 0.5d (50% of features) provides a good balance. From there, tune based on performance: increase k if accuracy is too low (trees missing important features), decrease k if you observe high correlation between tree predictions (need more diversity).
Both Subspace Forests and Random Forests use feature subsampling, but in fundamentally different ways. Understanding these differences is crucial for method selection.
| Aspect | Subspace Forest | Random Forest |
|---|---|---|
| Feature selection timing | Once per tree (global) | At each split (local) |
| Total features per tree | Exactly k (fixed) | Potentially all d |
| Sample handling | All samples (no bootstrap) | Bootstrap (~63.2% unique) |
| Individual tree expressiveness | Limited to k features | Full expressiveness |
| Diversity mechanism | Feature subspace only | Bootstrap + feature sampling |
| OOB estimation | Not available (uses all samples) | Available |
| Interpretation | Clear subspace structure | Feature importance more complex |
| High-dimensional suitability | Excellent | Good |
Empirical Performance Patterns:
- Text/Document Classification: Subspace Forests often excel due to extremely high dimensionality and feature redundancy
- Gene Expression Data: Strong performance from both; Subspace Forests sometimes preferred for interpretability
- Standard Tabular Data: Random Forests typically win due to full feature access per tree
- Image Features (pre-CNN era): Subspace Forests competitive due to high-dimensional, redundant features
- Financial Data: Random Forests typically preferred; features often not redundant enough for RSM
You can combine Subspace Forests with bootstrap sampling to get both diversity mechanisms. This hybrid approach—sometimes called Random Patches when including sample subsampling—can outperform either pure approach. Experiment with your specific data to find the optimal combination.
Subspace Forests particularly shine in high-dimensional settings. Let's examine why and how to apply them effectively.
The High-Dimensional Advantage:
When $d \gg n$ (more features than samples), traditional methods face three challenges: individual models overfit easily, noise features can swamp the informative ones during split selection, and training cost grows with $d$.
Subspace Forests address all three:
$$\text{Effective dimensionality per tree} = k \ll d$$
This means each tree operates in a more tractable space where splits are supported by relatively more data, fewer noise features compete at each split, and training is faster.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
import time


def compare_methods_high_dimensional():
    """
    Compare methods on a high-dimensional dataset (d >> n).

    Simulates scenarios common in genomics, text classification, etc.
    """
    # High-dimensional dataset: 500 samples, 5000 features
    X, y = make_classification(
        n_samples=500,
        n_features=5000,
        n_informative=50,
        n_redundant=200,
        n_clusters_per_class=3,
        random_state=42
    )
    print(f"Dataset: {X.shape[0]} samples × {X.shape[1]} features")
    print(f"n << d: {X.shape[0]} << {X.shape[1]}")

    methods = {
        'Subspace Forest (10%)': BaggingClassifier(
            estimator=DecisionTreeClassifier(),
            n_estimators=100,
            max_samples=1.0,
            max_features=0.1,   # 10% of features = 500 features per tree
            bootstrap=False,
            bootstrap_features=False,
            random_state=42,
            n_jobs=-1
        ),
        'Subspace Forest (5%)': BaggingClassifier(
            estimator=DecisionTreeClassifier(),
            n_estimators=100,
            max_samples=1.0,
            max_features=0.05,  # 5% = 250 features per tree
            bootstrap=False,
            bootstrap_features=False,
            random_state=42,
            n_jobs=-1
        ),
        'Random Forest': RandomForestClassifier(
            n_estimators=100,
            random_state=42,
            n_jobs=-1
        ),
        'Feature Selection + RF': Pipeline([
            ('select', SelectKBest(f_classif, k=500)),
            ('rf', RandomForestClassifier(n_estimators=100,
                                          random_state=42, n_jobs=-1))
        ]),
    }

    results = {}
    for name, model in methods.items():
        start = time.time()
        scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
        elapsed = time.time() - start
        results[name] = {
            'accuracy': scores.mean(),
            'std': scores.std(),
            'time': elapsed
        }
        print(f"{name:25s}: {scores.mean():.4f} (+/- {scores.std():.4f}) [{elapsed:.1f}s]")

    return results


def analyze_subspace_for_genomics():
    """
    Simulate genomics-like data where genes are grouped into pathways.

    Subspace Forests can capture pathway-level patterns when different
    subspaces correspond to different biological pathways.
    """
    np.random.seed(42)
    n_samples = 200
    n_pathways = 10
    genes_per_pathway = 100
    n_features = n_pathways * genes_per_pathway

    # Create pathway-structured data
    X = np.random.randn(n_samples, n_features)

    # Make some pathways predictive:
    # classes based on pathway 0 and pathway 3 expression
    pathway_0_mean = X[:, :genes_per_pathway].mean(axis=1)
    pathway_3_mean = X[:, 3*genes_per_pathway:4*genes_per_pathway].mean(axis=1)
    y = ((pathway_0_mean > 0) & (pathway_3_mean > 0)).astype(int)

    print(f"Genomics-like data: {n_samples} samples, {n_features} genes, "
          f"{n_pathways} pathways")

    # Compare methods
    subspace_model = BaggingClassifier(
        estimator=DecisionTreeClassifier(max_depth=5),
        n_estimators=100,
        max_samples=1.0,
        max_features=genes_per_pathway * 2,  # ~2 pathways worth
        bootstrap=False,
        random_state=42,
        n_jobs=-1
    )
    rf_model = RandomForestClassifier(
        n_estimators=100,
        max_depth=5,
        random_state=42,
        n_jobs=-1
    )

    subspace_scores = cross_val_score(subspace_model, X, y, cv=5)
    rf_scores = cross_val_score(rf_model, X, y, cv=5)

    print(f"Subspace Forest: {subspace_scores.mean():.4f}")
    print(f"Random Forest: {rf_scores.mean():.4f}")


if __name__ == "__main__":
    compare_methods_high_dimensional()
    print()
    analyze_subspace_for_genomics()
```

Subspace Forests have shown strong results in: gene expression classification (where d can exceed 20,000), document classification (vocabulary size often > 50,000), hyperspectral image analysis (hundreds of spectral bands), and EEG signal classification (many channels × time points).
Several extensions enhance the basic Random Subspace Method for specific applications.
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import mutual_info_classif


class WeightedSubspaceForest:
    """
    Subspace Forest with weighted feature sampling.

    Features are sampled proportional to their univariate importance,
    biasing subspaces toward more informative features while still
    maintaining diversity through randomization.
    """

    def __init__(
        self,
        n_estimators: int = 100,
        subspace_fraction: float = 0.5,
        weighting_strength: float = 1.0,
        random_state: int = None
    ):
        """
        Args:
            n_estimators: Number of trees
            subspace_fraction: Fraction of features per subspace
            weighting_strength: How strongly to weight by importance
                - 0: uniform random (standard RSM)
                - 1: proportional to importance
                - >1: more aggressive toward important features
        """
        self.n_estimators = n_estimators
        self.subspace_fraction = subspace_fraction
        self.weighting_strength = weighting_strength
        self.random_state = random_state
        self.trees_ = []
        self.feature_weights_ = None

    def _compute_feature_weights(self, X, y):
        """Compute feature weights based on mutual information."""
        mi_scores = mutual_info_classif(X, y, random_state=self.random_state)
        # Apply weighting strength
        weights = np.power(mi_scores + 1e-10, self.weighting_strength)
        weights /= weights.sum()
        return weights

    def _sample_weighted_subspace(self, n_features, rng):
        """Sample features according to computed weights."""
        k = max(1, int(self.subspace_fraction * n_features))
        # Sample without replacement, weighted by feature importance
        indices = rng.choice(
            n_features,
            size=k,
            replace=False,
            p=self.feature_weights_
        )
        return indices

    def fit(self, X, y):
        """Fit the weighted subspace forest."""
        rng = np.random.RandomState(self.random_state)
        n_features = X.shape[1]

        # Compute feature weights
        self.feature_weights_ = self._compute_feature_weights(X, y)

        self.trees_ = []
        self.classes_ = np.unique(y)

        for _ in range(self.n_estimators):
            # Weighted subspace sampling
            feature_indices = self._sample_weighted_subspace(n_features, rng)

            # Train tree
            X_subspace = X[:, feature_indices]
            tree = DecisionTreeClassifier(random_state=rng.randint(2**31))
            tree.fit(X_subspace, y)

            self.trees_.append({
                'tree': tree,
                'features': feature_indices
            })

        return self

    def predict(self, X):
        """Predict using weighted voting."""
        n_samples = X.shape[0]
        vote_counts = np.zeros((n_samples, len(self.classes_)))

        for tree_info in self.trees_:
            X_sub = X[:, tree_info['features']]
            preds = tree_info['tree'].predict_proba(X_sub)
            vote_counts += preds

        return self.classes_[np.argmax(vote_counts, axis=1)]


class StructuredSubspaceForest:
    """
    Subspace Forest for data with known feature structure.

    Example: image data where features are pixels, and spatial
    coherence means nearby pixels should be sampled together.
    """

    def __init__(
        self,
        n_estimators: int = 100,
        block_size: int = 10,
        n_blocks: int = 5,
        random_state: int = None
    ):
        """
        Args:
            n_estimators: Number of trees
            block_size: Size of each contiguous feature block
            n_blocks: Number of blocks per subspace
        """
        self.n_estimators = n_estimators
        self.block_size = block_size
        self.n_blocks = n_blocks
        self.random_state = random_state
        self.trees_ = []

    def _sample_structured_subspace(self, n_features, rng):
        """Sample contiguous blocks of features."""
        max_block_start = n_features - self.block_size
        if max_block_start <= 0:
            return np.arange(n_features)

        # Randomly select block starting positions
        block_starts = rng.choice(
            max_block_start,
            size=min(self.n_blocks, max_block_start),
            replace=False
        )

        # Collect all indices in selected blocks
        indices = []
        for start in block_starts:
            indices.extend(range(start, start + self.block_size))
        return np.array(sorted(set(indices)))

    def fit(self, X, y):
        """Fit with structured subspace sampling."""
        rng = np.random.RandomState(self.random_state)
        n_features = X.shape[1]
        self.trees_ = []
        self.classes_ = np.unique(y)

        for _ in range(self.n_estimators):
            features = self._sample_structured_subspace(n_features, rng)
            tree = DecisionTreeClassifier(random_state=rng.randint(2**31))
            tree.fit(X[:, features], y)
            self.trees_.append({'tree': tree, 'features': features})

        return self

    def predict(self, X):
        """Predict with structured subspaces."""
        votes = np.zeros((X.shape[0], len(self.classes_)))
        for info in self.trees_:
            proba = info['tree'].predict_proba(X[:, info['features']])
            votes += proba
        return self.classes_[np.argmax(votes, axis=1)]
```

Let's consolidate the essential knowledge about Subspace Forests and the Random Subspace Method: (1) RSM trains each tree on all samples but a fixed random subset of k < d features, chosen once per tree; (2) diversity comes from decorrelating tree errors across feature subspaces rather than from bootstrap sampling; (3) k ≈ 0.5d is a robust default, tuned down for redundant high-dimensional data and up when features are non-redundant; (4) the method shines when d ≫ n and feature redundancy is high; and (5) it composes with bootstrap sampling (Random Patches) for additional diversity.
You now have a comprehensive understanding of Subspace Forests and the Random Subspace Method. Next, we'll explore Oblique Random Forests—a variant that breaks the axis-aligned constraint by finding optimal linear combinations of features at each split.