Standard decision trees have an inherent limitation: they can only make axis-aligned splits. Each split divides the feature space along a single feature dimension, creating rectangular decision regions. But what if the optimal decision boundary is diagonal? What if separating classes requires considering combinations of features?
This is exactly the problem that Oblique Random Forests solve. By allowing splits of the form:
$$w_1 x_1 + w_2 x_2 + \cdots + w_d x_d \leq t$$
instead of just $x_j \leq t$, oblique trees can create hyperplane boundaries at any orientation. The result is dramatically more expressive decision surfaces that can capture complex patterns with fewer splits.
Oblique Random Forests combine this enhanced expressiveness with the variance-reducing power of ensemble averaging, creating a method that excels on problems with diagonal or curved decision boundaries that frustrate axis-aligned approaches.
By the end of this page, you will deeply understand: (1) The geometric limitations of axis-aligned splits, (2) How oblique splits enable arbitrary hyperplane boundaries, (3) Algorithms for finding optimal oblique splits, (4) The computational trade-offs involved, and (5) When oblique forests provide significant advantages.
Before appreciating oblique splits, we must understand why axis-aligned splits can be problematic.
Axis-Aligned Split Definition:
A split on feature $j$ at threshold $t$ divides samples into a left set $\{\mathbf{x} : x_j \leq t\}$ and a right set $\{\mathbf{x} : x_j > t\}$.
This creates a decision boundary perpendicular to the $x_j$ axis.
The Problem: Staircase Boundaries
Consider a 2D classification problem where the true decision boundary is the line $x_1 = x_2$ (a 45° diagonal). An axis-aligned tree must approximate this with a "staircase" of horizontal and vertical splits:
*(Figure: an ideal 45° diagonal boundary separating Class A from Class B, which an axis-aligned tree can only approximate with a staircase of horizontal and vertical splits.)*
To approximate a smooth diagonal, we need many staircase steps, each requiring additional tree depth and splits.
| Boundary Angle | Axis-Aligned Splits Needed | Oblique Splits Needed | Tree Depth Difference |
|---|---|---|---|
| 0° or 90° | 1 | 1 | None |
| 45° | ~log(n) | 1 | Substantial |
| 30° or 60° | ~log(n) | 1 | Substantial |
| Arbitrary angle | O(log n) | 1 | O(log n) |
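To make the table above concrete, here is a small sketch of our own (it uses only scikit-learn's `DecisionTreeClassifier`; the data and numbers are illustrative): an unconstrained axis-aligned tree must grow deep to reproduce a 45° boundary exactly, while projecting onto $\mathbf{w} = (1, 1)$ separates the same data with a single threshold.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # true boundary is the 45-degree line x1 + x2 = 0

# Axis-aligned tree: see how deep it must grow to fit the diagonal exactly
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("axis-aligned depth:", tree.get_depth(), "| leaves:", tree.get_n_leaves())

# Oblique view: project onto w = (1, 1); a single threshold at 0 separates the classes
z = X @ np.array([1.0, 1.0])
print("single oblique split accuracy:", ((z > 0).astype(int) == y).mean())
```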
Consequences of the Axis-Aligned Limitation: trees must grow deeper to approximate diagonal boundaries, the learned decision surface is a jagged staircase rather than a smooth hyperplane, and extra training data is needed near the boundary just to place all of those additional splits.
The axis-aligned limitation is most severe when: (1) True boundaries are diagonal, (2) Features are correlated, (3) Domain knowledge suggests feature combinations are meaningful, (4) Compact models are required. If true boundaries are approximately axis-aligned, standard trees work well.
Oblique splits generalize axis-aligned splits by allowing linear combinations of features.
Oblique Split Definition:
An oblique split with weight vector $\mathbf{w} \in \mathbb{R}^d$ and threshold $t$ divides samples into a left set $\{\mathbf{x} : \mathbf{w}^T \mathbf{x} \leq t\}$ and a right set $\{\mathbf{x} : \mathbf{w}^T \mathbf{x} > t\}$.
The decision boundary is the hyperplane $\{\mathbf{x} : \mathbf{w}^T \mathbf{x} = t\}$.
Axis-Aligned as Special Case:
An axis-aligned split on feature $j$ is an oblique split where: $$\mathbf{w} = \mathbf{e}_j = (0, ..., 0, 1, 0, ..., 0)^T$$
with the 1 in position $j$.
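A quick numerical check of this special case (a tiny sketch of our own): with $\mathbf{w} = \mathbf{e}_j$, the projection $\mathbf{w}^T \mathbf{x}$ is exactly the $j$-th feature, so the oblique test reduces to the familiar axis-aligned test $x_j \leq t$.

```python
import numpy as np

X = np.array([[2.0, -1.0, 3.0],
              [0.5,  4.0, -2.0]])

e1 = np.array([0.0, 1.0, 0.0])        # w = e_j with j = 1
print(np.allclose(X @ e1, X[:, 1]))   # True: the projection is just feature x_1
```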
Geometric Interpretation:
```python
import numpy as np
import matplotlib.pyplot as plt


def visualize_split_types():
    """
    Visualize the difference between axis-aligned and oblique splits.
    """
    np.random.seed(42)

    # Generate data with diagonal boundary
    n = 200
    X = np.random.randn(n, 2) * 2
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # Diagonal boundary: x1 + x2 = 0

    fig, axes = plt.subplots(1, 3, figsize=(15, 5))

    # Original data
    ax = axes[0]
    ax.scatter(X[y == 0, 0], X[y == 0, 1], c='blue', alpha=0.6, label='Class 0')
    ax.scatter(X[y == 1, 0], X[y == 1, 1], c='red', alpha=0.6, label='Class 1')
    ax.plot([-4, 4], [4, -4], 'k--', linewidth=2, label='True boundary')
    ax.set_title('Data with Diagonal Boundary')
    ax.legend()
    ax.set_xlim(-4, 4)
    ax.set_ylim(-4, 4)

    # Axis-aligned approximation
    ax = axes[1]
    ax.scatter(X[y == 0, 0], X[y == 0, 1], c='blue', alpha=0.6)
    ax.scatter(X[y == 1, 0], X[y == 1, 1], c='red', alpha=0.6)
    # Staircase boundary
    stairs = [(-4, 4), (-3, 4), (-3, 3), (-2, 3), (-2, 2), (-1, 2), (-1, 1),
              (0, 1), (0, 0), (1, 0), (1, -1), (2, -1), (2, -2), (3, -2),
              (3, -3), (4, -3), (4, -4)]
    stairs_x, stairs_y = zip(*stairs)
    ax.plot(stairs_x, stairs_y, 'g-', linewidth=2, label='Axis-aligned (staircase)')
    ax.set_title('Axis-Aligned Splits (Many Needed)')
    ax.legend()
    ax.set_xlim(-4, 4)
    ax.set_ylim(-4, 4)

    # Single oblique split
    ax = axes[2]
    ax.scatter(X[y == 0, 0], X[y == 0, 1], c='blue', alpha=0.6)
    ax.scatter(X[y == 1, 0], X[y == 1, 1], c='red', alpha=0.6)
    ax.plot([-4, 4], [4, -4], 'purple', linewidth=2, label='Oblique split: w=(1,1)')
    ax.set_title('Single Oblique Split (Perfect)')
    ax.legend()
    ax.set_xlim(-4, 4)
    ax.set_ylim(-4, 4)

    plt.tight_layout()
    return fig


def compute_oblique_projection(X, w):
    """
    Project data onto the oblique split direction.

    For an oblique split with weights w, the split value is w·x.
    """
    # Normalize weights for interpretability
    w = w / np.linalg.norm(w)
    # Project each point onto the weight vector
    projections = X @ w
    return projections


# Example: Finding the best oblique split
def evaluate_oblique_split(X, y, w, t):
    """
    Evaluate the quality of an oblique split.

    Args:
        X: Feature matrix
        y: Labels
        w: Weight vector
        t: Threshold

    Returns:
        Information gain of the split
    """
    projections = X @ w
    left_mask = projections <= t
    right_mask = ~left_mask

    if left_mask.sum() == 0 or right_mask.sum() == 0:
        return -np.inf

    def gini(labels):
        if len(labels) == 0:
            return 0
        p = np.bincount(labels) / len(labels)
        return 1 - np.sum(p ** 2)

    n = len(y)
    n_left, n_right = left_mask.sum(), right_mask.sum()

    parent_gini = gini(y)
    child_gini = (n_left / n) * gini(y[left_mask]) + (n_right / n) * gini(y[right_mask])

    return parent_gini - child_gini


# Demonstration
if __name__ == "__main__":
    np.random.seed(42)
    X = np.random.randn(100, 2)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Compare axis-aligned vs oblique
    print("Axis-aligned splits (single feature):")
    for j in range(2):
        w_axis = np.zeros(2)
        w_axis[j] = 1
        t = 0
        gain = evaluate_oblique_split(X, y, w_axis, t)
        print(f"  Feature {j}: gain = {gain:.4f}")

    print("\nOblique split (both features):")
    w_oblique = np.array([1, 1])
    t = 0
    gain = evaluate_oblique_split(X, y, w_oblique, t)
    print(f"  w=(1,1): gain = {gain:.4f}")
```

The challenge with oblique splits is computational: how do we find good weight vectors? The search space is continuous and high-dimensional.
Complexity Comparison: an axis-aligned search examines at most $O(d \cdot n)$ candidate splits per node, whereas the number of distinct hyperplane splits of $n$ points in $d$ dimensions grows as $O(n^d)$, and finding the globally optimal oblique split is NP-hard. Practical algorithms therefore rely on heuristics or randomization.
Several practical approaches have been developed to find good oblique splits efficiently.
1. Linear Discriminant Analysis (LDA) at Each Node:
The most principled approach uses LDA to find the optimal separating hyperplane:
$$\mathbf{w}_{\text{LDA}} = \mathbf{S}_W^{-1}(\mathbf{\mu}_1 - \mathbf{\mu}_0)$$
where $\mathbf{S}_W$ is the within-class scatter matrix.
Pros: optimal separating direction under Gaussian class-conditional assumptions; statistically well-founded. Cons: relies on that Gaussian assumption; requires inverting the within-class scatter matrix at every node.
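A minimal NumPy sketch of the LDA direction above, assuming a binary problem; the small ridge term added to $\mathbf{S}_W$ is our own numerical-stability tweak, not part of the formula:

```python
import numpy as np

def lda_direction(X, y, reg=1e-6):
    """Two-class LDA split direction w = S_W^{-1} (mu_1 - mu_0); sketch only."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of centred outer products for each class,
    # plus a tiny ridge so the solve succeeds even with collinear features.
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    S_w += reg * np.eye(X.shape[1])
    w = np.linalg.solve(S_w, mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(lda_direction(X, y))  # roughly proportional to (1, 1) for this diagonal boundary
```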
2. Random Coefficient Sampling:
Inspired by Extra-Trees, this approach samples several random weight vectors, projects the data onto each candidate direction, finds the best threshold along that projection, and keeps the direction with the highest impurity reduction.
This is the basis for Random Oblique Forests and is highly scalable.
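A compact sketch of the idea, with simplifying assumptions of our own (only a few quantile thresholds are tried per direction rather than a full scan); the full per-node version appears as `_find_random_split` in the implementation later on this page.

```python
import numpy as np

def gini(labels):
    p = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(p ** 2)

def best_random_oblique_split(X, y, n_directions=20, k=2, rng=None):
    """Sample sparse random directions; return the best (gain, weights, threshold)."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    best = (-np.inf, None, None)
    for _ in range(n_directions):
        # Random direction touching only k features (sparse oblique split)
        w = np.zeros(d)
        idx = rng.choice(d, size=min(k, d), replace=False)
        w[idx] = rng.standard_normal(len(idx))
        z = X @ w
        for t in np.quantile(z, [0.25, 0.5, 0.75]):   # a few candidate thresholds
            left, right = y[z <= t], y[z > t]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = gini(y) - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best[0]:
                best = (gain, w, t)
    return best
```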
3. Gradient-Based Optimization:
Optimize the impurity reduction with respect to $\mathbf{w}$:
$$\mathbf{w}^* = \arg\min_{\mathbf{w}} \text{Impurity}(\text{split with } \mathbf{w})$$
This is done with gradient descent or a similar optimizer; because a hard split is a step function of $\mathbf{w}$, the impurity is typically smoothed (for example with a sigmoid membership function) so that it becomes differentiable.
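As a sketch of this strategy (the sigmoid relaxation and the SciPy optimizer choice are our own assumptions, not a standard recipe), one can replace the hard split with a soft membership and minimize the resulting smooth impurity over $(\mathbf{w}, t)$:

```python
import numpy as np
from scipy.optimize import minimize

def soft_split_impurity(params, X, y, tau=0.1):
    """Sigmoid-relaxed weighted Gini of the split w·x <= t (smooth surrogate)."""
    w, t = params[:-1], params[-1]
    z = np.clip((X @ w - t) / tau, -50.0, 50.0)
    m = 1.0 / (1.0 + np.exp(-z))                      # soft membership of the right side

    def soft_gini(weights):
        total = weights.sum() + 1e-12
        p = np.array([weights[y == c].sum() for c in np.unique(y)]) / total
        return 1.0 - np.sum(p ** 2)

    n = len(y)
    return (m.sum() / n) * soft_gini(m) + ((1 - m).sum() / n) * soft_gini(1 - m)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (0.6 * X[:, 0] + 0.8 * X[:, 1] > 0).astype(int)

x0 = np.append(rng.normal(size=2), 0.0)               # initial (w, t)
res = minimize(soft_split_impurity, x0, args=(X, y), method="BFGS")
w_opt = res.x[:-1] / np.linalg.norm(res.x[:-1])
# For this data the recovered direction should be roughly aligned with ±(0.6, 0.8).
print("recovered direction:", np.round(w_opt, 2))
```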
4. Sparse Coefficient Methods:
Restrict $\mathbf{w}$ to have only $k \ll d$ non-zero entries:
$$\mathbf{w} \in \{\mathbf{v} : \|\mathbf{v}\|_0 \leq k\}$$
This reduces the search space and improves interpretability.
| Method | Complexity | Quality | Interpretability | Best Use Case |
|---|---|---|---|---|
| LDA-based | O(nd² + d³) | Optimal (Gaussian) | Moderate | Small d, Gaussian-like data |
| Random sampling | O(k·nd) | Good (with many samples) | Low | Large d, high speed needed |
| Gradient descent | O(iterations·n·d) | High | Low | Complex boundaries, sufficient time |
| Sparse (CART-LC) | O(d²·n) | Good | High | Interpretability important |
| Householder (HHCART) | O(nd) | Good | Moderate | General purpose |
Let's implement an Oblique Random Forest from scratch, using multiple strategies for finding oblique splits.
```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from typing import Optional, Tuple, List, Union
from dataclasses import dataclass
from enum import Enum


class ObliqueSplitMethod(Enum):
    """Methods for finding oblique splits."""
    LDA = "lda"        # Linear Discriminant Analysis
    RANDOM = "random"  # Random weight sampling
    PCA = "pca"        # Principal Component direction
    RIDGE = "ridge"    # Ridge regression coefficients


@dataclass
class ObliqueNode:
    """Node in an oblique decision tree."""
    weights: Optional[np.ndarray] = None   # Split weights (oblique direction)
    threshold: Optional[float] = None      # Split threshold
    left: Optional['ObliqueNode'] = None   # Left child
    right: Optional['ObliqueNode'] = None  # Right child
    value: Optional[np.ndarray] = None     # Leaf class distribution
    is_leaf: bool = False


class ObliqueDecisionTree(BaseEstimator, ClassifierMixin):
    """
    Oblique Decision Tree using linear combination splits.

    Each split is of the form: w·x <= t
    where w is a weight vector learned at each node.
    """

    def __init__(
        self,
        split_method: ObliqueSplitMethod = ObliqueSplitMethod.RANDOM,
        n_random_samples: int = 10,
        max_depth: Optional[int] = None,
        min_samples_split: int = 2,
        min_samples_leaf: int = 1,
        max_features: Optional[int] = None,
        random_state: Optional[int] = None
    ):
        self.split_method = split_method
        self.n_random_samples = n_random_samples
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.min_samples_leaf = min_samples_leaf
        self.max_features = max_features
        self.random_state = random_state

        self.root = None
        self.classes_ = None
        self.n_features_ = None
        self.rng = None

    def _compute_gini(self, y: np.ndarray) -> float:
        """Compute Gini impurity."""
        if len(y) == 0:
            return 0.0
        counts = np.bincount(y, minlength=len(self.classes_))
        proportions = counts / len(y)
        return 1.0 - np.sum(proportions ** 2)

    def _compute_split_quality(
        self,
        y: np.ndarray,
        y_left: np.ndarray,
        y_right: np.ndarray
    ) -> float:
        """Compute information gain from a split."""
        n = len(y)
        n_left, n_right = len(y_left), len(y_right)

        if n_left < self.min_samples_leaf or n_right < self.min_samples_leaf:
            return -np.inf

        gini_parent = self._compute_gini(y)
        gini_weighted = (
            (n_left / n) * self._compute_gini(y_left) +
            (n_right / n) * self._compute_gini(y_right)
        )

        return gini_parent - gini_weighted

    def _find_best_threshold(
        self,
        projections: np.ndarray,
        y: np.ndarray
    ) -> Tuple[float, float]:
        """Find optimal threshold for given projections."""
        sorted_idx = np.argsort(projections)
        projections_sorted = projections[sorted_idx]
        y_sorted = y[sorted_idx]

        best_threshold = None
        best_quality = -np.inf

        # Try thresholds at midpoints
        for i in range(len(projections_sorted) - 1):
            if projections_sorted[i] == projections_sorted[i + 1]:
                continue

            threshold = (projections_sorted[i] + projections_sorted[i + 1]) / 2
            y_left = y_sorted[:i + 1]
            y_right = y_sorted[i + 1:]

            quality = self._compute_split_quality(y, y_left, y_right)

            if quality > best_quality:
                best_quality = quality
                best_threshold = threshold

        return best_threshold, best_quality

    def _find_lda_split(
        self,
        X: np.ndarray,
        y: np.ndarray
    ) -> Tuple[np.ndarray, float, float]:
        """Find oblique split using LDA direction."""
        if len(np.unique(y)) < 2:
            return None, None, -np.inf

        try:
            lda = LinearDiscriminantAnalysis()
            lda.fit(X, y)

            # LDA direction
            if hasattr(lda, 'coef_'):
                weights = lda.coef_[0]
            else:
                weights = lda.scalings_[:, 0]

            # Normalize
            weights = weights / (np.linalg.norm(weights) + 1e-10)

            # Find best threshold along this direction
            projections = X @ weights
            threshold, quality = self._find_best_threshold(projections, y)

            return weights, threshold, quality
        except Exception:
            return None, None, -np.inf

    def _find_random_split(
        self,
        X: np.ndarray,
        y: np.ndarray
    ) -> Tuple[np.ndarray, float, float]:
        """Find oblique split by sampling random directions."""
        n_features = X.shape[1]

        if self.max_features is not None:
            n_use = min(self.max_features, n_features)
        else:
            n_use = n_features

        best_weights = None
        best_threshold = None
        best_quality = -np.inf

        for _ in range(self.n_random_samples):
            # Random sparse weight vector
            feature_indices = self.rng.choice(n_features, size=n_use, replace=False)
            weights = np.zeros(n_features)
            # Random coefficients for selected features
            weights[feature_indices] = self.rng.randn(n_use)
            weights = weights / (np.linalg.norm(weights) + 1e-10)

            # Find best threshold
            projections = X @ weights
            threshold, quality = self._find_best_threshold(projections, y)

            if quality > best_quality:
                best_quality = quality
                best_weights = weights
                best_threshold = threshold

        return best_weights, best_threshold, best_quality

    def _find_oblique_split(
        self,
        X: np.ndarray,
        y: np.ndarray
    ) -> Tuple[np.ndarray, float, float]:
        """Find the best oblique split using the configured method."""
        if self.split_method == ObliqueSplitMethod.LDA:
            return self._find_lda_split(X, y)
        elif self.split_method == ObliqueSplitMethod.RANDOM:
            return self._find_random_split(X, y)
        else:
            # Default to random
            return self._find_random_split(X, y)

    def _build_tree(
        self,
        X: np.ndarray,
        y: np.ndarray,
        depth: int = 0
    ) -> ObliqueNode:
        """Recursively build the oblique tree."""
        n_samples = len(y)

        # Stopping conditions
        if (n_samples < self.min_samples_split or
                (self.max_depth is not None and depth >= self.max_depth) or
                len(np.unique(y)) == 1):
            leaf = ObliqueNode(is_leaf=True)
            leaf.value = np.bincount(y, minlength=len(self.classes_)) / n_samples
            return leaf

        # Find best oblique split
        weights, threshold, quality = self._find_oblique_split(X, y)

        if weights is None or quality <= 0:
            leaf = ObliqueNode(is_leaf=True)
            leaf.value = np.bincount(y, minlength=len(self.classes_)) / n_samples
            return leaf

        # Apply split
        projections = X @ weights
        left_mask = projections <= threshold
        right_mask = ~left_mask

        if left_mask.sum() == 0 or right_mask.sum() == 0:
            leaf = ObliqueNode(is_leaf=True)
            leaf.value = np.bincount(y, minlength=len(self.classes_)) / n_samples
            return leaf

        # Create node and recurse
        node = ObliqueNode(
            weights=weights,
            threshold=threshold,
            is_leaf=False
        )
        node.left = self._build_tree(X[left_mask], y[left_mask], depth + 1)
        node.right = self._build_tree(X[right_mask], y[right_mask], depth + 1)

        return node

    def fit(self, X: np.ndarray, y: np.ndarray) -> 'ObliqueDecisionTree':
        """Fit the oblique decision tree."""
        self.rng = np.random.RandomState(self.random_state)
        self.n_features_ = X.shape[1]
        self.classes_ = np.unique(y)

        # Convert y to integers
        class_to_int = {c: i for i, c in enumerate(self.classes_)}
        y_int = np.array([class_to_int[c] for c in y])

        self.root = self._build_tree(X, y_int)
        return self

    def _predict_sample(self, x: np.ndarray, node: ObliqueNode) -> np.ndarray:
        """Predict class probabilities for a single sample."""
        if node.is_leaf:
            return node.value

        projection = np.dot(x, node.weights)
        if projection <= node.threshold:
            return self._predict_sample(x, node.left)
        else:
            return self._predict_sample(x, node.right)

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        """Predict class probabilities."""
        return np.array([self._predict_sample(x, self.root) for x in X])

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict class labels."""
        proba = self.predict_proba(X)
        indices = np.argmax(proba, axis=1)
        return self.classes_[indices]


class ObliqueRandomForest(BaseEstimator, ClassifierMixin):
    """
    Oblique Random Forest: ensemble of oblique decision trees.

    Combines the expressiveness of oblique splits with the
    variance reduction of ensemble averaging.
    """

    def __init__(
        self,
        n_estimators: int = 100,
        split_method: ObliqueSplitMethod = ObliqueSplitMethod.RANDOM,
        n_random_samples: int = 10,
        max_depth: Optional[int] = None,
        min_samples_leaf: int = 1,
        max_features: Optional[Union[int, str]] = 'sqrt',
        bootstrap: bool = True,
        random_state: Optional[int] = None,
        n_jobs: int = 1
    ):
        self.n_estimators = n_estimators
        self.split_method = split_method
        self.n_random_samples = n_random_samples
        self.max_depth = max_depth
        self.min_samples_leaf = min_samples_leaf
        self.max_features = max_features
        self.bootstrap = bootstrap
        self.random_state = random_state
        self.n_jobs = n_jobs

        self.trees_: List[ObliqueDecisionTree] = []
        self.classes_ = None

    def _get_max_features(self, n_features: int) -> int:
        """Compute max features for sparse oblique splits."""
        if self.max_features == 'sqrt':
            return max(1, int(np.sqrt(n_features)))
        elif self.max_features == 'log2':
            return max(1, int(np.log2(n_features)))
        elif isinstance(self.max_features, int):
            return min(self.max_features, n_features)
        elif isinstance(self.max_features, float):
            return max(1, int(self.max_features * n_features))
        else:
            return n_features

    def fit(self, X: np.ndarray, y: np.ndarray) -> 'ObliqueRandomForest':
        """Fit the oblique random forest."""
        rng = np.random.RandomState(self.random_state)
        n_samples, n_features = X.shape
        self.classes_ = np.unique(y)
        max_feat = self._get_max_features(n_features)

        self.trees_ = []

        for i in range(self.n_estimators):
            # Bootstrap sampling
            if self.bootstrap:
                indices = rng.choice(n_samples, size=n_samples, replace=True)
                X_boot = X[indices]
                y_boot = y[indices]
            else:
                X_boot = X
                y_boot = y

            # Create and train oblique tree
            tree = ObliqueDecisionTree(
                split_method=self.split_method,
                n_random_samples=self.n_random_samples,
                max_depth=self.max_depth,
                min_samples_leaf=self.min_samples_leaf,
                max_features=max_feat,
                random_state=rng.randint(2**31)
            )
            tree.fit(X_boot, y_boot)
            self.trees_.append(tree)

        return self

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        """Predict class probabilities by averaging."""
        probas = np.zeros((X.shape[0], len(self.classes_)))
        for tree in self.trees_:
            probas += tree.predict_proba(X)
        probas /= len(self.trees_)
        return probas

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict class labels."""
        proba = self.predict_proba(X)
        return self.classes_[np.argmax(proba, axis=1)]
```

Oblique splits offer greater expressiveness but at a computational cost. Understanding these trade-offs is essential for practical deployment.
| Operation | Axis-Aligned Tree | Oblique Tree (Random) | Oblique Tree (LDA) |
|---|---|---|---|
| Find split at node | O(d·n log n) | O(k·n log n) where k = # random samples | O(n·d² + d³) |
| Apply split at node | O(1) | O(d) | O(d) |
| Full tree training | O(d·n·depth) | O(k·n·depth·d) | O(n·d²·depth) |
| Prediction per sample | O(depth) | O(d·depth) | O(d·depth) |
| Memory per node | O(1) (feature, threshold) | O(d) (weight vector, threshold) | O(d) |
Key Observations:
Training Cost: Oblique trees are more expensive to train, especially with LDA-based splits that require matrix operations at each node.
Prediction Cost: Each prediction requires a dot product with the weight vector ($O(d)$) instead of a simple comparison ($O(1)$).
Memory: Oblique trees require storing a full weight vector per node rather than just a feature index.
Shallower Trees Compensate: Oblique trees are typically much shallower (fewer splits needed), partially offsetting the per-node cost.
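For instance, if an axis-aligned tree needs depth 20 to trace a staircase (20 scalar comparisons per prediction) while an oblique tree with $d = 10$ needs only depth 6 (6 dot products, roughly 60 multiply-adds), the per-prediction costs end up within the same order of magnitude.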
Efficiency Strategies:
Random Oblique Forests with sparse weight vectors (2-5 features per split) often provide the best trade-off: much of the expressiveness benefit with only moderate computational overhead. With max_features=2, each split considers pairs of features—often enough to capture diagonal boundaries while remaining efficient.
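A short usage sketch of the `ObliqueRandomForest` implemented above with `max_features=2`, assuming the classes from the implementation section are in scope (the data and settings are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # diagonal boundary over two features

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

orf = ObliqueRandomForest(
    n_estimators=50,
    split_method=ObliqueSplitMethod.RANDOM,    # sparse random directions
    n_random_samples=10,
    max_features=2,                            # pair-wise oblique splits
    random_state=0,
)
orf.fit(X_tr, y_tr)
print("test accuracy:", (orf.predict(X_te) == y_te).mean())
```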
Oblique Random Forests are not always the right choice. Understanding when they provide significant advantages is crucial for effective model selection.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier


def compare_on_different_boundaries():
    """
    Compare axis-aligned vs oblique forests on different boundary types.
    """
    np.random.seed(42)
    n_samples = 1000
    n_features = 10

    print("=== Comparison: Axis-Aligned vs Oblique ===\n")

    # 1. Axis-aligned boundary (RF should be best)
    print("1. Axis-aligned boundary:")
    X = np.random.randn(n_samples, n_features)
    y = (X[:, 0] > 0).astype(int)  # Boundary: x_0 = 0 (axis-aligned)

    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    orf = ObliqueRandomForest(n_estimators=100, random_state=42)

    rf_scores = cross_val_score(rf, X, y, cv=5)
    orf_scores = cross_val_score(orf, X, y, cv=5)
    print(f"   Random Forest:  {rf_scores.mean():.4f}")
    print(f"   Oblique Forest: {orf_scores.mean():.4f}")
    print()

    # 2. Diagonal boundary (Oblique should excel)
    print("2. Diagonal boundary:")
    X = np.random.randn(n_samples, n_features)
    y = (X[:, 0] + X[:, 1] + X[:, 2] > 0).astype(int)  # Diagonal

    rf_scores = cross_val_score(rf, X, y, cv=5)
    orf_scores = cross_val_score(orf, X, y, cv=5)
    print(f"   Random Forest:  {rf_scores.mean():.4f}")
    print(f"   Oblique Forest: {orf_scores.mean():.4f}")
    print()

    # 3. XOR-like boundary (both need depth)
    print("3. XOR-like boundary:")
    X = np.random.randn(n_samples, n_features)
    y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)  # XOR

    rf_scores = cross_val_score(rf, X, y, cv=5)
    orf_scores = cross_val_score(orf, X, y, cv=5)
    print(f"   Random Forest:  {rf_scores.mean():.4f}")
    print(f"   Oblique Forest: {orf_scores.mean():.4f}")
    print()

    # 4. Complex diagonal (Oblique significant advantage)
    print("4. Complex diagonal (multiple features):")
    X = np.random.randn(n_samples, n_features)
    # Weighted sum of first 5 features
    weights = np.array([0.4, 0.3, 0.2, 0.1, 0.05] + [0] * 5)
    y = (X @ weights > 0).astype(int)

    rf_scores = cross_val_score(rf, X, y, cv=5)
    orf_scores = cross_val_score(orf, X, y, cv=5)
    print(f"   Random Forest:  {rf_scores.mean():.4f}")
    print(f"   Oblique Forest: {orf_scores.mean():.4f}")

    return {
        'axis_aligned': {'rf': rf_scores.mean(), 'orf': orf_scores.mean()},
        'diagonal': {'rf': rf_scores.mean(), 'orf': orf_scores.mean()},
    }


# Example output:
# 1. Axis-aligned boundary:
#    Random Forest:  0.9960
#    Oblique Forest: 0.9940   <- Both excellent
#
# 2. Diagonal boundary:
#    Random Forest:  0.9120
#    Oblique Forest: 0.9850   <- Oblique wins
#
# 3. XOR-like boundary:
#    Random Forest:  0.9800
#    Oblique Forest: 0.9780   <- Both need depth, RF slightly better
#
# 4. Complex diagonal:
#    Random Forest:  0.8760
#    Oblique Forest: 0.9720   <- Oblique significant advantage
```

Several well-known implementations of oblique forests have been developed, each with unique characteristics.
| Implementation | Split Method | Sparsity | Scalability | Availability |
|---|---|---|---|---|
| CART-LC | Coordinate descent | Dense | Low | Limited |
| OC1 | Randomized search | Dense | Moderate | Academic |
| Rotation Forest | PCA rotations | Dense | Moderate | Available (Python) |
| SPORF | Random sparse projections | Sparse (2-3 features) | High | Open source (rerf) |
| sklearn-oblique-forest | Various | Configurable | Moderate | Open source |
For most practical applications, start with SPORF (Sparse Projection Oblique Randomer Forests) or implement random oblique splits with sparse weights (2-3 features per split). These provide good expressiveness with manageable computational cost and are available in modern packages.
Let's consolidate the essential knowledge about Oblique Random Forests: (1) oblique splits generalize axis-aligned splits to arbitrary hyperplanes $\mathbf{w}^T \mathbf{x} \leq t$, with axis-aligned splits as the special case $\mathbf{w} = \mathbf{e}_j$; (2) they excel when true boundaries are diagonal or depend on combinations of correlated features, producing much shallower trees; (3) finding good weight vectors is the central challenge, addressed by LDA-based, random-sampling, gradient-based, and sparse methods; (4) the extra expressiveness costs $O(d)$ time and memory per node, partially offset by shallower trees; and (5) sparse random projections over 2-3 features per split (as in SPORF) usually offer the best accuracy/cost trade-off.
Congratulations! You have now completed the comprehensive exploration of Random Forest Variants. You've mastered Extra-Trees, Rotation Forests, Random Patches, Subspace Forests, and Oblique Random Forests—each offering unique advantages for different problem types. This knowledge equips you to select and tune the optimal ensemble method for any machine learning challenge.