One of Random Forests' greatest strengths is their robustness—they often work well "out of the box" with default settings. However, understanding the full set of hyperparameters enables you to squeeze out additional performance, handle edge cases, and make informed tradeoffs between accuracy, speed, and memory.
Think of hyperparameters as the control panel of a sophisticated machine. Default settings work for most situations, but an expert operator knows when to adjust each dial and what effects to expect.
This page provides a comprehensive reference to all Random Forest hyperparameters, organized by function, with practical guidance on when and how to tune each one.
By the end of this page, you will understand every major Random Forest hyperparameter, know the recommended default values and when to deviate from them, be able to design efficient hyperparameter search strategies, and understand the interactions between hyperparameters.
Random Forest hyperparameters can be organized into four functional categories:
1. Ensemble-Level Parameters
n_estimators, max_features, bootstrap, max_samples
2. Tree-Level Parameters
max_depth, min_samples_split, min_samples_leaf
3. Split Quality Parameters
criterion, min_impurity_decrease
4. Computational Parameters
n_jobs, random_state, warm_start

| Parameter | Default | Type | Impact |
|---|---|---|---|
| n_estimators | 100 | Ensemble | Number of trees in forest |
| max_features | 'sqrt' (clf) / 1.0 (reg) | Ensemble | Features to consider at each split |
| bootstrap | True | Ensemble | Whether to use bootstrap sampling |
| max_samples | None (=n) | Ensemble | Samples to draw for each tree |
| max_depth | None | Tree | Maximum tree depth |
| min_samples_split | 2 | Tree | Minimum samples to split a node |
| min_samples_leaf | 1 | Tree | Minimum samples in a leaf |
| max_leaf_nodes | None | Tree | Maximum number of leaves |
| criterion | 'gini'/'squared_error' | Split | Split quality measure |
| min_impurity_decrease | 0.0 | Split | Minimum impurity decrease for split |
| n_jobs | None | Compute | Parallel workers (-1 = all cores) |
| random_state | None | Compute | Random seed for reproducibility |
| oob_score | False | Compute | Whether to compute OOB score |
n_estimators: Number of Trees
| Aspect | Description |
|---|---|
| What it does | Sets the number of trees in the forest |
| Default | 100 (scikit-learn) |
| Recommended | 100-1000; more is rarely harmful (just slower) |
| Effect of increasing | More stable predictions, slower training/inference |
| Diminishing returns | After ~100-500 trees, gains are marginal |
Key Insight: Unlike boosting methods, Random Forests don't overfit with more trees. Adding trees always reduces variance (up to the correlation floor). The tradeoff is purely computational.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

def analyze_n_estimators(X, y, max_trees=500, step=25):
    """
    Analyze how accuracy changes with number of trees.
    Typically shows diminishing returns after ~100-200 trees.
    """
    tree_counts = list(range(step, max_trees + 1, step))
    scores = []
    oob_scores = []

    for n_trees in tree_counts:
        rf = RandomForestClassifier(
            n_estimators=n_trees,
            oob_score=True,
            random_state=42
        )
        rf.fit(X, y)
        cv_score = cross_val_score(rf, X, y, cv=5).mean()
        scores.append(cv_score)
        oob_scores.append(rf.oob_score_)

    # Find point of diminishing returns (95% of max improvement)
    max_score = max(scores)
    first_score = scores[0]
    target = first_score + 0.95 * (max_score - first_score)
    sufficient_trees = next(
        (t for t, s in zip(tree_counts, scores) if s >= target),
        tree_counts[-1]
    )

    print("Analysis of n_estimators")
    print(f"{'='*50}")
    print(f"Best CV score: {max_score:.4f} at {tree_counts[np.argmax(scores)]} trees")
    print(f"95% of max gain: {target:.4f} at ~{sufficient_trees} trees")
    print(f"First (25 trees): {scores[0]:.4f}")
    print(f"100 trees: {scores[3]:.4f}")
    print(f"500 trees: {scores[-1]:.4f}")

    return tree_counts, scores, oob_scores

X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=20, random_state=42
)

tree_counts, scores, oob_scores = analyze_n_estimators(X, y)
```

max_features: Feature Randomization
We covered this in detail in the previous pages. Quick reference:
| Task | Recommended Default | Why |
|---|---|---|
| Classification | 'sqrt' or int(sqrt(n_features)) | Empirically optimal tradeoff |
| Regression | None or 1.0 (all features) or n_features/3 | Regression benefits from more features |
| High-dimensional | 'log2' or lower | Avoid overfitting to noise |
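The tradeoffs in the table above are easy to check empirically using the OOB estimate, so no separate cross-validation loop is needed. A minimal sketch (the synthetic dataset and the particular settings compared are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Illustrative dataset: 40 features, only 10 of them informative
X, y = make_classification(n_samples=1000, n_features=40,
                           n_informative=10, random_state=0)

# Compare max_features settings via the OOB estimate
results = {}
for mf in ['sqrt', 'log2', 0.5, 1.0]:
    rf = RandomForestClassifier(n_estimators=100, max_features=mf,
                                oob_score=True, random_state=0, n_jobs=-1)
    rf.fit(X, y)
    results[mf] = rf.oob_score_
    print(f"max_features={mf!s:5} -> OOB: {rf.oob_score_:.4f}")
```

The exact ranking depends on the dataset; the point is that a single OOB sweep like this is cheap enough to run on your own data before committing to a setting.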
bootstrap: Enable/Disable Bagging
| Value | Effect |
|---|---|
| True (default) | Each tree trained on bootstrap sample (~63.2% unique) |
| False | Each tree trained on full dataset; only feature randomization |
Keep bootstrap=True in almost all cases. Setting it to False loses OOB estimation and sample diversity.
max_samples: Subsample Size
| Value | Effect |
|---|---|
| None (default) | Bootstrap samples of size n |
| Float (0, 1] | Fraction of n samples per tree |
| Integer | Exactly that many samples per tree |
Useful for very large datasets: max_samples=0.5 roughly halves training time, often with minimal accuracy loss.
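The speed/accuracy tradeoff from subsampling can be measured directly; a rough sketch (dataset size, timings, and the chosen fractions are illustrative):

```python
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Illustrative dataset; on real "very large" data the speedup matters more
X, y = make_classification(n_samples=5000, n_features=30,
                           n_informative=15, random_state=0)

for ms in [None, 0.5, 0.25]:
    rf = RandomForestClassifier(n_estimators=100, max_samples=ms,
                                oob_score=True, random_state=0, n_jobs=-1)
    start = time.time()
    rf.fit(X, y)
    elapsed = time.time() - start
    # Smaller max_samples -> faster fit; OOB shows the accuracy cost
    print(f"max_samples={ms} | fit: {elapsed:.2f}s | OOB: {rf.oob_score_:.4f}")
```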
For most problems: n_estimators=200, max_features='sqrt' for classification or 1.0 for regression, bootstrap=True. This simple configuration is highly competitive. Only tune further if cross-validation shows room for improvement.
Tree-level parameters control the complexity and depth of individual trees. Unlike single decision trees (which require careful regularization to prevent overfitting), Random Forest trees typically benefit from being grown deep.
max_depth: Maximum Tree Depth
| Value | Effect | When to Use |
|---|---|---|
| None (default) | Trees grow until leaves are pure or hit the min_samples limits | Most cases; let trees overfit |
| Integer (e.g., 10) | Hard limit on depth | Memory-constrained; interpretability |
| Small (3-6) | Very shallow trees | Unusual; kills accuracy |
min_samples_split: Minimum Samples to Split
A node must have at least this many samples to be considered for splitting.
| Value | Effect |
|---|---|
| 2 (default) | Allow splits until only 2 samples remain |
| Higher (5-20) | Earlier stopping; less overfitting per tree |
| Float | Interpreted as fraction of total samples |
min_samples_leaf: Minimum Samples per Leaf
Every leaf must contain at least this many samples.
| Value | Effect |
|---|---|
| 1 (default) | Leaves can be single samples |
| Higher (5-20) | Smoother predictions; regularization |
| Float | Fraction of total samples |
max_leaf_nodes: Maximum Number of Leaves
| Value | Effect |
|---|---|
| None (default) | Unlimited leaves |
| Integer | Grow tree in best-first manner up to limit |
This is a softer constraint than max_depth. With max_leaf_nodes=100, the tree will grow greedily, adding the best split anywhere in the tree until reaching 100 leaves.
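The best-first growth described above can be observed by capping the leaf count and inspecting the fitted trees; a small sketch (the particular caps are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)

for max_leaves in [None, 100, 20]:
    rf = RandomForestClassifier(n_estimators=50, max_leaf_nodes=max_leaves,
                                random_state=0, n_jobs=-1).fit(X, y)
    # Inspect the fitted trees: leaf counts respect the cap,
    # while depth varies because growth is best-first, not level-by-level
    leaves = np.mean([t.tree_.n_leaves for t in rf.estimators_])
    depth = np.mean([t.tree_.max_depth for t in rf.estimators_])
    print(f"max_leaf_nodes={max_leaves} | avg leaves: {leaves:.0f} | "
          f"avg depth: {depth:.1f}")
```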
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
import numpy as np

def analyze_tree_depth(X, y):
    """
    Compare performance across different max_depth settings.
    Random Forests usually benefit from deep trees.
    """
    depths = [3, 5, 10, 15, 20, 30, None]  # None = unlimited

    print("Effect of max_depth on Random Forest")
    print("=" * 60)

    for depth in depths:
        rf = RandomForestClassifier(
            n_estimators=100,
            max_depth=depth,
            random_state=42
        )
        score = cross_val_score(rf, X, y, cv=5).mean()

        # Check actual max depth achieved
        rf.fit(X, y)
        actual_depths = [tree.tree_.max_depth for tree in rf.estimators_]
        avg_depth = np.mean(actual_depths)

        depth_str = str(depth) if depth else "None"
        print(f"max_depth={depth_str:4} | CV={score:.4f} | "
              f"Actual avg depth: {avg_depth:.1f}")

def analyze_min_samples(X, y):
    """
    Compare min_samples_split and min_samples_leaf.
    """
    configs = [
        {'min_samples_split': 2, 'min_samples_leaf': 1},   # Default
        {'min_samples_split': 5, 'min_samples_leaf': 1},
        {'min_samples_split': 10, 'min_samples_leaf': 5},
        {'min_samples_split': 20, 'min_samples_leaf': 10},
        {'min_samples_split': 50, 'min_samples_leaf': 20},
    ]

    print("Effect of min_samples on Random Forest")
    print("=" * 60)

    for config in configs:
        rf = RandomForestClassifier(n_estimators=100, random_state=42, **config)
        score = cross_val_score(rf, X, y, cv=5).mean()

        rf.fit(X, y)
        avg_leaves = np.mean([tree.tree_.n_leaves for tree in rf.estimators_])

        print(f"split={config['min_samples_split']:2}, "
              f"leaf={config['min_samples_leaf']:2} | "
              f"CV={score:.4f} | Avg leaves: {avg_leaves:.0f}")

X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=20, random_state=42
)

analyze_tree_depth(X, y)
analyze_min_samples(X, y)
```

Unlike single decision trees, Random Forest trees SHOULD overfit to their bootstrap samples. The ensemble averaging removes the overfitting.
Default tree parameters (unlimited depth, min_samples_split=2, min_samples_leaf=1) are usually correct. Only adjust if you have specific memory constraints or want smoother individual predictions.
criterion: Split Quality Measure
Determines how the "goodness" of a split is measured.
| Task | Criterion | Formula | Notes |
|---|---|---|---|
| Classification | 'gini' (default) | 1 - Σᵢpᵢ² | Fast; usually works well |
| Classification | 'entropy' | -Σᵢpᵢlog₂(pᵢ) | Information gain; slightly slower |
| Classification | 'log_loss' | Cross-entropy | For probability predictions |
| Regression | 'squared_error' (default) | MSE | Standard choice |
| Regression | 'absolute_error' | MAE | Robust to outliers |
| Regression | 'friedman_mse' | Improved MSE | Often better splits |
| Regression | 'poisson' | Poisson deviance | For count data |
Gini vs Entropy for Classification:
Both measure impurity, but have subtle differences:
| Aspect | Gini Impurity | Entropy |
|---|---|---|
| Computational cost | Lower | Higher (log computation) |
| Tends to favor | Larger partitions | More balanced splits |
| Typical accuracy difference | < 1% | < 1% |
| Recommendation | Use Gini (default) | Switch to entropy only if Gini underperforms |
In practice, the choice between Gini and entropy rarely matters significantly for Random Forests—the randomization and averaging wash out these subtle differences.
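This claim is easy to verify on your own data; a quick sketch (the dataset and the size of the gap are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=0)

# Same forest, same seed, only the split criterion differs
scores = {}
for criterion in ['gini', 'entropy']:
    rf = RandomForestClassifier(n_estimators=100, criterion=criterion,
                                random_state=0, n_jobs=-1)
    scores[criterion] = cross_val_score(rf, X, y, cv=5).mean()
    print(f"{criterion:8}: CV = {scores[criterion]:.4f}")

print(f"Difference: {abs(scores['gini'] - scores['entropy']):.4f}")
```

The gap is typically within the noise of cross-validation itself, which is why the recommendation is simply to keep the default.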
min_impurity_decrease: Minimum Improvement Required
A split is only made if it decreases impurity by at least this amount:
$$\text{impurity decrease} = \frac{N_t}{N} \cdot \text{impurity}_t - \frac{N_{t_L}}{N} \cdot \text{impurity}_{t_L} - \frac{N_{t_R}}{N} \cdot \text{impurity}_{t_R} \geq \text{min\_impurity\_decrease}$$
| Value | Effect |
|---|---|
| 0.0 (default) | Accept any split that reduces impurity |
| Small (1e-7) | Avoid splits that barely improve |
| Larger (1e-4) | Significant regularization; shallower trees |
This is rarely needed for Random Forests since ensemble averaging handles overfitting. Use only if you specifically need shallower trees (memory or interpretability constraints).
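Its regularizing effect on tree size is straightforward to observe; a small sketch (the threshold values are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)

# Larger thresholds reject low-gain splits, shrinking the trees
leaf_counts = {}
for mid in [0.0, 1e-4, 1e-3, 1e-2]:
    rf = RandomForestClassifier(n_estimators=50, min_impurity_decrease=mid,
                                random_state=0, n_jobs=-1).fit(X, y)
    leaf_counts[mid] = np.mean([t.tree_.n_leaves for t in rf.estimators_])
    print(f"min_impurity_decrease={mid:g} | avg leaves: {leaf_counts[mid]:.0f}")
```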
For most users: stick with defaults (gini for classification, squared_error for regression, min_impurity_decrease=0). These parameters rarely need tuning for Random Forests. Focus your tuning budget on n_estimators, max_features, and tree depth parameters.
n_jobs: Parallel Training
Random Forest trees are embarrassingly parallel—they can be trained independently.
| Value | Behavior |
|---|---|
| None (default) | Single-threaded |
| -1 | Use all available CPU cores |
| Positive integer | Use exactly that many cores |
Speedup guidelines are best established by benchmarking on your own hardware:
```python
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import multiprocessing

def benchmark_parallel(X, y, n_estimators=200):
    """
    Benchmark training time with different n_jobs settings.
    """
    n_cores = multiprocessing.cpu_count()
    print(f"System has {n_cores} CPU cores")
    print(f"Training Random Forest with {n_estimators} trees")
    print("=" * 50)

    n_jobs_options = [1, 2, 4, n_cores // 2, n_cores, -1]
    # Keep valid options, deduplicated while preserving order
    n_jobs_options = list(dict.fromkeys(
        j for j in n_jobs_options if j == -1 or 1 <= j <= n_cores
    ))

    base_time = None

    for n_jobs in n_jobs_options:
        # Warm-up run (not measured)
        rf = RandomForestClassifier(n_estimators=10, n_jobs=n_jobs, random_state=42)
        rf.fit(X[:100], y[:100])

        # Timed run
        rf = RandomForestClassifier(
            n_estimators=n_estimators,
            n_jobs=n_jobs,
            random_state=42
        )
        start = time.time()
        rf.fit(X, y)
        elapsed = time.time() - start

        if n_jobs == 1:
            base_time = elapsed
            speedup = 1.0
        else:
            speedup = base_time / elapsed

        # Use the effective core count so efficiency is correct for n_jobs=-1
        eff_jobs = n_cores if n_jobs == -1 else n_jobs
        jobs_str = "all" if n_jobs == -1 else str(n_jobs)
        print(f"n_jobs={jobs_str:3} | Time: {elapsed:.2f}s | "
              f"Speedup: {speedup:.2f}x | Efficiency: {speedup/eff_jobs*100:.0f}%")

# Create dataset
X, y = make_classification(
    n_samples=10000, n_features=50, n_informative=25, random_state=42
)

benchmark_parallel(X, y)
```

random_state: Reproducibility
| Value | Behavior |
|---|---|
| None (default) | Random initialization; results vary between runs |
| Integer | Fixed seed; reproducible results |
| RandomState | Use existing random state object |
Always set random_state during development for reproducibility. Consider removing for production if you want variability (e.g., ensembling multiple Random Forests).
warm_start: Incremental Training
| Value | Behavior |
|---|---|
| False (default) | Each fit() trains from scratch |
| True | fit() adds new trees to existing forest |
Useful for growing a forest incrementally: you can add trees in batches, monitor the OOB score as the ensemble grows, and stop once improvement stalls, all without retraining existing trees.
```python
from sklearn.ensemble import RandomForestClassifier

def incremental_training(X, y, total_trees=500, step=50):
    """
    Demonstrate warm_start for incremental tree addition.
    Monitor OOB score as trees are added.
    """
    rf = RandomForestClassifier(
        n_estimators=step,
        warm_start=True,   # Key: enable warm start
        oob_score=True,
        random_state=42
    )

    print("Incremental Random Forest Training")
    print("=" * 50)

    n_trees_history = []
    oob_history = []

    for n_trees in range(step, total_trees + 1, step):
        rf.n_estimators = n_trees   # Increase target
        rf.fit(X, y)                # Adds new trees (doesn't retrain existing)

        n_trees_history.append(n_trees)
        oob_history.append(rf.oob_score_)

        # Check improvement
        if len(oob_history) > 1:
            improvement = oob_history[-1] - oob_history[-2]
            print(f"{n_trees:3} trees | OOB: {rf.oob_score_:.4f} | "
                  f"Δ: {improvement:+.4f}")
        else:
            print(f"{n_trees:3} trees | OOB: {rf.oob_score_:.4f}")

        # Early stopping if improvement is negligible
        if len(oob_history) >= 3:
            recent_improvement = oob_history[-1] - oob_history[-3]
            if abs(recent_improvement) < 0.001:
                print(f"Converged at {n_trees} trees (improvement < 0.001)")
                break

    return rf, n_trees_history, oob_history

# Example
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=20, random_state=42
)

rf, trees, oob = incremental_training(X, y)
```

oob_score: Out-of-Bag Evaluation
| Value | Behavior |
|---|---|
| False (default) | OOB score not computed |
| True | Compute and store OOB score during fit |
When oob_score=True, the fitted model has:
- oob_score_: Accuracy (or R² for regression) on OOB samples
- oob_decision_function_: Raw predictions for OOB samples

OOB score is "free" cross-validation—it uses samples each tree didn't train on. It's slightly pessimistic compared to CV (uses fewer samples for evaluation) but very fast.
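These attributes can be used directly, for example to get probability estimates without a holdout set. A minimal sketch (the dataset is illustrative) showing that the decision function is consistent with the stored score:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0, n_jobs=-1).fit(X, y)

print(f"OOB accuracy: {rf.oob_score_:.4f}")

# oob_decision_function_ holds per-sample class probabilities, averaged
# over only the trees that did NOT see that sample during training
oob_proba = rf.oob_decision_function_    # shape (n_samples, n_classes)
oob_pred = rf.classes_[np.argmax(oob_proba, axis=1)]
print(f"Manual OOB accuracy: {np.mean(oob_pred == y):.4f}")
```

With enough trees, every sample is out-of-bag for some subset of the forest, so the manual accuracy computed this way agrees with oob_score_ for single-output classifiers.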
For production: n_jobs=-1 for speed, random_state=42 (or another fixed seed) for reproducibility during testing, oob_score=True for free validation. Remove or set random_state=None in final deployment if you want model variability.
Hyperparameters don't act in isolation—they interact in important ways:
1. n_estimators × max_features Interaction
| max_features | Effect on n_estimators |
|---|---|
| Small (e.g., 'log2') | Need MORE trees (individual trees are weak) |
| Large (all features) | Fewer trees sufficient (bagging reaches correlation floor faster) |
Guideline: If you use aggressive feature subsampling (small max_features), increase n_estimators to compensate.
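The compensation effect can be seen by varying both parameters together; a small sketch (dataset and the specific combinations are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Many features, few informative: aggressive subsampling weakens each tree
X, y = make_classification(n_samples=1500, n_features=60,
                           n_informative=10, random_state=0)

# Weak individual trees (small max_features) typically need a larger ensemble
for mf, n_trees in [('log2', 50), ('log2', 500), (0.5, 50)]:
    rf = RandomForestClassifier(n_estimators=n_trees, max_features=mf,
                                oob_score=True, random_state=0,
                                n_jobs=-1).fit(X, y)
    print(f"max_features={mf!s:5} n_estimators={n_trees:3} | "
          f"OOB: {rf.oob_score_:.4f}")
```

On most datasets the ('log2', 500) configuration closes some or all of the gap to the larger-max_features forest, illustrating the guideline.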
2. max_depth × min_samples_leaf Interaction
Both control tree complexity, but differently:
| max_depth | min_samples_leaf | Resulting Trees |
|---|---|---|
| Shallow (5-10) | Small (1-2) | Shallow but fine-grained |
| Deep (None) | Large (20+) | Deep but smooth leaves |
| Shallow | Large | Very constrained (rarely useful) |
3. bootstrap × max_samples Interaction
| bootstrap | max_samples | What Happens |
|---|---|---|
| True | None | Standard bootstrap (n samples with replacement) |
| True | 0.5 | Each tree sees 50% of data (with replacement) |
| False | None | Each tree sees 100% of data (only feature randomization) |
| False | 0.5 | Random 50% of data without replacement |
Note: max_samples < 1.0 with bootstrap=True is called "subsampling" and can increase diversity and speed.
4. n_jobs × Memory Interaction
Parallelization increases memory usage: each concurrent worker holds its own tree under construction, so peak memory grows with the number of jobs.
Setting both max_depth AND min_samples_leaf AND max_leaf_nodes is usually over-constraining. Trees become too simple, losing the benefit of deep RF trees. Pick one regularization mechanism at most. For Random Forests, usually none is needed!
Given Random Forests' robustness to hyperparameters, tuning should be efficient and targeted.
Step 1: Start with Strong Defaults
```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

def create_default_rf(task='classification'):
    """
    Create a Random Forest with robust defaults.
    This should be your starting point.
    """
    if task == 'classification':
        return RandomForestClassifier(
            n_estimators=200,      # Enough for most problems
            max_features='sqrt',   # Recommended for classification
            max_depth=None,        # Let trees grow deep
            min_samples_split=2,   # Allow fine splits
            min_samples_leaf=1,    # Allow small leaves
            bootstrap=True,        # Enable bagging
            oob_score=True,        # Free validation
            n_jobs=-1,             # Use all cores
            random_state=42        # Reproducibility
        )
    else:
        return RandomForestRegressor(
            n_estimators=200,
            max_features=1.0,      # Often better for regression
            max_depth=None,
            min_samples_split=2,
            min_samples_leaf=1,
            bootstrap=True,
            oob_score=True,
            n_jobs=-1,
            random_state=42
        )

# Step 2: Baseline evaluation
def establish_baseline(X, y, task='classification'):
    """
    Establish baseline performance with defaults.
    """
    rf = create_default_rf(task)

    # Quick CV evaluation
    scores = cross_val_score(rf, X, y, cv=5)
    print(f"Baseline {task} performance:")
    print(f"  CV Score: {scores.mean():.4f} (+/- {scores.std()*2:.4f})")

    rf.fit(X, y)
    print(f"  OOB Score: {rf.oob_score_:.4f}")

    return scores.mean()

# Step 3: Targeted search (if baseline needs improvement)
def targeted_tuning(X, y, task='classification'):
    """
    Efficient hyperparameter search focusing on parameters that matter.
    """
    if task == 'classification':
        base_estimator = RandomForestClassifier(random_state=42, n_jobs=-1)
    else:
        base_estimator = RandomForestRegressor(random_state=42, n_jobs=-1)

    # Focus on parameters that actually matter
    param_distributions = {
        'n_estimators': [100, 200, 300, 500],
        'max_features': ['sqrt', 'log2', 0.3, 0.5],
        'min_samples_leaf': [1, 2, 5, 10],
        'max_depth': [None, 20, 30, 50],
    }

    search = RandomizedSearchCV(
        base_estimator,
        param_distributions,
        n_iter=30,    # 30 random combinations
        cv=5,
        scoring='accuracy' if task == 'classification' else 'r2',
        random_state=42,
        n_jobs=1      # RF is already parallel
    )

    search.fit(X, y)

    print("Best parameters found:")
    for param, value in search.best_params_.items():
        print(f"  {param}: {value}")
    print(f"Best CV score: {search.best_score_:.4f}")

    # Compare to baseline
    baseline = establish_baseline(X, y, task)
    improvement = search.best_score_ - baseline
    print(f"Improvement over defaults: {improvement:+.4f} "
          f"({improvement/baseline*100:+.1f}%)")

    return search.best_estimator_

# Example usage
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=20, random_state=42
)

print("Step 1: Establish Baseline")
print("=" * 50)
baseline = establish_baseline(X, y)

print("Step 2: Targeted Tuning (if needed)")
print("=" * 50)
best_model = targeted_tuning(X, y)
```

If defaults achieve 98%+ of your target metric, stop tuning. Random Forests are remarkably robust, and extensive tuning often yields <1% improvement. Spend that time on feature engineering instead—it almost always has higher ROI than hyperparameter tuning.
We've covered the complete hyperparameter landscape for Random Forests, providing you with the knowledge to configure and tune these models effectively.
| Parameter | Classification | Regression |
|---|---|---|
| n_estimators | 200-500 | 200-500 |
| max_features | 'sqrt' | 1.0 or n_features/3 |
| max_depth | None | None |
| min_samples_split | 2 | 2 |
| min_samples_leaf | 1 | 1 |
| bootstrap | True | True |
| n_jobs | -1 | -1 |
What's Next:
With hyperparameters mastered, the final page of this module covers parallelization—strategies for training and predicting at scale, distributed Random Forests, and production deployment considerations.
You now have a comprehensive understanding of Random Forest hyperparameters. Remember: Random Forests are forgiving. Start with defaults, validate with OOB or CV, and only tune if there's clear room for improvement. Focus your energy on feature engineering for maximum impact.