The isolation principle tells us that anomalies are easier to isolate than normal points. But how do we actually isolate points? How do we create the 'random partitions' that separate anomalies quickly while requiring many splits to separate normal points?
The answer lies in Isolation Trees (iTrees)—binary trees constructed through a deceptively simple process of random recursive partitioning. Unlike decision trees that carefully select splits to optimize some criterion (information gain, Gini impurity), isolation trees intentionally use random attribute selection and random split points.
This randomness is not a weakness—it's the key feature. By randomizing everything, we create an unbiased 'probe' of the data structure that captures the true isolability of each point without being confounded by any specific partitioning strategy.
By the end of this page, you will understand: (1) How to construct an isolation tree through random recursive partitioning, (2) Why random splits create unbiased isolation measurements, (3) The role of subsampling in improving anomaly detection, (4) The complete algorithm for building an Isolation Forest ensemble, and (5) Implementation details and pseudocode.
An Isolation Tree (iTree) is a binary tree built through random recursive partitioning of the data. The construction process is remarkably simple:
Algorithm: Build Isolation Tree

1. If the current node holds at most one point, or the depth limit has been reached, return a leaf that records how many points it contains.
2. Otherwise, pick an attribute uniformly at random.
3. Pick a split value uniformly at random between that attribute's minimum and maximum among the points in the node.
4. Send points with values below the split value to the left child and the rest to the right child.
5. Recurse on both children.
That's it. No optimization, no criterion evaluation, no pruning. Pure randomness.
```python
import numpy as np
from dataclasses import dataclass
from typing import Optional


@dataclass
class ITreeNode:
    """A node in an Isolation Tree."""
    # For internal nodes
    split_attribute: Optional[int] = None   # Which dimension to split
    split_value: Optional[float] = None     # The split threshold
    left: Optional['ITreeNode'] = None      # Left child (< split_value)
    right: Optional['ITreeNode'] = None     # Right child (>= split_value)
    # For leaf nodes
    size: int = 0                           # Number of points that reached this leaf

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build_isolation_tree(
    X: np.ndarray,
    current_depth: int = 0,
    max_depth: Optional[int] = None
) -> ITreeNode:
    """
    Recursively build an Isolation Tree.

    Args:
        X: Data matrix of shape (n_samples, n_features)
        current_depth: Current depth in the tree
        max_depth: Maximum depth limit (typically ceil(log2(subsample_size)))

    Returns:
        Root node of the constructed isolation tree
    """
    n_samples, n_features = X.shape

    # Base case 1: Single point or no points
    if n_samples <= 1:
        return ITreeNode(size=n_samples)

    # Base case 2: Maximum depth reached
    if max_depth is not None and current_depth >= max_depth:
        return ITreeNode(size=n_samples)

    # Randomly select split attribute (dimension)
    split_attr = np.random.randint(0, n_features)

    # Get min and max values for this attribute
    attr_values = X[:, split_attr]
    attr_min, attr_max = attr_values.min(), attr_values.max()

    # If all values are the same, cannot split
    if attr_min == attr_max:
        return ITreeNode(size=n_samples)

    # Randomly select split value uniformly between min and max
    split_val = np.random.uniform(attr_min, attr_max)

    # Partition data
    left_mask = attr_values < split_val
    right_mask = ~left_mask
    X_left = X[left_mask]
    X_right = X[right_mask]

    # Recursively build subtrees
    left_child = build_isolation_tree(X_left, current_depth + 1, max_depth)
    right_child = build_isolation_tree(X_right, current_depth + 1, max_depth)

    return ITreeNode(
        split_attribute=split_attr,
        split_value=split_val,
        left=left_child,
        right=right_child
    )


# Example: Build a tree on sample data
np.random.seed(42)
X_sample = np.random.randn(10, 2)  # 10 points, 2 dimensions
tree = build_isolation_tree(X_sample, max_depth=5)
print("Isolation tree built successfully!")
print(f"Root splits on attribute {tree.split_attribute} at value {tree.split_value:.3f}")
```

Key Observations about the Construction:
No criterion optimization: Unlike CART or C4.5, we don't compute information gain, Gini impurity, or any other split quality measure. Every split is purely random.
Uniform split value selection: The split value is drawn uniformly between min and max. This means splits are more likely in regions with wider data ranges.
Axis-parallel splits: All splits are perpendicular to one axis (axis-parallel hyperplanes). This is both a strength (computational simplicity) and a limitation (may miss diagonal patterns).
Natural termination: The tree grows until each leaf contains a single point or maximum depth is reached.
Random splits create an 'unbiased' probe of the data structure. If a point is truly in a sparse region, it will be isolated quickly regardless of which dimension we randomly choose to split. The randomness averages out over an ensemble of trees, giving us a robust estimate of each point's true 'isolability.'
Once an isolation tree is built, we need to compute the path length for each data point—the number of edges traversed from the root to the point's termination node. This path length is the key signal for anomaly detection.
Algorithm: Compute Path Length

1. Start at the root with a path length of 0.
2. At each internal node, compare the point's value on the node's split attribute with the split value; move to the left child if it is smaller, otherwise to the right child, and add 1 to the path length.
3. On reaching a leaf, return the accumulated path length plus a correction for the number of points remaining in that leaf (described below).
Handling Early Termination (Max Depth Limit):
When we limit tree depth (which we always do in practice), some leaf nodes contain multiple points. This means we haven't fully isolated the point—we stopped early.
To account for this, we add the expected additional path length that would be needed to fully isolate the point from the other points in its leaf:
$$h(x) = \text{path\_length}(x) + c(\text{leaf\_size})$$
where $c(n)$ is the average BST path length correction we introduced in the previous page. This adjustment ensures that points reaching leaves with many other points get appropriately longer path lengths.
```python
import numpy as np


def average_path_length(n: int) -> float:
    """
    Compute c(n) - the average unsuccessful BST search path length.
    Used for normalizing path lengths and adjusting for early termination.
    """
    if n <= 1:
        return 0.0
    elif n == 2:
        return 1.0
    else:
        euler_mascheroni = 0.5772156649
        harmonic = np.log(n - 1) + euler_mascheroni
        return 2.0 * harmonic - (2.0 * (n - 1) / n)


def path_length(
    x: np.ndarray,
    node: 'ITreeNode',
    current_length: int = 0
) -> float:
    """
    Compute the path length for a single point in an isolation tree.

    Args:
        x: A single data point of shape (n_features,)
        node: Current node in the tree
        current_length: Number of edges traversed so far

    Returns:
        Path length h(x) including adjustment for early termination
    """
    # Base case: reached a leaf node
    if node.is_leaf:
        # Add adjustment for points that would need further isolation
        return current_length + average_path_length(node.size)

    # Traverse to appropriate child
    attr_value = x[node.split_attribute]
    if attr_value < node.split_value:
        return path_length(x, node.left, current_length + 1)
    else:
        return path_length(x, node.right, current_length + 1)


def path_lengths_batch(X: np.ndarray, tree: 'ITreeNode') -> np.ndarray:
    """
    Compute path lengths for all points in a dataset.

    Args:
        X: Data matrix of shape (n_samples, n_features)
        tree: Root node of the isolation tree

    Returns:
        Array of path lengths, shape (n_samples,)
    """
    return np.array([path_length(x, tree) for x in X])


# Example: Show path lengths for different points
# Assume we have the tree from before
# tree = build_isolation_tree(X_sample, max_depth=5)

# Create sample points: one normal, one anomaly
normal_point = np.array([0.0, 0.0])     # Near center of normal distribution
anomaly_point = np.array([10.0, 10.0])  # Far from normal data

# Note: In practice, you'd use the tree built on training data
# h_normal = path_length(normal_point, tree)
# h_anomaly = path_length(anomaly_point, tree)
# print(f"Normal point path length: {h_normal:.2f}")
# print(f"Anomaly point path length: {h_anomaly:.2f}")
# Expected: h_anomaly << h_normal
```

Path length counts edges (the number of splits applied), not nodes. A point at depth d has path length d. Don't confuse path length with tree height or the number of nodes visited.
A crucial and counterintuitive aspect of Isolation Forest is that it works better with subsampling than with the full dataset. This is unlike most machine learning algorithms where more data typically means better performance.
Why Subsampling Helps:
Subsampling addresses two fundamental problems in anomaly detection:
1. Swamping: When normal points surround an anomaly, they can 'swamp' it, forcing the anomaly to require many more splits to be isolated because it must be separated from all of these neighbors. With subsampling, most of those crowding neighbors simply do not appear in a given tree's sample, so the anomaly sits in a sparser region and is isolated in far fewer splits.
2. Masking: When multiple anomalies cluster together, they can 'mask' each other, appearing as a small cluster that requires normal-like path lengths to isolate. With subsampling, only a few of the clustered anomalies (often none) land in any one tree's sample, so the cluster is broken up and its members become easy to isolate again. A small experiment after the table below illustrates this effect.
Recommended Subsample Size:
The original paper recommends a subsample size of ψ = 256 as a default. This value keeps swamping and masking in check and tree construction cheap, while remaining large enough to produce stable scores, as the comparison below summarizes:
| Subsample Size | Swamping Mitigation | Masking Mitigation | Computational Cost | Score Stability |
|---|---|---|---|---|
| Very small (< 64) | Excellent | Excellent | Very Low | Poor (high variance) |
| Small (64-128) | Good | Good | Low | Moderate |
| 256 (Default) | Good | Good | Low | Good |
| Medium (512-1024) | Moderate | Moderate | Moderate | Good |
| Large (> 2048) | Poor | Poor | High | Excellent |
| Full dataset | Poor | Poor | Very High | Excellent |
While 256 is a good default, you may need larger subsamples when: (1) The normal data has multiple distinct clusters, (2) The dataset is very high-dimensional, or (3) You need more stable/reproducible scores. Start with 256 and increase if anomaly detection performance is poor or highly variable.
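The masking effect described above can be probed directly. The following sketch (an illustrative experiment with invented data, not taken from the original paper) plants a tight cluster of 30 anomalies beside 2,000 normal points and compares the default ψ = 256 against training every tree on the full dataset. Exact numbers vary with the data and seed, but the subsampled forest typically recovers more of the masked cluster.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative "masking" scenario: a tight clump of anomalies next to normal data.
rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(2000, 2))
X_masked = rng.normal(6.0, 0.1, size=(30, 2))   # dense anomaly cluster
X = np.vstack([X_normal, X_masked])

for max_samples in (256, len(X)):               # default subsample vs. full dataset
    clf = IsolationForest(n_estimators=200, max_samples=max_samples, random_state=0)
    clf.fit(X)
    scores = -clf.score_samples(X)              # higher = more anomalous
    cutoff = np.quantile(scores, 0.985)         # budget of ~1.5% (30 / 2030 points)
    recall = (scores[-30:] > cutoff).mean()     # fraction of the cluster flagged
    print(f"max_samples={max_samples:>4}: cluster recall in top 1.5% = {recall:.2f}")
```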
Isolation Forest limits tree depth, and this limit is not arbitrary—it's a deliberate design choice that improves both efficiency and detection quality.
The Depth Limit Formula:
For a subsample of size $\psi$, the maximum depth is set to:
$$\text{max\_depth} = \lceil \log_2(\psi) \rceil$$
For the default ψ = 256, this gives max_depth = 8.
Why This Specific Limit?

$\lceil \log_2(\psi) \rceil$ is roughly the average height of an isolation tree built on $\psi$ points. Anomalies are, by definition, isolated in fewer-than-average splits, so they terminate well above this depth. Growing branches any deeper only refines distinctions among normal points in dense regions, which we do not need; the correction term described next accounts for those unfinished branches far more cheaply.
The Path Length Correction:
When a point reaches a leaf at max depth with multiple points still in its partition, we know it would have needed more splits for full isolation. The correction term $c(n)$ estimates this:
$$h(x) = \text{observed\_depth} + c(\text{leaf\_size})$$
This correction ensures that points deep in dense regions get appropriately long path lengths, even though we stopped early.
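As a quick numeric check of this correction, the sketch below restates c(n) (so it runs on its own) and computes the adjusted path length for a hypothetical point that hits the depth limit in a leaf still holding 32 points.

```python
import numpy as np

def c(n: int) -> float:
    """c(n): average unsuccessful-BST-search path length, as defined earlier."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

psi = 256
max_depth = int(np.ceil(np.log2(psi)))          # 8 for the default subsample size

# Hypothetical leaf: the point stops at max depth with 32 points left in its partition.
observed_depth, leaf_size = max_depth, 32
h = observed_depth + c(leaf_size)
print(f"max_depth = {max_depth}")                             # 8
print(f"h(x) = {observed_depth} + c({leaf_size}) = {h:.2f}")  # ≈ 14.08
```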
Demonstration: Score Separation with Depth-Limited Trees:
```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Create dataset with clear structure
np.random.seed(42)
n_normal = 1000
n_anomalies = 10

# Normal points: 2D Gaussian cluster
X_normal = np.random.randn(n_normal, 2)

# Anomalies: scattered points far from cluster
X_anomalies = 4 + 2 * np.random.randn(n_anomalies, 2)

X = np.vstack([X_normal, X_anomalies])
labels = np.array([0] * n_normal + [1] * n_anomalies)  # 0=normal, 1=anomaly

# Fit Isolation Forest
clf = IsolationForest(
    n_estimators=100,
    max_samples=256,        # Subsample size ψ
    contamination=0.01,
    random_state=42
)
clf.fit(X)

# Get anomaly scores
scores = -clf.score_samples(X)  # Negate because sklearn returns negative

# Analyze score distribution by label
normal_scores = scores[labels == 0]
anomaly_scores = scores[labels == 1]

print("Score Statistics:")
print(f"Normal points: mean={normal_scores.mean():.3f}, "
      f"min={normal_scores.min():.3f}, max={normal_scores.max():.3f}")
print(f"Anomaly points: mean={anomaly_scores.mean():.3f}, "
      f"min={anomaly_scores.min():.3f}, max={anomaly_scores.max():.3f}")
print()
print(f"Separation: {anomaly_scores.min():.3f} (min anomaly) vs "
      f"{normal_scores.max():.3f} (max normal)")

# Note: Anomaly scores should be clearly higher than normal scores
# This demonstrates the isolation principle in action
```

In scikit-learn's implementation, max_samples sets the subsample size, and the tree depth limit is derived from it internally as ⌈log₂(max_samples)⌉; there is no separate max_depth parameter to set.
A single isolation tree provides a noisy estimate of each point's isolability due to the random nature of split selection. An Isolation Forest aggregates multiple trees to obtain robust, stable anomaly scores.
The Full Isolation Forest Algorithm:
```python
import numpy as np
from typing import List, Optional
from dataclasses import dataclass


@dataclass
class IsolationForest:
    """
    Isolation Forest for anomaly detection.

    Key hyperparameters:
    - n_estimators: Number of isolation trees (default 100)
    - max_samples: Subsample size for each tree (default 256)
    """
    n_estimators: int = 100
    max_samples: int = 256
    random_state: Optional[int] = None
    trees_: Optional[List['ITreeNode']] = None
    n_samples_: Optional[int] = None

    def fit(self, X: np.ndarray) -> 'IsolationForest':
        """
        Build the isolation forest from training data.

        Args:
            X: Training data of shape (n_samples, n_features)

        Returns:
            self
        """
        n_samples, n_features = X.shape
        self.n_samples_ = n_samples

        if self.random_state is not None:
            np.random.seed(self.random_state)

        # Determine actual subsample size
        subsample_size = min(self.max_samples, n_samples)

        # Maximum depth: ceil(log2(subsample_size))
        max_depth = int(np.ceil(np.log2(subsample_size)))

        self.trees_ = []
        for _ in range(self.n_estimators):
            # Random subsample (without replacement)
            indices = np.random.choice(n_samples, size=subsample_size, replace=False)
            X_subsample = X[indices]

            # Build isolation tree
            tree = build_isolation_tree(X_subsample, max_depth=max_depth)
            self.trees_.append(tree)

        return self

    def score_samples(self, X: np.ndarray) -> np.ndarray:
        """
        Compute anomaly scores for samples.

        Args:
            X: Data to score, shape (n_samples, n_features)

        Returns:
            Anomaly scores in (0, 1), higher = more anomalous
        """
        # Average path length across all trees
        path_lengths = np.zeros(len(X))
        for tree in self.trees_:
            for i, x in enumerate(X):
                path_lengths[i] += path_length(x, tree)

        # Average over trees
        avg_path_lengths = path_lengths / self.n_estimators

        # Convert to anomaly scores, normalizing by c(ψ)
        # using the subsample size actually used during fit
        c_n = average_path_length(min(self.max_samples, self.n_samples_))
        scores = np.power(2, -avg_path_lengths / c_n)

        return scores

    def predict(self, X: np.ndarray, threshold: float = 0.5) -> np.ndarray:
        """
        Predict anomaly labels.

        Args:
            X: Data to predict, shape (n_samples, n_features)
            threshold: Score threshold for anomaly classification

        Returns:
            Binary predictions: 1 (anomaly), 0 (normal)
        """
        scores = self.score_samples(X)
        return (scores > threshold).astype(int)


# The training phase: O(n_estimators × max_samples × log(max_samples))
# The scoring phase: O(n_estimators × n_points × log(max_samples))
# Both phases are independent of the full dataset size n!
```

Ensemble Properties:
Variance reduction: Averaging path lengths across many trees reduces variance in anomaly scores, making them more stable and reliable.
Diverse partitions: Each tree uses different random splits, exploring different ways to partition the space. This diversity is key to robust detection.
Parallelizable: Trees are independent and can be built in parallel, making the algorithm highly scalable.
No bootstrap overlap concerns: Unlike Random Forests where bootstrap samples overlap significantly, the small subsample size means each tree sees a fairly different subset of data.
| Parameter | Default | Typical Range | When to Adjust |
|---|---|---|---|
| n_estimators | 100 | 50-500 | Increase for more stable scores; decrease for faster training |
| max_samples | 256 or 'auto' | 64-2048 | Increase if normal data has multiple clusters; decrease for very large datasets |
| contamination | 0.1 or 'auto' | 0.01-0.5 | Set based on expected anomaly proportion; affects threshold only |
| max_features | 1.0 (all) | 0.5-1.0 | Decrease for very high-dimensional data |
Isolation Forest scores are relative rankings, not calibrated probabilities. A score of 0.7 doesn't mean '70% probability of being anomalous.' Use scores for ranking and apply domain-specific thresholds.
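One hedged way to act on this: derive the operating threshold from a quantile of the training scores (an alert budget) rather than reading scores as probabilities. The data and the 1% budget below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative data; in practice use your own training set and alert budget.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 4))

clf = IsolationForest(n_estimators=100, max_samples=256, random_state=0).fit(X_train)
train_scores = -clf.score_samples(X_train)       # higher = more anomalous

# Flag roughly the top 1% of scores seen during training; tune per application.
threshold = np.quantile(train_scores, 0.99)

X_new = rng.normal(size=(10, 4))
flags = (-clf.score_samples(X_new) > threshold).astype(int)   # 1 = flagged
print(flags)
```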
One of Isolation Forest's major advantages is its computational efficiency compared to distance-based methods. Let's analyze the complexity rigorously.
Variables:

- $n$: number of points in the full dataset
- $d$: number of features
- $\psi$: subsample size used to build each tree
- $t$: number of trees in the forest
- $m$: number of points to score
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Build one tree | $O(\psi \cdot \log \psi)$ | $O(\psi)$ | Average depth $O(\log \psi)$; growth capped at max_depth |
| Build forest | $O(t \cdot \psi \cdot \log \psi)$ | $O(t \cdot \psi)$ | Trees are independent → parallelizable |
| Score one point | $O(t \cdot \log \psi)$ | $O(1)$ | Traverse t trees, each of depth $\log \psi$ |
| Score m points | $O(t \cdot m \cdot \log \psi)$ | $O(m)$ | Parallelizable across points and trees |
| Total (with scoring) | $O(t \cdot \psi \cdot \log \psi + t \cdot m \cdot \log \psi)$ | $O(t \cdot \psi + m)$ | Dominated by scoring for large $m$ |
The Key Insight: Independence from n
Notice that the training complexity does not depend on n (the total dataset size). Only the subsample size $\psi$ matters. This is because each tree is grown entirely on its own random subsample of $\psi$ points: the full dataset is touched only to draw those subsamples, never to evaluate splits.
This lets Isolation Forest scale to datasets of practically any size: training cost stays essentially flat as $n$ grows (you simply stream the data and draw random subsamples), while scoring cost grows only linearly in the number of points you score.
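A rough way to observe this independence (timings are machine-dependent and purely illustrative): keep max_samples fixed while the dataset grows. Most of any remaining increase in wall-clock time comes from generating and subsampling the data, not from building trees.

```python
import time
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
for n in (10_000, 100_000, 1_000_000):
    X = rng.normal(size=(n, 8))                  # synthetic data just for timing
    t0 = time.perf_counter()
    IsolationForest(n_estimators=100, max_samples=256, n_jobs=-1, random_state=0).fit(X)
    print(f"n={n:>9,}: fit took {time.perf_counter() - t0:.2f}s")
```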
Comparison with Other Methods:
| Algorithm | Training Time | Scoring Time | Memory |
|---|---|---|---|
| Isolation Forest | $O(t \cdot \psi \log \psi)$ | $O(t \cdot m \log \psi)$ | $O(t \cdot \psi)$ |
| k-NN Distance | $O(n \log n)$ (with indexing) | $O(k \cdot n)$ or $O(k \log n)$ | $O(n \cdot d)$ |
| Local Outlier Factor | $O(n^2)$ or $O(k \cdot n \log n)$ | $O(k \cdot n)$ | $O(n^2)$ or $O(n \cdot d)$ |
| One-Class SVM | $O(n^2)$ to $O(n^3)$ | $O(n_{sv} \cdot d)$ | $O(n_{sv} \cdot d)$ |
| Kernel Density Estimation | $O(n \cdot d)$ | $O(n \cdot d)$ | $O(n \cdot d)$ |
Practical Performance:
With default parameters ($\psi = 256$, $t = 100$):

- The depth limit is $\lceil \log_2 256 \rceil = 8$, so no root-to-leaf path has more than 8 edges.
- Each tree has at most $2\psi - 1 = 511$ nodes, so the whole forest stores on the order of 50,000 nodes.
- Scoring a single point costs at most $t \times 8 = 800$ node traversals.

This means the fitted model is tiny, training time is essentially independent of the dataset size, and scoring cost grows only with the number of points you score.
Because training complexity is independent of n, Isolation Forest can handle streaming data and datasets too large to fit in memory. Simply sample from the stream to build trees, then score points on-the-fly. This makes it ideal for real-time anomaly detection in production systems.
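A minimal streaming sketch along those lines, assuming a reservoir_sample helper (written here for illustration) and a simulated event stream: keep a uniform random sample of the stream for training, then score events as they arrive.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def reservoir_sample(stream, k, seed=0):
    """Classic reservoir sampling: keep a uniform random sample of k rows from a stream."""
    rng = np.random.default_rng(seed)
    reservoir = []
    for i, row in enumerate(stream):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.integers(0, i + 1)
            if j < k:
                reservoir[j] = row
    return np.array(reservoir)

# Simulated stream; in production this would be an iterator over incoming events.
rng = np.random.default_rng(1)
stream = (rng.normal(size=4) for _ in range(100_000))

X_buffer = reservoir_sample(stream, k=10_000)    # training buffer, not the full stream
clf = IsolationForest(n_estimators=100, max_samples=256, random_state=0).fit(X_buffer)

new_event = rng.normal(size=(1, 4))
print("anomaly score:", -clf.score_samples(new_event)[0])
```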
When implementing or using Isolation Forest, several practical considerations affect performance and results.
1. Handling Categorical Features:
Standard Isolation Forest uses axis-parallel splits with continuous thresholds. For categorical features:

- One-hot encode low-cardinality categoricals so each category becomes its own binary dimension that random splits can act on (see the sketch below).
- Integer/ordinal encoding also works mechanically, but it imposes an artificial ordering that the random thresholds will treat as meaningful.
- For very high cardinality, consider frequency encoding or grouping rare categories before encoding.
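A hedged sketch of the one-hot route using scikit-learn's ColumnTransformer; the column names and data below are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Invented mixed-type data: one numeric column, one categorical column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "latency_ms": rng.normal(100, 15, size=500),
    "region": rng.choice(["us-east", "us-west", "eu"], size=500),
})

# One-hot encode the categorical column; pass the numeric column through unchanged.
pre = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["region"])],
    remainder="passthrough",
)
model = make_pipeline(pre, IsolationForest(n_estimators=100, max_samples=256, random_state=0))
model.fit(df)

scores = -model.score_samples(df)   # higher = more anomalous
print(scores[:5].round(3))
```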
2. Missing Values:
Options for handling missing values:

- Impute before fitting (per-feature median or mean is a common default), optionally adding missing-indicator columns so "was missing" stays visible to the splits (see the sketch below).
- Drop rows or columns with excessive missingness when they carry little signal.
- Note that the basic algorithm has no native missing-value handling; some library implementations simply reject NaNs.
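A minimal imputation pipeline, assuming per-feature median imputation plus missing-indicator columns is acceptable for the data at hand.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Invented data with ~5% missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
X[rng.random(X.shape) < 0.05] = np.nan

model = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),   # impute + flag missingness
    IsolationForest(n_estimators=100, max_samples=256, random_state=0),
)
model.fit(X)
scores = -model.score_samples(X)   # higher = more anomalous
print(scores[:5].round(3))
```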
```python
import numpy as np
from sklearn.ensemble import IsolationForest


def fit_isolation_forest(X, contamination=0.01):
    """
    Fit Isolation Forest with recommended settings.

    Args:
        X: Data matrix (n_samples, n_features)
        contamination: Expected proportion of anomalies

    Returns:
        Fitted IsolationForest model
    """
    # Note: Scaling is OPTIONAL for Isolation Forest
    # Unlike distance-based methods, IF is scale-invariant
    # within each dimension (splits adapt to data range)

    clf = IsolationForest(
        n_estimators=100,             # Number of trees
        max_samples='auto',           # 256 or n_samples if smaller
        contamination=contamination,  # Affects threshold, not scores
        max_features=1.0,             # Use all features
        bootstrap=False,              # Subsample without replacement
        n_jobs=-1,                    # Use all CPU cores
        random_state=42,              # Reproducibility
        warm_start=False,             # Don't incrementally add trees
    )
    clf.fit(X)
    return clf


def detect_anomalies(clf, X, method='score'):
    """
    Detect anomalies using fitted model.

    Args:
        clf: Fitted IsolationForest
        X: Data to score (n_samples, n_features)
        method: 'score' returns anomaly scores, 'predict' returns binary

    Returns:
        Scores or predictions
    """
    if method == 'score':
        # score_samples returns negative scores (sklearn convention)
        # More negative = more anomalous
        # Negate to get intuitive scores (higher = more anomalous)
        return -clf.score_samples(X)
    else:
        # predict returns -1 for anomalies, 1 for normal
        predictions = clf.predict(X)
        # Convert to 0/1 (0=normal, 1=anomaly)
        return (predictions == -1).astype(int)


# Example usage
np.random.seed(42)
X_train = np.random.randn(1000, 10)  # 1000 samples, 10 features

# Fit model
model = fit_isolation_forest(X_train, contamination=0.01)

# Score new data
X_test = np.vstack([
    np.random.randn(100, 10),     # Normal points
    5 + np.random.randn(5, 10)    # Anomalies
])

scores = detect_anomalies(model, X_test, method='score')
print(f"Normal scores range: {scores[:100].min():.3f} to {scores[:100].max():.3f}")
print(f"Anomaly scores range: {scores[100:].min():.3f} to {scores[100:].max():.3f}")
```

3. scikit-learn Conventions:
Be aware of scikit-learn's conventions:
- score_samples() returns negative values (more negative = more anomalous), following the sklearn pattern where higher means better
- predict() returns -1 for anomalies and +1 for normal points (not the intuitive 1 = anomaly convention)
- decision_function() is not an alias for score_samples(): it returns score_samples(X) shifted by the fitted offset_, so negative values correspond to predicted anomalies at the chosen contamination level

4. Feature Preprocessing:
Unlike distance-based methods, Isolation Forest is relatively insensitive to feature scaling because:

- Split thresholds are drawn relative to each feature's own observed range within a node, so rescaling a feature rescales its thresholds along with it.
- Features are never combined into a single distance, so a feature cannot dominate simply by having larger units.
However, extreme outliers in feature values can affect split selection. Consider winsorizing or clipping extreme values if they're known errors.
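A small illustration of the clipping idea; the percentile bounds are an assumption to tune per dataset, and clipping should be skipped if the extremes are exactly the anomalies you want to find.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
X[0, 0] = 1e9                         # a corrupted reading (known measurement error)

# Clip each feature to its 0.1st-99.9th percentile range before fitting.
lo, hi = np.percentile(X, [0.1, 99.9], axis=0)
X_clipped = np.clip(X, lo, hi)
print(X[0, 0], "->", round(X_clipped[0, 0], 3))
```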
Random partitioning means results vary between runs. Always set random_state for reproducible results. When comparing models, use the same random_state. For production, consider averaging scores from multiple random seeds.
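A sketch of the multi-seed averaging suggested above, with invented data and an arbitrary list of seeds.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))

seeds = [0, 1, 2, 3, 4]
all_scores = np.zeros((len(seeds), len(X)))
for i, seed in enumerate(seeds):
    clf = IsolationForest(n_estimators=100, max_samples=256, random_state=seed).fit(X)
    all_scores[i] = -clf.score_samples(X)        # higher = more anomalous

avg_scores = all_scores.mean(axis=0)             # more stable than any single run
print("run-to-run score std (mean over points):", round(all_scores.std(axis=0).mean(), 4))
print("top-5 most anomalous indices:", np.argsort(avg_scores)[-5:])
```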
We've explored how Isolation Forest implements the isolation principle through random recursive partitioning. The algorithm's simplicity—random attribute selection, random split points, no optimization—is precisely what makes it effective and efficient.
What's Next:
With the algorithmic foundation established, the next page dives deep into path length scoring—examining how average path lengths are converted to meaningful anomaly scores, the mathematical properties of the scoring function, and practical considerations for threshold selection and score interpretation.
You now understand how Isolation Forest constructs trees through random partitioning and computes path lengths. The key insight: randomness is a feature, not a bug—it creates unbiased measurements of each point's true isolability.