Consider a person who is 6'8" tall and weighs 130 pounds. Their height is unusual but not extreme—professional basketball players routinely exceed this. Their weight is on the low side but not alarming—some people are naturally thin. Yet the combination of extreme height with low weight is highly anomalous—it suggests a medical condition, data entry error, or fraud.
This illustrates a fundamental limitation of univariate methods: anomalies often hide in the relationships between variables, not in individual variables. The Z-score, IQR, and Grubbs' tests we've studied can miss these multivariate outliers entirely.
Multivariate anomaly detection addresses this by considering the joint distribution of all variables simultaneously, accounting for correlations and dependencies that define what constitutes 'normal' in high-dimensional space.
By the end of this page, you will understand why univariate methods fail for correlated data, the mathematical foundations of the Mahalanobis distance, how to use chi-squared distributions for thresholding, challenges of covariance estimation, robust alternatives like the Minimum Covariance Determinant, and practical implementation strategies for multivariate outlier detection.
Let's formalize why applying univariate tests to each variable independently can fail dramatically.
Consider a 2D dataset where variables $X$ and $Y$ are positively correlated (when $X$ is high, $Y$ tends to be high). The data forms an elliptical cloud in 2D space.
A point with moderately high $X$ and moderately low $Y$ might be unremarkable in each coordinate separately, yet severely anomalous jointly: it sits off the main axis of the elliptical cloud, violating the positive correlation that the rest of the data obeys.
Mathematically: If $X$ and $Y$ are marginally normal, applying Z-scores to each gives:

$$z_X = \frac{x - \mu_X}{\sigma_X}, \qquad z_Y = \frac{y - \mu_Y}{\sigma_Y}$$
Even if $|z_X| < 3$ and $|z_Y| < 3$, the point $(x, y)$ can be highly improbable under the joint distribution.
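As a quick numeric sketch (the unit variances, the correlation of 0.9, and the point itself are illustrative values, not from a real dataset):

```python
import numpy as np
from scipy import stats

# Two standardized variables with correlation 0.9
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])

# A point only 2 sigma out in each coordinate, but against the correlation
x = np.array([2.0, -2.0])

z_scores = (x - mu) / np.sqrt(np.diag(Sigma))
d2 = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)  # squared Mahalanobis distance

print(z_scores)                      # [ 2. -2.]: passes any |z| < 3 test
print(round(d2, 2))                  # ≈ 80: enormous under the joint distribution
print(stats.chi2.ppf(0.999, df=2))   # ≈ 13.82: the alpha = 0.001 threshold
```

Each coordinate looks mild in isolation, yet the squared Mahalanobis distance is roughly six times the $\alpha = 0.001$ threshold.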
Univariate thresholds define a rectangle (in 2D) or hypercube (in higher dimensions) aligned with the coordinate axes.
Joint probability contours for correlated normal data form an ellipse (in 2D) or ellipsoid (in higher dimensions) that can be rotated and stretched.
The rectangle and ellipse can differ dramatically: points near a corner of the rectangle pass every univariate test yet lie far outside the ellipse (missed outliers), while points just beyond the rectangle along the correlation direction may still sit inside the ellipse (false alarms).
As dimensionality increases, this discrepancy worsens exponentially.
In high dimensions, most of the probability mass in a Gaussian distribution lies in a thin shell at a specific distance from the center—not at the center itself. Univariate reasoning fails to capture this phenomenon. Points that appear 'central' in each dimension can be extremely unlikely in combination.
If you apply univariate outlier detection to each feature and take the union, you will miss anomalies that violate correlation structures. If you take the intersection (flag only if outlier in ALL dimensions), you will miss almost everything in high dimensions. Neither approach captures multivariate structure.
The Mahalanobis distance, introduced by Prasanta Chandra Mahalanobis in 1936, is a distance measure that accounts for correlations between variables. It is the fundamental tool for multivariate anomaly detection.
For a point $\mathbf{x} \in \mathbb{R}^p$ and a distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, the Mahalanobis distance is:
$$D_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})}$$
Often, we work with the squared Mahalanobis distance:
$$D_M^2(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$$
Let's unpack this expression:
$(\mathbf{x} - \boldsymbol{\mu})$: Centers the point relative to the mean (just like Z-score)
$\boldsymbol{\Sigma}^{-1}$: The inverse covariance matrix (precision matrix) standardizes and decorrelates
The quadratic form: Computes a weighted sum of squared deviations, where weights depend on variances and covariances
Case 1: Uncorrelated Variables
If variables are uncorrelated, $\boldsymbol{\Sigma}$ is diagonal: $$\boldsymbol{\Sigma} = \text{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2)$$
The Mahalanobis distance simplifies to: $$D_M^2(\mathbf{x}) = \sum_{i=1}^{p} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2 = \sum_{i=1}^{p} z_i^2$$
This is the sum of squared Z-scores—the Euclidean distance in standardized space.
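This identity is easy to verify numerically; a minimal sketch on synthetic uncorrelated data:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.array([2.0, 5.0, 0.5])     # per-variable standard deviations
mu = np.array([10.0, -3.0, 0.0])
Sigma = np.diag(sigma ** 2)           # diagonal covariance: uncorrelated case

x = rng.normal(mu, sigma)             # one random observation

d2 = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
sum_z2 = np.sum(((x - mu) / sigma) ** 2)   # sum of squared Z-scores

print(np.isclose(d2, sum_z2))  # True
```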
Case 2: Identity Covariance
If $\boldsymbol{\Sigma} = \mathbf{I}$ (standardized, uncorrelated data), the Mahalanobis distance equals the standard Euclidean distance from the mean: $$D_M(\mathbf{x}) = \|\mathbf{x} - \boldsymbol{\mu}\|_2$$
The covariance matrix $\boldsymbol{\Sigma}$ can be decomposed as $\boldsymbol{\Sigma} = \mathbf{V} \mathbf{D} \mathbf{V}^T$, where $\mathbf{V}$ holds the eigenvectors (the principal axes of the data ellipsoid) and $\mathbf{D} = \text{diag}(\lambda_1, \ldots, \lambda_p)$ holds the eigenvalues (the variances along those axes).
Mahalanobis distance effectively rotates the data onto the eigenvector axes of $\boldsymbol{\Sigma}$ and rescales each axis by its standard deviation $\sqrt{\lambda_i}$ before measuring ordinary Euclidean distance.
This transforms the elliptical probability contours into circular ones, making simple distance comparison meaningful.
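This "whitening" interpretation can be checked directly: multiplying centered data by $\boldsymbol{\Sigma}^{-1/2}$ (built from the eigendecomposition) turns Mahalanobis distance into plain Euclidean distance. A sketch with an illustrative covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[4.0, 3.0],
                  [3.0, 4.0]])        # rotated, stretched ellipse
mu = np.zeros(2)
X = rng.multivariate_normal(mu, Sigma, size=500)

# Whitening matrix Sigma^{-1/2}: rotate onto eigenvectors, rescale, rotate back
eigvals, V = np.linalg.eigh(Sigma)
W = V @ np.diag(eigvals ** -0.5) @ V.T
X_white = (X - mu) @ W.T

# Euclidean distance in whitened space == Mahalanobis distance in original space
d_euclid = np.linalg.norm(X_white, axis=1)
P = np.linalg.inv(Sigma)
d_mahal = np.sqrt(np.sum((X - mu) @ P * (X - mu), axis=1))
print(np.allclose(d_euclid, d_mahal))  # True
```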
Mahalanobis distance measures how many 'standard deviations' a point is from the center, where 'standard deviation' is measured along the natural axes of the data ellipsoid. It's the multivariate generalization of the Z-score.
Just as Z-scores have probabilistic interpretation under normality, Mahalanobis distances have a precise distributional characterization.
If $\mathbf{x} \sim \mathcal{N}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ (multivariate normal), then the squared Mahalanobis distance follows a chi-squared distribution with $p$ degrees of freedom:
$$D_M^2(\mathbf{x}) \sim \chi^2_p$$
This result is fundamental—it allows us to set thresholds with known false positive rates.
To achieve a false positive rate of $\alpha$, set the threshold at the $(1-\alpha)$ quantile of the chi-squared distribution:
$$\tau = \chi^2_{p, 1-\alpha}$$
Points with $D_M^2(\mathbf{x}) > \tau$ are flagged as outliers.
| Dimensions (p) | α = 0.05 | α = 0.01 | α = 0.001 |
|---|---|---|---|
| 2 | 5.99 | 9.21 | 13.82 |
| 3 | 7.81 | 11.34 | 16.27 |
| 5 | 11.07 | 15.09 | 20.52 |
| 10 | 18.31 | 23.21 | 29.59 |
| 20 | 31.41 | 37.57 | 45.31 |
| 50 | 67.50 | 76.15 | 86.66 |
| 100 | 124.34 | 135.81 | 149.45 |
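The table can be reproduced (and extended to other $p$ or $\alpha$) with `scipy.stats.chi2.ppf`:

```python
from scipy import stats

# tau = chi-squared quantile at 1 - alpha with p degrees of freedom
for p in (2, 3, 5, 10, 20, 50, 100):
    row = [stats.chi2.ppf(1 - a, df=p) for a in (0.05, 0.01, 0.001)]
    print(p, [round(tau, 2) for tau in row])
# First row: 2 [5.99, 9.21, 13.82]
```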
Notice how the threshold increases with dimensionality. In 2D, $D_M^2 > 5.99$ is an outlier at α=0.05. In 100D, you need $D_M^2 > 124$ for the same significance.
Why? In high dimensions, even normal points are expected to be 'far' from the center in Mahalanobis distance. The chi-squared distribution with more degrees of freedom has a larger mean and wider spread.
Mean of $\chi^2_p$: $\mathbb{E}[\chi^2_p] = p$
Variance of $\chi^2_p$: $\text{Var}(\chi^2_p) = 2p$
So in 100 dimensions, the average squared Mahalanobis distance is 100, not near zero.
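Both moments are easy to confirm by simulation; for a standard normal in $p$ dimensions ($\boldsymbol{\mu} = \mathbf{0}$, $\boldsymbol{\Sigma} = \mathbf{I}$), $D_M^2$ is simply the squared norm:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 100
X = rng.standard_normal((20000, p))   # N(0, I): Mahalanobis reduces to Euclidean
d2 = np.sum(X ** 2, axis=1)           # squared Mahalanobis distances

print(round(d2.mean(), 1))   # ≈ 100  (mean of chi-squared with p dof is p)
print(round(d2.var(), 1))    # ≈ 200  (variance is 2p)
```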
The chi-squared distribution is exact only when the true mean and covariance are known. When estimated from sample data (as in practice), the distribution is better approximated by a scaled F-distribution or beta distribution. For small samples, this correction is important; for large samples (n >> p), the chi-squared approximation is adequate.
Input: Dataset $\mathbf{X} \in \mathbb{R}^{n \times p}$ (n observations, p variables), significance level $\alpha$
Output: Boolean array indicating outliers
1. Estimate mean vector: μ̂ = (1/n) Σᵢ xᵢ
2. Estimate covariance matrix: Σ̂ = (1/(n-1)) Σᵢ (xᵢ - μ̂)(xᵢ - μ̂)ᵀ
3. Compute precision matrix: Σ̂⁻¹ (with regularization if needed)
4. For each observation i:
D²ᵢ = (xᵢ - μ̂)ᵀ Σ̂⁻¹ (xᵢ - μ̂)
5. Compute threshold: τ = χ²_p,1-α
6. Flag outliers: D²ᵢ > τ
Computational Complexity:
```python
import numpy as np
from scipy import stats
from typing import NamedTuple
import warnings


class MahalanobisResult(NamedTuple):
    """Results from Mahalanobis distance-based outlier detection."""
    outlier_mask: np.ndarray
    distances_squared: np.ndarray
    threshold: float
    mean: np.ndarray
    covariance: np.ndarray


def mahalanobis_outlier_detection(
    X: np.ndarray,
    alpha: float = 0.05,
    regularization: float = 1e-6
) -> MahalanobisResult:
    """
    Detect outliers using Mahalanobis distance.

    Parameters
    ----------
    X : np.ndarray
        Data matrix of shape (n_samples, n_features)
    alpha : float
        Significance level for chi-squared threshold
    regularization : float
        Ridge regularization for covariance matrix (for stability)

    Returns
    -------
    MahalanobisResult : NamedTuple with detection results
    """
    n_samples, n_features = X.shape

    if n_samples <= n_features:
        warnings.warn(
            f"n_samples ({n_samples}) <= n_features ({n_features}). "
            "Covariance matrix will be singular. Consider PCA first."
        )

    # Estimate mean and covariance
    mean = np.mean(X, axis=0)
    covariance = np.cov(X, rowvar=False)

    # Add regularization for numerical stability
    covariance += regularization * np.eye(n_features)

    # Compute precision matrix (inverse covariance)
    try:
        precision = np.linalg.inv(covariance)
    except np.linalg.LinAlgError:
        # Use pseudo-inverse if singular
        precision = np.linalg.pinv(covariance)
        warnings.warn("Covariance matrix is singular, using pseudo-inverse")

    # Compute squared Mahalanobis distances (row-wise quadratic forms)
    centered = X - mean
    distances_squared = np.sum(centered @ precision * centered, axis=1)

    # Chi-squared threshold
    threshold = stats.chi2.ppf(1 - alpha, df=n_features)

    # Flag outliers
    outlier_mask = distances_squared > threshold

    return MahalanobisResult(
        outlier_mask=outlier_mask,
        distances_squared=distances_squared,
        threshold=threshold,
        mean=mean,
        covariance=covariance
    )


def plot_mahalanobis_2d(X: np.ndarray, result: MahalanobisResult, alpha: float = 0.05):
    """Visualize Mahalanobis outlier detection in 2D."""
    import matplotlib.pyplot as plt
    from matplotlib.patches import Ellipse

    fig, ax = plt.subplots(figsize=(10, 8))

    # Plot inliers and outliers
    inliers = X[~result.outlier_mask]
    outliers = X[result.outlier_mask]
    ax.scatter(inliers[:, 0], inliers[:, 1], c='blue', alpha=0.5, label='Inliers')
    ax.scatter(outliers[:, 0], outliers[:, 1], c='red', s=100, marker='x', label='Outliers')

    # Draw threshold ellipse from the eigendecomposition of the covariance
    eigenvalues, eigenvectors = np.linalg.eigh(result.covariance)
    angle = np.degrees(np.arctan2(eigenvectors[1, 0], eigenvectors[0, 0]))

    # Ellipse semi-axes are sqrt(eigenvalue * threshold)
    width = 2 * np.sqrt(eigenvalues[0] * result.threshold)
    height = 2 * np.sqrt(eigenvalues[1] * result.threshold)

    ellipse = Ellipse(
        result.mean, width, height, angle=angle,
        fill=False, edgecolor='green', linewidth=2,
        label=f'Threshold (α={alpha})'
    )
    ax.add_patch(ellipse)

    ax.set_xlabel('X₁')
    ax.set_ylabel('X₂')
    ax.legend()
    ax.set_title('Mahalanobis Distance Outlier Detection')
    plt.axis('equal')
    return fig


# Example usage
np.random.seed(42)

# Generate correlated normal data
mean = [50, 30]
cov = [[100, 80], [80, 100]]  # Strongly correlated
X_normal = np.random.multivariate_normal(mean, cov, 200)

# Add some outliers (violating correlation structure)
outliers = np.array([
    [80, 0],    # High X1, low X2 (violates positive correlation)
    [20, 60],   # Low X1, high X2 (violates positive correlation)
    [100, 90],  # Extreme in both (far from center)
])
X = np.vstack([X_normal, outliers])

# Detect outliers
result = mahalanobis_outlier_detection(X, alpha=0.01)

print(f"Dataset: {X.shape[0]} observations, {X.shape[1]} features")
print(f"Chi-squared threshold (α=0.01, df=2): {result.threshold:.2f}")
print(f"Outliers detected: {np.sum(result.outlier_mask)}")
print(f"Outlier Mahalanobis² distances: {result.distances_squared[result.outlier_mask]}")
```

The masking problem we encountered with univariate methods becomes even more severe in the multivariate setting.
When computing the sample covariance matrix, outliers can shift the estimated mean toward themselves, inflate the estimated variances, and rotate or weaken the estimated correlations.
The combined effect: The estimated ellipsoid expands and distorts to 'accommodate' the outliers, making their Mahalanobis distances appear smaller than they should be.
Consider a clean dataset forming a tight ellipse. Add a single extreme outlier: the estimated mean shifts toward it, the estimated ellipse stretches in its direction, and the outlier's own Mahalanobis distance shrinks, sometimes enough to fall below the detection threshold.
This is why directly applying Mahalanobis distance with sample estimates can miss exactly the outliers we're trying to detect.
The sample mean has breakdown point $1/n$: a single observation, moved far enough, can drag the estimate arbitrarily. The sample covariance matrix is equally fragile, and in $p$ dimensions a single outlier can distort any of its $p(p+1)/2$ unique parameters.
In high dimensions, the covariance matrix has O(p²) parameters estimated from n observations. As p approaches n, estimation becomes extremely unstable even without outliers. Add outliers, and the estimates can become meaningless. This is why robust methods are critical.
To address the masking problem, we need robust estimators of location and scatter that are not influenced by outliers.
The MCD estimator, introduced by Rousseeuw (1984), finds the subset of observations that gives the smallest determinant of the sample covariance matrix.
Definition:
Given $n$ observations in $\mathbb{R}^p$, find the subset $H$ of size $h$ (where $h > (n+p+1)/2$) that minimizes:
$$\det(\hat{\boldsymbol{\Sigma}}_H)$$
Where $\hat{\boldsymbol{\Sigma}}_H$ is the sample covariance of the subset $H$.
Intuition: The subset with smallest covariance determinant is the 'tightest' cluster—the core of non-outlying observations. Outliers, by definition, lie far from this core and would inflate the determinant if included.
High breakdown point: If $h \approx n/2$, the MCD can resist up to ~50% contamination.
Affine equivariance: The estimator behaves correctly under linear transformations of the data.
Statistical efficiency: For normal data, MCD is less efficient than classical estimates, but this is the price of robustness.
```python
import numpy as np
from scipy import stats
from sklearn.covariance import EmpiricalCovariance, MinCovDet
from typing import NamedTuple


class RobustMahalanobisResult(NamedTuple):
    """Results from robust Mahalanobis distance outlier detection."""
    outlier_mask: np.ndarray
    distances_squared: np.ndarray
    threshold: float
    robust_mean: np.ndarray
    robust_covariance: np.ndarray


def robust_mahalanobis_detection(
    X: np.ndarray,
    alpha: float = 0.05,
    support_fraction: float = None
) -> RobustMahalanobisResult:
    """
    Detect outliers using robust Mahalanobis distance based on MCD.

    Parameters
    ----------
    X : np.ndarray
        Data matrix of shape (n_samples, n_features)
    alpha : float
        Significance level for chi-squared threshold
    support_fraction : float, optional
        Proportion of observations used for MCD (default: (n + p + 1) / (2n)).
        Lower values increase robustness but decrease efficiency.

    Returns
    -------
    RobustMahalanobisResult : NamedTuple with detection results
    """
    n_samples, n_features = X.shape

    # Fit the MCD estimator (FAST-MCD under the hood)
    mcd = MinCovDet(support_fraction=support_fraction)
    mcd.fit(X)

    # Robust location and scatter estimates
    robust_mean = mcd.location_
    robust_covariance = mcd.covariance_

    # Squared Mahalanobis distances from the robust estimates
    distances_squared = mcd.mahalanobis(X)

    # Chi-squared threshold
    # Note: the distribution is approximately chi-squared for clean data
    threshold = stats.chi2.ppf(1 - alpha, df=n_features)

    # Flag outliers
    outlier_mask = distances_squared > threshold

    return RobustMahalanobisResult(
        outlier_mask=outlier_mask,
        distances_squared=distances_squared,
        threshold=threshold,
        robust_mean=robust_mean,
        robust_covariance=robust_covariance
    )


def compare_classical_vs_robust(X: np.ndarray, alpha: float = 0.01):
    """Compare classical and robust Mahalanobis detection."""
    n_samples, n_features = X.shape

    # Classical (non-robust) estimation
    emp_cov = EmpiricalCovariance().fit(X)
    classical_distances = emp_cov.mahalanobis(X)

    # Robust MCD estimation
    mcd = MinCovDet().fit(X)
    robust_distances = mcd.mahalanobis(X)

    # Threshold
    threshold = stats.chi2.ppf(1 - alpha, df=n_features)
    classical_outliers = classical_distances > threshold
    robust_outliers = robust_distances > threshold

    print("=== Classical vs. Robust Mahalanobis Comparison ===")
    print(f"Chi-squared threshold (α={alpha}, df={n_features}): {threshold:.2f}")
    print("Classical method:")
    print(f"  Outliers detected: {np.sum(classical_outliers)}")
    print(f"  Max distance²: {np.max(classical_distances):.2f}")
    print("Robust MCD method:")
    print(f"  Outliers detected: {np.sum(robust_outliers)}")
    print(f"  Max distance²: {np.max(robust_distances):.2f}")

    # Points detected by robust but not classical (masking effect)
    masked = robust_outliers & ~classical_outliers
    if np.any(masked):
        print(f"⚠️ Masking detected: {np.sum(masked)} outliers missed by classical method")

    return classical_outliers, robust_outliers


# Example: demonstrating the masking effect
np.random.seed(42)

# Clean correlated data
mean = [0, 0]
cov = [[1, 0.8], [0.8, 1]]
X_clean = np.random.multivariate_normal(mean, cov, 100)

# Add outliers that will cause masking:
# multiple outliers in the same direction inflate the covariance in that direction
outliers = np.array([
    [5, -3],
    [6, -4],
    [7, -3.5],
    [4, 5],
    [5.5, 4],
])
X_contaminated = np.vstack([X_clean, outliers])

# Compare methods
classical_mask, robust_mask = compare_classical_vs_robust(X_contaminated, alpha=0.01)
```

Computing the exact MCD is NP-hard. The FAST-MCD algorithm (Rousseeuw & Van Driessen, 1999) provides an efficient approximation used in scikit-learn. It uses iterative concentration steps starting from multiple initial subsets.
When the number of features $p$ approaches or exceeds the number of observations $n$, classical covariance estimation breaks down entirely.
The sample covariance matrix requires $n > p$ to be non-singular (invertible). When $n \leq p$, the matrix is rank-deficient and cannot be inverted directly.
Even when $n > p$, if $n$ is not substantially larger than $p$, the estimates are highly unstable. A common rule of thumb is $n \geq 5p$ to $10p$ for reliable estimation.
1. Regularization
Add a regularization term to the covariance matrix to ensure invertibility: $$\hat{\boldsymbol{\Sigma}}_{\text{reg}} = \hat{\boldsymbol{\Sigma}} + \lambda \mathbf{I}$$
Where $\lambda > 0$ is a regularization parameter. This shrinks eigenvalues toward a common value.
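A small sketch (with arbitrary synthetic dimensions) showing the failure and the fix:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 10, 20                      # underdetermined: fewer samples than features
X = rng.standard_normal((n, p))

S = np.cov(X, rowvar=False)        # rank at most n - 1 = 9: singular, not invertible
print(np.linalg.matrix_rank(S))    # 9

lam = 1e-2
S_reg = S + lam * np.eye(p)        # ridge term lifts every eigenvalue by lam
print(np.linalg.eigvalsh(S_reg).min() > 0)  # True: now positive definite
precision = np.linalg.inv(S_reg)   # inversion succeeds
```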
2. Shrinkage Estimators
Ledoit-Wolf shrinkage interpolates between the sample covariance and a structured target (e.g., diagonal or scaled identity): $$\hat{\boldsymbol{\Sigma}}_{\text{shrunk}} = (1 - \alpha) \hat{\boldsymbol{\Sigma}} + \alpha \mathbf{T}$$
Where $\alpha$ is optimally chosen to minimize estimation error.
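In scikit-learn this is available as `LedoitWolf`, which picks the shrinkage intensity from the data; a sketch on synthetic data (the sample sizes here are illustrative):

```python
import numpy as np
from scipy import stats
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(4)
n, p = 60, 20                     # n/p = 3: too small for a stable sample covariance
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

lw = LedoitWolf().fit(X)
print(lw.shrinkage_)              # data-driven shrinkage intensity in [0, 1]

# Mahalanobis distances from the shrunk covariance, thresholded as before
d2 = lw.mahalanobis(X)
tau = stats.chi2.ppf(0.99, df=p)
print(np.sum(d2 > tau))
```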
3. Dimensionality Reduction First
Apply PCA to reduce dimensions to $k \ll n$, then perform Mahalanobis-based detection in the reduced space. The first $k$ principal components often capture the main structure, though anomalies can also hide in the minor components that PCA discards.
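A sketch of the PCA-first strategy on made-up data lying near a 5-dimensional subspace of a 200-dimensional space (all sizes are illustrative):

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
n, p, k = 80, 200, 5                 # n << p: raw covariance would be singular
latent = rng.standard_normal((n, k))
A = rng.standard_normal((k, p))
X = latent @ A + 0.05 * rng.standard_normal((n, p))  # data near a k-dim subspace

# Reduce to k dimensions, then run Mahalanobis detection where n >> k
Z = PCA(n_components=k).fit_transform(X)
mu = Z.mean(axis=0)
precision = np.linalg.inv(np.cov(Z, rowvar=False))
d2 = np.sum((Z - mu) @ precision * (Z - mu), axis=1)

outliers = d2 > stats.chi2.ppf(0.999, df=k)
print(np.sum(outliers))              # few or none: this synthetic data is clean
```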
| Regime | n/p Ratio | Recommended Approach |
|---|---|---|
| Classical | n/p > 10 | Standard sample covariance |
| Moderate | 5 < n/p ≤ 10 | Regularization or shrinkage |
| Challenging | 2 < n/p ≤ 5 | Strong regularization or MCD with care |
| High-dimensional | 1 < n/p ≤ 2 | Dimensionality reduction + regularization |
| Underdetermined | n/p ≤ 1 | PCA first, or diagonal covariance only |
Real-world high-dimensional data often lies near a lower-dimensional manifold. PCA can identify this intrinsic dimensionality, allowing effective Mahalanobis-based detection in the reduced space where covariance estimation is stable.
We've covered the core statistical methods for anomaly detection: univariate Z-scores, robust IQR, formal Grubbs' testing, and multivariate Mahalanobis distance. The final page of this module synthesizes everything by examining the Assumptions and Limitations of statistical approaches as a whole—when they work, when they fail, and how to know which method to apply in practice.
You now understand why multivariate structure matters for outlier detection, the mathematical foundations of Mahalanobis distance, chi-squared thresholds, the masking problem, and robust alternatives like MCD. This knowledge equips you to detect anomalies that hide in the relationships between variables.