The Z-score method, despite its elegance, suffers from a critical flaw: its reliance on the mean and standard deviation—statistics that are notoriously sensitive to the very outliers we're trying to detect. When John Tukey developed the Interquartile Range (IQR) method in his seminal 1977 work on exploratory data analysis, he sought something more robust: a technique grounded in order statistics that would remain reliable even when data was contaminated.
The IQR method embodies a fundamental principle in robust statistics: replace non-robust estimators (mean, standard deviation) with robust alternatives (median, interquartile range). This single conceptual shift transforms outlier detection from a fragile procedure into one that can withstand substantial data contamination.
By the end of this page, you will understand quartiles and their statistical properties, how the IQR captures spread robustly, the construction and theory behind Tukey fences, how to select appropriate multipliers for different scenarios, and when the IQR method excels or fails compared to parametric alternatives.
The IQR method is built on order statistics—the values obtained by sorting a dataset. Understanding this foundation is essential for grasping why the method achieves robustness.
Given observations $\{x_1, x_2, \ldots, x_n\}$, the order statistics are the sorted values:
$$x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$$
Where $x_{(k)}$ denotes the $k$-th smallest value. These provide a natural description of the data's distribution without parametric assumptions.
The $p$-th quantile (or $100p$-th percentile) is the value below which a proportion $p$ of the data falls. For continuous distributions:
$$Q_p = F^{-1}(p)$$
Where $F^{-1}$ is the inverse cumulative distribution function.
For sample data, various interpolation methods exist. The most common, linear interpolation, estimates quantile $p$ as:
$$\hat{Q}_p = x_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor)\left(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\right)$$
Where $h = (n-1)p + 1$.
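For concreteness, here is a minimal sketch (the dataset is made up for illustration) showing that the hand-computed interpolation rule matches NumPy's default 'linear' percentile method:

```python
import numpy as np

# Made-up dataset for illustration
data = np.array([3, 7, 8, 5, 12, 14, 21, 13, 18])
x = np.sort(data)              # order statistics x_(1) <= ... <= x_(n)
n, p = len(x), 0.25

# Manual linear interpolation with h = (n - 1)p + 1 (1-indexed ranks)
h = (n - 1) * p + 1
k = int(np.floor(h))
q1_manual = x[k - 1] + (h - k) * (x[k] - x[k - 1])

# NumPy's default 'linear' method implements the same rule
q1_numpy = np.percentile(data, 25)
print(q1_manual, q1_numpy)     # 7.0 7.0
```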
Quartiles divide sorted data into four equal parts:
- $Q_1$ (first quartile): the 25th percentile; one quarter of the data falls below it.
- $Q_2$ (second quartile): the 50th percentile, which is the median.
- $Q_3$ (third quartile): the 75th percentile; three quarters of the data fall below it.
The Interquartile Range (IQR) is simply:
$$\text{IQR} = Q_3 - Q_1$$
This measures the spread of the middle 50% of data—the range containing the central bulk, excluding the tails.
The key insight is that Q1, Q3, and the IQR depend only on the positions of certain data points in the sorted order, not on the actual values at the extremes: replace the maximum with a value a million times larger and Q1, Q3, and the IQR are unchanged.
This property—called breakdown resistance—is precisely what the mean and standard deviation lack.
The median has a breakdown point of 50%—you must corrupt half the data before the median becomes arbitrarily wrong. The IQR has a breakdown point of 25%—corrupt more than a quarter, and the IQR can fail. Compare this to 0% for the mean and standard deviation. This quantifies exactly why order statistics are more robust.
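A quick sketch makes the contrast vivid; the dataset and the 5% contamination level here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(50, 10, 1000)

# Corrupt 5% of the observations with an absurd value
corrupted = clean.copy()
corrupted[:50] = 1e6

for label, d in [("clean", clean), ("5% corrupted", corrupted)]:
    q1, q3 = np.percentile(d, [25, 75])
    print(f"{label:>13}: mean={d.mean():12.1f} std={d.std():12.1f} "
          f"median={np.median(d):6.1f} IQR={q3 - q1:6.1f}")
# The mean and std are dragged far off by the contamination;
# the median and IQR barely move.
```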
John Tukey introduced a simple rule for identifying outliers using the IQR. The construction defines 'fences' beyond which observations are deemed unusual.
Inner Fences (Mild Outliers): $$\text{Lower Inner Fence} = Q_1 - 1.5 \times \text{IQR}$$ $$\text{Upper Inner Fence} = Q_3 + 1.5 \times \text{IQR}$$
Outer Fences (Extreme Outliers): $$\text{Lower Outer Fence} = Q_1 - 3.0 \times \text{IQR}$$ $$\text{Upper Outer Fence} = Q_3 + 3.0 \times \text{IQR}$$
Observations beyond inner fences are mild outliers. Observations beyond outer fences are extreme outliers.
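As a worked example, consider the following small, made-up dataset; the value 100 lands beyond the upper outer fence and is therefore an extreme outlier:

```python
import numpy as np

data = np.array([8, 10, 11, 12, 12, 13, 14, 15, 16, 100])  # 100 is suspect
q1, q3 = np.percentile(data, [25, 75])     # Q1 = 11.25, Q3 = 14.75
iqr = q3 - q1                              # IQR = 3.5

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # (6.0, 20.0): mild-outlier fences
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # (0.75, 25.25): extreme-outlier fences
print(f"inner fences: {inner}, outer fences: {outer}")
# 100 > 25.25, so it lies beyond the upper outer fence: an extreme outlier
```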
The IQR method is intimately connected to the box plot (box-and-whisker diagram): the box spans $Q_1$ to $Q_3$ with a line at the median, the whiskers extend to the most extreme observations still inside the inner fences, and points beyond the whiskers are drawn individually as potential outliers.
This visualization immediately reveals the data's center, spread, skewness, and potential outliers.
Tukey's choice of 1.5 and 3.0 as multipliers was deliberate, though somewhat arbitrary. The values were chosen to have sensible properties under normality:
Under the Normal distribution:
- $Q_1 \approx \mu - 0.6745\sigma$
- $Q_3 \approx \mu + 0.6745\sigma$
- $\text{IQR} = Q_3 - Q_1 \approx 1.349\sigma$
Therefore, the inner fence is approximately: $$Q_3 + 1.5 \times \text{IQR} \approx \mu + 0.6745\sigma + 1.5(1.349\sigma) \approx \mu + 2.698\sigma$$
This is close to the Z-score threshold of 3 standard deviations. The outer fence corresponds to approximately 4.7 standard deviations.
Probability beyond the upper inner fence (Normal): $$P(X > Q_3 + 1.5 \times \text{IQR}) \approx 0.35\%$$
Total probability outside both inner fences: $$P(\text{outlier}) \approx 0.7\%$$
So under normality, we expect roughly 7 flagged points per 1000. This is comparable to, though slightly more liberal than, the 3-sigma rule, which flags about 3 per 1000.
| Method | Threshold | Approx. σ-Equivalent | Expected Outliers per 1000 |
|---|---|---|---|
| IQR Inner Fence | 1.5 × IQR | ~2.7σ | ~7 |
| IQR Outer Fence | 3.0 × IQR | ~4.7σ | ~0.002 |
| Z-Score | \|z\| > 2.5 | 2.5σ | ~12 |
| Z-Score | \|z\| > 3.0 | 3.0σ | ~3 |
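The table's calibration figures can be checked directly. The following sketch (assuming SciPy is available) recomputes the σ-equivalents and expected flag rates under a standard normal:

```python
from scipy.stats import norm

# Under N(0, 1): Q1 = -0.6745, Q3 = +0.6745, IQR = 1.349
q1, q3 = norm.ppf(0.25), norm.ppf(0.75)
iqr = q3 - q1

for k in (1.5, 3.0):
    fence = q3 + k * iqr                  # sigma-equivalent of the upper fence
    per_1000 = 2 * norm.sf(fence) * 1000  # both tails, scaled per 1000 points
    print(f"k={k}: fence at {fence:.3f} sigma, ~{per_1000:.3f} flagged per 1000")
# k=1.5 -> ~2.698 sigma, ~7 per 1000; k=3.0 -> ~4.721 sigma, ~0.002 per 1000
```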
Input: Dataset $\{x_1, x_2, \ldots, x_n\}$, multiplier $k$ (default: 1.5)
Output: Set of outlier indices
1. Sort data to obtain order statistics
2. Compute Q1 (25th percentile)
3. Compute Q3 (75th percentile)
4. Compute IQR = Q3 - Q1
5. Compute fences:
lower_fence = Q1 - k × IQR
upper_fence = Q3 + k × IQR
6. For each observation:
If x < lower_fence OR x > upper_fence:
flag as outlier
7. Return flagged indices
Computational Complexity: sorting dominates at $O(n \log n)$; computing the fences and scanning for outliers adds $O(n)$. A selection algorithm (e.g., quickselect) can find the two quartiles directly in expected $O(n)$ time, avoiding the full sort.
Note: For very large datasets, approximate quantile algorithms (like t-digest) can compute approximate quartiles in $O(n)$ time.
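As one illustration of avoiding the full sort, the sketch below uses NumPy's `np.partition` (introselect) to place only the two needed order statistics, taking nearest-rank quartiles rather than interpolating; `fast_fences` is a hypothetical helper, not a standard API:

```python
import numpy as np

def fast_fences(data: np.ndarray, k: float = 1.5):
    """Fence computation without a full sort. np.partition (introselect)
    places each requested index in its sorted position in expected O(n)
    time; nearest-rank quartiles are used instead of interpolation."""
    n = len(data)
    i_q1, i_q3 = round(0.25 * (n - 1)), round(0.75 * (n - 1))
    part = np.partition(data, (i_q1, i_q3))
    q1, q3 = part[i_q1], part[i_q3]
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr
```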
```python
import numpy as np
from typing import Tuple, NamedTuple


class IQRResult(NamedTuple):
    """Results from IQR-based outlier detection."""
    outlier_mask: np.ndarray
    q1: float
    q3: float
    iqr: float
    lower_fence: float
    upper_fence: float


def iqr_outlier_detection(
    data: np.ndarray,
    multiplier: float = 1.5,
    method: str = 'linear'
) -> IQRResult:
    """
    Detect outliers using the IQR (Interquartile Range) method.

    Parameters
    ----------
    data : np.ndarray
        1D array of observations
    multiplier : float
        Multiplier for the IQR (default: 1.5 for inner fence)
        Use 3.0 for outer fence (extreme outliers)
    method : str
        Interpolation method for percentile calculation
        Options: 'linear', 'lower', 'higher', 'midpoint', 'nearest'

    Returns
    -------
    IQRResult : NamedTuple containing:
        - outlier_mask: Boolean array where True indicates an outlier
        - q1, q3: First and third quartiles
        - iqr: Interquartile range
        - lower_fence, upper_fence: Fence values
    """
    # Compute quartiles
    q1 = np.percentile(data, 25, method=method)
    q3 = np.percentile(data, 75, method=method)
    iqr = q3 - q1

    # Compute fences
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr

    # Identify outliers
    outlier_mask = (data < lower_fence) | (data > upper_fence)

    return IQRResult(
        outlier_mask=outlier_mask,
        q1=q1,
        q3=q3,
        iqr=iqr,
        lower_fence=lower_fence,
        upper_fence=upper_fence
    )


def adjusted_boxplot_fences(
    data: np.ndarray,
    multiplier: float = 1.5
) -> Tuple[float, float]:
    """
    Compute adjusted fences for skewed distributions using the
    medcouple-based adjustment (Hubert & Vandervieren, 2008).

    For skewed data, the standard IQR method can be too aggressive
    on one tail and too lenient on the other.
    """
    from scipy.stats import skew

    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1

    # Compute medcouple (robust skewness measure)
    # Simplified approximation using skewness
    mc = skew(data) * 0.1  # Rough approximation

    if mc >= 0:
        lower_fence = q1 - multiplier * np.exp(-4 * mc) * iqr
        upper_fence = q3 + multiplier * np.exp(3 * mc) * iqr
    else:
        lower_fence = q1 - multiplier * np.exp(-3 * mc) * iqr
        upper_fence = q3 + multiplier * np.exp(4 * mc) * iqr

    return lower_fence, upper_fence


# Example usage
np.random.seed(42)

# Generate normal data with outliers
normal_data = np.random.normal(50, 10, 1000)
outliers = np.array([5, 10, 95, 100, 150])  # Obvious outliers
data = np.concatenate([normal_data, outliers])

# Standard IQR detection
result = iqr_outlier_detection(data, multiplier=1.5)

print(f"Q1: {result.q1:.2f}, Q3: {result.q3:.2f}")
print(f"IQR: {result.iqr:.2f}")
print(f"Fences: [{result.lower_fence:.2f}, {result.upper_fence:.2f}]")
print(f"Outliers detected: {np.sum(result.outlier_mask)}")
print(f"Outlier values: {data[result.outlier_mask]}")
```

While Tukey's 1.5 multiplier is the default, different applications may warrant different choices. The multiplier controls the sensitivity-specificity tradeoff just as the threshold does for Z-scores.
Common choices for the multiplier:
- k = 1.5 (Standard/Inner Fence): Tukey's default; flags mild outliers.
- k = 2.0 (Moderate): a middle ground between the inner and outer fences.
- k = 3.0 (Outer Fence): flags only extreme outliers.
- k = 2.2 (Bowley/Excel): a slightly more conservative alternative to the 1.5 default.
| Scenario | Recommended k | Rationale |
|---|---|---|
| Exploratory analysis | 1.5 | Identify all potentially unusual points for investigation |
| Quality control | 2.0 - 2.5 | Balance between catching defects and over-rejection |
| Automated anomaly detection | 3.0 | Minimize false alarms in production systems |
| Skewed distributions | 1.5 with adjusted fences | Symmetric standard fences misfit asymmetric tails |
| Heavy-tailed distributions | 2.5 - 3.0 | More extreme values are 'normal' for heavy tails |
| Small samples (n < 30) | 2.0+ | More conservative to avoid overdetection |
When labeled anomaly data is available, treat k as a hyperparameter and optimize it using cross-validation. Plot precision and recall as a function of k to find the optimal operating point for your specific use case.
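A minimal sketch of that tuning loop, assuming scikit-learn is available and that `labels` marks known anomalies with 1; `sweep_multiplier` is an illustrative name, not a library function:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def sweep_multiplier(data, labels, ks=np.arange(1.0, 4.25, 0.25)):
    """Report precision/recall of IQR flagging against known labels
    (1 = anomaly) for a grid of multipliers k."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    for k in ks:
        pred = (data < q1 - k * iqr) | (data > q3 + k * iqr)
        p = precision_score(labels, pred, zero_division=0)
        r = recall_score(labels, pred, zero_division=0)
        print(f"k={k:.2f}  precision={p:.2f}  recall={r:.2f}")
```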
Understanding when to use IQR versus Z-score methods requires comparing their fundamental properties:
| Property | Z-Score | IQR |
|---|---|---|
| Breakdown Point | 0% | 25% |
| Masking Resistance | Poor | Good |
| Swamping Resistance | Poor | Good |
| Sensitivity to Outliers | High | Low |
The IQR method can tolerate up to 25% contamination before its estimates become unreliable. The Z-score method can be arbitrarily corrupted by a single extreme observation.
Use Z-Score When:
- The data is approximately normal and believed to be largely free of contamination.
- You want a probabilistic interpretation of how extreme each point is.
- The sample is large enough for the mean and standard deviation to be stable.

Use IQR When:
- The distribution is unknown, non-normal, or skewed.
- The data may already contain outliers that would corrupt the mean and standard deviation.
- You want a simple, assumption-light default for initial screening.
In practice, many data scientists use the IQR method as their default for initial outlier screening because its assumptions are weaker. If you know nothing about your data's distribution, IQR is the safer choice. You can always refine with parametric methods after understanding your data better.
Real-world data often presents challenges that require modifications to the basic IQR method.
When more than 50% of observations share the same value, Q1 = Q3, and IQR = 0. This makes the fence computation degenerate (all non-median values become outliers).
Solutions:
- Fall back to a different spread measure, such as the MAD or the mean absolute deviation (the MAD code later on this page uses exactly this fallback).
- Widen the percentile span, e.g., build fences on the 10th/90th percentiles instead of the quartiles, as sketched below.
- Apply domain-specific rules when the data is essentially constant.
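One illustrative fallback strategy, with `robust_fences` as a hypothetical helper:

```python
import numpy as np

def robust_fences(data: np.ndarray, k: float = 1.5):
    """Fences with a fallback for the degenerate IQR = 0 case:
    widen to the 10th/90th percentile span; if even that span is
    zero, the data is essentially constant and nothing is flagged."""
    q1, q3 = np.percentile(data, [25, 75])
    spread = q3 - q1
    if spread == 0:
        p10, p90 = np.percentile(data, [10, 90])
        spread = p90 - p10
        if spread == 0:
            return None  # no sensible fences for (near-)constant data
    return q1 - k * spread, q3 + k * spread
```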
Standard fences extend the same distance ($k \times \text{IQR}$) below $Q_1$ and above $Q_3$, but skewed distributions naturally have asymmetric tails. Values on the longer-tail side are flagged too aggressively, while genuine outliers on the short-tail side may be missed.
Adjusted Boxplot (Hubert & Vandervieren, 2008):
Uses the medcouple (MC), a robust skewness measure ranging from -1 to +1:
For MC ≥ 0 (right-skewed): $$\text{Lower Fence} = Q_1 - 1.5\,e^{-4\,\text{MC}} \times \text{IQR}$$ $$\text{Upper Fence} = Q_3 + 1.5\,e^{3\,\text{MC}} \times \text{IQR}$$ For MC < 0 (left-skewed), the exponents swap to $-3$ and $4$.
This widens the fence on the skewed tail and narrows it on the short tail.
With small samples (n < 15-20), the quartile estimates are highly variable, leading to unreliable outlier detection.
Recommendations:
- Use a larger multiplier ($k \geq 2.0$, per the table above) to compensate for noisy quartile estimates.
- Inspect the data visually rather than relying on automated flags alone.
- Treat flagged points as candidates for investigation, not definitive outliers.
If data comes from multiple subpopulations, global IQR may not be meaningful. A value might be an outlier within its group but not globally (or vice versa).
Solutions:
- Compute quartiles and fences separately within each group, as sketched below.
- Alternatively, robustly center and scale each group first, then screen the pooled residuals.
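A sketch of the per-group approach using pandas; the function and column names are illustrative:

```python
import pandas as pd

def groupwise_iqr_outliers(df: pd.DataFrame, value_col: str,
                           group_col: str, k: float = 1.5) -> pd.Series:
    """Flag outliers within each group rather than against global fences."""
    def flag(s: pd.Series) -> pd.Series:
        q1, q3 = s.quantile([0.25, 0.75])
        iqr = q3 - q1
        return (s < q1 - k * iqr) | (s > q3 + k * iqr)
    return df.groupby(group_col)[value_col].transform(flag)
```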
With highly uniform data (values clustered tightly), the IQR can be very small, causing the fences to be extremely narrow. This leads to legitimate slight variations being flagged as outliers. Always sanity-check your fence values against domain knowledge.
Several extensions of the basic IQR method address its limitations:
The Median Absolute Deviation (MAD) provides an even more robust measure of spread:
$$\text{MAD} = \text{median}(|x_i - \text{median}(x)|)$$
The modified Z-score uses MAD: $$M_i = \frac{0.6745(x_i - \text{median}(x))}{\text{MAD}}$$
The constant 0.6745 makes the MAD consistent with the standard deviation for normal data: $$\sigma \approx 1.4826 \cdot \text{MAD}$$
Points where $|M_i| > 3.5$ are flagged as outliers.
Advantage: MAD has 50% breakdown point (vs. 25% for IQR).
For symmetric distributions, the semi-interquartile range (SIQR) is sometimes used: $$\text{SIQR} = \frac{Q_3 - Q_1}{2}$$
This can be substituted for standard deviation in various applications.
Instead of fixed quartiles, fences can be built on any symmetric pair of percentiles $p$ and $100 - p$ (for example, the 10th and 90th). The multiplier is adjusted accordingly to maintain similar detection rates.
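A minimal sketch of such generalized fences; the choice of p = 10 and the helper name are purely illustrative:

```python
import numpy as np

def percentile_fences(data: np.ndarray, p: float = 10.0, k: float = 1.5):
    """Fences built on the (p, 100-p) percentile span instead of the
    quartiles; both p and k are tuning choices."""
    lo, hi = np.percentile(data, [p, 100 - p])
    span = hi - lo
    return lo - k * span, hi + k * span
```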
```python
import numpy as np


def mad_outlier_detection(data: np.ndarray, threshold: float = 3.5):
    """
    Detect outliers using the Median Absolute Deviation (MAD) method.

    More robust than both Z-score and IQR methods.

    Parameters
    ----------
    data : np.ndarray
        1D array of observations
    threshold : float
        Modified Z-score threshold (default: 3.5)

    Returns
    -------
    outlier_mask : np.ndarray
        Boolean array where True indicates an outlier
    modified_z : np.ndarray
        Modified Z-scores for each observation
    """
    median = np.median(data)
    mad = np.median(np.abs(data - median))

    # Avoid division by zero
    if mad == 0:
        # Fall back to mean absolute deviation
        mad = np.mean(np.abs(data - median))
        if mad == 0:
            return np.zeros(len(data), dtype=bool), np.zeros(len(data))

    # Compute modified Z-score
    # 0.6745 is the 75th percentile of the standard normal
    modified_z = 0.6745 * (data - median) / mad

    outlier_mask = np.abs(modified_z) > threshold
    return outlier_mask, modified_z


# Compare with IQR
np.random.seed(42)
normal_data = np.random.normal(50, 10, 100)
contaminated = np.array([200, 250, 300])  # Severe outliers
data = np.concatenate([normal_data, contaminated])

# MAD-based detection
mad_mask, mod_z = mad_outlier_detection(data)

# IQR-based detection
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_mask = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)

print(f"MAD outliers: {np.sum(mad_mask)}")
print(f"IQR outliers: {np.sum(iqr_mask)}")
print(f"MAD-detected values: {data[mad_mask]}")
```

The IQR method provides robustness but no formal statistical significance testing. In the next page, we'll explore Grubbs' Test—a formal hypothesis testing procedure for detecting outliers that provides p-values and rigorous statistical control, though at the cost of stronger assumptions.
You now understand the IQR method's foundations in order statistics, Tukey's fence construction, how to select multipliers, and when to prefer IQR over Z-score methods. This robust technique forms an essential part of any anomaly detection toolkit.