The Z-score method, despite its elegance, suffers from a critical flaw: its reliance on the mean and standard deviation—statistics that are notoriously sensitive to the very outliers we're trying to detect. When John Tukey developed the Interquartile Range (IQR) method in his seminal 1977 work on exploratory data analysis, he sought something more robust: a technique grounded in order statistics that would remain reliable even when data was contaminated.
The IQR method embodies a fundamental principle in robust statistics: replace non-robust estimators (mean, standard deviation) with robust alternatives (median, interquartile range). This single conceptual shift transforms outlier detection from a fragile procedure into one that can withstand substantial data contamination.
By the end of this page, you will understand quartiles and their statistical properties, how the IQR captures spread robustly, the construction and theory behind Tukey fences, how to select appropriate multipliers for different scenarios, and when the IQR method excels or fails compared to parametric alternatives.
The IQR method is built on order statistics—the values obtained by sorting a dataset. Understanding this foundation is essential for grasping why the method achieves robustness.
Given observations $\{x_1, x_2, \ldots, x_n\}$, the order statistics are the sorted values:
$$x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$$
Where $x_{(k)}$ denotes the $k$-th smallest value. These provide a natural description of the data's distribution without parametric assumptions.
The $p$-th quantile (or $100p$-th percentile) is the value below which a proportion $p$ of the data falls. For continuous distributions:
$$Q_p = F^{-1}(p)$$
Where $F^{-1}$ is the inverse cumulative distribution function.
For sample data, various interpolation methods exist. The most common, linear interpolation, estimates quantile $p$ as:
$$\hat{Q}_p = x_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor)\left(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\right)$$
Where $h = (n-1)p + 1$.
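For concreteness, here is a minimal sketch (the dataset is made up for illustration) showing that the hand-computed interpolation rule matches NumPy's default 'linear' percentile method:

```python
import numpy as np

# Made-up dataset for illustration
data = np.array([3, 7, 8, 5, 12, 14, 21, 13, 18])
x = np.sort(data)              # order statistics x_(1) <= ... <= x_(n)
n, p = len(x), 0.25

# Manual linear interpolation with h = (n - 1)p + 1 (1-indexed ranks)
h = (n - 1) * p + 1
k = int(np.floor(h))
q1_manual = x[k - 1] + (h - k) * (x[k] - x[k - 1])

# NumPy's default 'linear' method implements the same rule
q1_numpy = np.percentile(data, 25)
print(q1_manual, q1_numpy)     # 7.0 7.0
```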
Quartiles divide sorted data into four equal parts:
- $Q_1$ (first quartile): the 25th percentile; one quarter of the data falls below it.
- $Q_2$ (second quartile): the 50th percentile, which is the median.
- $Q_3$ (third quartile): the 75th percentile; three quarters of the data fall below it.
The Interquartile Range (IQR) is simply:
$$\text{IQR} = Q_3 - Q_1$$
This measures the spread of the middle 50% of data—the range containing the central bulk, excluding the tails.
The key insight is that Q1, Q3, and the IQR depend only on the positions of certain data points in the sorted order, not on the actual values at the extremes: replace the maximum with a value a million times larger and Q1, Q3, and the IQR are unchanged.
This property—called breakdown resistance—is precisely what the mean and standard deviation lack.
The median has a breakdown point of 50%—you must corrupt half the data before the median becomes arbitrarily wrong. The IQR has a breakdown point of 25%—corrupt more than a quarter, and the IQR can fail. Compare this to 0% for the mean and standard deviation. This quantifies exactly why order statistics are more robust.
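A quick sketch makes the contrast vivid; the dataset and the 5% contamination level here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(50, 10, 1000)

# Corrupt 5% of the observations with an absurd value
corrupted = clean.copy()
corrupted[:50] = 1e6

for label, d in [("clean", clean), ("5% corrupted", corrupted)]:
    q1, q3 = np.percentile(d, [25, 75])
    print(f"{label:>13}: mean={d.mean():12.1f} std={d.std():12.1f} "
          f"median={np.median(d):6.1f} IQR={q3 - q1:6.1f}")
# The mean and std are dragged far off by the contamination;
# the median and IQR barely move.
```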
John Tukey introduced a simple rule for identifying outliers using the IQR. The construction defines 'fences' beyond which observations are deemed unusual.
Inner Fences (Mild Outliers): $$\text{Lower Inner Fence} = Q_1 - 1.5 \times \text{IQR}$$ $$\text{Upper Inner Fence} = Q_3 + 1.5 \times \text{IQR}$$
Outer Fences (Extreme Outliers): $$\text{Lower Outer Fence} = Q_1 - 3.0 \times \text{IQR}$$ $$\text{Upper Outer Fence} = Q_3 + 3.0 \times \text{IQR}$$
Observations beyond inner fences are mild outliers. Observations beyond outer fences are extreme outliers.
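As a worked example, consider the following small, made-up dataset; the value 100 lands beyond the upper outer fence and is therefore an extreme outlier:

```python
import numpy as np

data = np.array([8, 10, 11, 12, 12, 13, 14, 15, 16, 100])  # 100 is suspect
q1, q3 = np.percentile(data, [25, 75])     # Q1 = 11.25, Q3 = 14.75
iqr = q3 - q1                              # IQR = 3.5

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # (6.0, 20.0): mild-outlier fences
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # (0.75, 25.25): extreme-outlier fences
print(f"inner fences: {inner}, outer fences: {outer}")
# 100 > 25.25, so it lies beyond the upper outer fence: an extreme outlier
```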
The IQR method is intimately connected to the box plot (box-and-whisker diagram): the box spans $Q_1$ to $Q_3$ with a line at the median, the whiskers extend to the most extreme observations still inside the inner fences, and points beyond the whiskers are drawn individually as potential outliers.
This visualization immediately reveals the data's center, spread, skewness, and potential outliers.
Tukey's choice of 1.5 and 3.0 as multipliers was deliberate, though somewhat arbitrary. The values were chosen to have sensible properties under normality:
Under the Normal distribution:
- $Q_1 \approx \mu - 0.6745\sigma$
- $Q_3 \approx \mu + 0.6745\sigma$
- $\text{IQR} = Q_3 - Q_1 \approx 1.349\sigma$
Therefore, the inner fence is approximately: $$Q_3 + 1.5 \times \text{IQR} \approx \mu + 0.6745\sigma + 1.5(1.349\sigma) \approx \mu + 2.698\sigma$$
This is close to the Z-score threshold of 3 standard deviations. The outer fence corresponds to approximately 4.7 standard deviations.
Probability beyond the upper inner fence (Normal): $$P(X > Q_3 + 1.5 \times \text{IQR}) \approx 0.35\%$$
Total probability outside both inner fences: $$P(\text{outlier}) \approx 0.7\%$$
So under normality, we expect roughly 7 flagged points per 1000. This is comparable to, though slightly more liberal than, the 3-sigma rule, which flags about 3 per 1000.
| Method | Threshold | Approx. σ-Equivalent | Expected Outliers per 1000 |
|---|---|---|---|
| IQR Inner Fence | 1.5 × IQR | ~2.7σ | ~7 |
| IQR Outer Fence | 3.0 × IQR | ~4.7σ | ~0.002 |
| Z-Score | \|z\| > 2.5 | 2.5σ | ~12 |
| Z-Score | \|z\| > 3.0 | 3.0σ | ~3 |
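The table's calibration figures can be checked directly. The following sketch (assuming SciPy is available) recomputes the σ-equivalents and expected flag rates under a standard normal:

```python
from scipy.stats import norm

# Under N(0, 1): Q1 = -0.6745, Q3 = +0.6745, IQR = 1.349
q1, q3 = norm.ppf(0.25), norm.ppf(0.75)
iqr = q3 - q1

for k in (1.5, 3.0):
    fence = q3 + k * iqr                  # sigma-equivalent of the upper fence
    per_1000 = 2 * norm.sf(fence) * 1000  # both tails, scaled per 1000 points
    print(f"k={k}: fence at {fence:.3f} sigma, ~{per_1000:.3f} flagged per 1000")
# k=1.5 -> ~2.698 sigma, ~7 per 1000; k=3.0 -> ~4.721 sigma, ~0.002 per 1000
```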
Input: Dataset $\{x_1, x_2, \ldots, x_n\}$, multiplier $k$ (default: 1.5)
Output: Set of outlier indices
1. Sort data to obtain order statistics
2. Compute Q1 (25th percentile)
3. Compute Q3 (75th percentile)
4. Compute IQR = Q3 - Q1
5. Compute fences:
lower_fence = Q1 - k × IQR
upper_fence = Q3 + k × IQR
6. For each observation:
If x < lower_fence OR x > upper_fence:
flag as outlier
7. Return flagged indices
Computational Complexity: sorting dominates at $O(n \log n)$; computing the fences and scanning for outliers adds $O(n)$. A selection algorithm (e.g., quickselect) can find the two quartiles directly in expected $O(n)$ time, avoiding the full sort.
Note: For very large datasets, approximate quantile algorithms (like t-digest) can compute approximate quartiles in $O(n)$ time.
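As one illustration of avoiding the full sort, the sketch below uses NumPy's `np.partition` (introselect) to place only the two needed order statistics, taking nearest-rank quartiles rather than interpolating; `fast_fences` is a hypothetical helper, not a standard API:

```python
import numpy as np

def fast_fences(data: np.ndarray, k: float = 1.5):
    """Fence computation without a full sort. np.partition (introselect)
    places each requested index in its sorted position in expected O(n)
    time; nearest-rank quartiles are used instead of interpolation."""
    n = len(data)
    i_q1, i_q3 = round(0.25 * (n - 1)), round(0.75 * (n - 1))
    part = np.partition(data, (i_q1, i_q3))
    q1, q3 = part[i_q1], part[i_q3]
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr
```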
```python
import numpy as np
from typing import Tuple, NamedTuple


class IQRResult(NamedTuple):
    """Results from IQR-based outlier detection."""
    outlier_mask: np.ndarray
    q1: float
    q3: float
    iqr: float
    lower_fence: float
    upper_fence: float


def iqr_outlier_detection(
    data: np.ndarray,
    multiplier: float = 1.5,
    method: str = 'linear'
) -> IQRResult:
    """
    Detect outliers using the IQR (Interquartile Range) method.

    Parameters
    ----------
    data : np.ndarray
        1D array of observations
    multiplier : float
        Multiplier for the IQR (default: 1.5 for inner fence)
        Use 3.0 for outer fence (extreme outliers)
    method : str
        Interpolation method for percentile calculation
        Options: 'linear', 'lower', 'higher', 'midpoint', 'nearest'

    Returns
    -------
    IQRResult : NamedTuple containing:
        - outlier_mask: Boolean array where True indicates an outlier
        - q1, q3: First and third quartiles
        - iqr: Interquartile range
        - lower_fence, upper_fence: Fence values
    """
    # Compute quartiles
    q1 = np.percentile(data, 25, method=method)
    q3 = np.percentile(data, 75, method=method)
    iqr = q3 - q1

    # Compute fences
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr

    # Identify outliers
    outlier_mask = (data < lower_fence) | (data > upper_fence)

    return IQRResult(
        outlier_mask=outlier_mask,
        q1=q1,
        q3=q3,
        iqr=iqr,
        lower_fence=lower_fence,
        upper_fence=upper_fence
    )


def adjusted_boxplot_fences(
    data: np.ndarray,
    multiplier: float = 1.5
) -> Tuple[float, float]:
    """
    Compute adjusted fences for skewed distributions using the
    medcouple-based adjustment (Hubert & Vandervieren, 2008).

    For skewed data, the standard IQR method can be too aggressive
    on one tail and too lenient on the other.
    """
    from scipy.stats import skew

    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1

    # Compute medcouple (robust skewness measure)
    # Simplified approximation using skewness
    mc = skew(data) * 0.1  # Rough approximation

    if mc >= 0:
        lower_fence = q1 - multiplier * np.exp(-4 * mc) * iqr
        upper_fence = q3 + multiplier * np.exp(3 * mc) * iqr
    else:
        lower_fence = q1 - multiplier * np.exp(-3 * mc) * iqr
        upper_fence = q3 + multiplier * np.exp(4 * mc) * iqr

    return lower_fence, upper_fence


# Example usage
np.random.seed(42)

# Generate normal data with outliers
normal_data = np.random.normal(50, 10, 1000)
outliers = np.array([5, 10, 95, 100, 150])  # Obvious outliers
data = np.concatenate([normal_data, outliers])

# Standard IQR detection
result = iqr_outlier_detection(data, multiplier=1.5)

print(f"Q1: {result.q1:.2f}, Q3: {result.q3:.2f}")
print(f"IQR: {result.iqr:.2f}")
print(f"Fences: [{result.lower_fence:.2f}, {result.upper_fence:.2f}]")
print(f"Outliers detected: {np.sum(result.outlier_mask)}")
print(f"Outlier values: {data[result.outlier_mask]}")
```

While Tukey's 1.5 multiplier is the default, different applications may warrant different choices. The multiplier controls the sensitivity-specificity tradeoff just as the threshold does for Z-scores.
Common choices for the multiplier:
- k = 1.5 (Standard/Inner Fence): Tukey's default; flags mild outliers.
- k = 2.0 (Moderate): a middle ground between the inner and outer fences.
- k = 3.0 (Outer Fence): flags only extreme outliers.
- k = 2.2 (Bowley/Excel): a slightly more conservative alternative to the 1.5 default.
| Scenario | Recommended k | Rationale |
|---|---|---|
| Exploratory analysis | 1.5 | Identify all potentially unusual points for investigation |
| Quality control | 2.0 - 2.5 | Balance between catching defects and over-rejection |
| Automated anomaly detection | 3.0 | Minimize false alarms in production systems |
| Skewed distributions | 1.5 with adjusted fences | Symmetric standard fences misfit asymmetric tails |
| Heavy-tailed distributions | 2.5 - 3.0 | More extreme values are 'normal' for heavy tails |
| Small samples (n < 30) | 2.0+ | More conservative to avoid overdetection |
When labeled anomaly data is available, treat k as a hyperparameter and optimize it using cross-validation. Plot precision and recall as a function of k to find the optimal operating point for your specific use case.
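A minimal sketch of that tuning loop, assuming scikit-learn is available and that `labels` marks known anomalies with 1; `sweep_multiplier` is an illustrative name, not a library function:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def sweep_multiplier(data, labels, ks=np.arange(1.0, 4.25, 0.25)):
    """Report precision/recall of IQR flagging against known labels
    (1 = anomaly) for a grid of multipliers k."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    for k in ks:
        pred = (data < q1 - k * iqr) | (data > q3 + k * iqr)
        p = precision_score(labels, pred, zero_division=0)
        r = recall_score(labels, pred, zero_division=0)
        print(f"k={k:.2f}  precision={p:.2f}  recall={r:.2f}")
```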
Understanding when to use IQR versus Z-score methods requires comparing their fundamental properties:
| Property | Z-Score | IQR |
|---|---|---|
| Breakdown Point | 0% | 25% |
| Masking Resistance | Poor | Good |
| Swamping Resistance | Poor | Good |
| Sensitivity to Outliers | High | Low |
The IQR method can tolerate up to 25% contamination before its estimates become unreliable. The Z-score method can be arbitrarily corrupted by a single extreme observation.
Use Z-Score When:
- The data is approximately normal and believed to be largely free of contamination.
- You want a probabilistic interpretation of how extreme each point is.
- The sample is large enough for the mean and standard deviation to be stable.

Use IQR When:
- The distribution is unknown, non-normal, or skewed.
- The data may already contain outliers that would corrupt the mean and standard deviation.
- You want a simple, assumption-light default for initial screening.
In practice, many data scientists use the IQR method as their default for initial outlier screening because its assumptions are weaker. If you know nothing about your data's distribution, IQR is the safer choice. You can always refine with parametric methods after understanding your data better.
Real-world data often presents challenges that require modifications to the basic IQR method.
When more than 50% of observations share the same value, Q1 = Q3, and IQR = 0. This makes the fence computation degenerate (all non-median values become outliers).
Solutions:
- Fall back to a different spread measure, such as the MAD or the mean absolute deviation (the MAD code later on this page uses exactly this fallback).
- Widen the percentile span, e.g., build fences on the 10th/90th percentiles instead of the quartiles, as sketched below.
- Apply domain-specific rules when the data is essentially constant.
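One illustrative fallback strategy, with `robust_fences` as a hypothetical helper:

```python
import numpy as np

def robust_fences(data: np.ndarray, k: float = 1.5):
    """Fences with a fallback for the degenerate IQR = 0 case:
    widen to the 10th/90th percentile span; if even that span is
    zero, the data is essentially constant and nothing is flagged."""
    q1, q3 = np.percentile(data, [25, 75])
    spread = q3 - q1
    if spread == 0:
        p10, p90 = np.percentile(data, [10, 90])
        spread = p90 - p10
        if spread == 0:
            return None  # no sensible fences for (near-)constant data
    return q1 - k * spread, q3 + k * spread
```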
Standard fences extend the same distance ($k \times \text{IQR}$) below $Q_1$ and above $Q_3$, but skewed distributions naturally have asymmetric tails. Values on the longer-tail side are flagged too aggressively, while genuine outliers on the short-tail side may be missed.
Adjusted Boxplot (Hubert & Vandervieren, 2008):
Uses the medcouple (MC), a robust skewness measure ranging from -1 to +1:
For MC ≥ 0 (right-skewed): $$\text{Lower Fence} = Q_1 - 1.5\,e^{-4\,\text{MC}} \times \text{IQR}$$ $$\text{Upper Fence} = Q_3 + 1.5\,e^{3\,\text{MC}} \times \text{IQR}$$ For MC < 0 (left-skewed), the exponents swap to $-3$ and $4$.
This widens the fence on the skewed tail and narrows it on the short tail.
With small samples (n < 15-20), the quartile estimates are highly variable, leading to unreliable outlier detection.
Recommendations:
- Use a larger multiplier ($k \geq 2.0$, per the table above) to compensate for noisy quartile estimates.
- Inspect the data visually rather than relying on automated flags alone.
- Treat flagged points as candidates for investigation, not definitive outliers.
If data comes from multiple subpopulations, global IQR may not be meaningful. A value might be an outlier within its group but not globally (or vice versa).
Solutions:
- Compute quartiles and fences separately within each group, as sketched below.
- Alternatively, robustly center and scale each group first, then screen the pooled residuals.
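A sketch of the per-group approach using pandas; the function and column names are illustrative:

```python
import pandas as pd

def groupwise_iqr_outliers(df: pd.DataFrame, value_col: str,
                           group_col: str, k: float = 1.5) -> pd.Series:
    """Flag outliers within each group rather than against global fences."""
    def flag(s: pd.Series) -> pd.Series:
        q1, q3 = s.quantile([0.25, 0.75])
        iqr = q3 - q1
        return (s < q1 - k * iqr) | (s > q3 + k * iqr)
    return df.groupby(group_col)[value_col].transform(flag)
```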
With highly uniform data (values clustered tightly), the IQR can be very small, causing the fences to be extremely narrow. This leads to legitimate slight variations being flagged as outliers. Always sanity-check your fence values against domain knowledge.
Several extensions of the basic IQR method address its limitations:
The Median Absolute Deviation (MAD) provides an even more robust measure of spread:
$$\text{MAD} = \text{median}(|x_i - \text{median}(x)|)$$
The modified Z-score uses MAD: $$M_i = \frac{0.6745(x_i - \text{median}(x))}{\text{MAD}}$$
The constant 0.6745 makes the MAD consistent with the standard deviation for normal data: $$\sigma \approx 1.4826 \cdot \text{MAD}$$
Points where $|M_i| > 3.5$ are flagged as outliers.
Advantage: MAD has 50% breakdown point (vs. 25% for IQR).
For symmetric distributions, the semi-interquartile range (SIQR) is sometimes used: $$\text{SIQR} = \frac{Q_3 - Q_1}{2}$$
This can be substituted for standard deviation in various applications.
Instead of fixed quartiles, fences can be built on any symmetric pair of percentiles $p$ and $100 - p$ (for example, the 10th and 90th). The multiplier is adjusted accordingly to maintain similar detection rates.
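A minimal sketch of such generalized fences; the choice of p = 10 and the helper name are purely illustrative:

```python
import numpy as np

def percentile_fences(data: np.ndarray, p: float = 10.0, k: float = 1.5):
    """Fences built on the (p, 100-p) percentile span instead of the
    quartiles; both p and k are tuning choices."""
    lo, hi = np.percentile(data, [p, 100 - p])
    span = hi - lo
    return lo - k * span, hi + k * span
```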
```python
import numpy as np


def mad_outlier_detection(data: np.ndarray, threshold: float = 3.5):
    """
    Detect outliers using the Median Absolute Deviation (MAD) method.

    More robust than both Z-score and IQR methods.

    Parameters
    ----------
    data : np.ndarray
        1D array of observations
    threshold : float
        Modified Z-score threshold (default: 3.5)

    Returns
    -------
    outlier_mask : np.ndarray
        Boolean array where True indicates an outlier
    modified_z : np.ndarray
        Modified Z-scores for each observation
    """
    median = np.median(data)
    mad = np.median(np.abs(data - median))

    # Avoid division by zero
    if mad == 0:
        # Fall back to mean absolute deviation
        mad = np.mean(np.abs(data - median))
        if mad == 0:
            return np.zeros(len(data), dtype=bool), np.zeros(len(data))

    # Compute modified Z-score
    # 0.6745 is the 75th percentile of the standard normal
    modified_z = 0.6745 * (data - median) / mad

    outlier_mask = np.abs(modified_z) > threshold
    return outlier_mask, modified_z


# Compare with IQR
np.random.seed(42)
normal_data = np.random.normal(50, 10, 100)
contaminated = np.array([200, 250, 300])  # Severe outliers
data = np.concatenate([normal_data, contaminated])

# MAD-based detection
mad_mask, mod_z = mad_outlier_detection(data)

# IQR-based detection
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_mask = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)

print(f"MAD outliers: {np.sum(mad_mask)}")
print(f"IQR outliers: {np.sum(iqr_mask)}")
print(f"MAD-detected values: {data[mad_mask]}")
```

The IQR method provides robustness but no formal statistical significance testing. In the next page, we'll explore Grubbs' Test—a formal hypothesis testing procedure for detecting outliers that provides p-values and rigorous statistical control, though at the cost of stronger assumptions.
You now understand the IQR method's foundations in order statistics, Tukey's fence construction, how to select multipliers, and when to prefer IQR over Z-score methods. This robust technique forms an essential part of any anomaly detection toolkit.