Every anomaly detection model produces continuous scores that indicate how 'anomalous' each data point is. But in practice, stakeholders need decisions: Is this an anomaly or not? Should we alert? Should we investigate? This translation from continuous scores to discrete decisions requires setting a threshold.
Threshold selection is often treated as an afterthought, but it's arguably the most critical step in deploying anomaly detection. A perfect scoring function is useless with a poor threshold. Conversely, a decent scoring function with a well-calibrated threshold can be highly effective.
The challenge is profound: unlike supervised classification, we typically lack labeled anomalies to optimize against. We must rely on statistical reasoning, domain knowledge, and operational feedback to set and maintain effective thresholds.
By the end of this page, you will understand: (1) Why threshold selection is challenging in anomaly detection, (2) Statistical methods for automatic threshold determination, (3) Business-aligned thresholds based on costs and constraints, (4) Extreme Value Theory for principled tail modeling, (5) Dynamic and adaptive threshold strategies, and (6) Multi-threshold approaches for graduated alerting.
Setting an anomaly threshold is fundamentally different from setting a classification threshold in supervised learning. In supervised classification, labeled examples of both classes let you tune the threshold directly against validation metrics such as precision and recall.
In anomaly detection, we typically have only (mostly) normal training data, few or no labeled anomalies, and little reliable knowledge of the anomaly base rate, so the threshold must be chosen indirectly.
The Core Trade-offs:
| Threshold Level | False Positive Rate | Detection Rate | Operational Impact |
|---|---|---|---|
| Too Low | High | High (catches all anomalies) | Alert fatigue; team ignores alerts; real anomalies lost in noise |
| Too High | Low | Low (misses anomalies) | False security; critical issues go undetected; costly failures |
| Just Right | Acceptable | Acceptable | Actionable alerts; sustainable operations; balanced costs |
What Makes a Good Threshold?
A good threshold depends entirely on the application context:
Cost asymmetry: What's the relative cost of false positives vs. false negatives?
Operational capacity: How many alerts can the team realistically investigate?
Base rate: How rare are anomalies in your data?
Anomaly severity: Are all anomalies equally important?
When anomalies are rare, even highly accurate detectors produce mostly false positives.
Example: Anomaly rate = 0.1%, Detector has 99% TPR and 1% FPR.
For 100,000 samples:
• True anomalies: 100 → 99 detected (TP)
• Normal samples: 99,900 → 999 false alarms (FP)
• Precision = 99/(99+999) ≈ 9%
About 91% of alerts are false positives, even though the detector is 99% accurate on each class.
This is why threshold selection is so critical—and why you can't ignore the base rate.
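A quick way to sanity-check this arithmetic is a few lines of Python; the helper below is just for illustration:

```python
def alert_precision(base_rate: float, tpr: float, fpr: float) -> float:
    """Precision of alerts given the anomaly base rate and the detector's TPR/FPR."""
    tp = base_rate * tpr            # fraction of all samples that are correctly flagged anomalies
    fp = (1 - base_rate) * fpr      # fraction of all samples that are false alarms
    return tp / (tp + fp)

# Numbers from the example above: 0.1% anomalies, 99% TPR, 1% FPR
print(f"{alert_precision(0.001, 0.99, 0.01):.1%}")  # ~9.0%
```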
Statistical methods provide principled, automatic threshold selection without requiring labeled anomalies. They work by modeling the distribution of scores on training (normal) data and setting thresholds based on statistical significance.
1. Percentile-Based Thresholds:
The simplest approach: set threshold at the p-th percentile of training scores.
$$\tau = \text{Percentile}_p(\{s_1, s_2, \ldots, s_n\})$$
2. Gaussian Assumption (μ + kσ):
If scores are approximately Gaussian, use:
$$\tau = \mu + k \cdot \sigma$$
where μ and σ are mean and standard deviation of training scores.
Pros: smooth and well understood. Cons: scores are often non-Gaussian (heavy-tailed, skewed).
```python
import numpy as np
from scipy import stats
from scipy.stats import genextreme, genpareto
from typing import Tuple, Optional


class StatisticalThresholdSelector:
    """
    Statistical methods for anomaly threshold selection.
    Provides multiple approaches that don't require labeled anomalies.
    """

    def __init__(self, scores: np.ndarray):
        """
        Initialize with training/normal scores.

        Parameters:
        -----------
        scores : array-like
            Anomaly scores from training/normal data
            Higher scores = more anomalous
        """
        self.scores = np.asarray(scores)
        self.n = len(scores)

    def percentile_threshold(self, percentile: float = 95) -> float:
        """
        Set threshold at given percentile of training scores.
        Controls training false positive rate directly.
        """
        threshold = np.percentile(self.scores, percentile)
        expected_fpr = (100 - percentile) / 100
        print(f"Percentile threshold ({percentile}th): {threshold:.4f}")
        print(f"Expected training FP rate: {expected_fpr:.2%}")
        return threshold

    def gaussian_threshold(self, k: float = 3.0) -> float:
        """
        Set threshold as mean + k * std (Gaussian assumption).

        Parameters:
        -----------
        k : float
            Number of standard deviations (2, 3, or 4 common)
        """
        mu = np.mean(self.scores)
        sigma = np.std(self.scores)
        threshold = mu + k * sigma

        # Theoretical FP rate under Gaussian
        expected_fpr = 1 - stats.norm.cdf(k)

        print(f"Gaussian threshold (μ + {k}σ): {threshold:.4f}")
        print(f"μ = {mu:.4f}, σ = {sigma:.4f}")
        print(f"Expected FP rate (if Gaussian): {expected_fpr:.4%}")
        return threshold

    def robust_threshold(self, k: float = 3.0) -> float:
        """
        Robust threshold using median and MAD.
        More robust to outliers in training data.

        MAD = Median Absolute Deviation
        """
        median = np.median(self.scores)
        mad = np.median(np.abs(self.scores - median))

        # Scale MAD to be consistent with std for Gaussian
        # MAD * 1.4826 ≈ std for Gaussian
        robust_std = mad * 1.4826
        threshold = median + k * robust_std

        print(f"Robust threshold (median + {k}*MAD_scaled): {threshold:.4f}")
        print(f"median = {median:.4f}, MAD = {mad:.4f}")
        return threshold

    def iqr_threshold(self, multiplier: float = 1.5) -> float:
        """
        IQR-based threshold (Tukey's method).

        threshold = Q3 + multiplier * IQR
        Standard Tukey: multiplier = 1.5 (outlier)
        Extreme: multiplier = 3.0 (far outlier)
        """
        q1, q3 = np.percentile(self.scores, [25, 75])
        iqr = q3 - q1
        threshold = q3 + multiplier * iqr

        print(f"IQR threshold (Q3 + {multiplier}*IQR): {threshold:.4f}")
        print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {iqr:.4f}")
        return threshold

    def contamination_threshold(self, contamination: float = 0.05) -> float:
        """
        Set threshold assuming known contamination in training data.
        If we believe training data has ~5% anomalies,
        set threshold to exclude top 5%.
        """
        threshold = np.percentile(self.scores, (1 - contamination) * 100)
        print(f"Contamination-based threshold ({contamination:.1%}): {threshold:.4f}")
        return threshold

    def elbow_threshold(self) -> Tuple[float, int]:
        """
        Find threshold using elbow method on sorted scores.
        Looks for the 'knee' where scores start increasing rapidly.
        """
        sorted_scores = np.sort(self.scores)
        n = len(sorted_scores)

        # Simple elbow detection: maximum curvature point
        # Use second derivative approximation
        if n < 10:
            # Too few points, fall back to percentile
            return self.percentile_threshold(95), int(0.95 * n)

        # Normalize to [0, 1] for both axes
        x = np.arange(n) / (n - 1)
        y = (sorted_scores - sorted_scores.min()) / (sorted_scores.max() - sorted_scores.min() + 1e-10)

        # Find point with maximum distance from line connecting first and last points
        # This is the elbow
        line_vec = np.array([1, y[-1] - y[0]])
        line_vec = line_vec / np.linalg.norm(line_vec)

        point_vecs = np.column_stack([x, y - y[0]])
        cross = np.abs(point_vecs[:, 0] * line_vec[1] - point_vecs[:, 1] * line_vec[0])

        elbow_idx = np.argmax(cross)
        threshold = sorted_scores[elbow_idx]

        print(f"Elbow threshold: {threshold:.4f}")
        print(f"Elbow at index {elbow_idx} ({elbow_idx/n:.1%} of data)")
        return threshold, elbow_idx

    def compare_methods(self) -> dict:
        """Compare all threshold methods."""
        print("="*60)
        print("Threshold Method Comparison")
        print("="*60)

        methods = {
            'percentile_95': self.percentile_threshold(95),
            'percentile_99': self.percentile_threshold(99),
            'gaussian_2sigma': self.gaussian_threshold(2),
            'gaussian_3sigma': self.gaussian_threshold(3),
            'robust_3sigma': self.robust_threshold(3),
            'iqr_1.5': self.iqr_threshold(1.5),
            'iqr_3.0': self.iqr_threshold(3.0),
            'elbow': self.elbow_threshold()[0],
        }

        print("\n" + "="*60)
        print("Summary:")
        for name, thresh in methods.items():
            fpr = np.mean(self.scores > thresh)
            print(f"  {name:20s}: {thresh:8.4f} (training FPR: {fpr:.2%})")

        return methods


# Example usage
if __name__ == "__main__":
    # Simulate anomaly scores (mostly normal, some outliers)
    np.random.seed(42)
    normal_scores = np.random.gamma(2, 0.5, 950)    # Normal data
    outlier_scores = np.random.gamma(5, 1.0, 50)    # Outliers in training
    scores = np.concatenate([normal_scores, outlier_scores])

    selector = StatisticalThresholdSelector(scores)
    thresholds = selector.compare_methods()
```

Standard statistical methods often fail for anomaly detection because anomaly scores are heavy-tailed—extreme values are more common than a Gaussian would predict. Extreme Value Theory (EVT) provides tools specifically designed for modeling distribution tails.
Why EVT for Anomaly Detection?
Classical thresholds (percentiles, μ + kσ) describe the bulk of the score distribution, but anomaly decisions are made in the tail, exactly where those approximations are weakest. EVT models the tail directly.
The Peaks Over Threshold (POT) Method:
POT focuses on exceedances above a high threshold u:
$$Y = X - u \;\big|\; X > u$$
Under mild conditions, Y follows a Generalized Pareto Distribution (GPD):
$$F(y) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi}$$
where ξ is the shape parameter and σ > 0 is the scale parameter; for ξ = 0 the GPD reduces to the exponential distribution $F(y) = 1 - e^{-y/\sigma}$.
The shape parameter ξ tells you about tail behavior:
• ξ > 0 (Fréchet-type): heavy tail; extreme values likely. Common for anomaly scores.
• ξ = 0 (Gumbel-type): exponential tail; Gaussian-like decay.
• ξ < 0 (Weibull-type): bounded tail; finite maximum value.
Most anomaly score distributions have ξ > 0, meaning Gaussian assumptions underestimate the probability of extreme values.
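To see how much a Gaussian can understate tail risk, here is a small illustrative comparison, using a Student-t as a stand-in for a heavy-tailed score distribution (the specific cutoff and degrees of freedom are assumptions):

```python
from scipy import stats

# Probability of a score exceeding the same high cutoff (5 standardized units)
# under a Gaussian model vs. a heavy-tailed Student-t(3) model of the scores.
cutoff = 5.0
p_gaussian = stats.norm.sf(cutoff)       # ~2.9e-07
p_heavy = stats.t.sf(cutoff, df=3)       # ~7.7e-03: orders of magnitude larger

print(f"Gaussian tail probability:     {p_gaussian:.2e}")
print(f"Heavy-tailed tail probability: {p_heavy:.2e}")
```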
EVT-Based Threshold Selection:
Return Level:
The return level for probability p is:
$$z_p = u + \frac{\sigma}{\xi}\left[\left(\frac{n \cdot p}{N_u}\right)^{-\xi} - 1\right]$$
where n is the total number of observations and N_u is the number of observations exceeding u.
This gives the threshold such that only fraction p of future normal observations exceed it.
```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize
from scipy.stats import genpareto
from typing import Tuple, Optional


class EVTThresholdSelector:
    """
    Extreme Value Theory-based threshold selection.
    Uses Peaks Over Threshold (POT) method with Generalized Pareto Distribution.
    """

    def __init__(self, scores: np.ndarray):
        """
        Initialize with training/normal scores.
        """
        self.scores = np.asarray(scores)
        self.n = len(scores)

        # GPD parameters (fitted later)
        self.xi = None        # Shape
        self.sigma = None     # Scale
        self.u = None         # Threshold for POT
        self.n_exceed = None  # Number of exceedances

    def fit_gpd(self, preliminary_threshold_percentile: float = 90) -> Tuple[float, float, float]:
        """
        Fit Generalized Pareto Distribution to tail exceedances.

        Parameters:
        -----------
        preliminary_threshold_percentile : float
            Percentile for preliminary threshold (determines where 'tail' starts)

        Returns:
        --------
        xi : float - Shape parameter
        sigma : float - Scale parameter
        u : float - Threshold used for POT
        """
        # Preliminary threshold
        self.u = np.percentile(self.scores, preliminary_threshold_percentile)

        # Extract exceedances
        exceedances = self.scores[self.scores > self.u] - self.u
        self.n_exceed = len(exceedances)

        if self.n_exceed < 10:
            print(f"Warning: Only {self.n_exceed} exceedances. Results may be unreliable.")

        # Fit GPD using MLE
        # scipy's genpareto shape parameter c corresponds directly to the EVT shape xi
        # (unlike genextreme, which uses c = -xi)
        try:
            c, loc, scale = genpareto.fit(exceedances, floc=0)
            self.xi = c
            self.sigma = scale
        except Exception as e:
            print(f"GPD fit failed: {e}. Falling back to moment estimator.")
            # Method-of-moments fallback: for the GPD, mean^2/var = 1 - 2*xi
            mean_exc = np.mean(exceedances)
            var_exc = np.var(exceedances)
            self.xi = 0.5 * (1 - mean_exc**2 / var_exc)
            self.sigma = 0.5 * mean_exc * (mean_exc**2 / var_exc + 1)

        print(f"GPD Fit Results:")
        print(f"  Preliminary threshold (u): {self.u:.4f}")
        print(f"  Exceedances: {self.n_exceed} ({self.n_exceed/self.n:.1%} of data)")
        print(f"  Shape (ξ): {self.xi:.4f}")
        print(f"  Scale (σ): {self.sigma:.4f}")

        if self.xi > 0:
            print(f"  Tail type: Heavy (Fréchet)")
        elif self.xi < 0:
            print(f"  Tail type: Bounded (Weibull)")
        else:
            print(f"  Tail type: Exponential (Gumbel)")

        return self.xi, self.sigma, self.u

    def return_level(self, false_positive_rate: float) -> float:
        """
        Compute threshold for given false positive rate.

        Parameters:
        -----------
        false_positive_rate : float
            Desired probability that normal sample exceeds threshold

        Returns:
        --------
        threshold : float
            Threshold value such that P(score > threshold) = false_positive_rate
        """
        if self.xi is None:
            raise ValueError("Must call fit_gpd() first")

        # Probability of exceeding u
        p_exceed_u = self.n_exceed / self.n

        # We want P(X > z) = fpr for normal data
        # P(X > z) = P(X > u) * P(X > z | X > u)
        #          = p_exceed_u * (1 - F_GPD(z - u))
        #
        # Setting this equal to fpr and solving for z:
        # GPD quantile: For P(Y > y) = q, y = sigma/xi * ((q)^(-xi) - 1) for xi != 0

        if false_positive_rate >= p_exceed_u:
            # Desired FPR is higher than exceedance rate, threshold is below u
            # Fall back to percentile
            return np.percentile(self.scores, (1 - false_positive_rate) * 100)

        # Conditional probability of exceeding threshold given exceeding u
        q = false_positive_rate / p_exceed_u

        if abs(self.xi) < 1e-6:
            # Exponential case (xi ≈ 0)
            exceedance = -self.sigma * np.log(q)
        else:
            # General GPD case
            exceedance = self.sigma / self.xi * (q**(-self.xi) - 1)

        threshold = self.u + exceedance
        return threshold

    def compute_threshold(self, false_positive_rate: float = 0.01,
                          preliminary_percentile: float = 90) -> float:
        """
        One-step method to compute EVT-based threshold.
        """
        self.fit_gpd(preliminary_percentile)
        threshold = self.return_level(false_positive_rate)

        # Validate
        empirical_fpr = np.mean(self.scores > threshold)

        print(f"\nEVT Threshold for {false_positive_rate:.2%} FPR: {threshold:.4f}")
        print(f"Empirical training FPR: {empirical_fpr:.2%}")

        return threshold

    def stability_analysis(self, fpr: float = 0.01,
                           percentile_range: Tuple[float, float] = (85, 95)) -> dict:
        """
        Analyze sensitivity to preliminary threshold choice.
        A stable GPD fit should give similar thresholds across different u values.
        """
        results = []

        for pct in np.linspace(percentile_range[0], percentile_range[1], 11):
            self.fit_gpd(pct)
            threshold = self.return_level(fpr)
            results.append({
                'percentile': pct,
                'u': self.u,
                'xi': self.xi,
                'sigma': self.sigma,
                'threshold': threshold
            })

        thresholds = [r['threshold'] for r in results]

        print(f"\nStability Analysis:")
        print(f"Threshold range: [{min(thresholds):.4f}, {max(thresholds):.4f}]")
        print(f"Threshold std: {np.std(thresholds):.4f}")
        print(f"Recommended threshold: {np.median(thresholds):.4f} (median)")

        return results


# Example
if __name__ == "__main__":
    # Simulate heavy-tailed anomaly scores
    np.random.seed(42)

    # Pareto-distributed scores (heavy tail)
    normal_scores = np.random.pareto(a=3, size=1000) + 1

    selector = EVTThresholdSelector(normal_scores)

    print("="*60)
    print("EVT-Based Threshold Selection")
    print("="*60)
    threshold = selector.compute_threshold(false_positive_rate=0.01)

    print("\n" + "="*60)
    print("Stability Analysis")
    print("="*60)
    selector.stability_analysis(fpr=0.01)
```

Statistical methods optimize for mathematical properties, but real systems operate under business constraints. Business-aligned thresholds translate operational requirements into threshold values.
Key Business Considerations: the factors introduced earlier (cost asymmetry, operational capacity, base rate, and anomaly severity) now become explicit inputs to the threshold.
Alert Budget Approach:
Set threshold to match investigation capacity:
$$\text{Daily Alerts} = \text{Daily Volume} \times \text{FPR} + \text{Expected Anomalies}$$
Solving for FPR: $$\text{FPR} = \frac{\text{Alert Budget} - \text{Expected Anomalies}}{\text{Daily Volume}}$$
Then use statistical methods to find threshold achieving this FPR.
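A minimal sketch of this calculation, assuming training scores and an illustrative event volume and alert budget (all numbers below are assumptions, not recommendations):

```python
import numpy as np

def alert_budget_threshold(scores, daily_volume, alert_budget, expected_anomalies=0.0):
    """Convert an alert budget into a target FPR, then into a score threshold."""
    target_fpr = max(alert_budget - expected_anomalies, 0.0) / daily_volume
    threshold = np.percentile(scores, 100 * (1 - target_fpr))
    return threshold, target_fpr

# Illustrative numbers: 200,000 events/day, capacity for 40 alerts, ~5 real anomalies expected
rng = np.random.default_rng(0)
training_scores = rng.gamma(2.0, 0.5, 100_000)   # stand-in for scores on normal data
tau, fpr = alert_budget_threshold(training_scores, 200_000, 40, expected_anomalies=5)
print(f"Target FPR: {fpr:.4%}  ->  threshold: {tau:.3f}")
```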
When You Have Some Labeled Anomalies:
If you have even a small set of labeled anomalies (from past investigations), you can estimate the score distribution for anomalies and optimize more precisely:
Even 50-100 labeled anomalies can significantly improve threshold calibration.
F1-Optimal Threshold:
If you have labeled data and want balanced precision/recall:
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Scan thresholds and select τ that maximizes F1 on a validation set.
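A minimal sketch of that scan, assuming you have validation scores and binary labels (the names here are placeholders):

```python
import numpy as np
from typing import Tuple

def f1_optimal_threshold(scores: np.ndarray, labels: np.ndarray) -> Tuple[float, float]:
    """Scan candidate thresholds and return the one maximizing F1 on validation data."""
    best_tau, best_f1 = float("inf"), 0.0
    for tau in np.unique(scores):
        pred = scores > tau
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_tau, best_f1 = tau, f1
    return best_tau, best_f1
```

If scikit-learn is available, `precision_recall_curve` performs the same sweep more efficiently.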
In production, use investigation outcomes, with each alert confirmed as a true or false positive, to continuously refine the threshold.
This creates a virtuous cycle: better thresholds → fewer false positives → more trust in system → better investigation quality → better labels → even better thresholds.
Static thresholds assume data distributions don't change. In reality, systems evolve: user behavior shifts, code changes, external factors vary. Dynamic thresholds adapt to these changes, maintaining consistent detection behavior despite distribution shifts.
Why Thresholds Drift:
Adaptation Strategies:
| Strategy | How It Works | Best For |
|---|---|---|
| Rolling Window | Compute threshold from last N hours/days of data | Gradual drift; stable patterns |
| Exponential Smoothing | τ_t = α × τ_new + (1-α) × τ_{t-1} | Smooth adaptation; noisy data |
| Time-of-Day Profiles | Different thresholds for different hours | Strong seasonal patterns |
| Feedback-Based | Adjust based on FP/FN rates from labels | When investigation labels available |
| Control Charts | Detect when threshold needs update | Detecting sudden shifts |
```python
import numpy as np
from collections import deque
from datetime import datetime, timedelta
from typing import Optional, Callable


class DynamicThreshold:
    """
    Dynamic threshold that adapts to changing data distributions.
    Maintains a sliding window of recent scores and updates
    threshold based on configurable statistics.
    """

    def __init__(
        self,
        window_size: int = 1000,
        percentile: float = 95,
        smoothing_factor: float = 0.1,
        min_samples: int = 100
    ):
        """
        Parameters:
        -----------
        window_size : int
            Number of recent scores to consider
        percentile : float
            Percentile threshold within window
        smoothing_factor : float (0-1)
            Alpha for exponential smoothing (higher = faster adaptation)
        min_samples : int
            Minimum samples before updating threshold
        """
        self.window_size = window_size
        self.percentile = percentile
        self.alpha = smoothing_factor
        self.min_samples = min_samples

        self.scores = deque(maxlen=window_size)
        self.threshold = None
        self.threshold_history = []

    def update(self, new_score: float) -> float:
        """
        Add new score and return current threshold.
        Threshold is updated after new score is added.
        """
        self.scores.append(new_score)

        if len(self.scores) < self.min_samples:
            # Not enough data yet, return conservative threshold
            if self.threshold is None:
                self.threshold = float('inf')
            return self.threshold

        # Compute window threshold
        window_threshold = np.percentile(list(self.scores), self.percentile)

        if self.threshold is None or self.threshold == float('inf'):
            # First update
            self.threshold = window_threshold
        else:
            # Exponential smoothing
            self.threshold = self.alpha * window_threshold + (1 - self.alpha) * self.threshold

        self.threshold_history.append(self.threshold)
        return self.threshold

    def batch_update(self, new_scores: np.ndarray) -> float:
        """Update with batch of new scores."""
        for score in new_scores:
            self.update(score)
        return self.threshold

    def is_anomaly(self, score: float) -> bool:
        """Check if score exceeds current threshold."""
        return score > self.threshold


class TimeAwareThreshold:
    """
    Threshold that varies by time of day.
    Maintains separate thresholds for different time periods.
    """

    def __init__(
        self,
        n_periods: int = 24,  # Hourly by default
        window_size: int = 1000,
        percentile: float = 95
    ):
        self.n_periods = n_periods
        self.thresholds = {i: DynamicThreshold(window_size, percentile)
                           for i in range(n_periods)}

    def _get_period(self, timestamp: datetime) -> int:
        """Map timestamp to period index."""
        # For hourly: period = hour
        # Could customize for other granularities
        return timestamp.hour % self.n_periods

    def update(self, score: float, timestamp: datetime) -> float:
        """Update threshold for the given time period."""
        period = self._get_period(timestamp)
        return self.thresholds[period].update(score)

    def get_threshold(self, timestamp: datetime) -> float:
        """Get threshold for the given time period."""
        period = self._get_period(timestamp)
        return self.thresholds[period].threshold

    def is_anomaly(self, score: float, timestamp: datetime) -> bool:
        """Check if score exceeds threshold for the time period."""
        return score > self.get_threshold(timestamp)


class FeedbackDrivenThreshold:
    """
    Threshold that adapts based on investigation feedback.
    If false positives are too high, threshold increases.
    If false negatives are confirmed, threshold decreases.
    """

    def __init__(
        self,
        initial_threshold: float,
        target_precision: float = 0.5,  # Target 50% precision
        learning_rate: float = 0.05,
        feedback_window: int = 100
    ):
        self.threshold = initial_threshold
        self.target_precision = target_precision
        self.learning_rate = learning_rate

        self.feedback_buffer = deque(maxlen=feedback_window)
        self.threshold_history = [initial_threshold]

    def record_feedback(self, score: float, was_true_positive: bool):
        """
        Record investigation outcome.

        Parameters:
        -----------
        score : float
            The anomaly score that triggered alert
        was_true_positive : bool
            True if investigation confirmed anomaly
        """
        self.feedback_buffer.append({
            'score': score,
            'tp': was_true_positive,
            'threshold': self.threshold
        })

        # Update threshold based on recent precision
        if len(self.feedback_buffer) >= 10:
            self._update_threshold()

    def _update_threshold(self):
        """Adjust threshold based on observed precision."""
        recent = list(self.feedback_buffer)

        # Precision from recent alerts
        true_positives = sum(1 for f in recent if f['tp'])
        precision = true_positives / len(recent)

        # Adjust threshold
        if precision < self.target_precision:
            # Too many false positives, raise threshold
            adjustment = self.learning_rate * (self.target_precision - precision)
            self.threshold *= (1 + adjustment)
            print(f"Precision {precision:.2%} < target. Raising threshold to {self.threshold:.4f}")
        elif precision > self.target_precision + 0.1:
            # Possibly missing anomalies, lower threshold
            adjustment = self.learning_rate * (precision - self.target_precision)
            self.threshold *= (1 - adjustment * 0.5)  # More conservative lowering
            print(f"Precision {precision:.2%} > target. Lowering threshold to {self.threshold:.4f}")

        self.threshold_history.append(self.threshold)

    def is_anomaly(self, score: float) -> bool:
        return score > self.threshold


class ControlChartThreshold:
    """
    Use control chart (EWMA) to detect when threshold needs recalibration.
    Monitors the score distribution and alerts when it shifts significantly.
    """

    def __init__(
        self,
        base_threshold: float,
        lambda_param: float = 0.1,  # EWMA smoothing
        control_limit_sigma: float = 3.0
    ):
        self.base_threshold = base_threshold
        self.lambda_param = lambda_param
        self.sigma = control_limit_sigma

        self.ewma = None
        self.ewma_variance = None
        self.process_mean = None
        self.process_std = None
        self.requires_recalibration = False

    def initialize(self, initial_scores: np.ndarray):
        """Initialize control chart from baseline period."""
        self.process_mean = np.mean(initial_scores)
        self.process_std = np.std(initial_scores)
        self.ewma = self.process_mean
        self.ewma_variance = (self.lambda_param / (2 - self.lambda_param)) * self.process_std**2

    def update(self, score: float) -> dict:
        """
        Update EWMA and check if distribution has shifted.
        Returns status dict with recalibration recommendation.
        """
        if self.ewma is None:
            raise ValueError("Must call initialize() first")

        # Update EWMA
        self.ewma = self.lambda_param * score + (1 - self.lambda_param) * self.ewma

        # Control limits
        control_limit = self.sigma * np.sqrt(self.ewma_variance)
        ucl = self.process_mean + control_limit
        lcl = self.process_mean - control_limit

        # Check for out-of-control
        if self.ewma > ucl or self.ewma < lcl:
            self.requires_recalibration = True

        return {
            'ewma': self.ewma,
            'ucl': ucl,
            'lcl': lcl,
            'in_control': lcl <= self.ewma <= ucl,
            'recalibration_recommended': self.requires_recalibration
        }


# Example usage
if __name__ == "__main__":
    np.random.seed(42)

    # Simulate evolving scores
    # Initial period
    initial_scores = np.random.normal(1.0, 0.3, 500)
    # Later period with drift
    drifted_scores = np.random.normal(1.5, 0.4, 500)
    all_scores = np.concatenate([initial_scores, drifted_scores])

    # Dynamic threshold
    print("="*60)
    print("Dynamic Threshold Demo")
    print("="*60)

    dynamic = DynamicThreshold(window_size=200, percentile=95, smoothing_factor=0.1)

    thresholds = []
    for i, score in enumerate(all_scores):
        thresh = dynamic.update(score)
        thresholds.append(thresh)

        if i % 250 == 0 and i > 0:
            print(f"Step {i}: threshold = {thresh:.4f}")

    print(f"\nInitial threshold: {thresholds[200]:.4f}")
    print(f"Final threshold: {thresholds[-1]:.4f}")
    print(f"Threshold adapted to drift: {(thresholds[-1] - thresholds[200])/thresholds[200]*100:.1f}% change")
```

Real systems benefit from multiple thresholds that trigger different responses. Instead of a binary alert/no-alert decision, graduated approaches reduce alert fatigue while maintaining sensitivity.
Multi-Threshold Framework:
Score > τ_critical → Immediate alert, page on-call, block transaction
Score > τ_high → High-priority alert, investigate within 1 hour
Score > τ_medium → Standard alert, investigate today
Score > τ_low → Log for analysis, batch review
Score ≤ τ_low → Normal, no action
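A minimal sketch of such a graduated policy; the tier values and actions below are illustrative assumptions, not prescribed settings:

```python
from typing import List, Tuple

def triage(score: float, tiers: List[Tuple[float, str]]) -> str:
    """Map an anomaly score to a graduated action; tiers must be sorted most-severe first."""
    for tau, action in tiers:
        if score > tau:
            return action
    return "normal: no action"

# Illustrative tier thresholds; in practice each tau is calibrated (e.g., by target FPR)
tiers = [
    (0.95, "critical: page on-call / block transaction"),
    (0.85, "high: investigate within 1 hour"),
    (0.70, "medium: investigate today"),
    (0.50, "low: log for batch review"),
]
print(triage(0.91, tiers))  # -> high: investigate within 1 hour
```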
Benefits: critical issues get immediate attention, lower-severity signals are still captured for later review, and total alert volume stays within the team's capacity.
Ensemble Thresholds:
When using multiple models, combine their scores before thresholding rather than thresholding each model separately; rank-normalizing the scores so they are comparable and then averaging them is one common choice, as sketched below.
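A minimal sketch of rank-averaging, with assumed array shapes (rows = models, columns = samples) and illustrative data:

```python
import numpy as np

def rank_normalize(scores: np.ndarray) -> np.ndarray:
    """Map scores to [0, 1] by rank so scores from different models are comparable."""
    ranks = scores.argsort().argsort()
    return ranks / (len(scores) - 1)

def ensemble_scores(score_matrix: np.ndarray) -> np.ndarray:
    """Average rank-normalized scores across models (rows = models, columns = samples)."""
    return np.mean([rank_normalize(row) for row in score_matrix], axis=0)

rng = np.random.default_rng(0)
score_matrix = rng.gamma(2.0, 0.5, size=(3, 1000))   # three models scoring the same 1,000 samples
combined = ensemble_scores(score_matrix)
tau = np.percentile(combined, 99)                    # single threshold on the combined score
print(f"Ensemble threshold (99th percentile): {tau:.3f}")
```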
Contextual Thresholds:
Vary thresholds based on context, not just time: for example, per customer segment, device type, or service tier, so that behavior that is routine in one context does not mask genuine anomalies in another (see the sketch below).
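A minimal sketch of per-context thresholds; the context keys and the 100-sample warm-up are assumptions:

```python
from collections import defaultdict
import numpy as np

class ContextualThreshold:
    """Separate percentile threshold per context key (e.g., customer segment or device type)."""

    def __init__(self, percentile: float = 99, min_samples: int = 100):
        self.percentile = percentile
        self.min_samples = min_samples
        self.history = defaultdict(list)

    def update(self, context: str, score: float) -> None:
        self.history[context].append(score)

    def threshold(self, context: str) -> float:
        scores = self.history.get(context, [])
        if len(scores) < self.min_samples:
            return float("inf")          # too little data for this context: stay conservative
        return float(np.percentile(scores, self.percentile))

    def is_anomaly(self, context: str, score: float) -> bool:
        return score > self.threshold(context)
```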
We've comprehensively explored threshold selection—the critical bridge between anomaly scores and actionable decisions. From statistical foundations to operational deployment, threshold selection deserves as much attention as model design.
Module Complete:
You've now completed the study of One-Class Methods for Anomaly Detection. You understand the mathematical foundations of one-class modeling, kernel-based and deep learning approaches to anomaly scoring, and the threshold selection strategies covered on this page.
These tools form a complete toolkit for building robust, production-ready anomaly detection systems across diverse domains and data types.
Congratulations! You've mastered One-Class Methods for anomaly detection, from mathematical foundations through practical deployment. You can now design, implement, and operationalize anomaly detection systems using kernel methods, deep learning, and principled threshold selection. These skills position you to tackle real-world anomaly detection challenges in fraud, security, quality control, and beyond.