While One-Class SVM separates data from the origin using a hyperplane, Support Vector Data Description (SVDD) takes an alternative geometric approach: it encloses the training data within a minimum-volume hypersphere. Points inside the sphere are classified as normal; points outside are anomalies.
Introduced by Tax and Duin in 1999, SVDD provides an intuitive geometric interpretation—we're literally drawing a 'ball' around the normal data in feature space. This perspective often aids understanding and debugging, particularly in visualization-friendly low-dimensional settings.
Despite the different geometric motivation, SVDD and One-Class SVM are closely related. With certain kernel choices, they yield equivalent decision boundaries. Understanding both gives you flexibility to choose the formulation that best matches your problem intuition and computational constraints.
By the end of this page, you will understand: (1) The minimum enclosing hypersphere formulation, (2) How soft-margin SVDD handles outliers in training data, (3) Kernel SVDD for nonlinear data descriptions, (4) The mathematical equivalence with One-Class SVM under RBF kernels, (5) The dual formulation and its efficient computation, and (6) When to prefer SVDD over One-Class SVM.
Imagine your normal training data as points scattered in d-dimensional space. SVDD asks: What is the smallest sphere that contains (most of) these points?
The sphere is characterized by two quantities: a center a (a point in the same d-dimensional space as the data) and a radius R ≥ 0.
A new point x is classified by its squared distance to the center: normal if ||x - a||² ≤ R², anomalous if ||x - a||² > R².
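Equivalently, as a decision function—the same score the implementation later on this page returns:

$$f(x) = \operatorname{sign}\!\left(R^2 - \|x - a\|^2\right)$$

with f(x) = +1 for normal points (inside or on the sphere) and f(x) = −1 for anomalies (outside).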
Why Minimum Volume?
A sphere with infinite radius would trivially enclose all data—but such a boundary wouldn't exclude any test point as anomalous. By minimizing the volume (equivalently, minimizing R²), we create a tight boundary that hugs the normal data, so points that deviate from it fall outside and are flagged. The table below contrasts this geometry with One-Class SVM's.
| Aspect | SVDD | One-Class SVM |
|---|---|---|
| Boundary shape | Hypersphere (ball) | Hyperplane (half-space) |
| Defined by | Center a and radius R | Normal vector w and offset ρ |
| Normal region | Inside the sphere | One side of the hyperplane |
| Geometric intuition | 'Ball around the data' | 'Separating from the void' |
| Natural for | Compact, clustered data | Data with directional spread |
| Decision function | R² - ||x - a||² (distance to center) | w·φ(x) - ρ (projection onto normal) |
Visualizing the Ball:
In 2D, SVDD finds a circle that encloses the training points. In 3D, it's a sphere. In higher dimensions, the geometry becomes harder to visualize, but the principle remains: a closed, convex region that tightly fits the normal data.
The boundary points—those exactly at distance R from center a—are the support vectors. Just like in regular SVMs, the solution depends only on these critical points, not the entire training set.
Finding the minimum enclosing ball is a classic computational geometry problem with O(n) expected-time algorithms in low dimensions. However, SVDD generalizes this with kernels (for nonlinear boundaries) and soft margins (for outlier tolerance), requiring the quadratic programming machinery of SVMs.
Hard-Margin SVDD:
Given training examples {x₁, ..., xₙ}, the hard-margin SVDD finds the minimum-radius hypersphere enclosing all points:
$$\min_{a, R} R^2$$
Subject to: ||xᵢ - a||² ≤ R² for all i = 1, ..., n
This formulation requires all training points to lie inside or on the sphere. The solution is uniquely determined by the convex hull of the data—specifically, by the points on the boundary of the minimum enclosing ball.
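To make the formulation concrete, here is a minimal sketch (illustrative only, with synthetic 2D data) that solves the hard-margin primal directly with scipy's SLSQP; the dual QP route used later scales better and admits kernels:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

def objective(params):
    # params = [a_1, a_2, R2]; we minimize the squared radius R2
    return params[2]

def inside_sphere(params):
    # One inequality per point: R2 - ||x_i - a||^2 >= 0
    a, R2 = params[:2], params[2]
    return R2 - np.sum((X - a) ** 2, axis=1)

# Feasible start: center at the mean, radius large enough to cover everything
a0 = X.mean(axis=0)
x0 = np.r_[a0, np.max(np.sum((X - a0) ** 2, axis=1))]

res = minimize(objective, x0, method='SLSQP',
               constraints={'type': 'ineq', 'fun': inside_sphere})
center, radius = res.x[:2], np.sqrt(res.x[2])
print(f"center = {center}, radius = {radius:.3f}")
```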
Problem with Hard Margin:
Real training data often contains outliers. A single anomaly in the training set forces the sphere to expand dramatically to include it, destroying the tight boundary we seek.
Soft-Margin SVDD:
To handle training outliers, we introduce slack variables ξᵢ ≥ 0 that allow points to lie outside the sphere:
$$\min_{a, R, \xi} R^2 + C \sum_{i=1}^{n} \xi_i$$
Subject to: ||xᵢ - a||² ≤ R² + ξᵢ and ξᵢ ≥ 0 for all i = 1, ..., n
The parameter C > 0 controls the trade-off between sphere volume and slack: a large C makes slack expensive, so the sphere grows to enclose nearly every training point (C ≥ 1 reproduces the hard margin), while a small C keeps the sphere small and lets more training points fall outside it.
Interpretation of ξᵢ:
For point xᵢ, the slack measures how far it protrudes beyond the sphere: ξᵢ = 0 if xᵢ lies inside or on the sphere, and ξᵢ = ||xᵢ - a||² - R² if it lies outside.
Dual Formulation:
Introducing Lagrange multipliers αᵢ ≥ 0 for the sphere constraints (and βᵢ ≥ 0 for the slacks) and applying the KKT conditions, we derive the dual:
$$\max_{\alpha} \sum_{i=1}^{n} \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j)$$
Subject to: 0 ≤ αᵢ ≤ C and Σαᵢ = 1
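These constraints fall out of the stationarity conditions of the primal Lagrangian

$$\mathcal{L} = R^2 + C\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\alpha_i\left(R^2 + \xi_i - \|x_i - a\|^2\right) - \sum_{i=1}^{n}\beta_i\xi_i,$$

where setting the partial derivatives to zero yields

$$\frac{\partial \mathcal{L}}{\partial R} = 0 \;\Rightarrow\; \sum_i \alpha_i = 1, \qquad \frac{\partial \mathcal{L}}{\partial a} = 0 \;\Rightarrow\; a = \sum_i \alpha_i x_i, \qquad \frac{\partial \mathcal{L}}{\partial \xi_i} = 0 \;\Rightarrow\; \alpha_i = C - \beta_i \le C.$$

The second condition is what lets us recover the center directly from the dual solution.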
Key Observations:
• αᵢ = 0: xᵢ lies strictly inside the sphere and does not influence the solution.
• 0 < αᵢ < C: xᵢ lies exactly on the boundary; these are the support vectors that determine R.
• αᵢ = C: xᵢ lies outside the sphere (ξᵢ > 0); it is a bounded support vector.
• The center is a convex combination of the support vectors: a = Σαᵢxᵢ (or a = Σαᵢφ(xᵢ) with kernels).
```python
import numpy as np
from scipy.optimize import minimize
from sklearn.base import BaseEstimator, OutlierMixin
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel


class SVDD(BaseEstimator, OutlierMixin):
    """
    Support Vector Data Description (SVDD) for anomaly detection.

    Finds a minimum-volume hypersphere enclosing the training data,
    with optional soft margin for outlier tolerance.

    Parameters:
    -----------
    C : float, default=1.0
        Regularization parameter. Larger C penalizes slack more heavily,
        so fewer training points are left outside the sphere
        (C >= 1 reproduces the hard margin).
    kernel : str, default='rbf'
        Kernel type: 'linear' or 'rbf'
    gamma : float, default='scale'
        Kernel coefficient for RBF
    """

    def __init__(self, C=1.0, kernel='rbf', gamma='scale'):
        self.C = C
        self.kernel = kernel
        self.gamma = gamma

    def _compute_kernel(self, X, Y=None):
        """Compute the kernel matrix between X and Y."""
        if Y is None:
            Y = X
        if self.kernel == 'linear':
            return linear_kernel(X, Y)
        elif self.kernel == 'rbf':
            return rbf_kernel(X, Y, gamma=self.gamma_)
        else:
            raise ValueError(f"Unknown kernel: {self.kernel}")

    def fit(self, X, y=None):
        """
        Fit the SVDD model.

        Solves the dual QP to find optimal alpha coefficients.
        """
        X = np.asarray(X)
        n_samples = X.shape[0]
        self.X_train_ = X

        # Set gamma for RBF kernel
        if self.gamma == 'scale':
            self.gamma_ = 1.0 / (X.shape[1] * X.var())
        else:
            self.gamma_ = self.gamma

        # Compute kernel matrix
        K = self._compute_kernel(X)

        # Diagonal elements: k(xi, xi)
        K_diag = np.diag(K)

        # Dual objective: maximize sum_i alpha_i * K_ii - sum_ij alpha_i * alpha_j * K_ij
        # Equivalent to: minimize sum_ij alpha_i * alpha_j * K_ij - sum_i alpha_i * K_ii
        def objective(alpha):
            return alpha @ K @ alpha - alpha @ K_diag

        def objective_grad(alpha):
            return 2 * K @ alpha - K_diag

        # Constraints: sum(alpha) = 1, 0 <= alpha <= C
        constraints = {'type': 'eq',
                       'fun': lambda a: np.sum(a) - 1,
                       'jac': lambda a: np.ones(n_samples)}
        bounds = [(0, self.C) for _ in range(n_samples)]

        # Initial guess: uniform distribution
        alpha_init = np.ones(n_samples) / n_samples

        # Solve the QP
        result = minimize(
            objective, alpha_init,
            method='SLSQP',
            jac=objective_grad,
            bounds=bounds,
            constraints=constraints,
            options={'maxiter': 1000, 'ftol': 1e-10}
        )
        self.alpha_ = result.x

        # Identify support vectors (alpha > small threshold)
        sv_threshold = 1e-6
        self.support_ = np.where(self.alpha_ > sv_threshold)[0]
        self.support_vectors_ = X[self.support_]
        self.alpha_sv_ = self.alpha_[self.support_]

        # Compute the center in feature space: a = sum_i alpha_i * phi(x_i)
        # For predictions, we need:
        #   ||phi(x) - a||^2 = k(x,x) - 2*sum_i alpha_i k(x,x_i) + sum_ij alpha_i alpha_j k(x_i,x_j)
        # The last term is constant: sum_ij alpha_i alpha_j K_ij
        self.center_norm_sq_ = self.alpha_ @ K @ self.alpha_

        # Compute radius from boundary support vectors (0 < alpha < C)
        boundary_sv = np.where((self.alpha_ > sv_threshold) &
                               (self.alpha_ < self.C - sv_threshold))[0]
        if len(boundary_sv) > 0:
            # Use any boundary SV to compute R^2
            i = boundary_sv[0]
            # ||phi(x_i) - a||^2 = K_ii - 2 * sum_j alpha_j K_ij + sum_jk alpha_j alpha_k K_jk
            self.R_squared_ = K_diag[i] - 2 * self.alpha_ @ K[i] + self.center_norm_sq_
        else:
            # No clear boundary SVs; use the maximum distance of SVs
            distances = np.array([
                K_diag[i] - 2 * self.alpha_ @ K[i] + self.center_norm_sq_
                for i in self.support_
            ])
            self.R_squared_ = np.max(distances)

        return self

    def decision_function(self, X):
        """
        Compute the anomaly score relative to the hypersphere boundary.

        Returns R^2 - ||phi(x) - a||^2
        Positive = inside sphere (normal), Negative = outside (anomaly)
        """
        X = np.asarray(X)

        # Kernel with training data
        K_test_train = self._compute_kernel(X, self.X_train_)

        # k(x, x) for test points
        if self.kernel == 'rbf':
            K_test_diag = np.ones(X.shape[0])    # RBF: k(x,x) = 1
        else:
            K_test_diag = np.sum(X * X, axis=1)  # Linear: k(x,x) = ||x||^2

        # ||phi(x) - a||^2 = k(x,x) - 2 * sum_i alpha_i k(x, x_i) + ||a||^2
        distances_sq = K_test_diag - 2 * K_test_train @ self.alpha_ + self.center_norm_sq_

        # Return R^2 - distance^2 (positive inside, negative outside)
        return self.R_squared_ - distances_sq

    def predict(self, X):
        """Predict if samples are normal (+1) or anomalies (-1)."""
        scores = self.decision_function(X)
        return np.where(scores >= 0, 1, -1)

    def fit_predict(self, X, y=None):
        """Fit and predict on the same data."""
        self.fit(X)
        return self.predict(X)


# Example usage
if __name__ == "__main__":
    from sklearn.datasets import make_blobs
    from sklearn.metrics import classification_report

    # Generate normal training data
    X_train, _ = make_blobs(n_samples=200, centers=[[0, 0]],
                            cluster_std=0.8, random_state=42)

    # Generate test data
    X_test_normal, _ = make_blobs(n_samples=50, centers=[[0, 0]],
                                  cluster_std=0.8, random_state=43)
    X_anomalies = np.random.uniform(-4, 4, size=(25, 2))
    X_test = np.vstack([X_test_normal, X_anomalies])
    y_test = np.array([1] * 50 + [-1] * 25)

    # Train SVDD
    print("Training SVDD...")
    svdd = SVDD(C=0.1, kernel='rbf', gamma='scale')
    svdd.fit(X_train)

    print(f"Number of support vectors: {len(svdd.support_)}")
    print(f"Radius squared: {svdd.R_squared_:.4f}")

    # Predict
    predictions = svdd.predict(X_test)

    print("\nClassification Report:")
    print(classification_report(y_test, predictions,
                                target_names=['Anomaly', 'Normal']))
```

Linear SVDD—fitting a sphere in the original input space—is limited to roughly spherical data distributions. Real data often has complex, non-spherical structure. Kernel SVDD addresses this by fitting a sphere in a high-dimensional feature space, which corresponds to a complex, nonlinear boundary in the original space.
The Kernel Trick in SVDD:
Replace all inner products xᵢ · xⱼ with kernel evaluations k(xᵢ, xⱼ) = φ(xᵢ) · φ(xⱼ):
$$\max_{\alpha} \sum_{i=1}^{n} \alpha_i k(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j)$$
Subject to: 0 ≤ αᵢ ≤ C, Σαᵢ = 1
What Happens in Feature Space:
With an RBF kernel, the data is mapped to an infinite-dimensional space. In this space, SVDD fits a hypersphere around the data. When projected back to the original space, this sphere becomes a complex, flexible boundary that can wrap around data of arbitrary shape.
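Concretely, the squared distance of a mapped point to the center a = Σᵢ αᵢ φ(xᵢ) is expressible purely through kernel evaluations—this is exactly what the decision_function in the implementation above computes:

$$\|\phi(x) - a\|^2 = k(x, x) - 2\sum_{i=1}^{n}\alpha_i k(x, x_i) + \sum_{i,j}\alpha_i\alpha_j k(x_i, x_j)$$

The final double sum depends only on the training data, so it is precomputed once during fitting.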
With an RBF kernel, k(x, x) = 1 for all x. This means:
||φ(x)||² = k(x, x) = 1
All points lie on the unit hypersphere in feature space! This has a profound implication: since every mapped point has the same norm, distances between points depend only on the angles between them, and enclosing the data in a minimum ball amounts to cutting the unit sphere with a hyperplane—exactly what One-Class SVM does.
The equivalence: separating data from the origin (OC-SVM) on the unit sphere is the same as finding the minimum enclosing spherical cap (SVDD on the sphere).
Equivalence Between SVDD and One-Class SVM:
For kernels where k(x, x) = constant (like RBF with k(x,x) = 1):
SVDD objective: max Σαᵢk(xᵢ,xᵢ) - Σαᵢαⱼk(xᵢ,xⱼ) = const - Σαᵢαⱼk(xᵢ,xⱼ), because Σαᵢ = 1 makes the first sum constant
One-Class SVM objective: min (1/2)Σαᵢαⱼk(xᵢ,xⱼ)
Both have constraints 0 ≤ αᵢ ≤ C', Σαᵢ = 1. The constant offset and the factor 1/2 don't affect the optimizer, and maximizing const - Σαᵢαⱼk(xᵢ,xⱼ) is the same as minimizing Σαᵢαⱼk(xᵢ,xⱼ), so both methods find the same α and hence the same decision boundary.
Practical Implication:
When using an RBF kernel, you can use either sklearn's OneClassSVM or a custom SVDD implementation—they produce equivalent results. The choice usually comes down to which implementation is available and which geometric interpretation you prefer.
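As a quick sanity check, here is a minimal sketch (assuming the SVDD class from the implementation above is importable via the same hypothetical module name used later, svdd_implementation) that fits both models with an RBF kernel, maps ν to C through C = 1/(νn), and measures how often their labels agree:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

from svdd_implementation import SVDD  # assumed: the class defined above

X, _ = make_blobs(n_samples=200, centers=[[0, 0]], cluster_std=0.8, random_state=42)
nu, gamma = 0.1, 0.5

ocsvm = OneClassSVM(kernel='rbf', nu=nu, gamma=gamma).fit(X)
svdd = SVDD(C=1.0 / (nu * len(X)), kernel='rbf', gamma=gamma).fit(X)

# Compare labels on points scattered across the plane; agreement is typically near 100%
X_probe = np.random.default_rng(0).uniform(-3, 3, size=(500, 2))
agreement = np.mean(ocsvm.predict(X_probe) == svdd.predict(X_probe))
print(f"Label agreement: {agreement:.1%}")
```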
| Scenario | SVDD Behavior | One-Class SVM Behavior | Preferred |
|---|---|---|---|
| RBF kernel | Spherical cap on unit sphere | Hyperplane separating from origin | Equivalent |
| Linear kernel | Sphere in input space | Half-space in input space | Depends on data shape |
| Polynomial kernel | Boundary depends on varying k(x,x) | Boundary does not depend on k(x,x) | Not equivalent |
| Compact, clustered data | Natural interpretation | Less intuitive | SVDD |
| Directional data spread | May need large sphere | More natural | One-Class SVM |
SVDD has two primary hyperparameters that significantly influence performance: the regularization parameter C and the kernel parameter γ (for RBF). Proper selection is crucial for effective anomaly detection.
The C Parameter:
C controls the trade-off between sphere volume and training error tolerance: small C permits many training points to fall outside the sphere (a tight boundary that treats part of the training set as outliers), while large C forces the sphere to cover nearly all of it; C ≥ 1 recovers the hard-margin solution, and 1/(nC) upper-bounds the fraction of training points that can end up outside.
Alternative Parameterization:
Some implementations use ν ∈ (0, 1] instead of C. ν has a direct interpretation: it is an upper bound on the fraction of training points that lie outside the sphere and a lower bound on the fraction of support vectors.
Setting C = 1/(νn) makes the two parameterizations equivalent.
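A tiny helper makes the conversion explicit (illustrative; nu_to_C is not a library function):

```python
def nu_to_C(nu: float, n_samples: int) -> float:
    """Map the nu parameterization to the SVDD regularization constant C = 1/(nu * n)."""
    return 1.0 / (nu * n_samples)

# Expecting ~5% of 200 training points to be outliers gives C = 0.1
print(nu_to_C(0.05, 200))  # 0.1
```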
The γ Parameter (RBF Kernel):
γ controls the kernel width, affecting how 'local' or 'global' the kernel is: large γ means narrow kernels and a tight, wiggly boundary that risks overfitting (in the extreme, every point becomes a support vector), while small γ means a smooth, nearly spherical boundary that risks underfitting.
Selection Strategies:
Grid search with proxy metrics: Since we lack labeled anomalies, use the fraction of training points flagged as outliers (it should match the expected contamination), the stability of predictions across random subsamples, and the plausibility of the learned radius—the criteria used in the tuning code below.
Heuristics: the median heuristic sets γ = 1/(2 · median pairwise squared distance), and C = 1/(n · expected contamination) ties the regularizer to the anticipated outlier fraction.
Domain knowledge: known contamination rates, the relative cost of false alarms versus missed anomalies, and physical limits on what counts as 'normal' can all narrow the search range.
```python
import numpy as np
from sklearn.model_selection import ParameterGrid
from sklearn.preprocessing import StandardScaler
import warnings


def tune_svdd_parameters(X, C_range, gamma_range,
                         target_outlier_fraction=0.1, n_splits=5):
    """
    Tune SVDD parameters using stability-based evaluation.

    Since we don't have labeled anomalies, we use:
    1. Training outlier fraction (should match expected contamination)
    2. Stability under cross-validation splits
    3. Radius reasonableness (not too large or collapsed)

    Parameters:
    -----------
    X : array-like of shape (n_samples, n_features)
        Training data
    C_range : array-like
        Values of C to try
    gamma_range : array-like
        Values of gamma to try (for RBF kernel)
    target_outlier_fraction : float
        Expected fraction of outliers in training data
    n_splits : int
        Number of splits for stability estimation

    Returns:
    --------
    best_params : dict
        Best (C, gamma) parameters
    results : list
        All evaluation results
    """
    from svdd_implementation import SVDD  # From previous code block

    X = np.asarray(X)
    n_samples = X.shape[0]

    # Scale data
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    results = []

    for params in ParameterGrid({'C': C_range, 'gamma': gamma_range}):
        C, gamma = params['C'], params['gamma']

        # Train on full data
        model = SVDD(C=C, kernel='rbf', gamma=gamma)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            try:
                model.fit(X_scaled)
            except Exception:
                continue

        # Metric 1: Training outlier fraction
        train_predictions = model.predict(X_scaled)
        train_outlier_fraction = np.mean(train_predictions == -1)

        # Metric 2: Deviation from target
        target_deviation = abs(train_outlier_fraction - target_outlier_fraction)

        # Metric 3: Support vector ratio
        sv_ratio = len(model.support_) / n_samples

        # Metric 4: Radius (should be positive and reasonable)
        radius = np.sqrt(max(0, model.R_squared_))

        # Metric 5: Stability under random splits
        stabilities = []
        for seed in range(n_splits):
            np.random.seed(seed)
            sample_idx = np.random.choice(n_samples, size=int(0.8 * n_samples),
                                          replace=False)
            X_sample = X_scaled[sample_idx]

            model_split = SVDD(C=C, kernel='rbf', gamma=gamma)
            try:
                model_split.fit(X_sample)
                pred_full = model.predict(X_scaled)
                pred_split = model_split.predict(X_scaled)
                agreement = np.mean(pred_full == pred_split)
                stabilities.append(agreement)
            except Exception:
                stabilities.append(0.5)
        stability = np.mean(stabilities)

        result = {
            'C': C,
            'gamma': gamma,
            'train_outlier_fraction': train_outlier_fraction,
            'target_deviation': target_deviation,
            'sv_ratio': sv_ratio,
            'radius': radius,
            'stability': stability,
            # Combined score: low deviation, high stability
            'score': -target_deviation + 0.5 * stability
        }
        results.append(result)

        print(f"C={C:.3f}, γ={gamma:.3f}: outlier={train_outlier_fraction:.2%}, "
              f"stability={stability:.2%}, radius={radius:.3f}")

    # Select best by combined score
    best = max(results, key=lambda r: r['score'])
    print(f"\nBest parameters: C={best['C']:.4f}, γ={best['gamma']:.4f}")
    print(f"  Outlier fraction: {best['train_outlier_fraction']:.2%}")
    print(f"  Stability: {best['stability']:.2%}")

    return {'C': best['C'], 'gamma': best['gamma']}, results


# Automated heuristic selection
def svdd_auto_parameters(X, expected_contamination=0.1):
    """
    Automatically select SVDD parameters using heuristics.

    Returns reasonable starting values for C and gamma.
    """
    from scipy.spatial.distance import pdist

    X = np.asarray(X)
    n_samples, n_features = X.shape

    # Gamma: median heuristic
    pairwise_dists = pdist(X, 'sqeuclidean')
    gamma = 1.0 / (2 * np.median(pairwise_dists))

    # C: based on expected contamination
    # C = 1/(n * nu) where nu ≈ expected_contamination
    C = 1.0 / (n_samples * expected_contamination)

    # Clamp to reasonable range
    C = np.clip(C, 0.001, 10.0)
    gamma = np.clip(gamma, 1e-6, 1e3)

    return {'C': C, 'gamma': gamma}


if __name__ == "__main__":
    from sklearn.datasets import make_blobs

    # Generate data with some outliers
    X_normal, _ = make_blobs(n_samples=180, centers=[[0, 0]],
                             cluster_std=1.0, random_state=42)
    X_outliers = np.random.uniform(-5, 5, size=(20, 2))
    X = np.vstack([X_normal, X_outliers])  # 10% contamination

    print("Automatic parameter selection:")
    auto_params = svdd_auto_parameters(X, expected_contamination=0.1)
    print(f"  C = {auto_params['C']:.4f}")
    print(f"  γ = {auto_params['gamma']:.4f}")

    print("\nGrid search tuning:")
    C_range = [0.01, 0.05, 0.1, 0.5, 1.0]
    gamma_range = [0.1, 0.5, 1.0, 2.0, 5.0]
    best_params, results = tune_svdd_parameters(X, C_range, gamma_range,
                                                target_outlier_fraction=0.1)
```

SVDD's geometric simplicity and interpretability make it particularly well-suited for certain anomaly detection scenarios. Understanding these use cases helps you choose between SVDD and alternative methods.
1. Machine Condition Monitoring:
In industrial settings, sensors continuously monitor machine vibrations, temperatures, and pressures. Training data comes from periods of known normal operation.
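A minimal sketch of this workflow (the sensor features here—vibration RMS and temperature—are synthetic and purely illustrative; it uses sklearn's OneClassSVM, which the RBF-kernel equivalence above makes interchangeable with SVDD):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Features collected during known-normal operation
X_healthy = rng.normal(loc=[0.5, 60.0], scale=[0.05, 2.0], size=(500, 2))

scaler = StandardScaler().fit(X_healthy)
detector = OneClassSVM(kernel='rbf', nu=0.01, gamma='scale')
detector.fit(scaler.transform(X_healthy))

# Score new measurement windows: -1 flags a departure from the normal envelope
X_new = np.array([[0.52, 61.0],   # healthy-looking reading
                  [0.95, 78.0]])  # drifted reading
print(detector.predict(scaler.transform(X_new)))  # e.g. [ 1 -1]
```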
2. Network Intrusion Detection:
Network traffic during normal operation is characterized (packet sizes, timing, protocols). SVDD learns the normal traffic envelope.
| Domain | Normal Class | Anomalies | Why SVDD Works |
|---|---|---|---|
| Manufacturing QC | Products within spec | Defective items | Compact normal region in measurement space |
| Medical imaging | Healthy tissue patterns | Lesions, tumors | Normal anatomy forms coherent clusters |
| Fraud detection | Legitimate transactions | Fraudulent activity | Normal behavior is routine/clustered |
| Novelty detection | Known object classes | Unknown objects | Training classes form bounded regions |
| Predictive maintenance | Healthy equipment signals | Early failure indicators | Degradation moves away from normal envelope |
Beyond anomaly detection, SVDD is used for one-class classification: problems where you have examples only from one class and want to recognize new instances of that class.
Examples:
• Face verification: Is this face 'me' or 'not me'?
• Document filtering: Is this document about 'topic X'?
• Biometric authentication: Is this the registered user?
The SVDD decision boundary represents 'what I know'; everything outside is 'unknown'.
Comparison with Other One-Class Methods:
When should you choose SVDD over alternatives?
Choose SVDD when: the normal class forms a compact, well-clustered region; you want an interpretable geometric boundary (a center and radius, with distance-to-center as the anomaly score); and the training set is small to moderate, since the kernel QP scales poorly with the number of samples.
Consider alternatives when: the data is very large or very high-dimensional (methods that scale better, such as the autoencoders covered next, are a better fit); you need calibrated probabilities rather than a hard boundary; or normal behavior drifts over time and the model must be retrained frequently.
We've comprehensively explored Support Vector Data Description, from geometric intuition to practical implementation. The essential insights: SVDD encloses normal data in a minimum-volume hypersphere; slack variables make the boundary robust to training outliers; kernels turn the sphere into a flexible nonlinear boundary; with an RBF kernel the method coincides with One-Class SVM; the dual is a compact QP over the α coefficients; and C and γ are best chosen using contamination targets, stability checks, and the median heuristic.
What's Next:
In the following page, we explore Autoencoders for Anomaly Detection—a fundamentally different approach that learns to reconstruct normal data and flags anomalies based on reconstruction error. This neural network approach scales to very high dimensions and can capture complex, hierarchical data structures that challenge kernel methods.
You now have a thorough understanding of SVDD for one-class classification and anomaly detection. You understand the minimum enclosing ball formulation, soft-margin extensions, kernel generalization, and the deep connection to One-Class SVM. You can implement SVDD, tune its hyperparameters, and apply it to real-world problems effectively.