While One-Class SVM separates data from the origin using a hyperplane, Support Vector Data Description (SVDD) takes an alternative geometric approach: it encloses the training data within a minimum-volume hypersphere. Points inside the sphere are classified as normal; points outside are anomalies.
Introduced by Tax and Duin in 1999, SVDD provides an intuitive geometric interpretation—we're literally drawing a 'ball' around the normal data in feature space. This perspective often aids understanding and debugging, particularly in visualization-friendly low-dimensional settings.
Despite the different geometric motivation, SVDD and One-Class SVM are closely related. With certain kernel choices, they yield equivalent decision boundaries. Understanding both gives you flexibility to choose the formulation that best matches your problem intuition and computational constraints.
By the end of this page, you will understand: (1) The minimum enclosing hypersphere formulation, (2) How soft-margin SVDD handles outliers in training data, (3) Kernel SVDD for nonlinear data descriptions, (4) The mathematical equivalence with One-Class SVM under RBF kernels, (5) The dual formulation and its efficient computation, and (6) When to prefer SVDD over One-Class SVM.
Imagine your normal training data as points scattered in d-dimensional space. SVDD asks: What is the smallest sphere that contains (most of) these points?
The sphere is characterized by two quantities: a center a (a point in the same d-dimensional space as the data) and a radius R ≥ 0.
A new point x is classified by its squared distance to the center: normal if ||x - a||² ≤ R², anomalous if ||x - a||² > R².
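Equivalently, as a decision function—the same score the implementation later on this page returns:

$$f(x) = \operatorname{sign}\!\left(R^2 - \|x - a\|^2\right)$$

with f(x) = +1 for normal points (inside or on the sphere) and f(x) = −1 for anomalies (outside).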
Why Minimum Volume?
A sphere with infinite radius would trivially enclose all data—but such a boundary wouldn't exclude any test point as anomalous. By minimizing the volume (equivalently, minimizing R²), we create a tight boundary that hugs the normal data, so points that deviate from it fall outside and are flagged. The table below contrasts this geometry with One-Class SVM's.
| Aspect | SVDD | One-Class SVM |
|---|---|---|
| Boundary shape | Hypersphere (ball) | Hyperplane (half-space) |
| Defined by | Center a and radius R | Normal vector w and offset ρ |
| Normal region | Inside the sphere | One side of the hyperplane |
| Geometric intuition | 'Ball around the data' | 'Separating from the void' |
| Natural for | Compact, clustered data | Data with directional spread |
| Decision function | R² - ||x - a||² (distance to center) | w·φ(x) - ρ (projection onto normal) |
Visualizing the Ball:
In 2D, SVDD finds a circle that encloses the training points. In 3D, it's a sphere. In higher dimensions, the geometry becomes harder to visualize, but the principle remains: a closed, convex region that tightly fits the normal data.
The boundary points—those exactly at distance R from center a—are the support vectors. Just like in regular SVMs, the solution depends only on these critical points, not the entire training set.
Finding the minimum enclosing ball is a classic computational geometry problem with O(n) expected-time algorithms in low dimensions. However, SVDD generalizes this with kernels (for nonlinear boundaries) and soft margins (for outlier tolerance), requiring the quadratic programming machinery of SVMs.
Hard-Margin SVDD:
Given training examples {x₁, ..., xₙ}, the hard-margin SVDD finds the minimum-radius hypersphere enclosing all points:
$$\min_{a, R} R^2$$
Subject to: ||xᵢ - a||² ≤ R² for all i = 1, ..., n
This formulation requires all training points to lie inside or on the sphere. The solution is uniquely determined by the convex hull of the data—specifically, by the points on the boundary of the minimum enclosing ball.
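To make the formulation concrete, here is a minimal sketch (illustrative only, with synthetic 2D data) that solves the hard-margin primal directly with scipy's SLSQP; the dual QP route used later scales better and admits kernels:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

def objective(params):
    # params = [a_1, a_2, R2]; we minimize the squared radius R2
    return params[2]

def inside_sphere(params):
    # One inequality per point: R2 - ||x_i - a||^2 >= 0
    a, R2 = params[:2], params[2]
    return R2 - np.sum((X - a) ** 2, axis=1)

# Feasible start: center at the mean, radius large enough to cover everything
a0 = X.mean(axis=0)
x0 = np.r_[a0, np.max(np.sum((X - a0) ** 2, axis=1))]

res = minimize(objective, x0, method='SLSQP',
               constraints={'type': 'ineq', 'fun': inside_sphere})
center, radius = res.x[:2], np.sqrt(res.x[2])
print(f"center = {center}, radius = {radius:.3f}")
```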
Problem with Hard Margin:
Real training data often contains outliers. A single anomaly in the training set forces the sphere to expand dramatically to include it, destroying the tight boundary we seek.
Soft-Margin SVDD:
To handle training outliers, we introduce slack variables ξᵢ ≥ 0 that allow points to lie outside the sphere:
$$\min_{a, R, \xi} R^2 + C \sum_{i=1}^{n} \xi_i$$
Subject to: ||xᵢ - a||² ≤ R² + ξᵢ and ξᵢ ≥ 0 for all i = 1, ..., n
The parameter C > 0 controls the trade-off between sphere volume and slack: a large C makes slack expensive, so the sphere grows to enclose nearly every training point (C ≥ 1 reproduces the hard margin), while a small C keeps the sphere small and lets more training points fall outside it.
Interpretation of ξᵢ:
For point xᵢ, the slack measures how far it protrudes beyond the sphere: ξᵢ = 0 if xᵢ lies inside or on the sphere, and ξᵢ = ||xᵢ - a||² - R² if it lies outside.
Dual Formulation:
Introducing Lagrange multipliers αᵢ ≥ 0 for the sphere constraints (and βᵢ ≥ 0 for the slacks) and applying the KKT conditions, we derive the dual:
$$\max_{\alpha} \sum_{i=1}^{n} \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j)$$
Subject to: 0 ≤ αᵢ ≤ C and Σαᵢ = 1
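These constraints fall out of the stationarity conditions of the primal Lagrangian

$$\mathcal{L} = R^2 + C\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\alpha_i\left(R^2 + \xi_i - \|x_i - a\|^2\right) - \sum_{i=1}^{n}\beta_i\xi_i,$$

where setting the partial derivatives to zero yields

$$\frac{\partial \mathcal{L}}{\partial R} = 0 \;\Rightarrow\; \sum_i \alpha_i = 1, \qquad \frac{\partial \mathcal{L}}{\partial a} = 0 \;\Rightarrow\; a = \sum_i \alpha_i x_i, \qquad \frac{\partial \mathcal{L}}{\partial \xi_i} = 0 \;\Rightarrow\; \alpha_i = C - \beta_i \le C.$$

The second condition is what lets us recover the center directly from the dual solution.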
Key Observations:
• αᵢ = 0: xᵢ lies strictly inside the sphere and does not influence the solution.
• 0 < αᵢ < C: xᵢ lies exactly on the boundary; these are the support vectors that determine R.
• αᵢ = C: xᵢ lies outside the sphere (ξᵢ > 0); it is a bounded support vector.
• The center is a convex combination of the support vectors: a = Σαᵢxᵢ (or a = Σαᵢφ(xᵢ) with kernels).
```python
import numpy as np
from scipy.optimize import minimize
from sklearn.base import BaseEstimator, OutlierMixin
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel


class SVDD(BaseEstimator, OutlierMixin):
    """
    Support Vector Data Description (SVDD) for anomaly detection.

    Finds a minimum-volume hypersphere enclosing the training data,
    with optional soft margin for outlier tolerance.

    Parameters:
    -----------
    C : float, default=1.0
        Regularization parameter. Larger C penalizes slack more heavily,
        so fewer training points are left outside the sphere
        (C >= 1 reproduces the hard margin).
    kernel : str, default='rbf'
        Kernel type: 'linear' or 'rbf'
    gamma : float, default='scale'
        Kernel coefficient for RBF
    """

    def __init__(self, C=1.0, kernel='rbf', gamma='scale'):
        self.C = C
        self.kernel = kernel
        self.gamma = gamma

    def _compute_kernel(self, X, Y=None):
        """Compute the kernel matrix between X and Y."""
        if Y is None:
            Y = X
        if self.kernel == 'linear':
            return linear_kernel(X, Y)
        elif self.kernel == 'rbf':
            return rbf_kernel(X, Y, gamma=self.gamma_)
        else:
            raise ValueError(f"Unknown kernel: {self.kernel}")

    def fit(self, X, y=None):
        """
        Fit the SVDD model.

        Solves the dual QP to find optimal alpha coefficients.
        """
        X = np.asarray(X)
        n_samples = X.shape[0]
        self.X_train_ = X

        # Set gamma for RBF kernel
        if self.gamma == 'scale':
            self.gamma_ = 1.0 / (X.shape[1] * X.var())
        else:
            self.gamma_ = self.gamma

        # Compute kernel matrix
        K = self._compute_kernel(X)

        # Diagonal elements: k(xi, xi)
        K_diag = np.diag(K)

        # Dual objective: maximize sum_i alpha_i * K_ii - sum_ij alpha_i * alpha_j * K_ij
        # Equivalent to: minimize sum_ij alpha_i * alpha_j * K_ij - sum_i alpha_i * K_ii
        def objective(alpha):
            return alpha @ K @ alpha - alpha @ K_diag

        def objective_grad(alpha):
            return 2 * K @ alpha - K_diag

        # Constraints: sum(alpha) = 1, 0 <= alpha <= C
        constraints = {'type': 'eq',
                       'fun': lambda a: np.sum(a) - 1,
                       'jac': lambda a: np.ones(n_samples)}
        bounds = [(0, self.C) for _ in range(n_samples)]

        # Initial guess: uniform distribution
        alpha_init = np.ones(n_samples) / n_samples

        # Solve the QP
        result = minimize(
            objective, alpha_init,
            method='SLSQP',
            jac=objective_grad,
            bounds=bounds,
            constraints=constraints,
            options={'maxiter': 1000, 'ftol': 1e-10}
        )
        self.alpha_ = result.x

        # Identify support vectors (alpha > small threshold)
        sv_threshold = 1e-6
        self.support_ = np.where(self.alpha_ > sv_threshold)[0]
        self.support_vectors_ = X[self.support_]
        self.alpha_sv_ = self.alpha_[self.support_]

        # Compute the center in feature space: a = sum_i alpha_i * phi(x_i)
        # For predictions, we need:
        #   ||phi(x) - a||^2 = k(x,x) - 2*sum_i alpha_i k(x,x_i) + sum_ij alpha_i alpha_j k(x_i,x_j)
        # The last term is constant: sum_ij alpha_i alpha_j K_ij
        self.center_norm_sq_ = self.alpha_ @ K @ self.alpha_

        # Compute radius from boundary support vectors (0 < alpha < C)
        boundary_sv = np.where((self.alpha_ > sv_threshold) &
                               (self.alpha_ < self.C - sv_threshold))[0]
        if len(boundary_sv) > 0:
            # Use any boundary SV to compute R^2
            i = boundary_sv[0]
            # ||phi(x_i) - a||^2 = K_ii - 2 * sum_j alpha_j K_ij + sum_jk alpha_j alpha_k K_jk
            self.R_squared_ = K_diag[i] - 2 * self.alpha_ @ K[i] + self.center_norm_sq_
        else:
            # No clear boundary SVs; use the maximum distance of SVs
            distances = np.array([
                K_diag[i] - 2 * self.alpha_ @ K[i] + self.center_norm_sq_
                for i in self.support_
            ])
            self.R_squared_ = np.max(distances)

        return self

    def decision_function(self, X):
        """
        Compute the anomaly score relative to the hypersphere boundary.

        Returns R^2 - ||phi(x) - a||^2
        Positive = inside sphere (normal), Negative = outside (anomaly)
        """
        X = np.asarray(X)

        # Kernel with training data
        K_test_train = self._compute_kernel(X, self.X_train_)

        # k(x, x) for test points
        if self.kernel == 'rbf':
            K_test_diag = np.ones(X.shape[0])    # RBF: k(x,x) = 1
        else:
            K_test_diag = np.sum(X * X, axis=1)  # Linear: k(x,x) = ||x||^2

        # ||phi(x) - a||^2 = k(x,x) - 2 * sum_i alpha_i k(x, x_i) + ||a||^2
        distances_sq = K_test_diag - 2 * K_test_train @ self.alpha_ + self.center_norm_sq_

        # Return R^2 - distance^2 (positive inside, negative outside)
        return self.R_squared_ - distances_sq

    def predict(self, X):
        """Predict if samples are normal (+1) or anomalies (-1)."""
        scores = self.decision_function(X)
        return np.where(scores >= 0, 1, -1)

    def fit_predict(self, X, y=None):
        """Fit and predict on the same data."""
        self.fit(X)
        return self.predict(X)


# Example usage
if __name__ == "__main__":
    from sklearn.datasets import make_blobs
    from sklearn.metrics import classification_report

    # Generate normal training data
    X_train, _ = make_blobs(n_samples=200, centers=[[0, 0]],
                            cluster_std=0.8, random_state=42)

    # Generate test data
    X_test_normal, _ = make_blobs(n_samples=50, centers=[[0, 0]],
                                  cluster_std=0.8, random_state=43)
    X_anomalies = np.random.uniform(-4, 4, size=(25, 2))
    X_test = np.vstack([X_test_normal, X_anomalies])
    y_test = np.array([1] * 50 + [-1] * 25)

    # Train SVDD
    print("Training SVDD...")
    svdd = SVDD(C=0.1, kernel='rbf', gamma='scale')
    svdd.fit(X_train)

    print(f"Number of support vectors: {len(svdd.support_)}")
    print(f"Radius squared: {svdd.R_squared_:.4f}")

    # Predict
    predictions = svdd.predict(X_test)

    print("\nClassification Report:")
    print(classification_report(y_test, predictions,
                                target_names=['Anomaly', 'Normal']))
```

Linear SVDD—fitting a sphere in the original input space—is limited to roughly spherical data distributions. Real data often has complex, non-spherical structure. Kernel SVDD addresses this by fitting a sphere in a high-dimensional feature space, which corresponds to a complex, nonlinear boundary in the original space.
The Kernel Trick in SVDD:
Replace all inner products xᵢ · xⱼ with kernel evaluations k(xᵢ, xⱼ) = φ(xᵢ) · φ(xⱼ):
$$\max_{\alpha} \sum_{i=1}^{n} \alpha_i k(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j k(x_i, x_j)$$
Subject to: 0 ≤ αᵢ ≤ C, Σαᵢ = 1
What Happens in Feature Space:
With an RBF kernel, the data is mapped to an infinite-dimensional space. In this space, SVDD fits a hypersphere around the data. When projected back to the original space, this sphere becomes a complex, flexible boundary that can wrap around data of arbitrary shape.
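Concretely, the squared distance of a mapped point to the center a = Σᵢ αᵢ φ(xᵢ) is expressible purely through kernel evaluations—this is exactly what the decision_function in the implementation above computes:

$$\|\phi(x) - a\|^2 = k(x, x) - 2\sum_{i=1}^{n}\alpha_i k(x, x_i) + \sum_{i,j}\alpha_i\alpha_j k(x_i, x_j)$$

The final double sum depends only on the training data, so it is precomputed once during fitting.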
With an RBF kernel, k(x, x) = 1 for all x. This means:
||φ(x)||² = k(x, x) = 1
All points lie on the unit hypersphere in feature space! This has a profound implication: since every mapped point has the same norm, distances between points depend only on the angles between them, and enclosing the data in a minimum ball amounts to cutting the unit sphere with a hyperplane—exactly what One-Class SVM does.
The equivalence: separating data from the origin (OC-SVM) on the unit sphere is the same as finding the minimum enclosing spherical cap (SVDD on the sphere).
Equivalence Between SVDD and One-Class SVM:
For kernels where k(x, x) = constant (like RBF with k(x,x) = 1):
SVDD objective: max Σαᵢk(xᵢ,xᵢ) - Σαᵢαⱼk(xᵢ,xⱼ) = const - Σαᵢαⱼk(xᵢ,xⱼ), because Σαᵢ = 1 makes the first sum constant
One-Class SVM objective: min (1/2)Σαᵢαⱼk(xᵢ,xⱼ)
Both have constraints 0 ≤ αᵢ ≤ C', Σαᵢ = 1. The constant offset and the factor 1/2 don't affect the optimizer, and maximizing const - Σαᵢαⱼk(xᵢ,xⱼ) is the same as minimizing Σαᵢαⱼk(xᵢ,xⱼ), so both methods find the same α and hence the same decision boundary.
Practical Implication:
When using an RBF kernel, you can use either sklearn's OneClassSVM or a custom SVDD implementation—they produce equivalent results. The choice usually comes down to which implementation is available and which geometric interpretation you prefer.
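As a quick sanity check, here is a minimal sketch (assuming the SVDD class from the implementation above is importable via the same hypothetical module name used later, svdd_implementation) that fits both models with an RBF kernel, maps ν to C through C = 1/(νn), and measures how often their labels agree:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

from svdd_implementation import SVDD  # assumed: the class defined above

X, _ = make_blobs(n_samples=200, centers=[[0, 0]], cluster_std=0.8, random_state=42)
nu, gamma = 0.1, 0.5

ocsvm = OneClassSVM(kernel='rbf', nu=nu, gamma=gamma).fit(X)
svdd = SVDD(C=1.0 / (nu * len(X)), kernel='rbf', gamma=gamma).fit(X)

# Compare labels on points scattered across the plane; agreement is typically near 100%
X_probe = np.random.default_rng(0).uniform(-3, 3, size=(500, 2))
agreement = np.mean(ocsvm.predict(X_probe) == svdd.predict(X_probe))
print(f"Label agreement: {agreement:.1%}")
```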
| Scenario | SVDD Behavior | One-Class SVM Behavior | Preferred |
|---|---|---|---|
| RBF kernel | Spherical cap on unit sphere | Hyperplane separating from origin | Equivalent |
| Linear kernel | Sphere in input space | Half-space in input space | Depends on data shape |
| Polynomial kernel | Boundary depends on varying k(x,x) | Boundary does not depend on k(x,x) | Not equivalent |
| Compact, clustered data | Natural interpretation | Less intuitive | SVDD |
| Directional data spread | May need large sphere | More natural | One-Class SVM |
SVDD has two primary hyperparameters that significantly influence performance: the regularization parameter C and the kernel parameter γ (for RBF). Proper selection is crucial for effective anomaly detection.
The C Parameter:
C controls the trade-off between sphere volume and training error tolerance: small C permits many training points to fall outside the sphere (a tight boundary that treats part of the training set as outliers), while large C forces the sphere to cover nearly all of it; C ≥ 1 recovers the hard-margin solution, and 1/(nC) upper-bounds the fraction of training points that can end up outside.
Alternative Parameterization:
Some implementations use ν ∈ (0, 1] instead of C. ν has a direct interpretation: it is an upper bound on the fraction of training points that lie outside the sphere and a lower bound on the fraction of support vectors.
Setting C = 1/(νn) makes the two parameterizations equivalent.
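A tiny helper makes the conversion explicit (illustrative; nu_to_C is not a library function):

```python
def nu_to_C(nu: float, n_samples: int) -> float:
    """Map the nu parameterization to the SVDD regularization constant C = 1/(nu * n)."""
    return 1.0 / (nu * n_samples)

# Expecting ~5% of 200 training points to be outliers gives C = 0.1
print(nu_to_C(0.05, 200))  # 0.1
```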
The γ Parameter (RBF Kernel):
γ controls the kernel width, affecting how 'local' or 'global' the kernel is: large γ means narrow kernels and a tight, wiggly boundary that risks overfitting (in the extreme, every point becomes a support vector), while small γ means a smooth, nearly spherical boundary that risks underfitting.
Selection Strategies:
Grid search with proxy metrics: Since we lack labeled anomalies, use the fraction of training points flagged as outliers (it should match the expected contamination), the stability of predictions across random subsamples, and the plausibility of the learned radius—the criteria used in the tuning code below.
Heuristics: the median heuristic sets γ = 1/(2 · median pairwise squared distance), and C = 1/(n · expected contamination) ties the regularizer to the anticipated outlier fraction.
Domain knowledge: known contamination rates, the relative cost of false alarms versus missed anomalies, and physical limits on what counts as 'normal' can all narrow the search range.
```python
import numpy as np
from sklearn.model_selection import ParameterGrid
from sklearn.preprocessing import StandardScaler
import warnings


def tune_svdd_parameters(X, C_range, gamma_range,
                         target_outlier_fraction=0.1, n_splits=5):
    """
    Tune SVDD parameters using stability-based evaluation.

    Since we don't have labeled anomalies, we use:
    1. Training outlier fraction (should match expected contamination)
    2. Stability under cross-validation splits
    3. Radius reasonableness (not too large or collapsed)

    Parameters:
    -----------
    X : array-like of shape (n_samples, n_features)
        Training data
    C_range : array-like
        Values of C to try
    gamma_range : array-like
        Values of gamma to try (for RBF kernel)
    target_outlier_fraction : float
        Expected fraction of outliers in training data
    n_splits : int
        Number of splits for stability estimation

    Returns:
    --------
    best_params : dict
        Best (C, gamma) parameters
    results : list
        All evaluation results
    """
    from svdd_implementation import SVDD  # From previous code block

    X = np.asarray(X)
    n_samples = X.shape[0]

    # Scale data
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    results = []

    for params in ParameterGrid({'C': C_range, 'gamma': gamma_range}):
        C, gamma = params['C'], params['gamma']

        # Train on full data
        model = SVDD(C=C, kernel='rbf', gamma=gamma)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            try:
                model.fit(X_scaled)
            except Exception:
                continue

        # Metric 1: Training outlier fraction
        train_predictions = model.predict(X_scaled)
        train_outlier_fraction = np.mean(train_predictions == -1)

        # Metric 2: Deviation from target
        target_deviation = abs(train_outlier_fraction - target_outlier_fraction)

        # Metric 3: Support vector ratio
        sv_ratio = len(model.support_) / n_samples

        # Metric 4: Radius (should be positive and reasonable)
        radius = np.sqrt(max(0, model.R_squared_))

        # Metric 5: Stability under random splits
        stabilities = []
        for seed in range(n_splits):
            np.random.seed(seed)
            sample_idx = np.random.choice(n_samples, size=int(0.8 * n_samples),
                                          replace=False)
            X_sample = X_scaled[sample_idx]

            model_split = SVDD(C=C, kernel='rbf', gamma=gamma)
            try:
                model_split.fit(X_sample)
                pred_full = model.predict(X_scaled)
                pred_split = model_split.predict(X_scaled)
                agreement = np.mean(pred_full == pred_split)
                stabilities.append(agreement)
            except Exception:
                stabilities.append(0.5)
        stability = np.mean(stabilities)

        result = {
            'C': C,
            'gamma': gamma,
            'train_outlier_fraction': train_outlier_fraction,
            'target_deviation': target_deviation,
            'sv_ratio': sv_ratio,
            'radius': radius,
            'stability': stability,
            # Combined score: low deviation, high stability
            'score': -target_deviation + 0.5 * stability
        }
        results.append(result)

        print(f"C={C:.3f}, γ={gamma:.3f}: outlier={train_outlier_fraction:.2%}, "
              f"stability={stability:.2%}, radius={radius:.3f}")

    # Select best by combined score
    best = max(results, key=lambda r: r['score'])
    print(f"\nBest parameters: C={best['C']:.4f}, γ={best['gamma']:.4f}")
    print(f"  Outlier fraction: {best['train_outlier_fraction']:.2%}")
    print(f"  Stability: {best['stability']:.2%}")

    return {'C': best['C'], 'gamma': best['gamma']}, results


# Automated heuristic selection
def svdd_auto_parameters(X, expected_contamination=0.1):
    """
    Automatically select SVDD parameters using heuristics.

    Returns reasonable starting values for C and gamma.
    """
    from scipy.spatial.distance import pdist

    X = np.asarray(X)
    n_samples, n_features = X.shape

    # Gamma: median heuristic
    pairwise_dists = pdist(X, 'sqeuclidean')
    gamma = 1.0 / (2 * np.median(pairwise_dists))

    # C: based on expected contamination
    # C = 1/(n * nu) where nu ≈ expected_contamination
    C = 1.0 / (n_samples * expected_contamination)

    # Clamp to reasonable range
    C = np.clip(C, 0.001, 10.0)
    gamma = np.clip(gamma, 1e-6, 1e3)

    return {'C': C, 'gamma': gamma}


if __name__ == "__main__":
    from sklearn.datasets import make_blobs

    # Generate data with some outliers
    X_normal, _ = make_blobs(n_samples=180, centers=[[0, 0]],
                             cluster_std=1.0, random_state=42)
    X_outliers = np.random.uniform(-5, 5, size=(20, 2))
    X = np.vstack([X_normal, X_outliers])  # 10% contamination

    print("Automatic parameter selection:")
    auto_params = svdd_auto_parameters(X, expected_contamination=0.1)
    print(f"  C = {auto_params['C']:.4f}")
    print(f"  γ = {auto_params['gamma']:.4f}")

    print("\nGrid search tuning:")
    C_range = [0.01, 0.05, 0.1, 0.5, 1.0]
    gamma_range = [0.1, 0.5, 1.0, 2.0, 5.0]
    best_params, results = tune_svdd_parameters(X, C_range, gamma_range,
                                                target_outlier_fraction=0.1)
```

SVDD's geometric simplicity and interpretability make it particularly well-suited for certain anomaly detection scenarios. Understanding these use cases helps you choose between SVDD and alternative methods.
1. Machine Condition Monitoring:
In industrial settings, sensors continuously monitor machine vibrations, temperatures, and pressures. Training data comes from periods of known normal operation.
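A minimal sketch of this workflow (the sensor features here—vibration RMS and temperature—are synthetic and purely illustrative; it uses sklearn's OneClassSVM, which the RBF-kernel equivalence above makes interchangeable with SVDD):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Features collected during known-normal operation
X_healthy = rng.normal(loc=[0.5, 60.0], scale=[0.05, 2.0], size=(500, 2))

scaler = StandardScaler().fit(X_healthy)
detector = OneClassSVM(kernel='rbf', nu=0.01, gamma='scale')
detector.fit(scaler.transform(X_healthy))

# Score new measurement windows: -1 flags a departure from the normal envelope
X_new = np.array([[0.52, 61.0],   # healthy-looking reading
                  [0.95, 78.0]])  # drifted reading
print(detector.predict(scaler.transform(X_new)))  # e.g. [ 1 -1]
```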
2. Network Intrusion Detection:
Network traffic during normal operation is characterized (packet sizes, timing, protocols). SVDD learns the normal traffic envelope.
| Domain | Normal Class | Anomalies | Why SVDD Works |
|---|---|---|---|
| Manufacturing QC | Products within spec | Defective items | Compact normal region in measurement space |
| Medical imaging | Healthy tissue patterns | Lesions, tumors | Normal anatomy forms coherent clusters |
| Fraud detection | Legitimate transactions | Fraudulent activity | Normal behavior is routine/clustered |
| Novelty detection | Known object classes | Unknown objects | Training classes form bounded regions |
| Predictive maintenance | Healthy equipment signals | Early failure indicators | Degradation moves away from normal envelope |
Beyond anomaly detection, SVDD is used for one-class classification: problems where you have examples only from one class and want to recognize new instances of that class.
Examples:
• Face verification: Is this face 'me' or 'not me'?
• Document filtering: Is this document about 'topic X'?
• Biometric authentication: Is this the registered user?
The SVDD decision boundary represents 'what I know'; everything outside is 'unknown'.
Comparison with Other One-Class Methods:
When should you choose SVDD over alternatives?
Choose SVDD when: the normal class forms a compact, well-clustered region; you want an interpretable geometric boundary (a center and radius, with distance-to-center as the anomaly score); and the training set is small to moderate, since the kernel QP scales poorly with the number of samples.
Consider alternatives when: the data is very large or very high-dimensional (methods that scale better, such as the autoencoders covered next, are a better fit); you need calibrated probabilities rather than a hard boundary; or normal behavior drifts over time and the model must be retrained frequently.
We've comprehensively explored Support Vector Data Description, from geometric intuition to practical implementation. The essential insights: SVDD encloses normal data in a minimum-volume hypersphere; slack variables make the boundary robust to training outliers; kernels turn the sphere into a flexible nonlinear boundary; with an RBF kernel the method coincides with One-Class SVM; the dual is a compact QP over the α coefficients; and C and γ are best chosen using contamination targets, stability checks, and the median heuristic.
What's Next:
In the following page, we explore Autoencoders for Anomaly Detection—a fundamentally different approach that learns to reconstruct normal data and flags anomalies based on reconstruction error. This neural network approach scales to very high dimensions and can capture complex, hierarchical data structures that challenge kernel methods.
You now have a thorough understanding of SVDD for one-class classification and anomaly detection. You understand the minimum enclosing ball formulation, soft-margin extensions, kernel generalization, and the deep connection to One-Class SVM. You can implement SVDD, tune its hyperparameters, and apply it to real-world problems effectively.