While autoencoders remain popular for anomaly detection, the deep learning toolkit offers many more powerful approaches. This page explores advanced neural network methods that address limitations of reconstruction-based approaches and push the boundaries of what's possible in anomaly detection.
We'll examine Deep SVDD and its variants, GAN-based detection (AnoGAN, GANomaly), self-supervised and pretrained representations, and hybrid architectures that pair deep features with classical detectors, closing with guidance on choosing among them.
These methods address scenarios where autoencoders struggle: complex high-dimensional data, multimodal distributions, and cases requiring fine-grained anomaly localization.
By the end of this page, you will understand: (1) Deep SVDD and its neural one-class formulation, (2) How GANs can be repurposed for anomaly detection, (3) Self-supervised pretext tasks that create anomaly-sensitive representations, (4) Hybrid architectures combining deep and classical methods, (5) Attention mechanisms for interpretable anomaly detection, and (6) How to choose among these methods for different scenarios.
Deep Support Vector Data Description (Deep SVDD) combines the geometric intuition of SVDD with the representation learning power of deep neural networks. Instead of working in a fixed kernel space, Deep SVDD learns a neural network mapping that places normal data close to a center while pushing anomalies away.
Core Idea:
Learn a neural network φ(x; W) such that normal data maps close to a center c in the output space:
$$\min_W \frac{1}{n} \sum_{i=1}^{n} ||\phi(x_i; W) - c||^2 + \frac{\lambda}{2} \sum_{l} ||W^l||_F^2$$
The network learns representations where normal data clusters tightly around c, while anomalies—having different characteristics—map further away.
Key Components: a neural feature map φ(·; W), a fixed center c in the output space, and weight decay on the network weights; the soft-boundary variant adds a learned radius R.
The Collapse Problem: a naive implementation can collapse into the trivial solution φ(x) = c for all inputs ("hypersphere collapse"), achieving zero loss while being useless for detection.
Preventions:
• Remove bias terms from the network layers
• Avoid bounded activations (tanh, sigmoid): saturated units can emulate bias terms and enable collapse
• Add auxiliary regularization tasks
• Initialize c as the mean of pretrained representations, away from the origin
• Use the soft-boundary variant with a learned radius
Accordingly, the Deep SVDD paper recommends a bias-free network with unbounded activations such as LeakyReLU.
Variants of Deep SVDD:
1. Soft-Boundary Deep SVDD:
Instead of a fixed objective, learn both the center and a data-enclosing hypersphere:
$$\min_{W, R} R^2 + \frac{1}{\nu n} \sum_{i=1}^{n} \max(0, ||\phi(x_i) - c||^2 - R^2) + \lambda ||W||^2$$
This mirrors classic soft-margin SVDD but with learned representations.
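A minimal sketch of how this variant could be implemented, assuming `z` is a batch of embeddings from a network like the one in the full example below; the radius update uses the common quantile heuristic (set R to the (1 - nu)-quantile of distances to the center every few epochs) rather than an exact line search, and the weight-decay term is left to the optimizer:

```python
import torch

def soft_boundary_loss(z, center, R, nu):
    """Soft-boundary Deep SVDD loss for a batch of embeddings z:
    R^2 + (1 / (nu * n)) * sum(max(0, ||z - c||^2 - R^2))."""
    dist_sq = torch.sum((z - center) ** 2, dim=1)
    penalty = torch.clamp(dist_sq - R ** 2, min=0)
    return R ** 2 + (1.0 / (nu * z.size(0))) * torch.sum(penalty)

def update_radius(z, center, nu):
    """Heuristic radius update: the (1 - nu)-quantile of distances to the center."""
    with torch.no_grad():
        dist = torch.sqrt(torch.sum((z - center) ** 2, dim=1))
        return torch.quantile(dist, 1 - nu)
```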
2. Deep SAD (Semi-Supervised):
When a few labeled anomalies are available, incorporate them:
$$\min_W \frac{1}{n+m} \sum_{i=1}^{n} ||\phi(x_i; W) - c||^2 + \frac{\eta}{n+m} \sum_{j=1}^{m} \left( ||\phi(\tilde{x}_j; W) - c||^2 \right)^{\tilde{y}_j} + \frac{\lambda}{2} \sum_{l} ||W^l||_F^2$$
where the first sum runs over the n unlabeled samples and the second over the m labeled samples (x̃ⱼ, ỹⱼ). Labeled normals (ỹⱼ = +1) are pulled toward the center as usual, while labeled anomalies (ỹⱼ = -1) have their distance term inverted, so minimizing the objective pushes them away from the center; η balances the two terms.
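A minimal sketch of the resulting loss term (without the weight-decay part), assuming `z_unlabeled` and `z_labeled` are embeddings from the same network and `y_labeled` contains +1 for labeled normal and -1 for labeled anomalous samples; the names are illustrative:

```python
import torch

def deep_sad_loss(z_unlabeled, z_labeled, y_labeled, center, eta=1.0, eps=1e-6):
    """Deep SAD objective: unlabeled samples are pulled toward the center;
    labeled anomalies (y = -1) get an inverted distance term, pushing them away."""
    n = z_unlabeled.size(0) + z_labeled.size(0)
    dist_u = torch.sum((z_unlabeled - center) ** 2, dim=1)
    dist_l = torch.sum((z_labeled - center) ** 2, dim=1)
    # (dist^2)^y: the distance itself for y = +1, its reciprocal for y = -1
    labeled_term = torch.where(y_labeled == 1, dist_l, 1.0 / (dist_l + eps))
    return dist_u.sum() / n + eta * labeled_term.sum() / n
```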
3. Contrastive Deep SVDD:
Use contrastive learning to create more discriminative representations before applying the one-class objective.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np


class DeepSVDDNetwork(nn.Module):
    """
    Feature extraction network for Deep SVDD.

    Following the original paper:
    - No bias terms to prevent trivial solutions
    - LeakyReLU activations
    - Batch normalization (without affine parameters) for stable training
    """
    def __init__(self, input_dim, hidden_dims=[128, 64], output_dim=32):
        super().__init__()
        layers = []
        prev_dim = input_dim
        for hidden_dim in hidden_dims:
            layers.append(nn.Linear(prev_dim, hidden_dim, bias=False))
            layers.append(nn.BatchNorm1d(hidden_dim, affine=False))
            layers.append(nn.LeakyReLU(0.1))
            prev_dim = hidden_dim
        # Final layer to output space
        layers.append(nn.Linear(prev_dim, output_dim, bias=False))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)


class DeepSVDD:
    """
    Deep Support Vector Data Description for anomaly detection.

    Learns a neural network that maps normal data close to a center c,
    while anomalies map further away.
    """
    def __init__(
        self,
        input_dim,
        hidden_dims=[128, 64],
        output_dim=32,
        nu=0.1,  # For soft-boundary variant / threshold quantile
        device='cuda' if torch.cuda.is_available() else 'cpu'
    ):
        self.device = device
        self.nu = nu
        self.hidden_dims = hidden_dims
        self.output_dim = output_dim

        # Feature extraction network
        self.net = DeepSVDDNetwork(input_dim, hidden_dims, output_dim).to(device)

        # Center will be initialized during training
        self.center = None
        self.R = None  # Radius for the soft-boundary variant

    def init_center(self, dataloader, eps=0.1):
        """
        Initialize center c as the mean of network outputs on training data.

        Components close to zero are shifted by a small epsilon to avoid a
        center at the origin (which could encourage collapse).
        """
        self.net.eval()
        outputs = []
        with torch.no_grad():
            for batch in dataloader:
                x = batch[0].to(self.device)
                z = self.net(x)
                outputs.append(z)
        outputs = torch.cat(outputs, dim=0)
        center = outputs.mean(dim=0)

        # Avoid center components being (close to) zero
        center[(torch.abs(center) < eps) & (center >= 0)] = eps
        center[(torch.abs(center) < eps) & (center < 0)] = -eps

        self.center = center.detach()
        return self.center

    def fit(self, X_train, epochs=100, batch_size=64, lr=1e-3,
            weight_decay=1e-6, pretrain_epochs=50, pretrain_ae=True):
        """
        Train Deep SVDD.

        Optionally pretrain as an autoencoder to initialize network weights.
        """
        from torch.utils.data import DataLoader, TensorDataset

        dataset = TensorDataset(torch.FloatTensor(X_train))
        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

        # Optional: pretrain as autoencoder
        if pretrain_ae and pretrain_epochs > 0:
            print("Pretraining as autoencoder...")
            self._pretrain_autoencoder(dataloader, pretrain_epochs, lr)

        # Initialize center from the pretrained/initial network
        print("Initializing center...")
        self.init_center(dataloader)

        # Main Deep SVDD training
        optimizer = torch.optim.Adam(
            self.net.parameters(), lr=lr, weight_decay=weight_decay
        )
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)

        print("Training Deep SVDD...")
        self.net.train()
        for epoch in range(epochs):
            total_loss = 0
            n_batches = 0
            for batch in dataloader:
                x = batch[0].to(self.device)

                # Forward pass
                z = self.net(x)

                # One-class loss: squared distance to center
                dist_sq = torch.sum((z - self.center) ** 2, dim=1)
                loss = torch.mean(dist_sq)

                # Backward pass
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                total_loss += loss.item()
                n_batches += 1

            scheduler.step()

            if epoch % 20 == 0:
                avg_loss = total_loss / n_batches
                print(f"Epoch {epoch}: loss={avg_loss:.6f}")

        return self

    def _pretrain_autoencoder(self, dataloader, epochs, lr):
        """Pretrain the network as an autoencoder for better initialization."""
        input_dim = next(iter(dataloader))[0].shape[1]

        # Build a decoder that mirrors the encoder's layer sizes
        dims = [self.output_dim] + list(reversed(self.hidden_dims))
        layers = []
        for in_dim, out_dim in zip(dims[:-1], dims[1:]):
            layers += [
                nn.Linear(in_dim, out_dim, bias=False),
                nn.BatchNorm1d(out_dim, affine=False),
                nn.LeakyReLU(0.1),
            ]
        layers.append(nn.Linear(dims[-1], input_dim, bias=False))
        decoder = nn.Sequential(*layers).to(self.device)

        params = list(self.net.parameters()) + list(decoder.parameters())
        optimizer = torch.optim.Adam(params, lr=lr)

        for epoch in range(epochs):
            for batch in dataloader:
                x = batch[0].to(self.device)
                z = self.net(x)
                x_recon = decoder(z)
                loss = F.mse_loss(x_recon, x)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

    def decision_function(self, X):
        """
        Compute anomaly scores (squared distance to center).
        Higher = more anomalous.
        """
        self.net.eval()
        X_tensor = torch.FloatTensor(X).to(self.device)
        with torch.no_grad():
            z = self.net(X_tensor)
            dist_sq = torch.sum((z - self.center) ** 2, dim=1)
        return dist_sq.cpu().numpy()

    def predict(self, X, threshold=None):
        """Predict normal (1) or anomaly (-1)."""
        scores = self.decision_function(X)
        if threshold is None:
            # Quantile-based threshold: flag the top nu fraction as anomalies
            threshold = np.percentile(scores, (1 - self.nu) * 100)
        return np.where(scores <= threshold, 1, -1)


# Example usage
if __name__ == "__main__":
    from sklearn.datasets import make_moons
    from sklearn.metrics import roc_auc_score, classification_report

    # Training data (normal)
    X_normal, _ = make_moons(n_samples=500, noise=0.05, random_state=42)
    X_normal = X_normal.astype(np.float32)

    # Test data
    X_test_normal, _ = make_moons(n_samples=100, noise=0.05, random_state=43)
    X_anomalies = np.random.uniform(-2, 3, size=(50, 2)).astype(np.float32)
    X_test = np.vstack([X_test_normal, X_anomalies])
    y_test = np.array([1] * 100 + [-1] * 50)

    # Train Deep SVDD
    model = DeepSVDD(input_dim=2, hidden_dims=[64, 32], output_dim=16, nu=0.1)
    model.fit(X_normal, epochs=100, pretrain_epochs=30)

    # Evaluate
    scores = model.decision_function(X_test)
    predictions = model.predict(X_test)

    print("\nClassification Report:")
    print(classification_report(y_test, predictions, target_names=['Anomaly', 'Normal']))
    print(f"AUROC: {roc_auc_score(y_test == -1, scores):.4f}")
```

Generative Adversarial Networks offer a fundamentally different paradigm for anomaly detection. Instead of learning to reconstruct data, GANs learn to generate it. The insight: if a GAN trained on normal data can't generate something similar to a test sample, that sample is likely anomalous.
GAN Architecture Recap:
$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$
Anomaly Detection Approaches with GANs:
| Method | Anomaly Score | Key Idea |
|---|---|---|
| AnoGAN | Reconstruction + Discriminator | Find z that best reconstructs input; anomalies can't be reconstructed |
| f-AnoGAN | Encoder + Reconstruction | Train encoder to map images to latent space directly (faster) |
| GANomaly | Latent + Reconstruction | Encoder-decoder-encoder architecture; measure latent consistency |
| Discriminator-only | D(x) score | Discriminator outputs anomaly probability directly |
| EGBAD | BiGAN framework | Bidirectional mapping between data and latent space |
AnoGAN: The Foundation
AnoGAN (Anomaly GAN) was the first major work on GAN-based anomaly detection. The core idea: train a GAN on normal data only; at test time, search the latent space for the code z* whose generated sample G(z*) best matches the input, and score the input by how poorly even this best match fits (a weighted sum of the reconstruction residual and a discriminator feature residual).
The Intuition:
A well-trained GAN's generator can only produce samples in the normal data distribution. If x is normal, there exists a z that generates something close to x. If x is anomalous, no z can generate it well.
Limitation: AnoGAN requires expensive iterative optimization at test time (finding z* via gradient descent). This makes inference slow.
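Because the full code example below implements GANomaly rather than AnoGAN, here is a minimal sketch of AnoGAN-style test-time scoring. It assumes a trained `generator` and a `discriminator` that returns `(validity, features)` as in the GANomaly code below, CPU tensors, and a `lam` weight on the discriminator feature residual:

```python
import torch

def anogan_score(x, generator, discriminator, latent_dim,
                 n_steps=500, lr=1e-2, lam=0.1):
    """AnoGAN-style scoring: find the z whose generation best matches x,
    then score x by how poor that best match is."""
    # Freeze the GAN; only the latent code z is optimized
    for p in list(generator.parameters()) + list(discriminator.parameters()):
        p.requires_grad_(False)
    z = torch.randn(x.size(0), latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    _, feat_real = discriminator(x)
    for _ in range(n_steps):
        optimizer.zero_grad()
        x_gen = generator(z)
        _, feat_gen = discriminator(x_gen)
        residual = torch.abs(x - x_gen).sum()               # reconstruction residual
        feat_loss = torch.abs(feat_real - feat_gen).sum()   # discriminator feature residual
        loss = (1 - lam) * residual + lam * feat_loss
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        x_gen = generator(z)
        _, feat_gen = discriminator(x_gen)
        per_sample = ((1 - lam) * torch.abs(x - x_gen).sum(dim=1)
                      + lam * torch.abs(feat_real - feat_gen).sum(dim=1))
    return per_sample.numpy()
```

The hundreds of gradient steps per test batch are exactly the inference cost that f-AnoGAN and GANomaly remove by training an encoder.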
```python
import torch
import torch.nn as nn
import numpy as np


class Encoder(nn.Module):
    """Encoder G_E: maps input to latent space."""
    def __init__(self, input_dim, latent_dim, hidden_dims=[128, 64]):
        super().__init__()
        layers = []
        prev_dim = input_dim
        for h_dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, h_dim),
                nn.BatchNorm1d(h_dim),
                nn.LeakyReLU(0.2)
            ])
            prev_dim = h_dim
        layers.append(nn.Linear(prev_dim, latent_dim))
        self.encoder = nn.Sequential(*layers)

    def forward(self, x):
        return self.encoder(x)


class Decoder(nn.Module):
    """Decoder G_D: maps latent code to reconstructed input."""
    def __init__(self, latent_dim, output_dim, hidden_dims=[64, 128]):
        super().__init__()
        layers = []
        prev_dim = latent_dim
        for h_dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, h_dim),
                nn.BatchNorm1d(h_dim),
                nn.LeakyReLU(0.2)
            ])
            prev_dim = h_dim
        layers.append(nn.Linear(prev_dim, output_dim))
        # Bound outputs to [-1, 1] (as in the original image-based GANomaly;
        # standardized tabular features can exceed this range)
        layers.append(nn.Tanh())
        self.decoder = nn.Sequential(*layers)

    def forward(self, z):
        return self.decoder(z)


class Discriminator(nn.Module):
    """Discriminator that also exposes intermediate features (for feature matching)."""
    def __init__(self, input_dim, hidden_dims=[128, 64]):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(input_dim, hidden_dims[0]),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dims[0], hidden_dims[1]),
            nn.LeakyReLU(0.2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dims[1], 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        features = self.features(x)
        validity = self.classifier(features)
        return validity, features


class GANomaly:
    """
    GANomaly for anomaly detection.

    Architecture: Encoder1 -> Decoder -> Encoder2
    Anomaly score: difference between Encoder1(x) and Encoder2(G(Encoder1(x)))

    The idea: for normal data, the two latent codes should be consistent.
    For anomalies, Encoder2 produces different codes than Encoder1.
    """
    def __init__(
        self,
        input_dim,
        latent_dim=32,
        hidden_dims=[128, 64],
        device='cuda' if torch.cuda.is_available() else 'cpu'
    ):
        self.device = device
        self.latent_dim = latent_dim

        # Generator: Encoder1 -> Decoder
        self.encoder1 = Encoder(input_dim, latent_dim, hidden_dims).to(device)
        self.decoder = Decoder(latent_dim, input_dim, list(reversed(hidden_dims))).to(device)

        # Second encoder: maps the reconstructed input back to latent space
        self.encoder2 = Encoder(input_dim, latent_dim, hidden_dims).to(device)

        # Discriminator
        self.discriminator = Discriminator(input_dim, hidden_dims).to(device)

        self.threshold = None

    def generator_forward(self, x):
        """Full generator pass: x -> z1 -> x_hat -> z2"""
        z1 = self.encoder1(x)
        x_hat = self.decoder(z1)
        z2 = self.encoder2(x_hat)
        return x_hat, z1, z2

    def fit(self, X_train, epochs=100, batch_size=64, lr=2e-4,
            w_adv=1.0, w_con=50.0, w_enc=1.0):
        """
        Train GANomaly.

        Loss components:
        - Adversarial: feature-matching loss against the discriminator
        - Contextual: reconstruction ||x - x_hat||
        - Encoder: latent consistency ||z1 - z2||
        """
        from torch.utils.data import DataLoader, TensorDataset

        # Standardize features (zero mean, unit variance)
        X_mean, X_std = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8
        X_normalized = (X_train - X_mean) / X_std
        self.X_mean, self.X_std = torch.FloatTensor(X_mean), torch.FloatTensor(X_std)

        dataset = TensorDataset(torch.FloatTensor(X_normalized))
        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

        # Optimizers
        gen_params = (list(self.encoder1.parameters())
                      + list(self.decoder.parameters())
                      + list(self.encoder2.parameters()))
        opt_g = torch.optim.Adam(gen_params, lr=lr, betas=(0.5, 0.999))
        opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=lr, betas=(0.5, 0.999))

        criterion_bce = nn.BCELoss()
        criterion_l1 = nn.L1Loss()
        criterion_l2 = nn.MSELoss()

        print("Training GANomaly...")
        for epoch in range(epochs):
            total_g_loss = 0
            total_d_loss = 0
            n_batches = 0

            for batch in dataloader:
                x_real = batch[0].to(self.device)
                batch_size_curr = x_real.size(0)

                # Labels
                real_label = torch.ones(batch_size_curr, 1, device=self.device)
                fake_label = torch.zeros(batch_size_curr, 1, device=self.device)

                # ---------------------
                #  Train Discriminator
                # ---------------------
                opt_d.zero_grad()

                # Real samples
                pred_real, _ = self.discriminator(x_real)
                loss_d_real = criterion_bce(pred_real, real_label)

                # Fake samples
                x_fake, _, _ = self.generator_forward(x_real)
                pred_fake, _ = self.discriminator(x_fake.detach())
                loss_d_fake = criterion_bce(pred_fake, fake_label)

                loss_d = (loss_d_real + loss_d_fake) / 2
                loss_d.backward()
                opt_d.step()

                # ---------------------
                #  Train Generator
                # ---------------------
                opt_g.zero_grad()

                x_hat, z1, z2 = self.generator_forward(x_real)

                # Adversarial loss (feature matching against the discriminator)
                _, feat_fake = self.discriminator(x_hat)
                _, feat_real = self.discriminator(x_real)
                loss_adv = criterion_l2(feat_fake, feat_real)

                # Contextual loss (reconstruction)
                loss_con = criterion_l1(x_hat, x_real)

                # Encoder loss (latent consistency)
                loss_enc = criterion_l2(z2, z1)

                # Total generator loss
                loss_g = w_adv * loss_adv + w_con * loss_con + w_enc * loss_enc
                loss_g.backward()
                opt_g.step()

                total_g_loss += loss_g.item()
                total_d_loss += loss_d.item()
                n_batches += 1

            if epoch % 20 == 0:
                print(f"Epoch {epoch}: G_loss={total_g_loss/n_batches:.4f}, "
                      f"D_loss={total_d_loss/n_batches:.4f}")

        # Calibrate threshold on the training data
        self.calibrate_threshold(X_train)
        return self

    def anomaly_score(self, x):
        """
        Compute anomaly score based on latent consistency.
        Higher = more anomalous.
        """
        x_norm = (x - self.X_mean.numpy()) / self.X_std.numpy()
        x_tensor = torch.FloatTensor(x_norm).to(self.device)

        self.encoder1.eval()
        self.decoder.eval()
        self.encoder2.eval()

        with torch.no_grad():
            _, z1, z2 = self.generator_forward(x_tensor)
            # Anomaly score: ||z1 - z2||^2
            score = torch.sum((z1 - z2) ** 2, dim=1)

        return score.cpu().numpy()

    def calibrate_threshold(self, X, percentile=95):
        """Set the threshold from training-data scores."""
        scores = self.anomaly_score(X)
        self.threshold = np.percentile(scores, percentile)
        print(f"Threshold: {self.threshold:.4f}")

    def predict(self, X):
        """Predict normal (1) or anomaly (-1)."""
        scores = self.anomaly_score(X)
        return np.where(scores <= self.threshold, 1, -1)

    def decision_function(self, X):
        """Return anomaly scores."""
        return self.anomaly_score(X)


# Example usage
if __name__ == "__main__":
    from sklearn.datasets import make_moons
    from sklearn.metrics import roc_auc_score, classification_report

    # Normal training data
    X_normal, _ = make_moons(n_samples=500, noise=0.05, random_state=42)
    X_normal = X_normal.astype(np.float32)

    # Test data
    X_test_normal, _ = make_moons(n_samples=100, noise=0.05, random_state=43)
    X_anomalies = np.random.uniform(-2, 3, size=(50, 2)).astype(np.float32)
    X_test = np.vstack([X_test_normal, X_anomalies])
    y_test = np.array([1] * 100 + [-1] * 50)

    # Train GANomaly
    model = GANomaly(input_dim=2, latent_dim=8, hidden_dims=[32, 16])
    model.fit(X_normal, epochs=100)

    # Evaluate
    scores = model.decision_function(X_test)
    predictions = model.predict(X_test)

    print("\nClassification Report:")
    print(classification_report(y_test, predictions, target_names=['Anomaly', 'Normal']))
    print(f"AUROC: {roc_auc_score(y_test == -1, scores):.4f}")
```

Self-supervised learning creates powerful representations by solving pretext tasks, artificial tasks that don't require labels but encourage learning useful features. For anomaly detection, the key insight is that models trained on pretext tasks for normal data will fail on anomalies.
Why Self-Supervised for Anomaly Detection? It needs no labels, matching the usual anomaly-detection setting, and the pretext task forces the model to internalize the structure of normal data: anomalies either fail the task or land in sparse regions of the learned representation space.
Common Pretext Tasks:
| Pretext Task | How It Works | Anomaly Signal |
|---|---|---|
| Rotation Prediction | Predict which rotation (0°, 90°, 180°, 270°) was applied | Anomalies have inconsistent rotation patterns |
| Jigsaw Puzzle | Predict correct arrangement of shuffled patches | Anomalous patterns break expected spatial relationships |
| Contrastive Learning | Distinguish between augmented views of same vs different images | Anomalies don't cluster with normal data in embedding space |
| Masked Prediction | Predict masked portions of input | Anomalies don't follow learned completion patterns |
| Transformation Classification | Identify which transformation was applied | Model uncertain about transformations of anomalies |
Contrastive Learning for Anomaly Detection:
Contrastive methods like SimCLR learn representations by pulling together augmented views of the same image while pushing apart different images:
$$\mathcal{L} = -\log \frac{\exp(sim(z_i, z_j)/\tau)}{\sum_{k \neq i} \exp(sim(z_i, z_k)/\tau)}$$
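A minimal sketch of this loss (often called NT-Xent), assuming `z1` and `z2` are the embeddings of two augmented views of the same batch; it is illustrative rather than a drop-in SimCLR implementation:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, tau=0.5):
    """Contrastive (NT-Xent) loss for two augmented views of the same batch.
    Positive pairs are (i, i + N); all other samples act as negatives."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # 2N x d, unit-norm rows
    sim = z @ z.t() / tau                                # cosine similarity / temperature
    sim.fill_diagonal_(float('-inf'))                    # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```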
For anomaly detection, train the encoder on normal data only, then score test samples by how far their embeddings fall from the normal data in the learned space.
Distribution-Based Approach:
Fit a simple density model (GMM, KDE) in the learned representation space. Density estimation is more tractable there than in the raw input space: the representation is lower-dimensional and groups semantically similar samples, so even simple models capture the normal distribution well.
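A minimal sketch of this distribution-based scoring, assuming a hypothetical `embed` function that maps raw inputs to the learned representation space:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_density_scorer(Z_train, n_components=5):
    """Fit a simple GMM in representation space; return a scoring function
    where higher score = more anomalous (negative log-likelihood)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type='full',
                          random_state=0).fit(Z_train)
    return lambda Z: -gmm.score_samples(Z)

# Usage sketch (embed is assumed to map raw inputs to representations):
# score = fit_density_scorer(embed(X_train_normal))
# anomaly_scores = score(embed(X_test))
```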
Modern foundation models (CLIP, DINO, MAE) provide powerful pretrained representations that can be directly used for anomaly detection:
This approach often outperforms training from scratch, especially with limited data. The pretrained representations have already learned rich semantic structure that distinguishes normal from anomalous patterns.
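As a sketch of this pretrained-features approach (assuming a recent torchvision with the `weights=` API, and using an ImageNet-pretrained ResNet-18 as a stand-in for a stronger foundation-model backbone), one could score images by their distance to normal training images in feature space:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights
from sklearn.neighbors import NearestNeighbors

# Pretrained backbone with the classification head removed
weights = ResNet18_Weights.DEFAULT
backbone = resnet18(weights=weights)
backbone.fc = nn.Identity()
backbone.eval()
preprocess = weights.transforms()

@torch.no_grad()
def embed(images):
    """images: list of PIL images -> (N, 512) feature matrix."""
    batch = torch.stack([preprocess(img) for img in images])
    return backbone(batch).numpy()

def fit_knn_scorer(normal_images, k=5):
    """Score = mean distance to the k nearest normal images in feature space."""
    nn_index = NearestNeighbors(n_neighbors=k).fit(embed(normal_images))
    def score(images):
        dists, _ = nn_index.kneighbors(embed(images))
        return dists.mean(axis=1)  # higher = more anomalous
    return score
```

Swapping in CLIP or DINO embeddings would only change the `embed` step; the nearest-neighbor scoring stays the same.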
Hybrid methods combine deep neural network feature extraction with classical anomaly detection algorithms. This leverages the representation learning power of deep networks while benefiting from the simplicity and interpretability of classical methods.
The Hybrid Paradigm:
Input x → Deep Feature Extractor → Features φ(x) → Classical Detector → Score
Examples: a pretrained CNN backbone feeding a k-NN or Mahalanobis scorer, or an autoencoder's latent codes feeding a One-Class SVM.
Advantages of Hybrid Approach: the deep extractor handles raw, high-dimensional inputs, while the classical detector is cheap to fit, easy to recalibrate without retraining the network, and simpler to interpret and tune.
Popular Hybrid Combinations:
1. PatchCore (Industrial Anomaly Detection): stores a coreset of pretrained CNN patch features from normal images and scores test patches by their distance to the nearest stored features, giving image-level scores and pixel-level localization.
2. Deep Features + Mahalanobis Distance: fit a Gaussian (mean and covariance) to deep features of normal data and score by Mahalanobis distance, as implemented in the code below.
3. Autoencoder Latent Space + One-Class SVM: train an autoencoder on normal data, then fit a One-Class SVM on its latent codes, also implemented below.
```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import OneClassSVM
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import LocalOutlierFactor
from scipy.spatial.distance import mahalanobis
from sklearn.covariance import EmpiricalCovariance


class HybridAnomalyDetector:
    """
    Hybrid anomaly detection: deep feature extraction + classical detection.

    Supports multiple classical backends:
    - One-Class SVM
    - Gaussian Mixture Model
    - Mahalanobis distance
    - Local Outlier Factor
    """
    def __init__(
        self,
        feature_extractor,
        detector_type='ocsvm',
        device='cuda' if torch.cuda.is_available() else 'cpu',
        **detector_kwargs
    ):
        """
        Parameters
        ----------
        feature_extractor : nn.Module
            Neural network that maps input to feature space
        detector_type : str
            'ocsvm', 'gmm', 'mahalanobis', or 'lof'
        detector_kwargs : dict
            Parameters for the classical detector
        """
        self.feature_extractor = feature_extractor.to(device)
        self.device = device
        self.detector_type = detector_type
        self.detector_kwargs = detector_kwargs

        self.detector = None
        self.threshold = None

        # For Mahalanobis
        self.mean = None
        self.cov_inv = None

    def extract_features(self, X):
        """Extract features using the neural network."""
        self.feature_extractor.eval()
        if isinstance(X, np.ndarray):
            X = torch.FloatTensor(X)
        X = X.to(self.device)
        with torch.no_grad():
            features = self.feature_extractor(X)
        return features.cpu().numpy()

    def fit(self, X):
        """
        Fit the hybrid detector:
        1. Extract features using the neural network
        2. Fit a classical detector on the features
        """
        print("Extracting features...")
        features = self.extract_features(X)

        print(f"Fitting {self.detector_type} on {features.shape[1]}-dim features...")

        if self.detector_type == 'ocsvm':
            self.detector = OneClassSVM(
                kernel='rbf',
                nu=self.detector_kwargs.get('nu', 0.1),
                gamma=self.detector_kwargs.get('gamma', 'scale')
            )
            self.detector.fit(features)

        elif self.detector_type == 'gmm':
            self.detector = GaussianMixture(
                n_components=self.detector_kwargs.get('n_components', 5),
                covariance_type='full',
                random_state=42
            )
            self.detector.fit(features)

        elif self.detector_type == 'mahalanobis':
            # Fit a Gaussian to the features (empirical mean and covariance)
            cov_estimator = EmpiricalCovariance().fit(features)
            self.mean = cov_estimator.location_
            self.cov_inv = np.linalg.pinv(cov_estimator.covariance_)

        elif self.detector_type == 'lof':
            self.detector = LocalOutlierFactor(
                n_neighbors=self.detector_kwargs.get('n_neighbors', 20),
                contamination=self.detector_kwargs.get('contamination', 0.1),
                novelty=True
            )
            self.detector.fit(features)

        # Calibrate threshold on the training scores
        scores = self._score_features(features)
        self.threshold = np.percentile(scores, 95)
        print(f"Threshold set at: {self.threshold:.4f}")

        return self

    def _score_features(self, features):
        """Score features using the classical detector (higher = more anomalous)."""
        if self.detector_type == 'ocsvm':
            # Negate so higher = more anomalous
            return -self.detector.decision_function(features)

        elif self.detector_type == 'gmm':
            # Negative log-likelihood
            return -self.detector.score_samples(features)

        elif self.detector_type == 'mahalanobis':
            # Mahalanobis distance to the mean
            return np.array([
                mahalanobis(f, self.mean, self.cov_inv)
                for f in features
            ])

        elif self.detector_type == 'lof':
            return -self.detector.decision_function(features)

    def decision_function(self, X):
        """Compute anomaly scores (higher = more anomalous)."""
        features = self.extract_features(X)
        return self._score_features(features)

    def predict(self, X):
        """Predict normal (1) or anomaly (-1)."""
        scores = self.decision_function(X)
        return np.where(scores <= self.threshold, 1, -1)


class PretrainedEncoder(nn.Module):
    """Example: pretrained autoencoder encoder as feature extractor."""
    def __init__(self, input_dim, hidden_dims=[128, 64], output_dim=32):
        super().__init__()
        layers = []
        prev_dim = input_dim
        for h_dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, h_dim),
                nn.ReLU(),
                nn.BatchNorm1d(h_dim)
            ])
            prev_dim = h_dim
        layers.append(nn.Linear(prev_dim, output_dim))
        self.encoder = nn.Sequential(*layers)

    def forward(self, x):
        return self.encoder(x)


# Example: pretrain the encoder, then use it with a classical detector
def pretrain_encoder(X_train, input_dim, latent_dim=32, epochs=50):
    """Pretrain an autoencoder and return just the encoder."""
    # Full autoencoder
    encoder = PretrainedEncoder(input_dim, [128, 64], latent_dim)
    decoder = nn.Sequential(
        nn.Linear(latent_dim, 64),
        nn.ReLU(),
        nn.Linear(64, 128),
        nn.ReLU(),
        nn.Linear(128, input_dim)
    )
    model = nn.Sequential(encoder, decoder)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    X_tensor = torch.FloatTensor(X_train)
    for epoch in range(epochs):
        model.train()
        x_recon = model(X_tensor)
        loss = nn.functional.mse_loss(x_recon, X_tensor)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if epoch % 10 == 0:
            print(f"Pretrain Epoch {epoch}: loss={loss.item():.6f}")

    return encoder


if __name__ == "__main__":
    from sklearn.datasets import make_moons
    from sklearn.metrics import roc_auc_score

    # Training data
    X_normal, _ = make_moons(n_samples=500, noise=0.05, random_state=42)
    X_normal = X_normal.astype(np.float32)

    # Test data
    X_test_normal, _ = make_moons(n_samples=100, noise=0.05, random_state=43)
    X_anomalies = np.random.uniform(-2, 3, size=(50, 2)).astype(np.float32)
    X_test = np.vstack([X_test_normal, X_anomalies])
    y_test = np.array([1] * 100 + [-1] * 50)

    # Pretrain the feature extractor
    print("Pretraining encoder...")
    encoder = pretrain_encoder(X_normal, input_dim=2, latent_dim=16, epochs=50)

    # Compare different classical backends
    for detector_type in ['ocsvm', 'gmm', 'mahalanobis']:
        print(f"\n{'=' * 50}")
        print(f"Testing: Deep Features + {detector_type.upper()}")
        print('=' * 50)

        detector = HybridAnomalyDetector(
            feature_extractor=encoder,
            detector_type=detector_type,
            nu=0.1
        )
        detector.fit(X_normal)

        scores = detector.decision_function(X_test)
        predictions = detector.predict(X_test)
        print(f"AUROC: {roc_auc_score(y_test == -1, scores):.4f}")
```

With many neural network approaches available, selecting the right one for your problem requires considering data characteristics, computational constraints, and interpretability requirements.
Decision Framework:
| Scenario | Recommended Approach | Rationale |
|---|---|---|
| Tabular data, moderate dimensionality | Deep SVDD or Dense Autoencoder | Simple architectures work well; interpretable latent space |
| Image data, pretrained models available | Hybrid: Pretrained features + Classical | Leverage rich pretrained representations |
| Image data, need localization | Autoencoder with pixel-wise error | Reconstruction error highlights anomalous regions |
| Sequence data (time series, logs) | LSTM Autoencoder or Transformer | Capture temporal dependencies |
| Limited data | Self-supervised pretrained + Simple detector | Foundation models reduce data requirements |
| Need probabilistic scores | VAE or Deep GMM | Principled uncertainty quantification |
| Fast inference required | Trained encoder + kNN or GMM | Feature extraction once; fast classical lookup |
| Some labeled anomalies available | Semi-supervised Deep SAD | Incorporates limited label information |
Start simple, add complexity as needed: begin with a pretrained-feature + classical-detector hybrid or a basic autoencoder as a baseline, move to Deep SVDD when you need a trained one-class representation, and reach for GAN-based methods only when the simpler approaches underperform.
Always validate on held-out normal data and, if possible, some known anomalies.
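For example, if a handful of known anomalies are available, candidate detectors can be compared on a held-out split in a few lines (a sketch; `detectors` is assumed to be a dict of fitted models exposing `decision_function` with higher scores meaning more anomalous, as in the examples above):

```python
from sklearn.metrics import roc_auc_score

def compare_detectors(detectors, X_val, y_val):
    """y_val: 1 for normal, -1 for anomaly."""
    for name, det in detectors.items():
        scores = det.decision_function(X_val)
        print(f"{name}: AUROC = {roc_auc_score(y_val == -1, scores):.3f}")
```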
We've explored the rich landscape of neural network approaches for anomaly detection, from principled one-class objectives to generative models and hybrid architectures.
What's Next:
In the final page of this module, we explore Threshold Selection—the often-overlooked but critical step of converting continuous anomaly scores into actionable decisions. We'll cover statistical methods, business-aligned thresholds, and dynamic adaptation strategies.
You now have a comprehensive understanding of neural network approaches for anomaly detection beyond autoencoders. You can implement Deep SVDD, GAN-based methods, leverage self-supervised representations, and design hybrid architectures. These tools prepare you to tackle complex, high-dimensional anomaly detection challenges across diverse domains.