Support Vector Machines, in their original formulation, are inherently binary classifiers. They find the optimal hyperplane separating two classes with maximum margin. Yet the real world rarely presents problems with only two outcomes. Medical diagnoses span dozens of conditions. Image classification encompasses thousands of categories. Document classification may involve hundreds of topics.
The fundamental question emerges: How do we extend the elegant mathematical framework of binary SVMs to handle K classes where K > 2?
This isn't merely an engineering inconvenience—it's a profound algorithmic challenge. The maximum margin principle, so elegantly defined for two classes, doesn't have an obvious generalization to multiple classes. The optimization problem changes fundamentally, and different decomposition strategies lead to different theoretical properties, computational requirements, and empirical performance characteristics.
By the end of this page, you will understand the One-vs-One (OvO) strategy in complete depth: its theoretical foundations, the construction of K(K-1)/2 pairwise classifiers, majority voting and its variants, handling of tied votes, computational complexity analysis, and when OvO outperforms alternative approaches.
Before diving into OvO specifically, let's crystallize why binary SVMs cannot directly handle multi-class problems and understand the design space of possible solutions.
The Binary SVM Formulation Recap:
For a binary classification problem with labels $y_i \in \{-1, +1\}$, the SVM optimization problem is:
$$\min_{\mathbf{w}, b} \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i$$
subject to: $$y_i(\mathbf{w}^\top \mathbf{x}_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0$$
This formulation fundamentally assumes two classes with opposite labels. The constraint structure, the margin definition, and the decision function $\text{sign}(\mathbf{w}^\top \mathbf{x} + b)$ all presuppose a binary world.
There is no natural way to extend the maximum margin principle to K classes directly. Unlike logistic regression (which generalizes to softmax) or neural networks (which use K output nodes), SVMs require explicit decomposition strategies to handle multi-class scenarios.
Two Fundamental Approaches to Multi-class SVM:
The machine learning community has developed two distinct paradigms:
Decomposition Methods (Indirect): Reduce the K-class problem to multiple binary problems, then aggregate binary decisions
All-at-Once Methods (Direct): Formulate a single optimization problem that simultaneously considers all K classes
This page focuses on One-vs-One, the most intuitive decomposition approach, which constructs pairwise binary classifiers for every pair of classes.
The One-vs-One (OvO) strategy, also known as pairwise classification or all-pairs, is perhaps the most intuitive approach to multi-class classification. The core idea is beautifully simple:
For every pair of classes, train a binary SVM that distinguishes between them.
If we have K classes labeled $\{1, 2, \ldots, K\}$, we construct $\binom{K}{2} = \frac{K(K-1)}{2}$ binary classifiers. Each classifier $f_{ij}$ is trained to distinguish class $i$ from class $j$ using only the training examples belonging to these two classes.
The OvO strategy was popularized for SVMs by Kreßel (1999) and extensively analyzed by Hsu and Lin (2002). It has become the default multi-class strategy in popular SVM implementations like LIBSVM.
Formal Construction:
Let $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^n$ be our training set with $y_i \in \{1, 2, \ldots, K\}$.
For each pair $(i, j)$ where $1 \leq i < j \leq K$:
Extract subset: $\mathcal{D}_{ij} = \{(\mathbf{x}, y) \in \mathcal{D} : y \in \{i, j\}\}$
Relabel: Map class $i$ to $+1$ and class $j$ to $-1$
Train binary SVM: Solve the standard SVM optimization on $\mathcal{D}_{ij}$, yielding the decision function: $$f_{ij}(\mathbf{x}) = \text{sign}(\mathbf{w}_{ij}^\top \mathbf{x} + b_{ij})$$
Store classifier: The classifier $f_{ij}$ returns $+1$ if it predicts class $i$ and $-1$ if it predicts class $j$. A sketch of the pair enumeration follows.
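To make the construction concrete, here is a minimal sketch (standard library only) that enumerates the pairs and the relabeling convention for a hypothetical K=4 problem:

```python
from itertools import combinations

classes = [1, 2, 3, 4]  # toy example with K = 4

# Enumerate all K(K-1)/2 = 6 pairs in canonical (i, j) order with i < j
for class_i, class_j in combinations(classes, 2):
    # Within each subproblem, class_i is relabeled +1 and class_j is relabeled -1
    print(f"Classifier f_{class_i}{class_j}: class {class_i} -> +1, class {class_j} -> -1")
```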
| Number of Classes (K) | Number of Classifiers K(K-1)/2 | Example Domain |
|---|---|---|
| 3 | 3 | Sentiment (Positive/Negative/Neutral) |
| 5 | 10 | Document Categories |
| 10 | 45 | Digit Recognition (0-9) |
| 26 | 325 | Letter Recognition |
| 100 | 4,950 | Fine-grained Classification |
| 1000 | 499,500 | Large-scale Image Classification |
The Quadratic Growth Problem:
The number of classifiers grows quadratically with K. For K=1000 classes, we need nearly half a million binary classifiers! This growth has significant implications for training orchestration, memory footprint, and prediction latency.
However, there's a crucial offsetting factor: each binary classifier is trained on a much smaller subset of the data. If class sizes are roughly balanced, each classifier sees approximately $\frac{2n}{K}$ training examples instead of $n$.
```python
import numpy as np
from itertools import combinations
from typing import List, Tuple, Dict


class OneVsOneSVM:
    """
    One-vs-One Multi-class SVM implementation.

    This implementation demonstrates the construction and training
    of K(K-1)/2 pairwise binary classifiers.
    """

    def __init__(self, binary_svm_class, **svm_params):
        """
        Initialize OvO classifier.

        Parameters:
        -----------
        binary_svm_class : class
            A binary SVM class with fit(X, y) and predict(X) methods
        svm_params : dict
            Parameters to pass to each binary SVM
        """
        self.binary_svm_class = binary_svm_class
        self.svm_params = svm_params
        self.classifiers: Dict[Tuple[int, int], object] = {}
        self.classes_: np.ndarray = None

    def fit(self, X: np.ndarray, y: np.ndarray) -> 'OneVsOneSVM':
        """
        Train all K(K-1)/2 pairwise classifiers.

        Parameters:
        -----------
        X : array of shape (n_samples, n_features)
            Training vectors
        y : array of shape (n_samples,)
            Target values (class labels)

        Returns:
        --------
        self : OneVsOneSVM
            Fitted classifier
        """
        self.classes_ = np.unique(y)
        n_classes = len(self.classes_)

        print(f"Training {n_classes * (n_classes - 1) // 2} "
              f"pairwise classifiers for {n_classes} classes...")

        # Train a classifier for each pair of classes
        for (class_i, class_j) in combinations(self.classes_, 2):
            # Extract samples belonging to class_i or class_j
            mask = (y == class_i) | (y == class_j)
            X_pair = X[mask]
            y_pair = y[mask]

            # Relabel: class_i -> +1, class_j -> -1
            y_binary = np.where(y_pair == class_i, 1, -1)

            # Train binary SVM on this pair
            clf = self.binary_svm_class(**self.svm_params)
            clf.fit(X_pair, y_binary)

            # Store with (smaller_class, larger_class) key
            self.classifiers[(class_i, class_j)] = clf

            print(f"  Trained classifier for classes {class_i} vs {class_j}: "
                  f"{len(y_binary)} samples")

        return self

    def get_num_classifiers(self) -> int:
        """Return the number of pairwise classifiers."""
        return len(self.classifiers)

    def get_classifier(self, class_i: int, class_j: int):
        """
        Retrieve the classifier for a specific pair of classes.

        Parameters:
        -----------
        class_i, class_j : int
            Class labels (order doesn't matter)

        Returns:
        --------
        classifier : object
            The binary SVM for this pair
        """
        # Ensure canonical ordering
        key = (min(class_i, class_j), max(class_i, class_j))
        return self.classifiers.get(key)
```

Training pairwise classifiers is straightforward, but prediction is where the real algorithmic challenge lies. Given a new test point $\mathbf{x}$, we must aggregate the decisions of $\frac{K(K-1)}{2}$ binary classifiers to produce a single class prediction.
The most common aggregation strategy is majority voting (also called "max-wins"):
Algorithm: Majority Voting
Initialize a vote counter $v_k = 0$ for each class $k \in \{1, \ldots, K\}$
For each classifier $f_{ij}$ (where $i < j$): if $f_{ij}(\mathbf{x}) = +1$, add one vote to $v_i$; otherwise add one vote to $v_j$
Predict: $\hat{y} = \arg\max_k v_k$
Each class participates in exactly K-1 pairwise comparisons (one against each other class). The vote count for class k represents how many other classes the point 'defeated' in those comparisons. A class that wins all of its pairwise comparisons receives the maximum of K-1 votes.
Mathematical Properties of Voting:
Total votes cast: Each of the $\frac{K(K-1)}{2}$ classifiers casts one vote, so total votes = $\frac{K(K-1)}{2}$
Votes per class opportunity: Each class appears in $K-1$ pairwise comparisons
Maximum possible votes: A class can receive at most $K-1$ votes
Minimum votes to guarantee a win: $K-1$. If a class wins all of its comparisons, every other class has lost at least one comparison and can collect at most $K-2$ votes; with fewer than $K-1$ votes, ties are possible
Example with K=4 classes:
Classifiers: $f_{12}, f_{13}, f_{14}, f_{23}, f_{24}, f_{34}$ (6 classifiers)
Suppose for test point $\mathbf{x}$ the pairwise winners are: $f_{12} \to 1$, $f_{13} \to 1$, $f_{14} \to 4$, $f_{23} \to 3$, $f_{24} \to 4$, $f_{34} \to 4$
Vote counts: $v_1 = 2, v_2 = 0, v_3 = 1, v_4 = 3$
Prediction: Class 4 (with 3 votes)
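As a quick sanity check, this minimal sketch tallies the votes for the assumed pairwise outcomes above:

```python
# Pairwise winners for the K=4 example above (assumed outcomes)
outcomes = {(1, 2): 1, (1, 3): 1, (1, 4): 4, (2, 3): 3, (2, 4): 4, (3, 4): 4}

votes = {k: 0 for k in [1, 2, 3, 4]}
for winner in outcomes.values():
    votes[winner] += 1

print(votes)                      # {1: 2, 2: 0, 3: 1, 4: 3}
print(max(votes, key=votes.get))  # 4
```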
```python
# Continuation of the OneVsOneSVM class defined above

def predict(self, X: np.ndarray) -> np.ndarray:
    """
    Predict class labels using majority voting.

    Parameters:
    -----------
    X : array of shape (n_samples, n_features)
        Test vectors

    Returns:
    --------
    y_pred : array of shape (n_samples,)
        Predicted class labels
    """
    n_samples = X.shape[0]
    n_classes = len(self.classes_)

    # Vote matrix: votes[i, k] = votes for class k for sample i
    votes = np.zeros((n_samples, n_classes), dtype=np.int32)

    # Map class labels to indices
    class_to_idx = {c: i for i, c in enumerate(self.classes_)}

    # Collect votes from all pairwise classifiers
    for (class_i, class_j), clf in self.classifiers.items():
        # Get predictions: +1 means class_i, -1 means class_j
        predictions = clf.predict(X)

        idx_i = class_to_idx[class_i]
        idx_j = class_to_idx[class_j]

        # Count votes
        votes[predictions == 1, idx_i] += 1   # Vote for class_i
        votes[predictions == -1, idx_j] += 1  # Vote for class_j

    # Return class with maximum votes
    winner_indices = np.argmax(votes, axis=1)
    return self.classes_[winner_indices]


def predict_with_votes(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """
    Predict class labels and return vote counts.

    Useful for understanding classifier confidence.

    Returns:
    --------
    y_pred : array of shape (n_samples,)
        Predicted class labels
    votes : array of shape (n_samples, n_classes)
        Vote counts for each class
    """
    n_samples = X.shape[0]
    n_classes = len(self.classes_)
    votes = np.zeros((n_samples, n_classes), dtype=np.int32)
    class_to_idx = {c: i for i, c in enumerate(self.classes_)}

    for (class_i, class_j), clf in self.classifiers.items():
        predictions = clf.predict(X)
        idx_i = class_to_idx[class_i]
        idx_j = class_to_idx[class_j]
        votes[predictions == 1, idx_i] += 1
        votes[predictions == -1, idx_j] += 1

    winner_indices = np.argmax(votes, axis=1)
    y_pred = self.classes_[winner_indices]
    return y_pred, votes
```

A critical issue with majority voting is the possibility of ties: situations where multiple classes receive the same (maximum) number of votes. This is not a rare edge case; ties occur regularly in practice, especially when the number of classes is small, when class distributions overlap heavily, or when pairwise outcomes form cycles (as described below).
When do ties occur mathematically?
For K classes, the maximum votes any class can receive is K-1. A tie at the maximum occurs when two or more classes each receive the same top vote count.
For K=3 classes (3 classifiers), ties occur whenever no class wins all its pairwise comparisons. For example, if Class 1 beats Class 2, Class 2 beats Class 3, and Class 3 beats Class 1 (a cyclic pattern), each class has exactly 1 vote—a three-way tie!
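A minimal sketch of this cyclic case, assuming the three pairwise outcomes just described:

```python
# Cyclic outcomes: class 1 beats 2, class 2 beats 3, class 3 beats 1
outcomes = {(1, 2): 1, (2, 3): 2, (1, 3): 3}

votes = {k: 0 for k in [1, 2, 3]}
for winner in outcomes.values():
    votes[winner] += 1

print(votes)  # {1: 1, 2: 1, 3: 1} -- a three-way tie
```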
Tie-Breaking Strategies:
Several approaches have been proposed to resolve voting ties:
1. Arbitrary Selection (Default) Simply pick the first class among those tied (or the one with smallest index). This is fast but can introduce systematic bias.
2. Random Selection Randomly choose among tied classes. Unbiased but non-deterministic, which can cause issues in production systems.
3. Distance-Based Resolution Among tied classes, pick the one whose pairwise classifiers have the largest total margin (sum of $|\mathbf{w}_{ij}^\top \mathbf{x} + b_{ij}|$ for classifiers involving that class).
4. Confidence-Weighted Resolution Weight each vote by the classifier's confidence (distance from the hyperplane) rather than using binary votes.
5. Prior-Based Resolution Among tied classes, pick the one with highest prior probability (most training examples).
```python
# Continuation of the OneVsOneSVM class defined above

def predict_with_tie_breaking(
    self,
    X: np.ndarray,
    strategy: str = 'confidence'
) -> np.ndarray:
    """
    Predict with sophisticated tie-breaking strategies.

    Parameters:
    -----------
    X : array of shape (n_samples, n_features)
        Test vectors
    strategy : str
        Tie-breaking strategy: 'first', 'random', 'confidence', 'prior'

    Returns:
    --------
    y_pred : array of shape (n_samples,)
        Predicted class labels
    """
    n_samples = X.shape[0]
    n_classes = len(self.classes_)
    class_to_idx = {c: i for i, c in enumerate(self.classes_)}

    # Collect votes
    votes = np.zeros((n_samples, n_classes), dtype=np.int32)

    # For confidence-based tie-breaking, also track distances
    if strategy == 'confidence':
        confidence_sums = np.zeros((n_samples, n_classes))

    for (class_i, class_j), clf in self.classifiers.items():
        # Get decision values (distances from hyperplane)
        if hasattr(clf, 'decision_function'):
            distances = clf.decision_function(X)
            predictions = np.sign(distances)
        else:
            predictions = clf.predict(X)
            distances = predictions  # Fallback: hard labels as pseudo-distances

        idx_i = class_to_idx[class_i]
        idx_j = class_to_idx[class_j]

        # Count votes
        votes[predictions >= 0, idx_i] += 1
        votes[predictions < 0, idx_j] += 1

        # Accumulate confidence for tie-breaking
        if strategy == 'confidence':
            abs_dist = np.abs(distances)
            confidence_sums[predictions >= 0, idx_i] += abs_dist[predictions >= 0]
            confidence_sums[predictions < 0, idx_j] += abs_dist[predictions < 0]

    # Determine winners, handling ties
    y_pred = np.zeros(n_samples, dtype=self.classes_.dtype)

    for i in range(n_samples):
        max_votes = votes[i].max()
        tied_idx = np.where(votes[i] == max_votes)[0]

        if len(tied_idx) == 1:
            # No tie - clear winner
            y_pred[i] = self.classes_[tied_idx[0]]
        else:
            # Tie - apply strategy
            if strategy == 'first':
                # Pick first (smallest index)
                winner_idx = tied_idx[0]
            elif strategy == 'random':
                # Random selection
                winner_idx = np.random.choice(tied_idx)
            elif strategy == 'confidence':
                # Pick highest confidence among tied classes
                tied_confidences = confidence_sums[i, tied_idx]
                winner_idx = tied_idx[np.argmax(tied_confidences)]
            elif strategy == 'prior':
                # Pick class with most training examples
                # (requires populating self.class_counts_ during fit,
                #  e.g. from np.unique(y, return_counts=True))
                tied_counts = [self.class_counts_.get(self.classes_[j], 0)
                               for j in tied_idx]
                winner_idx = tied_idx[np.argmax(tied_counts)]
            else:
                raise ValueError(f"Unknown strategy: {strategy}")

            y_pred[i] = self.classes_[winner_idx]

    return y_pred
```

Confidence-Weighted Voting:
A more sophisticated approach abandons hard votes entirely, using soft votes based on classifier confidence. Instead of counting +1 for the winner, we add the distance from the hyperplane:
$$v_k = \sum_{j \neq k} |\mathbf{w}_{kj}^\top \mathbf{x} + b_{kj}| \cdot \mathbb{1}[f_{kj}(\mathbf{x}) \text{ favors class } k]$$
This naturally breaks ties (exact equality in continuous-valued sums is probability zero) and gives more weight to confident decisions. A classifier that predicts class $i$ over class $j$ with margin 2.5 contributes more than one predicting with margin 0.1.
Confidence-weighted voting generally outperforms hard voting in practice. It naturally handles ties, weights reliable classifiers more heavily, and can provide calibrated probability estimates when properly normalized.
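As an illustration, here is a minimal sketch of a soft-voting predict method for the OneVsOneSVM class above. It assumes each pairwise classifier exposes a decision_function method returning signed distances (an assumption; the hard-voting code above does not require it):

```python
def predict_soft(self, X: np.ndarray) -> np.ndarray:
    """Predict via confidence-weighted (soft) voting.

    Each pairwise classifier adds |decision value| to the score of the
    class it favors, instead of casting a hard +1 vote.
    """
    n_samples = X.shape[0]
    n_classes = len(self.classes_)
    scores = np.zeros((n_samples, n_classes))
    class_to_idx = {c: i for i, c in enumerate(self.classes_)}

    for (class_i, class_j), clf in self.classifiers.items():
        d = clf.decision_function(X)  # signed distance; d >= 0 favors class_i
        idx_i, idx_j = class_to_idx[class_i], class_to_idx[class_j]
        scores[d >= 0, idx_i] += np.abs(d[d >= 0])
        scores[d < 0, idx_j] += np.abs(d[d < 0])

    return self.classes_[np.argmax(scores, axis=1)]
```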
Understanding the computational complexity of One-vs-One is crucial for practical deployment. Let's analyze both training and prediction phases in detail.
Notation: $n$ = total number of training samples, $K$ = number of classes, $d$ = feature dimensionality.
| Aspect | One-vs-One (OvO) | One-vs-All (OvA) | Crammer-Singer |
|---|---|---|---|
| Number of subproblems | $K(K-1)/2$ | $K$ | $1$ |
| Samples per subproblem | $\approx 2n/K$ | $n$ | $n$ |
| Total training complexity* | $O(n^2 d)$ | $O(K \cdot n^2 d)$ | $O(K \cdot n^2 d)$ |
| Parallelizability | Excellent | Excellent | Limited |

*Assuming balanced classes and a solver that is quadratic in the number of training samples; see the derivation below.
The training complexity analysis reveals a surprising result: despite having K(K-1)/2 classifiers vs K classifiers in OvA, the total work is often comparable or even less for OvO! This is because each OvO classifier trains on ~2n/K samples, making SVM training (which is superlinear in n) much faster per classifier.
Detailed Training Analysis:
For a single binary SVM trained on $m$ samples with $d$ features, training complexity depends on the solver; SMO-style algorithms typically scale between $O(m^2 d)$ and $O(m^3 d)$ in practice. Assuming quadratic complexity $O(m^2 d)$:
OvO Training: $$T_{OvO} = \frac{K(K-1)}{2} \cdot O\left(\left(\frac{2n}{K}\right)^2 d\right) = O\left(\frac{K(K-1)}{2} \cdot \frac{4n^2}{K^2} \cdot d\right) = O\left(\frac{2(K-1)n^2 d}{K}\right)$$
OvA Training: $$T_{OvA} = K \cdot O(n^2 d) = O(K n^2 d)$$
For large K, $T_{OvO} \approx O(2n^2 d)$ while $T_{OvA} = O(K n^2 d)$. OvO is asymptotically faster for training when K is large!
Practical Example: With n=100,000 samples and K=100 balanced classes, each of the 4,950 OvO classifiers trains on roughly $2n/K = 2{,}000$ samples, while each of the 100 OvA classifiers trains on all 100,000 samples.
Despite 50× more classifiers, OvO trains on 50× smaller problems each, often resulting in faster total training.
| Aspect | One-vs-One | One-vs-All | DAG-SVM |
|---|---|---|---|
| Classifiers evaluated | $K(K-1)/2$ | $K$ | $K-1$ |
| Prediction complexity | $O(K^2 \cdot d)$ | $O(K \cdot d)$ | $O(K \cdot d)$ |
| Can early-stop? | No (standard) | No | Yes (by design) |
Prediction Time Analysis:
Prediction is where OvO shows its main weakness. For each test point, the standard voting scheme evaluates all $K(K-1)/2$ pairwise classifiers, each costing $O(d)$ for a linear SVM.
For K=100 classes with d=1000 features: OvO evaluates 4,950 classifiers (roughly 4.95M multiply-adds per prediction), while OvA evaluates only 100 (roughly 0.1M).
This 50× slowdown at prediction time is significant for latency-sensitive applications. DAG-SVM (covered in a later page) addresses this by requiring only K-1 classifier evaluations while using the OvO classifiers.
Memory Requirements:
Each linear SVM stores a weight vector $\mathbf{w} \in \mathbb{R}^d$ and bias $b$. Total storage: $\frac{K(K-1)}{2}(d+1)$ parameters for OvO versus $K(d+1)$ for OvA. For K=100 and d=1000, that is roughly 4.95M parameters for OvO against 0.1M for OvA.
For kernel SVMs, storage depends on the number of support vectors, which can be substantial.
```python
def analyze_ovo_complexity(n_samples: int, n_classes: int, n_features: int):
    """
    Analyze computational complexity of OvO SVM.

    Parameters:
    -----------
    n_samples : int
        Total number of training samples
    n_classes : int
        Number of classes (K)
    n_features : int
        Feature dimensionality (d)

    Returns:
    --------
    dict : Complexity analysis results
    """
    # Number of classifiers
    n_classifiers = n_classes * (n_classes - 1) // 2

    # Samples per classifier (assuming balanced classes)
    samples_per_classifier = 2 * n_samples // n_classes

    # Training complexity (using O(n^2) SVM training approximation)
    training_ops_per_classifier = samples_per_classifier ** 2 * n_features
    total_training_ops = n_classifiers * training_ops_per_classifier

    # Prediction complexity
    prediction_ops_per_sample = n_classifiers * n_features

    # Memory for linear classifiers
    memory_params = n_classifiers * (n_features + 1)

    # Compare to OvA
    ova_classifiers = n_classes
    ova_training_ops = ova_classifiers * (n_samples ** 2) * n_features
    ova_prediction_ops = ova_classifiers * n_features
    ova_memory = ova_classifiers * (n_features + 1)

    return {
        'ovo': {
            'n_classifiers': n_classifiers,
            'samples_per_classifier': samples_per_classifier,
            'training_ops': total_training_ops,
            'prediction_ops_per_sample': prediction_ops_per_sample,
            'memory_params': memory_params,
        },
        'ova': {
            'n_classifiers': ova_classifiers,
            'training_ops': ova_training_ops,
            'prediction_ops_per_sample': ova_prediction_ops,
            'memory_params': ova_memory,
        },
        'ratios': {
            'classifier_ratio': n_classifiers / ova_classifiers,
            'training_ratio': total_training_ops / ova_training_ops,
            'prediction_ratio': prediction_ops_per_sample / ova_prediction_ops,
            'memory_ratio': memory_params / ova_memory,
        }
    }


# Example analysis
analysis = analyze_ovo_complexity(n_samples=10000, n_classes=10, n_features=100)
print(f"OvO Classifiers: {analysis['ovo']['n_classifiers']}")  # 45
print(f"Training ratio (OvO/OvA): {analysis['ratios']['training_ratio']:.3f}")
print(f"Prediction ratio (OvO/OvA): {analysis['ratios']['prediction_ratio']:.1f}")
```

Beyond computational considerations, understanding the theoretical properties of OvO helps us predict when it will work well and diagnose failure modes.
Theorem (Allwein, Schapire, Singer 2000): Error-correcting output codes (ECOC), of which OvO is a special case, can reduce the effective error rate. If each binary classifier has error rate $\epsilon$ and the ECOC has minimum distance $d_{min}$, the multi-class error rate is bounded by:
$$P(\text{error}) \leq 2 \exp\left(-\frac{d_{min}^2}{8}(1-2\epsilon)^2\right)$$
For OvO viewed as a ternary code, any two class codewords disagree completely in exactly one position (the classifier that directly compares the two classes) and partially, a $\pm 1$ against a $0$, in the $2(K-2)$ positions where exactly one of the two classes participates.
One-vs-One can be viewed as an Error-Correcting Output Code (ECOC) where each class has a codeword of length K(K-1)/2 with elements in {-1, 0, +1}. The 0 indicates classifiers where the class is not involved.
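To make the ECOC view concrete, this sketch builds the OvO coding matrix for a hypothetical K=4 problem; rows are class codewords, columns are the six pairwise classifiers:

```python
import numpy as np
from itertools import combinations

K = 4
pairs = list(combinations(range(K), 2))      # the 6 pairwise classifiers
code = np.zeros((K, len(pairs)), dtype=int)  # rows: classes, cols: classifiers

for col, (i, j) in enumerate(pairs):
    code[i, col] = +1  # class i is the positive class in f_ij
    code[j, col] = -1  # class j is the negative class in f_ij
    # all other classes stay 0: they do not participate in f_ij

print(code)
# [[ 1  1  1  0  0  0]
#  [-1  0  0  1  1  0]
#  [ 0 -1  0 -1  0  1]
#  [ 0  0 -1  0 -1 -1]]
```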
Property 1: Robustness to Class Imbalance
OvO naturally handles class imbalance better than OvA. In OvA, binary classifiers for rare classes face extreme imbalance (few positive vs many negative examples). In OvO, each binary classifier encounters only the examples from two specific classes, preserving their relative proportions.
Example: Dataset with 3 classes containing 1000, 100, and 10 examples respectively. Under OvA, the classifier for the rarest class faces 10 positives against 1100 negatives (a 110:1 imbalance). Under OvO, the worst pairwise classifier faces 1000 vs 10 (100:1), and the classifier for the two smaller classes faces only 100 vs 10 (10:1). While still imbalanced, OvO reduces the imbalance severity.
Property 2: Decision Region Shapes
For linear SVMs, the $K(K-1)/2$ hyperplanes partition feature space into convex polyhedral cells, and the vote vector is constant within each cell. A class's decision region is the union of the cells it wins, which need not be convex. The boundary between two class regions consists of points where their vote counts are equal.
Property 3: Consistency
OvO voting is Condorcet consistent: a class that beats all others pairwise (a Condorcet winner) is guaranteed to win. However, a Condorcet winner need not exist; cycles can occur (like Rock-Paper-Scissors). When Class 1 beats Class 2, Class 2 beats Class 3, and Class 3 beats Class 1, majority voting must break the resulting tie arbitrarily.
Translating OvO from theory to production requires attention to practical details. Here are battle-tested implementation patterns:
```python
import os
import pickle
import logging
import numpy as np
from concurrent.futures import ProcessPoolExecutor, as_completed
from typing import Tuple, Optional

logger = logging.getLogger(__name__)


class ProductionOvOSVM:
    """
    Production-ready One-vs-One SVM with parallel training,
    serialization, and monitoring.
    """

    def __init__(
        self,
        binary_svm_class,
        n_jobs: int = -1,
        verbose: bool = True,
        **svm_params
    ):
        self.binary_svm_class = binary_svm_class
        self.n_jobs = n_jobs if n_jobs > 0 else os.cpu_count()
        self.verbose = verbose
        self.svm_params = svm_params
        self.classifiers = {}
        self.classes_ = None
        self.training_stats = {}

    def _train_one_classifier(
        self,
        class_pair: Tuple[int, int],
        X_pair: np.ndarray,
        y_pair: np.ndarray
    ) -> Tuple[Tuple[int, int], object, dict]:
        """Train a single binary classifier."""
        import time
        start = time.time()

        clf = self.binary_svm_class(**self.svm_params)
        clf.fit(X_pair, y_pair)

        elapsed = time.time() - start
        stats = {
            'n_samples': len(y_pair),
            'training_time': elapsed,
            'n_support_vectors': getattr(clf, 'n_support_', None),
        }
        return class_pair, clf, stats

    def fit(self, X: np.ndarray, y: np.ndarray) -> 'ProductionOvOSVM':
        """
        Train all classifiers in parallel.
        """
        self.classes_ = np.unique(y)
        n_classes = len(self.classes_)
        n_classifiers = n_classes * (n_classes - 1) // 2

        logger.info(f"Training {n_classifiers} classifiers on {n_classes} "
                    f"classes using {self.n_jobs} workers")

        # Prepare training tasks
        tasks = []
        for i, class_i in enumerate(self.classes_):
            for class_j in self.classes_[i+1:]:
                mask = (y == class_i) | (y == class_j)
                X_pair = X[mask]
                y_pair = np.where(y[mask] == class_i, 1, -1)
                tasks.append(((class_i, class_j), X_pair, y_pair))

        # Parallel training
        completed = 0
        with ProcessPoolExecutor(max_workers=self.n_jobs) as executor:
            futures = {
                executor.submit(self._train_one_classifier, *task): task[0]
                for task in tasks
            }

            for future in as_completed(futures):
                class_pair, clf, stats = future.result()
                self.classifiers[class_pair] = clf
                self.training_stats[class_pair] = stats
                completed += 1

                if self.verbose:
                    logger.info(f"Completed {completed}/{n_classifiers}: "
                                f"classes {class_pair}, "
                                f"{stats['training_time']:.2f}s")

        return self

    def save(self, path: str):
        """Serialize the model to disk."""
        with open(path, 'wb') as f:
            pickle.dump({
                'classifiers': self.classifiers,
                'classes_': self.classes_,
                'training_stats': self.training_stats,
                'svm_params': self.svm_params,
            }, f)
        logger.info(f"Model saved to {path}")

    @classmethod
    def load(cls, path: str) -> 'ProductionOvOSVM':
        """Load a serialized model."""
        with open(path, 'rb') as f:
            data = pickle.load(f)

        model = cls.__new__(cls)
        model.classifiers = data['classifiers']
        model.classes_ = data['classes_']
        model.training_stats = data['training_stats']
        model.svm_params = data['svm_params']
        return model
```

We have explored the One-vs-One strategy for multi-class SVM classification in comprehensive depth.
You now understand the One-vs-One strategy deeply—from construction through voting to complexity analysis. Next, we'll explore One-vs-All (OvA), which trades more classifiers for simpler decision making, and examine the conditions under which each approach excels.
What's Next:
The next page covers One-vs-All (OvA), the main alternative to OvO. We'll compare their mathematical properties, empirical performance, and practical trade-offs, giving you the knowledge to choose the right approach for any multi-class SVM application.