Classical MDS provides an elegant, closed-form solution when distances are exactly Euclidean. But real-world dissimilarities are messier: they may be non-Euclidean, corrupted by noise, only partially observed, or more trustworthy for some pairs than for others.
Metric MDS addresses these needs by casting the embedding problem as an explicit optimization: find the configuration that minimizes a stress function measuring distance discrepancy. Unlike Classical MDS, Metric MDS works iteratively, refining the embedding until convergence.
This transition from closed-form to optimization-based characterizes the evolution of dimensionality reduction—from the analytical simplicity of PCA/Classical MDS to the flexibility of modern methods like t-SNE and UMAP.
By the end of this page, you will understand the stress function formulations used in Metric MDS, master the SMACOF optimization algorithm, learn how weights can prioritize certain distances, and recognize when to choose Metric MDS over Classical MDS.
Recall the goal of MDS: given dissimilarities $\delta_{ij}$ between $n$ objects, find coordinates $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^k$ such that the Euclidean distances $d_{ij}(X) = \|\mathbf{x}_i - \mathbf{x}_j\|$ approximate the dissimilarities.
Classical MDS approach: double-center the squared dissimilarities and take an eigendecomposition, a one-shot algebraic solution.
Metric MDS approach: define a stress function that measures the mismatch between $\delta_{ij}$ and $d_{ij}(X)$, then minimize it iteratively over the configuration $X$.
Why the name 'Metric'?
Metric MDS preserves the actual numerical values of dissimilarities (the 'metric' information). This contrasts with Non-metric MDS, which only preserves the rank ordering of dissimilarities (covered in the next page).
Metric MDS was developed by Joseph Kruskal and Roger Shepard in the 1960s. Kruskal's 1964 papers introduced the stress function and the iterative gradient descent approach that remains foundational. The term 'stress' comes from Kruskal's original terminology, representing the 'strain' on the configuration when distances don't match.
| Aspect | Classical MDS | Metric MDS |
|---|---|---|
| Objective | Implicit (eigenvalue-based) | Explicit (stress minimization) |
| Input assumption | Euclidean distances | Any dissimilarities |
| Solution method | Closed-form (eigendecomposition) | Iterative optimization |
| Computational complexity | O(n³) one-time | O(n² × iterations) |
| Local minima | No (unique solution) | Yes (multiple possible) |
| Weighting | Not supported | Supported (weighted stress) |
| Guarantees | Global optimum for Euclidean input | Local optimum (depends on initialization) |
The heart of Metric MDS is the stress function—a measure of how poorly the embedded distances match the target dissimilarities. Several variants exist, each with different properties.
Notation: $\delta_{ij}$ are the target dissimilarities, $d_{ij}(X)$ are the Euclidean distances in the embedding $X$, and $w_{ij} \geq 0$ are optional weights (equal to 1 when unweighted).
Raw Stress:
$$\sigma_{\text{raw}}(X) = \sum_{i < j} w_{ij} \left( \delta_{ij} - d_{ij}(X) \right)^2$$
This is the sum of squared differences between target and embedded distances. Minimizing this finds a configuration where distances match as closely as possible.
Normalized Stress-1:
$$\sigma_1(X) = \sqrt{\frac{\sum_{i<j} w_{ij}(\delta_{ij} - d_{ij})^2}{\sum_{i<j} w_{ij} \delta_{ij}^2}}$$
Normalizing by the sum of squared target distances makes stress comparable across datasets of different scales.
Kruskal provided rules of thumb for stress values: < 0.05 excellent, 0.05-0.10 good, 0.10-0.20 fair, > 0.20 poor. However, expected stress varies with n (more points → higher baseline stress) and k (fewer dimensions → higher stress). Always compare relative to your specific problem.
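To see why the thresholds are only rough guides, here is a minimal sketch (assuming scikit-learn's `MDS` is available; the helper `stress1` implements the normalized Stress-1 defined above) that embeds random 10-dimensional data for a couple of values of $n$ and $k$ and prints the resulting baseline stress:

```python
# Sketch: baseline Stress-1 depends on n and k, even for structureless data.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.manifold import MDS

def stress1(D, X):
    """Kruskal's Stress-1 for embedding X against target distances D."""
    d = cdist(X, X)
    iu = np.triu_indices_from(D, k=1)
    return np.sqrt(np.sum((D[iu] - d[iu]) ** 2) / np.sum(D[iu] ** 2))

rng = np.random.default_rng(0)
for n in (50, 200):
    for k in (2, 5):
        Y = rng.standard_normal((n, 10))     # random 10-D data (no structure)
        D = cdist(Y, Y)
        mds = MDS(n_components=k, metric=True,
                  dissimilarity='precomputed', random_state=0)
        X = mds.fit_transform(D)
        print(f"n={n:3d}, k={k}: Stress-1 = {stress1(D, X):.3f}")
```

Comparing the printed values shows that "acceptable" stress is relative: more points and fewer target dimensions push the achievable stress upward even when nothing is wrong with the embedding.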
Stress-2 (Squared Normalization):
$$\sigma_2(X) = \sqrt{\frac{\sum_{i<j} (\delta_{ij} - d_{ij})^2}{\sum_{i<j} d_{ij}^2}}$$
Normalizes by embedded distances rather than target distances. Less commonly used.
Sammon's Stress (used in Sammon Mapping):
$$E_{\text{Sammon}}(X) = \frac{1}{\sum_{i<j} \delta_{ij}} \sum_{i<j} \frac{(\delta_{ij} - d_{ij})^2}{\delta_{ij}}$$
Sammon's stress weights errors by $1/\delta_{ij}$, giving more importance to preserving small distances. This emphasizes local structure preservation.
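As a small illustrative sketch (the helper name is ours, not from a library, and it assumes all off-diagonal dissimilarities are strictly positive), Sammon's stress can be computed directly from a target dissimilarity matrix and an embedding:

```python
# Sketch: Sammon's stress, weighting each squared error by 1/delta_ij.
import numpy as np
from scipy.spatial.distance import cdist

def sammon_stress(D, X):
    """Sammon's error for embedding X against dissimilarities D."""
    d = cdist(X, X)
    iu = np.triu_indices_from(D, k=1)     # each pair counted once
    delta, dist = D[iu], d[iu]
    # Assumes delta > 0 for all distinct pairs (otherwise the weight is undefined).
    return np.sum((delta - dist) ** 2 / delta) / np.sum(delta)
```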
Why square the differences? Squaring yields a smooth, differentiable objective, penalizes large errors more heavily than small ones, and (crucially for SMACOF) admits a simple quadratic majorizer.
SMACOF (Scaling by MAjorizing a COmplicated Function) is the workhorse algorithm for Metric MDS. It's a majorization-minimization (MM) algorithm that iteratively improves the embedding by solving a series of simpler problems.
The challenge: Raw stress is a non-convex function with many local minima. Direct gradient descent can be slow and may oscillate. SMACOF provides a guaranteed monotonic decrease in stress at each iteration.
Key insight: The stress function can be bounded above by a simpler quadratic function (the majorizer). Minimizing the majorizer is easy and guaranteed to decrease the original stress.
Mathematical setup:
Define the stress with weights $w_{ij}$:
$$\sigma(X) = \sum_{i<j} w_{ij}(\delta_{ij} - d_{ij}(X))^2$$
This can be rewritten using matrix notation. Let $X \in \mathbb{R}^{n \times k}$ be the coordinate matrix. Define:
$$d_{ij}(X) = \sqrt{\sum_{s=1}^k (x_{is} - x_{js})^2}$$
The stress can be decomposed as:
$$\sigma(X) = \eta_\delta^2 + \eta^2(X) - 2\rho(X)$$
where:
- $\eta_\delta^2 = \sum_{i<j} w_{ij}\,\delta_{ij}^2$ is a constant that does not depend on $X$
- $\eta^2(X) = \sum_{i<j} w_{ij}\, d_{ij}^2(X)$ is a quadratic function of the coordinates
- $\rho(X) = \sum_{i<j} w_{ij}\, \delta_{ij}\, d_{ij}(X)$ is the cross term that the majorization step bounds
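Because the decomposition is just an algebraic expansion of the square, it can be verified numerically. A self-contained sketch with uniform weights $w_{ij} = 1$ on a small random configuration:

```python
# Sketch: verify sigma(X) = eta_delta^2 + eta^2(X) - 2*rho(X) numerically.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))      # "true" high-dimensional points
X = rng.standard_normal((6, 2))      # an arbitrary 2-D configuration
delta = cdist(A, A)                  # target dissimilarities
d = cdist(X, X)                      # embedded distances
iu = np.triu_indices(6, k=1)

sigma  = np.sum((delta[iu] - d[iu]) ** 2)   # raw stress
eta_d2 = np.sum(delta[iu] ** 2)             # eta_delta^2 (constant)
eta2   = np.sum(d[iu] ** 2)                 # eta^2(X)
rho    = np.sum(delta[iu] * d[iu])          # rho(X)

assert np.isclose(sigma, eta_d2 + eta2 - 2 * rho)
```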
Majorization is like building a ceiling that touches your function at the current point but lies above it everywhere else. By minimizing the ceiling, you're guaranteed to reduce the original function. SMACOF constructs such a ceiling that's quadratic in X, making minimization easy.
The SMACOF update:
At iteration $t$, given current configuration $X^{(t)}$, the update is:
$$X^{(t+1)} = V^+ B(X^{(t)}) X^{(t)}$$
where:
$V$ is the weighted Laplacian matrix: $$V_{ij} = \begin{cases} -w_{ij} & i \neq j \\ \sum_{k \neq i} w_{ik} & i = j \end{cases}$$
$V^+$ is the Moore-Penrose pseudoinverse of $V$
$B(X)$ is the Guttman transform matrix: $$B(X)_{ij} = \begin{cases} -w_{ij}\delta_{ij}/d_{ij}(X) & i \neq j,\ d_{ij}(X) > 0 \\ 0 & i \neq j,\ d_{ij}(X) = 0 \\ -\sum_{k \neq i} B(X)_{ik} & i = j \end{cases}$$
Key properties of SMACOF: the stress decreases monotonically at every iteration; the iterates converge to a stationary point (in practice a local minimum); after a one-time pseudoinverse of $V$, each iteration costs $O(n^2 k)$; and weights, including zero weights for missing dissimilarities, are handled naturally through $V$ and $B(X)$.
```python
import numpy as np
from scipy.spatial.distance import cdist


def compute_stress(D_target, X, weights=None):
    """
    Compute raw stress for current configuration.

    Parameters:
    -----------
    D_target : ndarray of shape (n, n)
        Target dissimilarity matrix
    X : ndarray of shape (n, k)
        Current embedding coordinates
    weights : ndarray of shape (n, n), optional
        Weight matrix (default: all ones)

    Returns:
    --------
    stress : float
        Raw stress value
    """
    n = X.shape[0]
    D_current = cdist(X, X, metric='euclidean')
    if weights is None:
        weights = np.ones((n, n))
    # Only sum over upper triangle
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    diff_sq = (D_target - D_current) ** 2
    stress = np.sum(weights[mask] * diff_sq[mask])
    return stress


def compute_normalized_stress(D_target, X, weights=None):
    """Compute Kruskal's Stress-1 (normalized)."""
    n = X.shape[0]
    D_current = cdist(X, X, metric='euclidean')
    if weights is None:
        weights = np.ones((n, n))
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    numerator = np.sum(weights[mask] * (D_target[mask] - D_current[mask])**2)
    denominator = np.sum(weights[mask] * D_target[mask]**2)
    return np.sqrt(numerator / denominator)


def smacof(D_target, n_components=2, weights=None, init=None,
           max_iter=300, eps=1e-6, random_state=None, verbose=False):
    """
    SMACOF algorithm for Metric MDS.

    Parameters:
    -----------
    D_target : ndarray of shape (n, n)
        Target dissimilarity matrix
    n_components : int
        Target dimensionality
    weights : ndarray of shape (n, n), optional
        Weight matrix (default: uniform weights)
    init : ndarray of shape (n, k), optional
        Initial configuration (default: Classical MDS solution)
    max_iter : int
        Maximum number of iterations
    eps : float
        Convergence threshold on stress improvement
    random_state : int, optional
        Random seed
    verbose : bool
        Print progress information

    Returns:
    --------
    X : ndarray of shape (n, n_components)
        Final embedding
    stress : float
        Final stress value
    n_iter : int
        Number of iterations performed
    """
    rng = np.random.RandomState(random_state)
    n = D_target.shape[0]
    k = n_components

    # Initialize weights
    if weights is None:
        weights = np.ones((n, n))
    np.fill_diagonal(weights, 0)  # No self-weights

    # Initialize configuration
    if init is None:
        # Use Classical MDS as initialization
        D_sq = D_target ** 2
        row_means = D_sq.mean(axis=1, keepdims=True)
        col_means = D_sq.mean(axis=0, keepdims=True)
        grand_mean = D_sq.mean()
        B = -0.5 * (D_sq - row_means - col_means + grand_mean)
        eigenvalues, eigenvectors = np.linalg.eigh(B)
        idx = np.argsort(eigenvalues)[::-1]
        eigenvalues = eigenvalues[idx][:k]
        eigenvectors = eigenvectors[:, idx][:, :k]
        # Handle negative eigenvalues
        eigenvalues = np.maximum(eigenvalues, 0)
        X = eigenvectors * np.sqrt(eigenvalues)
    else:
        X = init.copy()

    # Compute V matrix (weighted Laplacian)
    V = np.diag(weights.sum(axis=1)) - weights
    # Pseudoinverse of V (for centering)
    # Note: V is singular (rows sum to 0), use pseudoinverse
    V_pinv = np.linalg.pinv(V)

    # Initial stress
    stress_prev = compute_stress(D_target, X, weights)
    if verbose:
        print(f"Initial stress: {stress_prev:.6f}")

    for iteration in range(max_iter):
        # Compute current distances
        D_current = cdist(X, X, metric='euclidean')

        # Compute B matrix (Guttman transform)
        # B_ij = -w_ij * delta_ij / d_ij for i != j and d_ij > 0
        with np.errstate(divide='ignore', invalid='ignore'):
            B = -weights * D_target / D_current
        B[~np.isfinite(B)] = 0  # Handle division by zero
        np.fill_diagonal(B, 0)
        np.fill_diagonal(B, -B.sum(axis=1))  # Row sums to zero

        # SMACOF update: X_new = V^+ @ B @ X
        X_new = V_pinv @ B @ X

        # Compute new stress
        stress_new = compute_stress(D_target, X_new, weights)

        if verbose and iteration % 10 == 0:
            print(f"Iteration {iteration}: stress = {stress_new:.6f}")

        # Check convergence
        if stress_prev - stress_new < eps:
            if verbose:
                print(f"Converged at iteration {iteration}")
            X = X_new
            break

        X = X_new
        stress_prev = stress_new

    final_stress = compute_normalized_stress(D_target, X, weights)
    return X, final_stress, iteration + 1


def demonstrate_smacof():
    """Demonstrate SMACOF on synthetic data."""
    np.random.seed(42)

    # Generate 2D spiral data
    n = 150
    t = np.linspace(0, 3*np.pi, n)
    X_true = np.column_stack([t * np.cos(t), t * np.sin(t)])

    # Compute true distances
    D = cdist(X_true, X_true, metric='euclidean')

    # Add some noise to distances (makes them non-exactly-Euclidean)
    noise = np.random.randn(n, n) * 0.1
    D_noisy = D + (noise + noise.T) / 2  # Keep symmetric
    D_noisy = np.maximum(D_noisy, 0)     # Keep non-negative
    np.fill_diagonal(D_noisy, 0)

    # Run SMACOF
    X_mds, stress, n_iter = smacof(
        D_noisy, n_components=2, max_iter=300, verbose=True
    )

    print(f"\nFinal normalized stress: {stress:.4f}")
    print(f"Iterations: {n_iter}")

    # Compare reconstructed distances with noisy distances
    D_reconstructed = cdist(X_mds, X_mds, metric='euclidean')
    corr = np.corrcoef(D_noisy.flatten(), D_reconstructed.flatten())[0, 1]
    print(f"Correlation with noisy input distances: {corr:.4f}")

    return X_mds, D_noisy, D_reconstructed


if __name__ == "__main__":
    demonstrate_smacof()
```

A key advantage of Metric MDS is the ability to assign weights to different pairwise dissimilarities. This enables emphasizing local structure, encoding the reliability of individual measurements, and handling missing dissimilarities by setting their weights to zero.
The weighted stress function:
$$\sigma(X) = \sum_{i<j} w_{ij}(\delta_{ij} - d_{ij}(X))^2$$
The SMACOF algorithm naturally incorporates weights through the $V$ and $B$ matrices.
- Uniform: $w_{ij} = 1$ for all pairs (standard Metric MDS)
- Distance-based: $w_{ij} = 1/\delta_{ij}$ (emphasize small distances, like Sammon mapping)
- Distance-based: $w_{ij} = 1/\delta_{ij}^2$ (even stronger local emphasis)
- K-nearest neighbor: $w_{ij} = 1$ if $j \in \text{KNN}(i)$, else 0 (only preserve local structure)
- Binary reliability: $w_{ij} = 1$ if the pair is reliable, 0 if uncertain
Sammon Mapping as Weighted MDS:
Sammon mapping (1969) is a special case of weighted Metric MDS with $w_{ij} = 1/\delta_{ij}$:
$$E_{\text{Sammon}} = \frac{1}{\sum_{i<j}\delta_{ij}} \sum_{i<j} \frac{(\delta_{ij} - d_{ij})^2}{\delta_{ij}}$$
Interpretation: Small distances are weighted more heavily, so local neighborhoods are preserved at the expense of global structure. This makes Sammon mapping a precursor to modern methods like t-SNE.
Local MDS:
For extreme local emphasis, use k-nearest neighbor weights:
$$w_{ij} = \begin{cases} 1 & \text{if } j \in \text{KNN}(i) \text{ or } i \in \text{KNN}(j) \\ 0 & \text{otherwise} \end{cases}$$
This only tries to preserve distances between nearby points, ignoring long-range structure. The result is an embedding focused purely on local neighborhoods—conceptually similar to LLE and t-SNE.
```python
import numpy as np
from scipy.spatial.distance import cdist
from smacof import smacof


def create_weight_matrix(D, scheme='uniform', k=None):
    """
    Create weight matrix for Weighted MDS.

    Parameters:
    -----------
    D : ndarray of shape (n, n)
        Distance or dissimilarity matrix
    scheme : str
        Weighting scheme: 'uniform', 'inverse', 'inverse_sq', 'knn', 'sammon'
    k : int
        Number of neighbors for 'knn' scheme

    Returns:
    --------
    W : ndarray of shape (n, n)
        Weight matrix
    """
    n = D.shape[0]

    if scheme == 'uniform':
        W = np.ones((n, n))

    elif scheme == 'inverse' or scheme == 'sammon':
        # w_ij = 1 / delta_ij
        with np.errstate(divide='ignore'):
            W = 1.0 / D
        W[~np.isfinite(W)] = 0  # Handle division by zero

    elif scheme == 'inverse_sq':
        # w_ij = 1 / delta_ij^2
        with np.errstate(divide='ignore'):
            W = 1.0 / (D ** 2)
        W[~np.isfinite(W)] = 0

    elif scheme == 'knn':
        if k is None:
            k = min(10, n - 1)
        W = np.zeros((n, n))
        for i in range(n):
            # Find k nearest neighbors (excluding self)
            dists = D[i].copy()
            dists[i] = np.inf
            knn_idx = np.argsort(dists)[:k]
            W[i, knn_idx] = 1
        # Symmetrize
        W = np.maximum(W, W.T)

    else:
        raise ValueError(f"Unknown weighting scheme: {scheme}")

    np.fill_diagonal(W, 0)  # No self-weights
    return W


def compare_weighting_schemes():
    """Compare different weighting schemes on the same data."""
    np.random.seed(42)

    # Generate clustered data in 5D
    n_per_cluster = 50
    n_clusters = 3
    clusters = []
    for i in range(n_clusters):
        center = np.random.randn(5) * 10
        cluster = center + np.random.randn(n_per_cluster, 5)
        clusters.append(cluster)
    X_true = np.vstack(clusters)
    labels = np.repeat(range(n_clusters), n_per_cluster)
    n = len(labels)

    # Compute distances
    D = cdist(X_true, X_true, metric='euclidean')

    # Different weighting schemes
    schemes = ['uniform', 'inverse', 'inverse_sq', 'knn']
    results = {}

    for scheme in schemes:
        if scheme == 'knn':
            W = create_weight_matrix(D, scheme=scheme, k=10)
        else:
            W = create_weight_matrix(D, scheme=scheme)

        X_mds, stress, n_iter = smacof(
            D, n_components=2, weights=W,
            max_iter=500, random_state=42
        )

        results[scheme] = {
            'embedding': X_mds,
            'stress': stress,
            'iterations': n_iter
        }
        print(f"Scheme: {scheme:12s} | Stress: {stress:.4f} | Iterations: {n_iter}")

    return results, labels


def demonstrate_missing_data_mds():
    """Demonstrate MDS with missing (unknown) distances."""
    np.random.seed(42)

    # Generate 2D data
    n = 100
    X_true = np.random.randn(n, 2)
    D_true = cdist(X_true, X_true, metric='euclidean')

    # Randomly mask 30% of pairwise distances
    mask = np.random.rand(n, n) > 0.3   # Keep 70%
    mask = np.maximum(mask, mask.T)     # Keep symmetric
    np.fill_diagonal(mask, True)        # Always keep diagonal

    # Create weight matrix: 1 for known distances, 0 for unknown
    W = mask.astype(float)
    np.fill_diagonal(W, 0)

    print(f"Known distances: {mask.sum() / n**2 * 100:.1f}%")

    # Run weighted MDS
    X_mds, stress, n_iter = smacof(
        D_true, n_components=2, weights=W,
        max_iter=500, random_state=42
    )

    # Compare with full-information MDS
    X_mds_full, stress_full, _ = smacof(
        D_true, n_components=2,
        max_iter=500, random_state=42
    )

    print(f"Missing data MDS stress: {stress:.4f}")
    print(f"Full data MDS stress: {stress_full:.4f}")

    # Compute correlation of recovered coordinates
    # (need to align via Procrustes first)
    from scipy.linalg import orthogonal_procrustes
    R, _ = orthogonal_procrustes(X_mds, X_mds_full)
    X_mds_aligned = X_mds @ R

    corr = np.corrcoef(X_mds_full.flatten(), X_mds_aligned.flatten())[0, 1]
    print(f"Correlation with full-data embedding: {corr:.4f}")

    return X_mds, X_mds_full, W


if __name__ == "__main__":
    print("Comparing weighting schemes:")
    print("-" * 50)
    compare_weighting_schemes()
    print()
    print("Missing data example:")
    print("-" * 50)
    demonstrate_missing_data_mds()
```

SMACOF and other iterative MDS algorithms converge to local minima, not necessarily the global minimum. The quality of the final embedding depends strongly on initialization.
The local minima problem:
Consider embedding 4 points with equal pairwise distances (a tetrahedron in 3D) into 2D. There are many valid 2D configurations that all have the same stress—the optimization landscape has multiple equivalent minima. For more complex data, there may be qualitatively different minima with different stress values.
Common initialization strategies:
1. Classical MDS (recommended default)
2. Random configurations (used for multiple restarts, as in the tip below)
3. PCA of the distance matrix
4. Warm start from a previous solution
For critical applications, run SMACOF multiple times with different initializations (e.g., Classical MDS + 10 random starts). Keep the solution with lowest stress. This increases computation but provides more confidence in the result.
Diagnosing local minima:
Signs that you may be stuck in a bad local minimum: stress is noticeably higher than in other runs on the same data, the embedding changes qualitatively between restarts, or points known to be similar end up far apart in the plot.
Remedies: restart from several random initializations and keep the lowest-stress solution, initialize from Classical MDS, tighten the convergence tolerance, or increase the maximum number of iterations.
```python
import numpy as np
from scipy.spatial.distance import cdist
from smacof import smacof, compute_normalized_stress


def multistart_mds(D, n_components=2, n_starts=10, max_iter=300,
                   eps=1e-6, random_state=None, verbose=False):
    """
    Multi-start Metric MDS to avoid local minima.

    Parameters:
    -----------
    D : ndarray of shape (n, n)
        Dissimilarity matrix
    n_components : int
        Target dimensionality
    n_starts : int
        Number of random initializations to try
    max_iter : int
        Maximum iterations per start
    eps : float
        Convergence tolerance
    random_state : int
        Master random seed
    verbose : bool
        Print progress

    Returns:
    --------
    X_best : ndarray of shape (n, n_components)
        Best embedding found
    stress_best : float
        Stress of best embedding
    all_stresses : list
        Stress values for all starts
    """
    rng = np.random.RandomState(random_state)
    n = D.shape[0]

    all_stresses = []
    X_best = None
    stress_best = np.inf

    # Start 0: Classical MDS initialization
    X, stress, n_iter = smacof(
        D, n_components=n_components,
        init=None,  # Uses Classical MDS
        max_iter=max_iter, eps=eps,
        random_state=rng.randint(100000)
    )
    all_stresses.append(stress)
    if verbose:
        print(f"Start 0 (Classical MDS init): stress = {stress:.6f}")
    if stress < stress_best:
        stress_best = stress
        X_best = X.copy()

    # Starts 1 to n_starts-1: Random initializations
    for start in range(1, n_starts):
        # Random initialization scaled to match data
        mean_dist = D[np.triu_indices(n, k=1)].mean()
        init = rng.randn(n, n_components) * mean_dist / np.sqrt(n_components)

        X, stress, n_iter = smacof(
            D, n_components=n_components,
            init=init,
            max_iter=max_iter, eps=eps,
            random_state=rng.randint(100000)
        )
        all_stresses.append(stress)
        if verbose:
            print(f"Start {start:2d} (random init): stress = {stress:.6f}")
        if stress < stress_best:
            stress_best = stress
            X_best = X.copy()

    if verbose:
        print(f"\nBest stress: {stress_best:.6f}")
        print(f"Stress range: [{min(all_stresses):.6f}, {max(all_stresses):.6f}]")

    return X_best, stress_best, all_stresses


def analyze_local_minima():
    """Analyze the local minima landscape for different data types."""
    np.random.seed(42)

    # Dataset 1: Well-separated clusters (unique global minimum)
    print("Dataset 1: Well-separated clusters")
    n_per_cluster = 30
    clusters_1 = [
        np.random.randn(n_per_cluster, 5) + np.array([10, 0, 0, 0, 0]),
        np.random.randn(n_per_cluster, 5) + np.array([0, 10, 0, 0, 0]),
        np.random.randn(n_per_cluster, 5) + np.array([0, 0, 10, 0, 0]),
    ]
    X_1 = np.vstack(clusters_1)
    D_1 = cdist(X_1, X_1, metric='euclidean')

    X_best_1, stress_1, all_stresses_1 = multistart_mds(
        D_1, n_components=2, n_starts=10, verbose=True
    )
    print(f"Stress std: {np.std(all_stresses_1):.6f}")
    print()

    # Dataset 2: Highly symmetric (many equivalent minima)
    print("Dataset 2: Regular simplex (symmetric)")
    n_2 = 20
    # Points on a hypersphere (highly symmetric)
    theta = np.linspace(0, 2*np.pi, n_2, endpoint=False)
    X_2 = np.column_stack([np.cos(theta), np.sin(theta)])
    D_2 = cdist(X_2, X_2, metric='euclidean')

    X_best_2, stress_2, all_stresses_2 = multistart_mds(
        D_2, n_components=2, n_starts=10, verbose=True
    )
    print(f"Stress std: {np.std(all_stresses_2):.6f}")
    print()

    # Dataset 3: Noisy Swiss roll (challenging manifold)
    print("Dataset 3: Swiss roll (manifold)")
    n_3 = 200
    t = 1.5 * np.pi * (1 + 2 * np.random.rand(n_3))
    X_3 = np.column_stack([
        t * np.cos(t),
        20 * np.random.rand(n_3),
        t * np.sin(t)
    ])
    D_3 = cdist(X_3, X_3, metric='euclidean')

    X_best_3, stress_3, all_stresses_3 = multistart_mds(
        D_3, n_components=2, n_starts=10, verbose=True
    )
    print(f"Stress std: {np.std(all_stresses_3):.6f}")

    return {
        'clusters': (X_best_1, all_stresses_1),
        'simplex': (X_best_2, all_stresses_2),
        'swiss_roll': (X_best_3, all_stresses_3)
    }


if __name__ == "__main__":
    results = analyze_local_minima()
```

Successfully applying Metric MDS requires attention to several practical details.
Choosing the target dimensionality $k$: run MDS for several values of $k$ and plot stress against $k$; look for an elbow where adding dimensions stops reducing stress appreciably. For visualization, $k = 2$ or $3$ is standard even if the stress is somewhat higher.
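A small sketch of this stress "scree plot" idea, assuming the `smacof()` function from the listing above is importable as in the earlier examples (data and parameter values here are illustrative):

```python
# Sketch: stress vs. target dimensionality k, to locate the elbow.
import numpy as np
from scipy.spatial.distance import cdist
from smacof import smacof

rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 6))        # data with 6 intrinsic dimensions
D = cdist(Y, Y)

for k in range(1, 7):
    _, stress, _ = smacof(D, n_components=k, random_state=0)
    print(f"k = {k}: normalized stress = {stress:.3f}")
# Stress should drop sharply up to the intrinsic dimensionality, then flatten.
```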
Preprocessing dissimilarities: ensure the matrix is symmetric with a zero diagonal and non-negative entries; if the data arrive as similarities, convert them to dissimilarities first; and consider rescaling or transforming (e.g., square roots) if a few very large dissimilarities dominate the stress.
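A minimal sketch of the basic cleanup steps (the helper name is ours, not from a library):

```python
# Sketch: enforce symmetry, zero diagonal, and non-negativity before MDS.
import numpy as np

def clean_dissimilarities(D):
    """Return a symmetric, non-negative dissimilarity matrix with zero diagonal."""
    D = np.asarray(D, dtype=float)
    D = 0.5 * (D + D.T)            # symmetrize
    np.fill_diagonal(D, 0.0)       # self-dissimilarity is zero
    return np.maximum(D, 0.0)      # clip small negative entries from noise
```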
After MDS, consider: (1) Centering the embedding for interpretability, (2) Rotating to align with meaningful axes (Procrustes if you have a reference), (3) Flipping to match expected orientation (MDS is invariant to reflections).
Software implementations:
- Python: `sklearn.manifold.MDS` with `metric=True`
- R: `cmdscale()` for Classical MDS; `isoMDS()` and `sammon()` from the MASS package
- MATLAB: `mdscale()` with various stress criteria

Using sklearn:
```python
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances
from sklearn.datasets import load_digits
import numpy as np

# Load example data
digits = load_digits(n_class=5)
X = digits.data[:300]   # Use subset for speed
y = digits.target[:300]

# Compute distance matrix
D = pairwise_distances(X, metric='euclidean')

# Metric MDS (defaults to SMACOF)
mds_metric = MDS(
    n_components=2,
    metric=True,                   # Metric MDS (vs non-metric)
    dissimilarity='precomputed',   # We provide the distance matrix
    n_init=4,                      # Number of random starts
    max_iter=300,
    eps=1e-6,
    random_state=42,
    n_jobs=-1                      # Parallelize over starts
)

X_mds = mds_metric.fit_transform(D)

print(f"Final stress: {mds_metric.stress_:.4f}")
print(f"Embedding shape: {X_mds.shape}")

# Alternatively, from features directly (uses Euclidean distance internally)
mds_from_features = MDS(
    n_components=2,
    metric=True,
    dissimilarity='euclidean',   # Compute distances from features
    n_init=4,
    random_state=42
)

X_mds_alt = mds_from_features.fit_transform(X)
print(f"Stress (from features): {mds_from_features.stress_:.4f}")
```

Metric MDS has spawned several important variants and extensions.
1. SSTRESS (Squared Stress)
Instead of optimizing $(\delta_{ij} - d_{ij})^2$, optimize $(\delta_{ij}^2 - d_{ij}^2)^2$:
$$\text{SSTRESS}(X) = \sum_{i<j} (\delta_{ij}^2 - d_{ij}(X)^2)^2$$
Advantage: The squared distance $d_{ij}^2 = \sum_s (x_{is} - x_{js})^2$ is a polynomial in coordinates, making gradients simpler. Used in some implementations.
2. Robust MDS
Replace squared error with robust loss functions:
$$\sigma_{\text{robust}}(X) = \sum_{i<j} \rho(\delta_{ij} - d_{ij})$$
where $\rho$ is a robust loss (Huber, Tukey bisquare). This reduces sensitivity to outlier distances.
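As a minimal sketch of the idea (the Huber threshold `c` here is an assumed tuning parameter, and the helper names are illustrative rather than from any library):

```python
# Sketch: a robust stress using the Huber loss instead of the squared error.
import numpy as np
from scipy.spatial.distance import cdist

def huber(r, c=1.0):
    """Huber loss: quadratic near zero, linear for large residuals."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * (a - 0.5 * c))

def robust_stress(D, X, c=1.0):
    """Sum of Huber losses over residuals delta_ij - d_ij(X)."""
    d = cdist(X, X)
    iu = np.triu_indices_from(D, k=1)
    return np.sum(huber(D[iu] - d[iu], c))
```

Large residuals contribute only linearly to this objective, so a few badly measured dissimilarities cannot dominate the embedding the way they do under the squared loss.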
3. Procrustes MDS
When you have multiple dissimilarity matrices from different sources (e.g., different subjects rating the same stimuli), Procrustes MDS finds a common configuration that best fits all matrices.
4. Individual Differences Scaling (INDSCAL)
Extends MDS to handle multiple dissimilarity matrices by assuming a common configuration with subject-specific dimension weights. Reveals which dimensions are important for different subjects.
Metric MDS sits at the foundation of many modern methods: Isomap uses MDS on geodesic distances, t-SNE optimizes a related (but non-metric) objective, and kernel PCA can be viewed as MDS on kernel-induced distances. Understanding MDS illuminates all of these.
| Method | What It Preserves | Key Feature |
|---|---|---|
| Classical MDS | Euclidean distances (exactly) | Closed-form, O(n³) |
| Metric MDS (SMACOF) | Metric distances (approximately) | Iterative, weighted, flexible |
| Non-metric MDS | Distance rankings only | Rank-based, robust to measurement scale |
| Sammon Mapping | Local distances (emphasis) | Weighted by 1/δ |
| Isomap | Geodesic distances | MDS on graph shortest paths |
| Landmark MDS | Distances to landmarks | Scalable O(nm²) instead of O(n³) |
Metric MDS extends Classical MDS with explicit optimization, providing flexibility at the cost of iterative computation. Key points to consolidate: stress functions measure the mismatch between target dissimilarities and embedded distances; SMACOF minimizes stress with a guaranteed monotonic decrease via majorization; weights let you emphasize local structure, encode measurement reliability, or ignore missing pairs; and because the optimum is only local, initialization and multiple restarts matter.
What's next:
Metric MDS assumes that the numerical values of dissimilarities are meaningful. But what if dissimilarities come from subjective judgments where only the ranking matters? The next page covers Non-metric MDS, which abandons metric preservation in favor of monotonic (rank-preserving) embeddings—a more robust approach for ordinal data.
You now understand Metric MDS: the stress function formulation, SMACOF optimization, weighting schemes, and practical considerations. This forms the bridge between Classical MDS and the more flexible Non-metric MDS that follows.