Multidimensional Scaling is a powerful and elegant technique, but like all methods, it has limitations. Understanding these limitations is crucial for applying MDS appropriately and interpreting its output with the right caveats.
This page provides a comprehensive treatment of MDS limitations covering mathematical constraints, computational challenges, interpretability issues, and practical pitfalls. We'll also position MDS within the broader landscape of dimensionality reduction methods, helping you make informed choices.
By the end of this page, you will understand the fundamental mathematical limitations of MDS, recognize computational scalability barriers, master techniques for handling common failure modes, and know when to choose MDS versus alternative methods like PCA, t-SNE, or UMAP.
1. The Euclidean Assumption
Classical MDS assumes that dissimilarities are Euclidean distances—that is, there exists a configuration in some Euclidean space where pairwise distances exactly match the dissimilarities.
When this holds: the double-centered matrix B is positive semidefinite, and classical MDS recovers an exact configuration (up to rotation, reflection, and translation).
When this fails: B acquires negative eigenvalues, and no embedding in any number of dimensions can reproduce the dissimilarities exactly.
Symptoms of non-Euclidean input: negative eigenvalues in the classical MDS spectrum, triangle-inequality violations, and stress that stays high no matter how many dimensions you allow.
Remedies: add a constant to the off-diagonal dissimilarities (Cailliez correction), switch to metric or non-metric MDS, or work only with the ranks of the dissimilarities.
For distances to be Euclidean, they must at minimum satisfy the triangle inequality: d(A,C) ≤ d(A,B) + d(B,C) for all triplets (the full requirement is that the double-centered matrix be positive semidefinite). Many dissimilarity measures violate even the triangle inequality—edit distance with certain operation costs, semantic similarity scores, or averaged human ratings from inconsistent respondents.
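Both conditions can be checked directly. The sketch below (an illustrative helper, not part of any library; `euclidean_diagnostics` is a name chosen here) counts triangle-inequality violations and inspects the smallest eigenvalue of the double-centered matrix. Genuinely Euclidean distances give zero violations and a minimum eigenvalue that is nonnegative up to rounding.

```python
import numpy as np

def euclidean_diagnostics(D, tol=1e-8):
    """Diagnose whether a dissimilarity matrix could be Euclidean.

    Returns (triangle-inequality violation count, smallest eigenvalue of
    the double-centered matrix B). Euclidean input gives (0, ~0 or more).
    """
    n = D.shape[0]
    # Triangle inequality: d(i,k) <= d(i,j) + d(j,k) for every triple (i,j,k)
    violations = 0
    for j in range(n):
        excess = D - (D[:, j][:, None] + D[j, :][None, :])
        violations += int(np.sum(excess > tol))
    # Double centering: B = -0.5 * J D^2 J with J = I - (1/n) 11^T
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    min_eig = float(np.linalg.eigvalsh(B).min())
    return violations, min_eig

# Euclidean distances pass both checks...
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
D_eucl = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(euclidean_diagnostics(D_eucl))      # (0, ~0)

# ...while a matrix with a broken triangle is flagged
D_bad = np.array([[0.0, 1.0, 5.0],
                  [1.0, 0.0, 1.0],
                  [5.0, 1.0, 0.0]])       # d(0,2) > d(0,1) + d(1,2)
print(euclidean_diagnostics(D_bad))
```

The O(n³) triple loop is fine for diagnostics on modest n; for large matrices, checking the eigenvalue condition alone is much cheaper.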
2. Global vs. Local Structure
Standard MDS optimizes a global stress function that treats all pairwise distances equally. This means large distances dominate the objective in absolute terms, and fine-grained neighborhood structure may be sacrificed to get the overall layout right.
Example: Swiss Roll
A classic failure case is the Swiss roll—a 2D sheet curled into 3D. Euclidean distance connects points on adjacent turns of the spiral that are close in 3D but far apart along the 2D manifold. MDS on Euclidean distances therefore produces a distorted embedding; methods like Isomap, which use geodesic distances, succeed.
3. Intrinsic Dimensionality Mismatch
MDS cannot embed data into fewer dimensions than its intrinsic dimensionality without distortion: if the data genuinely occupies d dimensions, any embedding into k < d dimensions must compress or stretch some pairwise distances.
If intrinsic dimensionality exceeds target dimensionality, some stress is inevitable.
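To make this concrete, here is a small sketch (an illustrative construction, not taken from the discussion above): five mutually equidistant points form a regular 4-simplex, which has intrinsic dimensionality 4. Classical MDS reveals this as exactly four positive eigenvalues, so any 2D embedding of these points necessarily carries nonzero stress.

```python
import numpy as np

# Five points that are pairwise equidistant (a regular 4-simplex) need
# 4 dimensions for an exact embedding.
n = 5
D = np.ones((n, n)) - np.eye(n)          # all pairwise distances equal to 1
J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
eigenvalues = np.sort(np.linalg.eigvalsh(B))[::-1]
positive = int(np.sum(eigenvalues > 1e-10))
print(positive)   # 4 -> at least 4 dimensions are required; k=2 must distort
```

For this matrix the spectrum works out in closed form: B = 0.5 J, giving four eigenvalues of 0.5 and one of 0.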
Time Complexity:
Classical MDS: O(n³) for the full eigendecomposition of the n × n double-centered matrix (reducible to roughly O(n²k) with truncated or randomized solvers).
Metric/Non-metric MDS (SMACOF): O(n²) per iteration, typically for tens to hundreds of iterations.
Space Complexity: O(n²) to store the distance matrix, with similar additional space for the double-centered matrix and intermediates.
| n (points) | Distance Matrix Entries | Eigendecomp Time | Memory (float64) |
|---|---|---|---|
| 1,000 | 1M | ~1 second | ~8 MB |
| 10,000 | 100M | ~15 minutes | ~800 MB |
| 50,000 | 2.5B | ~days | ~20 GB |
| 100,000 | 10B | Infeasible | ~80 GB |
For large n: (1) Landmark MDS: Select m << n landmarks, embed those exactly, project rest. Complexity: O(m³ + nm). (2) Randomized SVD: Approximate top-k eigenvalues in O(n²k). (3) Approximate nearest neighbors: Sparse distance matrix with only k-NN distances. (4) Mini-batch/Stochastic approaches: Sample pairs for each iteration.
Practical limits: exact classical MDS is comfortable up to roughly 10,000 points on a typical workstation; beyond ~50,000 points, approximation is effectively mandatory.
Parallelization:
MDS has limited parallelization opportunities: the distance matrix computation parallelizes trivially across pairs, but the eigendecomposition (classical) and the sequential majorization updates (SMACOF) are much harder to distribute.
GPU acceleration:
GPU implementations can accelerate the distance computation and embedding updates by 10-100x (the RAPIDS ecosystem, for example, provides GPU-accelerated primitives for this kind of workload). However, the O(n²) memory limitation remains: GPU memory is typically smaller than system RAM.
```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh
import time


def benchmark_mds_scalability():
    """Benchmark MDS scaling with dataset size."""
    np.random.seed(42)
    sizes = [100, 500, 1000, 2000, 5000]
    results = []
    print("MDS Scalability Benchmark")
    print("=" * 60)
    print(f"{'n':>8} {'Distance':>12} {'Eigendecomp':>12} {'Total':>12} {'Memory':>10}")
    print("-" * 60)
    for n in sizes:
        # Generate random data
        X = np.random.randn(n, 50)

        # Time distance computation
        t0 = time.time()
        D = cdist(X, X, metric='euclidean')
        t_dist = time.time() - t0

        # Double centering, then time the eigendecomposition
        D_sq = D ** 2
        row_means = D_sq.mean(axis=1, keepdims=True)
        col_means = D_sq.mean(axis=0, keepdims=True)
        B = -0.5 * (D_sq - row_means - col_means + D_sq.mean())
        t0 = time.time()
        eigenvalues, eigenvectors = eigh(B)
        t_eig = time.time() - t0

        # Memory estimate (float64): D, B, eigenvectors
        mem_mb = (n * n * 8 * 3) / (1024 ** 2)

        print(f"{n:>8} {t_dist:>10.3f}s {t_eig:>10.3f}s "
              f"{t_dist + t_eig:>10.3f}s {mem_mb:>8.1f}MB")
        results.append({'n': n, 't_dist': t_dist, 't_eig': t_eig, 'mem_mb': mem_mb})
    return results


def landmark_mds_scalable(X, n_landmarks=500, n_components=2, random_state=None):
    """
    Scalable MDS using the landmark approach for large datasets.

    Parameters
    ----------
    X : ndarray (n, p)
        Feature matrix
    n_landmarks : int
        Number of landmark points
    n_components : int
        Target dimensionality
    random_state : int
        Random seed

    Returns
    -------
    embedding : ndarray (n, n_components)
        Low-dimensional coordinates for all points
    """
    rng = np.random.RandomState(random_state)
    n = X.shape[0]

    if n_landmarks >= n:
        # Fall back to full classical MDS
        D = cdist(X, X, metric='euclidean')
        D_sq = D ** 2
        row_means = D_sq.mean(axis=1, keepdims=True)
        col_means = D_sq.mean(axis=0, keepdims=True)
        B = -0.5 * (D_sq - row_means - col_means + D_sq.mean())
        eigenvalues, eigenvectors = eigh(B)
        idx = np.argsort(eigenvalues)[::-1][:n_components]
        return eigenvectors[:, idx] * np.sqrt(np.maximum(eigenvalues[idx], 0))

    # Select landmarks
    landmark_idx = rng.choice(n, size=n_landmarks, replace=False)
    X_landmarks = X[landmark_idx]

    # Classical MDS on the landmarks
    D_ll = cdist(X_landmarks, X_landmarks, metric='euclidean')
    D_sq = D_ll ** 2
    row_means = D_sq.mean(axis=1, keepdims=True)
    col_means = D_sq.mean(axis=0, keepdims=True)
    grand_mean = D_sq.mean()
    B = -0.5 * (D_sq - row_means - col_means + grand_mean)
    eigenvalues, eigenvectors = eigh(B)
    idx = np.argsort(eigenvalues)[::-1][:n_components]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    L_landmarks = eigenvectors * np.sqrt(np.maximum(eigenvalues, 0))

    # Project all points via Gower's formula:
    # x = -0.5 * Lambda^{-1/2} V^T (delta^2 - mu), written here in terms of
    # the landmark coordinates L = V Lambda^{1/2} as
    # x = -0.5 * Lambda^{-1} L^T (delta^2 - mu)
    D_all_to_landmarks = cdist(X, X_landmarks, metric='euclidean')
    D_sq_al = D_all_to_landmarks ** 2
    landmark_sq_mean = D_sq.mean(axis=1)  # mu: mean squared distance per landmark
    Lambda_inv = np.diag(1.0 / np.maximum(eigenvalues, 1e-10))
    embedding = -0.5 * (D_sq_al - landmark_sq_mean) @ L_landmarks @ Lambda_inv
    return embedding


def compare_full_vs_landmark():
    """Compare full MDS with Landmark MDS for accuracy and speed."""
    from scipy.spatial import procrustes
    np.random.seed(42)

    # Generate test data
    n = 5000
    p = 50
    X = np.random.randn(n, p)

    # Full MDS (may be slow)
    t0 = time.time()
    D_full = cdist(X, X, metric='euclidean')
    D_sq = D_full ** 2
    row_means = D_sq.mean(axis=1, keepdims=True)
    col_means = D_sq.mean(axis=0, keepdims=True)
    B = -0.5 * (D_sq - row_means - col_means + D_sq.mean())
    eigenvalues, eigenvectors = eigh(B)
    idx = np.argsort(eigenvalues)[::-1][:2]
    X_full = eigenvectors[:, idx] * np.sqrt(np.maximum(eigenvalues[idx], 0))
    t_full = time.time() - t0
    print(f"Full MDS ({n} points): {t_full:.2f}s")

    # Landmark MDS with various landmark counts, Procrustes-aligned to the
    # full solution (raw coordinates differ by rotation/reflection)
    for n_landmarks in [100, 200, 500, 1000]:
        t0 = time.time()
        X_lmds = landmark_mds_scalable(X, n_landmarks=n_landmarks,
                                       n_components=2, random_state=42)
        t_lmds = time.time() - t0
        _, _, disparity = procrustes(X_full, X_lmds)
        print(f"Landmark MDS ({n_landmarks:>4} landmarks): {t_lmds:.2f}s, "
              f"Procrustes disparity vs full: {disparity:.4f}")


if __name__ == "__main__":
    print("Scalability Benchmark:")
    benchmark_mds_scalability()
    print()
    print("Full vs Landmark Comparison:")
    compare_full_vs_landmark()
```

1. Arbitrary Orientation
MDS solutions are invariant to rotation, reflection, and translation. This means the axes of an MDS plot carry no intrinsic meaning, orientation can flip between runs, and two solutions cannot be compared coordinate-by-coordinate without first aligning them.
Solution: Use Procrustes analysis to align multiple MDS solutions, or rotate to align with known reference dimensions.
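A minimal sketch of that alignment using `scipy.spatial.procrustes` (the data here is synthetic, standing in for two MDS runs): a rotated, reflected, and translated copy of a configuration aligns back to near-zero disparity, confirming the two "solutions" are the same map.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(42)
A = rng.normal(size=(30, 2))                     # "MDS solution" from run 1
theta = 1.1
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation
F = np.diag([1.0, -1.0])                         # reflection
B = A @ R @ F + np.array([3.0, -1.0])            # run 2: same map, different pose

mtx1, mtx2, disparity = procrustes(A, B)
print(f"disparity: {disparity:.2e}")             # essentially zero
```

`procrustes` also standardizes scale, so it additionally absorbs any uniform stretching between runs; the returned `mtx1` and `mtx2` are the aligned configurations you would overlay in a plot.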
2. No Feature Interpretation
Unlike PCA, where principal components are linear combinations of original features, MDS provides no connection to input features: there are no loadings to inspect, and the axes cannot be read as weighted combinations of measured variables.
Solution: If interpretability is critical, use PCA or factor analysis. Or, regress original features onto MDS coordinates to find feature correlates.
3. Non-unique Solutions
For non-metric MDS especially, many different configurations can achieve nearly identical stress, and the freedom in the monotone transformation adds further slack; different initializations can produce visibly different maps.
Solution: Run multiple initializations; compare solutions with Procrustes; report variability.
Post-MDS interpretation techniques: (1) Regress features on MDS coordinates to find which features align with which dimensions. (2) Use property fitting: overlay known variables as contours on MDS plot. (3) Cluster analysis: apply k-means to MDS coordinates and interpret clusters. (4) Expert labeling: show MDS plot to domain expert for qualitative interpretation.
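The first technique can be sketched as follows (synthetic data and hypothetical feature names; in practice `coords` would come from an MDS fit): regress each original feature on the embedding coordinates and rank features by R². Features with high R² are the ones the map "is about".

```python
import numpy as np

def feature_fit(coords, X, names):
    """Regress each feature on the embedding coordinates; report R^2."""
    n = coords.shape[0]
    A = np.column_stack([coords, np.ones(n)])    # coordinates + intercept
    out = {}
    for j, name in enumerate(names):
        y = X[:, j]
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        ss_res = np.sum(resid ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        out[name] = 1.0 - ss_res / ss_tot
    return out

# Toy check: a feature equal to the first coordinate fits perfectly,
# an unrelated noise feature does not
rng = np.random.default_rng(0)
coords = rng.normal(size=(100, 2))
X = np.column_stack([coords[:, 0], rng.normal(size=100)])
r2 = feature_fit(coords, X, ["aligned_feature", "noise_feature"])
print(r2)
```

The fitted slopes (dropped here for brevity) can also be drawn as arrows on the MDS plot, which is exactly the "property fitting" overlay mentioned above.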
4. Distortion Accumulation
In low dimensions, some structure must be sacrificed, and the nature of the distortion is not transparent: some pairwise distances are compressed while others are stretched, and the configuration alone does not show which. The Shepard diagram is the main diagnostic.
5. Visualization Limitations for k > 3
If intrinsic dimensionality exceeds 3, projection to 2D/3D for visualization necessarily loses information: points that appear close in the plot may be far apart in the full solution, and well-separated clusters can overlap in any particular 2D view.
Let's catalog the ways MDS can fail and how to diagnose/fix each.
| Failure Mode | Symptoms | Cause | Solution |
|---|---|---|---|
| Degenerate solution | All points collapsed; zero distances | Optimization stuck or bad normalization | Use Stress-1; try different initialization |
| High stress despite many dimensions | Stress >> 0.20 even for k=5+ | Non-Euclidean input or inherently high-dimensional | Non-metric MDS; Cailliez correction; accept limitation |
| Negative eigenvalues (Classical) | Warning about negative eigenvalues | Dissimilarities not Euclidean | Add constant to off-diagonal; use Metric MDS |
| Inconsistent runs | Different results with different seeds | Multiple local minima | Multi-start; use Classical MDS initialization |
| Boundary effects | Points stuck at boundary of configuration | Distance constraints push points outward | Check for outlier dissimilarities; robust MDS |
| Manifold unfolding failure | Expected structure not recovered | Euclidean distance ≠ geodesic distance | Use Isomap, t-SNE, or UMAP instead |
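For the negative-eigenvalue row, one concrete repair can be sketched as follows. This uses the Lingoes variant of the additive-constant idea (the table's Cailliez correction is the analogous fix applied to the distances rather than their squares): adding c = -2·λ_min to the squared off-diagonal dissimilarities makes the double-centered matrix positive semidefinite.

```python
import numpy as np

def double_center(D_sq):
    """B = -0.5 * J D^2 J with J the centering matrix."""
    n = D_sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D_sq @ J

# A non-Euclidean dissimilarity matrix (violates the triangle inequality)
D = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 1.0],
              [5.0, 1.0, 0.0]])
B = double_center(D ** 2)
lam_min = float(np.linalg.eigvalsh(B).min())
print(lam_min)                     # negative: D is not Euclidean

# Lingoes correction: add c = -2*lam_min to squared off-diagonal entries
c = -2.0 * lam_min
D_sq_fixed = D ** 2 + c * (1.0 - np.eye(3))
B_fixed = double_center(D_sq_fixed)
print(float(np.linalg.eigvalsh(B_fixed).min()))   # ~0: now embeddable
```

The correction works because the shift only moves the spectrum of B on the subspace orthogonal to the all-ones vector, lifting the most negative eigenvalue to zero.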
Very low stress (< 0.01) with k matching your guess might mean you've overfit with too many dimensions, or your data is trivially structured. Stress near zero in 2D for complex data is suspicious—check that result wasn't trivially achieved (e.g., due to duplicate points or degenerate dissimilarities).
Debugging checklist: verify the dissimilarity matrix is symmetric with a zero diagonal; check the eigenvalue spectrum for large negative values; run several random starts and compare stress; and inspect the Shepard diagram for systematic departures.
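The input-side items of this checklist are easy to automate. The sketch below (an illustrative helper, not a library function; `validate_dissimilarities` is a name chosen here) flags the matrix problems most often behind the failure modes above, including the duplicate-point degeneracy.

```python
import numpy as np

def validate_dissimilarities(D, tol=1e-10):
    """Basic sanity checks before running MDS on a dissimilarity matrix."""
    problems = []
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        problems.append("not a square matrix")
        return problems
    if not np.allclose(D, D.T, atol=tol):
        problems.append("not symmetric")
    if np.any(np.abs(np.diag(D)) > tol):
        problems.append("nonzero diagonal")
    if np.any(D < -tol):
        problems.append("negative dissimilarities")
    off = D[~np.eye(D.shape[0], dtype=bool)]
    if off.size and np.any(np.abs(off) <= tol):
        problems.append("zero off-diagonal entries (duplicate points?)")
    return problems

D_ok = np.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 1.5],
                 [2.0, 1.5, 0.0]])
D_bad = np.array([[0.0, 1.0],
                  [2.0, 0.5]])
print(validate_dissimilarities(D_ok))    # []
print(validate_dissimilarities(D_bad))   # ['not symmetric', 'nonzero diagonal']
```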
MDS exists within a rich ecosystem of dimensionality reduction methods. Understanding when to choose MDS versus alternatives is crucial.
| Method | Input | Preserves | Best For |
|---|---|---|---|
| PCA | Features | Global variance | Linear, Euclidean, interpretable axes |
| Classical MDS | Euclidean distances | Pairwise distances | Distance-based input, equivalence to PCA |
| Metric MDS | Any dissimilarities | Metric distances | Non-Euclidean distances, weighted |
| Non-metric MDS | Ordinal dissimilarities | Distance rankings | Subjective judgments, ordinal data |
| Isomap | Features → geodesic | Geodesic distances | Smooth nonlinear manifolds |
| LLE | Features | Local linear structure | Manifolds with local linearity |
| t-SNE | Features | Local neighborhoods | Visualization of clusters |
| UMAP | Features | Local + some global | Fast visualization, clustering |
Choose MDS when: (1) You have dissimilarities rather than features; (2) You care about preserving global distance structure; (3) You need a principled, well-understood method; (4) Interpretability of stress and Shepard diagrams matters; (5) Data is roughly Euclidean or ordinal.
Detailed comparisons:
MDS vs. PCA: on Euclidean distances, classical MDS and PCA produce the same configuration; PCA additionally provides feature loadings but requires a feature matrix, whereas MDS needs only dissimilarities.
MDS vs. t-SNE: t-SNE preserves local neighborhoods and exaggerates cluster separation while distorting global distances; MDS does the opposite, preserving global structure at the expense of local detail.
MDS vs. UMAP: UMAP is much faster and scales to far larger datasets, preserving local structure plus some global layout, but its output is harder to interpret in terms of a single explicit objective like stress.
MDS vs. Isomap: Isomap is classical MDS applied to geodesic (shortest-path) distances estimated from a neighborhood graph, which lets it unfold curved manifolds that defeat Euclidean MDS.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist
from sklearn.manifold import MDS, TSNE, Isomap
from sklearn.decomposition import PCA
import time


def compare_dr_methods(X, labels=None, random_state=42):
    """
    Compare MDS with other dimensionality reduction methods.

    Parameters
    ----------
    X : ndarray (n, p)
        Feature matrix
    labels : ndarray (n,), optional
        Labels for coloring points
    random_state : int
        Random seed
    """
    n = X.shape[0]
    D = cdist(X, X, metric='euclidean')
    # Note: sklearn's MDS uses SMACOF (iterative stress majorization) even on
    # precomputed distances, so "Classical MDS" here is an approximation.
    methods = {
        'PCA': PCA(n_components=2, random_state=random_state),
        'Classical MDS': MDS(n_components=2, dissimilarity='precomputed',
                             random_state=random_state),
        'Metric MDS': MDS(n_components=2, metric=True,
                          dissimilarity='precomputed', n_init=4,
                          max_iter=300, random_state=random_state),
        't-SNE': TSNE(n_components=2, random_state=random_state,
                      perplexity=30),
        'Isomap': Isomap(n_components=2, n_neighbors=10)
    }

    results = {}
    times = {}
    for name, method in methods.items():
        print(f"Running {name}...", end=' ')
        t0 = time.time()
        if name in ['Classical MDS', 'Metric MDS']:
            embedding = method.fit_transform(D)
        else:
            embedding = method.fit_transform(X)
        times[name] = time.time() - t0
        results[name] = embedding
        print(f"{times[name]:.2f}s")

    # Plot all results
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    axes = axes.flatten()
    for idx, (name, embedding) in enumerate(results.items()):
        ax = axes[idx]
        if labels is not None:
            ax.scatter(embedding[:, 0], embedding[:, 1], c=labels,
                       cmap='viridis', alpha=0.6, s=20)
        else:
            ax.scatter(embedding[:, 0], embedding[:, 1], alpha=0.6, s=20)
        ax.set_title(f'{name} (time: {times[name]:.2f}s)')
        ax.set_xlabel('Dim 1')
        ax.set_ylabel('Dim 2')
    # Hide the unused subplot
    axes[-1].axis('off')
    plt.tight_layout()
    plt.savefig('dr_method_comparison.png', dpi=150)
    plt.close()

    return results, times


def swiss_roll_comparison():
    """Compare methods on Swiss roll (manifold) data."""
    from sklearn.datasets import make_swiss_roll
    np.random.seed(42)
    X, t = make_swiss_roll(n_samples=1000, noise=0.2)
    print("Swiss Roll Comparison:")
    print("MDS with Euclidean distances will fail; Isomap should succeed.")
    print()
    results, times = compare_dr_methods(X, labels=t, random_state=42)

    # Quantify: correlation between the first embedding dimension and the
    # position along the manifold (t)
    print("Correlation of 1st dimension with manifold position (t):")
    for name, embedding in results.items():
        corr = np.abs(np.corrcoef(t, embedding[:, 0])[0, 1])
        print(f"  {name}: {corr:.4f}")


if __name__ == "__main__":
    swiss_roll_comparison()
```

Drawing on our comprehensive understanding of MDS, here are best practices for applying it effectively.
For papers: (1) Include a Shepard diagram in supplementary materials. (2) Report exact stress value and formula (Stress-1 vs Stress-2). (3) Note number of random starts and convergence criterion. (4) Show stress elbow plot to justify k choice. (5) Acknowledge orientation arbitrariness in interpretation.
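The Shepard diagram in point (1) needs only the two distance vectors. The sketch below builds them with hand-rolled classical MDS via the double-centering formula (synthetic data with a strong 2D signal, so recovery is good; the filename `shepard_diagram.png` is arbitrary) and reports the distance correlation one might quote alongside the plot.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                     # render off-screen
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
Z = 3.0 * rng.normal(size=(60, 2))                         # strong 2D structure
X = np.column_stack([Z, 0.3 * rng.normal(size=(60, 3))])   # weak extra dims
d_in = pdist(X)                           # input dissimilarities

# Classical MDS to 2D: double centering + top-2 eigenpairs
D = squareform(d_in)
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
w, V = np.linalg.eigh(B)
idx = np.argsort(w)[::-1][:2]
emb = V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
d_out = pdist(emb)                        # embedded distances

plt.figure(figsize=(4, 4))
plt.scatter(d_in, d_out, s=5, alpha=0.4)
lim = [0, float(max(d_in.max(), d_out.max()))]
plt.plot(lim, lim, 'r--', lw=1)           # perfect-recovery line
plt.xlabel('Input dissimilarity')
plt.ylabel('Embedded distance')
plt.title('Shepard diagram')
plt.savefig('shepard_diagram.png', dpi=120)
plt.close()

corr = float(np.corrcoef(d_in, d_out)[0, 1])
print(f"distance correlation: {corr:.3f}")
```

Points falling below the dashed line are compressed distances, points above it are stretched ones; systematic curvature indicates the distortion a bare stress value hides.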
When NOT to use MDS: when n is in the tens of thousands or more and an approximation is unacceptable; when data lies on a curved manifold (prefer Isomap or UMAP); when feature-level interpretability is required (prefer PCA); or when only local cluster structure matters (prefer t-SNE).
MDS continues to evolve, with active research in several directions.
1. Scalable Algorithms: landmark, Nyström, and stochastic mini-batch variants that push MDS toward millions of points.
2. Robust and Constrained MDS: loss functions that tolerate outlier dissimilarities, plus formulations that incorporate side information as constraints.
3. Probabilistic Extensions: Bayesian treatments that place a posterior over configurations, yielding uncertainty estimates for point placements.
4. Deep Learning Integration:
While neural network methods (t-SNE, UMAP, autoencoders) dominate modern visualization, MDS foundations remain important: (1) MDS theory helps understand what these methods preserve and lose. (2) Classical MDS insights (eigendecomposition, Gram matrix) underpin kernel methods. (3) MDS-like losses are used in metric learning and representation learning.
We've completed a comprehensive exploration of Multidimensional Scaling—from mathematical foundations to practical limitations.
You should now be able to: explain the Euclidean assumption and diagnose non-Euclidean input; anticipate the O(n²) memory and O(n³) time barriers and apply landmark approximations; recognize and repair common failure modes such as degenerate or inconsistent solutions; and choose among MDS, PCA, t-SNE, UMAP, and Isomap for a given problem.
Coming up: The next module covers Isomap—a manifold learning method that combines MDS with geodesic (shortest-path) distances to handle curved manifolds that confound standard MDS.
Congratulations! You've completed the Multidimensional Scaling module. You now have deep expertise in one of the foundational methods of multivariate analysis—knowledge that extends to understanding modern manifold learning and representation learning techniques.