Imagine you're an archaeologist who has discovered a collection of ancient artifacts at various excavation sites. You don't have GPS coordinates—only a table recording the physical distances between every pair of sites, painstakingly measured by foot. Your goal: reconstruct the spatial layout of these sites on a map. Given only pairwise distances, can you recover the original positions?
This is precisely the problem that Multidimensional Scaling (MDS) solves. It reconstructs a configuration of points in low-dimensional space such that the distances between points match, as closely as possible, a given set of observed dissimilarities. MDS is one of the oldest and most elegant dimensionality reduction techniques, predating PCA's modern formulation and serving as the conceptual ancestor of manifold learning methods like Isomap.
Classical MDS, also known as Principal Coordinate Analysis (PCoA) or Torgerson Scaling, represents the purest form of this idea: when distances are Euclidean, the reconstruction is exact and can be computed via a simple eigenvalue decomposition.
By the end of this page, you will understand the complete mathematical derivation of Classical MDS, master the double centering technique that converts distances to inner products, derive the eigendecomposition solution, and recognize when Classical MDS provides an exact reconstruction versus an approximation.
Multidimensional Scaling emerged from the intersection of psychophysics and statistics in the mid-20th century. Psychologists sought to understand how humans perceive similarity—between colors, sounds, concepts, or products. Unlike physical measurements, psychological similarity doesn't yield direct coordinates. Researchers could only ask subjects to rate how similar or different pairs of stimuli were, producing dissimilarity matrices rather than feature vectors.
The fundamental question: Given only pairwise dissimilarities, can we infer an underlying geometric structure?
The breakthrough came from Warren Torgerson in 1952, who showed that when dissimilarities correspond to Euclidean distances, the original configuration can be exactly recovered (up to rotation, reflection, and translation). This result, now called Classical MDS, remains the theoretical foundation for all subsequent MDS variants.
Key insight: Classical MDS doesn't require raw features—only distances. This makes it applicable in domains where no natural coordinate representation exists and only pairwise dissimilarities can be measured, such as perceptual similarity ratings or other comparison-based data.
When applied to Euclidean distances computed from centered data, Classical MDS produces exactly the same coordinates as PCA. This equivalence reveals deep connections between distance-based and variance-maximizing perspectives on dimensionality reduction.
Let's formalize the Classical MDS problem with mathematical precision.
Given: an $n \times n$ dissimilarity matrix $D = (d_{ij})$, symmetric with zero diagonal and non-negative entries.
Goal: find points $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{R}^k$ (with $k$ small, typically 2 or 3) whose pairwise Euclidean distances reproduce the dissimilarities:
$$|\mathbf{x}_i - \mathbf{x}_j|_2 \approx d_{ij}$$
The special case: When the dissimilarities are actually Euclidean distances—meaning there exists some configuration $Z$ in $\mathbb{R}^p$ (possibly high-dimensional) such that $d_{ij} = |\mathbf{z}_i - \mathbf{z}_j|_2$—then the matrix $D$ is called Euclidean. For Euclidean distance matrices, Classical MDS can recover $Z$ exactly (up to the intrinsic dimensionality).
Not every symmetric matrix with zero diagonal is Euclidean. For example, dissimilarities that violate the triangle inequality—say $d_{12} = d_{13} = 1$ but $d_{23} = 3$—cannot arise from any configuration of points in Euclidean space. Classical MDS works optimally when the input is truly Euclidean; otherwise, it provides the best Euclidean approximation (possibly with negative eigenvalues).
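As a quick check, here is a minimal sketch (using NumPy, with an illustrative $3 \times 3$ dissimilarity matrix that violates the triangle inequality) showing how a negative eigenvalue of the double-centered matrix signals a non-Euclidean input:

```python
import numpy as np

# Hypothetical dissimilarities violating the triangle inequality:
# d(1,3) = 3 > d(1,2) + d(2,3) = 2, so no Euclidean configuration exists.
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])

D2 = D ** 2
H = np.eye(3) - np.ones((3, 3)) / 3   # centering matrix
B = -0.5 * H @ D2 @ H                 # double centering (explained below)
print(np.linalg.eigvalsh(B))          # one eigenvalue is negative
```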
Key mathematical observation:
The relationship between distances and inner products is the key to Classical MDS. If we know the pairwise distances, can we recover the inner products (the Gram matrix)? The answer is yes, through a technique called double centering.
Let $X \in \mathbb{R}^{n \times p}$ be a data matrix (rows are observations). The squared distance between points $i$ and $j$ is:
$$d_{ij}^2 = |\mathbf{x}_i - \mathbf{x}_j|^2 = |\mathbf{x}_i|^2 - 2\mathbf{x}_i^T\mathbf{x}_j + |\mathbf{x}_j|^2$$
This equation links squared distances to inner products. If we had a way to isolate the inner product term $\mathbf{x}_i^T\mathbf{x}_j$, we could recover the Gram matrix $B = XX^T$, and from $B$, we can extract coordinates via eigendecomposition.
The double centering transformation is the mathematical heart of Classical MDS. It converts a squared distance matrix into a matrix of inner products, enabling eigendecomposition.
Setup: Let $D^{(2)} \in \mathbb{R}^{n \times n}$ be the squared distance matrix with entries $d_{ij}^2$.
The centering matrix: Define the centering matrix:
$$H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$$
where $I_n$ is the $n \times n$ identity matrix and $\mathbf{1}$ is a vector of ones. The matrix $H$ is symmetric and idempotent ($H^2 = H$). When applied to a data matrix, it centers the columns to have zero mean.
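A quick numerical sanity check of these properties (a minimal NumPy sketch; the sizes are arbitrary):

```python
import numpy as np

n = 5
H = np.eye(n) - np.ones((n, n)) / n

print(np.allclose(H @ H, H))    # idempotent: H^2 = H
print(np.allclose(H, H.T))      # symmetric

X = np.random.randn(n, 3)
Xc = H @ X                      # applying H centers the columns
print(np.allclose(Xc.mean(axis=0), 0.0))
```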
The double centering transformation:
$$B = -\frac{1}{2} H D^{(2)} H$$
Theorem (Torgerson, 1952): If $D^{(2)}$ is a squared Euclidean distance matrix derived from centered data $X$ (with $X^T\mathbf{1} = \mathbf{0}$), then:
$$B = XX^T$$
That is, double centering recovers the Gram matrix—the matrix of all pairwise inner products.
Double centering removes the norm terms that pollute the distance-to-inner-product relationship. By subtracting row means, column means, and adding the grand mean, we isolate the cross-term that contains the inner product information. It's analogous to how we center data to compute covariance from raw second moments.
Derivation of Double Centering:
Let's prove why double centering works. Start with the squared distance:
$$d_{ij}^2 = |\mathbf{x}_i|^2 - 2\mathbf{x}_i^T\mathbf{x}_j + |\mathbf{x}_j|^2$$
Define $a_i = |\mathbf{x}_i|^2$. Then:
$$d_{ij}^2 = a_i + a_j - 2b_{ij}$$
where $b_{ij} = \mathbf{x}_i^T\mathbf{x}_j$ is the $(i,j)$ entry of the Gram matrix.
Row mean of squared distances: $$\bar{d}_{i\cdot}^2 = \frac{1}{n}\sum_j d_{ij}^2 = a_i + \bar{a} - \frac{2}{n}\sum_j b_{ij}$$
Since the data is centered, $\sum_j \mathbf{x}_j = \mathbf{0}$, so $\sum_j b_{ij} = \mathbf{x}_i^T \sum_j \mathbf{x}_j = 0$. Thus: $$\bar{d}_{i\cdot}^2 = a_i + \bar{a}$$
Column mean (by symmetry): $\bar{d}_{\cdot j}^2 = a_j + \bar{a}$
Grand mean: $\bar{d}_{\cdot\cdot}^2 = 2\bar{a}$
Apply double centering: $$-\frac{1}{2}(d_{ij}^2 - \bar{d}_{i\cdot}^2 - \bar{d}_{\cdot j}^2 + \bar{d}_{\cdot\cdot}^2)$$ $$= -\frac{1}{2}[(a_i + a_j - 2b_{ij}) - (a_i + \bar{a}) - (a_j + \bar{a}) + 2\bar{a}]$$ $$= -\frac{1}{2}[-2b_{ij}] = b_{ij}$$
This proves that double centering recovers the Gram matrix entry $b_{ij}$.
```python
import numpy as np


def double_centering(D_squared):
    """
    Apply double centering to a squared distance matrix.

    Given a squared distance matrix D^(2), computes:
        B = -0.5 * H @ D^(2) @ H
    where H is the centering matrix.

    Parameters:
    -----------
    D_squared : ndarray of shape (n, n)
        Squared distance matrix where D_squared[i,j] = d_{ij}^2

    Returns:
    --------
    B : ndarray of shape (n, n)
        The (estimated) Gram matrix
    """
    n = D_squared.shape[0]

    # Method 1: Explicit construction of centering matrix
    # H = np.eye(n) - np.ones((n, n)) / n
    # B = -0.5 * H @ D_squared @ H

    # Method 2: Efficient computation (avoids forming n x n centering matrix)
    # Compute row means, column means, and grand mean
    row_means = D_squared.mean(axis=1, keepdims=True)   # Shape (n, 1)
    col_means = D_squared.mean(axis=0, keepdims=True)   # Shape (1, n)
    grand_mean = D_squared.mean()

    # Double centering formula
    B = -0.5 * (D_squared - row_means - col_means + grand_mean)

    return B


def verify_double_centering():
    """
    Verify that double centering recovers the Gram matrix
    from Euclidean distances.
    """
    np.random.seed(42)

    # Generate random centered data
    n, p = 50, 10
    X = np.random.randn(n, p)
    X = X - X.mean(axis=0)  # Center the data

    # Compute Gram matrix directly
    B_true = X @ X.T

    # Compute squared distance matrix
    # d_ij^2 = ||x_i||^2 + ||x_j||^2 - 2 * x_i^T x_j
    norms_sq = np.sum(X**2, axis=1)
    D_squared = norms_sq[:, np.newaxis] + norms_sq[np.newaxis, :] - 2 * X @ X.T

    # Apply double centering
    B_recovered = double_centering(D_squared)

    # Check recovery
    max_error = np.max(np.abs(B_true - B_recovered))
    print(f"Max absolute error: {max_error:.2e}")
    print(f"Recovery successful: {max_error < 1e-10}")

    return B_true, B_recovered


if __name__ == "__main__":
    verify_double_centering()
```

Once we have the Gram matrix $B = XX^T$, we can extract coordinates through eigendecomposition. This step reveals the intimate connection between Classical MDS and PCA.
The Gram matrix is symmetric and positive semi-definite (when $D$ is Euclidean), so it has a real eigendecomposition:
$$B = V \Lambda V^T$$
where $V = [\mathbf{v}_1, \ldots, \mathbf{v}_n]$ is orthogonal (its columns are the eigenvectors of $B$) and $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$ holds the eigenvalues, ordered $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$.
Coordinate recovery: Since $B = XX^T$, we can write:
$$X = V \Lambda^{1/2}$$
where $\Lambda^{1/2} = \text{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n})$.
Dimensionality reduction: To embed in $k$ dimensions, we take only the top $k$ eigenvector-eigenvalue pairs:
$$X_k = V_k \Lambda_k^{1/2}$$
where $V_k \in \mathbb{R}^{n \times k}$ contains the first $k$ eigenvectors and $\Lambda_k$ is the $k \times k$ diagonal matrix of corresponding eigenvalues.
When distances come from centered Euclidean data, Classical MDS produces the same coordinates as PCA, but computed from the Gram matrix B = XX^T rather than the covariance matrix X^T X / n. The eigenvalues of B are n times those of the covariance matrix. This duality—between the n×n Gram matrix and the p×p covariance matrix—underlies the 'kernel trick' in machine learning.
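A small sketch verifying this scaling relationship numerically (assuming centered data and the covariance convention $X^T X / n$ used above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
X -= X.mean(axis=0)                                  # center the data

gram_eigs = np.linalg.eigvalsh(X @ X.T)[::-1][:4]    # top eigenvalues of XX^T
cov_eigs = np.linalg.eigvalsh(X.T @ X / len(X))[::-1]

print(np.allclose(gram_eigs, len(X) * cov_eigs))     # True: factor of n
```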
Choosing the embedding dimensionality:
The eigenvalues tell us how much 'variance' each dimension captures. To choose $k$, we examine:
Scree plot: Plot eigenvalues in decreasing order. Look for an 'elbow' where eigenvalues drop sharply.
Proportion of variance explained: $$\text{PoV}(k) = \frac{\sum_{i=1}^k \lambda_i}{\sum_{i=1}^n \lambda_i}$$
A common heuristic is to choose $k$ such that PoV$(k) \geq 0.8$ or $0.9$.
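A minimal sketch of this heuristic (assuming eigenvalues sorted in decreasing order, e.g., as returned by the `classical_mds` implementation below; `choose_k` is an illustrative name):

```python
import numpy as np

def choose_k(eigenvalues, threshold=0.9):
    """Smallest k whose cumulative proportion of variance reaches the threshold.

    Assumes eigenvalues are sorted in decreasing order; negative eigenvalues
    (non-Euclidean input) are clipped to zero here.
    """
    lam = np.clip(np.asarray(eigenvalues, dtype=float), 0, None)
    pov = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(pov, threshold) + 1)

# Example: a spectrum where two dimensions dominate
print(choose_k([5.0, 3.0, 0.5, 0.3, 0.2], threshold=0.8))  # -> 2
```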
Handling negative eigenvalues:
If the input dissimilarities are not exactly Euclidean, $B$ may have negative eigenvalues. Common strategies: (1) truncation—keep only the components with positive eigenvalues, as the implementation below does; (2) an additive-constant correction—add a constant to the (squared) off-diagonal dissimilarities large enough to make $B$ positive semi-definite (a sketch of this option follows below); (3) switch to Metric or Non-metric MDS, which optimize a stress criterion directly and do not require a positive semi-definite Gram matrix.
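As a sketch of option (2), the following adds a constant to the squared off-diagonal dissimilarities just large enough to lift the most negative eigenvalue of $B$ to zero. This is one simple variant of the additive-constant idea, not necessarily the smallest possible constant; `additive_constant_correction` is an illustrative name.

```python
import numpy as np

def additive_constant_correction(D):
    """Return a corrected distance matrix whose double-centered form is PSD.

    Adds a constant to every squared off-diagonal dissimilarity so that the
    most negative eigenvalue of B is lifted to zero (illustrative sketch).
    """
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (D ** 2) @ H
    lam_min = np.linalg.eigvalsh(B).min()
    if lam_min >= 0:
        return D.copy()                       # already Euclidean
    c = -lam_min
    D2_corr = D ** 2 + 2 * c * (1 - np.eye(n))   # add 2c off the diagonal
    return np.sqrt(D2_corr)

# Non-Euclidean example (violates the triangle inequality)
D = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 1.0], [3.0, 1.0, 0.0]])
D_fixed = additive_constant_correction(D)
H = np.eye(3) - np.ones((3, 3)) / 3
print(np.linalg.eigvalsh(-0.5 * H @ D_fixed**2 @ H).min())  # ~0: PSD up to rounding
```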
```python
import numpy as np
from scipy.linalg import eigh


def classical_mds(D, n_components=2):
    """
    Classical Multidimensional Scaling (Torgerson Scaling / PCoA).

    Given a distance matrix D, finds a k-dimensional embedding such that
    Euclidean distances in the embedding approximate the input distances.

    Parameters:
    -----------
    D : ndarray of shape (n, n)
        Distance matrix (not squared)
    n_components : int
        Target dimensionality

    Returns:
    --------
    X : ndarray of shape (n, n_components)
        The embedded coordinates
    eigenvalues : ndarray
        All eigenvalues (for scree plot analysis)
    variance_explained : ndarray
        Proportion of variance explained by each component
    """
    n = D.shape[0]

    # Step 1: Square the distances
    D_squared = D ** 2

    # Step 2: Double centering to get Gram matrix
    row_means = D_squared.mean(axis=1, keepdims=True)
    col_means = D_squared.mean(axis=0, keepdims=True)
    grand_mean = D_squared.mean()
    B = -0.5 * (D_squared - row_means - col_means + grand_mean)

    # Step 3: Eigendecomposition
    # eigh returns eigenvalues in ascending order
    eigenvalues, eigenvectors = eigh(B)

    # Reverse to get descending order
    eigenvalues = eigenvalues[::-1]
    eigenvectors = eigenvectors[:, ::-1]

    # Step 4: Handle negative eigenvalues
    # Count and warn about negative eigenvalues
    n_negative = np.sum(eigenvalues < 0)
    if n_negative > 0:
        print(f"Warning: {n_negative} negative eigenvalues detected.")
        print("The input may not be a valid Euclidean distance matrix.")

    # Step 5: Compute embedding using top k positive eigenvalues
    # Only use positive eigenvalues
    positive_mask = eigenvalues > 1e-10
    eigenvalues_positive = eigenvalues[positive_mask]
    eigenvectors_positive = eigenvectors[:, positive_mask]

    # Compute proportion of variance explained
    total_variance = np.sum(np.abs(eigenvalues))  # Use abs for robustness
    variance_explained = eigenvalues_positive / total_variance

    # Take top n_components
    k = min(n_components, len(eigenvalues_positive))
    eigenvalues_k = eigenvalues_positive[:k]
    eigenvectors_k = eigenvectors_positive[:, :k]

    # Compute coordinates: X = V * sqrt(Lambda)
    X = eigenvectors_k * np.sqrt(eigenvalues_k)

    return X, eigenvalues, variance_explained[:k]


def demonstrate_classical_mds():
    """
    Demonstrate Classical MDS on synthetic data and compare with PCA.
    """
    from sklearn.decomposition import PCA
    from sklearn.metrics import pairwise_distances
    import matplotlib.pyplot as plt

    np.random.seed(42)

    # Generate data on a 2D plane embedded in 10D
    n = 200
    t = np.linspace(0, 4*np.pi, n)
    X_true = np.column_stack([
        t * np.cos(t),
        t * np.sin(t)
    ])

    # Embed in higher dimensions with some noise
    W = np.random.randn(2, 10)
    X_high = X_true @ W + 0.1 * np.random.randn(n, 10)
    X_high = X_high - X_high.mean(axis=0)  # Center

    # Compute distance matrix
    D = pairwise_distances(X_high, metric='euclidean')

    # Apply Classical MDS
    X_mds, eigenvalues, var_explained = classical_mds(D, n_components=2)

    # Apply PCA for comparison
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X_high)

    # Plot results
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # MDS embedding
    axes[0].scatter(X_mds[:, 0], X_mds[:, 1], c=t, cmap='viridis', s=20)
    axes[0].set_title('Classical MDS Embedding')
    axes[0].set_xlabel('MDS 1')
    axes[0].set_ylabel('MDS 2')

    # PCA embedding
    axes[1].scatter(X_pca[:, 0], X_pca[:, 1], c=t, cmap='viridis', s=20)
    axes[1].set_title('PCA Embedding')
    axes[1].set_xlabel('PC 1')
    axes[1].set_ylabel('PC 2')

    # Scree plot
    n_show = min(20, len(eigenvalues))
    axes[2].bar(range(1, n_show+1), eigenvalues[:n_show])
    axes[2].axhline(y=0, color='r', linestyle='--', alpha=0.5)
    axes[2].set_xlabel('Component')
    axes[2].set_ylabel('Eigenvalue')
    axes[2].set_title('Scree Plot')

    plt.tight_layout()
    plt.savefig('classical_mds_demo.png', dpi=150)
    plt.close()

    print(f"Variance explained by first 2 MDS components: {var_explained[:2].sum():.3f}")
    print(f"Variance explained by first 2 PCA components: {pca.explained_variance_ratio_[:2].sum():.3f}")

    return X_mds, X_pca, eigenvalues


if __name__ == "__main__":
    demonstrate_classical_mds()
```

One of the most beautiful results in multivariate statistics is the equivalence between Classical MDS and PCA when applied to Euclidean distances from centered data. Understanding this equivalence deepens our appreciation for both methods and reveals why they appear in different contexts despite solving related problems.
Setup: Let $X \in \mathbb{R}^{n \times p}$ be a centered data matrix, and let $D$ be the matrix of pairwise Euclidean distances between its rows.
PCA perspective: eigendecompose the covariance matrix $C = X^TX/n$ and project the data onto the top $k$ eigenvectors $W_k$, giving scores $Z_{\text{PCA}} = XW_k$.
Classical MDS perspective: double-center the squared distances to obtain the Gram matrix $B = XX^T$, eigendecompose it as $B = V\Lambda V^T$, and set $Z_{\text{MDS}} = V_k\Lambda_k^{1/2}$.
The equivalence theorem:
$$Z_{\text{MDS}} = Z_{\text{PCA}} \quad \text{(up to orthogonal transformation)}$$
The key insight is the singular value decomposition. If X = UΣW^T, then B = XX^T = UΣ²U^T and C = X^TX/n = WΣ²W^T/n. The MDS coordinates V√Λ_B equal UΣ, which are exactly the PCA scores XW = UΣW^TW = UΣ.
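A short numerical check of this identity (a sketch assuming centered data; individual columns may differ by a sign, since eigenvectors are determined only up to sign):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))
X -= X.mean(axis=0)

# PCA scores via SVD: X = U S W^T, scores = U S
U, S, Wt = np.linalg.svd(X, full_matrices=False)
scores_pca = U * S

# Classical MDS from the Gram matrix B = X X^T
lam, V = np.linalg.eigh(X @ X.T)
lam, V = lam[::-1][:5], V[:, ::-1][:, :5]
scores_mds = V * np.sqrt(np.maximum(lam, 0))

# Equal up to a sign per column
print(np.allclose(np.abs(scores_pca), np.abs(scores_mds), atol=1e-8))
```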
Why does this equivalence matter?
Computational choice: you can work with whichever matrix is smaller—the $p \times p$ covariance matrix when $p \ll n$, or the $n \times n$ Gram matrix when $n \ll p$—and obtain the same embedding.
Conceptual unification: maximizing retained variance and preserving pairwise Euclidean distances are two views of the same eigendecomposition.
Extension to kernels: replacing the Gram matrix with a general (centered) kernel matrix yields Kernel PCA, and replacing Euclidean distances with geodesic distances yields Isomap—both direct descendants of the Classical MDS machinery.
When do they differ?
| Aspect | PCA | Classical MDS |
|---|---|---|
| Input | Feature matrix X (n × p) | Distance matrix D (n × n) |
| Key matrix | Covariance X^TX/n (p × p) | Gram matrix XX^T (n × n) |
| Objective | Maximize variance | Preserve pairwise distances |
| Computation | O(p³ + np²) for eigendecomposition | O(n³) for eigendecomposition |
| Better when | p << n (many samples) | n << p (few samples, high dim) |
| Kernel extension | Requires kernel formulation | Natural (distances → kernels) |
| Non-Euclidean | Not directly applicable | Can approximate with caveats |
After computing a $k$-dimensional embedding, we need to assess how well it preserves the original distance structure. Several metrics quantify this.
1. Proportion of Variance Explained (PoV)
The ratio of eigenvalue sum determines how much geometric information is captured:
$$\text{PoV}(k) = \frac{\sum_{i=1}^k \lambda_i}{\sum_{i=1}^n |\lambda_i|}$$
For Euclidean inputs, all eigenvalues are non-negative, and this equals the fraction of total variance. Higher PoV indicates better preservation.
2. Residual Sum of Squares (RSS)
Compare original distances to embedded distances:
$$\text{RSS} = \sum_{i < j} (d_{ij} - \hat{d}_{ij})^2$$
where $\hat{d}_{ij} = |\mathbf{x}_i - \mathbf{x}_j|$ in the embedding. Lower is better.
3. Stress (in the MDS sense)
$$\text{Stress} = \sqrt{\frac{\sum_{i<j}(d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2}}$$
Normalized measure of distortion. A common rule of thumb (Kruskal's guidelines): stress above 0.2 is poor, around 0.1 is fair, around 0.05 is good, and 0.025 or below is excellent.
When input distances are truly Euclidean, Classical MDS gives the optimal k-dimensional embedding in the least-squares sense—strictly speaking, it minimizes the 'strain' loss $|B - \hat{B}|_F$ on the Gram matrix (by the Eckart–Young theorem) rather than the stress on the distances. No iterative optimization is needed: the eigendecomposition is a closed-form solution.
4. Shepard Diagram
A Shepard diagram plots original distances $d_{ij}$ against embedded distances $\hat{d}_{ij}$. For a perfect embedding, all points lie on the diagonal. Deviations reveal systematic distortions: points above the diagonal correspond to pairs whose embedded distance exceeds the original (stretching), while points below the diagonal correspond to compressed pairs.
5. Procrustes Analysis
If you have ground truth coordinates (e.g., in simulation), Procrustes analysis finds the optimal rotation/reflection/scaling to align the embedding with truth, then measures residual error:
$$\text{Procrustes error} = \min_{s > 0,\ R \in \mathcal{O}(k)} |X_{\text{true}} - s\, X_{\text{MDS}} R|_F$$
where $\mathcal{O}(k)$ is the set of $k \times k$ orthogonal matrices and $s$ is a scale factor.
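SciPy's `scipy.spatial.procrustes` performs this alignment (it standardizes both configurations, finds the best orthogonal transformation, and returns a disparity score). A minimal sketch with synthetic ground truth:

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
X_true = rng.standard_normal((30, 2))

# Simulate an MDS result: the true layout, arbitrarily rotated and reflected
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]]) @ np.diag([1, -1])
X_mds = X_true @ R

# Procrustes aligns the two configurations; disparity ~ 0 here
_, _, disparity = procrustes(X_true, X_mds)
print(f"Procrustes disparity: {disparity:.2e}")
```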
```python
import numpy as np
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt


def compute_mds_quality_metrics(D_original, X_embedded):
    """
    Compute various quality metrics for an MDS embedding.

    Parameters:
    -----------
    D_original : ndarray of shape (n, n)
        Original distance matrix
    X_embedded : ndarray of shape (n, k)
        Embedded coordinates

    Returns:
    --------
    metrics : dict
        Dictionary containing various quality metrics
    """
    # Compute embedded distances
    D_embedded = cdist(X_embedded, X_embedded, metric='euclidean')

    # Extract upper triangular (exclude diagonal)
    n = D_original.shape[0]
    upper_tri_idx = np.triu_indices(n, k=1)
    d_orig = D_original[upper_tri_idx]
    d_emb = D_embedded[upper_tri_idx]

    # 1. Residual Sum of Squares
    rss = np.sum((d_orig - d_emb) ** 2)

    # 2. Kruskal's Stress-1
    stress = np.sqrt(np.sum((d_orig - d_emb)**2) / np.sum(d_orig**2))

    # 3. Normalized Stress
    normalized_stress = np.sqrt(np.sum((d_orig - d_emb)**2) / np.sum(d_emb**2))

    # 4. Correlation between distances
    correlation = np.corrcoef(d_orig, d_emb)[0, 1]

    # 5. Mean Absolute Error
    mae = np.mean(np.abs(d_orig - d_emb))

    # 6. Root Mean Squared Error
    rmse = np.sqrt(np.mean((d_orig - d_emb)**2))

    metrics = {
        'rss': rss,
        'stress': stress,
        'normalized_stress': normalized_stress,
        'correlation': correlation,
        'mae': mae,
        'rmse': rmse
    }

    return metrics, d_orig, d_emb


def plot_shepard_diagram(d_original, d_embedded, ax=None):
    """
    Create a Shepard diagram comparing original and embedded distances.
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(6, 6))

    ax.scatter(d_original, d_embedded, alpha=0.3, s=5)

    # Add diagonal reference line
    max_d = max(d_original.max(), d_embedded.max())
    ax.plot([0, max_d], [0, max_d], 'r--', linewidth=2, label='Perfect fit')

    # Compute correlation
    corr = np.corrcoef(d_original, d_embedded)[0, 1]

    ax.set_xlabel('Original Distance', fontsize=12)
    ax.set_ylabel('Embedded Distance', fontsize=12)
    ax.set_title(f'Shepard Diagram (r = {corr:.4f})', fontsize=14)
    ax.legend()
    ax.set_aspect('equal')

    return ax


def demonstrate_quality_metrics():
    """
    Demonstrate quality metrics on embeddings of different dimensionality.
    """
    np.random.seed(42)

    # Generate 5D data
    n = 100
    X = np.random.randn(n, 5)
    X = X - X.mean(axis=0)

    # True distance matrix
    D = cdist(X, X, metric='euclidean')

    # Embed at different dimensions
    from classical_mds import classical_mds

    fig, axes = plt.subplots(1, 4, figsize=(16, 4))

    for dim, ax in zip([1, 2, 3, 5], axes):
        X_emb, eigenvals, var_exp = classical_mds(D, n_components=dim)
        metrics, d_orig, d_emb = compute_mds_quality_metrics(D, X_emb)

        plot_shepard_diagram(d_orig, d_emb, ax=ax)
        ax.set_title(f'{dim}D Embedding\nStress={metrics["stress"]:.4f}')

    plt.tight_layout()
    plt.savefig('mds_quality_comparison.png', dpi=150)
    plt.close()

    # Print metrics table
    print("\nQuality Metrics by Embedding Dimension:")
    print("-" * 60)
    print(f"{'Dim':>4} {'Stress':>10} {'Corr':>10} {'RMSE':>10} {'PoV':>10}")
    print("-" * 60)

    for dim in [1, 2, 3, 4, 5]:
        X_emb, eigenvals, var_exp = classical_mds(D, n_components=dim)
        metrics, _, _ = compute_mds_quality_metrics(D, X_emb)
        pov = eigenvals[:dim].sum() / eigenvals.sum()
        print(f"{dim:>4} {metrics['stress']:>10.4f} {metrics['correlation']:>10.4f} "
              f"{metrics['rmse']:>10.4f} {pov:>10.4f}")


if __name__ == "__main__":
    demonstrate_quality_metrics()
```

Classical MDS is computationally elegant but faces scalability challenges for large datasets. Let's analyze the computational requirements.
Time Complexity:
Total: $O(n^2 p + n^3)$—$O(n^2 p)$ to compute the pairwise distances from $p$-dimensional features, $O(n^2)$ for double centering, and $O(n^3)$ for a full eigendecomposition of the $n \times n$ Gram matrix.
For $n = 10{,}000$ points, the eigendecomposition alone requires on the order of $10^{12}$ operations—minutes to hours of computation, depending on hardware.
Space Complexity:
Total: $O(n^2)$—the distance matrix, the squared distance matrix, and the Gram matrix each require $n \times n$ storage.
For $n = 50,000$ with float64: $50000^2 \times 8 \approx 20$ GB per matrix. This quickly exceeds RAM.
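The arithmetic behind this estimate, as a quick helper (a sketch; the function name and the sizes shown are illustrative):

```python
def dense_matrix_gigabytes(n, bytes_per_entry=8):
    """Memory for one dense n x n matrix (float64 by default), in GB."""
    return n * n * bytes_per_entry / 1e9

for n in (10_000, 50_000):
    print(f"n = {n:>6}: {dense_matrix_gigabytes(n):.1f} GB per matrix")
# n = 10000: 0.8 GB; n = 50000: 20.0 GB -- and Classical MDS holds several such matrices.
```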
Classical MDS is typically practical for n < 10,000 on standard hardware. For larger datasets, consider Landmark MDS, Nyström approximation, or out-of-sample extensions.
Optimization strategies:
1. Exploit sparsity in distance computation: compute and store only the distances you actually need (for example, distances to a subset of reference points) instead of the full dense $n \times n$ matrix.
2. Partial eigendecomposition: only the top $k$ eigenpairs of $B$ are required, so iterative (Lanczos-type) solvers avoid the full $O(n^3)$ decomposition—see the sketch after this list.
3. Landmark MDS (L-MDS): run exact Classical MDS on $m \ll n$ landmark points, then place the remaining points using their distances to the landmarks (implemented below).
4. Nyström approximation: approximate the eigenvectors of the full Gram matrix from a sampled subset of its columns.
5. Out-of-sample extension: embed new points into an existing MDS solution without recomputing the eigendecomposition.
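For instance, the partial-eigendecomposition strategy (item 2) can be sketched with `scipy.sparse.linalg.eigsh`, which computes only the top $k$ eigenpairs. This assumes the distance matrix still fits in memory; `classical_mds_topk` is an illustrative name, not a library function.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def classical_mds_topk(D, n_components=2):
    """Classical MDS computing only the top-k eigenpairs of B (Lanczos-based)."""
    D2 = D ** 2
    B = -0.5 * (D2 - D2.mean(axis=1, keepdims=True)
                   - D2.mean(axis=0, keepdims=True) + D2.mean())
    # 'LA' = largest algebraic eigenvalues; avoids the full O(n^3) decomposition
    lam, V = eigsh(B, k=n_components, which='LA')
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    return V * np.sqrt(np.maximum(lam, 0.0))
```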
```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist


def landmark_mds(D, n_landmarks, n_components=2, random_state=None):
    """
    Landmark MDS for scalable embedding.

    Uses a subset of points (landmarks) to compute exact MDS,
    then projects remaining points.

    Parameters:
    -----------
    D : ndarray of shape (n, n)
        Full distance matrix (or can be modified to accept
        only distances to landmarks)
    n_landmarks : int
        Number of landmark points
    n_components : int
        Target dimensionality
    random_state : int, optional
        Random seed for reproducibility

    Returns:
    --------
    X : ndarray of shape (n, n_components)
        Embedded coordinates for all points
    """
    n = D.shape[0]
    rng = np.random.RandomState(random_state)

    # Step 1: Select landmarks (random selection, could also use k-means++)
    landmark_idx = rng.choice(n, size=n_landmarks, replace=False)
    non_landmark_idx = np.setdiff1d(np.arange(n), landmark_idx)

    # Step 2: Extract landmark distance submatrix
    D_landmarks = D[np.ix_(landmark_idx, landmark_idx)]

    # Step 3: Classical MDS on landmarks
    D_sq = D_landmarks ** 2
    row_means = D_sq.mean(axis=1, keepdims=True)
    col_means = D_sq.mean(axis=0, keepdims=True)
    grand_mean = D_sq.mean()
    B_landmarks = -0.5 * (D_sq - row_means - col_means + grand_mean)

    eigenvalues, eigenvectors = eigh(B_landmarks)
    eigenvalues = eigenvalues[::-1]
    eigenvectors = eigenvectors[:, ::-1]

    # Take top k eigenvalues (clipped at zero)
    k = n_components
    eigenvalues_k = eigenvalues[:k]
    eigenvectors_k = eigenvectors[:, :k]

    # Landmark coordinates
    X_landmarks = eigenvectors_k * np.sqrt(np.maximum(eigenvalues_k, 0))

    # Step 4: Project non-landmark points (Gower's method)
    # Distance from non-landmarks to landmarks
    D_to_landmarks = D[np.ix_(non_landmark_idx, landmark_idx)]
    D_sq_to_landmarks = D_to_landmarks ** 2

    # Mean squared distance from landmarks (used in centering)
    mean_sq_dist_landmarks = D_sq.mean(axis=1)  # Mean over landmarks

    # Project using Gower's formula:
    #   x_new = -0.5 * Lambda_k^(-1/2) * V_k^T * (d_new^2 - mu)
    # where mu is the mean of squared distances among landmarks.
    # The Lambda_k^(-1/2) scaling makes projected points consistent with
    # the landmark coordinates V_k * Lambda_k^(1/2).
    Delta_sq = D_sq_to_landmarks - mean_sq_dist_landmarks
    Lambda_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(eigenvalues_k, 1e-10)))
    X_non_landmarks = -0.5 * Delta_sq @ eigenvectors_k @ Lambda_inv_sqrt

    # Step 5: Combine coordinates
    X = np.zeros((n, n_components))
    X[landmark_idx] = X_landmarks
    X[non_landmark_idx] = X_non_landmarks

    return X


def demo_landmark_mds():
    """Compare Landmark MDS with full Classical MDS."""
    from sklearn.metrics import pairwise_distances
    import time

    np.random.seed(42)

    # Generate data
    n = 2000
    X_true = np.random.randn(n, 50)  # 50D data
    D = pairwise_distances(X_true)

    # Full Classical MDS
    start = time.time()
    from classical_mds import classical_mds
    X_full, _, _ = classical_mds(D, n_components=2)
    time_full = time.time() - start

    # Landmark MDS with varying landmarks
    for n_landmarks in [50, 100, 200, 500]:
        start = time.time()
        X_lmds = landmark_mds(D, n_landmarks=n_landmarks,
                              n_components=2, random_state=42)
        time_lmds = time.time() - start

        # Compute correlation with full MDS
        corr = np.corrcoef(X_full.flatten(), X_lmds.flatten())[0, 1]

        print(f"Landmarks: {n_landmarks:>4} | "
              f"Time: {time_lmds:.3f}s (vs {time_full:.3f}s) | "
              f"Correlation: {abs(corr):.4f}")


if __name__ == "__main__":
    demo_landmark_mds()
```

Classical Multidimensional Scaling provides a foundational approach to understanding and visualizing data from pairwise distances. Let's consolidate the key concepts:

- Double centering converts a squared Euclidean distance matrix into the Gram matrix $B = XX^T$.
- Eigendecomposition of $B$ yields coordinates $X_k = V_k \Lambda_k^{1/2}$; for Euclidean inputs the reconstruction is exact up to rotation, reflection, and translation.
- On centered Euclidean data, Classical MDS and PCA produce the same embedding.
- Negative eigenvalues signal non-Euclidean input; truncation or an additive-constant correction handles them.
- The $O(n^3)$ eigendecomposition and $O(n^2)$ memory limit Classical MDS to moderate $n$; Landmark MDS and Nyström-style approximations scale it up.
What's next:
Classical MDS assumes that dissimilarities are Euclidean distances and optimizes a specific (implicit) objective. In the next page, we explore Metric MDS, which explicitly optimizes a stress function, allowing us to handle a broader class of dissimilarities and to weight distance preservation according to importance. Metric MDS introduces iterative optimization, providing a more flexible framework at the cost of computational efficiency.
You now understand the complete mathematical machinery of Classical MDS: from pairwise distances to inner products via double centering, from Gram matrices to coordinates via eigendecomposition, and the deep connection to PCA. Next, we'll see how Metric MDS generalizes this with explicit optimization.