At the heart of every MDS algorithm lies a deceptively simple question: How do we quantify how well an embedding preserves the original distances?
The answer is the stress function: a scalar measure that captures the discrepancy between target dissimilarities and realized distances in the embedding. Understanding stress functions is essential for choosing among MDS variants, interpreting results, and diagnosing poor embeddings.
This page provides a comprehensive treatment of stress functions: their mathematical foundations, the relationships between variants, and practical guidelines for interpretation.
By the end of this page, you will understand the mathematical derivation and properties of major stress variants, master techniques for interpreting stress values in context, learn how stress relates to the optimization landscape, and gain practical skills for diagnosing and improving MDS results.
The fundamental MDS problem:
Given a dissimilarity matrix $\Delta = [\delta_{ij}]$ for $n$ objects, find coordinates $X \in \mathbb{R}^{n \times k}$ such that Euclidean distances $d_{ij}(X) = \|\mathbf{x}_i - \mathbf{x}_j\|$ are close to the dissimilarities.
The stress function family:
All stress functions share a common form:
$$\sigma(X) = \mathcal{N}\left( \sum_{i<j} w_{ij} \cdot \mathcal{L}(\delta_{ij}, d_{ij}(X)) \right)$$
where $w_{ij} \geq 0$ are optional pair weights, $\mathcal{L}$ is a loss comparing each target $\delta_{ij}$ to its realized distance $d_{ij}(X)$, and $\mathcal{N}$ is a normalization that makes the total comparable across datasets.
Common loss functions include the squared error $(\delta_{ij} - d_{ij})^2$, the absolute error $|\delta_{ij} - d_{ij}|$, and robust losses such as Huber (discussed later on this page).
Squared loss dominates for several reasons: (1) It's differentiable everywhere, enabling gradient-based optimization; (2) It heavily penalizes large errors, focusing the algorithm on fixing the worst discrepancies; (3) It connects to least squares theory and probabilistic models (maximum likelihood under Gaussian noise).
Properties of a good stress function:

- Zero exactly when the embedding reproduces the dissimilarities perfectly
- Comparable across datasets and scales (via normalization)
- Differentiable, so gradient-based optimizers can minimize it
- Resistant to degenerate solutions (see below)
Degenerate solutions:
A degenerate solution occurs when all points collapse to a single location ($d_{ij} = 0$ for all pairs). In non-metric MDS, the fitted disparities can shrink toward zero along with the distances, driving an unnormalized error toward zero even though the configuration conveys nothing. Normalization schemes prevent this.
Joseph Kruskal's 1964 papers established the formulations that became standard in MDS practice.
Raw Stress:
$$\sigma_{raw}(X) = \sum_{i<j} w_{ij}(\delta_{ij} - d_{ij})^2$$
This is the unnormalized sum of squared errors. It's sensitive to scale and not comparable across datasets.
Stress-1 (Kruskal's Stress):
$$\sigma_1(X) = \sqrt{\frac{\sum_{i<j} w_{ij}(\delta_{ij} - d_{ij})^2}{\sum_{i<j} w_{ij} d_{ij}^2}}$$
Normalizes by the sum of squared embedded distances. This prevents degenerate solutions (collapsing points): as the configuration shrinks, the denominator goes to zero while the numerator approaches $\sum_{i<j} w_{ij}\delta_{ij}^2$, so the stress blows up rather than improving.
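A quick numerical check makes this concrete. The sketch below is illustrative (synthetic data, SciPy's `pdist`): it shrinks a configuration toward collapse and shows that raw stress stays bounded while Stress-1 grows without bound.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))   # a 2D configuration
delta = pdist(X)                   # use its own distances as the targets

for c in [1.0, 0.5, 0.1, 0.01]:    # shrink the configuration toward a point
    d = pdist(c * X)
    raw = np.sum((delta - d) ** 2)
    stress1 = np.sqrt(raw / np.sum(d ** 2))
    print(f"c={c:5.2f}  raw={raw:10.3f}  stress1={stress1:10.3f}")
# raw stress approaches a constant as c -> 0, but Stress-1 diverges
```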
Stress-2:
$$\sigma_2(X) = \sqrt{\frac{\sum_{i<j} w_{ij}(\delta_{ij} - d_{ij})^2}{\sum_{i<j} w_{ij}(d_{ij} - \bar{d})^2}}$$
Normalizes by the variance of embedded distances. Penalizes solutions where all distances are nearly equal.
| Formulation | Normalizer | Degenerate-Proof | Scale Invariant |
|---|---|---|---|
| Raw stress | None | No | No |
| Stress-1 | Σd²ᵢⱼ | Yes | Partially |
| Stress-2 | Var(dᵢⱼ) | Yes | Yes |
| Normalized raw | Σδ²ᵢⱼ | No | Yes |
Mathematical properties of Stress-1:
Range: $\sigma_1 \in [0, \infty)$. In practice, values rarely exceed 1.
Scale: If $d_{ij} = c \cdot \delta_{ij}$ for all pairs, then $\sigma_1 = |1 - c|/c$, which is zero only at $c = 1$. Stress-1 is invariant to rescaling $\delta$ and $d$ jointly (numerator and denominator both scale by $c^2$), but not to rescaling the embedding alone.
Rotation/reflection invariance: Stress is unchanged by rigid transformations (rotation, reflection, translation).
Translation invariance: Because stress depends only on pairwise distances, translating the whole configuration leaves it unchanged; this ambiguity is usually removed by centering the solution (see the optimization-landscape discussion below).
Computing gradients:
For optimization, we need $\frac{\partial \sigma_1}{\partial \mathbf{x}_i}$. Using the chain rule:
$$\frac{\partial \sigma_1}{\partial \mathbf{x}_i} = \frac{1}{2\sigma_1} \cdot \frac{\partial \sigma_1^2}{\partial \mathbf{x}_i}$$
Each squared-error term depends on $\mathbf{x}_i$ through the distance, whose derivative is:
$$\frac{\partial d_{ij}}{\partial \mathbf{x}_i} = \frac{\mathbf{x}_i - \mathbf{x}_j}{d_{ij}}$$
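To make the chain rule concrete, here is a short, self-contained sketch (not from any particular library) that computes the gradient of the raw stress, the numerator of $\sigma_1^2$, and verifies it against a finite-difference estimate:

```python
import numpy as np
from scipy.spatial.distance import cdist

def raw_stress(X, Delta):
    iu = np.triu_indices(len(X), k=1)
    return np.sum((Delta[iu] - cdist(X, X)[iu]) ** 2)

def raw_stress_grad(X, Delta):
    """Analytic gradient: grad_i = sum_j -2 (delta_ij - d_ij) (x_i - x_j) / d_ij."""
    D = cdist(X, X)
    np.fill_diagonal(D, 1.0)                 # placeholder; diagonal terms zeroed below
    coef = -2.0 * (Delta - D) / D
    np.fill_diagonal(coef, 0.0)
    diffs = X[:, None, :] - X[None, :, :]    # (n, n, k) pairwise coordinate differences
    return np.einsum('ij,ijk->ik', coef, diffs)

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))
Y = rng.standard_normal((6, 2))
Delta = cdist(Y, Y)                          # a valid target dissimilarity matrix

# Finite-difference check on one coordinate
eps, (i, a) = 1e-6, (2, 0)
Xp, Xm = X.copy(), X.copy()
Xp[i, a] += eps
Xm[i, a] -= eps
fd = (raw_stress(Xp, Delta) - raw_stress(Xm, Delta)) / (2 * eps)
print(raw_stress_grad(X, Delta)[i, a], fd)   # should agree to ~6 decimal places
```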
In practice, Stress-1 (or its variant STRESS1) is the default in most software. When papers report 'stress' without qualification, they typically mean Stress-1. Always check documentation, as implementations vary.
Beyond Kruskal's formulations, several alternative stress functions serve specific purposes.
SSTRESS (Squared Stress on Squared Distances):
$$\text{SSTRESS} = \sum_{i<j} w_{ij}(\delta_{ij}^2 - d_{ij}^2)^2$$
Also written with normalization: $$\text{SSTRESS} = \sqrt{\frac{\sum_{i<j} (\delta_{ij}^2 - d_{ij}^2)^2}{\sum_{i<j} \delta_{ij}^4}}$$
Why use SSTRESS? Working with squared distances avoids square roots, making the objective a polynomial in the coordinates that is smooth everywhere (even at $d_{ij} = 0$); it is the loss minimized by the classic ALSCAL algorithm. The trade-off is that errors on large dissimilarities are emphasized relative to small ones.
Sammon Mapping Stress:
$$E_{Sammon} = \frac{1}{\sum_{i<j}\delta_{ij}} \sum_{i<j} \frac{(\delta_{ij} - d_{ij})^2}{\delta_{ij}}$$
Properties: the $1/\delta_{ij}$ weighting emphasizes preserving small dissimilarities (local structure), and the leading factor $1/\sum_{i<j}\delta_{ij}$ normalizes the total so values are comparable across datasets.
The 1/δ weighting makes Sammon mapping very sensitive to small distances. If some δᵢⱼ ≈ 0 (near-duplicate points), the weight goes to infinity. Always check for duplicate/nearly-identical points before applying Sammon mapping.
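Given that warning, a small pre-check is cheap insurance. A hypothetical helper (the threshold `eps` is an arbitrary choice here, not a standard):

```python
import numpy as np

def check_near_duplicates(D, eps=1e-8):
    """Count near-zero dissimilarities that would destabilize Sammon's 1/delta weights."""
    vals = D[np.triu_indices(D.shape[0], k=1)]
    n_bad = int(np.sum(vals < eps))
    if n_bad:
        print(f"Warning: {n_bad} pair(s) with dissimilarity below {eps}; "
              "merge duplicates or add a small floor before Sammon mapping.")
    return n_bad
```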
Robust Stress:
Replace squared loss with robust M-estimators:
$$\sigma_{robust} = \sum_{i<j} \rho(\delta_{ij} - d_{ij})$$
Common choices for $\rho$:
Huber loss: $$\rho_{\text{Huber}}(r) = \begin{cases} \frac{1}{2}r^2 & |r| \leq \tau \\ \tau|r| - \frac{1}{2}\tau^2 & |r| > \tau \end{cases}$$ where the threshold is written $\tau$ to avoid clashing with the dissimilarities $\delta_{ij}$.
Tukey bisquare: $$\rho_{\text{Tukey}}(r) = \begin{cases} \frac{c^2}{6}\left[1 - \left(1 - \frac{r^2}{c^2}\right)^3\right] & |r| \leq c \\ \frac{c^2}{6} & |r| > c \end{cases}$$
Robust stress reduces the influence of outlier distances, useful when some dissimilarities are unreliable.
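As an illustration, a minimal Huber-based stress might look like the following sketch (the threshold `tau` and the unnormalized form are choices made here for simplicity):

```python
import numpy as np
from scipy.spatial.distance import cdist

def huber_stress(X, Delta, tau=1.0):
    """Sum of Huber losses over pairwise residuals r_ij = delta_ij - d_ij."""
    iu = np.triu_indices(len(X), k=1)
    r = Delta[iu] - cdist(X, X)[iu]
    quadratic = 0.5 * r ** 2                       # |r| <= tau: behaves like squared loss
    linear = tau * np.abs(r) - 0.5 * tau ** 2      # |r| >  tau: grows only linearly
    return np.sum(np.where(np.abs(r) <= tau, quadratic, linear))
```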
Normalized Disparities Stress (for Non-metric MDS):
In non-metric MDS, disparities $\hat{d}_{ij}$ replace targets:
$$\sigma_{NM} = \sqrt{\frac{\sum_{i<j}(d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2}}$$
where $\hat{d}$ is the isotonic regression of $d$ on the ordering induced by $\delta$.
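For concreteness, $\sigma_{NM}$ can be computed directly with scikit-learn's IsotonicRegression, which performs the monotone fit; a sketch:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.isotonic import IsotonicRegression

def nonmetric_stress(D_target, X):
    """Stress-1 form computed against disparities from isotonic regression."""
    iu = np.triu_indices(D_target.shape[0], k=1)
    delta = D_target[iu]
    d = cdist(X, X)[iu]
    # Disparities: monotone increasing in delta, as close to d as possible
    d_hat = IsotonicRegression().fit_transform(delta, d)
    return np.sqrt(np.sum((d - d_hat) ** 2) / np.sum(d ** 2))
```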
```python
import numpy as np
from scipy.spatial.distance import cdist

def compute_all_stress_variants(D_target, X, weights=None):
    """
    Compute multiple stress formulations for an MDS embedding.

    Parameters:
    -----------
    D_target : ndarray (n, n)
        Target dissimilarity matrix
    X : ndarray (n, k)
        Embedding coordinates
    weights : ndarray (n, n), optional
        Weight matrix

    Returns:
    --------
    dict : Dictionary of stress values by formulation
    """
    n = X.shape[0]
    D_embed = cdist(X, X, metric='euclidean')

    # Extract upper triangle
    triu = np.triu_indices(n, k=1)
    delta = D_target[triu]
    d = D_embed[triu]

    if weights is None:
        w = np.ones(len(delta))
    else:
        w = weights[triu]

    # Squared differences
    diff = delta - d
    diff_sq = diff ** 2
    weighted_diff_sq = w * diff_sq

    # Raw stress
    raw_stress = np.sum(weighted_diff_sq)

    # Stress-1 (Kruskal)
    stress1_num = np.sum(weighted_diff_sq)
    stress1_den = np.sum(w * d ** 2)
    stress1 = np.sqrt(stress1_num / stress1_den) if stress1_den > 0 else np.inf

    # Stress-2
    d_mean = np.average(d, weights=w)
    stress2_den = np.sum(w * (d - d_mean) ** 2)
    stress2 = np.sqrt(stress1_num / stress2_den) if stress2_den > 0 else np.inf

    # Normalized raw stress (by target distances)
    normalized_raw = np.sqrt(raw_stress / np.sum(w * delta ** 2))

    # SSTRESS (squared distances)
    delta_sq = delta ** 2
    d_sq = d ** 2
    sstress_num = np.sum(w * (delta_sq - d_sq) ** 2)
    sstress_den = np.sum(w * delta_sq ** 2)
    sstress = np.sqrt(sstress_num / sstress_den) if sstress_den > 0 else np.inf

    # Sammon stress
    with np.errstate(divide='ignore', invalid='ignore'):
        sammon_terms = diff_sq / delta
    sammon_terms = np.where(np.isfinite(sammon_terms), sammon_terms, 0)
    sammon = np.sum(w * sammon_terms) / np.sum(w * delta)

    # Correlation-based "stress" (actually a measure of fit)
    correlation = np.corrcoef(delta, d)[0, 1]

    return {
        'raw_stress': raw_stress,
        'stress1': stress1,
        'stress2': stress2,
        'normalized_raw': normalized_raw,
        'sstress': sstress,
        'sammon': sammon,
        'correlation': correlation,
    }

def compare_stress_variants():
    """Compare stress variants on different data types."""
    np.random.seed(42)

    # Generate test data: 3D spiral
    n = 100
    t = np.linspace(0, 4 * np.pi, n)
    X_3d = np.column_stack([t * np.cos(t), t * np.sin(t), t])
    D = cdist(X_3d, X_3d, metric='euclidean')

    # 2D embedding via classical MDS (double centering + eigendecomposition)
    D_sq = D ** 2
    row_means = D_sq.mean(axis=1, keepdims=True)
    col_means = D_sq.mean(axis=0, keepdims=True)
    grand_mean = D_sq.mean()
    B = -0.5 * (D_sq - row_means - col_means + grand_mean)
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    idx = np.argsort(eigenvalues)[::-1][:2]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    eigenvalues = np.maximum(eigenvalues, 0)
    X_2d = eigenvectors * np.sqrt(eigenvalues)

    # Compute all stress variants
    stresses = compute_all_stress_variants(D, X_2d)

    print("Stress Variants for 3D→2D Spiral Embedding:")
    print("-" * 50)
    for name, value in stresses.items():
        print(f"{name:20s}: {value:.6f}")

    return stresses

if __name__ == "__main__":
    compare_stress_variants()
```

One of the most common questions in MDS practice is: Is my stress value good enough?
Kruskal's original guidelines (1964):
| Stress (σ₁) | Goodness of Fit | Interpretation |
|---|---|---|
| < 0.025 | Excellent | Near-perfect representation |
| 0.025 – 0.05 | Good | Reliable representation |
| 0.05 – 0.10 | Fair | Interpretable with caution |
| 0.10 – 0.20 | Poor | May be misleading |
| > 0.20 | Bad | Not interpretable |
Kruskal's guidelines are rough heuristics from the 1960s. Modern understanding recognizes that expected stress depends critically on n (number of points), k (dimensions), and data structure. A stress of 0.15 might be excellent for n=500 in k=2 but terrible for n=20.
Factors affecting expected stress:
1. Number of points (n):
More points means more pairwise constraints to satisfy: for a fixed target dimension $k$, the expected stress of embedding unstructured data increases with $n$. Rule of thumb: stress increases roughly as $\sqrt{n}$ for fixed $k$.
2. Target dimensionality (k):
Higher $k$ allows better fit. Stress decreases as $k$ increases. The 'elbow' in stress vs $k$ indicates intrinsic dimensionality.
3. Data structure: Strongly clustered or intrinsically low-dimensional data embeds with far lower stress than unstructured data; fixed guidelines implicitly assume some structure exists.
4. Measurement noise:
Even if data lies in $k$ dimensions, measurement error adds irreducible stress.
Simulation-based calibration:
The gold standard for interpreting stress is comparison to: (1) the expected stress of random data with the same $n$ and $k$ (Monte Carlo simulation), and (2) the stress obtained after destroying the structure in your own dissimilarities (a permutation test). Both are implemented below.
```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.manifold import MDS

def _stress1(D_target, X):
    """Kruskal's Stress-1 for an embedding (sklearn's `stress_` is *raw* stress)."""
    triu = np.triu_indices(D_target.shape[0], k=1)
    delta = D_target[triu]
    d = cdist(X, X)[triu]
    return np.sqrt(np.sum((delta - d) ** 2) / np.sum(d ** 2))

def expected_stress_simulation(n, k, n_simulations=100, random_state=None):
    """
    Estimate expected Stress-1 for unstructured data via Monte Carlo simulation.

    Parameters:
    -----------
    n : int
        Number of points
    k : int
        Target dimensionality
    n_simulations : int
        Number of random datasets to generate
    random_state : int
        Random seed

    Returns:
    --------
    mean_stress, std_stress, stress_values
    """
    rng = np.random.RandomState(random_state)
    stress_values = []
    for _ in range(n_simulations):
        # Random data in a high ambient dimension (embedding k-dim data into
        # k dims would give near-zero stress and a meaningless null)
        X_true = rng.randn(n, n - 1)
        D = cdist(X_true, X_true, metric='euclidean')
        mds = MDS(n_components=k, dissimilarity='precomputed',
                  n_init=1, max_iter=100, random_state=rng.randint(100000))
        X_emb = mds.fit_transform(D)
        stress_values.append(_stress1(D, X_emb))
    return np.mean(stress_values), np.std(stress_values), stress_values

def permutation_test_stress(D_target, X_embedded, n_permutations=100,
                            random_state=None):
    """
    Permutation test: compare actual stress to stress from randomized data.

    Returns actual_stress, the list of permutation stresses, and a p-value
    (the fraction of permuted datasets with stress <= the actual stress).
    """
    rng = np.random.RandomState(random_state)
    n = D_target.shape[0]
    k = X_embedded.shape[1]
    triu = np.triu_indices(n, k=1)

    # Compute actual Stress-1
    actual_stress = _stress1(D_target, X_embedded)

    perm_stresses = []
    for _ in range(n_permutations):
        # Shuffle the upper-triangle dissimilarities and rebuild a symmetric
        # matrix. (Relabeling points via D[perm][:, perm] would leave the
        # stress unchanged, so the entries themselves must be permuted.)
        D_perm = np.zeros_like(D_target)
        D_perm[triu] = rng.permutation(D_target[triu])
        D_perm = D_perm + D_perm.T

        mds = MDS(n_components=k, dissimilarity='precomputed',
                  n_init=1, max_iter=100, random_state=rng.randint(100000))
        X_perm = mds.fit_transform(D_perm)
        perm_stresses.append(_stress1(D_perm, X_perm))

    # P-value: fraction of permutations with stress <= actual
    p_value = np.mean([s <= actual_stress for s in perm_stresses])
    return actual_stress, perm_stresses, p_value

def stress_vs_n_k_table():
    """Generate reference table of expected stress values."""
    print("Expected Stress (σ₁) for Random Data")
    print("=" * 55)
    print(f"{'n':>6} | {'k=1':>8} {'k=2':>8} {'k=3':>8} {'k=4':>8}")
    print("-" * 55)
    for n in [10, 20, 50, 100, 200]:
        row = f"{n:>6} |"
        for k in [1, 2, 3, 4]:
            if k >= n:
                row += f" {'N/A':>8}"
            else:
                mean_s, std_s, _ = expected_stress_simulation(
                    n, k, n_simulations=20, random_state=42)
                row += f" {mean_s:>8.4f}"
        print(row)

def demonstrate_stress_elbow():
    """Demonstrate stress elbow plot for choosing dimensionality."""
    np.random.seed(42)

    # Generate data with intrinsic 3D structure
    n = 150
    t = np.linspace(0, 2 * np.pi, n)
    X_true = np.column_stack([
        (2 + np.cos(8 * t)) * np.cos(t),
        (2 + np.cos(8 * t)) * np.sin(t),
        np.sin(8 * t),
    ])
    D = cdist(X_true, X_true, metric='euclidean')

    # Embed at different dimensions
    stresses = []
    for k in range(1, 8):
        mds = MDS(n_components=k, dissimilarity='precomputed',
                  n_init=4, max_iter=300, random_state=42)
        X_emb = mds.fit_transform(D)
        s = _stress1(D, X_emb)
        stresses.append(s)
        print(f"k={k}: stress = {s:.4f}")

    print("\nElbow appears around k=3 (intrinsic dimensionality)")
    return stresses

if __name__ == "__main__":
    print("Stress Elbow Analysis:")
    print("-" * 40)
    demonstrate_stress_elbow()
    print()
    print("Reference Table:")
    print("-" * 40)
    stress_vs_n_k_table()
```

Understanding the stress function's optimization landscape helps diagnose and fix MDS problems.
Non-convexity:
Stress is a non-convex function of the coordinates $X$. This means:

- Different random initializations can converge to different configurations
- No single run is guaranteed to find the global minimum
- Running multiple restarts and keeping the lowest-stress solution is standard practice
Invariances:
Stress has several invariances that create continuous families of equivalent solutions:
Translation: Adding a constant to all coordinates doesn't change distances $$X \mapsto X + \mathbf{c}$$
Rotation: Orthogonal transformations preserve distances $$X \mapsto XQ \text{ where } Q^TQ = I$$
Reflection: Flipping signs preserves distances $$X \mapsto X \cdot \text{diag}(\pm 1, \ldots, \pm 1)$$
These invariances mean the solution is unique only up to rigid transformations.
Most MDS implementations center the solution (Σᵢ xᵢ = 0) to remove translational ambiguity. This is implicit in Classical MDS (double centering) and can be enforced in iterative methods via the pseudoinverse of the Laplacian.
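Because solutions are equivalent up to rigid transformations, two embeddings should be aligned before being compared. SciPy's `procrustes` (which also standardizes scale) does this; a short demonstration on synthetic data:

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
X1 = rng.standard_normal((30, 2))

# Build an "equivalent" solution: rotate, reflect, and translate X1
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X2 = (X1 @ Q) * np.array([1.0, -1.0]) + np.array([3.0, -2.0])

# Disparity ~ 0 confirms the configurations are identical up to
# translation, rotation, reflection, and scale
_, _, disparity = procrustes(X1, X2)
print(f"Procrustes disparity: {disparity:.2e}")
```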
Local minima in practice:

When are local minima problematic? Typically with small $n$, a target dimension far below the intrinsic dimensionality, or strongly non-Euclidean dissimilarities; the symptom is final stress varying widely across random restarts.

When are local minima benign? When many restarts converge to nearly the same stress (and, after Procrustes alignment, nearly the same configuration), the solution is effectively stable even though the objective is non-convex.

Diagnosing local minima: run several random restarts, compare the spread of final stress values, and align the resulting configurations, as in the sketch below.
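A sketch of the restart diagnostic (using scikit-learn's MDS; note that its `stress_` attribute reports raw stress, so only the spread across runs matters here, not the absolute scale):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.manifold import MDS

def restart_diagnostics(D, k=2, n_restarts=10):
    """Run MDS from several random starts and report the spread of final stress."""
    stresses = []
    for seed in range(n_restarts):
        mds = MDS(n_components=k, dissimilarity='precomputed',
                  n_init=1, random_state=seed)
        mds.fit(D)
        stresses.append(mds.stress_)      # raw stress in scikit-learn
    stresses = np.array(stresses)
    spread = (stresses.max() - stresses.min()) / stresses.min()
    print(f"min={stresses.min():.4f}  max={stresses.max():.4f}  spread={spread:.1%}")
    return stresses

# Tight spread suggests benign local minima; wide spread suggests instability
Y = np.random.default_rng(0).standard_normal((40, 3))
restart_diagnostics(cdist(Y, Y))
```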
The Shepard diagram is the primary visual tool for assessing MDS quality. It plots original dissimilarities against embedded distances.
Construction: For each pair $i < j$, plot a point at $(\delta_{ij}, d_{ij})$; optionally overlay the identity line (metric MDS) or the fitted monotone curve of disparities (non-metric MDS).
Ideal appearance:
For perfect metric MDS: all points lie on the diagonal ($d = \delta$).

For perfect non-metric MDS: all points lie on a monotonically increasing curve.
Interpreting deviations:
| Pattern | Appearance | Interpretation |
|---|---|---|
| Tight diagonal | Points clustered on y=x line | Excellent metric fit |
| Tight monotonic | Points on increasing curve | Good non-metric fit |
| Systematic bend | S-curve or other shape | Nonlinear scaling mismatch |
| Wide scatter | Points dispersed around trend | Poor fit; many distance distortions |
| Outliers | A few points far from trend | Some pairs badly distorted |
| Heteroscedasticity | Scatter increases with δ | Large distances less reliable |
For large n, Shepard diagrams become cluttered. Enhancements: (1) 2D histogram or heatmap instead of scatter, (2) Show only a random sample of pairs, (3) Add LOESS smooth line, (4) Color points by local density or another variable.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist
from sklearn.isotonic import IsotonicRegression

def enhanced_shepard_diagram(D_target, X_embedded, sample_frac=1.0,
                             metric=True, ax=None, random_state=None):
    """
    Create an enhanced Shepard diagram.

    Parameters:
    -----------
    D_target : ndarray (n, n)
        Target dissimilarity matrix
    X_embedded : ndarray (n, k)
        Embedding coordinates
    sample_frac : float
        Fraction of pairs to plot (for large n)
    metric : bool
        If True, show diagonal reference. If False, show isotonic fit.
    ax : matplotlib axis
        Optional axis to plot on
    random_state : int
        Random seed for sampling
    """
    rng = np.random.RandomState(random_state)
    n = X_embedded.shape[0]
    D_embedded = cdist(X_embedded, X_embedded, metric='euclidean')

    # Extract upper triangle
    triu = np.triu_indices(n, k=1)
    delta = D_target[triu]
    d = D_embedded[triu]

    # Sample if needed
    n_pairs = len(delta)
    if sample_frac < 1.0:
        n_sample = int(sample_frac * n_pairs)
        idx = rng.choice(n_pairs, size=n_sample, replace=False)
        delta_plot = delta[idx]
        d_plot = d[idx]
    else:
        delta_plot = delta
        d_plot = d

    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 8))

    # Scatter plot
    ax.scatter(delta_plot, d_plot, alpha=0.3, s=5, c='steelblue')

    # Reference line(s)
    max_val = max(delta.max(), d.max())
    if metric:
        # Diagonal line for metric MDS
        ax.plot([0, max_val], [0, max_val], 'r--', linewidth=2,
                label='Perfect fit (d = δ)')
    else:
        # Isotonic regression line for non-metric MDS
        order = np.argsort(delta)
        iso = IsotonicRegression()
        d_hat = iso.fit_transform(delta[order], d[order])
        ax.plot(delta[order], d_hat, 'r-', linewidth=2,
                label='Isotonic fit (disparities)')

    # Compute stress
    if metric:
        stress = np.sqrt(np.sum((delta - d) ** 2) / np.sum(d ** 2))
    else:
        order = np.argsort(delta)
        iso = IsotonicRegression()
        d_hat = iso.fit_transform(delta[order], d[order])
        d_hat_full = np.zeros_like(d)
        d_hat_full[order] = d_hat
        stress = np.sqrt(np.sum((d - d_hat_full) ** 2) / np.sum(d ** 2))

    # Correlation
    corr = np.corrcoef(delta, d)[0, 1]

    ax.set_xlabel('Target Dissimilarity (δ)', fontsize=12)
    ax.set_ylabel('Embedded Distance (d)', fontsize=12)
    ax.set_title(f'Shepard Diagram (stress={stress:.4f}, r={corr:.4f})',
                 fontsize=14)
    ax.legend()
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)

    return ax, stress, corr

def compare_shepard_diagrams():
    """Compare Shepard diagrams for good and poor embeddings."""
    np.random.seed(42)

    # Generate 3D data
    n = 100
    X_true = np.random.randn(n, 3)
    D = cdist(X_true, X_true, metric='euclidean')

    # Good embedding: 3D MDS
    from sklearn.manifold import MDS
    mds_3d = MDS(n_components=3, dissimilarity='precomputed', random_state=42)
    X_3d = mds_3d.fit_transform(D)

    # Poor embedding: 1D MDS
    mds_1d = MDS(n_components=1, dissimilarity='precomputed', random_state=42)
    X_1d = mds_1d.fit_transform(D)

    # Plot
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))
    enhanced_shepard_diagram(D, X_3d, metric=True, ax=axes[0])
    axes[0].set_title('3D Embedding (matches intrinsic dim)')
    enhanced_shepard_diagram(D, X_1d, metric=True, ax=axes[1])
    axes[1].set_title('1D Embedding (too few dimensions)')
    plt.tight_layout()
    plt.savefig('shepard_comparison.png', dpi=150)
    plt.close()

if __name__ == "__main__":
    compare_shepard_diagrams()
```

Armed with an understanding of stress functions, here is a practical framework for MDS analysis.
Red flags to watch for:

- Stress > 0.20 (for k=2): embedding may be misleading
- Wide variation across runs: unstable solution
- Shepard diagram with an S-curve: wrong stress type (metric vs non-metric)
- Points collapsed together: degenerate solution
- k chosen to make stress 'look good': may be overfitting with too many dimensions
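These checks can be partially automated. A hypothetical helper, with thresholds mirroring the list above (they are judgment calls, not standards):

```python
import numpy as np

def mds_red_flags(stress1, run_stresses, D_embedded):
    """Flag common MDS failure modes; all thresholds are heuristic."""
    flags = []
    if stress1 > 0.20:
        flags.append("stress1 > 0.20: embedding may be misleading")
    run_stresses = np.asarray(run_stresses)
    if (run_stresses.max() - run_stresses.min()) / run_stresses.min() > 0.10:
        flags.append("stress varies >10% across restarts: unstable solution")
    d = D_embedded[np.triu_indices(D_embedded.shape[0], k=1)]
    if np.median(d) < 1e-6 * d.max():
        flags.append("most distances near zero: possible degenerate solution")
    return flags
```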
Reporting MDS results:
When publishing or presenting MDS results, include: the stress formulation used (e.g., Stress-1) and its value, $n$ and $k$, whether the analysis was metric or non-metric, the software and number of random restarts, and a Shepard diagram where space permits.
Comparing across studies:
Stress values are comparable only for: the same stress formulation, the same (or similar) $n$, the same target dimensionality $k$, and the same weighting scheme.
When these differ, use relative comparisons (e.g., "20% lower stress than baseline").
Stress functions are the mathematical core of MDS—they define what we mean by a 'good' embedding.
What's next:
With a solid understanding of MDS mechanics—Classical, Metric, Non-metric, and stress functions—the final page covers limitations and practical considerations. We'll discuss when MDS fails, computational bottlenecks, relationship to other methods, and best practices for real-world applications.
You now have deep knowledge of stress functions: their mathematical properties, the differences between variants, techniques for interpretation, and practical guidelines. Next, we examine MDS limitations and when to consider alternative approaches.