At the heart of every MDS algorithm lies a deceptively simple question: How do we quantify how well an embedding preserves the original distances?
The answer is the stress function: a scalar measure that captures the discrepancy between target dissimilarities and realized distances in the embedding. Understanding stress functions is essential for choosing among MDS variants, interpreting results, and diagnosing poor embeddings.
This page provides a comprehensive treatment of stress functions: their mathematical foundations, the relationships between variants, and practical guidelines for interpretation.
By the end of this page, you will understand the mathematical derivation and properties of major stress variants, master techniques for interpreting stress values in context, learn how stress relates to the optimization landscape, and gain practical skills for diagnosing and improving MDS results.
The fundamental MDS problem:
Given a dissimilarity matrix $\Delta = [\delta_{ij}]$ for $n$ objects, find coordinates $X \in \mathbb{R}^{n \times k}$ such that Euclidean distances $d_{ij}(X) = \|\mathbf{x}_i - \mathbf{x}_j\|$ are close to the dissimilarities.
The stress function family:
All stress functions share a common form:
$$\sigma(X) = \mathcal{N}\left( \sum_{i<j} w_{ij} \cdot \mathcal{L}(\delta_{ij}, d_{ij}(X)) \right)$$
where $w_{ij} \geq 0$ are optional pair weights, $\mathcal{L}$ is a loss comparing each target $\delta_{ij}$ to its realized distance $d_{ij}(X)$, and $\mathcal{N}$ is a normalization that makes the total comparable across datasets.
Common loss functions include the squared error $(\delta_{ij} - d_{ij})^2$, the absolute error $|\delta_{ij} - d_{ij}|$, and robust losses such as Huber (discussed later on this page).
Squared loss dominates for several reasons: (1) It's differentiable everywhere, enabling gradient-based optimization; (2) It heavily penalizes large errors, focusing the algorithm on fixing the worst discrepancies; (3) It connects to least squares theory and probabilistic models (maximum likelihood under Gaussian noise).
Properties of a good stress function:

- Zero exactly when the embedding reproduces the dissimilarities perfectly
- Comparable across datasets and scales (via normalization)
- Differentiable, so gradient-based optimizers can minimize it
- Resistant to degenerate solutions (see below)
Degenerate solutions:
A degenerate solution occurs when all points collapse to a single location ($d_{ij} = 0$ for all pairs). In non-metric MDS, the fitted disparities can shrink toward zero along with the distances, driving an unnormalized error toward zero even though the configuration conveys nothing. Normalization schemes prevent this.
Joseph Kruskal's 1964 papers established the formulations that became standard in MDS practice.
Raw Stress:
$$\sigma_{raw}(X) = \sum_{i<j} w_{ij}(\delta_{ij} - d_{ij})^2$$
This is the unnormalized sum of squared errors. It's sensitive to scale and not comparable across datasets.
Stress-1 (Kruskal's Stress):
$$\sigma_1(X) = \sqrt{\frac{\sum_{i<j} w_{ij}(\delta_{ij} - d_{ij})^2}{\sum_{i<j} w_{ij} d_{ij}^2}}$$
Normalizes by the sum of squared embedded distances. This prevents degenerate solutions (collapsing points): as the configuration shrinks, the denominator goes to zero while the numerator approaches $\sum_{i<j} w_{ij}\delta_{ij}^2$, so the stress blows up rather than improving.
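A quick numerical check makes this concrete. The sketch below is illustrative (synthetic data, SciPy's `pdist`): it shrinks a configuration toward collapse and shows that raw stress stays bounded while Stress-1 grows without bound.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))   # a 2D configuration
delta = pdist(X)                   # use its own distances as the targets

for c in [1.0, 0.5, 0.1, 0.01]:    # shrink the configuration toward a point
    d = pdist(c * X)
    raw = np.sum((delta - d) ** 2)
    stress1 = np.sqrt(raw / np.sum(d ** 2))
    print(f"c={c:5.2f}  raw={raw:10.3f}  stress1={stress1:10.3f}")
# raw stress approaches a constant as c -> 0, but Stress-1 diverges
```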
Stress-2:
$$\sigma_2(X) = \sqrt{\frac{\sum_{i<j} w_{ij}(\delta_{ij} - d_{ij})^2}{\sum_{i<j} w_{ij}(d_{ij} - \bar{d})^2}}$$
Normalizes by the variance of embedded distances. Penalizes solutions where all distances are nearly equal.
| Formulation | Normalizer | Degenerate-Proof | Scale Invariant |
|---|---|---|---|
| Raw stress | None | No | No |
| Stress-1 | Σd²ᵢⱼ | Yes | Partially |
| Stress-2 | Var(dᵢⱼ) | Yes | Yes |
| Normalized raw | Σδ²ᵢⱼ | No | Yes |
Mathematical properties of Stress-1:
Range: $\sigma_1 \in [0, \infty)$. In practice, values rarely exceed 1.
Scale: If $d_{ij} = c \cdot \delta_{ij}$ for all pairs, then $\sigma_1 = |1 - c|/c$, which is zero only at $c = 1$. Stress-1 is invariant to rescaling $\delta$ and $d$ jointly (numerator and denominator both scale by $c^2$), but not to rescaling the embedding alone.
Rotation/reflection invariance: Stress is unchanged by rigid transformations (rotation, reflection, translation).
Translation invariance: Because stress depends only on pairwise distances, translating the whole configuration leaves it unchanged; this ambiguity is usually removed by centering the solution (see the optimization-landscape discussion below).
Computing gradients:
For optimization, we need $\frac{\partial \sigma_1}{\partial \mathbf{x}_i}$. Using the chain rule:
$$\frac{\partial \sigma_1}{\partial \mathbf{x}_i} = \frac{1}{2\sigma_1} \cdot \frac{\partial \sigma_1^2}{\partial \mathbf{x}_i}$$
Each squared-error term depends on $\mathbf{x}_i$ through the distance, whose derivative is:
$$\frac{\partial d_{ij}}{\partial \mathbf{x}_i} = \frac{\mathbf{x}_i - \mathbf{x}_j}{d_{ij}}$$
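To make the chain rule concrete, here is a short, self-contained sketch (not from any particular library) that computes the gradient of the raw stress, the numerator of $\sigma_1^2$, and verifies it against a finite-difference estimate:

```python
import numpy as np
from scipy.spatial.distance import cdist

def raw_stress(X, Delta):
    iu = np.triu_indices(len(X), k=1)
    return np.sum((Delta[iu] - cdist(X, X)[iu]) ** 2)

def raw_stress_grad(X, Delta):
    """Analytic gradient: grad_i = sum_j -2 (delta_ij - d_ij) (x_i - x_j) / d_ij."""
    D = cdist(X, X)
    np.fill_diagonal(D, 1.0)                 # placeholder; diagonal terms zeroed below
    coef = -2.0 * (Delta - D) / D
    np.fill_diagonal(coef, 0.0)
    diffs = X[:, None, :] - X[None, :, :]    # (n, n, k) pairwise coordinate differences
    return np.einsum('ij,ijk->ik', coef, diffs)

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))
Y = rng.standard_normal((6, 2))
Delta = cdist(Y, Y)                          # a valid target dissimilarity matrix

# Finite-difference check on one coordinate
eps, (i, a) = 1e-6, (2, 0)
Xp, Xm = X.copy(), X.copy()
Xp[i, a] += eps
Xm[i, a] -= eps
fd = (raw_stress(Xp, Delta) - raw_stress(Xm, Delta)) / (2 * eps)
print(raw_stress_grad(X, Delta)[i, a], fd)   # should agree to ~6 decimal places
```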
In practice, Stress-1 (or its variant STRESS1) is the default in most software. When papers report 'stress' without qualification, they typically mean Stress-1. Always check documentation, as implementations vary.
Beyond Kruskal's formulations, several alternative stress functions serve specific purposes.
SSTRESS (Squared Stress on Squared Distances):
$$\text{SSTRESS} = \sum_{i<j} w_{ij}(\delta_{ij}^2 - d_{ij}^2)^2$$
Also written with normalization: $$\text{SSTRESS} = \sqrt{\frac{\sum_{i<j} (\delta_{ij}^2 - d_{ij}^2)^2}{\sum_{i<j} \delta_{ij}^4}}$$
Why use SSTRESS? Working with squared distances avoids square roots, making the objective a polynomial in the coordinates that is smooth everywhere (even at $d_{ij} = 0$); it is the loss minimized by the classic ALSCAL algorithm. The trade-off is that errors on large dissimilarities are emphasized relative to small ones.
Sammon Mapping Stress:
$$E_{Sammon} = \frac{1}{\sum_{i<j}\delta_{ij}} \sum_{i<j} \frac{(\delta_{ij} - d_{ij})^2}{\delta_{ij}}$$
Properties: the $1/\delta_{ij}$ weighting emphasizes preserving small dissimilarities (local structure), and the leading factor $1/\sum_{i<j}\delta_{ij}$ normalizes the total so values are comparable across datasets.
The 1/δ weighting makes Sammon mapping very sensitive to small distances. If some δᵢⱼ ≈ 0 (near-duplicate points), the weight goes to infinity. Always check for duplicate/nearly-identical points before applying Sammon mapping.
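Given that warning, a small pre-check is cheap insurance. A hypothetical helper (the threshold `eps` is an arbitrary choice here, not a standard):

```python
import numpy as np

def check_near_duplicates(D, eps=1e-8):
    """Count near-zero dissimilarities that would destabilize Sammon's 1/delta weights."""
    vals = D[np.triu_indices(D.shape[0], k=1)]
    n_bad = int(np.sum(vals < eps))
    if n_bad:
        print(f"Warning: {n_bad} pair(s) with dissimilarity below {eps}; "
              "merge duplicates or add a small floor before Sammon mapping.")
    return n_bad
```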
Robust Stress:
Replace squared loss with robust M-estimators:
$$\sigma_{robust} = \sum_{i<j} \rho(\delta_{ij} - d_{ij})$$
Common choices for $\rho$:
Huber loss: $$\rho_{\text{Huber}}(r) = \begin{cases} \frac{1}{2}r^2 & |r| \leq \tau \\ \tau|r| - \frac{1}{2}\tau^2 & |r| > \tau \end{cases}$$ where the threshold is written $\tau$ to avoid clashing with the dissimilarities $\delta_{ij}$.
Tukey bisquare: $$\rho_{\text{Tukey}}(r) = \begin{cases} \frac{c^2}{6}\left[1 - \left(1 - \frac{r^2}{c^2}\right)^3\right] & |r| \leq c \\ \frac{c^2}{6} & |r| > c \end{cases}$$
Robust stress reduces the influence of outlier distances, useful when some dissimilarities are unreliable.
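As an illustration, a minimal Huber-based stress might look like the following sketch (the threshold `tau` and the unnormalized form are choices made here for simplicity):

```python
import numpy as np
from scipy.spatial.distance import cdist

def huber_stress(X, Delta, tau=1.0):
    """Sum of Huber losses over pairwise residuals r_ij = delta_ij - d_ij."""
    iu = np.triu_indices(len(X), k=1)
    r = Delta[iu] - cdist(X, X)[iu]
    quadratic = 0.5 * r ** 2                       # |r| <= tau: behaves like squared loss
    linear = tau * np.abs(r) - 0.5 * tau ** 2      # |r| >  tau: grows only linearly
    return np.sum(np.where(np.abs(r) <= tau, quadratic, linear))
```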
Normalized Disparities Stress (for Non-metric MDS):
In non-metric MDS, disparities $\hat{d}_{ij}$ replace targets:
$$\sigma_{NM} = \sqrt{\frac{\sum_{i<j}(d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2}}$$
where $\hat{d}$ is the isotonic regression of $d$ on the ordering induced by $\delta$.
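For concreteness, $\sigma_{NM}$ can be computed directly with scikit-learn's IsotonicRegression, which performs the monotone fit; a sketch:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.isotonic import IsotonicRegression

def nonmetric_stress(D_target, X):
    """Stress-1 form computed against disparities from isotonic regression."""
    iu = np.triu_indices(D_target.shape[0], k=1)
    delta = D_target[iu]
    d = cdist(X, X)[iu]
    # Disparities: monotone increasing in delta, as close to d as possible
    d_hat = IsotonicRegression().fit_transform(delta, d)
    return np.sqrt(np.sum((d - d_hat) ** 2) / np.sum(d ** 2))
```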
```python
import numpy as np
from scipy.spatial.distance import cdist

def compute_all_stress_variants(D_target, X, weights=None):
    """
    Compute multiple stress formulations for an MDS embedding.

    Parameters:
    -----------
    D_target : ndarray (n, n)
        Target dissimilarity matrix
    X : ndarray (n, k)
        Embedding coordinates
    weights : ndarray (n, n), optional
        Weight matrix

    Returns:
    --------
    dict : Dictionary of stress values by formulation
    """
    n = X.shape[0]
    D_embed = cdist(X, X, metric='euclidean')

    # Extract upper triangle
    triu = np.triu_indices(n, k=1)
    delta = D_target[triu]
    d = D_embed[triu]

    if weights is None:
        w = np.ones(len(delta))
    else:
        w = weights[triu]

    # Squared differences
    diff = delta - d
    diff_sq = diff ** 2
    weighted_diff_sq = w * diff_sq

    # Raw stress
    raw_stress = np.sum(weighted_diff_sq)

    # Stress-1 (Kruskal)
    stress1_num = np.sum(weighted_diff_sq)
    stress1_den = np.sum(w * d ** 2)
    stress1 = np.sqrt(stress1_num / stress1_den) if stress1_den > 0 else np.inf

    # Stress-2
    d_mean = np.average(d, weights=w)
    stress2_den = np.sum(w * (d - d_mean) ** 2)
    stress2 = np.sqrt(stress1_num / stress2_den) if stress2_den > 0 else np.inf

    # Normalized raw stress (by target distances)
    normalized_raw = np.sqrt(raw_stress / np.sum(w * delta ** 2))

    # SSTRESS (squared distances)
    delta_sq = delta ** 2
    d_sq = d ** 2
    sstress_num = np.sum(w * (delta_sq - d_sq) ** 2)
    sstress_den = np.sum(w * delta_sq ** 2)
    sstress = np.sqrt(sstress_num / sstress_den) if sstress_den > 0 else np.inf

    # Sammon stress
    with np.errstate(divide='ignore', invalid='ignore'):
        sammon_terms = diff_sq / delta
    sammon_terms = np.where(np.isfinite(sammon_terms), sammon_terms, 0)
    sammon = np.sum(w * sammon_terms) / np.sum(w * delta)

    # Correlation-based "stress" (actually a measure of fit)
    correlation = np.corrcoef(delta, d)[0, 1]

    return {
        'raw_stress': raw_stress,
        'stress1': stress1,
        'stress2': stress2,
        'normalized_raw': normalized_raw,
        'sstress': sstress,
        'sammon': sammon,
        'correlation': correlation,
    }

def compare_stress_variants():
    """Compare stress variants on different data types."""
    np.random.seed(42)

    # Generate test data: 3D spiral
    n = 100
    t = np.linspace(0, 4 * np.pi, n)
    X_3d = np.column_stack([t * np.cos(t), t * np.sin(t), t])
    D = cdist(X_3d, X_3d, metric='euclidean')

    # 2D embedding via classical MDS (double centering + eigendecomposition)
    D_sq = D ** 2
    row_means = D_sq.mean(axis=1, keepdims=True)
    col_means = D_sq.mean(axis=0, keepdims=True)
    grand_mean = D_sq.mean()
    B = -0.5 * (D_sq - row_means - col_means + grand_mean)
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    idx = np.argsort(eigenvalues)[::-1][:2]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    eigenvalues = np.maximum(eigenvalues, 0)
    X_2d = eigenvectors * np.sqrt(eigenvalues)

    # Compute all stress variants
    stresses = compute_all_stress_variants(D, X_2d)

    print("Stress Variants for 3D→2D Spiral Embedding:")
    print("-" * 50)
    for name, value in stresses.items():
        print(f"{name:20s}: {value:.6f}")

    return stresses

if __name__ == "__main__":
    compare_stress_variants()
```

One of the most common questions in MDS practice is: Is my stress value good enough?
Kruskal's original guidelines (1964):
| Stress (σ₁) | Goodness of Fit | Interpretation |
|---|---|---|
| < 0.025 | Excellent | Near-perfect representation |
| 0.025 – 0.05 | Good | Reliable representation |
| 0.05 – 0.10 | Fair | Interpretable with caution |
| 0.10 – 0.20 | Poor | May be misleading |
| > 0.20 | Bad | Not interpretable |
Kruskal's guidelines are rough heuristics from the 1960s. Modern understanding recognizes that expected stress depends critically on n (number of points), k (dimensions), and data structure. A stress of 0.15 might be excellent for n=500 in k=2 but terrible for n=20.
Factors affecting expected stress:
1. Number of points (n):
More points means more pairwise constraints to satisfy: for a fixed target dimension $k$, the expected stress of embedding unstructured data increases with $n$. Rule of thumb: stress increases roughly as $\sqrt{n}$ for fixed $k$.
2. Target dimensionality (k):
Higher $k$ allows better fit. Stress decreases as $k$ increases. The 'elbow' in stress vs $k$ indicates intrinsic dimensionality.
3. Data structure: Strongly clustered or intrinsically low-dimensional data embeds with far lower stress than unstructured data; fixed guidelines implicitly assume some structure exists.
4. Measurement noise:
Even if data lies in $k$ dimensions, measurement error adds irreducible stress.
Simulation-based calibration:
The gold standard for interpreting stress is comparison to: (1) the expected stress of random data with the same $n$ and $k$ (Monte Carlo simulation), and (2) the stress obtained after destroying the structure in your own dissimilarities (a permutation test). Both are implemented below.
```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.manifold import MDS

def _stress1(D_target, X):
    """Kruskal's Stress-1 for an embedding (sklearn's `stress_` is *raw* stress)."""
    triu = np.triu_indices(D_target.shape[0], k=1)
    delta = D_target[triu]
    d = cdist(X, X)[triu]
    return np.sqrt(np.sum((delta - d) ** 2) / np.sum(d ** 2))

def expected_stress_simulation(n, k, n_simulations=100, random_state=None):
    """
    Estimate expected Stress-1 for unstructured data via Monte Carlo simulation.

    Parameters:
    -----------
    n : int
        Number of points
    k : int
        Target dimensionality
    n_simulations : int
        Number of random datasets to generate
    random_state : int
        Random seed

    Returns:
    --------
    mean_stress, std_stress, stress_values
    """
    rng = np.random.RandomState(random_state)
    stress_values = []
    for _ in range(n_simulations):
        # Random data in a high ambient dimension (embedding k-dim data into
        # k dims would give near-zero stress and a meaningless null)
        X_true = rng.randn(n, n - 1)
        D = cdist(X_true, X_true, metric='euclidean')
        mds = MDS(n_components=k, dissimilarity='precomputed',
                  n_init=1, max_iter=100, random_state=rng.randint(100000))
        X_emb = mds.fit_transform(D)
        stress_values.append(_stress1(D, X_emb))
    return np.mean(stress_values), np.std(stress_values), stress_values

def permutation_test_stress(D_target, X_embedded, n_permutations=100,
                            random_state=None):
    """
    Permutation test: compare actual stress to stress from randomized data.

    Returns actual_stress, the list of permutation stresses, and a p-value
    (the fraction of permuted datasets with stress <= the actual stress).
    """
    rng = np.random.RandomState(random_state)
    n = D_target.shape[0]
    k = X_embedded.shape[1]
    triu = np.triu_indices(n, k=1)

    # Compute actual Stress-1
    actual_stress = _stress1(D_target, X_embedded)

    perm_stresses = []
    for _ in range(n_permutations):
        # Shuffle the upper-triangle dissimilarities and rebuild a symmetric
        # matrix. (Relabeling points via D[perm][:, perm] would leave the
        # stress unchanged, so the entries themselves must be permuted.)
        D_perm = np.zeros_like(D_target)
        D_perm[triu] = rng.permutation(D_target[triu])
        D_perm = D_perm + D_perm.T

        mds = MDS(n_components=k, dissimilarity='precomputed',
                  n_init=1, max_iter=100, random_state=rng.randint(100000))
        X_perm = mds.fit_transform(D_perm)
        perm_stresses.append(_stress1(D_perm, X_perm))

    # P-value: fraction of permutations with stress <= actual
    p_value = np.mean([s <= actual_stress for s in perm_stresses])
    return actual_stress, perm_stresses, p_value

def stress_vs_n_k_table():
    """Generate reference table of expected stress values."""
    print("Expected Stress (σ₁) for Random Data")
    print("=" * 55)
    print(f"{'n':>6} | {'k=1':>8} {'k=2':>8} {'k=3':>8} {'k=4':>8}")
    print("-" * 55)
    for n in [10, 20, 50, 100, 200]:
        row = f"{n:>6} |"
        for k in [1, 2, 3, 4]:
            if k >= n:
                row += f" {'N/A':>8}"
            else:
                mean_s, std_s, _ = expected_stress_simulation(
                    n, k, n_simulations=20, random_state=42)
                row += f" {mean_s:>8.4f}"
        print(row)

def demonstrate_stress_elbow():
    """Demonstrate stress elbow plot for choosing dimensionality."""
    np.random.seed(42)

    # Generate data with intrinsic 3D structure
    n = 150
    t = np.linspace(0, 2 * np.pi, n)
    X_true = np.column_stack([
        (2 + np.cos(8 * t)) * np.cos(t),
        (2 + np.cos(8 * t)) * np.sin(t),
        np.sin(8 * t),
    ])
    D = cdist(X_true, X_true, metric='euclidean')

    # Embed at different dimensions
    stresses = []
    for k in range(1, 8):
        mds = MDS(n_components=k, dissimilarity='precomputed',
                  n_init=4, max_iter=300, random_state=42)
        X_emb = mds.fit_transform(D)
        s = _stress1(D, X_emb)
        stresses.append(s)
        print(f"k={k}: stress = {s:.4f}")

    print("\nElbow appears around k=3 (intrinsic dimensionality)")
    return stresses

if __name__ == "__main__":
    print("Stress Elbow Analysis:")
    print("-" * 40)
    demonstrate_stress_elbow()
    print()
    print("Reference Table:")
    print("-" * 40)
    stress_vs_n_k_table()
```

Understanding the stress function's optimization landscape helps diagnose and fix MDS problems.
Non-convexity:
Stress is a non-convex function of the coordinates $X$. This means:

- Different random initializations can converge to different configurations
- No single run is guaranteed to find the global minimum
- Running multiple restarts and keeping the lowest-stress solution is standard practice
Invariances:
Stress has several invariances that create continuous families of equivalent solutions:
Translation: Adding a constant to all coordinates doesn't change distances $$X \mapsto X + \mathbf{c}$$
Rotation: Orthogonal transformations preserve distances $$X \mapsto XQ \text{ where } Q^TQ = I$$
Reflection: Flipping signs preserves distances $$X \mapsto X \cdot \text{diag}(\pm 1, \ldots, \pm 1)$$
These invariances mean the solution is unique only up to rigid transformations.
Most MDS implementations center the solution (Σᵢ xᵢ = 0) to remove translational ambiguity. This is implicit in Classical MDS (double centering) and can be enforced in iterative methods via the pseudoinverse of the Laplacian.
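Because solutions are equivalent up to rigid transformations, two embeddings should be aligned before being compared. SciPy's `procrustes` (which also standardizes scale) does this; a short demonstration on synthetic data:

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
X1 = rng.standard_normal((30, 2))

# Build an "equivalent" solution: rotate, reflect, and translate X1
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X2 = (X1 @ Q) * np.array([1.0, -1.0]) + np.array([3.0, -2.0])

# Disparity ~ 0 confirms the configurations are identical up to
# translation, rotation, reflection, and scale
_, _, disparity = procrustes(X1, X2)
print(f"Procrustes disparity: {disparity:.2e}")
```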
Local minima in practice:

When are local minima problematic? Typically with small $n$, a target dimension far below the intrinsic dimensionality, or strongly non-Euclidean dissimilarities; the symptom is final stress varying widely across random restarts.

When are local minima benign? When many restarts converge to nearly the same stress (and, after Procrustes alignment, nearly the same configuration), the solution is effectively stable even though the objective is non-convex.

Diagnosing local minima: run several random restarts, compare the spread of final stress values, and align the resulting configurations, as in the sketch below.
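A sketch of the restart diagnostic (using scikit-learn's MDS; note that its `stress_` attribute reports raw stress, so only the spread across runs matters here, not the absolute scale):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.manifold import MDS

def restart_diagnostics(D, k=2, n_restarts=10):
    """Run MDS from several random starts and report the spread of final stress."""
    stresses = []
    for seed in range(n_restarts):
        mds = MDS(n_components=k, dissimilarity='precomputed',
                  n_init=1, random_state=seed)
        mds.fit(D)
        stresses.append(mds.stress_)      # raw stress in scikit-learn
    stresses = np.array(stresses)
    spread = (stresses.max() - stresses.min()) / stresses.min()
    print(f"min={stresses.min():.4f}  max={stresses.max():.4f}  spread={spread:.1%}")
    return stresses

# Tight spread suggests benign local minima; wide spread suggests instability
Y = np.random.default_rng(0).standard_normal((40, 3))
restart_diagnostics(cdist(Y, Y))
```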
The Shepard diagram is the primary visual tool for assessing MDS quality. It plots original dissimilarities against embedded distances.
Construction: For each pair $i < j$, plot a point at $(\delta_{ij}, d_{ij})$; optionally overlay the identity line (metric MDS) or the fitted monotone curve of disparities (non-metric MDS).
Ideal appearance:
For perfect metric MDS: all points lie on the diagonal ($d = \delta$).

For perfect non-metric MDS: all points lie on a monotonically increasing curve.
Interpreting deviations:
| Pattern | Appearance | Interpretation |
|---|---|---|
| Tight diagonal | Points clustered on y=x line | Excellent metric fit |
| Tight monotonic | Points on increasing curve | Good non-metric fit |
| Systematic bend | S-curve or other shape | Nonlinear scaling mismatch |
| Wide scatter | Points dispersed around trend | Poor fit; many distance distortions |
| Outliers | A few points far from trend | Some pairs badly distorted |
| Heteroscedasticity | Scatter increases with δ | Large distances less reliable |
For large n, Shepard diagrams become cluttered. Enhancements: (1) 2D histogram or heatmap instead of scatter, (2) Show only a random sample of pairs, (3) Add LOESS smooth line, (4) Color points by local density or another variable.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist
from sklearn.isotonic import IsotonicRegression

def enhanced_shepard_diagram(D_target, X_embedded, sample_frac=1.0,
                             metric=True, ax=None, random_state=None):
    """
    Create an enhanced Shepard diagram.

    Parameters:
    -----------
    D_target : ndarray (n, n)
        Target dissimilarity matrix
    X_embedded : ndarray (n, k)
        Embedding coordinates
    sample_frac : float
        Fraction of pairs to plot (for large n)
    metric : bool
        If True, show diagonal reference. If False, show isotonic fit.
    ax : matplotlib axis
        Optional axis to plot on
    random_state : int
        Random seed for sampling
    """
    rng = np.random.RandomState(random_state)
    n = X_embedded.shape[0]
    D_embedded = cdist(X_embedded, X_embedded, metric='euclidean')

    # Extract upper triangle
    triu = np.triu_indices(n, k=1)
    delta = D_target[triu]
    d = D_embedded[triu]

    # Sample if needed
    n_pairs = len(delta)
    if sample_frac < 1.0:
        n_sample = int(sample_frac * n_pairs)
        idx = rng.choice(n_pairs, size=n_sample, replace=False)
        delta_plot = delta[idx]
        d_plot = d[idx]
    else:
        delta_plot = delta
        d_plot = d

    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 8))

    # Scatter plot
    ax.scatter(delta_plot, d_plot, alpha=0.3, s=5, c='steelblue')

    # Reference line(s)
    max_val = max(delta.max(), d.max())
    if metric:
        # Diagonal line for metric MDS
        ax.plot([0, max_val], [0, max_val], 'r--', linewidth=2,
                label='Perfect fit (d = δ)')
    else:
        # Isotonic regression line for non-metric MDS
        order = np.argsort(delta)
        iso = IsotonicRegression()
        d_hat = iso.fit_transform(delta[order], d[order])
        ax.plot(delta[order], d_hat, 'r-', linewidth=2,
                label='Isotonic fit (disparities)')

    # Compute stress
    if metric:
        stress = np.sqrt(np.sum((delta - d) ** 2) / np.sum(d ** 2))
    else:
        order = np.argsort(delta)
        iso = IsotonicRegression()
        d_hat = iso.fit_transform(delta[order], d[order])
        d_hat_full = np.zeros_like(d)
        d_hat_full[order] = d_hat
        stress = np.sqrt(np.sum((d - d_hat_full) ** 2) / np.sum(d ** 2))

    # Correlation
    corr = np.corrcoef(delta, d)[0, 1]

    ax.set_xlabel('Target Dissimilarity (δ)', fontsize=12)
    ax.set_ylabel('Embedded Distance (d)', fontsize=12)
    ax.set_title(f'Shepard Diagram (stress={stress:.4f}, r={corr:.4f})',
                 fontsize=14)
    ax.legend()
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)

    return ax, stress, corr

def compare_shepard_diagrams():
    """Compare Shepard diagrams for good and poor embeddings."""
    np.random.seed(42)

    # Generate 3D data
    n = 100
    X_true = np.random.randn(n, 3)
    D = cdist(X_true, X_true, metric='euclidean')

    # Good embedding: 3D MDS
    from sklearn.manifold import MDS
    mds_3d = MDS(n_components=3, dissimilarity='precomputed', random_state=42)
    X_3d = mds_3d.fit_transform(D)

    # Poor embedding: 1D MDS
    mds_1d = MDS(n_components=1, dissimilarity='precomputed', random_state=42)
    X_1d = mds_1d.fit_transform(D)

    # Plot
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))
    enhanced_shepard_diagram(D, X_3d, metric=True, ax=axes[0])
    axes[0].set_title('3D Embedding (matches intrinsic dim)')
    enhanced_shepard_diagram(D, X_1d, metric=True, ax=axes[1])
    axes[1].set_title('1D Embedding (too few dimensions)')
    plt.tight_layout()
    plt.savefig('shepard_comparison.png', dpi=150)
    plt.close()

if __name__ == "__main__":
    compare_shepard_diagrams()
```

Armed with an understanding of stress functions, here is a practical framework for MDS analysis.
Red flags to watch for:

- Stress > 0.20 (for k=2): embedding may be misleading
- Wide variation across runs: unstable solution
- Shepard diagram with an S-curve: wrong stress type (metric vs non-metric)
- Points collapsed together: degenerate solution
- k chosen to make stress 'look good': may be overfitting with too many dimensions
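These checks can be partially automated. A hypothetical helper, with thresholds mirroring the list above (they are judgment calls, not standards):

```python
import numpy as np

def mds_red_flags(stress1, run_stresses, D_embedded):
    """Flag common MDS failure modes; all thresholds are heuristic."""
    flags = []
    if stress1 > 0.20:
        flags.append("stress1 > 0.20: embedding may be misleading")
    run_stresses = np.asarray(run_stresses)
    if (run_stresses.max() - run_stresses.min()) / run_stresses.min() > 0.10:
        flags.append("stress varies >10% across restarts: unstable solution")
    d = D_embedded[np.triu_indices(D_embedded.shape[0], k=1)]
    if np.median(d) < 1e-6 * d.max():
        flags.append("most distances near zero: possible degenerate solution")
    return flags
```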
Reporting MDS results:
When publishing or presenting MDS results, include: the stress formulation used (e.g., Stress-1) and its value, $n$ and $k$, whether the analysis was metric or non-metric, the software and number of random restarts, and a Shepard diagram where space permits.
Comparing across studies:
Stress values are comparable only for: the same stress formulation, the same (or similar) $n$, the same target dimensionality $k$, and the same weighting scheme.
When these differ, use relative comparisons (e.g., "20% lower stress than baseline").
Stress functions are the mathematical core of MDS—they define what we mean by a 'good' embedding.
What's next:
With a solid understanding of MDS mechanics—Classical, Metric, Non-metric, and stress functions—the final page covers limitations and practical considerations. We'll discuss when MDS fails, computational bottlenecks, relationship to other methods, and best practices for real-world applications.
You now have deep knowledge of stress functions: their mathematical properties, the differences between variants, techniques for interpretation, and practical guidelines. Next, we examine MDS limitations and when to consider alternative approaches.