Consider a wine tasting experiment where experts rate the similarity between pairs of wines. Expert A might use a 1–10 scale whereas Expert B uses 1–100. One expert might be generous, another harsh. The exact numerical values are arbitrary—what matters is the relative ordering: if wine pair (A,B) is rated more similar than (C,D), that ranking should be preserved regardless of the exact numbers.
Non-metric MDS addresses this by preserving only the monotonic ordering of dissimilarities rather than their exact values. If $\delta_{ij} < \delta_{kl}$ in the input, then the embedded distances should satisfy $d_{ij} < d_{kl}$. This makes Non-metric MDS robust to arbitrary monotone distortions of the input: different rating scales, generous or harsh raters, and any other transformation that leaves the ordering intact.
Developed by Shepard (1962) and Kruskal (1964), Non-metric MDS became the dominant approach in psychology, marketing research, and anywhere subjective similarity judgments are collected.
By the end of this page, you will understand the mathematical formulation of Non-metric MDS, master the isotonic regression step that enables rank preservation, learn the alternating optimization algorithm, and recognize when to choose Non-metric over Metric MDS.
Levels of measurement:
In statistics, we distinguish four levels of measurement: nominal (unordered categories), ordinal (order only), interval (differences are meaningful), and ratio (differences and ratios are meaningful, with a true zero).
Metric MDS assumes interval or ratio data: The difference between dissimilarity 2 and 4 is the same as between 6 and 8.
Non-metric MDS assumes ordinal data: only the order matters. Knowing that a dissimilarity of 4 is greater than 2 is all we use; we do not assume that the gap between 2 and 4 reflects the same amount of dissimilarity as the gap between 6 and 8.
Why is ordinal data common?
In many domains, especially those involving human judgment, dissimilarities come from Likert-scale ratings, pairwise similarity judgments, or preference rankings. The numbers attached to such responses are convenient labels; only their ordering can really be trusted.
Think of Non-metric MDS as finding an embedding where: 'If the input says objects A and B are MORE similar than C and D, then the embedding should place A and B CLOSER together than C and D.' The exact distances don't matter—only their relative ordering.
| Use Metric MDS When... | Use Non-metric MDS When... |
|---|---|
| Dissimilarities are precise measurements | Dissimilarities are subjective judgments |
| Scale is meaningful (e.g., miles, seconds) | Scale is arbitrary (e.g., Likert ratings) |
| You trust the numeric values | You only trust the ordering |
| Data is interval or ratio scale | Data is ordinal |
| Distances come from features (Euclidean) | Dissimilarities come from surveys, ratings |
| You want to match exact distances | You want to match distance ordering |
The key insight of Non-metric MDS is to introduce an intermediate set of disparities $\hat{d}_{ij}$ that can be freely adjusted as long as they preserve the rank ordering of the original dissimilarities.
Setup: we are given dissimilarities $\delta_{ij}$ between $n$ objects and seek embedding coordinates $X$ (with Euclidean distances $d_{ij}(X)$) together with disparities $\hat{d}_{ij}$.
The monotonicity constraint:
Disparities must satisfy: $$\delta_{ij} < \delta_{kl} \implies \hat{d}_{ij} \leq \hat{d}_{kl}$$
In other words, disparities are a monotone transformation of dissimilarities.
The Non-metric stress (Kruskal's Stress-1):
$$\sigma(X) = \sqrt{\frac{\sum_{i<j}(d_{ij}(X) - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}(X)^2}}$$
This measures how well the embedded distances match the disparities. The disparities themselves are chosen to be as close to the embedded distances as possible while respecting the monotonicity constraint.
Non-metric MDS optimizes over two sets of variables: (1) the embedding coordinates X, and (2) the disparities d̂. The coordinates control the embedded distances, while the disparities provide the 'targets' that embedded distances should match. The disparities act as a flexible proxy for the original dissimilarities.
Why introduce disparities?
Consider dissimilarities $\delta = [1, 5, 100]$. Metric MDS would try to make embedded distances match 1, 5, and 100—respecting the huge gap between 5 and 100.
But if we only trust the ordering (1 < 5 < 100), then disparities $\hat{d} = [1, 2, 3]$ are equally valid, as are $\hat{d} = [10, 20, 21]$. Non-metric MDS finds the disparities that lie closest to the embedded distances while still respecting the input ordering.
This decouples the scale of the input from the structure of the output.
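A quick way to see this decoupling is a minimal sketch (illustrative; the helper name, the random data, and the log transform are assumptions, not from the text): running scikit-learn's non-metric MDS on a dissimilarity matrix and on a monotone distortion of it should give essentially the same stress, because only the ordering of the dissimilarities enters the fit.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import MDS

def nmds_stress(D, seed=0):
    # Non-metric MDS on a precomputed dissimilarity matrix; returns final stress
    mds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
              n_init=4, random_state=seed, normalized_stress='auto')
    mds.fit(D)
    return mds.stress_

rng = np.random.RandomState(0)
D = pairwise_distances(rng.randn(40, 5))   # some dissimilarity matrix
D_mono = np.log1p(5 * D)                   # monotone distortion: same ordering

# Because only the rank order of dissimilarities matters, the two stress
# values should come out essentially identical.
print(nmds_stress(D), nmds_stress(D_mono))
```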
The optimization problem:
$$\min_{X, \hat{d}} \sum_{i<j}(d_{ij}(X) - \hat{d}_{ij})^2$$ $$\text{subject to: } \delta_{ij} < \delta_{kl} \implies \hat{d}_{ij} \leq \hat{d}_{kl}$$
This is a constrained optimization problem solved by alternating between two steps: with $X$ fixed, find the optimal disparities by isotonic regression; with the disparities fixed, update $X$ (for example, with a SMACOF step).
Given fixed embedding coordinates $X$, finding the optimal disparities is an isotonic regression problem. Isotonic regression finds the best monotonic fit to data.
Problem statement:
Given embedded distances $d_{ij}$ and the ordering from dissimilarities $\delta_{ij}$, find disparities $\hat{d}_{ij}$ that:
$$\min_{\hat{d}} \sum_{i<j}(d_{ij} - \hat{d}_{ij})^2$$ $$\text{s.t. } \hat{d}_{ij} \leq \hat{d}_{kl} \text{ whenever } \delta_{ij} \leq \delta_{kl}$$
Interpretation: Find the monotonically increasing function of $\delta$ that best predicts $d$.
The PAVA (Pool Adjacent Violators Algorithm): scan the embedded distances in increasing order of $\delta$; whenever a value is smaller than the fitted value before it (a monotonicity violation), pool the two into a single block whose value is their weighted average, and keep merging backwards until monotonicity is restored.
The scan itself is linear in the number of pairs; including the initial sort by $\delta$, the isotonic regression step costs $O(m \log m)$, where $m = n(n-1)/2$ is the number of pairs.
Imagine plotting $d$ against $\delta$. PAVA finds the best 'staircase' (weakly increasing step function) that minimizes the squared distance to the observed points. When consecutive points violate the ordering, they are averaged and 'pooled' into a single step. For example, the sequence [1, 3, 2, 4] becomes [1, 2.5, 2.5, 4]: the violating pair (3, 2) is pooled into its mean, 2.5.
```python
import numpy as np


def pava_isotonic_regression(y, weights=None):
    """
    Pool Adjacent Violators Algorithm (PAVA) for isotonic regression.
    Finds the best monotonically non-decreasing fit to y.

    Parameters
    ----------
    y : array-like
        Values to fit (assumed already sorted by the ordering variable).
    weights : array-like, optional
        Weights for each observation.

    Returns
    -------
    y_iso : ndarray
        Isotonic (non-decreasing) fit, same length as y.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    weights = np.ones(n) if weights is None else np.asarray(weights, dtype=float)

    # Each block is a run of consecutive elements that share one fitted value.
    # Track the start index, weighted sum, and total weight of each block.
    block_starts = list(range(n))
    block_sums = list(y * weights)
    block_weights = list(weights)

    pos = 0
    while pos < len(block_starts) - 1:
        curr_mean = block_sums[pos] / block_weights[pos]
        next_mean = block_sums[pos + 1] / block_weights[pos + 1]
        if curr_mean > next_mean:
            # Violation: merge block pos+1 into block pos (weighted average)
            block_sums[pos] = block_sums[pos] + block_sums[pos + 1]
            block_weights[pos] = block_weights[pos] + block_weights[pos + 1]
            del block_sums[pos + 1]
            del block_weights[pos + 1]
            del block_starts[pos + 1]
            # Merging may create a new violation with the previous block
            if pos > 0:
                pos -= 1
        else:
            pos += 1

    # Expand block means back to the original indices
    y_iso = np.empty(n)
    for b, start in enumerate(block_starts):
        end = block_starts[b + 1] if b + 1 < len(block_starts) else n
        y_iso[start:end] = block_sums[b] / block_weights[b]
    return y_iso


def isotonic_regression_simple(x, y, weights=None):
    """
    Isotonic regression of y on x: sort by x, then apply PAVA.

    Returns
    -------
    y_iso : ndarray
        Isotonic fit corresponding to sorted x.
    order : ndarray
        Sort order used (indices into the original arrays).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.argsort(x)
    w = None if weights is None else np.asarray(weights)[order]
    y_iso = pava_isotonic_regression(y[order], w)
    return y_iso, order


def demo_isotonic_regression():
    """Demonstrate isotonic regression on noisy monotone data."""
    import matplotlib.pyplot as plt
    from sklearn.isotonic import IsotonicRegression

    np.random.seed(42)
    n = 50
    x = np.sort(np.random.rand(n))
    y_true = 2 * x + 1                         # linear increasing function
    y_noisy = y_true + np.random.randn(n) * 0.5

    # Our PAVA implementation and sklearn's, for comparison
    y_iso, order = isotonic_regression_simple(x, y_noisy)
    y_iso_sklearn = IsotonicRegression().fit_transform(x, y_noisy)

    plt.figure(figsize=(10, 5))
    plt.scatter(x, y_noisy, label='Noisy data', alpha=0.7)
    plt.plot(x, y_true, 'g--', label='True function', linewidth=2)
    plt.step(x, y_iso, 'r-', where='post', label='PAVA isotonic', linewidth=2)
    plt.plot(x, y_iso_sklearn, 'b:', label='sklearn isotonic', linewidth=2)
    plt.xlabel('x (ordering variable)')
    plt.ylabel('y')
    plt.legend()
    plt.title('Isotonic Regression: Fitting a Monotonic Function')
    plt.savefig('isotonic_demo.png', dpi=150)
    plt.close()
    print("Isotonic regression demonstration complete.")


if __name__ == "__main__":
    demo_isotonic_regression()
```

Handling ties in dissimilarities:
When $\delta_{ij} = \delta_{kl}$, we have two options:
Primary approach: Tied dissimilarities can have any relative ordering: $\hat{d}_{ij} \leq \hat{d}_{kl}$ or $\hat{d}_{ij} \geq \hat{d}_{kl}$. This gives more flexibility.
Secondary approach: Tied dissimilarities must have equal disparities: $\delta_{ij} = \delta_{kl} \implies \hat{d}_{ij} = \hat{d}_{kl}$. This is more restrictive.
Most implementations default to the primary (weak monotonicity) approach; a sketch of the secondary treatment follows below.
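If the secondary treatment is needed, one way to obtain it (a minimal sketch; the function name is an assumption, not from the text) is to pool each group of tied dissimilarities into a single weighted observation before the isotonic fit, so every tied pair receives the same disparity:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def disparities_with_tied_groups(delta, d):
    """Secondary tie handling (sketch): every group of tied dissimilarities
    receives one shared disparity. Tied pairs are pooled into one weighted
    observation before the isotonic regression."""
    unique_delta, inverse, counts = np.unique(delta, return_inverse=True,
                                              return_counts=True)
    # Mean embedded distance within each group of tied dissimilarities
    group_means = np.bincount(inverse, weights=np.asarray(d, dtype=float)) / counts
    iso = IsotonicRegression()
    iso.fit(unique_delta, group_means, sample_weight=counts)
    group_disparities = iso.predict(unique_delta)
    return group_disparities[inverse]   # expand back to one value per pair
```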
Shepard Diagram for Non-metric MDS:
For non-metric MDS, the Shepard diagram plots $\delta_{ij}$ vs $d_{ij}(X)$ for all pairs. A good non-metric fit shows points following a tight, monotonically increasing (possibly curved) trend, with little vertical scatter around the isotonic step function.
Non-metric MDS uses an alternating optimization approach, iterating between an isotonic regression step (update the disparities for fixed coordinates) and a SMACOF step (update the coordinates for fixed disparities).
Complete algorithm:
Input: Dissimilarities δ, target dimension k, convergence tolerance ε
1. Initialize X (e.g., Classical MDS or random)
2. Repeat:
a. Compute embedded distances d_ij(X)
b. Run isotonic regression: δ → d̂ (monotonic fit to d)
c. Update X using SMACOF with d̂ as targets
d. Compute stress σ
e. If stress improvement < ε, stop
3. Return X
Convergence properties: the isotonic regression step is optimal for fixed coordinates, and the SMACOF step is a majorization update for fixed disparities, so stress typically decreases from iteration to iteration. As with Metric MDS, only a local minimum is guaranteed, so multiple random restarts are advisable.
Non-metric MDS can produce degenerate solutions. For example, if all points collapse to a single location, every embedded distance is zero and the unnormalized (raw) stress is trivially zero. The normalization in Stress-1 (dividing by $\sum_{i<j} d_{ij}(X)^2$) guards against this: stress becomes invariant to uniformly shrinking the configuration, so collapsing all points is not rewarded.
```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.isotonic import IsotonicRegression


def nonmetric_stress(D_target, X):
    """
    Compute Kruskal's Stress-1 for non-metric MDS.
    Uses isotonic regression to find optimal disparities.
    """
    n = X.shape[0]
    D_embedded = cdist(X, X, metric='euclidean')

    # Extract upper triangle
    triu_idx = np.triu_indices(n, k=1)
    delta = D_target[triu_idx]    # Original dissimilarities
    d = D_embedded[triu_idx]      # Embedded distances

    # Isotonic regression: find disparities
    order = np.argsort(delta)
    delta_sorted = delta[order]
    d_sorted = d[order]
    iso = IsotonicRegression(out_of_bounds='clip')
    d_hat_sorted = iso.fit_transform(delta_sorted, d_sorted)

    # Map back to original order
    d_hat = np.zeros_like(d)
    d_hat[order] = d_hat_sorted

    # Stress-1 formula
    numerator = np.sum((d - d_hat) ** 2)
    denominator = np.sum(d ** 2)
    if denominator == 0:
        return np.inf
    return np.sqrt(numerator / denominator)


def nonmetric_mds(D_target, n_components=2, init=None, max_iter=300,
                  eps=1e-6, random_state=None, verbose=False):
    """
    Non-metric MDS using alternating isotonic regression and SMACOF.

    Parameters
    ----------
    D_target : (n, n) ndarray      Dissimilarity matrix
    n_components : int             Target dimensionality
    init : ndarray, optional       Initial configuration
    max_iter : int                 Maximum iterations
    eps : float                    Convergence tolerance on stress change
    random_state : int             Random seed
    verbose : bool                 Print progress

    Returns
    -------
    X : (n, n_components) ndarray  Final embedding
    stress : float                 Final stress value
    n_iter : int                   Number of iterations
    """
    rng = np.random.RandomState(random_state)
    n = D_target.shape[0]
    k = n_components

    # Initialize (use Classical MDS)
    if init is None:
        D_sq = D_target ** 2
        row_means = D_sq.mean(axis=1, keepdims=True)
        col_means = D_sq.mean(axis=0, keepdims=True)
        grand_mean = D_sq.mean()
        B = -0.5 * (D_sq - row_means - col_means + grand_mean)
        eigenvalues, eigenvectors = np.linalg.eigh(B)
        idx = np.argsort(eigenvalues)[::-1][:k]
        eigenvalues = np.maximum(eigenvalues[idx], 0)
        eigenvectors = eigenvectors[:, idx]
        X = eigenvectors * np.sqrt(eigenvalues)
    else:
        X = init.copy()

    # Precompute sorted order of dissimilarities
    triu_idx = np.triu_indices(n, k=1)
    delta = D_target[triu_idx]
    delta_order = np.argsort(delta)

    # Isotonic regressor
    iso = IsotonicRegression(out_of_bounds='clip')

    # Weighted Laplacian for SMACOF (uniform weights)
    weights = np.ones((n, n))
    np.fill_diagonal(weights, 0)
    V = np.diag(weights.sum(axis=1)) - weights
    V_pinv = np.linalg.pinv(V)

    stress_prev = np.inf
    for iteration in range(max_iter):
        # Step 1: Compute embedded distances
        D_embedded = cdist(X, X, metric='euclidean')
        d = D_embedded[triu_idx]

        # Step 2: Isotonic regression to get disparities
        d_sorted = d[delta_order]
        d_hat_sorted = iso.fit_transform(np.arange(len(d_sorted)), d_sorted)

        # Map disparities back to matrix form
        d_hat = np.zeros_like(d)
        d_hat[delta_order] = d_hat_sorted
        D_hat = np.zeros((n, n))
        D_hat[triu_idx] = d_hat
        D_hat = D_hat + D_hat.T    # Symmetrize

        # Step 3: SMACOF (Guttman) update with disparities as targets
        with np.errstate(divide='ignore', invalid='ignore'):
            B = -weights * D_hat / D_embedded
            B[~np.isfinite(B)] = 0
        np.fill_diagonal(B, 0)
        np.fill_diagonal(B, -B.sum(axis=1))
        X_new = V_pinv @ B @ X

        # Step 4: Compute stress
        D_new = cdist(X_new, X_new, metric='euclidean')
        d_new = D_new[triu_idx]
        numerator = np.sum((d_new - d_hat) ** 2)
        denominator = np.sum(d_new ** 2)
        stress = np.sqrt(numerator / denominator) if denominator > 0 else np.inf

        if verbose and iteration % 10 == 0:
            print(f"Iteration {iteration}: stress = {stress:.6f}")

        # Check convergence
        if abs(stress_prev - stress) < eps:
            if verbose:
                print(f"Converged at iteration {iteration}")
            X = X_new
            break

        X = X_new
        stress_prev = stress

    return X, stress, iteration + 1


def compare_metric_nonmetric():
    """Compare Metric and Non-metric MDS on ordinal data."""
    np.random.seed(42)

    # Generate 2D data
    n = 100
    t = np.linspace(0, 2 * np.pi, n)
    X_true = np.column_stack([np.cos(t), np.sin(t)])   # Circle

    # True Euclidean distances
    D_true = cdist(X_true, X_true, metric='euclidean')

    # Create ordinal version: apply a monotonic but nonlinear transform.
    # Simulates subjective ratings where small distances are exaggerated.
    D_ordinal = np.log1p(D_true * 10)   # Log transform

    # Metric MDS on ordinal data (wrong assumption)
    from smacof import smacof   # smacof() is assumed available from the Metric MDS / SMACOF page
    X_metric, stress_metric, _ = smacof(D_ordinal, n_components=2, random_state=42)

    # Non-metric MDS on ordinal data (correct assumption)
    X_nonmetric, stress_nonmetric, _ = nonmetric_mds(
        D_ordinal, n_components=2, random_state=42, verbose=True
    )

    print(f"\nMetric MDS stress on ordinal data: {stress_metric:.4f}")
    print(f"Non-metric MDS stress on ordinal data: {stress_nonmetric:.4f}")

    # Evaluate: correlation of distance ranks
    D_metric = cdist(X_metric, X_metric)
    D_nonmetric = cdist(X_nonmetric, X_nonmetric)

    from scipy.stats import spearmanr
    triu = np.triu_indices(n, k=1)
    corr_metric = spearmanr(D_true[triu], D_metric[triu])[0]
    corr_nonmetric = spearmanr(D_true[triu], D_nonmetric[triu])[0]

    print("\nRank correlation with true distances:")
    print(f"  Metric MDS:     {corr_metric:.4f}")
    print(f"  Non-metric MDS: {corr_nonmetric:.4f}")

    return X_metric, X_nonmetric, X_true


if __name__ == "__main__":
    compare_metric_nonmetric()
```

Several stress formulations are used in Non-metric MDS practice:
Kruskal's Stress-1:
$$\sigma_1 = \sqrt{\frac{\sum_{i<j}(d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2}}$$
Normalizes by embedded distances squared. Most common.
Kruskal's Stress-2:
$$\sigma_2 = \sqrt{\frac{\sum_{i<j}(d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j}(d_{ij} - \bar{d})^2}}$$
Normalizes by the spread of the embedded distances (the sum of squared deviations from their mean). Penalizes embeddings where all distances are nearly equal.
Raw stress:
$$\sigma_{raw} = \sum_{i<j}(d_{ij} - \hat{d}_{ij})^2$$
Unnormalized. Not recommended for comparing across datasets.
S-stress (on squared distances):
$$\text{S-stress} = \sqrt{\frac{\sum_{i<j}(d_{ij}^2 - \hat{d}_{ij}^2)^2}{\sum_{i<j} d_{ij}^4}}$$
Works with squared distances. Computationally convenient.
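For concreteness, here is a small sketch (the helper name is an assumption, not from the text) that computes these variants directly from the formulas above, given vectors of embedded distances and disparities with one entry per pair $i<j$:

```python
import numpy as np

def stress_variants(d, d_hat):
    """Compute Stress-1, Stress-2, raw stress, and S-stress for paired
    vectors of embedded distances d and disparities d_hat."""
    d = np.asarray(d, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    sq_err = np.sum((d - d_hat) ** 2)

    stress1 = np.sqrt(sq_err / np.sum(d ** 2))
    stress2 = np.sqrt(sq_err / np.sum((d - d.mean()) ** 2))
    raw = sq_err
    s_stress = np.sqrt(np.sum((d ** 2 - d_hat ** 2) ** 2) / np.sum(d ** 4))
    return {'stress1': stress1, 'stress2': stress2, 'raw': raw, 's_stress': s_stress}
```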
Kruskal's rule of thumb: Stress < 0.05 (excellent), 0.05-0.10 (good), 0.10-0.20 (fair), > 0.20 (poor). However, stress depends on n and k: more points or fewer dimensions → higher baseline stress. Compare relative to your specific context.
| n (points) | k=1 | k=2 | k=3 | k=4 |
|---|---|---|---|---|
| 10 | 0.20 | 0.07 | 0.03 | 0.01 |
| 20 | 0.26 | 0.12 | 0.07 | 0.04 |
| 50 | 0.32 | 0.18 | 0.12 | 0.08 |
| 100 | 0.36 | 0.22 | 0.16 | 0.11 |
| 500 | 0.42 | 0.28 | 0.22 | 0.18 |
Best practices for assessing fit:
Stress elbow plot: Compute stress for $k = 1, 2, 3, \ldots$ Look for the 'elbow' where stress improvements diminish.
Shepard diagram: Plot $\delta_{ij}$ vs $d_{ij}$. A good fit shows a tight, monotonically increasing relationship. Wide scatter indicates poor fit.
Compare to random: Randomize the dissimilarity matrix and compute stress. Your actual stress should be much lower.
Stability analysis: Run multiple times with different initializations. Stable results across runs indicate robust solution.
Procrustes rotation: Compare solutions from different runs by aligning them with Procrustes analysis, as in the sketch below.
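As an illustration of the stability check (a minimal sketch under assumed inputs; the helper names are not from the text), scipy's `procrustes` can align embeddings from different random initializations and report how much they disagree:

```python
import numpy as np
from scipy.spatial import procrustes
from sklearn.manifold import MDS

def nmds_embedding(D, seed):
    # One non-metric MDS run on a precomputed dissimilarity matrix
    mds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
              n_init=1, random_state=seed, normalized_stress='auto')
    return mds.fit_transform(D)

def stability_check(D, seeds=(0, 1, 2)):
    """Align runs from different seeds with Procrustes; small disparity
    values indicate a stable, reproducible configuration."""
    reference = nmds_embedding(D, seeds[0])
    for seed in seeds[1:]:
        _, _, disparity = procrustes(reference, nmds_embedding(D, seed))
        print(f"seed {seed}: Procrustes disparity vs. seed {seeds[0]} = {disparity:.4f}")
```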
scikit-learn provides a robust implementation of both Metric and Non-metric MDS through the MDS class.
```python
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances
import numpy as np
import matplotlib.pyplot as plt


def demonstrate_nonmetric_mds():
    """
    Complete demonstration of Non-metric MDS with sklearn.
    """
    np.random.seed(42)

    # Generate clustered data
    n_per_cluster = 50
    clusters = [
        np.random.randn(n_per_cluster, 10) + np.array([5, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
        np.random.randn(n_per_cluster, 10) + np.array([0, 5, 0, 0, 0, 0, 0, 0, 0, 0]),
        np.random.randn(n_per_cluster, 10) + np.array([0, 0, 5, 0, 0, 0, 0, 0, 0, 0]),
    ]
    X = np.vstack(clusters)
    labels = np.repeat([0, 1, 2], n_per_cluster)

    # Compute distance matrix
    D = pairwise_distances(X, metric='euclidean')

    # Apply nonlinear transformation to simulate ordinal data
    D_ordinal = np.sqrt(D)   # Square root transform

    # Non-metric MDS
    mds_nonmetric = MDS(
        n_components=2,
        metric=False,                  # Non-metric MDS
        dissimilarity='precomputed',
        n_init=4,
        max_iter=300,
        eps=1e-6,
        random_state=42,
        normalized_stress='auto'       # Use 'auto' for sklearn >= 1.4
    )
    X_nmds = mds_nonmetric.fit_transform(D_ordinal)
    stress_nmds = mds_nonmetric.stress_

    # Metric MDS for comparison
    mds_metric = MDS(
        n_components=2,
        metric=True,
        dissimilarity='precomputed',
        n_init=4,
        max_iter=300,
        random_state=42
    )
    X_mmds = mds_metric.fit_transform(D_ordinal)
    stress_mmds = mds_metric.stress_

    print(f"Non-metric MDS stress: {stress_nmds:.4f}")
    print(f"Metric MDS stress: {stress_mmds:.4f}")

    # Visualization
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    for ax, X_embed, title in [
        (axes[0], X_nmds, 'Non-metric MDS'),
        (axes[1], X_mmds, 'Metric MDS')
    ]:
        scatter = ax.scatter(X_embed[:, 0], X_embed[:, 1],
                             c=labels, cmap='viridis', s=50, alpha=0.7)
        ax.set_title(title)
        ax.set_xlabel('Dimension 1')
        ax.set_ylabel('Dimension 2')
    plt.colorbar(scatter, ax=axes[1], label='Cluster')
    plt.tight_layout()
    plt.savefig('nonmetric_vs_metric_mds.png', dpi=150)
    plt.close()

    return X_nmds, X_mmds


def stress_vs_dimensions(D, max_k=6, n_init=4):
    """
    Compute stress for different target dimensions.
    Helps choose appropriate k.
    """
    stresses = []
    for k in range(1, max_k + 1):
        mds = MDS(
            n_components=k,
            metric=False,
            dissimilarity='precomputed',
            n_init=n_init,
            random_state=42
        )
        mds.fit(D)
        stresses.append(mds.stress_)
        print(f"k={k}: stress = {mds.stress_:.4f}")

    # Plot
    plt.figure(figsize=(8, 5))
    plt.plot(range(1, max_k + 1), stresses, 'bo-', linewidth=2, markersize=8)
    plt.xlabel('Number of dimensions (k)')
    plt.ylabel('Stress')
    plt.title('Elbow Plot for Choosing k')
    plt.grid(True, alpha=0.3)
    plt.savefig('stress_elbow_plot.png', dpi=150)
    plt.close()

    return stresses


def create_shepard_diagram(D_original, X_embedded):
    """
    Create Shepard diagram for Non-metric MDS.
    Shows relationship between original dissimilarities and embedded distances.
    """
    from sklearn.isotonic import IsotonicRegression

    D_embedded = pairwise_distances(X_embedded, metric='euclidean')
    n = D_original.shape[0]
    triu_idx = np.triu_indices(n, k=1)
    delta = D_original[triu_idx]
    d = D_embedded[triu_idx]

    # Isotonic regression for disparity line
    order = np.argsort(delta)
    iso = IsotonicRegression()
    d_hat = iso.fit_transform(delta[order], d[order])

    plt.figure(figsize=(8, 6))
    # Scatter plot of all pairs
    plt.scatter(delta, d, alpha=0.3, s=10, label='Distance pairs')
    # Isotonic regression line (disparities)
    plt.plot(delta[order], d_hat, 'r-', linewidth=2, label='Isotonic fit')
    plt.xlabel('Original Dissimilarity (δ)')
    plt.ylabel('Embedded Distance (d)')
    plt.title('Shepard Diagram for Non-metric MDS')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.savefig('shepard_diagram_nmds.png', dpi=150)
    plt.close()


if __name__ == "__main__":
    X_nmds, X_mmds = demonstrate_nonmetric_mds()
```

Non-metric MDS finds applications across many domains where ordinal similarity data is available.
In community ecology, Bray-Curtis dissimilarity is the standard metric for comparing species composition. It's bounded [0,1], non-Euclidean, and sensitive to dominant species. NMDS handles Bray-Curtis naturally, while PCA/Classical MDS would inappropriately treat these as Euclidean. The ecology package 'vegan' in R has contributed to NMDS becoming standard practice.
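As a sketch of this workflow (illustrative only; the species-abundance matrix here is synthetic, not a real dataset), Bray-Curtis dissimilarities computed with scipy can be fed straight into scikit-learn's non-metric MDS:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.RandomState(0)
abundance = rng.poisson(lam=3.0, size=(30, 12))   # 30 sites x 12 species (synthetic counts)

# Bray-Curtis dissimilarity between sites (bounded in [0, 1], non-Euclidean)
D_bc = squareform(pdist(abundance, metric='braycurtis'))

# Non-metric MDS ordination on the precomputed Bray-Curtis matrix
nmds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
           n_init=4, random_state=0, normalized_stress='auto')
ordination = nmds.fit_transform(D_bc)
print("NMDS stress:", round(nmds.stress_, 4))
```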
Example: Color Perception Study
A classic application is Shepard's (1962) study of color perception: observers judged the similarity of pairs of spectral colors, and the ratings were embedded in two dimensions with Non-metric MDS. The points arranged themselves in a circle ordered by wavelength, the familiar color circle.
The color wheel structure emerged purely from behavioral similarity judgments, without any knowledge of wavelength or color theory. NMDS recovered the latent perceptual structure.
Practical tips for these applications: use several random initializations and keep the lowest-stress solution, inspect the Shepard diagram before interpreting the map, and choose the dimensionality with a stress elbow plot rather than defaulting to two dimensions.
Non-metric MDS provides a robust approach for ordinal data where only the ranking of dissimilarities is meaningful.
What's next:
Both Metric and Non-metric MDS optimize stress functions that measure distance discrepancy. The next page dives deep into Stress Functions themselves—their mathematical properties, how different formulations affect results, and how to interpret stress values. Understanding stress is crucial for effective MDS practice.
You now understand Non-metric MDS: when to use it (ordinal data), how it works (isotonic regression + SMACOF), and how it differs from Metric MDS (rank preservation vs value preservation). Next, we examine stress functions in detail.