Consider a wine tasting experiment where experts rate the similarity between pairs of wines. Expert A might use a 1–10 scale whereas Expert B uses 1–100. One expert might be generous, another harsh. The exact numerical values are arbitrary—what matters is the relative ordering: if wine pair (A,B) is rated more similar than (C,D), that ranking should be preserved regardless of the exact numbers.
Non-metric MDS addresses this by preserving only the monotonic ordering of dissimilarities rather than their exact values. If $\delta_{ij} < \delta_{kl}$ in the input, then the embedded distances should satisfy $d_{ij} < d_{kl}$. This makes Non-metric MDS robust to arbitrary monotone distortions of the input: different rating scales, generous or harsh raters, and any other transformation that leaves the ordering intact.
Developed by Shepard (1962) and Kruskal (1964), Non-metric MDS became the dominant approach in psychology, marketing research, and anywhere subjective similarity judgments are collected.
By the end of this page, you will understand the mathematical formulation of Non-metric MDS, master the isotonic regression step that enables rank preservation, learn the alternating optimization algorithm, and recognize when to choose Non-metric over Metric MDS.
Levels of measurement:
In statistics, we distinguish four levels of measurement: nominal (unordered categories), ordinal (order only), interval (differences are meaningful), and ratio (differences and ratios are meaningful, with a true zero).
Metric MDS assumes interval or ratio data: The difference between dissimilarity 2 and 4 is the same as between 6 and 8.
Non-metric MDS assumes ordinal data: only the order matters. Knowing that a dissimilarity of 4 is greater than 2 is all we use; we do not assume that the gap between 2 and 4 reflects the same amount of dissimilarity as the gap between 6 and 8.
Why is ordinal data common?
In many domains, especially those involving human judgment, dissimilarities come from Likert-scale ratings, pairwise similarity judgments, or preference rankings. The numbers attached to such responses are convenient labels; only their ordering can really be trusted.
Think of Non-metric MDS as finding an embedding where: 'If the input says objects A and B are MORE similar than C and D, then the embedding should place A and B CLOSER together than C and D.' The exact distances don't matter—only their relative ordering.
| Use Metric MDS When... | Use Non-metric MDS When... |
|---|---|
| Dissimilarities are precise measurements | Dissimilarities are subjective judgments |
| Scale is meaningful (e.g., miles, seconds) | Scale is arbitrary (e.g., Likert ratings) |
| You trust the numeric values | You only trust the ordering |
| Data is interval or ratio scale | Data is ordinal |
| Distances come from features (Euclidean) | Dissimilarities come from surveys, ratings |
| You want to match exact distances | You want to match distance ordering |
The key insight of Non-metric MDS is to introduce an intermediate set of disparities $\hat{d}_{ij}$ that can be freely adjusted as long as they preserve the rank ordering of the original dissimilarities.
Setup: we are given dissimilarities $\delta_{ij}$ between $n$ objects and seek embedding coordinates $X$ (with Euclidean distances $d_{ij}(X)$) together with disparities $\hat{d}_{ij}$.
The monotonicity constraint:
Disparities must satisfy: $$\delta_{ij} < \delta_{kl} \implies \hat{d}_{ij} \leq \hat{d}_{kl}$$
In other words, disparities are a monotone transformation of dissimilarities.
The Non-metric stress (Kruskal's Stress-1):
$$\sigma(X) = \sqrt{\frac{\sum_{i<j}(d_{ij}(X) - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}(X)^2}}$$
This measures how well the embedded distances match the disparities. The disparities themselves are chosen to be as close to the embedded distances as possible while respecting the monotonicity constraint.
Non-metric MDS optimizes over two sets of variables: (1) the embedding coordinates X, and (2) the disparities d̂. The coordinates control the embedded distances, while the disparities provide the 'targets' that embedded distances should match. The disparities act as a flexible proxy for the original dissimilarities.
Why introduce disparities?
Consider dissimilarities $\delta = [1, 5, 100]$. Metric MDS would try to make embedded distances match 1, 5, and 100—respecting the huge gap between 5 and 100.
But if we only trust the ordering (1 < 5 < 100), then disparities $\hat{d} = [1, 2, 3]$ are equally valid, as are $\hat{d} = [10, 20, 21]$. Non-metric MDS finds the disparities that lie closest to the embedded distances while still respecting the input ordering.
This decouples the scale of the input from the structure of the output.
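A quick way to see this decoupling is a minimal sketch (illustrative; the helper name, the random data, and the log transform are assumptions, not from the text): running scikit-learn's non-metric MDS on a dissimilarity matrix and on a monotone distortion of it should give essentially the same stress, because only the ordering of the dissimilarities enters the fit.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import MDS

def nmds_stress(D, seed=0):
    # Non-metric MDS on a precomputed dissimilarity matrix; returns final stress
    mds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
              n_init=4, random_state=seed, normalized_stress='auto')
    mds.fit(D)
    return mds.stress_

rng = np.random.RandomState(0)
D = pairwise_distances(rng.randn(40, 5))   # some dissimilarity matrix
D_mono = np.log1p(5 * D)                   # monotone distortion: same ordering

# Because only the rank order of dissimilarities matters, the two stress
# values should come out essentially identical.
print(nmds_stress(D), nmds_stress(D_mono))
```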
The optimization problem:
$$\min_{X, \hat{d}} \sum_{i<j}(d_{ij}(X) - \hat{d}_{ij})^2$$ $$\text{subject to: } \delta_{ij} < \delta_{kl} \implies \hat{d}_{ij} \leq \hat{d}_{kl}$$
This is a constrained optimization problem solved by alternating between two steps: with $X$ fixed, find the optimal disparities by isotonic regression; with the disparities fixed, update $X$ (for example, with a SMACOF step).
Given fixed embedding coordinates $X$, finding the optimal disparities is an isotonic regression problem. Isotonic regression finds the best monotonic fit to data.
Problem statement:
Given embedded distances $d_{ij}$ and the ordering from dissimilarities $\delta_{ij}$, find disparities $\hat{d}_{ij}$ that:
$$\min_{\hat{d}} \sum_{i<j}(d_{ij} - \hat{d}_{ij})^2$$ $$\text{s.t. } \hat{d}_{ij} \leq \hat{d}_{kl} \text{ whenever } \delta_{ij} \leq \delta_{kl}$$
Interpretation: Find the monotonically increasing function of $\delta$ that best predicts $d$.
The PAVA (Pool Adjacent Violators Algorithm): scan the embedded distances in increasing order of $\delta$; whenever a value is smaller than the fitted value before it (a monotonicity violation), pool the two into a single block whose value is their weighted average, and keep merging backwards until monotonicity is restored.
The scan itself is linear in the number of pairs; including the initial sort by $\delta$, the isotonic regression step costs $O(m \log m)$, where $m = n(n-1)/2$ is the number of pairs.
Imagine plotting $d$ against $\delta$. PAVA finds the best 'staircase' (weakly increasing step function) that minimizes the squared distance to the observed points. When consecutive points violate the ordering, they are averaged and 'pooled' into a single step. For example, the sequence [1, 3, 2, 4] becomes [1, 2.5, 2.5, 4]: the violating pair (3, 2) is pooled into its mean, 2.5.
```python
import numpy as np


def pava_isotonic_regression(y, weights=None):
    """
    Pool Adjacent Violators Algorithm (PAVA) for isotonic regression.
    Finds the best monotonically non-decreasing fit to y.

    Parameters
    ----------
    y : array-like
        Values to fit (assumed already sorted by the ordering variable).
    weights : array-like, optional
        Weights for each observation.

    Returns
    -------
    y_iso : ndarray
        Isotonic (non-decreasing) fit, same length as y.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    weights = np.ones(n) if weights is None else np.asarray(weights, dtype=float)

    # Each block is a run of consecutive elements that share one fitted value.
    # Track the start index, weighted sum, and total weight of each block.
    block_starts = list(range(n))
    block_sums = list(y * weights)
    block_weights = list(weights)

    pos = 0
    while pos < len(block_starts) - 1:
        curr_mean = block_sums[pos] / block_weights[pos]
        next_mean = block_sums[pos + 1] / block_weights[pos + 1]
        if curr_mean > next_mean:
            # Violation: merge block pos+1 into block pos (weighted average)
            block_sums[pos] = block_sums[pos] + block_sums[pos + 1]
            block_weights[pos] = block_weights[pos] + block_weights[pos + 1]
            del block_sums[pos + 1]
            del block_weights[pos + 1]
            del block_starts[pos + 1]
            # Merging may create a new violation with the previous block
            if pos > 0:
                pos -= 1
        else:
            pos += 1

    # Expand block means back to the original indices
    y_iso = np.empty(n)
    for b, start in enumerate(block_starts):
        end = block_starts[b + 1] if b + 1 < len(block_starts) else n
        y_iso[start:end] = block_sums[b] / block_weights[b]
    return y_iso


def isotonic_regression_simple(x, y, weights=None):
    """
    Isotonic regression of y on x: sort by x, then apply PAVA.

    Returns
    -------
    y_iso : ndarray
        Isotonic fit corresponding to sorted x.
    order : ndarray
        Sort order used (indices into the original arrays).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.argsort(x)
    w = None if weights is None else np.asarray(weights)[order]
    y_iso = pava_isotonic_regression(y[order], w)
    return y_iso, order


def demo_isotonic_regression():
    """Demonstrate isotonic regression on noisy monotone data."""
    import matplotlib.pyplot as plt
    from sklearn.isotonic import IsotonicRegression

    np.random.seed(42)
    n = 50
    x = np.sort(np.random.rand(n))
    y_true = 2 * x + 1                         # linear increasing function
    y_noisy = y_true + np.random.randn(n) * 0.5

    # Our PAVA implementation and sklearn's, for comparison
    y_iso, order = isotonic_regression_simple(x, y_noisy)
    y_iso_sklearn = IsotonicRegression().fit_transform(x, y_noisy)

    plt.figure(figsize=(10, 5))
    plt.scatter(x, y_noisy, label='Noisy data', alpha=0.7)
    plt.plot(x, y_true, 'g--', label='True function', linewidth=2)
    plt.step(x, y_iso, 'r-', where='post', label='PAVA isotonic', linewidth=2)
    plt.plot(x, y_iso_sklearn, 'b:', label='sklearn isotonic', linewidth=2)
    plt.xlabel('x (ordering variable)')
    plt.ylabel('y')
    plt.legend()
    plt.title('Isotonic Regression: Fitting a Monotonic Function')
    plt.savefig('isotonic_demo.png', dpi=150)
    plt.close()
    print("Isotonic regression demonstration complete.")


if __name__ == "__main__":
    demo_isotonic_regression()
```

Handling ties in dissimilarities:
When $\delta_{ij} = \delta_{kl}$, we have two options:
Primary approach: Tied dissimilarities can have any relative ordering: $\hat{d}_{ij} \leq \hat{d}_{kl}$ or $\hat{d}_{ij} \geq \hat{d}_{kl}$. This gives more flexibility.
Secondary approach: Tied dissimilarities must have equal disparities: $\delta_{ij} = \delta_{kl} \implies \hat{d}_{ij} = \hat{d}_{kl}$. This is more restrictive.
Most implementations default to the primary (weak monotonicity) approach; a sketch of the secondary treatment follows below.
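If the secondary treatment is needed, one way to obtain it (a minimal sketch; the function name is an assumption, not from the text) is to pool each group of tied dissimilarities into a single weighted observation before the isotonic fit, so every tied pair receives the same disparity:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def disparities_with_tied_groups(delta, d):
    """Secondary tie handling (sketch): every group of tied dissimilarities
    receives one shared disparity. Tied pairs are pooled into one weighted
    observation before the isotonic regression."""
    unique_delta, inverse, counts = np.unique(delta, return_inverse=True,
                                              return_counts=True)
    # Mean embedded distance within each group of tied dissimilarities
    group_means = np.bincount(inverse, weights=np.asarray(d, dtype=float)) / counts
    iso = IsotonicRegression()
    iso.fit(unique_delta, group_means, sample_weight=counts)
    group_disparities = iso.predict(unique_delta)
    return group_disparities[inverse]   # expand back to one value per pair
```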
Shepard Diagram for Non-metric MDS:
For non-metric MDS, the Shepard diagram plots $\delta_{ij}$ vs $d_{ij}(X)$ for all pairs. A good non-metric fit shows points following a tight, monotonically increasing (possibly curved) trend, with little vertical scatter around the isotonic step function.
Non-metric MDS uses an alternating optimization approach, iterating between an isotonic regression step (update the disparities for fixed coordinates) and a SMACOF step (update the coordinates for fixed disparities).
Complete algorithm:
Input: Dissimilarities δ, target dimension k, convergence tolerance ε
1. Initialize X (e.g., Classical MDS or random)
2. Repeat:
a. Compute embedded distances d_ij(X)
b. Run isotonic regression: δ → d̂ (monotonic fit to d)
c. Update X using SMACOF with d̂ as targets
d. Compute stress σ
e. If stress improvement < ε, stop
3. Return X
Convergence properties: the isotonic regression step is optimal for fixed coordinates, and the SMACOF step is a majorization update for fixed disparities, so stress typically decreases from iteration to iteration. As with Metric MDS, only a local minimum is guaranteed, so multiple random restarts are advisable.
Non-metric MDS can produce degenerate solutions. For example, if all points collapse to a single location, every embedded distance is zero and the unnormalized (raw) stress is trivially zero. The normalization in Stress-1 (dividing by $\sum_{i<j} d_{ij}(X)^2$) guards against this: stress becomes invariant to uniformly shrinking the configuration, so collapsing all points is not rewarded.
```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.isotonic import IsotonicRegression


def nonmetric_stress(D_target, X):
    """
    Compute Kruskal's Stress-1 for non-metric MDS.
    Uses isotonic regression to find optimal disparities.
    """
    n = X.shape[0]
    D_embedded = cdist(X, X, metric='euclidean')

    # Extract upper triangle
    triu_idx = np.triu_indices(n, k=1)
    delta = D_target[triu_idx]    # Original dissimilarities
    d = D_embedded[triu_idx]      # Embedded distances

    # Isotonic regression: find disparities
    order = np.argsort(delta)
    delta_sorted = delta[order]
    d_sorted = d[order]
    iso = IsotonicRegression(out_of_bounds='clip')
    d_hat_sorted = iso.fit_transform(delta_sorted, d_sorted)

    # Map back to original order
    d_hat = np.zeros_like(d)
    d_hat[order] = d_hat_sorted

    # Stress-1 formula
    numerator = np.sum((d - d_hat) ** 2)
    denominator = np.sum(d ** 2)
    if denominator == 0:
        return np.inf
    return np.sqrt(numerator / denominator)


def nonmetric_mds(D_target, n_components=2, init=None, max_iter=300,
                  eps=1e-6, random_state=None, verbose=False):
    """
    Non-metric MDS using alternating isotonic regression and SMACOF.

    Parameters
    ----------
    D_target : (n, n) ndarray      Dissimilarity matrix
    n_components : int             Target dimensionality
    init : ndarray, optional       Initial configuration
    max_iter : int                 Maximum iterations
    eps : float                    Convergence tolerance on stress change
    random_state : int             Random seed
    verbose : bool                 Print progress

    Returns
    -------
    X : (n, n_components) ndarray  Final embedding
    stress : float                 Final stress value
    n_iter : int                   Number of iterations
    """
    rng = np.random.RandomState(random_state)
    n = D_target.shape[0]
    k = n_components

    # Initialize (use Classical MDS)
    if init is None:
        D_sq = D_target ** 2
        row_means = D_sq.mean(axis=1, keepdims=True)
        col_means = D_sq.mean(axis=0, keepdims=True)
        grand_mean = D_sq.mean()
        B = -0.5 * (D_sq - row_means - col_means + grand_mean)
        eigenvalues, eigenvectors = np.linalg.eigh(B)
        idx = np.argsort(eigenvalues)[::-1][:k]
        eigenvalues = np.maximum(eigenvalues[idx], 0)
        eigenvectors = eigenvectors[:, idx]
        X = eigenvectors * np.sqrt(eigenvalues)
    else:
        X = init.copy()

    # Precompute sorted order of dissimilarities
    triu_idx = np.triu_indices(n, k=1)
    delta = D_target[triu_idx]
    delta_order = np.argsort(delta)

    # Isotonic regressor
    iso = IsotonicRegression(out_of_bounds='clip')

    # Weighted Laplacian for SMACOF (uniform weights)
    weights = np.ones((n, n))
    np.fill_diagonal(weights, 0)
    V = np.diag(weights.sum(axis=1)) - weights
    V_pinv = np.linalg.pinv(V)

    stress_prev = np.inf
    for iteration in range(max_iter):
        # Step 1: Compute embedded distances
        D_embedded = cdist(X, X, metric='euclidean')
        d = D_embedded[triu_idx]

        # Step 2: Isotonic regression to get disparities
        d_sorted = d[delta_order]
        d_hat_sorted = iso.fit_transform(np.arange(len(d_sorted)), d_sorted)

        # Map disparities back to matrix form
        d_hat = np.zeros_like(d)
        d_hat[delta_order] = d_hat_sorted
        D_hat = np.zeros((n, n))
        D_hat[triu_idx] = d_hat
        D_hat = D_hat + D_hat.T    # Symmetrize

        # Step 3: SMACOF (Guttman) update with disparities as targets
        with np.errstate(divide='ignore', invalid='ignore'):
            B = -weights * D_hat / D_embedded
            B[~np.isfinite(B)] = 0
        np.fill_diagonal(B, 0)
        np.fill_diagonal(B, -B.sum(axis=1))
        X_new = V_pinv @ B @ X

        # Step 4: Compute stress
        D_new = cdist(X_new, X_new, metric='euclidean')
        d_new = D_new[triu_idx]
        numerator = np.sum((d_new - d_hat) ** 2)
        denominator = np.sum(d_new ** 2)
        stress = np.sqrt(numerator / denominator) if denominator > 0 else np.inf

        if verbose and iteration % 10 == 0:
            print(f"Iteration {iteration}: stress = {stress:.6f}")

        # Check convergence
        if abs(stress_prev - stress) < eps:
            if verbose:
                print(f"Converged at iteration {iteration}")
            X = X_new
            break

        X = X_new
        stress_prev = stress

    return X, stress, iteration + 1


def compare_metric_nonmetric():
    """Compare Metric and Non-metric MDS on ordinal data."""
    np.random.seed(42)

    # Generate 2D data
    n = 100
    t = np.linspace(0, 2 * np.pi, n)
    X_true = np.column_stack([np.cos(t), np.sin(t)])   # Circle

    # True Euclidean distances
    D_true = cdist(X_true, X_true, metric='euclidean')

    # Create ordinal version: apply a monotonic but nonlinear transform.
    # Simulates subjective ratings where small distances are exaggerated.
    D_ordinal = np.log1p(D_true * 10)   # Log transform

    # Metric MDS on ordinal data (wrong assumption)
    from smacof import smacof   # smacof() is assumed available from the Metric MDS / SMACOF page
    X_metric, stress_metric, _ = smacof(D_ordinal, n_components=2, random_state=42)

    # Non-metric MDS on ordinal data (correct assumption)
    X_nonmetric, stress_nonmetric, _ = nonmetric_mds(
        D_ordinal, n_components=2, random_state=42, verbose=True
    )

    print(f"\nMetric MDS stress on ordinal data: {stress_metric:.4f}")
    print(f"Non-metric MDS stress on ordinal data: {stress_nonmetric:.4f}")

    # Evaluate: correlation of distance ranks
    D_metric = cdist(X_metric, X_metric)
    D_nonmetric = cdist(X_nonmetric, X_nonmetric)

    from scipy.stats import spearmanr
    triu = np.triu_indices(n, k=1)
    corr_metric = spearmanr(D_true[triu], D_metric[triu])[0]
    corr_nonmetric = spearmanr(D_true[triu], D_nonmetric[triu])[0]

    print("\nRank correlation with true distances:")
    print(f"  Metric MDS:     {corr_metric:.4f}")
    print(f"  Non-metric MDS: {corr_nonmetric:.4f}")

    return X_metric, X_nonmetric, X_true


if __name__ == "__main__":
    compare_metric_nonmetric()
```

Several stress formulations are used in Non-metric MDS practice:
Kruskal's Stress-1:
$$\sigma_1 = \sqrt{\frac{\sum_{i<j}(d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2}}$$
Normalizes by embedded distances squared. Most common.
Kruskal's Stress-2:
$$\sigma_2 = \sqrt{\frac{\sum_{i<j}(d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j}(d_{ij} - \bar{d})^2}}$$
Normalizes by the spread of the embedded distances (the sum of squared deviations from their mean). Penalizes embeddings where all distances are nearly equal.
Raw stress:
$$\sigma_{raw} = \sum_{i<j}(d_{ij} - \hat{d}_{ij})^2$$
Unnormalized. Not recommended for comparing across datasets.
S-stress (on squared distances):
$$\text{S-stress} = \sqrt{\frac{\sum_{i<j}(d_{ij}^2 - \hat{d}_{ij}^2)^2}{\sum_{i<j} d_{ij}^4}}$$
Works with squared distances. Computationally convenient.
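For concreteness, here is a small sketch (the helper name is an assumption, not from the text) that computes these variants directly from the formulas above, given vectors of embedded distances and disparities with one entry per pair $i<j$:

```python
import numpy as np

def stress_variants(d, d_hat):
    """Compute Stress-1, Stress-2, raw stress, and S-stress for paired
    vectors of embedded distances d and disparities d_hat."""
    d = np.asarray(d, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    sq_err = np.sum((d - d_hat) ** 2)

    stress1 = np.sqrt(sq_err / np.sum(d ** 2))
    stress2 = np.sqrt(sq_err / np.sum((d - d.mean()) ** 2))
    raw = sq_err
    s_stress = np.sqrt(np.sum((d ** 2 - d_hat ** 2) ** 2) / np.sum(d ** 4))
    return {'stress1': stress1, 'stress2': stress2, 'raw': raw, 's_stress': s_stress}
```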
Kruskal's rule of thumb: Stress < 0.05 (excellent), 0.05-0.10 (good), 0.10-0.20 (fair), > 0.20 (poor). However, stress depends on n and k: more points or fewer dimensions → higher baseline stress. Compare relative to your specific context.
| n (points) | k=1 | k=2 | k=3 | k=4 |
|---|---|---|---|---|
| 10 | 0.20 | 0.07 | 0.03 | 0.01 |
| 20 | 0.26 | 0.12 | 0.07 | 0.04 |
| 50 | 0.32 | 0.18 | 0.12 | 0.08 |
| 100 | 0.36 | 0.22 | 0.16 | 0.11 |
| 500 | 0.42 | 0.28 | 0.22 | 0.18 |
Best practices for assessing fit:
Stress elbow plot: Compute stress for $k = 1, 2, 3, \ldots$ Look for the 'elbow' where stress improvements diminish.
Shepard diagram: Plot $\delta_{ij}$ vs $d_{ij}$. A good fit shows a tight, monotonically increasing relationship. Wide scatter indicates poor fit.
Compare to random: Randomize the dissimilarity matrix and compute stress. Your actual stress should be much lower.
Stability analysis: Run multiple times with different initializations. Stable results across runs indicate robust solution.
Procrustes rotation: Compare solutions from different runs by aligning them with Procrustes analysis, as in the sketch below.
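As an illustration of the stability check (a minimal sketch under assumed inputs; the helper names are not from the text), scipy's `procrustes` can align embeddings from different random initializations and report how much they disagree:

```python
import numpy as np
from scipy.spatial import procrustes
from sklearn.manifold import MDS

def nmds_embedding(D, seed):
    # One non-metric MDS run on a precomputed dissimilarity matrix
    mds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
              n_init=1, random_state=seed, normalized_stress='auto')
    return mds.fit_transform(D)

def stability_check(D, seeds=(0, 1, 2)):
    """Align runs from different seeds with Procrustes; small disparity
    values indicate a stable, reproducible configuration."""
    reference = nmds_embedding(D, seeds[0])
    for seed in seeds[1:]:
        _, _, disparity = procrustes(reference, nmds_embedding(D, seed))
        print(f"seed {seed}: Procrustes disparity vs. seed {seeds[0]} = {disparity:.4f}")
```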
scikit-learn provides a robust implementation of both Metric and Non-metric MDS through the MDS class.
```python
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances
import numpy as np
import matplotlib.pyplot as plt


def demonstrate_nonmetric_mds():
    """
    Complete demonstration of Non-metric MDS with sklearn.
    """
    np.random.seed(42)

    # Generate clustered data
    n_per_cluster = 50
    clusters = [
        np.random.randn(n_per_cluster, 10) + np.array([5, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
        np.random.randn(n_per_cluster, 10) + np.array([0, 5, 0, 0, 0, 0, 0, 0, 0, 0]),
        np.random.randn(n_per_cluster, 10) + np.array([0, 0, 5, 0, 0, 0, 0, 0, 0, 0]),
    ]
    X = np.vstack(clusters)
    labels = np.repeat([0, 1, 2], n_per_cluster)

    # Compute distance matrix
    D = pairwise_distances(X, metric='euclidean')

    # Apply nonlinear transformation to simulate ordinal data
    D_ordinal = np.sqrt(D)   # Square root transform

    # Non-metric MDS
    mds_nonmetric = MDS(
        n_components=2,
        metric=False,                  # Non-metric MDS
        dissimilarity='precomputed',
        n_init=4,
        max_iter=300,
        eps=1e-6,
        random_state=42,
        normalized_stress='auto'       # Use 'auto' for sklearn >= 1.4
    )
    X_nmds = mds_nonmetric.fit_transform(D_ordinal)
    stress_nmds = mds_nonmetric.stress_

    # Metric MDS for comparison
    mds_metric = MDS(
        n_components=2,
        metric=True,
        dissimilarity='precomputed',
        n_init=4,
        max_iter=300,
        random_state=42
    )
    X_mmds = mds_metric.fit_transform(D_ordinal)
    stress_mmds = mds_metric.stress_

    print(f"Non-metric MDS stress: {stress_nmds:.4f}")
    print(f"Metric MDS stress: {stress_mmds:.4f}")

    # Visualization
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    for ax, X_embed, title in [
        (axes[0], X_nmds, 'Non-metric MDS'),
        (axes[1], X_mmds, 'Metric MDS')
    ]:
        scatter = ax.scatter(X_embed[:, 0], X_embed[:, 1],
                             c=labels, cmap='viridis', s=50, alpha=0.7)
        ax.set_title(title)
        ax.set_xlabel('Dimension 1')
        ax.set_ylabel('Dimension 2')
    plt.colorbar(scatter, ax=axes[1], label='Cluster')
    plt.tight_layout()
    plt.savefig('nonmetric_vs_metric_mds.png', dpi=150)
    plt.close()

    return X_nmds, X_mmds


def stress_vs_dimensions(D, max_k=6, n_init=4):
    """
    Compute stress for different target dimensions.
    Helps choose appropriate k.
    """
    stresses = []
    for k in range(1, max_k + 1):
        mds = MDS(
            n_components=k,
            metric=False,
            dissimilarity='precomputed',
            n_init=n_init,
            random_state=42
        )
        mds.fit(D)
        stresses.append(mds.stress_)
        print(f"k={k}: stress = {mds.stress_:.4f}")

    # Plot
    plt.figure(figsize=(8, 5))
    plt.plot(range(1, max_k + 1), stresses, 'bo-', linewidth=2, markersize=8)
    plt.xlabel('Number of dimensions (k)')
    plt.ylabel('Stress')
    plt.title('Elbow Plot for Choosing k')
    plt.grid(True, alpha=0.3)
    plt.savefig('stress_elbow_plot.png', dpi=150)
    plt.close()

    return stresses


def create_shepard_diagram(D_original, X_embedded):
    """
    Create Shepard diagram for Non-metric MDS.
    Shows relationship between original dissimilarities and embedded distances.
    """
    from sklearn.isotonic import IsotonicRegression

    D_embedded = pairwise_distances(X_embedded, metric='euclidean')
    n = D_original.shape[0]
    triu_idx = np.triu_indices(n, k=1)
    delta = D_original[triu_idx]
    d = D_embedded[triu_idx]

    # Isotonic regression for disparity line
    order = np.argsort(delta)
    iso = IsotonicRegression()
    d_hat = iso.fit_transform(delta[order], d[order])

    plt.figure(figsize=(8, 6))
    # Scatter plot of all pairs
    plt.scatter(delta, d, alpha=0.3, s=10, label='Distance pairs')
    # Isotonic regression line (disparities)
    plt.plot(delta[order], d_hat, 'r-', linewidth=2, label='Isotonic fit')
    plt.xlabel('Original Dissimilarity (δ)')
    plt.ylabel('Embedded Distance (d)')
    plt.title('Shepard Diagram for Non-metric MDS')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.savefig('shepard_diagram_nmds.png', dpi=150)
    plt.close()


if __name__ == "__main__":
    X_nmds, X_mmds = demonstrate_nonmetric_mds()
```

Non-metric MDS finds applications across many domains where ordinal similarity data is available.
In community ecology, Bray-Curtis dissimilarity is the standard metric for comparing species composition. It's bounded [0,1], non-Euclidean, and sensitive to dominant species. NMDS handles Bray-Curtis naturally, while PCA/Classical MDS would inappropriately treat these as Euclidean. The ecology package 'vegan' in R has contributed to NMDS becoming standard practice.
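As a sketch of this workflow (illustrative only; the species-abundance matrix here is synthetic, not a real dataset), Bray-Curtis dissimilarities computed with scipy can be fed straight into scikit-learn's non-metric MDS:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.RandomState(0)
abundance = rng.poisson(lam=3.0, size=(30, 12))   # 30 sites x 12 species (synthetic counts)

# Bray-Curtis dissimilarity between sites (bounded in [0, 1], non-Euclidean)
D_bc = squareform(pdist(abundance, metric='braycurtis'))

# Non-metric MDS ordination on the precomputed Bray-Curtis matrix
nmds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
           n_init=4, random_state=0, normalized_stress='auto')
ordination = nmds.fit_transform(D_bc)
print("NMDS stress:", round(nmds.stress_, 4))
```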
Example: Color Perception Study
A classic application is Shepard's (1962) study of color perception: observers judged the similarity of pairs of spectral colors, and the ratings were embedded in two dimensions with Non-metric MDS. The points arranged themselves in a circle ordered by wavelength, the familiar color circle.
The color wheel structure emerged purely from behavioral similarity judgments, without any knowledge of wavelength or color theory. NMDS recovered the latent perceptual structure.
Practical tips for these applications: use several random initializations and keep the lowest-stress solution, inspect the Shepard diagram before interpreting the map, and choose the dimensionality with a stress elbow plot rather than defaulting to two dimensions.
Non-metric MDS provides a robust approach for ordinal data where only the ranking of dissimilarities is meaningful.
What's next:
Both Metric and Non-metric MDS optimize stress functions that measure distance discrepancy. The next page dives deep into Stress Functions themselves—their mathematical properties, how different formulations affect results, and how to interpret stress values. Understanding stress is crucial for effective MDS practice.
You now understand Non-metric MDS: when to use it (ordinal data), how it works (isotonic regression + SMACOF), and how it differs from Metric MDS (rank preservation vs value preservation). Next, we examine stress functions in detail.