Loading learning content...
Imagine you're standing on the surface of the Earth and want to measure the distance between two cities—say, New York and Tokyo. You could draw a straight line through the Earth (the Euclidean distance), but that line passes through molten rock and is completely irrelevant for any practical purpose. What matters is the shortest path along the surface—the geodesic distance.\n\nThis seemingly simple observation has profound implications for machine learning. When data lies on a curved manifold embedded in high-dimensional space, Euclidean distances can be catastrophically misleading. Two points that appear close in ambient space might actually be far apart when measured along the manifold's surface. Isomap (Isometric Feature Mapping) was designed to solve precisely this problem by using geodesic distances instead of Euclidean distances.
By the end of this page, you will understand: (1) why Euclidean distances fail on curved manifolds, (2) the mathematical definition of geodesic distances, (3) how geodesic distances relate to intrinsic geometry, (4) the theoretical foundations connecting geodesics to manifold structure, and (5) why preserving geodesic distances enables faithful low-dimensional embeddings.
Euclidean distance is the workhorse of classical machine learning. For two points $\mathbf{x}, \mathbf{y} \in \mathbb{R}^D$, it's defined as:\n\n$$d_E(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|2 = \sqrt{\sum{i=1}^{D}(x_i - y_i)^2}$$\n\nThis metric assumes a flat, Euclidean geometry where the shortest path between two points is always a straight line. While this assumption works well for many datasets, it can be fundamentally wrong when data lies on a curved manifold.
The Swiss Roll Example\n\nConsider the classic Swiss Roll dataset—a 2D manifold embedded in 3D space. Imagine a flat sheet of paper that has been rolled up. Two points that are close in straight-line Euclidean distance might actually be far apart along the surface of the roll. Conversely, points on opposite "layers" of the roll might be close in Euclidean distance but distant along the surface.\n\nThis creates a fundamental problem: if we apply PCA or any method based on Euclidean distances, we will:\n\n1. Conflate points that should be separated — points from different "layers" of the roll appear close\n2. Separate points that should be together — nearby points along the manifold appear distant\n3. Destroy the true structure — the unrolled 2D structure is completely lost\n\nThe Euclidean distance is measuring the extrinsic geometry (the embedding in 3D), not the intrinsic geometry (the true structure of the manifold itself).
Euclidean distances 'short-circuit' through the manifold—they cut through regions that don't belong to the data distribution. This is why standard dimensionality reduction methods like PCA fail on curved manifolds: they're optimizing for the wrong distance metric. The solution is to use distances that respect the manifold's geometry.
A geodesic is a curve on a manifold that represents the shortest path between two points along the manifold's surface. The length of this curve is the geodesic distance. Let's build this concept rigorously.\n\nFormal Definition\n\nLet $\mathcal{M}$ be a smooth manifold embedded in $\mathbb{R}^D$. For two points $\mathbf{x}, \mathbf{y} \in \mathcal{M}$, the geodesic distance is:\n\n$$d_G(\mathbf{x}, \mathbf{y}) = \inf_{\gamma} \int_0^1 \left\| \frac{d\gamma(t)}{dt} \right\| dt$$\n\nwhere the infimum is taken over all smooth curves $\gamma: [0, 1] \to \mathcal{M}$ with $\gamma(0) = \mathbf{x}$ and $\gamma(1) = \mathbf{y}$. This integral computes the arc length of the curve, and the geodesic minimizes this length.
Riemannian Perspective\n\nIn the language of Riemannian geometry, a manifold $\mathcal{M}$ is equipped with a metric tensor $g$ that defines inner products on tangent spaces at each point. The geodesic distance between points is then:\n\n$$d_G(\mathbf{x}, \mathbf{y}) = \inf_{\gamma} \int_0^1 \sqrt{g_{\gamma(t)}\left(\dot{\gamma}(t), \dot{\gamma}(t)\right)} \, dt$$\n\nwhere $\dot{\gamma}(t) = \frac{d\gamma(t)}{dt}$ is the velocity vector along the curve, and $g_{\gamma(t)}(\cdot, \cdot)$ is the inner product defined by the metric tensor at point $\gamma(t)$.\n\nFor a manifold embedded in Euclidean space with the induced metric (the restriction of the Euclidean metric to the tangent spaces of the manifold), this reduces to the arc-length integral above.
Geodesic distance is an intrinsic property of the manifold—it depends only on the manifold's internal geometry, not on how it's embedded in ambient space. A key theorem in differential geometry (Nash's embedding theorem) states that any smooth manifold can be isometrically embedded in some Euclidean space, but the embedding is not unique. Geodesic distances, however, are invariant to the choice of embedding.
Properties of Geodesic Distances\n\nGeodesic distance satisfies the axioms of a metric:\n\n1. Non-negativity: $d_G(\mathbf{x}, \mathbf{y}) \geq 0$, with equality iff $\mathbf{x} = \mathbf{y}$\n2. Symmetry: $d_G(\mathbf{x}, \mathbf{y}) = d_G(\mathbf{y}, \mathbf{x})$\n3. Triangle inequality: $d_G(\mathbf{x}, \mathbf{z}) \leq d_G(\mathbf{x}, \mathbf{y}) + d_G(\mathbf{y}, \mathbf{z})$\n\nImportantly, geodesic distance provides an upper bound on Euclidean distance:\n\n$$d_E(\mathbf{x}, \mathbf{y}) \leq d_G(\mathbf{x}, \mathbf{y})$$\n\nThe equality holds only when the geodesic between $\mathbf{x}$ and $\mathbf{y}$ is a straight line in the ambient space—which happens only on flat regions of the manifold. On curved regions, the geodesic must bend to stay on the manifold, making it longer than the straight-line distance.
To develop intuition for geodesics, let's examine them on several familiar manifolds. These examples illuminate how curvature affects the relationship between Euclidean and geodesic distances.
| Manifold | Geodesic | Geodesic Distance | Key Insight |
|---|---|---|---|
| $\mathbb{R}^n$ (Euclidean space) | Straight line | $\|\mathbf{x} - \mathbf{y}\|$ | Geodesic = Euclidean (flat manifold) |
| Sphere $S^2$ (radius $r$) | Great circle arc | $r \cdot \arccos\left(\frac{\mathbf{x} \cdot \mathbf{y}}{r^2}\right)$ | Geodesic > Euclidean for distant points |
| Cylinder $S^1 \times \mathbb{R}$ | Helix segment | Depends on wrapping | Can 'wrap around' the cylinder |
| Torus $T^2$ | Complex curves | No closed form | Multiple geodesics may exist between points |
| Swiss Roll | Straight line when unrolled | Distance on unrolled sheet | Captures 2D structure, not 3D embedding |
The Sphere Example in Detail\n\nOn a sphere of radius $r$, the geodesic between any two points is an arc of a great circle—a circle whose center coincides with the center of the sphere. Familiar examples include the equator and any circle of longitude.\n\nFor two points $\mathbf{x}, \mathbf{y}$ on the unit sphere, the geodesic distance is:\n\n$$d_G(\mathbf{x}, \mathbf{y}) = \arccos(\mathbf{x} \cdot \mathbf{y})$$\n\nThe Euclidean (chord) distance is:\n\n$$d_E(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\| = \sqrt{2(1 - \mathbf{x} \cdot \mathbf{y})}$$\n\nFor antipodal points (diametrically opposite), the geodesic distance is $\pi$ (half the circumference), while the Euclidean distance is 2 (the diameter). The ratio $d_G / d_E = \pi / 2 \approx 1.57$.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354
import numpy as np def sphere_geodesic_distance(x, y): """ Compute geodesic distance on unit sphere. Parameters: x, y: Points on unit sphere (3D unit vectors) Returns: Geodesic distance (arc length) """ # Ensure numerical stability for unit vectors dot_product = np.clip(np.dot(x, y), -1.0, 1.0) return np.arccos(dot_product) def sphere_euclidean_distance(x, y): """Compute Euclidean (chord) distance.""" return np.linalg.norm(x - y) # Example: Two points on unit spherenp.random.seed(42) # Generate random points on unit spheredef random_sphere_point(): v = np.random.randn(3) return v / np.linalg.norm(v) p1 = random_sphere_point()p2 = random_sphere_point() geo_dist = sphere_geodesic_distance(p1, p2)euc_dist = sphere_euclidean_distance(p1, p2) print(f"Geodesic distance: {geo_dist:.4f}")print(f"Euclidean distance: {euc_dist:.4f}")print(f"Ratio (geodesic/euclidean): {geo_dist/euc_dist:.4f}")print(f"Difference: {geo_dist - euc_dist:.4f}") # Demonstrate distortion grows with distanceprint("\nDistortion analysis:")print("Angle (deg) | Euclidean | Geodesic | Distortion")print("-" * 50)for angle in [10, 30, 60, 90, 120, 150, 180]: rad = np.radians(angle) # Create two points with given angular separation p1 = np.array([1, 0, 0]) p2 = np.array([np.cos(rad), np.sin(rad), 0]) geo = sphere_geodesic_distance(p1, p2) euc = sphere_euclidean_distance(p1, p2) distortion = geo / euc if euc > 0 else 1.0 print(f"{angle:^11} | {euc:^9.4f} | {geo:^8.4f} | {distortion:^10.4f}")The discrepancy between Euclidean and geodesic distances increases with both (1) the curvature of the manifold and (2) the distance between points. On highly curved manifolds, or for distant points, Euclidean distances can be arbitrarily misleading. This is why Isomap's use of geodesic distances is so powerful for nonlinear dimensionality reduction.
The key insight of Isomap is that if we can preserve geodesic distances in a low-dimensional embedding, we capture the intrinsic geometry of the manifold. This idea is grounded in the concept of isometric embeddings.\n\nDefinition: Isometric Embedding\n\nAn embedding $f: \mathcal{M} \to \mathbb{R}^d$ is isometric if it preserves distances:\n\n$$d_{\mathbb{R}^d}(f(\mathbf{x}), f(\mathbf{y})) = d_G(\mathbf{x}, \mathbf{y}) \quad \text{for all } \mathbf{x}, \mathbf{y} \in \mathcal{M}$$\n\nIn words, the Euclidean distance between embedded points equals the geodesic distance between original points. An isometric embedding is a "perfect unfolding" of the manifold—no stretching, no compression, no distortion.
The Isometry Theorem for Manifolds\n\nA fundamental result in differential geometry states that a $d$-dimensional Riemannian manifold with zero curvature (a flat manifold) can be locally isometrically embedded in $\mathbb{R}^d$. Examples include:\n\n- A plane (trivially embeds in $\mathbb{R}^2$)\n- A cylinder (unrolls to a plane; same as $\mathbb{R}^2$)\n- The Swiss Roll (is intrinsically flat; can be unrolled to $\mathbb{R}^2$)\n\nThis is profound: the Swiss Roll, despite being a curved surface in 3D, has the same intrinsic geometry as a flat rectangle. If we measure distances correctly (geodesically), we can discover this hidden flatness and embed it in 2D without distortion.
A cylinder is curved 'extrinsically' (it curves in 3D space) but flat 'intrinsically' (a bug walking on it can't tell it's curved without leaving the surface). The Swiss Roll is the same—it's just a flat sheet that's been rolled up. Isomap exploits this by using geodesic distances, which capture only the intrinsic geometry, to discover the hidden flat structure.
What About Curved Manifolds?\n\nManifolds with non-zero intrinsic curvature (like spheres) cannot be isometrically embedded in a Euclidean space of the same dimension. A 2D sphere cannot be flattened into 2D without distortion—this is why all map projections of Earth introduce some distortion.\n\nFor such manifolds, Isomap produces an embedding that approximately preserves geodesic distances, minimizing distortion in a least-squares sense. The quality of the approximation depends on the manifold's curvature and the target dimension.\n\nThe Isomap Objective\n\nIsomap seeks an embedding $\mathbf{Y} = \{\mathbf{y}1, \ldots, \mathbf{y}N\} \subset \mathbb{R}^d$ that minimizes:\n\n$$\min{\mathbf{Y}} \sum{i < j} \left( d_G(\mathbf{x}_i, \mathbf{x}_j) - \|\mathbf{y}_i - \mathbf{y}_j\| \right)^2$$\n\nThis is exactly the objective of classical Multidimensional Scaling (MDS), but with geodesic distances instead of Euclidean distances. Isomap's innovation is computing geodesic distances from local Euclidean distances via shortest-path algorithms.
We've established that geodesic distances are the right concept for manifold learning. But there's a fundamental practical problem: we don't know the true manifold.\n\nGiven only a finite sample $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ of points from an unknown manifold $\mathcal{M}$, how can we estimate geodesic distances?\n\nThe key insight is that for nearby points, geodesic distance is well-approximated by Euclidean distance. If two points are close enough that the manifold looks approximately flat between them, the shortest path along the manifold is nearly a straight line.
Local Flatness Assumption\n\nFormally, for a smooth manifold, the geodesic distance satisfies:\n\n$$d_G(\mathbf{x}, \mathbf{y}) = d_E(\mathbf{x}, \mathbf{y}) + O(\|\mathbf{x} - \mathbf{y}\|^3)$$\n\nas $\|\mathbf{x} - \mathbf{y}\| \to 0$. The error term scales with the cube of the distance, so for nearby points, the approximation is excellent.\n\nThis leads to Isomap's two-stage strategy:\n\n1. For nearby points: Approximate geodesic distance with Euclidean distance\n2. For distant points: Compute geodesic distance as the sum of local approximations along a shortest path\n\nThe second step uses the triangle inequality property of distances: the shortest path from $\mathbf{x}$ to $\mathbf{y}$ can be approximated by chaining together shortest paths through intermediate points.
Isomap converts the intractable problem of computing geodesics on an unknown manifold into a tractable graph problem: find shortest paths in a neighborhood graph where edge weights are local Euclidean distances. This brilliant reduction is why Isomap works—it approximates continuous geodesics with discrete shortest paths.
The theoretical foundation of Isomap rests on asymptotic guarantees about when the graph-based approximation converges to true geodesic distances. Understanding these guarantees helps us know when to trust Isomap's results.
Convergence Theorem (Informal)\n\nUnder suitable conditions on the manifold and sampling, as the number of sample points $N \to \infty$ and the neighborhood size decreases appropriately, Isomap's estimated geodesic distances converge to true geodesic distances:\n\n$$\hat{d}_G(\mathbf{x}_i, \mathbf{x}_j) \xrightarrow{N \to \infty} d_G(\mathbf{x}_i, \mathbf{x}_j)$$\n\nRequired Conditions:\n\n1. Manifold regularity: The manifold $\mathcal{M}$ should be a compact, smooth submanifold without boundary (or with geodesically convex boundary)\n\n2. Sufficient curvature bound: The curvature of $\mathcal{M}$ should be bounded, ensuring local Euclidean approximations are valid\n\n3. Uniform sampling: Points should be sampled uniformly (or nearly uniformly) from $\mathcal{M}$\n\n4. Connectivity: The neighborhood graph must be connected\n\n5. Appropriate neighborhood scaling: The neighborhood radius must decrease slowly enough to maintain connectivity but fast enough to exploit local flatness
| Paper | Condition | Convergence Rate | Key Insight |
|---|---|---|---|
| Tenenbaum et al., 2000 | Convexity, regularity | Asymptotic | Original Isomap paper; established basic consistency |
| Bernstein et al., 2000 | Convexity radius | $O(N^{-1/d})$ | First explicit convergence rate |
| Zha & Zhang, 2005 | Reach condition | $O(\epsilon^2)$ | Relates error to neighborhood size |
| Arias-Castro & Pelletier, 2019 | General manifolds | Optimal rates | Minimax-optimal for curve estimation |
A critical theoretical requirement is that geodesics between any two points lie entirely within the manifold. Manifolds with 'holes' or strong concavities can violate this, causing Isomap to find paths that cut across regions not in the manifold. This is the 'short-circuit' problem, which we'll discuss in detail when covering Isomap's limitations.
The Reach Condition\n\nA key concept in manifold learning theory is the reach of a manifold. Intuitively, the reach is the largest radius $r$ such that any point within distance $r$ of the manifold has a unique nearest point on the manifold. For embeddings:\n\n- A manifold with large reach is "well-separated" from itself—no two non-adjacent parts come too close\n- The reach bounds how tightly the manifold can curve\n- Isomap's accuracy depends on choosing neighborhoods smaller than the reach\n\n$$\epsilon < \frac{\text{reach}(\mathcal{M})}{2}$$\n\nensures local Euclidean distances are faithful approximations to geodesics.
To appreciate why geodesic distances are special for manifold learning, let's compare them to other distance metrics used in machine learning.
| Metric | Used By | Preserves | Limitation |
|---|---|---|---|
| Euclidean | PCA, MDS | Global linear structure | Fails on curves/nonlinearity |
| Geodesic | Isomap | Global manifold structure | Requires manifold assumptions |
| Local Euclidean | LLE, Laplacian Eigenmaps | Local neighborhoods | May distort global structure |
| Diffusion | Diffusion Maps | Density-aware connectivity | Sensitive to bandwidth choice |
| t-SNE/UMAP affinities | t-SNE, UMAP | Local-to-global balance | Stochastic, not true distance |
Isomap occupies a unique position: it's the only major manifold learning method that attempts to preserve a true metric (geodesic distances) globally. Methods like LLE and Laplacian Eigenmaps preserve only local structure, while t-SNE and UMAP use sophisticated non-metric affinities. Isomap's metric-preservation property makes it particularly suitable for tasks requiring geometric fidelity.
We've established the mathematical foundation for understanding geodesic distances and their central role in Isomap. Let's consolidate the key insights:
The next page covers Graph Construction—the critical step where we build the neighborhood graph that enables geodesic approximation. We'll examine k-NN vs. ε-ball neighborhoods, the tradeoffs in choosing connectivity, and how to handle disconnected graphs. This is where the theory meets implementation.