In the previous page, we established that geodesic distances—the shortest paths along a manifold—are the right metric for manifold learning. But we face a fundamental challenge: we don't have the manifold itself, only a finite sample of points from it.

The brilliant insight of Isomap is to approximate the continuous manifold with a discrete neighborhood graph. Each data point becomes a node, and edges connect nearby points. The geodesic distance between distant points is then approximated by the shortest path through this graph—a sum of local Euclidean distances.

This page covers the critical question: how do we decide which points are 'nearby'? The answer determines whether Isomap succeeds or fails, and getting it right requires understanding the tradeoffs between different neighborhood definitions.
By the end of this page, you will understand: (1) the two main approaches to defining neighborhoods (k-NN and ε-ball), (2) how to choose neighborhood size to balance local accuracy against global connectivity, (3) how to handle disconnected graphs, (4) computational considerations for large datasets, and (5) the theoretical requirements for faithful geodesic approximation.
The neighborhood graph $G = (V, E, W)$ is a weighted graph where:

- Vertices $V$: Each data point $\mathbf{x}_i$ becomes a vertex
- Edges $E$: Connect "nearby" points (definition to be determined)
- Weights $W$: Edge weights are Euclidean distances: $w_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|$

The key insight is that locally, on a smooth manifold, Euclidean distance approximates geodesic distance. So edges between nearby points have weights that are good approximations of the geodesic distance between those points.

For distant points not directly connected, we compute the geodesic approximation as the shortest path through the graph. This chains together local Euclidean approximations to estimate the global geodesic.
The Path Approximation

For two points $\mathbf{x}_i$ and $\mathbf{x}_j$ not directly connected by an edge, the geodesic distance is approximated by:

$$\hat{d}_G(\mathbf{x}_i, \mathbf{x}_j) = \min_{\text{paths } P: i \to j} \sum_{(u,v) \in P} w_{uv}$$

This is precisely the shortest path problem in weighted graphs, solvable by algorithms like Dijkstra's or Floyd-Warshall. The quality of this approximation depends critically on how we define "nearby"—the neighborhood criterion.
The approximation works because smooth manifolds look 'locally flat.' If we choose neighborhoods small enough, local Euclidean distances are nearly equal to local geodesic distances. Chaining these local approximations via shortest paths reconstructs the global geodesic structure. The key is balancing neighborhood size: too small and the graph disconnects; too large and we 'short-circuit' across the manifold.
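As a concrete illustration, here is a minimal sketch of how the shortest-path approximation can be computed with SciPy, assuming a sparse, symmetric adjacency matrix of local Euclidean edge weights in the format built later on this page:

```python
from scipy.sparse.csgraph import shortest_path


def approximate_geodesics(adjacency):
    """Approximate all pairwise geodesic distances by shortest paths
    through a sparse, symmetric neighborhood graph (sketch)."""
    # method='D' runs Dijkstra from every node, which suits sparse graphs;
    # pairs with no connecting path come back as np.inf
    return shortest_path(adjacency, method='D', directed=False)
```

Floyd-Warshall (`method='FW'`) is the dense alternative; the tradeoffs between the two are revisited under computational considerations below.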
The first approach to defining neighborhoods is k-Nearest Neighbors (k-NN): connect each point to its $k$ closest points in Euclidean distance.

Definition

For each point $\mathbf{x}_i$, let $\mathcal{N}_k(i)$ be the set of indices of its $k$ nearest neighbors (excluding itself). The k-NN neighborhood graph has an edge between $i$ and $j$ if:

$$j \in \mathcal{N}_k(i) \text{ or } i \in \mathcal{N}_k(j)$$

Note that we typically symmetrize the neighborhood: if $i$ considers $j$ a neighbor, then there's an edge between them regardless of whether $j$ considers $i$ a neighbor. This ensures the graph is undirected.
```python
import numpy as np
from scipy.sparse import lil_matrix
from sklearn.neighbors import NearestNeighbors


def build_knn_graph(X, k, metric='euclidean'):
    """
    Build a symmetrized k-NN neighborhood graph.

    Parameters:
        X: (N, D) array of data points
        k: Number of nearest neighbors
        metric: Distance metric (default: 'euclidean')

    Returns:
        adjacency: (N, N) sparse adjacency matrix with edge weights
    """
    N = X.shape[0]

    # Find k nearest neighbors for each point (+1 so the self-match can be dropped)
    nn = NearestNeighbors(n_neighbors=k + 1, metric=metric)
    nn.fit(X)
    distances, indices = nn.kneighbors(X)

    # Build sparse adjacency matrix
    adjacency = lil_matrix((N, N), dtype=np.float64)
    for i in range(N):
        for j_idx in range(1, k + 1):  # Skip index 0 (self)
            j = indices[i, j_idx]
            dist = distances[i, j_idx]
            # Symmetrize: add edge in both directions
            adjacency[i, j] = dist
            adjacency[j, i] = dist

    return adjacency.tocsr()


def verify_connectivity(adjacency):
    """Check if the graph is connected using BFS from node 0."""
    N = adjacency.shape[0]
    visited = np.zeros(N, dtype=bool)
    queue = [0]
    visited[0] = True

    while queue:
        current = queue.pop(0)
        neighbors = adjacency[current].nonzero()[1]
        for neighbor in neighbors:
            if not visited[neighbor]:
                visited[neighbor] = True
                queue.append(neighbor)

    n_components = 1 if visited.all() else count_components(adjacency)
    return visited.all(), n_components


def count_components(adjacency):
    """Count connected components in the graph via repeated BFS."""
    N = adjacency.shape[0]
    visited = np.zeros(N, dtype=bool)
    components = 0

    for start in range(N):
        if not visited[start]:
            components += 1
            queue = [start]
            visited[start] = True
            while queue:
                current = queue.pop(0)
                neighbors = adjacency[current].nonzero()[1]
                for neighbor in neighbors:
                    if not visited[neighbor]:
                        visited[neighbor] = True
                        queue.append(neighbor)

    return components
```

The second approach is ε-ball neighborhoods: connect all pairs of points within distance $\epsilon$.

Definition

The ε-ball neighborhood graph has an edge between $i$ and $j$ if:

$$\|\mathbf{x}_i - \mathbf{x}_j\| \leq \epsilon$$

with edge weight $w_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|$. This is inherently symmetric and creates a more uniform local structure, but the number of neighbors varies with local density.
```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import lil_matrix


def build_epsilon_graph(X, epsilon, metric='euclidean'):
    """
    Build an ε-ball neighborhood graph.

    Parameters:
        X: (N, D) array of data points
        epsilon: Neighborhood radius
        metric: Distance metric (default: 'euclidean')

    Returns:
        adjacency: (N, N) sparse adjacency matrix with edge weights
    """
    N = X.shape[0]

    # Compute pairwise distances
    # For large N, use batch processing or approximate methods
    distances = cdist(X, X, metric=metric)

    # Build sparse adjacency matrix
    adjacency = lil_matrix((N, N), dtype=np.float64)
    for i in range(N):
        for j in range(i + 1, N):
            if distances[i, j] <= epsilon:
                adjacency[i, j] = distances[i, j]
                adjacency[j, i] = distances[i, j]

    return adjacency.tocsr()


def estimate_epsilon_from_percentile(X, percentile=5, metric='euclidean'):
    """
    Estimate a reasonable epsilon from the distance distribution.

    Uses a percentile of all pairwise distances as epsilon. This ensures
    a certain fraction of pairs are connected.

    Parameters:
        X: (N, D) array of data points
        percentile: Percentile of the distance distribution
        metric: Distance metric

    Returns:
        epsilon: Estimated neighborhood radius
    """
    # For large datasets, estimate from a random sample
    N = X.shape[0]
    if N > 5000:
        sample_idx = np.random.choice(N, 5000, replace=False)
        X_sample = X[sample_idx]
    else:
        X_sample = X

    distances = cdist(X_sample, X_sample, metric=metric)
    # Exclude diagonal (self-distances)
    off_diag = distances[np.triu_indices_from(distances, k=1)]

    return np.percentile(off_diag, percentile)


def adaptive_epsilon_estimation(X, target_neighbors=10, metric='euclidean'):
    """
    Estimate an epsilon that yields approximately target_neighbors per point.

    Uses binary search to find the epsilon achieving the desired average degree.

    Parameters:
        X: (N, D) array of data points
        target_neighbors: Desired average number of neighbors
        metric: Distance metric

    Returns:
        epsilon: Estimated neighborhood radius
    """
    distances = cdist(X, X, metric=metric)

    # Binary search for epsilon
    eps_low = 0.0
    eps_high = distances.max()

    for _ in range(50):  # Binary search iterations
        eps_mid = (eps_low + eps_high) / 2
        avg_neighbors = np.mean(np.sum(distances <= eps_mid, axis=1)) - 1  # Exclude self

        if avg_neighbors < target_neighbors:
            eps_low = eps_mid
        else:
            eps_high = eps_mid

        if abs(avg_neighbors - target_neighbors) < 0.5:
            break

    return eps_mid
```

Both k-NN and ε-ball have their merits, and the choice depends on your data characteristics and computational constraints. Here's a decision framework:
| Scenario | Recommended | Reasoning |
|---|---|---|
| Uniform sampling density | Either | Both work well; ε-ball is slightly more theoretically grounded |
| Varying density | k-NN | Adapts to local density automatically |
| Tight theoretical guarantees needed | ε-ball | Convergence theorems typically use ε |
| Unknown data scale | k-NN | Scale-invariant parameter |
| Strong prior on manifold smoothness | ε-ball | Can set ε based on expected curvature |
| Risk of disconnected components | k-NN | Guarantees minimum degree |
| Computational efficiency paramount | k-NN | Sparse graphs with predictable edge count |
In practice, k-NN is more commonly used due to its robustness to density variations and its scale-invariant parameterization. Start with k ≈ 10-20 for manifolds of moderate intrinsic dimensionality, then adjust based on connectivity and embedding quality. The scikit-learn implementation of Isomap defaults to k-NN with k=5.
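As a rough illustration of tuning k, the following sketch fits scikit-learn's Isomap for several neighborhood sizes and prints its built-in reconstruction error. The Swiss roll setup and the specific k values are illustrative assumptions, not part of the original text:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Illustrative data; in practice X is your own (N, D) array
X, _ = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

for k in (5, 10, 15, 20):
    iso = Isomap(n_neighbors=k, n_components=2).fit(X)
    # Lower reconstruction error is generally better, but also watch for
    # disconnection warnings and qualitative changes in the embedding
    print(f"k={k}: reconstruction_error={iso.reconstruction_error():.4f}")
```

The error alone does not settle the choice; it should be read alongside the connectivity and short-circuit diagnostics discussed next.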
The Connectivity-Accuracy Tradeoff

The neighborhood parameter (k or ε) controls a fundamental tradeoff:

- Too small (sparse graph):
  - Graph may disconnect into multiple components
  - Isomap cannot compute distances between components
  - Embedding loses global structure
- Too large (dense graph):
  - Edges span large distances, violating the local flatness assumption
  - "Short-circuit" edges connect across the manifold
  - Geodesic estimates become inaccurate
  - Computational cost increases

The sweet spot is the smallest neighborhood that keeps the graph connected while respecting the manifold's curvature.
A critical issue in Isomap is that the neighborhood graph must be connected. If the graph has multiple connected components, there's no path between points in different components, and geodesic distances cannot be computed.

This can happen when:

1. Neighborhood parameter is too small: not enough edges to bridge gaps
2. Data has isolated clusters: true clusters with large gaps between them
3. Outliers exist: isolated points far from the main manifold
4. Manifold has holes: different regions of the manifold are genuinely disconnected

Several strategies can address this:
```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import connected_components

# Relies on build_knn_graph from the k-NN construction code above.


def ensure_connected_graph(X, initial_k=5, max_k=50):
    """
    Build a k-NN graph with the minimum k that ensures connectivity.

    Uses binary search to find the smallest k yielding a connected graph
    (valid because increasing k only ever adds edges).

    Parameters:
        X: (N, D) array of data points
        initial_k: Starting value of k
        max_k: Maximum k to try

    Returns:
        adjacency: Sparse adjacency matrix
        k_final: The k value used
        mask: Boolean mask of points to keep (all True unless we fall back
              to the largest component)
    """
    k_low, k_high = initial_k, max_k
    k_final = max_k
    best_adjacency = None

    while k_low <= k_high:
        k_mid = (k_low + k_high) // 2
        adjacency = build_knn_graph(X, k_mid)
        n_components, labels = connected_components(
            csgraph=adjacency, directed=False, return_labels=True
        )

        if n_components == 1:
            k_final = k_mid
            best_adjacency = adjacency
            k_high = k_mid - 1
        else:
            k_low = k_mid + 1

    if best_adjacency is None:
        # Even max_k doesn't connect—fall back to the largest component
        adjacency = build_knn_graph(X, max_k)
        n_components, labels = connected_components(
            adjacency, directed=False, return_labels=True
        )
        component_sizes = np.bincount(labels)
        largest_component = np.argmax(component_sizes)
        mask = labels == largest_component
        print(f"Warning: Graph disconnected with k={max_k}. "
              f"Using largest component ({component_sizes[largest_component]}/{X.shape[0]} points).")
        return adjacency, max_k, mask

    return best_adjacency, k_final, np.ones(X.shape[0], dtype=bool)


def adaptive_connectivity_repair(adjacency, X, max_bridges=10):
    """
    Repair a disconnected graph by adding minimal bridging edges.

    Connects components by adding edges between their closest points.

    Parameters:
        adjacency: Sparse adjacency matrix (possibly disconnected)
        X: Original data points
        max_bridges: Maximum number of bridging edges to add

    Returns:
        adjacency: Repaired adjacency matrix
    """
    n_components, labels = connected_components(
        adjacency, directed=False, return_labels=True
    )
    if n_components == 1:
        return adjacency  # Already connected

    adjacency = adjacency.tolil()  # Convert to LIL for efficient modification
    bridges_added = 0

    while n_components > 1 and bridges_added < max_bridges:
        # Find the closest pair of points lying in different components
        min_dist = np.inf
        bridge = None
        for c1 in range(n_components):
            for c2 in range(c1 + 1, n_components):
                idx1 = np.where(labels == c1)[0]
                idx2 = np.where(labels == c2)[0]
                # Compute cross-component distances
                dists = cdist(X[idx1], X[idx2])
                i_local, j_local = np.unravel_index(np.argmin(dists), dists.shape)
                if dists[i_local, j_local] < min_dist:
                    min_dist = dists[i_local, j_local]
                    bridge = (idx1[i_local], idx2[j_local])

        # Add the bridging edge
        i, j = bridge
        adjacency[i, j] = min_dist
        adjacency[j, i] = min_dist
        bridges_added += 1

        # Recompute components
        n_components, labels = connected_components(
            adjacency.tocsr(), directed=False, return_labels=True
        )

    return adjacency.tocsr()
```

Adding bridging edges to connect components is a heuristic that modifies the estimated geodesic distances. If the components represent genuinely different manifold regions, the bridge creates a 'wormhole' that doesn't correspond to any path on the true manifold. Use bridging sparingly and only when you believe disconnection is due to sampling limitations, not true manifold structure.
Graph construction is often the first computational bottleneck in Isomap. Understanding the complexity helps in choosing efficient implementations.
| Operation | Naïve Complexity | Optimized Complexity | Method |
|---|---|---|---|
| Pairwise distances | $O(N^2 D)$ | $O(N^2 D)$ | Required for brute-force |
| k-NN search (brute force) | $O(N^2 D)$ | — | Compute all distances, sort |
| k-NN search (KD-tree) | $O(N^2 D)$ | $O(N \log N \cdot D)$ (low D) | Partition space recursively |
| k-NN search (Ball tree) | $O(N^2 D)$ | $O(N \log N \cdot D)$ | Metric tree; works in high D |
| k-NN search (ANN) | $O(N^2 D)$ | $O(N \log N)$ + error | LSH, HNSW, etc. |
| ε-ball search | $O(N^2 D)$ | $O(N \log N)$ (low D) | Range trees or spatial hashing |
| Graph storage | $O(N^2)$ dense | $O(Nk)$ sparse | Sparse matrix format |
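To get a feel for these differences, here is a small timing sketch using scikit-learn's exact backends. The dataset size and dimensionality are hypothetical; absolute numbers depend entirely on hardware and D:

```python
import time

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))  # low ambient dimension favors KD-trees

for algorithm in ("brute", "kd_tree", "ball_tree"):
    nn = NearestNeighbors(n_neighbors=10, algorithm=algorithm).fit(X)
    t0 = time.perf_counter()
    nn.kneighbors(X)  # query all training points
    print(f"{algorithm}: {time.perf_counter() - t0:.3f}s")
```

As D grows, the tree-based methods lose their advantage and approximate methods become the practical choice.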
Approximate Nearest Neighbors (ANN)

For large datasets (N > 10,000), exact k-NN computation becomes prohibitively expensive. Approximate nearest neighbor algorithms trade exactness for speed:

- Locality-Sensitive Hashing (LSH): Hashes similar points to the same bucket with high probability. Sublinear query time on average, but can miss some neighbors.
- Hierarchical Navigable Small World (HNSW): Builds a multi-layer graph where each layer is a sparse navigable graph. Extremely fast queries ($O(\log N)$) with high recall.
- FAISS (Facebook AI Similarity Search): GPU-accelerated ANN with various index types. Can handle billions of vectors.

For Isomap, using ANN introduces some error in edge weights, but the impact is usually small if the approximation is good (e.g., HNSW with a high ef parameter).
```python
import numpy as np
from scipy.sparse import lil_matrix


def build_knn_graph_sklearn(X, k, algorithm='auto'):
    """
    Build a k-NN graph using scikit-learn's optimized implementation.

    Parameters:
        X: (N, D) array of data points
        k: Number of nearest neighbors
        algorithm: 'auto', 'ball_tree', 'kd_tree', or 'brute'

    Returns:
        adjacency: Sparse CSR adjacency matrix
    """
    from sklearn.neighbors import NearestNeighbors

    nn = NearestNeighbors(n_neighbors=k, algorithm=algorithm)
    nn.fit(X)
    # mode='distance' gives edge weights as distances; querying with the
    # default X=None excludes each training point from its own neighbor list
    adjacency = nn.kneighbors_graph(mode='distance')

    # Symmetrize without double-counting mutual edges:
    # take the elementwise maximum of the matrix and its transpose
    adjacency = adjacency.maximum(adjacency.T)

    return adjacency.tocsr()


def build_knn_graph_annoy(X, k, n_trees=50):
    """
    Build an approximate k-NN graph using Annoy.

    Parameters:
        X: (N, D) array of data points
        k: Number of nearest neighbors
        n_trees: Number of trees in the forest (higher = more accurate)

    Returns:
        adjacency: Sparse CSR adjacency matrix
    """
    from annoy import AnnoyIndex

    N, D = X.shape

    # Build the Annoy index
    index = AnnoyIndex(D, metric='euclidean')
    for i in range(N):
        index.add_item(i, X[i])
    index.build(n_trees)

    # Query for neighbors and symmetrize as we go
    adjacency = lil_matrix((N, N), dtype=np.float64)
    for i in range(N):
        neighbors, distances = index.get_nns_by_item(i, k + 1, include_distances=True)
        for j, d in zip(neighbors[1:], distances[1:]):  # Skip self (first hit)
            adjacency[i, j] = d
            adjacency[j, i] = d

    return adjacency.tocsr()


def build_knn_graph_pynndescent(X, k):
    """
    Build an approximate k-NN graph using PyNNDescent (HNSW-like).

    This is one of the fastest approximate methods available.

    Parameters:
        X: (N, D) array of data points
        k: Number of nearest neighbors

    Returns:
        adjacency: Sparse CSR adjacency matrix
    """
    from pynndescent import NNDescent

    N = X.shape[0]

    # Build the approximate k-NN index and query the training points;
    # the first returned neighbor of each point is (almost always) itself
    index = NNDescent(X, n_neighbors=k + 1, metric='euclidean')
    indices, distances = index.query(X, k=k + 1)

    # Build the sparse, symmetrized adjacency matrix
    adjacency = lil_matrix((N, N), dtype=np.float64)
    for i in range(N):
        for j_idx in range(1, k + 1):  # Skip self
            j = indices[i, j_idx]
            d = distances[i, j_idx]
            adjacency[i, j] = d
            adjacency[j, i] = d

    return adjacency.tocsr()
```

- For N < 10,000: use scikit-learn's exact methods (ball_tree or kd_tree).
- For N < 100,000: use PyNNDescent or Annoy for approximate k-NN.
- For N > 100,000: consider Landmark Isomap (next page) or FAISS for GPU-accelerated search.

The graph construction step should typically be much faster than the subsequent shortest-path computation.
For the graph-based geodesic approximation to converge to true geodesic distances, several theoretical conditions must hold. Understanding these helps diagnose when Isomap might fail.
Condition 1: Sufficient Sampling Density

The sampling density must be high enough that local neighborhoods faithfully represent the manifold. Formally, if $\rho$ is the maximum distance from any point on the manifold to its nearest sample point (the covering radius of the sample), we need:

$$\rho \ll \text{reach}(\mathcal{M})$$

where the reach is the distance from the manifold to its "medial axis" (the set of points with multiple closest points on the manifold). This ensures samples capture the manifold's shape without aliasing.

Condition 2: Neighborhood Size Bounds

The neighborhood parameter must satisfy:

$$\rho \ll \epsilon \ll \text{reach}(\mathcal{M})$$

The lower bound ensures connectivity (neighborhoods are larger than sampling gaps). The upper bound ensures local flatness (neighborhoods don't span too much curvature).
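These two conditions underpin the standard convergence analysis of graph geodesics (Bernstein, de Silva, Langford, and Tenenbaum, 2000). Stated roughly, and omitting the exact constants, the guarantee has the form:

$$(1 - \lambda_1)\, d_{\mathcal{M}}(\mathbf{x}_i, \mathbf{x}_j) \;\leq\; \hat{d}_G(\mathbf{x}_i, \mathbf{x}_j) \;\leq\; (1 + \lambda_2)\, d_{\mathcal{M}}(\mathbf{x}_i, \mathbf{x}_j)$$

where $d_{\mathcal{M}}$ is the true geodesic distance and $\lambda_1, \lambda_2 \to 0$ as the sampling density increases and $\epsilon$ shrinks appropriately relative to the curvature and reach bounds above.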
Condition 3: Geodesic Convexity

The manifold should be geodesically convex: the manifold should be isometric to a convex region of the low-dimensional space, so that every geodesic can be matched by a straight-line distance in the embedding. Equivalently, there should be no "holes" or strong concavities.

Violating this condition is the primary source of Isomap failures:

- A manifold shaped like a "C" is not isometric to a convex region: the straight line between the two arms of the C leaves the domain, so its geodesic distances cannot be reproduced by Euclidean distances in the embedding
- A Swiss Roll with a hole forces geodesics to detour around the hole, and those inflated distances cannot be flattened faithfully into the plane
- Multiple disjoint manifolds obviously violate convexity

Condition 4: Bounded Curvature

The manifold's curvature should be bounded relative to the neighborhood size. If curvature is too high:

- Local Euclidean approximations become inaccurate
- Geodesics can be significantly longer than Euclidean distances even for nearby points
- The shortest path in the graph may deviate significantly from the true geodesic
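To see a convexity violation concretely, the following sketch removes a patch from a Swiss roll and embeds both versions. The hole boundaries and sample sizes are arbitrary choices for illustration only:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Swiss roll: the roll coordinate t spans roughly [4.7, 14.1], height roughly [0, 21]
X, t = make_swiss_roll(n_samples=3000, noise=0.05, random_state=0)
height = X[:, 1]

# Remove a patch of the sheet to create a hole (illustrative boundaries)
hole = (t > 9) & (t < 12) & (height > 7) & (height < 14)
X_holed = X[~hole]

# If the holed graph disconnects, increase n_neighbors slightly
emb_full = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
emb_holed = Isomap(n_neighbors=10, n_components=2).fit_transform(X_holed)
# Plotting emb_holed colored by t[~hole] shows stretching around the hole:
# shortest paths can no longer cut straight across the removed region.
```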
| Symptom | Likely Cause | Diagnostic | Remedy |
|---|---|---|---|
| Disconnected embedding | Insufficient k/ε | Check component count | Increase neighborhood size |
| Distorted embedding | Short-circuits | Check for cross-manifold edges | Decrease neighborhood size |
| Unstable to k changes | Borderline connectivity | Vary k and compare | Improve sampling or use robust method |
| Poor reconstruction | High curvature | Check residual variance | Consider local methods (LLE, t-SNE) |
| Collapsed dimensions | Intrinsic dim mismatch | Use residual variance plot | Adjust target dimension |
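One way to act on the "residual variance" diagnostics above is sketched below. It assumes scikit-learn's Isomap, whose fitted `dist_matrix_` and `embedding_` attributes expose the graph geodesics and the embedding coordinates:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.manifold import Isomap


def residual_variances(X, dims=(1, 2, 3, 4, 5), n_neighbors=10):
    """Residual variance 1 - R^2 between graph geodesic distances and
    embedding distances, for several target dimensions (sketch)."""
    out = {}
    for d in dims:
        iso = Isomap(n_neighbors=n_neighbors, n_components=d).fit(X)
        # Upper triangle of the geodesic matrix matches pdist's pair ordering
        geo = iso.dist_matrix_[np.triu_indices_from(iso.dist_matrix_, k=1)]
        emb = pdist(iso.embedding_)
        r = np.corrcoef(geo, emb)[0, 1]
        out[d] = 1.0 - r ** 2
    return out
```

An "elbow" where the residual variance stops dropping suggests the intrinsic dimensionality; persistently high values across dimensions point to distortion from short-circuits or convexity violations.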
Isomap's theoretical guarantees assume an idealized setting that real data rarely satisfies perfectly. In practice, some distortion is inevitable. The question is whether the distortion is acceptable for your application. For exploratory visualization, moderate distortion is fine. For precise geometric reconstruction, you may need denser sampling or alternative methods.
We've covered the critical first step of Isomap: constructing the neighborhood graph that enables geodesic approximation. Let's consolidate the key insights:

- The neighborhood graph turns a finite sample into a discrete proxy for the manifold; local Euclidean edge weights chained by shortest paths approximate global geodesics.
- k-NN and ε-ball are the two standard neighborhood criteria; k-NN adapts to density variation and is the common default, while ε-ball aligns more directly with the convergence theory.
- The neighborhood size trades connectivity against short-circuiting: too small and the graph fragments, too large and edges cut across the manifold.
- Disconnected graphs must be handled explicitly, by increasing k or ε, embedding only the largest component, or (sparingly) bridging components.
- Exact k-NN search does not scale; tree-based and approximate methods (KD-trees, Ball trees, Annoy, PyNNDescent, FAISS) keep graph construction tractable.
- Faithful geodesic approximation requires sufficient sampling density, properly bounded neighborhoods, geodesic convexity, and bounded curvature.
The next page covers Landmark Isomap—a computational optimization that makes Isomap scalable to large datasets. By computing geodesic distances only from a subset of 'landmark' points, Landmark Isomap reduces the O(N³) complexity of classical Isomap to O(LN²) where L ≪ N is the number of landmarks.