In the previous page, we established that geodesic distances—the shortest paths along a manifold—are the right metric for manifold learning. But we face a fundamental challenge: we don't have the manifold itself, only a finite sample of points from it.

The brilliant insight of Isomap is to approximate the continuous manifold with a discrete neighborhood graph. Each data point becomes a node, and edges connect nearby points. The geodesic distance between distant points is then approximated by the shortest path through this graph—a sum of local Euclidean distances.

This page covers the critical question: how do we decide which points are 'nearby'? The answer determines whether Isomap succeeds or fails, and getting it right requires understanding the tradeoffs between different neighborhood definitions.
By the end of this page, you will understand: (1) the two main approaches to defining neighborhoods (k-NN and ε-ball), (2) how to choose neighborhood size to balance local accuracy against global connectivity, (3) how to handle disconnected graphs, (4) computational considerations for large datasets, and (5) the theoretical requirements for faithful geodesic approximation.
The neighborhood graph $G = (V, E, W)$ is a weighted graph where:

- Vertices $V$: Each data point $\mathbf{x}_i$ becomes a vertex
- Edges $E$: Connect "nearby" points (definition to be determined)
- Weights $W$: Edge weights are Euclidean distances: $w_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|$

The key insight is that locally, on a smooth manifold, Euclidean distance approximates geodesic distance. So edges between nearby points have weights that are good approximations of the geodesic distance between those points.

For distant points not directly connected, we compute the geodesic approximation as the shortest path through the graph. This chains together local Euclidean approximations to estimate the global geodesic.
The Path Approximation

For two points $\mathbf{x}_i$ and $\mathbf{x}_j$ not directly connected by an edge, the geodesic distance is approximated by:

$$\hat{d}_G(\mathbf{x}_i, \mathbf{x}_j) = \min_{\text{paths } P: i \to j} \sum_{(u,v) \in P} w_{uv}$$

This is precisely the shortest path problem in weighted graphs, solvable by algorithms like Dijkstra's or Floyd-Warshall. The quality of this approximation depends critically on how we define "nearby"—the neighborhood criterion.
The approximation works because smooth manifolds look 'locally flat.' If we choose neighborhoods small enough, local Euclidean distances are nearly equal to local geodesic distances. Chaining these local approximations via shortest paths reconstructs the global geodesic structure. The key is balancing neighborhood size: too small and the graph disconnects; too large and we 'short-circuit' across the manifold.
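As a concrete illustration, here is a minimal sketch of how the shortest-path approximation can be computed with SciPy, assuming a sparse, symmetric adjacency matrix of local Euclidean edge weights in the format built later on this page:

```python
from scipy.sparse.csgraph import shortest_path


def approximate_geodesics(adjacency):
    """Approximate all pairwise geodesic distances by shortest paths
    through a sparse, symmetric neighborhood graph (sketch)."""
    # method='D' runs Dijkstra from every node, which suits sparse graphs;
    # pairs with no connecting path come back as np.inf
    return shortest_path(adjacency, method='D', directed=False)
```

Floyd-Warshall (`method='FW'`) is the dense alternative; the tradeoffs between the two are revisited under computational considerations below.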
The first approach to defining neighborhoods is k-Nearest Neighbors (k-NN): connect each point to its $k$ closest points in Euclidean distance.

Definition

For each point $\mathbf{x}_i$, let $\mathcal{N}_k(i)$ be the set of indices of its $k$ nearest neighbors (excluding itself). The k-NN neighborhood graph has an edge between $i$ and $j$ if:

$$j \in \mathcal{N}_k(i) \text{ or } i \in \mathcal{N}_k(j)$$

Note that we typically symmetrize the neighborhood: if $i$ considers $j$ a neighbor, then there's an edge between them regardless of whether $j$ considers $i$ a neighbor. This ensures the graph is undirected.
```python
import numpy as np
from scipy.sparse import lil_matrix
from sklearn.neighbors import NearestNeighbors


def build_knn_graph(X, k, metric='euclidean'):
    """
    Build a symmetrized k-NN neighborhood graph.

    Parameters:
        X: (N, D) array of data points
        k: Number of nearest neighbors
        metric: Distance metric (default: 'euclidean')

    Returns:
        adjacency: (N, N) sparse adjacency matrix with edge weights
    """
    N = X.shape[0]

    # Find k nearest neighbors for each point (+1 so the self-match can be dropped)
    nn = NearestNeighbors(n_neighbors=k + 1, metric=metric)
    nn.fit(X)
    distances, indices = nn.kneighbors(X)

    # Build sparse adjacency matrix
    adjacency = lil_matrix((N, N), dtype=np.float64)
    for i in range(N):
        for j_idx in range(1, k + 1):  # Skip index 0 (self)
            j = indices[i, j_idx]
            dist = distances[i, j_idx]
            # Symmetrize: add edge in both directions
            adjacency[i, j] = dist
            adjacency[j, i] = dist

    return adjacency.tocsr()


def verify_connectivity(adjacency):
    """Check if the graph is connected using BFS from node 0."""
    N = adjacency.shape[0]
    visited = np.zeros(N, dtype=bool)
    queue = [0]
    visited[0] = True

    while queue:
        current = queue.pop(0)
        neighbors = adjacency[current].nonzero()[1]
        for neighbor in neighbors:
            if not visited[neighbor]:
                visited[neighbor] = True
                queue.append(neighbor)

    n_components = 1 if visited.all() else count_components(adjacency)
    return visited.all(), n_components


def count_components(adjacency):
    """Count connected components in the graph via repeated BFS."""
    N = adjacency.shape[0]
    visited = np.zeros(N, dtype=bool)
    components = 0

    for start in range(N):
        if not visited[start]:
            components += 1
            queue = [start]
            visited[start] = True
            while queue:
                current = queue.pop(0)
                neighbors = adjacency[current].nonzero()[1]
                for neighbor in neighbors:
                    if not visited[neighbor]:
                        visited[neighbor] = True
                        queue.append(neighbor)

    return components
```

The second approach is ε-ball neighborhoods: connect all pairs of points within distance $\epsilon$.

Definition

The ε-ball neighborhood graph has an edge between $i$ and $j$ if:

$$\|\mathbf{x}_i - \mathbf{x}_j\| \leq \epsilon$$

with edge weight $w_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|$. This is inherently symmetric and creates a more uniform local structure, but the number of neighbors varies with local density.
```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import lil_matrix


def build_epsilon_graph(X, epsilon, metric='euclidean'):
    """
    Build an ε-ball neighborhood graph.

    Parameters:
        X: (N, D) array of data points
        epsilon: Neighborhood radius
        metric: Distance metric (default: 'euclidean')

    Returns:
        adjacency: (N, N) sparse adjacency matrix with edge weights
    """
    N = X.shape[0]

    # Compute pairwise distances
    # For large N, use batch processing or approximate methods
    distances = cdist(X, X, metric=metric)

    # Build sparse adjacency matrix
    adjacency = lil_matrix((N, N), dtype=np.float64)
    for i in range(N):
        for j in range(i + 1, N):
            if distances[i, j] <= epsilon:
                adjacency[i, j] = distances[i, j]
                adjacency[j, i] = distances[i, j]

    return adjacency.tocsr()


def estimate_epsilon_from_percentile(X, percentile=5, metric='euclidean'):
    """
    Estimate a reasonable epsilon from the distance distribution.

    Uses a percentile of all pairwise distances as epsilon. This ensures
    a certain fraction of pairs are connected.

    Parameters:
        X: (N, D) array of data points
        percentile: Percentile of the distance distribution
        metric: Distance metric

    Returns:
        epsilon: Estimated neighborhood radius
    """
    # For large datasets, estimate from a random sample
    N = X.shape[0]
    if N > 5000:
        sample_idx = np.random.choice(N, 5000, replace=False)
        X_sample = X[sample_idx]
    else:
        X_sample = X

    distances = cdist(X_sample, X_sample, metric=metric)
    # Exclude diagonal (self-distances)
    off_diag = distances[np.triu_indices_from(distances, k=1)]

    return np.percentile(off_diag, percentile)


def adaptive_epsilon_estimation(X, target_neighbors=10, metric='euclidean'):
    """
    Estimate an epsilon that yields approximately target_neighbors per point.

    Uses binary search to find the epsilon achieving the desired average degree.

    Parameters:
        X: (N, D) array of data points
        target_neighbors: Desired average number of neighbors
        metric: Distance metric

    Returns:
        epsilon: Estimated neighborhood radius
    """
    distances = cdist(X, X, metric=metric)

    # Binary search for epsilon
    eps_low = 0.0
    eps_high = distances.max()

    for _ in range(50):  # Binary search iterations
        eps_mid = (eps_low + eps_high) / 2
        avg_neighbors = np.mean(np.sum(distances <= eps_mid, axis=1)) - 1  # Exclude self

        if avg_neighbors < target_neighbors:
            eps_low = eps_mid
        else:
            eps_high = eps_mid

        if abs(avg_neighbors - target_neighbors) < 0.5:
            break

    return eps_mid
```

Both k-NN and ε-ball have their merits, and the choice depends on your data characteristics and computational constraints. Here's a decision framework:
| Scenario | Recommended | Reasoning |
|---|---|---|
| Uniform sampling density | Either | Both work well; ε-ball is slightly more theoretically grounded |
| Varying density | k-NN | Adapts to local density automatically |
| Tight theoretical guarantees needed | ε-ball | Convergence theorems typically use ε |
| Unknown data scale | k-NN | Scale-invariant parameter |
| Strong prior on manifold smoothness | ε-ball | Can set ε based on expected curvature |
| Risk of disconnected components | k-NN | Guarantees minimum degree |
| Computational efficiency paramount | k-NN | Sparse graphs with predictable edge count |
In practice, k-NN is more commonly used due to its robustness to density variations and its scale-invariant parameterization. Start with k ≈ 10-20 for manifolds of moderate intrinsic dimensionality, then adjust based on connectivity and embedding quality. The scikit-learn implementation of Isomap defaults to k-NN with k=5.
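As a rough illustration of tuning k, the following sketch fits scikit-learn's Isomap for several neighborhood sizes and prints its built-in reconstruction error. The Swiss roll setup and the specific k values are illustrative assumptions, not part of the original text:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Illustrative data; in practice X is your own (N, D) array
X, _ = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

for k in (5, 10, 15, 20):
    iso = Isomap(n_neighbors=k, n_components=2).fit(X)
    # Lower reconstruction error is generally better, but also watch for
    # disconnection warnings and qualitative changes in the embedding
    print(f"k={k}: reconstruction_error={iso.reconstruction_error():.4f}")
```

The error alone does not settle the choice; it should be read alongside the connectivity and short-circuit diagnostics discussed next.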
The Connectivity-Accuracy Tradeoff

The neighborhood parameter (k or ε) controls a fundamental tradeoff:

- Too small (sparse graph):
  - Graph may disconnect into multiple components
  - Isomap cannot compute distances between components
  - Embedding loses global structure
- Too large (dense graph):
  - Edges span large distances, violating the local flatness assumption
  - "Short-circuit" edges connect across the manifold
  - Geodesic estimates become inaccurate
  - Computational cost increases

The sweet spot is the smallest neighborhood that keeps the graph connected while respecting the manifold's curvature.
A critical issue in Isomap is that the neighborhood graph must be connected. If the graph has multiple connected components, there's no path between points in different components, and geodesic distances cannot be computed.

This can happen when:

1. Neighborhood parameter is too small: not enough edges to bridge gaps
2. Data has isolated clusters: true clusters with large gaps between them
3. Outliers exist: isolated points far from the main manifold
4. Manifold has holes: different regions of the manifold are genuinely disconnected

Several strategies can address this:
```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import connected_components

# Relies on build_knn_graph from the k-NN construction code above.


def ensure_connected_graph(X, initial_k=5, max_k=50):
    """
    Build a k-NN graph with the minimum k that ensures connectivity.

    Uses binary search to find the smallest k yielding a connected graph
    (valid because increasing k only ever adds edges).

    Parameters:
        X: (N, D) array of data points
        initial_k: Starting value of k
        max_k: Maximum k to try

    Returns:
        adjacency: Sparse adjacency matrix
        k_final: The k value used
        mask: Boolean mask of points to keep (all True unless we fall back
              to the largest component)
    """
    k_low, k_high = initial_k, max_k
    k_final = max_k
    best_adjacency = None

    while k_low <= k_high:
        k_mid = (k_low + k_high) // 2
        adjacency = build_knn_graph(X, k_mid)
        n_components, labels = connected_components(
            csgraph=adjacency, directed=False, return_labels=True
        )

        if n_components == 1:
            k_final = k_mid
            best_adjacency = adjacency
            k_high = k_mid - 1
        else:
            k_low = k_mid + 1

    if best_adjacency is None:
        # Even max_k doesn't connect—fall back to the largest component
        adjacency = build_knn_graph(X, max_k)
        n_components, labels = connected_components(
            adjacency, directed=False, return_labels=True
        )
        component_sizes = np.bincount(labels)
        largest_component = np.argmax(component_sizes)
        mask = labels == largest_component
        print(f"Warning: Graph disconnected with k={max_k}. "
              f"Using largest component ({component_sizes[largest_component]}/{X.shape[0]} points).")
        return adjacency, max_k, mask

    return best_adjacency, k_final, np.ones(X.shape[0], dtype=bool)


def adaptive_connectivity_repair(adjacency, X, max_bridges=10):
    """
    Repair a disconnected graph by adding minimal bridging edges.

    Connects components by adding edges between their closest points.

    Parameters:
        adjacency: Sparse adjacency matrix (possibly disconnected)
        X: Original data points
        max_bridges: Maximum number of bridging edges to add

    Returns:
        adjacency: Repaired adjacency matrix
    """
    n_components, labels = connected_components(
        adjacency, directed=False, return_labels=True
    )
    if n_components == 1:
        return adjacency  # Already connected

    adjacency = adjacency.tolil()  # Convert to LIL for efficient modification
    bridges_added = 0

    while n_components > 1 and bridges_added < max_bridges:
        # Find the closest pair of points lying in different components
        min_dist = np.inf
        bridge = None
        for c1 in range(n_components):
            for c2 in range(c1 + 1, n_components):
                idx1 = np.where(labels == c1)[0]
                idx2 = np.where(labels == c2)[0]
                # Compute cross-component distances
                dists = cdist(X[idx1], X[idx2])
                i_local, j_local = np.unravel_index(np.argmin(dists), dists.shape)
                if dists[i_local, j_local] < min_dist:
                    min_dist = dists[i_local, j_local]
                    bridge = (idx1[i_local], idx2[j_local])

        # Add the bridging edge
        i, j = bridge
        adjacency[i, j] = min_dist
        adjacency[j, i] = min_dist
        bridges_added += 1

        # Recompute components
        n_components, labels = connected_components(
            adjacency.tocsr(), directed=False, return_labels=True
        )

    return adjacency.tocsr()
```

Adding bridging edges to connect components is a heuristic that modifies the estimated geodesic distances. If the components represent genuinely different manifold regions, the bridge creates a 'wormhole' that doesn't correspond to any path on the true manifold. Use bridging sparingly and only when you believe disconnection is due to sampling limitations, not true manifold structure.
Graph construction is often the first computational bottleneck in Isomap. Understanding the complexity helps in choosing efficient implementations.
| Operation | Naïve Complexity | Optimized Complexity | Method |
|---|---|---|---|
| Pairwise distances | $O(N^2 D)$ | $O(N^2 D)$ | Required for brute-force |
| k-NN search (brute force) | $O(N^2 D)$ | — | Compute all distances, sort |
| k-NN search (KD-tree) | $O(N^2 D)$ | $O(N \log N \cdot D)$ (low D) | Partition space recursively |
| k-NN search (Ball tree) | $O(N^2 D)$ | $O(N \log N \cdot D)$ | Metric tree; works in high D |
| k-NN search (ANN) | $O(N^2 D)$ | $O(N \log N)$ + error | LSH, HNSW, etc. |
| ε-ball search | $O(N^2 D)$ | $O(N \log N)$ (low D) | Range trees or spatial hashing |
| Graph storage | $O(N^2)$ dense | $O(Nk)$ sparse | Sparse matrix format |
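To get a feel for these differences, here is a small timing sketch using scikit-learn's exact backends. The dataset size and dimensionality are hypothetical; absolute numbers depend entirely on hardware and D:

```python
import time

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))  # low ambient dimension favors KD-trees

for algorithm in ("brute", "kd_tree", "ball_tree"):
    nn = NearestNeighbors(n_neighbors=10, algorithm=algorithm).fit(X)
    t0 = time.perf_counter()
    nn.kneighbors(X)  # query all training points
    print(f"{algorithm}: {time.perf_counter() - t0:.3f}s")
```

As D grows, the tree-based methods lose their advantage and approximate methods become the practical choice.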
Approximate Nearest Neighbors (ANN)

For large datasets (N > 10,000), exact k-NN computation becomes prohibitively expensive. Approximate nearest neighbor algorithms trade exactness for speed:

- Locality-Sensitive Hashing (LSH): Hashes similar points to the same bucket with high probability. Sublinear query time on average, but can miss some neighbors.
- Hierarchical Navigable Small World (HNSW): Builds a multi-layer graph where each layer is a sparse navigable graph. Extremely fast queries ($O(\log N)$) with high recall.
- FAISS (Facebook AI Similarity Search): GPU-accelerated ANN with various index types. Can handle billions of vectors.

For Isomap, using ANN introduces some error in edge weights, but the impact is usually small if the approximation is good (e.g., HNSW with a high ef parameter).
```python
import numpy as np
from scipy.sparse import lil_matrix


def build_knn_graph_sklearn(X, k, algorithm='auto'):
    """
    Build a k-NN graph using scikit-learn's optimized implementation.

    Parameters:
        X: (N, D) array of data points
        k: Number of nearest neighbors
        algorithm: 'auto', 'ball_tree', 'kd_tree', or 'brute'

    Returns:
        adjacency: Sparse CSR adjacency matrix
    """
    from sklearn.neighbors import NearestNeighbors

    nn = NearestNeighbors(n_neighbors=k, algorithm=algorithm)
    nn.fit(X)
    # mode='distance' gives edge weights as distances; querying with the
    # default X=None excludes each training point from its own neighbor list
    adjacency = nn.kneighbors_graph(mode='distance')

    # Symmetrize without double-counting mutual edges:
    # take the elementwise maximum of the matrix and its transpose
    adjacency = adjacency.maximum(adjacency.T)

    return adjacency.tocsr()


def build_knn_graph_annoy(X, k, n_trees=50):
    """
    Build an approximate k-NN graph using Annoy.

    Parameters:
        X: (N, D) array of data points
        k: Number of nearest neighbors
        n_trees: Number of trees in the forest (higher = more accurate)

    Returns:
        adjacency: Sparse CSR adjacency matrix
    """
    from annoy import AnnoyIndex

    N, D = X.shape

    # Build the Annoy index
    index = AnnoyIndex(D, metric='euclidean')
    for i in range(N):
        index.add_item(i, X[i])
    index.build(n_trees)

    # Query for neighbors and symmetrize as we go
    adjacency = lil_matrix((N, N), dtype=np.float64)
    for i in range(N):
        neighbors, distances = index.get_nns_by_item(i, k + 1, include_distances=True)
        for j, d in zip(neighbors[1:], distances[1:]):  # Skip self (first hit)
            adjacency[i, j] = d
            adjacency[j, i] = d

    return adjacency.tocsr()


def build_knn_graph_pynndescent(X, k):
    """
    Build an approximate k-NN graph using PyNNDescent (HNSW-like).

    This is one of the fastest approximate methods available.

    Parameters:
        X: (N, D) array of data points
        k: Number of nearest neighbors

    Returns:
        adjacency: Sparse CSR adjacency matrix
    """
    from pynndescent import NNDescent

    N = X.shape[0]

    # Build the approximate k-NN index and query the training points;
    # the first returned neighbor of each point is (almost always) itself
    index = NNDescent(X, n_neighbors=k + 1, metric='euclidean')
    indices, distances = index.query(X, k=k + 1)

    # Build the sparse, symmetrized adjacency matrix
    adjacency = lil_matrix((N, N), dtype=np.float64)
    for i in range(N):
        for j_idx in range(1, k + 1):  # Skip self
            j = indices[i, j_idx]
            d = distances[i, j_idx]
            adjacency[i, j] = d
            adjacency[j, i] = d

    return adjacency.tocsr()
```

- For N < 10,000: use scikit-learn's exact methods (ball_tree or kd_tree).
- For N < 100,000: use PyNNDescent or Annoy for approximate k-NN.
- For N > 100,000: consider Landmark Isomap (next page) or FAISS for GPU-accelerated search.

The graph construction step should typically be much faster than the subsequent shortest-path computation.
For the graph-based geodesic approximation to converge to true geodesic distances, several theoretical conditions must hold. Understanding these helps diagnose when Isomap might fail.
Condition 1: Sufficient Sampling Density

The sampling density must be high enough that local neighborhoods faithfully represent the manifold. Formally, if $\rho$ is the maximum distance from any point on the manifold to its nearest sample point (the covering radius of the sample), we need:

$$\rho \ll \text{reach}(\mathcal{M})$$

where the reach is the distance from the manifold to its "medial axis" (the set of points with multiple closest points on the manifold). This ensures samples capture the manifold's shape without aliasing.

Condition 2: Neighborhood Size Bounds

The neighborhood parameter must satisfy:

$$\rho \ll \epsilon \ll \text{reach}(\mathcal{M})$$

The lower bound ensures connectivity (neighborhoods are larger than sampling gaps). The upper bound ensures local flatness (neighborhoods don't span too much curvature).
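These two conditions underpin the standard convergence analysis of graph geodesics (Bernstein, de Silva, Langford, and Tenenbaum, 2000). Stated roughly, and omitting the exact constants, the guarantee has the form:

$$(1 - \lambda_1)\, d_{\mathcal{M}}(\mathbf{x}_i, \mathbf{x}_j) \;\leq\; \hat{d}_G(\mathbf{x}_i, \mathbf{x}_j) \;\leq\; (1 + \lambda_2)\, d_{\mathcal{M}}(\mathbf{x}_i, \mathbf{x}_j)$$

where $d_{\mathcal{M}}$ is the true geodesic distance and $\lambda_1, \lambda_2 \to 0$ as the sampling density increases and $\epsilon$ shrinks appropriately relative to the curvature and reach bounds above.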
Condition 3: Geodesic Convexity

The manifold should be geodesically convex: the manifold should be isometric to a convex region of the low-dimensional space, so that every geodesic can be matched by a straight-line distance in the embedding. Equivalently, there should be no "holes" or strong concavities.

Violating this condition is the primary source of Isomap failures:

- A manifold shaped like a "C" is not isometric to a convex region: the straight line between the two arms of the C leaves the domain, so its geodesic distances cannot be reproduced by Euclidean distances in the embedding
- A Swiss Roll with a hole forces geodesics to detour around the hole, and those inflated distances cannot be flattened faithfully into the plane
- Multiple disjoint manifolds obviously violate convexity

Condition 4: Bounded Curvature

The manifold's curvature should be bounded relative to the neighborhood size. If curvature is too high:

- Local Euclidean approximations become inaccurate
- Geodesics can be significantly longer than Euclidean distances even for nearby points
- The shortest path in the graph may deviate significantly from the true geodesic
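To see a convexity violation concretely, the following sketch removes a patch from a Swiss roll and embeds both versions. The hole boundaries and sample sizes are arbitrary choices for illustration only:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Swiss roll: the roll coordinate t spans roughly [4.7, 14.1], height roughly [0, 21]
X, t = make_swiss_roll(n_samples=3000, noise=0.05, random_state=0)
height = X[:, 1]

# Remove a patch of the sheet to create a hole (illustrative boundaries)
hole = (t > 9) & (t < 12) & (height > 7) & (height < 14)
X_holed = X[~hole]

# If the holed graph disconnects, increase n_neighbors slightly
emb_full = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
emb_holed = Isomap(n_neighbors=10, n_components=2).fit_transform(X_holed)
# Plotting emb_holed colored by t[~hole] shows stretching around the hole:
# shortest paths can no longer cut straight across the removed region.
```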
| Symptom | Likely Cause | Diagnostic | Remedy |
|---|---|---|---|
| Disconnected embedding | Insufficient k/ε | Check component count | Increase neighborhood size |
| Distorted embedding | Short-circuits | Check for cross-manifold edges | Decrease neighborhood size |
| Unstable to k changes | Borderline connectivity | Vary k and compare | Improve sampling or use robust method |
| Poor reconstruction | High curvature | Check residual variance | Consider local methods (LLE, t-SNE) |
| Collapsed dimensions | Intrinsic dim mismatch | Use residual variance plot | Adjust target dimension |
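One way to act on the "residual variance" diagnostics above is sketched below. It assumes scikit-learn's Isomap, whose fitted `dist_matrix_` and `embedding_` attributes expose the graph geodesics and the embedding coordinates:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.manifold import Isomap


def residual_variances(X, dims=(1, 2, 3, 4, 5), n_neighbors=10):
    """Residual variance 1 - R^2 between graph geodesic distances and
    embedding distances, for several target dimensions (sketch)."""
    out = {}
    for d in dims:
        iso = Isomap(n_neighbors=n_neighbors, n_components=d).fit(X)
        # Upper triangle of the geodesic matrix matches pdist's pair ordering
        geo = iso.dist_matrix_[np.triu_indices_from(iso.dist_matrix_, k=1)]
        emb = pdist(iso.embedding_)
        r = np.corrcoef(geo, emb)[0, 1]
        out[d] = 1.0 - r ** 2
    return out
```

An "elbow" where the residual variance stops dropping suggests the intrinsic dimensionality; persistently high values across dimensions point to distortion from short-circuits or convexity violations.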
Isomap's theoretical guarantees assume an idealized setting that real data rarely satisfies perfectly. In practice, some distortion is inevitable. The question is whether the distortion is acceptable for your application. For exploratory visualization, moderate distortion is fine. For precise geometric reconstruction, you may need denser sampling or alternative methods.
We've covered the critical first step of Isomap: constructing the neighborhood graph that enables geodesic approximation. Let's consolidate the key insights:

- The neighborhood graph turns a finite sample into a discrete proxy for the manifold; local Euclidean edge weights chained by shortest paths approximate global geodesics.
- k-NN and ε-ball are the two standard neighborhood criteria; k-NN adapts to density variation and is the common default, while ε-ball aligns more directly with the convergence theory.
- The neighborhood size trades connectivity against short-circuiting: too small and the graph fragments, too large and edges cut across the manifold.
- Disconnected graphs must be handled explicitly, by increasing k or ε, embedding only the largest component, or (sparingly) bridging components.
- Exact k-NN search does not scale; tree-based and approximate methods (KD-trees, Ball trees, Annoy, PyNNDescent, FAISS) keep graph construction tractable.
- Faithful geodesic approximation requires sufficient sampling density, properly bounded neighborhoods, geodesic convexity, and bounded curvature.
The next page covers Landmark Isomap—a computational optimization that makes Isomap scalable to large datasets. By computing geodesic distances only from a subset of 'landmark' points, Landmark Isomap reduces the O(N³) complexity of classical Isomap to O(LN²) where L ≪ N is the number of landmarks.