We now possess the reconstruction weights that encode local linear geometry—but these weights live in the original high-dimensional space. The critical question remains: How do we find low-dimensional coordinates that respect these local relationships?
This is the second phase of LLE: embedding optimization. Given weights $\mathbf{W}$ computed in high-dimensional space, we seek low-dimensional embedding coordinates $\mathbf{y}_i \in \mathbb{R}^d$ such that each point is approximately reconstructed by its neighbors using the same weights.
The mathematical structure of this problem is elegant: it reduces to an eigenvalue problem on a specific matrix constructed from the weights. This connects LLE to spectral methods and provides efficient, stable numerical solutions.
This page develops the embedding optimization in full detail, from the objective function through the eigenvalue formulation to practical algorithms.
By the end of this page, you will: • Formulate the embedding optimization problem mathematically • Derive the eigenvalue problem that provides optimal coordinates • Understand the spectral properties of the cost matrix • Implement efficient algorithms for computing the embedding • Recognize the significance of the smallest non-zero eigenvalues
With weights $\mathbf{W}$ fixed from the first phase, we seek embedding coordinates $\{\mathbf{y}_i\}_{i=1}^N$ in $\mathbb{R}^d$ that minimize the embedding cost:
$$\Phi(\mathbf{Y}) = \sum_{i=1}^N \left\| \mathbf{y}_i - \sum_{j=1}^N W_{ij} \mathbf{y}_j \right\|^2$$
The Key Insight:
The weights were chosen to minimize reconstruction error in the high-dimensional space. By finding low-dimensional coordinates $\mathbf{y}_i$ that also have small reconstruction error with the same weights, we preserve the local linear structure.
Matrix Formulation:
Let $\mathbf{Y} \in \mathbb{R}^{N \times d}$ be the matrix of embedding coordinates, with $\mathbf{y}_i^T$ as the $i$-th row. The embedding cost becomes:
$$\Phi(\mathbf{Y}) = \|(\mathbf{I} - \mathbf{W})\mathbf{Y}\|_F^2 = \text{tr}\left( \mathbf{Y}^T (\mathbf{I} - \mathbf{W})^T (\mathbf{I} - \mathbf{W}) \mathbf{Y} \right)$$
Defining the cost matrix: $$\mathbf{M} = (\mathbf{I} - \mathbf{W})^T (\mathbf{I} - \mathbf{W})$$
The objective simplifies to: $$\Phi(\mathbf{Y}) = \text{tr}(\mathbf{Y}^T \mathbf{M} \mathbf{Y})$$
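As a quick numerical check of this identity, the sketch below compares the element-wise sum and the trace form on a small example (the weight matrix here is an arbitrary row-stochastic stand-in, not real LLE weights):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 2

# Toy row-stochastic weight matrix (stand-in for real LLE weights)
W = rng.random((N, N))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

Y = rng.standard_normal((N, d))   # arbitrary candidate embedding

# Element-wise form: sum_i || y_i - sum_j W_ij y_j ||^2
cost_sum = sum(np.sum((Y[i] - W[i] @ Y) ** 2) for i in range(N))

# Trace form: tr(Y^T M Y) with M = (I - W)^T (I - W)
M = (np.eye(N) - W).T @ (np.eye(N) - W)
cost_trace = np.trace(Y.T @ M @ Y)

assert np.isclose(cost_sum, cost_trace)
```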
The cost matrix M has crucial properties: • Symmetric: M = Mᵀ (by construction) • Positive Semi-Definite: All eigenvalues λ ≥ 0 • Sparse: Inherits sparsity from W • Graph Laplacian-like: M1 = 0 (the all-ones vector is in the null space)
These properties connect LLE to spectral graph theory and ensure well-posed optimization.
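These properties are also easy to verify numerically; a minimal sketch, again using a toy row-stochastic W in place of real LLE weights:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8

# Toy row-stochastic weight matrix
W = rng.random((N, N))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

M = (np.eye(N) - W).T @ (np.eye(N) - W)

assert np.allclose(M, M.T)                        # symmetric
assert np.all(np.linalg.eigvalsh(M) >= -1e-12)    # positive semi-definite
assert np.allclose(M @ np.ones(N), 0.0)           # all-ones vector in the null space
```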
Minimizing $\text{tr}(\mathbf{Y}^T \mathbf{M} \mathbf{Y})$ without constraints yields a trivial solution: $\mathbf{Y} = \mathbf{0}$ (all points at the origin). To obtain meaningful embeddings, we impose constraints.
Constraint 1: Centering
The embedding should be centered at the origin: $$\sum_{i=1}^N \mathbf{y}_i = \mathbf{0} \quad \Leftrightarrow \quad \mathbf{Y}^T \mathbf{1} = \mathbf{0}$$
This removes the trivial translational degree of freedom.
Constraint 2: Unit Covariance
To prevent the embedding from collapsing to a point, we require: $$\frac{1}{N} \mathbf{Y}^T \mathbf{Y} = \mathbf{I}_d$$
or equivalently: $\mathbf{Y}^T \mathbf{Y} = N \mathbf{I}_d$.
This ensures the embedding has unit variance along each coordinate and the coordinates are uncorrelated.
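In code, the two constraints amount to two simple checks on a candidate embedding. This is a minimal NumPy sketch; `check_embedding_constraints` is an illustrative helper, not a library function:

```python
import numpy as np

def check_embedding_constraints(Y, atol=1e-8):
    """Return (is_centered, has_unit_covariance) for a candidate embedding Y."""
    N, d = Y.shape
    is_centered = np.allclose(Y.sum(axis=0), 0.0, atol=atol)       # Y^T 1 = 0
    has_unit_cov = np.allclose(Y.T @ Y / N, np.eye(d), atol=atol)  # (1/N) Y^T Y = I_d
    return is_centered, has_unit_cov
```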
The Constrained Problem:
$$\min_{\mathbf{Y}} \text{tr}(\mathbf{Y}^T \mathbf{M} \mathbf{Y})$$ subject to: $$\mathbf{Y}^T \mathbf{1} = \mathbf{0} \quad \text{(centering)}$$ $$\mathbf{Y}^T \mathbf{Y} = N \mathbf{I}_d \quad \text{(unit covariance)}$$
Why These Constraints Work:
The centering constraint eliminates the zero eigenvalue corresponding to uniform translation. The covariance constraint forces the solution to span a non-trivial subspace.
Without the covariance constraint, the cost can be driven toward zero simply by shrinking $\mathbf{Y}$ toward the origin, so the minimizer collapses.
With the covariance constraint, each embedding coordinate must have unit variance and the coordinates must be uncorrelated, so the optimizer is forced to pick out $d$ genuinely distinct directions of minimum cost.
Connection to Eigenvalue Problems:
This constrained optimization is equivalent to an eigenvalue problem. We seek the $d$ directions of minimum "cost" after excluding the trivial zero-eigenvalue direction.
Some implementations use the constraint Yᵀ Y = I (orthonormal columns) rather than Yᵀ Y = N·I. This is equivalent up to scaling: multiply the resulting Y by √N. The eigenvalue problem is identical; only the final scaling differs.
The constrained optimization can be solved using Lagrange multipliers, leading to an eigenvalue problem.
Step 1: Write the Lagrangian
For each embedding coordinate $\mathbf{y}^{(j)} \in \mathbb{R}^N$ (the $j$-th column of $\mathbf{Y}$), treating the columns separately and enforcing only the norm constraint (centering and orthogonality across columns are handled by the choice of eigenvectors below):
$$\mathcal{L}(\mathbf{y}^{(j)}, \lambda_j) = (\mathbf{y}^{(j)})^T \mathbf{M} \mathbf{y}^{(j)} - \lambda_j \left( (\mathbf{y}^{(j)})^T \mathbf{y}^{(j)} - N \right)$$
Step 2: First-Order Condition
$$\frac{\partial \mathcal{L}}{\partial \mathbf{y}^{(j)}} = 2\mathbf{M} \mathbf{y}^{(j)} - 2\lambda_j \mathbf{y}^{(j)} = \mathbf{0}$$
This gives the eigenvalue equation: $$\mathbf{M} \mathbf{y}^{(j)} = \lambda_j \mathbf{y}^{(j)}$$
Step 3: Selecting Eigenvectors
The objective at the solution is: $$(\mathbf{y}^{(j)})^T \mathbf{M} \mathbf{y}^{(j)} = \lambda_j (\mathbf{y}^{(j)})^T \mathbf{y}^{(j)} = N \lambda_j$$
To minimize the total cost, we select eigenvectors corresponding to the smallest eigenvalues.
Step 4: The Null Space Issue
Since $\mathbf{M}\mathbf{1} = \mathbf{0}$, the constant vector $\mathbf{1}$ is an eigenvector with eigenvalue 0. This corresponds to the trivial solution of placing all points at the same location.
The Solution:
The optimal $d$-dimensional embedding uses the eigenvectors corresponding to eigenvalues $\lambda_2 \leq \lambda_3 \leq \ldots \leq \lambda_{d+1}$, where we skip $\lambda_1 = 0$ (the constant eigenvector).
| Eigenvalue | Eigenvector | Meaning | Use |
|---|---|---|---|
| λ₁ = 0 | v₁ = 1 (constant) | Trivial translation mode | Skip (centering constraint) |
| λ₂ ≤ λ₃ ≤ ... ≤ λ_{d+1} | v₂, v₃, ..., v_{d+1} | Minimum cost embedding directions | Use as embedding coordinates |
| λ₂ = 0 (with λ₁) | Multiple zero eigenvalues | Disconnected components or singularity | Diagnostic warning |
| λ_{d+2}, ..., λ_N | Remaining eigenvectors | Higher cost directions, noise | Discard |
The Final Algorithm:
The embedding coordinates are: $$\mathbf{Y} = [\mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_{d+1}]$$
where $\mathbf{v}_i$ are the eigenvectors normalized to have $\|\mathbf{v}_i\|^2 = N$.
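As a minimal sketch of this step (dense NumPy/SciPy, assuming a precomputed weight matrix `W`; the fuller implementations later on this page add sparsity, diagnostics, and error handling):

```python
import numpy as np
from scipy.linalg import eigh

def embed_from_weights(W, d):
    """Minimal LLE embedding step: bottom non-trivial eigenvectors of M."""
    N = W.shape[0]
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    eigenvalues, eigenvectors = eigh(M)           # ascending order
    # Skip eigenvector 0 (constant, lambda_1 ~ 0); keep the next d columns
    Y = eigenvectors[:, 1:d + 1] * np.sqrt(N)     # scale so that Y^T Y = N I_d
    return Y, eigenvalues[:d + 2]
```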
If λ₂ is very small (near zero), it may indicate: • Disconnected neighborhoods (the graph is not connected) • Near-degenerate structure in the data • Numerical issues in weight computation
Always check the eigenvalue spectrum before using the embedding.
Understanding the spectral properties of $\mathbf{M}$ provides insight into LLE's behavior and helps diagnose potential issues.
Theorem: The Constant Null Space
The all-ones vector $\mathbf{1}$ is always in the null space of $\mathbf{M}$:
Proof: Since $\sum_j W_{ij} = 1$ for all $i$, we have $\mathbf{W}\mathbf{1} = \mathbf{1}$. Therefore: $$(\mathbf{I} - \mathbf{W})\mathbf{1} = \mathbf{1} - \mathbf{W}\mathbf{1} = \mathbf{1} - \mathbf{1} = \mathbf{0}$$ Thus $\mathbf{M}\mathbf{1} = (\mathbf{I} - \mathbf{W})^T(\mathbf{I} - \mathbf{W})\mathbf{1} = \mathbf{0}$.
Positive Semi-Definiteness:
$\mathbf{M} = (\mathbf{I} - \mathbf{W})^T(\mathbf{I} - \mathbf{W})$ is always positive semi-definite: $$\mathbf{v}^T \mathbf{M} \mathbf{v} = \|(\mathbf{I} - \mathbf{W})\mathbf{v}\|^2 \geq 0$$
Rank and Null Space:
The dimension of the null space of $\mathbf{M}$ equals the dimension of the null space of $(\mathbf{I} - \mathbf{W})$. For well-constructed weights with a connected neighborhood graph, the null space is one-dimensional (spanned by $\mathbf{1}$).
If there are $c$ connected components in the neighborhood graph, the null space has dimension $c$ (one constant vector per component).
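A small example makes this concrete: the block-diagonal weight matrix below encodes two disconnected groups of three points each (purely illustrative), and the resulting M has a two-dimensional null space.

```python
import numpy as np

# Two disconnected components: block-diagonal, row-stochastic W
W_block = np.array([
    [0.0, 0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.5, 0.0],
])

M = (np.eye(6) - W_block).T @ (np.eye(6) - W_block)
null_dim = np.sum(np.linalg.eigvalsh(M) < 1e-10)
print(null_dim)  # 2: one zero eigenvalue per connected component
```

The diagnostic routines below apply the same spectrum checks to a full LLE weight matrix.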
```python
import numpy as np
from scipy.sparse.linalg import eigsh
from scipy.linalg import eigh
import matplotlib.pyplot as plt


def analyze_lle_spectrum(W, n_eigenvalues=20):
    """
    Analyze the spectrum of the LLE cost matrix M = (I - W)^T (I - W).

    Parameters:
    -----------
    W : ndarray of shape (N, N)
        LLE weight matrix (sparse or dense)
    n_eigenvalues : int
        Number of smallest eigenvalues to compute

    Returns:
    --------
    eigenvalues : array
        Sorted eigenvalues
    eigenvectors : array
        Corresponding eigenvectors
    analysis : dict
        Diagnostic information
    """
    N = W.shape[0]

    # Compute I - W
    if hasattr(W, 'toarray'):
        I_minus_W = np.eye(N) - W.toarray()
    else:
        I_minus_W = np.eye(N) - W

    # Cost matrix
    M = I_minus_W.T @ I_minus_W

    # Compute eigenvalues
    if N < 500:
        # Full eigendecomposition for small matrices
        eigenvalues, eigenvectors = eigh(M)
    else:
        # Sparse eigenvalue solver for larger matrices
        # Request smallest eigenvalues
        eigenvalues, eigenvectors = eigsh(
            M,
            k=min(n_eigenvalues, N - 1),
            which='SM',   # Smallest magnitude
            tol=1e-10
        )

    # Sort by eigenvalue
    idx = np.argsort(eigenvalues)
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Diagnostic analysis
    analysis = {
        'smallest_eigenvalue': eigenvalues[0],
        'second_smallest': eigenvalues[1] if len(eigenvalues) > 1 else None,
        'eigenvalue_gap': eigenvalues[1] - eigenvalues[0] if len(eigenvalues) > 1 else None,
        'num_near_zero': np.sum(eigenvalues < 1e-10),
        'spectral_gap_ratio': eigenvalues[1] / eigenvalues[2]
            if len(eigenvalues) > 2 and eigenvalues[2] > 0 else None,
    }

    # Check null space dimension
    null_dim = np.sum(np.abs(eigenvalues) < 1e-10)
    analysis['null_space_dim'] = null_dim
    if null_dim > 1:
        analysis['warning'] = f"Null space dimension {null_dim} > 1: graph may be disconnected"

    return eigenvalues, eigenvectors, analysis


def plot_eigenvalue_spectrum(eigenvalues, title="LLE Eigenvalue Spectrum"):
    """
    Visualize the eigenvalue spectrum.
    """
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))

    # Linear scale
    axes[0].plot(eigenvalues[:min(50, len(eigenvalues))], 'o-')
    axes[0].axhline(y=0, color='r', linestyle='--', alpha=0.5)
    axes[0].set_xlabel('Eigenvalue Index')
    axes[0].set_ylabel('Eigenvalue')
    axes[0].set_title(f'{title} (Linear Scale)')
    axes[0].grid(True, alpha=0.3)

    # Log scale (for positive eigenvalues)
    positive_eigs = eigenvalues[eigenvalues > 1e-15]
    axes[1].semilogy(positive_eigs[:min(50, len(positive_eigs))], 'o-')
    axes[1].set_xlabel('Eigenvalue Index (positive only)')
    axes[1].set_ylabel('Eigenvalue (log scale)')
    axes[1].set_title(f'{title} (Log Scale)')
    axes[1].grid(True, alpha=0.3)

    plt.tight_layout()
    return fig


def check_embedding_quality(Y, W):
    """
    Check how well the embedding preserves local structure.

    Returns the embedding cost and per-point errors.
    """
    N = Y.shape[0]

    # Reconstruction in embedded space
    Y_reconstructed = W @ Y

    # Per-point reconstruction error
    errors = np.linalg.norm(Y - Y_reconstructed, axis=1) ** 2

    # Total cost
    total_cost = np.sum(errors)

    return {
        'total_cost': total_cost,
        'mean_error': np.mean(errors),
        'max_error': np.max(errors),
        'errors': errors,
    }
```

The gap between λ₁ = 0 and λ₂ (the "spectral gap") indicates how well the embedding separates trivial from meaningful structure. A large spectral gap suggests a stable, well-defined embedding. A very small gap may indicate near-degeneracy or disconnected components.
Computing the full eigendecomposition of an N×N matrix costs O(N³)—prohibitive for large datasets. Fortunately, we only need the d+1 smallest eigenvalues, enabling efficient iterative methods.
The ARPACK Approach:
The ARPACK library (used by scipy.sparse.linalg.eigsh) implements the Implicitly Restarted Lanczos Method, an iterative Krylov-subspace algorithm that requires only matrix-vector products with $\mathbf{M}$.
Challenge: Smallest vs. Largest Eigenvalues
Krylov-based eigensolvers converge fastest to the eigenvalues of largest magnitude. Finding the smallest eigenvalues therefore requires one of two strategies:
Shift-Invert Mode: iterate with $(\mathbf{M} - \sigma\mathbf{I})^{-1}$ (via a sparse factorization) so that eigenvalues near $\sigma = 0$ become the largest and converge quickly
Working with M Directly: the 'SM' (smallest magnitude) mode, which avoids factorization but typically converges more slowly
Practical Recommendation:
For most applications (N < 10,000), shift-invert with σ = 0 works well. For very large N, consider approximate methods like randomized eigendecomposition.
```python
import numpy as np
from scipy.sparse import csr_matrix, eye as sparse_eye
from scipy.sparse.linalg import eigsh
from scipy.linalg import eigh


def compute_lle_embedding(W, n_components=2, method='arpack'):
    """
    Compute the LLE embedding from the weight matrix.

    Parameters:
    -----------
    W : sparse matrix or ndarray of shape (N, N)
        LLE weight matrix
    n_components : int
        Target dimensionality
    method : str
        'arpack' for sparse solver, 'dense' for full decomposition

    Returns:
    --------
    Y : ndarray of shape (N, n_components)
        Embedding coordinates
    eigenvalues : ndarray
        Eigenvalues (for diagnostics)
    """
    N = W.shape[0]

    # Convert to sparse if needed
    if not hasattr(W, 'toarray'):
        W = csr_matrix(W)

    # Construct I - W
    I = sparse_eye(N, format='csr')
    I_minus_W = I - W

    # Method 1: Dense (for small N)
    if method == 'dense' or N < 500:
        M = (I_minus_W.T @ I_minus_W).toarray()
        eigenvalues, eigenvectors = eigh(M)

    # Method 2: Sparse ARPACK with shift-invert
    elif method == 'arpack':
        # Keep M sparse for memory efficiency
        M = (I_minus_W.T @ I_minus_W).tocsc()

        try:
            # Shift-invert mode finds eigenvalues near sigma
            eigenvalues, eigenvectors = eigsh(
                M,
                k=n_components + 1,
                sigma=0.0,      # Shift near 0
                which='LM',     # Largest magnitude after shift-invert
                tol=1e-10,
            )
        except Exception:
            # Fallback to SM mode (slower, but avoids factorization)
            eigenvalues, eigenvectors = eigsh(
                M,
                k=n_components + 1,
                which='SM',
                tol=1e-10,
            )

    # Sort by eigenvalue
    idx = np.argsort(eigenvalues)
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Skip the first eigenvector (constant, λ ≈ 0)
    # Use eigenvectors 1 through n_components as embedding
    Y = eigenvectors[:, 1:n_components + 1]

    # Rescale to have unit variance per dimension
    Y = Y * np.sqrt(N)

    return Y, eigenvalues[:n_components + 2]


def compute_lle_embedding_iterative(W, n_components=2, tol=1e-8, max_iter=1000):
    """
    Compute the LLE embedding using inverse iteration with deflation.

    This is useful when scipy.sparse is unavailable or for educational
    purposes to understand the algorithm; use scipy.linalg.eigh or the
    ARPACK solver above in practice.
    """
    N = W.shape[0]

    # Convert to dense for simplicity
    if hasattr(W, 'toarray'):
        W = W.toarray()

    # Compute M = (I - W)^T (I - W)
    I_minus_W = np.eye(N) - W
    M = I_minus_W.T @ I_minus_W

    # A small shift keeps the linear solves well-posed (M itself is singular)
    M_shifted = M + 1e-8 * np.eye(N)

    eigenvectors = []
    eigenvalues = []

    # Find the n_components + 1 smallest eigenvalues by inverse iteration:
    # repeatedly solving M_shifted v_new = v amplifies the directions with
    # the smallest eigenvalues of M.
    for _ in range(n_components + 1):
        # Initialize a random vector, orthogonal to eigenvectors already found
        v = np.random.randn(N)
        for ev in eigenvectors:
            v = v - np.dot(v, ev) * ev
        v = v / np.linalg.norm(v)

        for _ in range(max_iter):
            v_new = np.linalg.solve(M_shifted, v)

            # Deflate: stay orthogonal to previously found eigenvectors
            for ev in eigenvectors:
                v_new = v_new - np.dot(v_new, ev) * ev

            # Normalize
            norm = np.linalg.norm(v_new)
            if norm < 1e-15:
                break
            v_new = v_new / norm

            # Check convergence (eigenvectors are defined only up to sign)
            if min(np.linalg.norm(v_new - v), np.linalg.norm(v_new + v)) < tol:
                v = v_new
                break
            v = v_new

        # Rayleigh quotient gives the corresponding eigenvalue of M
        eigenvalues.append(v @ M @ v)
        eigenvectors.append(v)

    # Stack and sort
    eigenvectors = np.column_stack(eigenvectors)
    eigenvalues = np.array(eigenvalues)
    idx = np.argsort(eigenvalues)
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Skip the first (constant) eigenvector
    Y = eigenvectors[:, 1:n_components + 1] * np.sqrt(N)

    return Y, eigenvalues
```

We now synthesize the weight computation and embedding optimization into the complete LLE algorithm.
Algorithm: Locally Linear Embedding
Input: Data matrix $\mathbf{X} \in \mathbb{R}^{N \times D}$, number of neighbors $k$, target dimension $d$, regularization $\alpha$
Output: Embedding $\mathbf{Y} \in \mathbb{R}^{N \times d}$
Phase 1: Compute Reconstruction Weights
For each point $\mathbf{x}_i$: find its $k$ nearest neighbors, then solve the regularized, sum-to-one constrained least-squares problem from the previous page for the weights $W_{ij}$ (setting $W_{ij} = 0$ for non-neighbors).
Phase 2: Compute Embedding Coordinates
Form cost matrix: $\mathbf{M} = (\mathbf{I} - \mathbf{W})^T(\mathbf{I} - \mathbf{W})$
Compute $d+1$ smallest eigenvalues/eigenvectors of $\mathbf{M}$
Discard eigenvector with eigenvalue $\lambda_1 \approx 0$
Set embedding: $\mathbf{Y} = [\mathbf{v}_2, \ldots, \mathbf{v}_{d+1}] \cdot \sqrt{N}$
Return: $\mathbf{Y}$
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import lil_matrix, csr_matrix, eye as sparse_eye
from scipy.linalg import eigh


class LocallyLinearEmbedding:
    """
    Complete implementation of Locally Linear Embedding.

    This is an educational implementation that closely follows the
    algorithm as described. For production use, consider
    sklearn.manifold.LocallyLinearEmbedding.
    """

    def __init__(self, n_neighbors=10, n_components=2, reg=1e-3):
        """
        Parameters:
        -----------
        n_neighbors : int
            Number of neighbors for each point
        n_components : int
            Target dimensionality
        reg : float
            Regularization strength
        """
        self.n_neighbors = n_neighbors
        self.n_components = n_components
        self.reg = reg

    def fit_transform(self, X):
        """
        Compute the LLE embedding.

        Parameters:
        -----------
        X : ndarray of shape (N, D)
            Input data

        Returns:
        --------
        Y : ndarray of shape (N, n_components)
            Embedding
        """
        N, D = X.shape
        k = self.n_neighbors

        # ==========================================
        # PHASE 1: Compute Reconstruction Weights
        # ==========================================

        # Step 1.1: Find k nearest neighbors
        nn = NearestNeighbors(n_neighbors=k + 1, algorithm='auto')
        nn.fit(X)
        _, neighbor_indices = nn.kneighbors(X)
        neighbor_indices = neighbor_indices[:, 1:]  # Remove self

        # Step 1.2: Compute weights for each point
        W = lil_matrix((N, N))

        for i in range(N):
            neighbors = neighbor_indices[i]
            x_i = X[i]
            X_neighbors = X[neighbors]

            # Difference vectors
            eta = x_i - X_neighbors     # Shape: (k, D)

            # Local Gram matrix
            G = eta @ eta.T             # Shape: (k, k)

            # Regularization
            trace = np.trace(G)
            if trace > 0:
                G += self.reg * trace / k * np.eye(k)
            else:
                G += self.reg * np.eye(k)

            # Solve for weights
            ones = np.ones(k)
            try:
                w = np.linalg.solve(G, ones)
                w = w / np.sum(w)
            except np.linalg.LinAlgError:
                w = np.ones(k) / k

            W[i, neighbors] = w

        W = csr_matrix(W)
        self.W_ = W

        # ==========================================
        # PHASE 2: Compute Embedding Coordinates
        # ==========================================

        # Step 2.1: Form cost matrix M = (I - W)^T (I - W)
        I = sparse_eye(N, format='csr')
        I_minus_W = I - W
        M = (I_minus_W.T @ I_minus_W).toarray()

        # Enforce exact symmetry
        M = (M + M.T) / 2

        # Step 2.2: Eigendecomposition
        eigenvalues, eigenvectors = eigh(M)

        # Step 2.3: Select embedding coordinates
        # Skip first eigenvector (λ ≈ 0, corresponds to constant)
        idx = np.argsort(eigenvalues)
        eigenvalues = eigenvalues[idx]
        eigenvectors = eigenvectors[:, idx]

        self.eigenvalues_ = eigenvalues

        # Use eigenvectors 1 through n_components
        Y = eigenvectors[:, 1:self.n_components + 1]

        # Scale to unit variance
        Y = Y * np.sqrt(N)

        self.embedding_ = Y
        return Y

    def reconstruction_error(self, X):
        """Compute reconstruction error in original space."""
        return np.sum((X - self.W_ @ X) ** 2)

    def embedding_cost(self):
        """Compute embedding cost Tr(Y^T M Y)."""
        if not hasattr(self, 'embedding_'):
            raise ValueError("Must call fit_transform first")
        I_minus_W = sparse_eye(self.W_.shape[0]) - self.W_
        residual = I_minus_W @ self.embedding_
        return np.sum(residual ** 2)


# Example usage
if __name__ == "__main__":
    from sklearn.datasets import make_swiss_roll
    import matplotlib.pyplot as plt

    # Generate Swiss Roll
    X, color = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

    # Apply LLE
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
    Y = lle.fit_transform(X)

    # Visualize
    fig = plt.figure(figsize=(12, 5))

    # Original 3D
    ax1 = fig.add_subplot(121, projection='3d')
    ax1.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap='viridis', s=10)
    ax1.set_title('Original Swiss Roll (3D)')

    # LLE embedding
    ax2 = fig.add_subplot(122)
    ax2.scatter(Y[:, 0], Y[:, 1], c=color, cmap='viridis', s=10)
    ax2.set_title('LLE Embedding (2D)')
    ax2.set_xlabel('LLE 1')
    ax2.set_ylabel('LLE 2')

    plt.tight_layout()
    plt.show()

    # Diagnostics
    print(f"Reconstruction error: {lle.reconstruction_error(X):.4f}")
    print(f"Embedding cost: {lle.embedding_cost():.4f}")
    print(f"First 5 eigenvalues: {lle.eigenvalues_[:5]}")
```

What guarantees does LLE provide? Under what conditions does it recover the true manifold structure?
Asymptotic Consistency:
Under certain regularity conditions, LLE is asymptotically consistent: as the number of samples $N \to \infty$ with appropriate scaling of the neighborhood size, the embedding converges to the true low-dimensional parameterization (up to affine transformation).
Key Assumptions for Consistency: informally, the data must be sampled sufficiently densely from a smooth, low-dimensional manifold; the neighborhood size must grow with $N$ while neighborhood radii shrink; and the reconstruction weights must be computed stably (with appropriate regularization).
The Isometry Property:
LLE approximately preserves local isometry: if two points are close on the manifold, their embedded positions are close. However, LLE does not preserve global distances—points far apart on the manifold may be close in the embedding if they are Euclidean neighbors.
Comparison with Isomap:
| Property | LLE | Isomap |
|---|---|---|
| Preserves | Local linear structure | Geodesic distances |
| Global geometry | Not explicitly preserved | Approximately preserved |
| Distortion | Local neighborhoods preserved | Isometric to flat space |
| Assumption | Local linearity | Manifold is isometric to ℝᵈ |
The Embedding Quality:
Define the embedding cost: $$\Phi(\mathbf{Y}) = \sum_{i=1}^N \left\| \mathbf{y}_i - \sum_j W_{ij} \mathbf{y}_j \right\|^2$$
For the optimal embedding built from unit-norm eigenvectors of $\mathbf{M}$: $$\Phi^* = \sum_{j=2}^{d+1} \lambda_j$$ (with the $\sqrt{N}$ scaling used above, the realized cost is $N$ times this sum).
The embedding cost equals the sum of the $d$ smallest non-zero eigenvalues. This provides a measure of how well the embedding preserves local structure.
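This identity can be verified numerically. The sketch below builds a toy row-stochastic weight matrix over random "neighborhoods" (illustrative only, not a real LLE weight solve), computes the embedding as above, and checks that the realized cost matches the eigenvalue sum (times N, because of the sqrt(N) scaling):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
N, d, k = 40, 2, 5

# Toy row-stochastic weights over random "neighborhoods"
W = np.zeros((N, N))
for i in range(N):
    nbrs = rng.choice([j for j in range(N) if j != i], size=k, replace=False)
    W[i, nbrs] = 1.0 / k

M = (np.eye(N) - W).T @ (np.eye(N) - W)
eigenvalues, eigenvectors = eigh(M)

Y = eigenvectors[:, 1:d + 1] * np.sqrt(N)    # sqrt(N)-scaled embedding
cost = np.sum((Y - W @ Y) ** 2)              # Phi(Y)

# With this scaling, Phi(Y) = N * (lambda_2 + ... + lambda_{d+1})
assert np.isclose(cost, N * np.sum(eigenvalues[1:d + 1]))
```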
Interpreting Eigenvalues: each eigenvalue $\lambda_j$ measures the reconstruction cost contributed by the corresponding embedding coordinate. Small eigenvalues indicate directions along which the fixed weights are reproduced almost exactly; larger eigenvalues mark directions where the local structure cannot be reproduced as faithfully.
Selecting Dimensionality:
The eigenvalue spectrum can guide dimensionality selection. Look for a "gap" or "elbow" in the eigenvalue plot—the intrinsic dimensionality is suggested by where eigenvalues transition from small to large.
One approach to selecting d: plot the cumulative sum of eigenvalues ∑λⱼ for j = 2, ..., d+1 as a function of d. When this "residual variance" levels off, additional dimensions add little new structure. This is analogous to the scree plot in PCA.
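A sketch of this diagnostic (assuming an ascending eigenvalue array, as returned by the routines above; where the elbow lies remains a judgment call):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_dimension_diagnostics(eigenvalues, max_d=10):
    """Per-coordinate eigenvalues and cumulative embedding cost for d = 1..max_d."""
    dims = np.arange(1, max_d + 1)
    per_dim = eigenvalues[1:max_d + 1]    # lambda_2 .. lambda_{max_d+1}
    cumulative = np.cumsum(per_dim)       # cost of a d-dimensional embedding

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].plot(dims, per_dim, 'o-')
    axes[0].set_xlabel('Coordinate index d')
    axes[0].set_ylabel('Eigenvalue $\\lambda_{d+1}$')
    axes[0].set_title('Per-coordinate cost')

    axes[1].plot(dims, cumulative, 'o-')
    axes[1].set_xlabel('Embedding dimension d')
    axes[1].set_ylabel('Cumulative cost')
    axes[1].set_title('Residual cost vs. dimensionality')

    plt.tight_layout()
    return fig
```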
We have developed the complete theory and practice of LLE embedding optimization.
What's Next:
In the next page, we explore LLE variants—modifications and extensions that address limitations of the basic algorithm. These include Hessian LLE (which provides better theoretical guarantees), Modified LLE (for handling non-convex neighborhoods), and others that expand the applicability and robustness of the LLE family.
You now understand how LLE transforms reconstruction weights into low-dimensional coordinates through eigenvalue optimization. This spectral approach connects LLE to the broader family of spectral methods in machine learning.