The first phase of Locally Linear Embedding transforms intuition into computation: we must find reconstruction weights that quantify how each data point relates to its neighbors. These weights are not arbitrary numerical values—they encode the precise geometric relationships in the local tangent space.
Computing optimal weights is a constrained optimization problem with elegant mathematical structure. The solution involves linear algebra concepts—Gram matrices, constrained least squares, and regularization—that reveal deep connections between geometry and algebra. Done correctly, the weights capture all information needed to preserve local structure in the low-dimensional embedding.
This page provides a comprehensive treatment of weight computation, from the optimization formulation to practical numerical implementation.
By the end of this page, you will:
• Derive the closed-form solution for optimal reconstruction weights
• Understand the role of the local Gram matrix in weight computation
• Master regularization techniques for ill-conditioned problems
• Implement efficient weight computation algorithms
• Recognize failure modes and diagnostic techniques
For each point $\mathbf{x}_i$ in our dataset, we seek weights $\{w_{ij}\}_{j \in \mathcal{N}_i}$ that minimize the reconstruction error subject to constraints.
The Formal Problem:
$$\min_{\mathbf{w}_i} \left\| \mathbf{x}_i - \sum_{j \in \mathcal{N}_i} w_{ij} \mathbf{x}_j \right\|^2$$
subject to: $$\sum_{j \in \mathcal{N}_i} w_{ij} = 1$$
where $\mathcal{N}_i$ denotes the set of $k$ nearest neighbors of point $i$.
Expanding the Objective:
Let us denote the $k$ neighbors of $\mathbf{x}_i$ by $\{\mathbf{x}_{j_1}, \ldots, \mathbf{x}_{j_k}\}$. Collecting the weights into a vector $\mathbf{w}_i \in \mathbb{R}^k$, the objective becomes:
$$\varepsilon_i = \left\| \mathbf{x}_i - \sum_{l=1}^k w_{il} \mathbf{x}_{j_l} \right\|^2$$
Introducing Difference Vectors:
Define $\boldsymbol{\eta}_l = \mathbf{x}_i - \mathbf{x}_{j_l}$ for $l = 1, \ldots, k$. Using the constraint $\sum_l w_{il} = 1$:
$$\mathbf{x}_i - \sum_l w_{il} \mathbf{x}_{j_l} = \sum_l w_{il} \mathbf{x}_i - \sum_l w_{il} \mathbf{x}_{j_l} = \sum_l w_{il} (\mathbf{x}_i - \mathbf{x}_{j_l}) = \sum_l w_{il} \boldsymbol{\eta}_l$$
Therefore: $$\varepsilon_i = \left\| \sum_l w_{il} \boldsymbol{\eta}_l \right\|^2 = \mathbf{w}_i^T \mathbf{G}^{(i)} \mathbf{w}_i$$
where $\mathbf{G}^{(i)}_{lm} = \boldsymbol{\eta}_l^T \boldsymbol{\eta}_m$ is the local Gram matrix.
The Gram matrix G⁽ⁱ⁾ ∈ ℝᵏˣᵏ captures all pairwise inner products between the difference vectors (x_i - x_j). Its (l,m) entry is:
G_{lm}⁽ⁱ⁾ = (x_i - x_{j_l})ᵀ(x_i - x_{j_m})
This matrix is symmetric and positive semi-definite. Its rank equals the dimension of the affine hull spanned by x_i and its neighbors.
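These properties are easy to verify numerically; a minimal sketch with a hypothetical point and three neighbors in 2D (not data from the text):

```python
import numpy as np

# Hypothetical check: one point and three neighbors in 2D
x_i = np.array([0.0, 0.0])
neighbors = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [1.0, 1.0]])

# Difference vectors eta_l = x_i - x_{j_l}, stacked as rows
eta = x_i - neighbors               # shape (3, 2)

# Local Gram matrix: G[l, m] = eta_l^T eta_m
G = eta @ eta.T                     # shape (3, 3)

# Symmetric and positive semi-definite
assert np.allclose(G, G.T)
assert np.all(np.linalg.eigvalsh(G) >= -1e-12)

# Rank equals the dimension of the affine hull (here 2, since D = 2 < k = 3)
assert np.linalg.matrix_rank(G) == 2
```

With $k = 3$ neighbors in $D = 2$ dimensions the Gram matrix is necessarily rank-deficient, foreshadowing the need for regularization discussed later.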
We now seek to minimize the quadratic objective $\mathbf{w}_i^T \mathbf{G}^{(i)} \mathbf{w}_i$ subject to the linear constraint $\mathbf{1}^T \mathbf{w}_i = 1$.
Method: Lagrange Multipliers
The Lagrangian is: $$\mathcal{L}(\mathbf{w}_i, \lambda) = \mathbf{w}_i^T \mathbf{G}^{(i)} \mathbf{w}_i - \lambda(\mathbf{1}^T \mathbf{w}_i - 1)$$
First-Order Conditions:
$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}_i} = 2\mathbf{G}^{(i)} \mathbf{w}_i - \lambda \mathbf{1} = \mathbf{0}$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = \mathbf{1}^T \mathbf{w}_i - 1 = 0$$
Solving for Weights:
From the first condition (assuming $\mathbf{G}^{(i)}$ is invertible): $$\mathbf{w}_i = \frac{\lambda}{2} (\mathbf{G}^{(i)})^{-1} \mathbf{1}$$
Substituting into the constraint: $$\mathbf{1}^T \mathbf{w}_i = \frac{\lambda}{2} \mathbf{1}^T (\mathbf{G}^{(i)})^{-1} \mathbf{1} = 1$$
Therefore: $$\lambda = \frac{2}{\mathbf{1}^T (\mathbf{G}^{(i)})^{-1} \mathbf{1}}$$
The Final Solution:
$$\mathbf{w}_i = \frac{(\mathbf{G}^{(i)})^{-1} \mathbf{1}}{\mathbf{1}^T (\mathbf{G}^{(i)})^{-1} \mathbf{1}}$$
This elegant formula shows that optimal weights are proportional to $\mathbf{G}^{-1}\mathbf{1}$, normalized to sum to one.
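A quick numerical sanity check of the closed form (hypothetical random neighborhood, not from the text): the closed-form weights should beat any other weight vector that satisfies the sum-to-one constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical neighborhood: one point with k = 3 neighbors in 3D
x_i = rng.normal(size=3)
neighbors = rng.normal(size=(3, 3))

eta = x_i - neighbors
G = eta @ eta.T
ones = np.ones(3)

# Closed-form optimal weights: w = G^{-1} 1 / (1^T G^{-1} 1)
w_star = np.linalg.solve(G, ones)
w_star /= w_star.sum()
assert np.isclose(w_star.sum(), 1.0)

# Any other weights summing to 1 give an equal or larger objective w^T G w
def eps(w):
    return w @ G @ w

for _ in range(200):
    w = rng.random(3)
    w /= w.sum()
    assert eps(w_star) <= eps(w) + 1e-10
```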
```python
import numpy as np
from scipy.spatial import distance_matrix


def compute_lle_weights(X, k=10, reg=1e-3):
    """
    Compute LLE reconstruction weights for all points.

    Parameters
    ----------
    X : ndarray of shape (N, D)
        Data matrix with N points in D dimensions
    k : int
        Number of nearest neighbors
    reg : float
        Regularization parameter for ill-conditioned Gram matrices

    Returns
    -------
    W : ndarray of shape (N, N)
        Weight matrix where W[i, j] = w_ij
    """
    N, D = X.shape

    # Compute pairwise distances
    dist = distance_matrix(X, X)

    # Initialize the weight matrix (dense here; the efficient
    # implementation later uses sparse storage)
    W = np.zeros((N, N))

    for i in range(N):
        # Find k nearest neighbors (excluding self)
        neighbor_indices = np.argsort(dist[i])[1:k + 1]

        # Difference vectors: each row is (x_i - x_j)
        x_i = X[i]
        X_neighbors = X[neighbor_indices]
        eta = x_i - X_neighbors          # Shape: (k, D)

        # Local Gram matrix: G[l, m] = eta_l^T @ eta_m
        G = eta @ eta.T                  # Shape: (k, k)

        # Regularization for numerical stability
        trace_G = np.trace(G)
        if trace_G > 0:
            G += reg * trace_G * np.eye(k)
        else:
            G += reg * np.eye(k)

        # Solve for weights: w = G^{-1} @ 1 / (1^T @ G^{-1} @ 1)
        ones = np.ones(k)
        try:
            # Solve G @ w_unnorm = ones, then normalize to sum to 1
            w_unnorm = np.linalg.solve(G, ones)
            w = w_unnorm / np.sum(w_unnorm)
        except np.linalg.LinAlgError:
            # Fallback: uniform weights if the Gram matrix is singular
            w = np.ones(k) / k
            print(f"Warning: Singular Gram matrix at point {i}")

        W[i, neighbor_indices] = w

    return W


def compute_reconstruction_error(X, W):
    """
    Compute the total reconstruction error for given weights.

    Error = sum_i ||x_i - sum_j w_ij * x_j||^2
    """
    N = X.shape[0]
    error = 0.0
    for i in range(N):
        reconstruction = W[i] @ X  # Weighted sum of all points
        residual = X[i] - reconstruction
        error += np.dot(residual, residual)
    return error


# Example usage
if __name__ == "__main__":
    # Generate Swiss Roll data
    np.random.seed(42)
    n_samples = 500
    t = 1.5 * np.pi * (1 + 2 * np.random.rand(n_samples))
    x = t * np.cos(t)
    y = 10 * np.random.rand(n_samples)
    z = t * np.sin(t)
    X = np.column_stack([x, y, z])

    # Compute weights with different k values
    for k in [5, 10, 15, 20]:
        W = compute_lle_weights(X, k=k)
        error = compute_reconstruction_error(X, W)
        print(f"k={k:2d}: Reconstruction error = {error:.4f}")
```

Instead of explicitly computing G⁻¹, we can solve the linear system G @ w = 1 for w and then normalize. This is numerically more stable and is the preferred implementation approach. The np.linalg.solve function uses LU decomposition, which is efficient and stable.
The closed-form solution $\mathbf{w}_i = \mathbf{G}^{-1}\mathbf{1} / (\mathbf{1}^T \mathbf{G}^{-1}\mathbf{1})$ has deep geometric meaning that illuminates how LLE captures local structure.
Interpretation 1: Barycentric Coordinates
When neighbors lie exactly in a $d$-dimensional affine subspace containing $\mathbf{x}_i$, the weights are the barycentric coordinates of $\mathbf{x}_i$ with respect to the neighbors: they sum to one and reconstruct the point exactly as an affine combination of the neighbor positions.
This connects LLE weights to classical computational geometry.
Interpretation 2: Projection onto Tangent Space
The weights describe how $\mathbf{x}_i$ projects onto the affine hull of its neighbors. If the neighbors span the local tangent space, the weights encode the local coordinate representation of $\mathbf{x}_i$.
Interpretation 3: Local Linear Regression
Finding weights that minimize reconstruction error is equivalent to solving a local linear regression problem: $$\min_\mathbf{w} \left\| \mathbf{X}_\text{neighbors}^T \mathbf{w} - \mathbf{x}_i \right\|^2 \quad \text{s.t. } \mathbf{1}^T \mathbf{w} = 1$$
This is constrained least squares in the local neighborhood.
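One way to confirm the equivalence numerically is to eliminate the constraint with a change of variables and compare against the Gram-matrix solution. A sketch with hypothetical data; the basis matrix `B` (columns summing to zero) is an illustrative construction, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
x_i = rng.normal(size=3)
N_mat = rng.normal(size=(3, 3))    # rows are the k = 3 neighbors

# Route 1: Gram-matrix closed form
eta = x_i - N_mat
G = eta @ eta.T
w_gram = np.linalg.solve(G, np.ones(3))
w_gram /= w_gram.sum()

# Route 2: constrained regression. Write w = e_k + B u, where the columns
# of B span {w : 1^T w = 0}, so the constraint holds for every u; then
# minimize ||N^T w - x_i|| as unconstrained least squares in u.
e_k = np.array([0.0, 0.0, 1.0])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])       # each column sums to zero
A = N_mat.T @ B                    # shape (D, k-1)
b = x_i - N_mat.T @ e_k
u = np.linalg.lstsq(A, b, rcond=None)[0]
w_reg = e_k + B @ u

# Both routes solve the same constrained least-squares problem
assert np.allclose(w_gram, w_reg, atol=1e-8)
assert np.isclose(w_reg.sum(), 1.0)
```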
| Weight Pattern | Geometric Meaning | Implication |
|---|---|---|
| Weights near 1/k (uniform) | x_i is near the centroid of neighbors | Point is in a 'flat' region |
| One weight ≈ 1, others ≈ 0 | x_i is very close to one neighbor | Possibly redundant data point |
| Negative weights | x_i requires extrapolation | Point is on the 'edge' of the local cloud |
| Large magnitude weights | Ill-conditioned local geometry | May need regularization |
| Weights decay with distance | Closer neighbors contribute more | Expected for well-sampled manifolds |
Negative Weights:
Unlike in some methods (e.g., kernel smoothing), LLE weights can be negative. This occurs when $\mathbf{x}_i$ is not in the convex hull of its neighbors—requiring extrapolation rather than interpolation.
Consider a simple 1D example: if neighbors are at positions 1 and 3, and $\mathbf{x}_i$ is at position 5, we need: $$w_1 \cdot 1 + w_2 \cdot 3 = 5 \quad \text{with } w_1 + w_2 = 1$$ Solving gives $w_1 = -1, w_2 = 2$—negative weight for the farther neighbor.
Negative weights are not errors; they naturally arise when the point lies outside the convex hull of its neighbors. However, many negative weights may indicate poor neighborhood selection.
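The 1D example above can be checked directly with the Gram-matrix machinery; a tiny trace-scaled ridge term handles the rank-1 Gram matrix:

```python
import numpy as np

# 1D example from the text: neighbors at 1 and 3, point at 5
neighbors = np.array([[1.0], [3.0]])
x_i = np.array([5.0])

eta = x_i - neighbors                      # [[4], [2]]
G = eta @ eta.T                            # [[16, 8], [8, 4]], rank 1
G += 1e-9 * np.trace(G) * np.eye(2)        # tiny ridge for invertibility

w = np.linalg.solve(G, np.ones(2))
w /= w.sum()

# Negative weight for the farther neighbor, as derived in the text
assert np.allclose(w, [-1.0, 2.0], atol=1e-6)
```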
Many negative weights can signal:
• $k$ too large for the local curvature, so the neighborhood is no longer approximately flat
• neighborhoods that reach across gaps or sparsely sampled regions of the manifold
• ill-conditioned local geometry that amplifies weight magnitudes
Monitoring weight distributions helps diagnose these LLE issues.
The local Gram matrix $\mathbf{G}^{(i)}$ can be ill-conditioned or singular, leading to numerical instability. This is especially common when $k > D$ (more neighbors than input dimensions), when neighbors are nearly collinear or coplanar, or when the data lie on a manifold of dimension much smaller than $k$.
The Regularization Approach:
Add a small value to the diagonal of the Gram matrix: $$\tilde{\mathbf{G}}^{(i)} = \mathbf{G}^{(i)} + \alpha \cdot \text{tr}(\mathbf{G}^{(i)}) \cdot \mathbf{I}$$
where $\alpha$ is a small regularization parameter (typically $10^{-3}$ to $10^{-5}$).
Why This Works:
Regularization is equivalent to adding a penalty term to the optimization: $$\min_\mathbf{w} \left( \mathbf{w}^T \mathbf{G} \mathbf{w} + \gamma \|\mathbf{w}\|^2 \right) \quad \text{s.t. } \mathbf{1}^T \mathbf{w} = 1$$
The penalty discourages weights with large magnitude, improving stability without significantly biasing the solution.
Scaling with Trace:
Multiplying the regularization by $\text{tr}(\mathbf{G})$ makes the regularization scale-invariant. This ensures consistent behavior regardless of the data's absolute scale.
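A short check of this scale invariance, using a hypothetical helper `lle_weights_local` (an illustration, not part of the earlier listing): rescaling the data scales $\mathbf{G}$ and its trace by the same factor, so the normalized weights are unchanged.

```python
import numpy as np

def lle_weights_local(x_i, neighbors, alpha=1e-3):
    """Trace-scaled regularized LLE weights for one neighborhood (sketch)."""
    k = len(neighbors)
    eta = x_i - neighbors
    G = eta @ eta.T
    G += alpha * np.trace(G) / k * np.eye(k)   # scale-invariant ridge
    w = np.linalg.solve(G, np.ones(k))
    return w / w.sum()

rng = np.random.default_rng(2)
x_i = rng.normal(size=3)
nbrs = rng.normal(size=(5, 3))       # k = 5 > D = 3: G singular without ridge

w1 = lle_weights_local(x_i, nbrs)
w2 = lle_weights_local(1000.0 * x_i, 1000.0 * nbrs)  # rescaled data

# Trace scaling makes the regularized weights identical under rescaling
assert np.allclose(w1, w2)
```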
```python
import numpy as np


def analyze_gram_conditioning(X, neighbor_indices):
    """
    Analyze the conditioning of local Gram matrices.
    Returns condition numbers and eigenvalue analysis.
    """
    condition_numbers = []
    min_eigenvalues = []
    max_eigenvalues = []

    for i, neighbors in enumerate(neighbor_indices):
        x_i = X[i]
        X_neighbors = X[neighbors]

        # Difference vectors and local Gram matrix
        eta = x_i - X_neighbors
        G = eta @ eta.T

        # Eigenvalue analysis
        eigvals = np.linalg.eigvalsh(G)
        eigvals_sorted = np.sort(eigvals)[::-1]  # Descending
        min_eigval = eigvals_sorted[-1]
        max_eigval = eigvals_sorted[0]

        # Condition number (ratio of largest to smallest eigenvalue)
        if min_eigval > 1e-15:
            cond = max_eigval / min_eigval
        else:
            cond = np.inf

        condition_numbers.append(cond)
        min_eigenvalues.append(min_eigval)
        max_eigenvalues.append(max_eigval)

    return {
        'condition_numbers': np.array(condition_numbers),
        'min_eigenvalues': np.array(min_eigenvalues),
        'max_eigenvalues': np.array(max_eigenvalues),
    }


def choose_regularization(G, method='trace'):
    """
    Automatically choose regularization strength.

    Methods:
    - 'trace': reg = alpha * trace(G) / k
    - 'eigval': reg based on eigenvalue gap analysis
    - 'fixed': use a fixed small value
    """
    k = G.shape[0]

    if method == 'trace':
        # Standard LLE regularization
        alpha = 1e-3
        trace_G = np.trace(G)
        reg = alpha * trace_G / k if trace_G > 0 else 1e-6
    elif method == 'eigval':
        # Analyze the eigenvalue spectrum; target a condition number of 1e6
        eigvals = np.linalg.eigvalsh(G)
        eigvals_sorted = np.sort(eigvals)[::-1]
        target_cond = 1e6
        min_eigval = max(eigvals_sorted[0] / target_cond, 1e-10)
        reg = max(0, min_eigval - eigvals_sorted[-1])
    elif method == 'fixed':
        reg = 1e-6
    else:
        raise ValueError(f"Unknown method: {method}")

    return reg


def solve_weights_robust(G, reg=None):
    """
    Solve for weights with robust handling of ill-conditioning.
    """
    k = G.shape[0]
    ones = np.ones(k)

    if reg is None:
        reg = choose_regularization(G)
    G_reg = G + reg * np.eye(k)

    # Use Cholesky decomposition for symmetric positive definite matrices
    try:
        L = np.linalg.cholesky(G_reg)
        # Solve L @ y = ones, then L.T @ w = y
        y = np.linalg.solve(L, ones)
        w_unnorm = np.linalg.solve(L.T, y)
    except np.linalg.LinAlgError:
        # Fall back to a least-squares solver
        w_unnorm = np.linalg.lstsq(G_reg, ones, rcond=None)[0]

    # Normalize to sum to 1
    return w_unnorm / np.sum(w_unnorm)
```

Having computed weights for each point individually, we assemble them into a global weight matrix $\mathbf{W} \in \mathbb{R}^{N \times N}$.
Structure of W:
$$W_{ij} = \begin{cases} w_{ij} & \text{if } j \in \mathcal{N}_i \\ 0 & \text{otherwise} \end{cases}$$
Properties of W:
• Sparse: each row has at most $k$ nonzero entries
• Each row sums to one ($\mathbf{W}\mathbf{1} = \mathbf{1}$), though individual entries may be negative
• Generally asymmetric: $j \in \mathcal{N}_i$ does not imply $i \in \mathcal{N}_j$
The Reconstruction Cost in Matrix Form:
The total reconstruction error can be written: $$\mathcal{E} = \|(\mathbf{I} - \mathbf{W})\mathbf{X}\|_F^2 = \text{tr}\left( \mathbf{X}^T (\mathbf{I} - \mathbf{W})^T (\mathbf{I} - \mathbf{W}) \mathbf{X} \right)$$
Define $\mathbf{M} = (\mathbf{I} - \mathbf{W})^T (\mathbf{I} - \mathbf{W})$, which we'll see is crucial for the embedding phase.
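The per-point, Frobenius-norm, and trace forms of the cost all agree, which a small numerical check confirms (hypothetical weight matrix built ad hoc for the test):

```python
import numpy as np

rng = np.random.default_rng(3)
N, D, k = 30, 3, 5
X = rng.normal(size=(N, D))

# Hypothetical weight matrix with rows summing to one
W = np.zeros((N, N))
for i in range(N):
    dist = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(dist)[1:k + 1]
    w = rng.random(k)
    W[i, nbrs] = w / w.sum()

# Per-point sum of squared residuals...
per_point = sum(np.sum((X[i] - W[i] @ X) ** 2) for i in range(N))

# ...equals the matrix form ||(I - W) X||_F^2 ...
matrix_form = np.linalg.norm((np.eye(N) - W) @ X, 'fro') ** 2

# ...and the trace form tr(X^T M X) with M = (I - W)^T (I - W)
M = (np.eye(N) - W).T @ (np.eye(N) - W)
trace_form = np.trace(X.T @ M @ X)

assert np.allclose(per_point, matrix_form)
assert np.allclose(per_point, trace_form)
```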
The Matrix M:
$$\mathbf{M} = (\mathbf{I} - \mathbf{W})^T (\mathbf{I} - \mathbf{W})$$
This matrix has important properties:
• Symmetric: $\mathbf{M}^T = \mathbf{M}$
• Positive semi-definite: $\mathbf{v}^T \mathbf{M} \mathbf{v} = \|(\mathbf{I} - \mathbf{W})\mathbf{v}\|^2 \geq 0$
• Sparse, inheriting the neighborhood structure of $\mathbf{W}$
• Rows sum to zero: $\mathbf{M}\mathbf{1} = \mathbf{0}$, since $\mathbf{W}\mathbf{1} = \mathbf{1}$
The last property implies $\mathbf{1}$ is an eigenvector with eigenvalue 0. This is the same structure as graph Laplacians, connecting LLE to spectral graph theory.
Explicit Form of M:
$$M_{ij} = \delta_{ij} - W_{ij} - W_{ji} + \sum_l W_{li} W_{lj}$$
where $\delta_{ij}$ is the Kronecker delta.
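Both the entrywise formula and the zero row-sum property can be confirmed numerically (the weight matrix below is a hypothetical stand-in whose rows sum to one):

```python
import numpy as np

rng = np.random.default_rng(4)
N, k = 12, 3

# Hypothetical weight matrix with rows summing to one
W = np.zeros((N, N))
for i in range(N):
    nbrs = rng.choice([j for j in range(N) if j != i], size=k, replace=False)
    w = rng.random(k)
    W[i, nbrs] = w / w.sum()

M = (np.eye(N) - W).T @ (np.eye(N) - W)

# Entrywise formula: M_ij = delta_ij - W_ij - W_ji + sum_l W_li W_lj
M_explicit = np.eye(N) - W - W.T + W.T @ W
assert np.allclose(M, M_explicit)

# Since each row of W sums to one, M annihilates the constant vector
assert np.allclose(M @ np.ones(N), 0.0)
```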
We don't need to explicitly compute the dense matrix M. In the embedding phase, we only need to apply (I - W)ᵀ(I - W) to vectors, which can be done efficiently using the sparse structure of W: first apply (I - W), then (I - W)ᵀ, each a sparse matrix-vector product over the O(Nk) nonzeros.
Total cost: O(Nk) per matrix-vector product.
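A sketch of this matrix-free application, checked against the dense computation (the random sparse matrix here is a stand-in for a real LLE weight matrix):

```python
import numpy as np
from scipy.sparse import random as sparse_random

def apply_M(W, v):
    """Apply M = (I - W)^T (I - W) to v using only sparse products."""
    u = v - W @ v        # (I - W) v    : O(Nk)
    return u - W.T @ u   # (I - W)^T u  : O(Nk)

# Hypothetical sparse matrix for the check
N = 50
W = sparse_random(N, N, density=0.1, format='csr', random_state=5)
v = np.random.default_rng(5).normal(size=N)

# Matrix-free result matches the explicit dense computation
M_dense = (np.eye(N) - W.toarray()).T @ (np.eye(N) - W.toarray())
assert np.allclose(apply_M(W, v), M_dense @ v)
```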
The quality of LLE depends critically on how we select neighbors. Poor neighborhood choice can violate the local linearity assumption.
Method 1: k-Nearest Neighbors (k-NN)
The most common approach: for each point $\mathbf{x}_i$, select the $k$ points closest in Euclidean distance.
Advantages:
• Every point gets exactly $k$ neighbors, so the local Gram matrix always has a fixed size
• A single, intuitive integer parameter
• The neighborhood radius adapts automatically to local sampling density
Disadvantages:
• In sparse regions, the $k$-th neighbor may be far away, violating the locality assumption
• The neighbor relation is asymmetric
Method 2: ε-Neighborhoods
Select all points within Euclidean distance $\epsilon$ of $\mathbf{x}_i$.
Advantages:
• Respects a natural geometric length scale
• The neighbor relation is symmetric: $j \in \mathcal{N}_i$ iff $i \in \mathcal{N}_j$
Disadvantages:
• Neighborhood sizes vary: too few neighbors in sparse regions, too many in dense ones
• $\epsilon$ is scale-dependent and harder to tune
| Aspect | k-NN | ε-Neighborhood | Recommendation |
|---|---|---|---|
| Neighborhood size | Fixed at k | Variable | Start with k-NN |
| Parameter tuning | Discrete, intuitive | Continuous, scale-dependent | k is easier to tune |
| Sparse regions | May include distant points | May have too few neighbors | Increase k or ε |
| Dense regions | May miss nearby points | May have too many neighbors | k-NN more stable |
| Short-circuit risk | Lower for small k | Lower for small ε | Validate neighborhood graph |
Choosing k:
The neighborhood size $k$ should satisfy:
• $k > d$, so the neighbors can span the local tangent space of the $d$-dimensional manifold
• $k$ small enough that the neighborhood remains approximately linear
• $k \ll N$, so neighborhoods stay local
Practical Guidelines:
| Dataset Size | Typical k Range | Notes |
|---|---|---|
| N < 100 | 3-10 | Very sensitive to k |
| N ∈ [100, 1000] | 5-20 | Most common regime |
| N ∈ [1000, 10000] | 10-50 | Can use larger k |
| N > 10000 | 10-100 | Computational considerations |
Validation Strategies:
• Sweep $k$ over a range and monitor the mean reconstruction error
• Track weight statistics (negative-weight counts, weight magnitudes) as diagnostics
• Inspect the neighborhood graph for short-circuit edges across manifold folds
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt


def analyze_neighborhood_selection(X, k_values):
    """
    Analyze reconstruction error and weight statistics
    across different neighborhood sizes.
    """
    results = []

    for k in k_values:
        # Find k nearest neighbors (k+1 because self is included)
        nn = NearestNeighbors(n_neighbors=k + 1)
        nn.fit(X)
        distances, indices = nn.kneighbors(X)

        # Remove self-connections
        neighbor_indices = indices[:, 1:]

        N = X.shape[0]
        reconstruction_errors = []
        weight_stats = []

        for i in range(N):
            neighbors = neighbor_indices[i]
            x_i = X[i]
            X_neighbors = X[neighbors]

            # Local Gram matrix with trace-scaled regularization
            eta = x_i - X_neighbors
            G = eta @ eta.T
            reg = 1e-3 * np.trace(G) / k
            G_reg = G + reg * np.eye(k)

            # Solve for normalized weights
            ones = np.ones(k)
            w = np.linalg.solve(G_reg, ones)
            w = w / np.sum(w)

            # Reconstruction error for this point
            reconstruction = X_neighbors.T @ w
            reconstruction_errors.append(np.linalg.norm(x_i - reconstruction))

            weight_stats.append({
                'min': np.min(w),
                'max': np.max(w),
                'negative_count': np.sum(w < 0),
                'magnitude': np.linalg.norm(w),
            })

        results.append({
            'k': k,
            'mean_error': np.mean(reconstruction_errors),
            'std_error': np.std(reconstruction_errors),
            'max_error': np.max(reconstruction_errors),
            'mean_negative_weights': np.mean(
                [s['negative_count'] for s in weight_stats]),
            'mean_weight_magnitude': np.mean(
                [s['magnitude'] for s in weight_stats]),
        })

    return results


def plot_k_analysis(results):
    """
    Visualize how LLE behavior changes with k.
    """
    k_values = [r['k'] for r in results]
    mean_errors = [r['mean_error'] for r in results]
    neg_weights = [r['mean_negative_weights'] for r in results]

    fig, axes = plt.subplots(1, 2, figsize=(12, 4))

    axes[0].plot(k_values, mean_errors, 'o-')
    axes[0].set_xlabel('Number of Neighbors (k)')
    axes[0].set_ylabel('Mean Reconstruction Error')
    axes[0].set_title('Reconstruction Error vs. k')
    axes[0].grid(True, alpha=0.3)

    axes[1].plot(k_values, neg_weights, 's-', color='orange')
    axes[1].set_xlabel('Number of Neighbors (k)')
    axes[1].set_ylabel('Mean Number of Negative Weights')
    axes[1].set_title('Negative Weights vs. k')
    axes[1].grid(True, alpha=0.3)

    plt.tight_layout()
    return fig
```

There is a deep connection between LLE's weight computation and Local PCA (computing PCA in each neighborhood). Understanding this connection provides additional insight into what the weights capture.
Local Covariance Matrix:
For point $i$ with neighbors $\{\mathbf{x}_{j_l}\}$, define the local covariance: $$\boldsymbol{\Sigma}_i = \frac{1}{k} \sum_{l=1}^k (\mathbf{x}_{j_l} - \bar{\mathbf{x}}_i)(\mathbf{x}_{j_l} - \bar{\mathbf{x}}_i)^T$$
where $\bar{\mathbf{x}}_i$ is the centroid of the neighborhood.
The Local Gram Matrix Revisited:
Recall the LLE Gram matrix: $$G^{(i)}_{lm} = (\mathbf{x}_i - \mathbf{x}_{j_l})^T (\mathbf{x}_i - \mathbf{x}_{j_m})$$
If we let $\boldsymbol{\eta}_l = \mathbf{x}_i - \mathbf{x}_{j_l}$, then: $$\mathbf{G}^{(i)} = \mathbf{H} \mathbf{H}^T$$
where $\mathbf{H}$ is the matrix with rows $\boldsymbol{\eta}_l^T$.
Eigenstructure:
The eigenvalues of $\mathbf{G}^{(i)}$ are the squared singular values of $\mathbf{H}$. These relate to the local covariance eigenvalues, revealing the local dimensionality structure.
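This eigenvalue-singular value correspondence is straightforward to confirm with a hypothetical difference-vector matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
k, D = 4, 6

# H stacks the difference vectors eta_l^T as rows
H = rng.normal(size=(k, D))
G = H @ H.T

# Eigenvalues of G are the squared singular values of H
eigvals = np.sort(np.linalg.eigvalsh(G))[::-1]
svals = np.linalg.svd(H, compute_uv=False)
assert np.allclose(eigvals, svals ** 2)
```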
Local PCA for Understanding Weights:
If we perform PCA on the local neighborhood, the leading eigenvectors of $\boldsymbol{\Sigma}_i$ estimate the tangent directions, while the trailing eigenvalues measure how far the neighborhood deviates from the tangent plane.
The reconstruction weights implicitly use this structure: they place more emphasis on neighbors that lie along principal directions.
If the local Gram matrix has rank r < k, it means the neighbors lie in an r-dimensional affine subspace. This happens when: • k > D (more neighbors than input dimensions) • Neighbors are collinear or coplanar • The local manifold has dimension less than k
This is why regularization is essential: it ensures the optimization has a unique solution.
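A minimal demonstration of this rank deficiency and its regularized fix, with hypothetical data where $k > D$:

```python
import numpy as np

rng = np.random.default_rng(7)
D, k = 2, 5                        # more neighbors than input dimensions

x_i = rng.normal(size=D)
neighbors = rng.normal(size=(k, D))

eta = x_i - neighbors
G = eta @ eta.T

# Rank is at most D < k, so G is singular
assert np.linalg.matrix_rank(G) <= D

# Trace-scaled regularization restores full rank, making the
# constrained optimum unique
G_reg = G + 1e-3 * np.trace(G) / k * np.eye(k)
assert np.linalg.matrix_rank(G_reg) == k

w = np.linalg.solve(G_reg, np.ones(k))
w /= w.sum()
assert np.isclose(w.sum(), 1.0)
```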
LTSA: A Closely Related Method
Local Tangent Space Alignment (LTSA) makes the connection to local PCA explicit: it computes a tangent-space basis in each neighborhood via local PCA and then aligns the resulting local coordinates into a single global embedding. LTSA and LLE have similar asymptotic behavior but differ in how they encode local structure: LTSA represents it through explicit tangent coordinates, while LLE encodes it implicitly through reconstruction weights.
When the intrinsic dimension is known and the manifold is well-sampled, LTSA can be more accurate. When intrinsic dimension is uncertain, LLE's implicit approach is more robust.
Implementing LLE efficiently requires attention to computational complexity and memory usage.
Computational Complexity:
| Step | Naive | Optimized |
|---|---|---|
| Neighbor search | O(N²D) | O(N log N · D) with KD-tree |
| Gram matrix computation | O(Nk²D) | O(Nk²D) — already optimal |
| Solving for weights | O(Nk³) | O(Nk³) — each solve is O(k³) |
| Total weight computation | O(N²D + Nk³) | O(N log N · D + Nk³) |
For large N with moderate k, the bottleneck is neighbor finding, which can be accelerated with spatial data structures.
Memory Considerations:
• The weight matrix has only $Nk$ nonzeros; store it in sparse format (e.g., CSR) rather than as a dense $N \times N$ array
• Each per-point solve needs only the $k \times k$ Gram matrix and the $k \times D$ difference vectors
Parallelization:
Weight computation is embarrassingly parallel: each point's weights depend only on its own neighborhood, so the loop over points can be distributed across cores or machines with no communication.
```python
import warnings

import numpy as np
from scipy.sparse import csr_matrix, identity, lil_matrix
from sklearn.neighbors import NearestNeighbors


class EfficientLLEWeights:
    """
    Efficient, production-ready LLE weight computation.

    Features:
    - KD-tree for O(N log N) neighbor search
    - Sparse weight matrix storage
    - Parallel neighbor search via n_jobs
    - Robust numerical handling
    """

    def __init__(self, n_neighbors=10, reg=1e-3, n_jobs=-1):
        """
        Parameters
        ----------
        n_neighbors : int
            Number of neighbors for each point
        reg : float
            Regularization strength
        n_jobs : int
            Number of parallel jobs (-1 for all cores)
        """
        self.n_neighbors = n_neighbors
        self.reg = reg
        self.n_jobs = n_jobs

    def fit(self, X):
        """
        Compute LLE weights for dataset X of shape (N, D).
        Returns self, with the sparse weight matrix in self.W_.
        """
        N, D = X.shape
        k = self.n_neighbors

        # Step 1: Find k nearest neighbors using a spatial tree
        nn = NearestNeighbors(
            n_neighbors=k + 1,   # +1 because self is included
            algorithm='auto',    # Chooses KD-tree or ball tree
            n_jobs=self.n_jobs,
        )
        nn.fit(X)
        _, indices = nn.kneighbors(X)

        # Remove self-connections
        self.neighbor_indices_ = indices[:, 1:]  # Shape: (N, k)

        # Step 2: Compute weights (sparse storage)
        W = lil_matrix((N, N))
        for i in range(N):
            neighbors = self.neighbor_indices_[i]
            w = self._compute_weights_single(X, i, neighbors)
            W[i, neighbors] = w

        # Convert to CSR for efficient arithmetic
        self.W_ = csr_matrix(W)
        return self

    def _compute_weights_single(self, X, i, neighbors):
        """Compute weights for a single point."""
        k = len(neighbors)
        x_i = X[i]
        X_neighbors = X[neighbors]

        # Difference vectors and local Gram matrix
        eta = x_i - X_neighbors   # Shape: (k, D)
        G = eta @ eta.T           # Shape: (k, k)

        # Trace-scaled regularization
        trace_G = np.trace(G)
        reg_value = self.reg * trace_G / k if trace_G > 0 else self.reg
        G += reg_value * np.eye(k)

        # Solve for weights and normalize
        ones = np.ones(k)
        try:
            w = np.linalg.solve(G, ones)
        except np.linalg.LinAlgError:
            warnings.warn(f"Singular Gram matrix at point {i}")
            w = np.ones(k)
        return w / np.sum(w)

    def get_cost_matrix(self):
        """
        Return the cost matrix M = (I - W)^T (I - W),
        computed entirely with sparse products.
        """
        if not hasattr(self, 'W_'):
            raise ValueError("Must call fit() first")
        I_minus_W = identity(self.W_.shape[0], format='csr') - self.W_
        return I_minus_W.T @ I_minus_W

    def reconstruction_error(self, X):
        """
        Compute total reconstruction error ||X - WX||_F^2.
        """
        if not hasattr(self, 'W_'):
            raise ValueError("Must call fit() first")
        residual = X - self.W_ @ X
        return np.sum(residual ** 2)


# Demonstration
if __name__ == "__main__":
    from sklearn.datasets import make_swiss_roll

    # Generate data
    X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

    # Compute weights
    lle_weights = EfficientLLEWeights(n_neighbors=12, reg=1e-3)
    lle_weights.fit(X)

    # Analyze
    W = lle_weights.W_
    print(f"Weight matrix shape: {W.shape}")
    print(f"Non-zero entries: {W.nnz}")
    print(f"Sparsity: {1 - W.nnz / np.prod(W.shape):.4f}")
    print(f"Reconstruction error: {lle_weights.reconstruction_error(X):.4f}")
```

We have developed a complete understanding of how LLE computes reconstruction weights that encode local linear geometry.
What's Next:
With weights computed, we move to the second phase of LLE: embedding optimization. We'll see how the weights guide discovery of low-dimensional coordinates that preserve local geometry. The optimization involves solving an eigenvalue problem on the cost matrix M = (I - W)ᵀ(I - W), connecting LLE to spectral methods and providing the final embedding.
You now have a comprehensive understanding of LLE weight computation—from the optimization formulation through closed-form solution, regularization, and efficient implementation. These weights are the bridge carrying local geometric information from high-dimensional to low-dimensional space.