We now possess the reconstruction weights that encode local linear geometry—but these weights live in the original high-dimensional space. The critical question remains: How do we find low-dimensional coordinates that respect these local relationships?
This is the second phase of LLE: embedding optimization. Given weights $\mathbf{W}$ computed in high-dimensional space, we seek low-dimensional embedding coordinates $\mathbf{y}_i \in \mathbb{R}^d$ such that each point is approximately reconstructed by its neighbors using the same weights.
The mathematical structure of this problem is elegant: it reduces to an eigenvalue problem on a specific matrix constructed from the weights. This connects LLE to spectral methods and provides efficient, stable numerical solutions.
This page develops the embedding optimization in full detail, from the objective function through the eigenvalue formulation to practical algorithms.
By the end of this page, you will: • Formulate the embedding optimization problem mathematically • Derive the eigenvalue problem that provides optimal coordinates • Understand the spectral properties of the cost matrix • Implement efficient algorithms for computing the embedding • Recognize the significance of the smallest non-zero eigenvalues
With weights $\mathbf{W}$ fixed from the first phase, we seek embedding coordinates $\{\mathbf{y}_i\}_{i=1}^N$ in $\mathbb{R}^d$ that minimize the embedding cost:
$$\Phi(\mathbf{Y}) = \sum_{i=1}^N \left\| \mathbf{y}_i - \sum_{j=1}^N W_{ij} \mathbf{y}_j \right\|^2$$
The Key Insight:
The weights were chosen to minimize reconstruction error in the high-dimensional space. By finding low-dimensional coordinates $\mathbf{y}_i$ that also have small reconstruction error with the same weights, we preserve the local linear structure.
Matrix Formulation:
Let $\mathbf{Y} \in \mathbb{R}^{N \times d}$ be the matrix of embedding coordinates, with $\mathbf{y}_i^T$ as the $i$-th row. The embedding cost becomes:
$$\Phi(\mathbf{Y}) = \|(\mathbf{I} - \mathbf{W})\mathbf{Y}\|_F^2 = \text{tr}\left( \mathbf{Y}^T (\mathbf{I} - \mathbf{W})^T (\mathbf{I} - \mathbf{W}) \mathbf{Y} \right)$$
Defining the cost matrix: $$\mathbf{M} = (\mathbf{I} - \mathbf{W})^T (\mathbf{I} - \mathbf{W})$$
The objective simplifies to: $$\Phi(\mathbf{Y}) = \text{tr}(\mathbf{Y}^T \mathbf{M} \mathbf{Y})$$
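As a quick numerical check of this identity, the sketch below compares the element-wise sum and the trace form on a small example (the weight matrix here is an arbitrary row-stochastic stand-in, not real LLE weights):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 2

# Toy row-stochastic weight matrix (stand-in for real LLE weights)
W = rng.random((N, N))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

Y = rng.standard_normal((N, d))   # arbitrary candidate embedding

# Element-wise form: sum_i || y_i - sum_j W_ij y_j ||^2
cost_sum = sum(np.sum((Y[i] - W[i] @ Y) ** 2) for i in range(N))

# Trace form: tr(Y^T M Y) with M = (I - W)^T (I - W)
M = (np.eye(N) - W).T @ (np.eye(N) - W)
cost_trace = np.trace(Y.T @ M @ Y)

assert np.isclose(cost_sum, cost_trace)
```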
The cost matrix M has crucial properties: • Symmetric: M = Mᵀ (by construction) • Positive Semi-Definite: All eigenvalues λ ≥ 0 • Sparse: Inherits sparsity from W • Graph Laplacian-like: M1 = 0 (the all-ones vector is in the null space)
These properties connect LLE to spectral graph theory and ensure well-posed optimization.
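These properties are also easy to verify numerically; a minimal sketch, again using a toy row-stochastic W in place of real LLE weights:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8

# Toy row-stochastic weight matrix
W = rng.random((N, N))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

M = (np.eye(N) - W).T @ (np.eye(N) - W)

assert np.allclose(M, M.T)                        # symmetric
assert np.all(np.linalg.eigvalsh(M) >= -1e-12)    # positive semi-definite
assert np.allclose(M @ np.ones(N), 0.0)           # all-ones vector in the null space
```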
Minimizing $\text{tr}(\mathbf{Y}^T \mathbf{M} \mathbf{Y})$ without constraints yields a trivial solution: $\mathbf{Y} = \mathbf{0}$ (all points at the origin). To obtain meaningful embeddings, we impose constraints.
Constraint 1: Centering
The embedding should be centered at the origin: $$\sum_{i=1}^N \mathbf{y}_i = \mathbf{0} \quad \Leftrightarrow \quad \mathbf{Y}^T \mathbf{1} = \mathbf{0}$$
This removes the trivial translational degree of freedom.
Constraint 2: Unit Covariance
To prevent the embedding from collapsing to a point, we require: $$\frac{1}{N} \mathbf{Y}^T \mathbf{Y} = \mathbf{I}_d$$
or equivalently: $\mathbf{Y}^T \mathbf{Y} = N \mathbf{I}_d$.
This ensures the embedding has unit variance along each coordinate and the coordinates are uncorrelated.
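In code, the two constraints amount to two simple checks on a candidate embedding. This is a minimal NumPy sketch; `check_embedding_constraints` is an illustrative helper, not a library function:

```python
import numpy as np

def check_embedding_constraints(Y, atol=1e-8):
    """Return (is_centered, has_unit_covariance) for a candidate embedding Y."""
    N, d = Y.shape
    is_centered = np.allclose(Y.sum(axis=0), 0.0, atol=atol)       # Y^T 1 = 0
    has_unit_cov = np.allclose(Y.T @ Y / N, np.eye(d), atol=atol)  # (1/N) Y^T Y = I_d
    return is_centered, has_unit_cov
```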
The Constrained Problem:
$$\min_{\mathbf{Y}} \text{tr}(\mathbf{Y}^T \mathbf{M} \mathbf{Y})$$ subject to: $$\mathbf{Y}^T \mathbf{1} = \mathbf{0} \quad \text{(centering)}$$ $$\mathbf{Y}^T \mathbf{Y} = N \mathbf{I}_d \quad \text{(unit covariance)}$$
Why These Constraints Work:
The centering constraint eliminates the zero eigenvalue corresponding to uniform translation. The covariance constraint forces the solution to span a non-trivial subspace.
Without the covariance constraint, the cost can be driven toward zero simply by shrinking $\mathbf{Y}$ toward the origin, so the minimizer collapses.
With the covariance constraint, each embedding coordinate must have unit variance and the coordinates must be uncorrelated, so the optimizer is forced to pick out $d$ genuinely distinct directions of minimum cost.
Connection to Eigenvalue Problems:
This constrained optimization is equivalent to an eigenvalue problem. We seek the $d$ directions of minimum "cost" after excluding the trivial zero-eigenvalue direction.
Some implementations use the constraint Yᵀ Y = I (orthonormal columns) rather than Yᵀ Y = N·I. This is equivalent up to scaling: multiply the resulting Y by √N. The eigenvalue problem is identical; only the final scaling differs.
The constrained optimization can be solved using Lagrange multipliers, leading to an eigenvalue problem.
Step 1: Write the Lagrangian
For each embedding coordinate $\mathbf{y}^{(j)} \in \mathbb{R}^N$ (the $j$-th column of $\mathbf{Y}$), treating the columns separately and enforcing only the norm constraint (centering and orthogonality across columns are handled by the choice of eigenvectors below):
$$\mathcal{L}(\mathbf{y}^{(j)}, \lambda_j) = (\mathbf{y}^{(j)})^T \mathbf{M} \mathbf{y}^{(j)} - \lambda_j \left( (\mathbf{y}^{(j)})^T \mathbf{y}^{(j)} - N \right)$$
Step 2: First-Order Condition
$$\frac{\partial \mathcal{L}}{\partial \mathbf{y}^{(j)}} = 2\mathbf{M} \mathbf{y}^{(j)} - 2\lambda_j \mathbf{y}^{(j)} = \mathbf{0}$$
This gives the eigenvalue equation: $$\mathbf{M} \mathbf{y}^{(j)} = \lambda_j \mathbf{y}^{(j)}$$
Step 3: Selecting Eigenvectors
The objective at the solution is: $$(\mathbf{y}^{(j)})^T \mathbf{M} \mathbf{y}^{(j)} = \lambda_j (\mathbf{y}^{(j)})^T \mathbf{y}^{(j)} = N \lambda_j$$
To minimize the total cost, we select eigenvectors corresponding to the smallest eigenvalues.
Step 4: The Null Space Issue
Since $\mathbf{M}\mathbf{1} = \mathbf{0}$, the constant vector $\mathbf{1}$ is an eigenvector with eigenvalue 0. This corresponds to the trivial solution of placing all points at the same location.
The Solution:
The optimal $d$-dimensional embedding uses the eigenvectors corresponding to eigenvalues $\lambda_2 \leq \lambda_3 \leq \ldots \leq \lambda_{d+1}$, where we skip $\lambda_1 = 0$ (the constant eigenvector).
| Eigenvalue | Eigenvector | Meaning | Use |
|---|---|---|---|
| λ₁ = 0 | v₁ = 1 (constant) | Trivial translation mode | Skip (centering constraint) |
| λ₂ ≤ λ₃ ≤ ... ≤ λ_{d+1} | v₂, v₃, ..., v_{d+1} | Minimum cost embedding directions | Use as embedding coordinates |
| λ₂ = 0 (with λ₁) | Multiple zero eigenvalues | Disconnected components or singularity | Diagnostic warning |
| λ_{d+2}, ..., λ_N | Remaining eigenvectors | Higher cost directions, noise | Discard |
The Final Algorithm:
The embedding coordinates are: $$\mathbf{Y} = [\mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_{d+1}]$$
where $\mathbf{v}_i$ are the eigenvectors normalized to have $\|\mathbf{v}_i\|^2 = N$.
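As a minimal sketch of this step (dense NumPy/SciPy, assuming a precomputed weight matrix `W`; the fuller implementations later on this page add sparsity, diagnostics, and error handling):

```python
import numpy as np
from scipy.linalg import eigh

def embed_from_weights(W, d):
    """Minimal LLE embedding step: bottom non-trivial eigenvectors of M."""
    N = W.shape[0]
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    eigenvalues, eigenvectors = eigh(M)           # ascending order
    # Skip eigenvector 0 (constant, lambda_1 ~ 0); keep the next d columns
    Y = eigenvectors[:, 1:d + 1] * np.sqrt(N)     # scale so that Y^T Y = N I_d
    return Y, eigenvalues[:d + 2]
```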
If λ₂ is very small (near zero), it may indicate: • Disconnected neighborhoods (the graph is not connected) • Near-degenerate structure in the data • Numerical issues in weight computation
Always check the eigenvalue spectrum before using the embedding.
Understanding the spectral properties of $\mathbf{M}$ provides insight into LLE's behavior and helps diagnose potential issues.
Theorem: The Constant Null Space
The all-ones vector $\mathbf{1}$ is always in the null space of $\mathbf{M}$:
Proof: Since $\sum_j W_{ij} = 1$ for all $i$, we have $\mathbf{W}\mathbf{1} = \mathbf{1}$. Therefore: $$(\mathbf{I} - \mathbf{W})\mathbf{1} = \mathbf{1} - \mathbf{W}\mathbf{1} = \mathbf{1} - \mathbf{1} = \mathbf{0}$$ Thus $\mathbf{M}\mathbf{1} = (\mathbf{I} - \mathbf{W})^T(\mathbf{I} - \mathbf{W})\mathbf{1} = \mathbf{0}$.
Positive Semi-Definiteness:
$\mathbf{M} = (\mathbf{I} - \mathbf{W})^T(\mathbf{I} - \mathbf{W})$ is always positive semi-definite: $$\mathbf{v}^T \mathbf{M} \mathbf{v} = \|(\mathbf{I} - \mathbf{W})\mathbf{v}\|^2 \geq 0$$
Rank and Null Space:
The dimension of the null space of $\mathbf{M}$ equals the dimension of the null space of $(\mathbf{I} - \mathbf{W})$. For well-constructed weights with a connected neighborhood graph, the null space is one-dimensional (spanned by $\mathbf{1}$).
If there are $c$ connected components in the neighborhood graph, the null space has dimension $c$ (one constant vector per component).
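A small example makes this concrete: the block-diagonal weight matrix below encodes two disconnected groups of three points each (purely illustrative), and the resulting M has a two-dimensional null space.

```python
import numpy as np

# Two disconnected components: block-diagonal, row-stochastic W
W_block = np.array([
    [0.0, 0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.5, 0.0],
])

M = (np.eye(6) - W_block).T @ (np.eye(6) - W_block)
null_dim = np.sum(np.linalg.eigvalsh(M) < 1e-10)
print(null_dim)  # 2: one zero eigenvalue per connected component
```

The diagnostic routines below apply the same spectrum checks to a full LLE weight matrix.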
```python
import numpy as np
from scipy.sparse.linalg import eigsh
from scipy.linalg import eigh
import matplotlib.pyplot as plt


def analyze_lle_spectrum(W, n_eigenvalues=20):
    """
    Analyze the spectrum of the LLE cost matrix M = (I - W)^T (I - W).

    Parameters:
    -----------
    W : ndarray of shape (N, N)
        LLE weight matrix (sparse or dense)
    n_eigenvalues : int
        Number of smallest eigenvalues to compute

    Returns:
    --------
    eigenvalues : array
        Sorted eigenvalues
    eigenvectors : array
        Corresponding eigenvectors
    analysis : dict
        Diagnostic information
    """
    N = W.shape[0]

    # Compute I - W
    if hasattr(W, 'toarray'):
        I_minus_W = np.eye(N) - W.toarray()
    else:
        I_minus_W = np.eye(N) - W

    # Cost matrix
    M = I_minus_W.T @ I_minus_W

    # Compute eigenvalues
    if N < 500:
        # Full eigendecomposition for small matrices
        eigenvalues, eigenvectors = eigh(M)
    else:
        # Sparse eigenvalue solver for larger matrices
        # Request smallest eigenvalues
        eigenvalues, eigenvectors = eigsh(
            M,
            k=min(n_eigenvalues, N - 1),
            which='SM',   # Smallest magnitude
            tol=1e-10
        )

    # Sort by eigenvalue
    idx = np.argsort(eigenvalues)
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Diagnostic analysis
    analysis = {
        'smallest_eigenvalue': eigenvalues[0],
        'second_smallest': eigenvalues[1] if len(eigenvalues) > 1 else None,
        'eigenvalue_gap': eigenvalues[1] - eigenvalues[0] if len(eigenvalues) > 1 else None,
        'num_near_zero': np.sum(eigenvalues < 1e-10),
        'spectral_gap_ratio': eigenvalues[1] / eigenvalues[2]
            if len(eigenvalues) > 2 and eigenvalues[2] > 0 else None,
    }

    # Check null space dimension
    null_dim = np.sum(np.abs(eigenvalues) < 1e-10)
    analysis['null_space_dim'] = null_dim
    if null_dim > 1:
        analysis['warning'] = f"Null space dimension {null_dim} > 1: graph may be disconnected"

    return eigenvalues, eigenvectors, analysis


def plot_eigenvalue_spectrum(eigenvalues, title="LLE Eigenvalue Spectrum"):
    """
    Visualize the eigenvalue spectrum.
    """
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))

    # Linear scale
    axes[0].plot(eigenvalues[:min(50, len(eigenvalues))], 'o-')
    axes[0].axhline(y=0, color='r', linestyle='--', alpha=0.5)
    axes[0].set_xlabel('Eigenvalue Index')
    axes[0].set_ylabel('Eigenvalue')
    axes[0].set_title(f'{title} (Linear Scale)')
    axes[0].grid(True, alpha=0.3)

    # Log scale (for positive eigenvalues)
    positive_eigs = eigenvalues[eigenvalues > 1e-15]
    axes[1].semilogy(positive_eigs[:min(50, len(positive_eigs))], 'o-')
    axes[1].set_xlabel('Eigenvalue Index (positive only)')
    axes[1].set_ylabel('Eigenvalue (log scale)')
    axes[1].set_title(f'{title} (Log Scale)')
    axes[1].grid(True, alpha=0.3)

    plt.tight_layout()
    return fig


def check_embedding_quality(Y, W):
    """
    Check how well the embedding preserves local structure.

    Returns the embedding cost and per-point errors.
    """
    N = Y.shape[0]

    # Reconstruction in embedded space
    Y_reconstructed = W @ Y

    # Per-point reconstruction error
    errors = np.linalg.norm(Y - Y_reconstructed, axis=1) ** 2

    # Total cost
    total_cost = np.sum(errors)

    return {
        'total_cost': total_cost,
        'mean_error': np.mean(errors),
        'max_error': np.max(errors),
        'errors': errors,
    }
```

The gap between λ₁ = 0 and λ₂ (the "spectral gap") indicates how well the embedding separates trivial from meaningful structure. A large spectral gap suggests a stable, well-defined embedding. A very small gap may indicate near-degeneracy or disconnected components.
Computing the full eigendecomposition of an N×N matrix costs O(N³)—prohibitive for large datasets. Fortunately, we only need the d+1 smallest eigenvalues, enabling efficient iterative methods.
The ARPACK Approach:
The ARPACK library (used by scipy.sparse.linalg.eigsh) implements the Implicitly Restarted Lanczos Method, an iterative Krylov-subspace algorithm that requires only matrix-vector products with $\mathbf{M}$.
Challenge: Smallest vs. Largest Eigenvalues
Krylov-based eigensolvers converge fastest to the eigenvalues of largest magnitude. Finding the smallest eigenvalues therefore requires one of two strategies:
Shift-Invert Mode: iterate with $(\mathbf{M} - \sigma\mathbf{I})^{-1}$ (via a sparse factorization) so that eigenvalues near $\sigma = 0$ become the largest and converge quickly
Working with M Directly: the 'SM' (smallest magnitude) mode, which avoids factorization but typically converges more slowly
Practical Recommendation:
For most applications (N < 10,000), shift-invert with σ = 0 works well. For very large N, consider approximate methods like randomized eigendecomposition.
```python
import numpy as np
from scipy.sparse import csr_matrix, eye as sparse_eye
from scipy.sparse.linalg import eigsh
from scipy.linalg import eigh


def compute_lle_embedding(W, n_components=2, method='arpack'):
    """
    Compute the LLE embedding from the weight matrix.

    Parameters:
    -----------
    W : sparse matrix or ndarray of shape (N, N)
        LLE weight matrix
    n_components : int
        Target dimensionality
    method : str
        'arpack' for sparse solver, 'dense' for full decomposition

    Returns:
    --------
    Y : ndarray of shape (N, n_components)
        Embedding coordinates
    eigenvalues : ndarray
        Eigenvalues (for diagnostics)
    """
    N = W.shape[0]

    # Convert to sparse if needed
    if not hasattr(W, 'toarray'):
        W = csr_matrix(W)

    # Construct I - W
    I = sparse_eye(N, format='csr')
    I_minus_W = I - W

    # Method 1: Dense (for small N)
    if method == 'dense' or N < 500:
        M = (I_minus_W.T @ I_minus_W).toarray()
        eigenvalues, eigenvectors = eigh(M)

    # Method 2: Sparse ARPACK with shift-invert
    elif method == 'arpack':
        # Keep M sparse for memory efficiency
        M = (I_minus_W.T @ I_minus_W).tocsc()

        try:
            # Shift-invert mode finds eigenvalues near sigma
            eigenvalues, eigenvectors = eigsh(
                M,
                k=n_components + 1,
                sigma=0.0,      # Shift near 0
                which='LM',     # Largest magnitude after shift-invert
                tol=1e-10,
            )
        except Exception:
            # Fallback to SM mode (slower, but avoids factorization)
            eigenvalues, eigenvectors = eigsh(
                M,
                k=n_components + 1,
                which='SM',
                tol=1e-10,
            )

    # Sort by eigenvalue
    idx = np.argsort(eigenvalues)
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Skip the first eigenvector (constant, λ ≈ 0)
    # Use eigenvectors 1 through n_components as embedding
    Y = eigenvectors[:, 1:n_components + 1]

    # Rescale to have unit variance per dimension
    Y = Y * np.sqrt(N)

    return Y, eigenvalues[:n_components + 2]


def compute_lle_embedding_iterative(W, n_components=2, tol=1e-8, max_iter=1000):
    """
    Compute the LLE embedding using inverse iteration with deflation.

    This is useful when scipy.sparse is unavailable or for educational
    purposes to understand the algorithm; use scipy.linalg.eigh or the
    ARPACK solver above in practice.
    """
    N = W.shape[0]

    # Convert to dense for simplicity
    if hasattr(W, 'toarray'):
        W = W.toarray()

    # Compute M = (I - W)^T (I - W)
    I_minus_W = np.eye(N) - W
    M = I_minus_W.T @ I_minus_W

    # A small shift keeps the linear solves well-posed (M itself is singular)
    M_shifted = M + 1e-8 * np.eye(N)

    eigenvectors = []
    eigenvalues = []

    # Find the n_components + 1 smallest eigenvalues by inverse iteration:
    # repeatedly solving M_shifted v_new = v amplifies the directions with
    # the smallest eigenvalues of M.
    for _ in range(n_components + 1):
        # Initialize a random vector, orthogonal to eigenvectors already found
        v = np.random.randn(N)
        for ev in eigenvectors:
            v = v - np.dot(v, ev) * ev
        v = v / np.linalg.norm(v)

        for _ in range(max_iter):
            v_new = np.linalg.solve(M_shifted, v)

            # Deflate: stay orthogonal to previously found eigenvectors
            for ev in eigenvectors:
                v_new = v_new - np.dot(v_new, ev) * ev

            # Normalize
            norm = np.linalg.norm(v_new)
            if norm < 1e-15:
                break
            v_new = v_new / norm

            # Check convergence (eigenvectors are defined only up to sign)
            if min(np.linalg.norm(v_new - v), np.linalg.norm(v_new + v)) < tol:
                v = v_new
                break
            v = v_new

        # Rayleigh quotient gives the corresponding eigenvalue of M
        eigenvalues.append(v @ M @ v)
        eigenvectors.append(v)

    # Stack and sort
    eigenvectors = np.column_stack(eigenvectors)
    eigenvalues = np.array(eigenvalues)
    idx = np.argsort(eigenvalues)
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Skip the first (constant) eigenvector
    Y = eigenvectors[:, 1:n_components + 1] * np.sqrt(N)

    return Y, eigenvalues
```

We now synthesize the weight computation and embedding optimization into the complete LLE algorithm.
Algorithm: Locally Linear Embedding
Input: Data matrix $\mathbf{X} \in \mathbb{R}^{N \times D}$, number of neighbors $k$, target dimension $d$, regularization $\alpha$
Output: Embedding $\mathbf{Y} \in \mathbb{R}^{N \times d}$
Phase 1: Compute Reconstruction Weights
For each point $\mathbf{x}_i$: find its $k$ nearest neighbors, then solve the regularized, sum-to-one constrained least-squares problem from the previous page for the weights $W_{ij}$ (setting $W_{ij} = 0$ for non-neighbors).
Phase 2: Compute Embedding Coordinates
Form cost matrix: $\mathbf{M} = (\mathbf{I} - \mathbf{W})^T(\mathbf{I} - \mathbf{W})$
Compute $d+1$ smallest eigenvalues/eigenvectors of $\mathbf{M}$
Discard eigenvector with eigenvalue $\lambda_1 \approx 0$
Set embedding: $\mathbf{Y} = [\mathbf{v}_2, \ldots, \mathbf{v}_{d+1}] \cdot \sqrt{N}$
Return: $\mathbf{Y}$
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import lil_matrix, csr_matrix, eye as sparse_eye
from scipy.linalg import eigh


class LocallyLinearEmbedding:
    """
    Complete implementation of Locally Linear Embedding.

    This is an educational implementation that closely follows the
    algorithm as described. For production use, consider
    sklearn.manifold.LocallyLinearEmbedding.
    """

    def __init__(self, n_neighbors=10, n_components=2, reg=1e-3):
        """
        Parameters:
        -----------
        n_neighbors : int
            Number of neighbors for each point
        n_components : int
            Target dimensionality
        reg : float
            Regularization strength
        """
        self.n_neighbors = n_neighbors
        self.n_components = n_components
        self.reg = reg

    def fit_transform(self, X):
        """
        Compute the LLE embedding.

        Parameters:
        -----------
        X : ndarray of shape (N, D)
            Input data

        Returns:
        --------
        Y : ndarray of shape (N, n_components)
            Embedding
        """
        N, D = X.shape
        k = self.n_neighbors

        # ==========================================
        # PHASE 1: Compute Reconstruction Weights
        # ==========================================

        # Step 1.1: Find k nearest neighbors
        nn = NearestNeighbors(n_neighbors=k + 1, algorithm='auto')
        nn.fit(X)
        _, neighbor_indices = nn.kneighbors(X)
        neighbor_indices = neighbor_indices[:, 1:]  # Remove self

        # Step 1.2: Compute weights for each point
        W = lil_matrix((N, N))

        for i in range(N):
            neighbors = neighbor_indices[i]
            x_i = X[i]
            X_neighbors = X[neighbors]

            # Difference vectors
            eta = x_i - X_neighbors     # Shape: (k, D)

            # Local Gram matrix
            G = eta @ eta.T             # Shape: (k, k)

            # Regularization
            trace = np.trace(G)
            if trace > 0:
                G += self.reg * trace / k * np.eye(k)
            else:
                G += self.reg * np.eye(k)

            # Solve for weights
            ones = np.ones(k)
            try:
                w = np.linalg.solve(G, ones)
                w = w / np.sum(w)
            except np.linalg.LinAlgError:
                w = np.ones(k) / k

            W[i, neighbors] = w

        W = csr_matrix(W)
        self.W_ = W

        # ==========================================
        # PHASE 2: Compute Embedding Coordinates
        # ==========================================

        # Step 2.1: Form cost matrix M = (I - W)^T (I - W)
        I = sparse_eye(N, format='csr')
        I_minus_W = I - W
        M = (I_minus_W.T @ I_minus_W).toarray()

        # Enforce exact symmetry
        M = (M + M.T) / 2

        # Step 2.2: Eigendecomposition
        eigenvalues, eigenvectors = eigh(M)

        # Step 2.3: Select embedding coordinates
        # Skip first eigenvector (λ ≈ 0, corresponds to constant)
        idx = np.argsort(eigenvalues)
        eigenvalues = eigenvalues[idx]
        eigenvectors = eigenvectors[:, idx]

        self.eigenvalues_ = eigenvalues

        # Use eigenvectors 1 through n_components
        Y = eigenvectors[:, 1:self.n_components + 1]

        # Scale to unit variance
        Y = Y * np.sqrt(N)

        self.embedding_ = Y
        return Y

    def reconstruction_error(self, X):
        """Compute reconstruction error in original space."""
        return np.sum((X - self.W_ @ X) ** 2)

    def embedding_cost(self):
        """Compute embedding cost Tr(Y^T M Y)."""
        if not hasattr(self, 'embedding_'):
            raise ValueError("Must call fit_transform first")
        I_minus_W = sparse_eye(self.W_.shape[0]) - self.W_
        residual = I_minus_W @ self.embedding_
        return np.sum(residual ** 2)


# Example usage
if __name__ == "__main__":
    from sklearn.datasets import make_swiss_roll
    import matplotlib.pyplot as plt

    # Generate Swiss Roll
    X, color = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

    # Apply LLE
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
    Y = lle.fit_transform(X)

    # Visualize
    fig = plt.figure(figsize=(12, 5))

    # Original 3D
    ax1 = fig.add_subplot(121, projection='3d')
    ax1.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap='viridis', s=10)
    ax1.set_title('Original Swiss Roll (3D)')

    # LLE embedding
    ax2 = fig.add_subplot(122)
    ax2.scatter(Y[:, 0], Y[:, 1], c=color, cmap='viridis', s=10)
    ax2.set_title('LLE Embedding (2D)')
    ax2.set_xlabel('LLE 1')
    ax2.set_ylabel('LLE 2')

    plt.tight_layout()
    plt.show()

    # Diagnostics
    print(f"Reconstruction error: {lle.reconstruction_error(X):.4f}")
    print(f"Embedding cost: {lle.embedding_cost():.4f}")
    print(f"First 5 eigenvalues: {lle.eigenvalues_[:5]}")
```

What guarantees does LLE provide? Under what conditions does it recover the true manifold structure?
Asymptotic Consistency:
Under certain regularity conditions, LLE is asymptotically consistent: as the number of samples $N \to \infty$ with appropriate scaling of the neighborhood size, the embedding converges to the true low-dimensional parameterization (up to affine transformation).
Key Assumptions for Consistency: informally, the data must be sampled sufficiently densely from a smooth, low-dimensional manifold; the neighborhood size must grow with $N$ while neighborhood radii shrink; and the reconstruction weights must be computed stably (with appropriate regularization).
The Isometry Property:
LLE approximately preserves local isometry: if two points are close on the manifold, their embedded positions are close. However, LLE does not preserve global distances—points far apart on the manifold may be close in the embedding if they are Euclidean neighbors.
Comparison with Isomap:
| Property | LLE | Isomap |
|---|---|---|
| Preserves | Local linear structure | Geodesic distances |
| Global geometry | Not explicitly preserved | Approximately preserved |
| Distortion | Local neighborhoods preserved | Isometric to flat space |
| Assumption | Local linearity | Manifold is isometric to ℝᵈ |
The Embedding Quality:
Define the embedding cost: $$\Phi(\mathbf{Y}) = \sum_{i=1}^N \left\| \mathbf{y}_i - \sum_j W_{ij} \mathbf{y}_j \right\|^2$$
For the optimal embedding built from unit-norm eigenvectors of $\mathbf{M}$: $$\Phi^* = \sum_{j=2}^{d+1} \lambda_j$$ (with the $\sqrt{N}$ scaling used above, the realized cost is $N$ times this sum).
The embedding cost equals the sum of the $d$ smallest non-zero eigenvalues. This provides a measure of how well the embedding preserves local structure.
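This identity can be verified numerically. The sketch below builds a toy row-stochastic weight matrix over random "neighborhoods" (illustrative only, not a real LLE weight solve), computes the embedding as above, and checks that the realized cost matches the eigenvalue sum (times N, because of the sqrt(N) scaling):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
N, d, k = 40, 2, 5

# Toy row-stochastic weights over random "neighborhoods"
W = np.zeros((N, N))
for i in range(N):
    nbrs = rng.choice([j for j in range(N) if j != i], size=k, replace=False)
    W[i, nbrs] = 1.0 / k

M = (np.eye(N) - W).T @ (np.eye(N) - W)
eigenvalues, eigenvectors = eigh(M)

Y = eigenvectors[:, 1:d + 1] * np.sqrt(N)    # sqrt(N)-scaled embedding
cost = np.sum((Y - W @ Y) ** 2)              # Phi(Y)

# With this scaling, Phi(Y) = N * (lambda_2 + ... + lambda_{d+1})
assert np.isclose(cost, N * np.sum(eigenvalues[1:d + 1]))
```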
Interpreting Eigenvalues: each eigenvalue $\lambda_j$ measures the reconstruction cost contributed by the corresponding embedding coordinate. Small eigenvalues indicate directions along which the fixed weights are reproduced almost exactly; larger eigenvalues mark directions where the local structure cannot be reproduced as faithfully.
Selecting Dimensionality:
The eigenvalue spectrum can guide dimensionality selection. Look for a "gap" or "elbow" in the eigenvalue plot—the intrinsic dimensionality is suggested by where eigenvalues transition from small to large.
One approach to selecting d: plot the cumulative sum of eigenvalues ∑λⱼ for j = 2, ..., d+1 as a function of d. When this "residual variance" levels off, additional dimensions add little new structure. This is analogous to the scree plot in PCA.
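A sketch of this diagnostic (assuming an ascending eigenvalue array, as returned by the routines above; where the elbow lies remains a judgment call):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_dimension_diagnostics(eigenvalues, max_d=10):
    """Per-coordinate eigenvalues and cumulative embedding cost for d = 1..max_d."""
    dims = np.arange(1, max_d + 1)
    per_dim = eigenvalues[1:max_d + 1]    # lambda_2 .. lambda_{max_d+1}
    cumulative = np.cumsum(per_dim)       # cost of a d-dimensional embedding

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].plot(dims, per_dim, 'o-')
    axes[0].set_xlabel('Coordinate index d')
    axes[0].set_ylabel('Eigenvalue $\\lambda_{d+1}$')
    axes[0].set_title('Per-coordinate cost')

    axes[1].plot(dims, cumulative, 'o-')
    axes[1].set_xlabel('Embedding dimension d')
    axes[1].set_ylabel('Cumulative cost')
    axes[1].set_title('Residual cost vs. dimensionality')

    plt.tight_layout()
    return fig
```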
We have developed the complete theory and practice of LLE embedding optimization.
What's Next:
In the next page, we explore LLE variants—modifications and extensions that address limitations of the basic algorithm. These include Hessian LLE (which provides better theoretical guarantees), Modified LLE (for handling non-convex neighborhoods), and others that expand the applicability and robustness of the LLE family.
You now understand how LLE transforms reconstruction weights into low-dimensional coordinates through eigenvalue optimization. This spectral approach connects LLE to the broader family of spectral methods in machine learning.