Standard LLE provides an elegant approach to nonlinear dimensionality reduction, but it has limitations. The algorithm can produce distorted embeddings in certain geometric configurations, struggles with noisy data, and lacks theoretical guarantees on the quality of the recovered parameterization.
Over the years, researchers have developed numerous LLE variants that address these shortcomings while preserving the core philosophy of local linear reconstruction. These variants offer improved theoretical properties, better robustness, and specialized capabilities for different data characteristics.
This page surveys the most important LLE variants: Hessian LLE (HLLE), Modified LLE (MLLE), Local Tangent Space Alignment (LTSA), and other extensions. Understanding these variants allows you to select the most appropriate algorithm for your specific manifold learning task.
By the end of this page, you will:
• Understand the limitations of standard LLE that motivated variants
• Master Hessian LLE and its improved theoretical guarantees
• Apply Modified LLE for multiple weight solutions
• Use LTSA for explicit tangent space alignment
• Select the appropriate variant for different scenarios
Before exploring variants, we must understand what problems they solve. Standard LLE has several important limitations:
Limitation 1: Ill-Posed Reconstruction (When k > D)
When the number of neighbors $k$ exceeds the input dimension $D$, the local Gram matrix is rank-deficient, requiring regularization. This regularization introduces bias and can distort the embedding.
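The rank deficiency is easy to see numerically. The toy sketch below (made-up data, illustrative values) shows that the local Gram matrix built from $k = 10$ neighbors in $D = 3$ dimensions has rank at most 3, and how the usual ridge term makes the weight system solvable at the cost of some bias.

```python
import numpy as np

# Toy illustration: with k = 10 neighbors in D = 3 dimensions, the local
# Gram matrix C has rank at most D < k, so C w = 1 is singular without
# regularization.
rng = np.random.default_rng(0)
D, k = 3, 10
x_i = rng.normal(size=D)
neighbors = x_i + 0.1 * rng.normal(size=(k, D))

eta = x_i - neighbors                 # (k, D) displacement vectors
C = eta @ eta.T                       # (k, k) local Gram matrix
print(np.linalg.matrix_rank(C))       # at most D = 3, far below k = 10

# Standard fix: add a small ridge before solving, at the cost of some bias
C_reg = C + 1e-3 * np.trace(C) / k * np.eye(k)
w = np.linalg.solve(C_reg, np.ones(k))
w /= w.sum()
```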
Limitation 2: Sensitivity to Neighborhood Selection
LLE is sensitive to how neighbors are chosen. If a neighborhood spans high curvature or includes points from different manifold "branches," the linear approximation fails.
Limitation 3: Non-Convex Neighborhoods
When a point lies on the boundary of a folded manifold, its Euclidean neighbors may not form a convex set in manifold coordinates. LLE's barycentric reconstruction assumes (implicitly) that the point can be represented as a convex combination.
Limitation 4: Lack of Isometry Guarantees
Standard LLE preserves local reconstruction weights but does not explicitly preserve geometric quantities like distances or angles. The embedding may distort these properties.
Limitation 5: Mapping to d ≠ Intrinsic Dimension
If we embed into dimension $d$ different from the true intrinsic dimension, LLE can produce collapsed or split embeddings.
| Failure Mode | Symptom | Cause | Variant Solution |
|---|---|---|---|
| Collapsed embedding | All points map near same location | d > intrinsic dim, or disconnected graph | Check eigenvalue spectrum |
| Distorted unfolding | Swiss roll 'twists' in embedding | k too large, curvature issues | HLLE, smaller k |
| Spurious connections | Distant manifold points merge | Short-circuits in neighborhood graph | Smaller k, better sampling |
| Boundary effects | Edge points distorted | Non-convex local geometry | MLLE handles multiple solutions |
| Unstable results | Different runs give different embeddings | Numerical sensitivity, regularization | LTSA more stable |
Signs that standard LLE is failing:
• Very small second eigenvalue (near zero)
• Many negative weights
• Embedding looks "crumpled" or distorted
• Results change dramatically with small parameter changes
• Reconstruction error is high relative to data variance
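A quick way to check some of these symptoms is to fit scikit-learn's `LocallyLinearEmbedding` at a few neighborhood sizes and inspect its `reconstruction_error_` attribute together with the spread of the embedding. The values of `k` below are illustrative, not a recommendation.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

# Compare reconstruction error and embedding spread across neighborhood sizes.
for k in (5, 12, 30):
    lle = LocallyLinearEmbedding(n_neighbors=k, n_components=2, method='standard')
    Y = lle.fit_transform(X)
    # A collapsed embedding shows up as near-zero variance along a component.
    print(f"k={k}: reconstruction_error={lle.reconstruction_error_:.2e}, "
          f"component std={Y.std(axis=0)}")
```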
Hessian Locally Linear Embedding (HLLE) was introduced by Donoho and Grimes (2003) to address a fundamental limitation of standard LLE: the lack of guaranteed isometry.
The Key Insight:
While standard LLE preserves local reconstruction weights, it doesn't explicitly preserve the manifold's intrinsic geometry. HLLE instead minimizes a functional based on the Hessian of the mapping from manifold to embedding—ensuring that the mapping locally preserves the manifold's Riemannian structure.
Mathematical Foundation:
For a smooth embedding $f: \mathcal{M} \to \mathbb{R}^d$ of the manifold $\mathcal{M}$, the Hessian measures how the embedding curves:
$$H_f(\mathbf{x}) = \nabla^2 f(\mathbf{x})$$
A "flat" embedding (isometry to Euclidean space) has zero Hessian everywhere. HLLE seeks an embedding that minimizes the integrated squared Hessian.
The HLLE Functional:
$$\mathcal{E}_{HLLE} = \int_\mathcal{M} \|H_f\|_F^2 \, d\mu$$
In practice, this is estimated from the data using local PCA to estimate the tangent space at each point, then constructing discrete Hessian estimators.
Algorithm Outline:
1. Compute neighborhoods: Find $k$ nearest neighbors for each point
2. Estimate tangent spaces: Perform local PCA in each neighborhood to find the $d$-dimensional tangent basis
3. Construct local Hessian estimators: Build matrices $\mathbf{H}_i$ that estimate the Hessian at each point using a Taylor expansion in tangent coordinates
4. Form global quadratic form: Assemble the local estimators into a global matrix $\mathbf{M}_{HLLE}$
5. Eigen-decomposition: Find the $d$ smallest non-trivial eigenvectors of $\mathbf{M}_{HLLE}$
Local Hessian Construction:
For point $i$ with tangent basis $\mathbf{T}_i \in \mathbb{R}^{D \times d}$, project neighbors onto tangent coordinates: $$\mathbf{t}_j = \mathbf{T}_i^T (\mathbf{x}_j - \mathbf{x}_i)$$
Construct the local design matrix including constant, linear, and quadratic terms: $$\mathbf{Z}_i = [\mathbf{1}, \mathbf{t}, \text{vec}(\mathbf{t}\mathbf{t}^T)]$$
The orthogonal complement of the column space of $\mathbf{Z}_i$ (obtained from a full QR decomposition) provides the Hessian estimator $\mathbf{H}_i$.
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.linalg import eigh, svd


def hessian_lle(X, n_neighbors=10, n_components=2):
    """
    Hessian Locally Linear Embedding (HLLE).

    Parameters:
    -----------
    X : ndarray of shape (N, D)
        Input data
    n_neighbors : int
        Number of neighbors (must be >= 1 + d + d*(d+1)/2)
    n_components : int
        Target dimensionality (d)

    Returns:
    --------
    Y : ndarray of shape (N, n_components)
        Embedding coordinates
    """
    N, D = X.shape
    d = n_components
    k = n_neighbors

    # Minimum neighbors for HLLE: 1 + d + d(d+1)/2
    dp = 1 + d + d * (d + 1) // 2
    if k < dp:
        raise ValueError(f"n_neighbors must be >= {dp} for d={d} components")

    # Find neighbors
    nn = NearestNeighbors(n_neighbors=k + 1)
    nn.fit(X)
    _, indices = nn.kneighbors(X)
    indices = indices[:, 1:]  # Remove self

    # Build the cost matrix
    M = np.zeros((N, N))

    for i in range(N):
        neighbor_idx = indices[i]

        # Center the neighborhood
        X_nbrs = X[neighbor_idx] - X[i]

        # SVD to get tangent space
        U, s, Vt = svd(X_nbrs, full_matrices=False)

        # Use top d right singular vectors as tangent basis
        tangent = Vt[:d].T  # Shape: (D, d)

        # Project onto tangent coordinates
        nbrs_tangent = X_nbrs @ tangent  # Shape: (k, d)

        # Build design matrix for the Hessian:
        # constant (1), linear terms (t), quadratic terms (t_p * t_q)

        # Constant term
        ones = np.ones((k, 1))

        # Linear terms
        linear = nbrs_tangent  # Shape: (k, d)

        # Quadratic terms: t_1^2, t_1*t_2, ..., t_d^2
        # Number of unique quadratic terms: d*(d+1)/2
        n_quad = d * (d + 1) // 2
        quadratic = np.zeros((k, n_quad))
        idx = 0
        for p in range(d):
            for q in range(p, d):
                quadratic[:, idx] = nbrs_tangent[:, p] * nbrs_tangent[:, q]
                idx += 1

        # Full design matrix
        Z = np.hstack([ones, linear, quadratic])  # Shape: (k, 1+d+n_quad)

        # QR decomposition to find the orthogonal complement of col(Z)
        Q, R = np.linalg.qr(Z, mode='complete')

        # The last columns of Q span the complement of the fitted
        # constant/linear/quadratic terms: these are the Hessian estimators
        n_null = k - (1 + d + n_quad)
        if n_null <= 0:
            # Not enough neighbors for a Hessian estimate
            continue

        W = Q[:, -n_null:]  # Null space basis

        # Local contribution to M
        H = W @ W.T  # Local Hessian estimator

        # Add to global matrix
        ix = np.ix_(neighbor_idx, neighbor_idx)
        M[ix] += H

    # Symmetrize
    M = (M + M.T) / 2

    # Eigendecomposition
    eigenvalues, eigenvectors = eigh(M)

    # Sort by eigenvalue
    idx = np.argsort(eigenvalues)
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # The null space of M has dimension d+1 (constants plus linear functions).
    # Discard the constant eigenvector and take the next d as the embedding.
    Y = eigenvectors[:, 1:d + 1] * np.sqrt(N)

    return Y


# Comparison demonstration
if __name__ == "__main__":
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding
    import matplotlib.pyplot as plt

    # Generate data
    X, color = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

    # Standard LLE
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method='standard')
    Y_lle = lle.fit_transform(X)

    # HLLE from sklearn
    hlle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method='hessian')
    Y_hlle = hlle.fit_transform(X)

    # Plot comparison
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    axes[0].scatter(Y_lle[:, 0], Y_lle[:, 1], c=color, cmap='viridis', s=10)
    axes[0].set_title('Standard LLE')
    axes[1].scatter(Y_hlle[:, 0], Y_hlle[:, 1], c=color, cmap='viridis', s=10)
    axes[1].set_title('Hessian LLE')
    plt.tight_layout()
    plt.show()
```

HLLE is preferred when:
• Theoretical guarantees on isometry are needed
• The manifold is (locally) isometric to an open subset of Euclidean space
• Standard LLE produces distorted unfoldings
• You need better preservation of local geometry (angles, areas)
However, HLLE requires more neighbors (k ≥ 1 + d + d(d+1)/2) and is more computationally expensive.
Modified Locally Linear Embedding (MLLE) addresses the case when the reconstruction weights are not unique. This happens when $k > d$ (more neighbors than intrinsic dimension)—the weight solution becomes underdetermined.
The Problem with Non-Unique Weights:
In standard LLE with regularization, we pick one particular solution from the space of valid weights. But different choices lead to different embeddings. MLLE explicitly uses the entire space of valid weight vectors to construct a more robust embedding.
Key Insight:
When the local Gram matrix is rank-deficient, there's a subspace of weight vectors that achieve zero reconstruction error. MLLE constructs an embedding that respects all these valid weight vectors, not just a regularized single solution.
Mathematical Formulation:
For point $i$ with neighbors, let $\mathbf{S}_i$ denote the subspace of weight vectors that perfectly reconstruct $\mathbf{x}_i$. MLLE constructs local matrices $\mathbf{M}_i$ that encode this entire subspace:
$$\mathbf{M}_i = \mathbf{I} - \mathbf{V}_i \mathbf{V}_i^T$$
where $\mathbf{V}_i$ is the orthonormal basis for the right singular vectors of the local reconstruction problem with non-zero singular values.
The global cost matrix is: $$\mathbf{M}_{MLLE} = \sum_i \mathbf{M}_i^{\text{(expanded)}}$$
where "expanded" means extending the local matrix to the full N×N dimensions.
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.linalg import eigh, svd


def modified_lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """
    Modified Locally Linear Embedding (MLLE), simplified sketch.

    Handles the case when weights are not unique by considering
    the full space of valid weight vectors.

    Parameters:
    -----------
    X : ndarray of shape (N, D)
        Input data
    n_neighbors : int
        Number of neighbors
    n_components : int
        Target dimensionality
    reg : float
        Regularization for numerical stability

    Returns:
    --------
    Y : ndarray of shape (N, n_components)
        Embedding coordinates
    """
    N, D = X.shape
    d = n_components
    k = n_neighbors

    # Find neighbors
    nn = NearestNeighbors(n_neighbors=k + 1)
    nn.fit(X)
    _, indices = nn.kneighbors(X)
    indices = indices[:, 1:]  # Remove self

    # Build cost matrix
    M = np.zeros((N, N))

    for i in range(N):
        neighbor_idx = indices[i]

        # Center at x_i
        X_nbrs = X[neighbor_idx] - X[i]

        # SVD of the neighborhood; for X_nbrs = U S V^T, the weight vectors
        # live in the column space of U (they combine the k rows of X_nbrs)
        U, s, Vt = svd(X_nbrs, full_matrices=True)

        # Determine effective rank (number of significant singular values)
        # This estimates the intrinsic dimension of the local neighborhood
        tol = max(X_nbrs.shape) * np.finfo(float).eps * s[0]
        rank = int(np.sum(s > tol))

        # If rank < k, we have a subspace of valid weights
        if rank < k:
            # Weight directions with (near-)zero singular values achieve zero
            # reconstruction error; MLLE uses this whole subspace rather than
            # one regularized solution
            U_null = U[:, rank:]  # Shape: (k, k - rank)

            # A reference weight vector from the regularized Gram matrix,
            # as in standard LLE
            G = X_nbrs @ X_nbrs.T  # Shape: (k, k)
            G += reg * np.trace(G) / k * np.eye(k)
            ones = np.ones(k)
            w = np.linalg.solve(G, ones)
            w = w / np.sum(w)

            # Local cost matrix: project onto the orthogonal complement of
            # the principal weight directions (at least d of them)
            U_valid = U[:, :max(rank, d)]
            H = np.eye(k) - U_valid @ U_valid.T
        else:
            # Full rank: unique weight solution
            G = X_nbrs @ X_nbrs.T
            G += reg * np.trace(G) / k * np.eye(k)
            ones = np.ones(k)
            w = np.linalg.solve(G, ones)
            w = w / np.sum(w)

            # Local cost matrix from weights
            w = w.reshape(-1, 1)
            H = w @ w.T

        # Add to global matrix
        ix = np.ix_(neighbor_idx, neighbor_idx)
        M[ix] += H

    # Diagonal contributions
    # (This is simplified; full MLLE has a more careful construction)
    for i in range(N):
        M[i, i] += 1.0

    # Symmetrize
    M = (M + M.T) / 2

    # Eigendecomposition
    eigenvalues, eigenvectors = eigh(M)

    # Skip the smallest eigenvector (null space) and take the next d
    idx = np.argsort(eigenvalues)
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    Y = eigenvectors[:, 1:d + 1] * np.sqrt(N)

    return Y


# For reliable results, use sklearn's implementation
from sklearn.manifold import LocallyLinearEmbedding


def sklearn_mlle(X, n_neighbors=10, n_components=2):
    """Use sklearn's Modified LLE implementation."""
    mlle = LocallyLinearEmbedding(
        n_neighbors=n_neighbors,
        n_components=n_components,
        method='modified'
    )
    return mlle.fit_transform(X)
```

MLLE is particularly useful when:
• k > intrinsic dimension (common in practice)
• Results are sensitive to the regularization parameter
• You observe instability across different runs
• The manifold has varying local dimensionality
MLLE provides more stable results by not arbitrarily choosing from multiple valid weight solutions.
Local Tangent Space Alignment (LTSA) takes a more direct approach to manifold learning: explicitly compute local tangent spaces and then align them globally.
Philosophy:
While LLE implicitly encodes tangent space information through reconstruction weights, LTSA makes it explicit: it estimates a tangent basis at each point via local PCA and then aligns the resulting local coordinate systems into a single global embedding.
The Algorithm:
Step 1: Local Tangent Space Estimation
For each point $i$, perform PCA on the k-nearest neighbors to find the local tangent space $T_i$ (spanned by the top $d$ principal directions).
Step 2: Local Coordinates
Project neighbors onto the tangent space: $$\boldsymbol{\theta}_j^{(i)} = \mathbf{T}_i^T (\mathbf{x}_j - \bar{\mathbf{x}}_i)$$
Step 3: Alignment
Construct a matrix $\mathbf{W}_i$ that maps local tangent coordinates to global embedding coordinates. Minimize: $$\sum_i \|\mathbf{Y}_i - \mathbf{W}_i \boldsymbol{\Theta}_i\|_F^2$$
subject to constraints preventing trivial solutions.
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.linalg import eigh, svd


def ltsa(X, n_neighbors=10, n_components=2):
    """
    Local Tangent Space Alignment (LTSA).

    Explicitly estimates tangent spaces and aligns them globally.

    Parameters:
    -----------
    X : ndarray of shape (N, D)
        Input data
    n_neighbors : int
        Number of neighbors
    n_components : int
        Target dimensionality

    Returns:
    --------
    Y : ndarray of shape (N, n_components)
        Embedding coordinates
    """
    N, D = X.shape
    d = n_components
    k = n_neighbors

    # Find neighbors
    nn = NearestNeighbors(n_neighbors=k + 1)
    nn.fit(X)
    _, indices = nn.kneighbors(X)
    indices = indices[:, 1:]  # Remove self

    # Initialize cost matrix
    B = np.zeros((N, N))

    for i in range(N):
        neighbor_idx = indices[i]

        # Include x_i in the neighborhood for LTSA
        full_idx = np.concatenate([[i], neighbor_idx])
        X_local = X[full_idx]

        # Center
        X_centered = X_local - np.mean(X_local, axis=0)

        # Local PCA to find tangent space
        U, s, Vt = svd(X_centered, full_matrices=False)

        # Local tangent coordinates (first d dimensions)
        theta = U[:, :d]  # Shape: (k+1, d)

        # Centering matrix for alignment
        H = np.eye(k + 1) - np.ones((k + 1, k + 1)) / (k + 1)

        # Projection onto the orthogonal complement of the tangent space;
        # this is what the global embedding must minimize
        G = H - theta @ theta.T @ H

        # The alignment cost for this neighborhood
        S = G.T @ G

        # Add to global matrix
        ix = np.ix_(full_idx, full_idx)
        B[ix] += S

    # Eigendecomposition
    eigenvalues, eigenvectors = eigh(B)

    # Sort
    idx = np.argsort(eigenvalues)
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    # Skip the constant eigenvector (smallest eigenvalue) and take the
    # next d eigenvectors as the embedding coordinates
    Y = eigenvectors[:, 1:d + 1] * np.sqrt(N)

    return Y


def compare_lle_variants(X, n_neighbors=12, n_components=2):
    """
    Compare all LLE variants on the same dataset.
    """
    from sklearn.manifold import LocallyLinearEmbedding

    variants = {
        'standard': 'standard',
        'hessian': 'hessian',
        'modified': 'modified',
        'ltsa': 'ltsa',
    }

    results = {}
    for name, method in variants.items():
        try:
            lle = LocallyLinearEmbedding(
                n_neighbors=n_neighbors,
                n_components=n_components,
                method=method
            )
            results[name] = lle.fit_transform(X)
        except Exception as e:
            print(f"{name} failed: {e}")
            results[name] = None

    return results


if __name__ == "__main__":
    from sklearn.datasets import make_swiss_roll
    import matplotlib.pyplot as plt

    # Generate data
    X, color = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

    # Compare variants
    results = compare_lle_variants(X, n_neighbors=12, n_components=2)

    # Plot
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))
    axes = axes.flatten()

    for ax, (name, Y) in zip(axes, results.items()):
        if Y is not None:
            ax.scatter(Y[:, 0], Y[:, 1], c=color, cmap='viridis', s=10)
        ax.set_title(f'{name.upper()} LLE')
        ax.set_xlabel('Component 1')
        ax.set_ylabel('Component 2')

    plt.tight_layout()
    plt.show()
```

Each LLE variant has its strengths and appropriate use cases. Here's a comparison to guide selection:
| Variant | Key Feature | Minimum k | Complexity | Best For |
|---|---|---|---|---|
| Standard LLE | Reconstruction weights | k > d | O(DNk² + Nk³) | Simple manifolds, quick exploration |
| HLLE | Hessian-based isometry | k > 1+d+d(d+1)/2 | O(DNk² + Nk³) | Theoretical guarantees, isometry |
| MLLE | Multiple weight solutions | k > d | O(DNk² + Nk³) | Ambiguous local geometry |
| LTSA | Explicit tangent alignment | k > d | O(DNk² + Nk³) | Clear tangent structure |
For most applications, the comparison above is a practical guide: start with standard LLE for quick exploration, and reach for MLLE, HLLE, or LTSA when you need more stability or stronger geometric guarantees.
In scikit-learn, all four variants are available through the 'method' parameter of LocallyLinearEmbedding, as the `compare_lle_variants` helper above demonstrates.
Beyond the core variants, several extensions address robustness and special use cases:
1. Robust LLE (RLLE)
Standard LLE is sensitive to outliers—a single corrupted point can distort the entire embedding. Robust LLE addresses this by replacing the squared reconstruction error with a robust loss (such as the Huber loss) and solving for the weights with iteratively reweighted least squares, so that outlying neighbors are down-weighted; see the `robust_lle_weights` sketch below.
2. Sparse LLE
When interpretability is important, Sparse LLE adds L1 regularization to the weight computation: $$\min_\mathbf{w} \|\mathbf{x}_i - \mathbf{X}_{\mathcal{N}(i)} \mathbf{w}\|^2 + \lambda \|\mathbf{w}\|_1$$
subject to $\mathbf{1}^T \mathbf{w} = 1$
This produces sparse weights where each point depends on only a few key neighbors.
3. Semi-Supervised LLE
When some points have known coordinates (landmarks), Semi-Supervised LLE incorporates this information by fixing those points' embedding coordinates and solving for the remaining points under the same reconstruction-weight cost, as sketched below.
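To make the idea concrete, here is a minimal sketch that assumes the standard LLE cost matrix $\mathbf{M} = (\mathbf{I}-\mathbf{W})^T(\mathbf{I}-\mathbf{W})$ has already been assembled. Fixing the labeled rows of $\mathbf{Y}$ and minimizing $\operatorname{tr}(\mathbf{Y}^T\mathbf{M}\mathbf{Y})$ over the unlabeled rows reduces to a linear solve; the helper name and the small ridge are illustrative choices, not a published implementation.

```python
import numpy as np


def semi_supervised_lle(M, labeled_idx, Y_labeled, ridge=1e-10):
    """Sketch: solve for unlabeled embedding coordinates given a precomputed
    LLE cost matrix M and fixed (landmark) coordinates for labeled points.

    Minimizing tr(Y^T M Y) with Y_L fixed gives M_UU @ Y_U = -M_UL @ Y_L.
    """
    N = M.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(N), labeled_idx)

    M_UU = M[np.ix_(unlabeled_idx, unlabeled_idx)]
    M_UL = M[np.ix_(unlabeled_idx, labeled_idx)]

    # Small ridge guards against a singular block (assumption for this sketch)
    M_UU = M_UU + ridge * np.eye(len(unlabeled_idx))
    Y_unlabeled = np.linalg.solve(M_UU, -M_UL @ Y_labeled)

    Y = np.zeros((N, Y_labeled.shape[1]))
    Y[labeled_idx] = Y_labeled
    Y[unlabeled_idx] = Y_unlabeled
    return Y
```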
4. Incremental LLE
For streaming data or very large datasets, Incremental LLE updates the neighborhood graph, the reconstruction weights, and the embedding as new points arrive, rather than recomputing the full eigendecomposition from scratch.
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def robust_lle_weights(X, k=10, max_iter=10, huber_delta=1.0):
    """
    Compute LLE weights with Huber loss for robustness.

    Uses iteratively reweighted least squares (IRLS)
    to down-weight the influence of outliers.
    """
    N, D = X.shape

    nn = NearestNeighbors(n_neighbors=k + 1)
    nn.fit(X)
    _, indices = nn.kneighbors(X)
    indices = indices[:, 1:]

    W = np.zeros((N, N))

    for i in range(N):
        neighbor_idx = indices[i]
        x_i = X[i]
        X_nbrs = X[neighbor_idx]

        # Initialize weights uniformly
        w = np.ones(k) / k

        for iteration in range(max_iter):
            # Compute residuals
            reconstruction = X_nbrs.T @ w
            residual = x_i - reconstruction
            r_norm = np.linalg.norm(residual)

            # Compute Huber weight for the overall residual
            if r_norm < huber_delta:
                # Quadratic region - unit weight
                rho = 1.0
            else:
                # Linear region - reduced weight
                rho = huber_delta / r_norm

            # Weighted Gram matrix
            eta = x_i - X_nbrs

            # Robustness weights per neighbor
            neighbor_residuals = np.linalg.norm(eta, axis=1)
            neighbor_weights = np.where(
                neighbor_residuals < huber_delta,
                1.0,
                huber_delta / neighbor_residuals
            )

            # Weight the Gram matrix
            G = (eta * neighbor_weights[:, None]) @ eta.T

            # Regularize
            G += 1e-3 * np.trace(G) / k * np.eye(k)

            # Solve
            ones = np.ones(k)
            w_new = np.linalg.solve(G, ones)
            w_new = w_new / np.sum(w_new)

            # Check convergence
            if np.linalg.norm(w_new - w) < 1e-6:
                break
            w = w_new

        W[i, neighbor_idx] = w

    return W


def sparse_lle_weights(X, k=10, sparsity=0.5, max_iter=100):
    """
    Compute sparse LLE weights with L1 regularization.

    Uses proximal gradient descent to solve:
        min ||x - Xw||^2 + lambda * ||w||_1   s.t. sum(w) = 1
    """
    N, D = X.shape

    nn = NearestNeighbors(n_neighbors=k + 1)
    nn.fit(X)
    _, indices = nn.kneighbors(X)
    indices = indices[:, 1:]

    W = np.zeros((N, N))

    for i in range(N):
        neighbor_idx = indices[i]
        x_i = X[i]
        X_nbrs = X[neighbor_idx]  # Shape: (k, D)

        # Gram matrix for gradient
        G = X_nbrs @ X_nbrs.T
        b = X_nbrs @ x_i

        # Initialize
        w = np.ones(k) / k
        step_size = 1.0 / (np.linalg.norm(G, 2) + 1e-6)
        lam = sparsity

        for _ in range(max_iter):
            # Gradient step
            grad = G @ w - b
            w_temp = w - step_size * grad

            # Soft thresholding (proximal operator for L1)
            w_temp = np.sign(w_temp) * np.maximum(np.abs(w_temp) - step_size * lam, 0)

            # Project onto sum-to-one constraint
            w_new = w_temp + (1 - np.sum(w_temp)) / k

            # Check convergence
            if np.linalg.norm(w_new - w) < 1e-6:
                break
            w = w_new

        W[i, neighbor_idx] = w

    return W
```

A significant limitation of LLE and its variants is that they are transductive: they provide embeddings only for the training points. Embedding new points requires re-running the entire algorithm.
Out-of-Sample Extension addresses this by learning a mapping from high-dimensional space to the embedding that can be applied to new points.
Method 1: Nearest Neighbor Interpolation
For a new point $\mathbf{x}_{new}$: find its $k$ nearest neighbors among the training points, compute reconstruction weights exactly as in LLE, and apply those weights to the neighbors' embedding coordinates.
Method 2: Kernel Extension
View LLE as implicitly defining a kernel, then use kernel regression: $$\mathbf{y}_{new} = \sum_i \alpha_i K(\mathbf{x}_{new}, \mathbf{x}_i)$$
where $K$ is derived from the LLE weight structure and $\alpha$ is fit on training data.
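A minimal version of this idea is to regress the training embedding on the training inputs with kernel ridge regression. In the sketch below, the RBF kernel and its hyperparameters are illustrative stand-ins for a kernel actually derived from the LLE weights.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.kernel_ridge import KernelRidge
from sklearn.manifold import LocallyLinearEmbedding

# Fit LLE on training data, then regress the embedding onto the inputs.
X, _ = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)
X_train, X_new = X[:800], X[800:]

Y_train = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X_train)

# Kernel regression from input space to embedding space; alpha and gamma
# are illustrative values, not tuned.
kr = KernelRidge(kernel='rbf', alpha=1e-3, gamma=0.1)
kr.fit(X_train, Y_train)
Y_new = kr.predict(X_new)
```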
Method 3: Neural Network Mapping
Train a neural network on the input–embedding pairs $(\mathbf{x}_i, \mathbf{y}_i)$ produced by LLE, then apply the trained network to embed new points.
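A corresponding sketch with a small multilayer perceptron; the architecture and hyperparameters are illustrative, not tuned.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.neural_network import MLPRegressor

X, _ = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)
X_train, X_new = X[:800], X[800:]
Y_train = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X_train)

# A small network regresses embedding coordinates on inputs
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X_train, Y_train)
Y_new = net.predict(X_new)
```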
Method 4: Parametric LLE
Learn a parametric mapping during the LLE optimization itself, constraining the embedding to be a function of the input: $$\mathbf{y}_i = f_\theta(\mathbf{x}_i)$$
Optimize over both the embedding and the parameters $\theta$.
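One concrete instance is to restrict $f_\theta$ to a linear map $\mathbf{y} = \boldsymbol{\Theta}^T\mathbf{x}$, which turns the problem into a generalized eigenproblem (this linear special case is essentially Neighborhood Preserving Embedding). The sketch below assumes the LLE cost matrix $\mathbf{M}$ has already been computed and is illustrative only.

```python
import numpy as np
from scipy.linalg import eigh


def linear_parametric_lle(X, M, n_components=2, reg=1e-6):
    """Sketch of a linear parametric LLE: restrict y = Theta^T x and minimize
    trace(Theta^T X^T M X Theta) subject to Theta^T X^T X Theta = I.

    `M` is assumed to be a precomputed LLE cost matrix (I - W)^T (I - W).
    """
    A = X.T @ M @ X                          # (D, D) objective matrix
    B = X.T @ X + reg * np.eye(X.shape[1])   # (D, D) constraint matrix, regularized

    # Generalized eigenproblem; the smallest eigenvalues give the projection
    eigenvalues, eigenvectors = eigh(A, B)
    Theta = eigenvectors[:, :n_components]   # (D, d) linear map

    Y_train = X @ Theta                      # embedding of training points
    return Theta, Y_train


# New points are embedded by the same linear map: Y_new = X_new @ Theta
```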
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.manifold import LocallyLinearEmbedding


class LLEWithExtension:
    """
    LLE with out-of-sample extension capability.
    """

    def __init__(self, n_neighbors=10, n_components=2, reg=1e-3):
        self.n_neighbors = n_neighbors
        self.n_components = n_components
        self.reg = reg

    def fit_transform(self, X):
        """Fit LLE and store training data for extension."""
        self.X_train_ = X.copy()

        # Fit standard LLE
        lle = LocallyLinearEmbedding(
            n_neighbors=self.n_neighbors,
            n_components=self.n_components,
            reg=self.reg
        )
        self.Y_train_ = lle.fit_transform(X)

        # Store neighbor finder
        self.nn_ = NearestNeighbors(n_neighbors=self.n_neighbors + 1)
        self.nn_.fit(X)

        return self.Y_train_

    def transform(self, X_new):
        """
        Embed new points using nearest neighbor interpolation.
        """
        n_new = X_new.shape[0]
        Y_new = np.zeros((n_new, self.n_components))

        # Find neighbors in training set
        distances, indices = self.nn_.kneighbors(X_new)

        for i in range(n_new):
            # Get neighbors (may include self if X_new contains training points)
            neighbor_idx = indices[i, :self.n_neighbors]

            # Compute reconstruction weights
            x_i = X_new[i]
            X_nbrs = self.X_train_[neighbor_idx]

            # Gram matrix
            eta = x_i - X_nbrs
            G = eta @ eta.T
            G += self.reg * np.trace(G) / self.n_neighbors * np.eye(self.n_neighbors)

            # Solve for weights
            ones = np.ones(self.n_neighbors)
            w = np.linalg.solve(G, ones)
            w = w / np.sum(w)

            # Apply weights to embedded coordinates
            Y_nbrs = self.Y_train_[neighbor_idx]
            Y_new[i] = Y_nbrs.T @ w

        return Y_new

    def reconstruction_error(self, X_new, Y_new):
        """
        Compute reconstruction error for new points.
        """
        error = 0.0
        distances, indices = self.nn_.kneighbors(X_new)

        for i in range(len(X_new)):
            neighbor_idx = indices[i, :self.n_neighbors]
            x_i = X_new[i]
            X_nbrs = self.X_train_[neighbor_idx]

            # Compute weights
            eta = x_i - X_nbrs
            G = eta @ eta.T
            G += self.reg * np.trace(G) / self.n_neighbors * np.eye(self.n_neighbors)
            ones = np.ones(self.n_neighbors)
            w = np.linalg.solve(G, ones)
            w = w / np.sum(w)

            # Reconstruction in embedding space
            Y_nbrs = self.Y_train_[neighbor_idx]
            y_reconstructed = Y_nbrs.T @ w
            error += np.linalg.norm(Y_new[i] - y_reconstructed) ** 2

        return error


# Example
if __name__ == "__main__":
    from sklearn.datasets import make_swiss_roll

    # Generate data
    X, color = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

    # Split into train and test
    X_train, X_test = X[:800], X[800:]
    color_train, color_test = color[:800], color[800:]

    # Fit LLE with extension
    lle = LLEWithExtension(n_neighbors=12, n_components=2)
    Y_train = lle.fit_transform(X_train)

    # Extend to test points
    Y_test = lle.transform(X_test)

    print(f"Training embedding shape: {Y_train.shape}")
    print(f"Test embedding shape: {Y_test.shape}")
```

We have explored the rich landscape of LLE variants, each addressing specific limitations of the standard algorithm.
What's Next:
In the final page of this module, we examine the limitations of LLE in detail—understanding when the algorithm fails, why these failures occur, and how to diagnose and mitigate them. This knowledge is essential for applying LLE effectively in practice and knowing when alternative methods are more appropriate.
You now understand the major LLE variants and when to use each. This knowledge enables you to select the most appropriate algorithm for your specific manifold learning task and troubleshoot issues with standard LLE.