Understanding manifold learning algorithms requires a library of example manifolds—some with known structure for validation, others representing real-world data. This page provides a comprehensive tour of manifolds that appear throughout machine learning research and practice.
Why Study Examples?
Manifold examples serve multiple purposes: they provide ground truth for validating algorithms, they expose specific failure modes, and they build the geometric intuition needed to recognize structure in real data.
We'll explore synthetic manifolds in depth, then connect to the manifold structures hidden in real datasets.
By completing this page, you will:
• Generate and visualize canonical synthetic manifolds (Swiss roll, S-curve, torus, sphere)
• Understand the distinctive properties of each example (topology, curvature, boundaries)
• Recognize which algorithm limitations each example reveals
• Connect synthetic examples to real-world data manifolds
• Generate high-dimensional manifold embeddings for testing
The Swiss roll is perhaps the most famous manifold learning test case. It's a 2-dimensional surface embedded in 3D space by 'rolling up' a flat rectangle like a Swiss roll cake.
Mathematical Construction:
The Swiss roll is parameterized by two intrinsic coordinates (t, s):
$$x = t \cos(t)$$ $$y = s$$ $$z = t \sin(t)$$
where typically t ∈ [3π/2, 9π/2] (the spiral makes 1.5 full turns) and s ∈ [0, H] for some height H.
The parameter t controls position along the spiral, while s controls height. Points with similar (t, s) values are close along the manifold, even if far apart in Euclidean 3D space.
Key Properties:
• Intrinsic dimension: 2 — topologically a flat rectangle (with boundary), rolled up without stretching
• No holes or self-intersections, but successive windings pass close to each other in 3D
• Geodesic distances differ dramatically from Euclidean distances across windings
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def generate_swiss_roll(n_samples=2000, noise=0.0, random_state=42):
    """
    Generate Swiss roll dataset.

    Parameters:
    -----------
    n_samples : int
        Number of points to generate
    noise : float
        Standard deviation of Gaussian noise added to the manifold
    random_state : int
        Random seed for reproducibility

    Returns:
    --------
    X : ndarray of shape (n_samples, 3)
        The 3D coordinates
    t : ndarray of shape (n_samples,)
        The 'unrolling' parameter (intrinsic coordinate along spiral)
    """
    np.random.seed(random_state)

    # Parameter t controls position along the spiral
    t = 1.5 * np.pi * (1 + 2 * np.random.rand(n_samples))

    # Height is uniform along the roll
    height = 21 * np.random.rand(n_samples)

    # Embedding into 3D
    x = t * np.cos(t)
    y = height
    z = t * np.sin(t)

    X = np.column_stack([x, y, z])

    if noise > 0:
        X += noise * np.random.randn(n_samples, 3)

    return X, t


def visualize_swiss_roll(X, color, title='Swiss Roll'):
    """Create a 3D visualization of the Swiss roll."""
    fig = plt.figure(figsize=(12, 5))

    # 3D view
    ax1 = fig.add_subplot(121, projection='3d')
    scatter = ax1.scatter(X[:, 0], X[:, 1], X[:, 2], c=color,
                          cmap='viridis', s=10, alpha=0.8)
    ax1.set_xlabel('X')
    ax1.set_ylabel('Y (height)')
    ax1.set_zlabel('Z')
    ax1.set_title(f'{title} - 3D View')
    plt.colorbar(scatter, ax=ax1, label='t (geodesic position)')

    # Top-down view showing the spiral
    ax2 = fig.add_subplot(122)
    scatter2 = ax2.scatter(X[:, 0], X[:, 2], c=color,
                           cmap='viridis', s=10, alpha=0.8)
    ax2.set_xlabel('X')
    ax2.set_ylabel('Z')
    ax2.set_title(f'{title} - Top View (X-Z plane)')
    ax2.set_aspect('equal')
    plt.colorbar(scatter2, ax=ax2, label='t')

    plt.tight_layout()
    plt.show()


def demonstrate_geodesic_vs_euclidean(X, t):
    """Show how Euclidean distance differs from geodesic distance."""
    # Find two points close in Euclidean distance but far in geodesic
    from scipy.spatial.distance import cdist

    # Choose a point near the beginning of the roll
    idx1 = np.argmin(np.abs(t - 2 * np.pi))  # t ≈ 2π

    # Find nearby point in Euclidean space
    distances = cdist(X[idx1:idx1 + 1], X)[0]
    distances[idx1] = np.inf  # Exclude self

    # Get closest Euclidean neighbor
    idx2_euclidean = np.argmin(distances)

    # The geodesic distance is proportional to |t1 - t2|
    geodesic_dist = np.abs(t[idx1] - t[idx2_euclidean])
    euclidean_dist = distances[idx2_euclidean]

    print(f"Point 1: t = {t[idx1]:.2f}, coords = {X[idx1]}")
    print(f"Point 2: t = {t[idx2_euclidean]:.2f}, coords = {X[idx2_euclidean]}")
    print(f"Euclidean distance: {euclidean_dist:.3f}")
    print(f"Geodesic t-difference: {geodesic_dist:.3f}")
    print("→ Points may be close in 3D but FAR along the manifold!")


# Generate and visualize
X, t = generate_swiss_roll(n_samples=2000, noise=0.1)
visualize_swiss_roll(X, t)
demonstrate_geodesic_vs_euclidean(X, t)
```

A good manifold learning algorithm must 'unroll' the Swiss roll, recovering the 2D rectangle parameterized by (t, s). Algorithms that only use Euclidean distances will fail—they'll connect points on different windings that are close in 3D but far on the manifold. This is the litmus test for manifold-aware methods.
The S-curve (or S-shaped surface) is a simpler alternative to the Swiss roll. It curves but doesn't self-intersect or wind multiple times.
Mathematical Construction:
The S-curve is generated as:
$$x = \sin(t)$$ $$y = s$$ $$z = \operatorname{sign}(t)\,(\cos(t) - 1)$$
where t ∈ [-3π/2, 3π/2] traces the S-shape and s ∈ [0, H] is the height.
Key Properties:
• Intrinsic dimension: 2, topologically a rectangle
• Mild curvature and a single fold: no windings that bring distant points close in 3D
• A gentler test than the Swiss roll—most algorithms handle it
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def generate_s_curve(n_samples=2000, noise=0.0, random_state=42):
    """
    Generate S-curve dataset.

    Returns:
    --------
    X : ndarray of shape (n_samples, 3)
        The 3D coordinates
    t : ndarray of shape (n_samples,)
        The parameter along the S-curve
    """
    np.random.seed(random_state)

    # Parameter t controls position along the S
    t = 3 * np.pi * (np.random.rand(n_samples) - 0.5)

    # Height is uniform
    height = 2 * np.random.rand(n_samples)

    # Embedding into 3D
    x = np.sin(t)
    y = height
    z = np.sign(t) * (np.cos(t) - 1)

    X = np.column_stack([x, y, z])

    if noise > 0:
        X += noise * np.random.randn(n_samples, 3)

    return X, t


# Generate and visualize
X_scurve, t_scurve = generate_s_curve(n_samples=1500, noise=0.05)

fig = plt.figure(figsize=(10, 4))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_scurve[:, 0], X_scurve[:, 1], X_scurve[:, 2],
           c=t_scurve, cmap='coolwarm', s=10, alpha=0.8)
ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
ax.set_title('S-Curve: 2D manifold in 3D')
plt.tight_layout()
plt.show()
```

When to Use S-Curve:
The S-curve is useful for quick sanity checks and algorithm development: it is nonlinear enough to defeat linear methods like PCA, yet simple enough that nearly every manifold learner recovers it.
However, passing the S-curve test isn't sufficient. Algorithms should also be tested on Swiss roll, torus, and other challenging cases.
Spheres represent closed manifolds (compact, without boundary) with constant positive curvature. They arise naturally in data normalization and directional statistics.
The n-Sphere Sⁿ:
The n-dimensional sphere Sⁿ is the set of points in ℝⁿ⁺¹ at unit distance from the origin:
$$S^n = \{x \in \mathbb{R}^{n+1} : \|x\|_2 = 1\}$$
Important cases:
• S¹: the circle, a 1-dimensional manifold in ℝ²
• S²: the ordinary sphere in ℝ³
• Sⁿ⁻¹ for large n: the hyperspheres on which length-normalized embeddings live
Key Properties of Spheres:
• Compact and closed: no boundary or edges
• Constant positive curvature
• No global flat parameterization: any planar map of S² must distort distances
• Maximum geodesic distance on the unit sphere: π, attained at antipodal points
Sampling Uniformly on Spheres:
Uniform sampling on Sⁿ is trickier than it appears. Normalizing vectors drawn uniformly from a cube biases samples toward the cube's corners; the correct approach is to normalize standard Gaussian random vectors.
This works because the multivariate Gaussian is spherically symmetric.
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def generate_sphere(n_samples=1000, dim=3, noise=0.0):
    """
    Generate points uniformly on (dim-1)-dimensional sphere in R^dim.

    Parameters:
    -----------
    n_samples : int
        Number of points
    dim : int
        Ambient dimension (sphere is (dim-1)-dimensional)
    noise : float
        Standard deviation of radial noise

    Returns:
    --------
    X : ndarray of shape (n_samples, dim)
        Points on sphere (plus noise)
    """
    # Sample from multivariate normal
    X = np.random.randn(n_samples, dim)

    # Normalize to unit sphere
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / norms

    if noise > 0:
        # Add radial noise (points move in/out from sphere)
        radial_noise = 1 + noise * np.random.randn(n_samples, 1)
        X = X * radial_noise

    return X


def visualize_geodesics_on_sphere():
    """
    Visualize how geodesics (great circles) work on S².
    """
    fig = plt.figure(figsize=(12, 5))

    # Wireframe sphere
    u = np.linspace(0, 2 * np.pi, 50)
    v = np.linspace(0, np.pi, 25)
    x = np.outer(np.cos(u), np.sin(v))
    y = np.outer(np.sin(u), np.sin(v))
    z = np.outer(np.ones(50), np.cos(v))

    ax = fig.add_subplot(121, projection='3d')
    ax.plot_wireframe(x, y, z, alpha=0.2, color='gray')

    # Draw a great circle (geodesic) - the equator
    theta = np.linspace(0, 2 * np.pi, 100)
    ax.plot(np.cos(theta), np.sin(theta), np.zeros_like(theta),
            'r-', linewidth=2, label='Geodesic (great circle)')

    # Mark two antipodal points
    ax.scatter([1, -1], [0, 0], [0, 0], c='blue', s=100,
               label='Antipodal (max geodesic distance)')
    ax.set_title('S² with Geodesic (Great Circle)')
    ax.legend()

    # Geodesic distance demonstration
    ax2 = fig.add_subplot(122)
    angles = np.linspace(0, np.pi, 100)
    geodesic_dist = angles                    # Geodesic distance = angle in radians
    euclidean_dist = 2 * np.sin(angles / 2)   # Chord length

    ax2.plot(np.degrees(angles), geodesic_dist, 'b-',
             linewidth=2, label='Geodesic distance')
    ax2.plot(np.degrees(angles), euclidean_dist, 'r--',
             linewidth=2, label='Euclidean (chord) distance')
    ax2.set_xlabel('Angle between points (degrees)')
    ax2.set_ylabel('Distance (r=1 sphere)')
    ax2.set_title('Geodesic vs Euclidean Distance on Sphere')
    ax2.legend()
    ax2.grid(True)

    plt.tight_layout()
    plt.show()


# Generate sphere samples
X_sphere = generate_sphere(n_samples=1000, dim=3, noise=0.02)

# Visualize
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_sphere[:, 0], X_sphere[:, 1], X_sphere[:, 2],
           c=X_sphere[:, 2], cmap='coolwarm', s=10, alpha=0.8)
ax.set_title('S²: Points on 2-sphere in ℝ³')
plt.show()

visualize_geodesics_on_sphere()
```

Spheres appear frequently in ML:
• Normalized embeddings (e.g., word2vec, cosine similarity): Live on hyperspheres
• Angular margin losses: Face recognition models project to spheres
• Directional statistics: von Mises-Fisher distributions on spheres
• Hyperbolic alternatives: Some data is better represented on negatively curved spaces (hyperbolic geometry) rather than positively curved spheres
The torus (plural: tori) is a 2-dimensional manifold with nontrivial topology—it has a 'hole' that distinguishes it from a sphere or plane. The torus cannot be continuously deformed into a sphere; its topology is fundamentally different.
Mathematical Construction:
A torus is parameterized by two angles (θ, φ):
$$x = (R + r\cos\theta) \cos\phi$$ $$y = (R + r\cos\theta) \sin\phi$$ $$z = r\sin\theta$$
where:
• R is the major radius (distance from the center of the torus to the center of the tube)
• r is the minor radius (the radius of the tube itself)
• θ runs around the tube, while φ runs around the central axis
Both angles are periodic: moving 2π radians in either direction returns to the same point.
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def generate_torus(n_samples=2000, R=3, r=1, noise=0.0):
    """
    Generate points on a torus.

    Parameters:
    -----------
    n_samples : int
        Number of points
    R : float
        Major radius (torus center to tube center)
    r : float
        Minor radius (tube radius)
    noise : float
        Standard deviation of Gaussian noise

    Returns:
    --------
    X : ndarray of shape (n_samples, 3)
        3D coordinates
    angles : ndarray of shape (n_samples, 2)
        The (theta, phi) parameters
    """
    np.random.seed(42)

    theta = 2 * np.pi * np.random.rand(n_samples)
    phi = 2 * np.pi * np.random.rand(n_samples)

    x = (R + r * np.cos(theta)) * np.cos(phi)
    y = (R + r * np.cos(theta)) * np.sin(phi)
    z = r * np.sin(theta)

    X = np.column_stack([x, y, z])

    if noise > 0:
        X += noise * np.random.randn(n_samples, 3)

    return X, np.column_stack([theta, phi])


def visualize_torus_topology():
    """
    Visualize the periodic nature of torus coordinates.
    """
    fig = plt.figure(figsize=(15, 5))

    # 3D view
    X, angles = generate_torus(n_samples=2000, noise=0.02)

    ax1 = fig.add_subplot(131, projection='3d')
    ax1.scatter(X[:, 0], X[:, 1], X[:, 2], c=angles[:, 0],
                cmap='hsv', s=5, alpha=0.6)
    ax1.set_title('Torus colored by θ (around tube)')

    ax2 = fig.add_subplot(132, projection='3d')
    ax2.scatter(X[:, 0], X[:, 1], X[:, 2], c=angles[:, 1],
                cmap='hsv', s=5, alpha=0.6)
    ax2.set_title('Torus colored by φ (around torus)')

    # Flat parameter space (reveals periodicity)
    ax3 = fig.add_subplot(133)
    ax3.scatter(angles[:, 0], angles[:, 1], c='blue', s=1, alpha=0.3)
    ax3.set_xlabel('θ (around tube)')
    ax3.set_ylabel('φ (around torus)')
    ax3.set_title('Parameter space (flat torus)\nEdges should be "glued"')
    ax3.set_xlim(0, 2 * np.pi)
    ax3.set_ylim(0, 2 * np.pi)

    plt.tight_layout()
    plt.show()


def demonstrate_torus_topology():
    """
    Show why the torus cannot be unrolled like the Swiss roll.
    """
    print("=== Why Torus is Special ===")
    print("1. Two independent 'loops' cannot be contracted:")
    print("   - Going around φ (main circle) returns to start")
    print("   - Going around θ (tube) returns to start")
    print("   - These loops cannot shrink to a point without leaving the surface")
    print("2. Fundamental group: π₁(T²) = ℤ × ℤ")
    print("   - Classifies loops up to continuous deformation")
    print("   - Means there are two 'types' of non-contractible loops")
    print("3. Cannot embed flat in plane:")
    print("   - Unrolling to rectangle would require cutting")
    print("   - Rectangle with edges 'glued' = flat torus (different metric)")
    print("4. Implication for ML:")
    print("   - Algorithms assuming simple topology may fail")
    print("   - t-SNE/UMAP can struggle with toroidal structure")
    print("   - Need topology-aware methods for faithful embedding")


visualize_torus_topology()
demonstrate_torus_topology()
```

The torus tests whether an algorithm respects topology. Methods that try to 'unfold' the torus to a flat rectangle will necessarily fail—they'll either tear the manifold (creating discontinuities) or fold it (creating false proximities). The torus reveals algorithm limitations that simpler manifolds hide.
Beyond the standard test cases, several other synthetic manifolds serve specialized testing purposes.
The Severed Sphere (Hemisphere):
A hemisphere—half of S²—is a manifold with boundary. The boundary creates challenges: points on the rim have one-sided neighborhoods, which biases local density estimates and the neighborhood graphs most algorithms build.
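Sampling a hemisphere is a one-line variant of full-sphere sampling: draw uniform points on S² and reflect them into one half. A minimal sketch (the helper name `generate_hemisphere` is ours, not from a library):

```python
import numpy as np

def generate_hemisphere(n_samples=1000, random_state=0):
    """Sample uniformly on the upper hemisphere of S^2 (a manifold with boundary)."""
    rng = np.random.default_rng(random_state)
    X = rng.standard_normal((n_samples, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # uniform on the full sphere
    X[:, 2] = np.abs(X[:, 2])                      # reflect into z >= 0; stays uniform by symmetry
    return X

X = generate_hemisphere(2000)
# Points near the boundary circle z = 0 have one-sided neighborhoods
near_boundary = np.sum(X[:, 2] < 0.1)
print(f"{near_boundary} of {len(X)} points lie near the boundary")
```

The reflection trick works because the uniform distribution on the sphere is symmetric under z → -z, so folding the lower half onto the upper half preserves uniformity.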
Klein Bottle (Non-orientable):
A 2D manifold that cannot be embedded in ℝ³ without self-intersection. It's non-orientable—there's no consistent 'inside' vs 'outside.' While rare in ML, it tests topological sophistication.
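Although no embedding in ℝ³ exists, the Klein bottle can be *immersed* there—the surface passes through itself. A sketch of the well-known 'figure-8' immersion; the function and parameter names are our own:

```python
import numpy as np

def generate_klein_bottle(n_samples=2000, R=2.0, random_state=0):
    """Figure-8 immersion of the Klein bottle in R^3 (self-intersecting)."""
    rng = np.random.default_rng(random_state)
    u = 2 * np.pi * rng.random(n_samples)  # angle around the 'body'
    v = 2 * np.pi * rng.random(n_samples)  # angle around the figure-8 cross-section

    # Figure-8 cross-section, twisted by u/2 so the surface is non-orientable
    w = R + np.cos(u / 2) * np.sin(v) - np.sin(u / 2) * np.sin(2 * v)
    x = w * np.cos(u)
    y = w * np.sin(u)
    z = np.sin(u / 2) * np.sin(v) + np.cos(u / 2) * np.sin(2 * v)

    return np.column_stack([x, y, z]), np.column_stack([u, v])

X, params = generate_klein_bottle()
print(X.shape)
```

The half-angle twist (u/2) is what glues the cross-section back to itself with a flip after one trip around the body, producing the non-orientability.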
Intersecting Manifolds:
Two manifolds that cross or touch create singularities—points with higher local dimension. Real data often has such structure where multiple data-generating processes coexist.
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def generate_linked_rings(n_samples_per_ring=500, R=3, r=0.5):
    """
    Generate two interlocking rings (links).
    Tests handling of multiple connected components with entanglement.
    """
    # First ring: in XY plane centered at origin
    theta1 = 2 * np.pi * np.random.rand(n_samples_per_ring)
    x1 = R * np.cos(theta1) + r * np.random.randn(n_samples_per_ring) * 0.2
    y1 = R * np.sin(theta1) + r * np.random.randn(n_samples_per_ring) * 0.2
    z1 = r * np.random.randn(n_samples_per_ring) * 0.2
    ring1 = np.column_stack([x1, y1, z1])

    # Second ring: in XZ plane, offset so they link
    theta2 = 2 * np.pi * np.random.rand(n_samples_per_ring)
    x2 = R * np.cos(theta2) + R + r * np.random.randn(n_samples_per_ring) * 0.2
    y2 = r * np.random.randn(n_samples_per_ring) * 0.2
    z2 = R * np.sin(theta2) + r * np.random.randn(n_samples_per_ring) * 0.2
    ring2 = np.column_stack([x2, y2, z2])

    labels = np.concatenate([np.zeros(n_samples_per_ring),
                             np.ones(n_samples_per_ring)])
    return np.vstack([ring1, ring2]), labels


def generate_trefoil_knot(n_samples=1000, noise=0.0):
    """
    Generate points on a trefoil knot.
    A 1D manifold (curve) with complex 3D embedding.
    """
    t = np.linspace(0, 2 * np.pi, n_samples)

    x = np.sin(t) + 2 * np.sin(2 * t)
    y = np.cos(t) - 2 * np.cos(2 * t)
    z = -np.sin(3 * t)

    X = np.column_stack([x, y, z])

    if noise > 0:
        X += noise * np.random.randn(n_samples, 3)

    return X, t


def generate_two_crossing_planes(n_samples=2000, angle=np.pi / 4):
    """
    Generate two planes that intersect along a line.
    Creates a singularity (higher local dimension at intersection).
    """
    n_half = n_samples // 2

    # Plane 1: XY plane
    x1 = 4 * np.random.rand(n_half) - 2
    y1 = 4 * np.random.rand(n_half) - 2
    z1 = np.zeros(n_half)
    plane1 = np.column_stack([x1, y1, z1])

    # Plane 2: rotated around X axis
    x2 = 4 * np.random.rand(n_half) - 2
    y2_original = 4 * np.random.rand(n_half) - 2
    z2_original = np.zeros(n_half)

    # Rotate
    y2 = y2_original * np.cos(angle) - z2_original * np.sin(angle)
    z2 = y2_original * np.sin(angle) + z2_original * np.cos(angle)
    plane2 = np.column_stack([x2, y2, z2])

    labels = np.concatenate([np.zeros(n_half), np.ones(n_half)])
    return np.vstack([plane1, plane2]), labels


# Visualize these manifolds
fig = plt.figure(figsize=(15, 5))

ax1 = fig.add_subplot(131, projection='3d')
rings, ring_labels = generate_linked_rings()
ax1.scatter(rings[:, 0], rings[:, 1], rings[:, 2],
            c=ring_labels, cmap='bwr', s=5)
ax1.set_title('Interlocked Rings\n(Multiple components)')

ax2 = fig.add_subplot(132, projection='3d')
trefoil, t_knot = generate_trefoil_knot()
ax2.scatter(trefoil[:, 0], trefoil[:, 1], trefoil[:, 2],
            c=t_knot, cmap='viridis', s=10)
ax2.set_title('Trefoil Knot\n(1D manifold, complex embedding)')

ax3 = fig.add_subplot(133, projection='3d')
planes, plane_labels = generate_two_crossing_planes()
ax3.scatter(planes[:, 0], planes[:, 1], planes[:, 2],
            c=plane_labels, cmap='coolwarm', s=3, alpha=0.5)
ax3.set_title('Intersecting Planes\n(Singularity along intersection)')

plt.tight_layout()
plt.show()
```

| Manifold | Intrinsic Dim | Topology | Special Property | Tests |
|---|---|---|---|---|
| Swiss roll | 2 | Rectangle | Geodesic ≠ Euclidean | Unrolling ability |
| S-curve | 2 | Rectangle | Mild curvature | Basic nonlinearity |
| Sphere | n-1 | Closed, no boundary | Constant curvature | Closed manifolds |
| Torus | 2 | Genus 1 (hole) | Cannot unfold | Topology preservation |
| Hemisphere | 2 | Disk (boundary) | Boundary present | Boundary handling |
| Linked rings | 1 each | Two components | Entanglement | Multiple components |
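The table suggests a testing workflow: run a candidate algorithm on each manifold and check whether the recovered coordinates track the known intrinsic parameters. A minimal harness, assuming scikit-learn is available (its built-in `make_swiss_roll` and `make_s_curve` mirror the generators above):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll, make_s_curve
from sklearn.manifold import Isomap

datasets = {
    'swiss_roll': make_swiss_roll(n_samples=1000, random_state=0),
    's_curve': make_s_curve(n_samples=1000, random_state=0),
}

for name, (X, t) in datasets.items():
    # Embed into 2D; a successful unrolling recovers the intrinsic rectangle
    emb = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
    # One embedding axis should track the known intrinsic parameter t
    corr = abs(np.corrcoef(emb[:, 0], t)[0, 1])
    print(f"{name}: |corr(embedding axis 1, t)| = {corr:.2f}")
```

The same loop extends naturally to the torus or linked rings, where no 2D unrolling exists and the correlation diagnostic is expected to break down.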
Synthetic manifolds are valuable for testing, but the ultimate goal is understanding real data. Here we explore manifold structure in actual datasets.
Face Images:
The space of face images exhibits clear manifold structure. Key variations—head pose, lighting direction, and facial expression—are each governed by a few continuous parameters.
The famous Frey faces dataset (1,965 frames from a video) has an estimated intrinsic dimension of roughly 3-5, representing pose variation. Full face datasets (CelebA, etc.) have higher intrinsic dimension, but still far below the ambient pixel dimension.
Handwritten Digit Images:
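Digit images are a standard real-world example: scikit-learn's 8×8 digits live in 64-dimensional pixel space (MNIST in 784), yet even a purely linear analysis concentrates most variance in far fewer directions, and nonlinear intrinsic-dimension estimates are lower still. A quick PCA probe:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data               # shape (1797, 64): 8x8 digit images
pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
k90 = int(np.searchsorted(cum, 0.90)) + 1
print(f"{k90} of 64 components explain 90% of the variance")
```

PCA only bounds the *linear* dimension; the manifold perspective says the true intrinsic dimension of each digit class is smaller still, since strokes vary smoothly with slant, thickness, and style.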
Natural Language (Word Embeddings):
Word embeddings (Word2Vec, GloVe, BERT) map words to vectors in which semantic relationships trace out approximate manifold structure.
The 'geometry' of language is more complex than simple manifolds—it's hierarchical, context-dependent, and has mixed discrete/continuous aspects. But manifold perspectives offer useful intuitions.
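One connection back to the sphere section: once embeddings are length-normalized, cosine similarity, chord (Euclidean) distance, and geodesic angle are monotone transformations of one another, so they induce the same neighbor rankings. A toy check with random stand-in vectors (not real word embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.standard_normal((5, 300))                 # stand-ins for 300-d word vectors
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # project onto the unit hypersphere

cos = emb @ emb.T                                   # cosine similarity matrix
angle = np.arccos(np.clip(cos, -1, 1))              # geodesic distance on the sphere
chord = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)

# For unit vectors, chord^2 = 2 - 2*cos: the two distances carry the same ordering
assert np.allclose(chord ** 2, 2 - 2 * cos)
print(angle.shape)
```

The identity ‖a − b‖² = 2 − 2⟨a, b⟩ for unit vectors is why cosine-similarity retrieval and Euclidean nearest-neighbor search agree on normalized embeddings.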
Molecular Data:
Molecular conformations (the 3D arrangements of atoms in a molecule) form low-dimensional manifolds in high-dimensional descriptor space: atomic positions are tightly coupled, so conformations vary along a few collective coordinates such as backbone torsion angles.
Drug discovery and protein folding heavily rely on manifold learning to explore conformational space.
Time Series and Dynamical Systems:
For systems governed by differential equations, attractors (stable long-term behaviors) are often low-dimensional manifolds, or near-manifolds, embedded in phase space. The Lorenz attractor, with fractal dimension near 2.06, is the canonical example.
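The Lorenz system is three coupled ODEs whose trajectories settle onto a bounded 'butterfly'-shaped region of phase space. A minimal sketch using the standard parameters (σ=10, ρ=28, β=8/3) and a simple forward-Euler integrator—adequate for visualization, not for precision work:

```python
import numpy as np

def lorenz_trajectory(n_steps=10000, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Integrate the Lorenz system with forward Euler; returns an (n_steps, 3) trajectory."""
    xyz = np.empty((n_steps, 3))
    x, y, z = 1.0, 1.0, 1.0
    for i in range(n_steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xyz[i] = (x, y, z)
    return xyz

X = lorenz_trajectory()
# The trajectory stays in a bounded, roughly 2-dimensional region of phase space
print(X.shape, X.min(axis=0).round(1), X.max(axis=0).round(1))
```

Despite living in 3D phase space, the points concentrate near a thin, nearly two-dimensional sheet—exactly the kind of structure manifold learning and delay-embedding methods aim to recover.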
To check if your data has manifold structure:
• Examine the PCA explained-variance spectrum: a sharp decay suggests low linear dimension
• Estimate intrinsic dimension with nearest-neighbor-based methods
• Compare distances along a neighborhood graph to straight-line Euclidean distances
• Inspect a nonlinear embedding for coherent, low-dimensional structure
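A cheap, concrete diagnostic is a nearest-neighbor intrinsic-dimension estimate. The Two-NN estimator infers dimension from the ratio of each point's second to first neighbor distance; a minimal sketch, assuming scikit-learn (the helper name `twonn_dimension` is ours):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import make_swiss_roll

def twonn_dimension(X):
    """Two-NN intrinsic dimension estimate from ratios of 2nd to 1st neighbor distances."""
    # Column 0 is each point itself (distance 0); columns 1 and 2 are the two neighbors
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dists[:, 2] / dists[:, 1]
    # Maximum-likelihood estimate: mu follows a Pareto law with shape d
    return len(mu) / np.sum(np.log(mu))

X, _ = make_swiss_roll(n_samples=3000, random_state=0)
d = twonn_dimension(X)
print(f"Estimated intrinsic dimension: {d:.2f}")  # should be near 2 for the Swiss roll
```

An estimate far below the ambient dimension is evidence of manifold structure; an estimate close to the ambient dimension suggests the data fills space and manifold methods will add little.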
Not all data has manifold structure. Purely random or heavily categorical data may not benefit from manifold learning approaches.
We've surveyed the landscape of manifold examples—synthetic test cases and real-world instances. This 'bestiary' equips you to choose informative test cases, anticipate where algorithms fail, and recognize manifold structure when it appears in real data.
What's Next:
With a solid understanding of manifold examples, we turn to implications for machine learning in the next page. We'll see how the manifold hypothesis fundamentally reshapes our understanding of learning, generalization, and representation.
You now have a library of manifold examples to draw upon. When evaluating a manifold learning algorithm, think: 'How does this handle Swiss roll? Torus? Intersecting manifolds?' This geometric intuition guides algorithm selection and debugging throughout your ML practice.