Understanding manifold learning algorithms requires a library of example manifolds—some with known structure for validation, others representing real-world data. This page provides a comprehensive tour of manifolds that appear throughout machine learning research and practice.
Why Study Examples?
Manifold examples serve multiple purposes: they provide ground truth for validating algorithms, they expose specific failure modes, and they build the geometric intuition needed to recognize structure in real data.
We'll explore synthetic manifolds in depth, then connect to the manifold structures hidden in real datasets.
By completing this page, you will:
• Generate and visualize canonical synthetic manifolds (Swiss roll, S-curve, torus, sphere)
• Understand the distinctive properties of each example (topology, curvature, boundaries)
• Recognize which algorithm limitations each example reveals
• Connect synthetic examples to real-world data manifolds
• Generate high-dimensional manifold embeddings for testing
The Swiss roll is perhaps the most famous manifold learning test case. It's a 2-dimensional surface embedded in 3D space by 'rolling up' a flat rectangle like a Swiss roll cake.
Mathematical Construction:
The Swiss roll is parameterized by two intrinsic coordinates (t, s):
$$x = t \cos(t)$$ $$y = s$$ $$z = t \sin(t)$$
where typically t ∈ [3π/2, 9π/2] (the spiral makes 1.5 full turns) and s ∈ [0, H] for some height H.
The parameter t controls position along the spiral, while s controls height. Points with similar (t, s) values are close along the manifold, even if far apart in Euclidean 3D space.
Key Properties:
• Intrinsic dimension: 2 — topologically a flat rectangle (with boundary), rolled up without stretching
• No holes or self-intersections, but successive windings pass close to each other in 3D
• Geodesic distances differ dramatically from Euclidean distances across windings
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def generate_swiss_roll(n_samples=2000, noise=0.0, random_state=42):
    """
    Generate Swiss roll dataset.

    Parameters:
    -----------
    n_samples : int
        Number of points to generate
    noise : float
        Standard deviation of Gaussian noise added to the manifold
    random_state : int
        Random seed for reproducibility

    Returns:
    --------
    X : ndarray of shape (n_samples, 3)
        The 3D coordinates
    t : ndarray of shape (n_samples,)
        The 'unrolling' parameter (intrinsic coordinate along spiral)
    """
    np.random.seed(random_state)

    # Parameter t controls position along the spiral
    t = 1.5 * np.pi * (1 + 2 * np.random.rand(n_samples))

    # Height is uniform along the roll
    height = 21 * np.random.rand(n_samples)

    # Embedding into 3D
    x = t * np.cos(t)
    y = height
    z = t * np.sin(t)

    X = np.column_stack([x, y, z])

    if noise > 0:
        X += noise * np.random.randn(n_samples, 3)

    return X, t


def visualize_swiss_roll(X, color, title='Swiss Roll'):
    """Create a 3D visualization of the Swiss roll."""
    fig = plt.figure(figsize=(12, 5))

    # 3D view
    ax1 = fig.add_subplot(121, projection='3d')
    scatter = ax1.scatter(X[:, 0], X[:, 1], X[:, 2], c=color,
                          cmap='viridis', s=10, alpha=0.8)
    ax1.set_xlabel('X')
    ax1.set_ylabel('Y (height)')
    ax1.set_zlabel('Z')
    ax1.set_title(f'{title} - 3D View')
    plt.colorbar(scatter, ax=ax1, label='t (geodesic position)')

    # Top-down view showing the spiral
    ax2 = fig.add_subplot(122)
    scatter2 = ax2.scatter(X[:, 0], X[:, 2], c=color,
                           cmap='viridis', s=10, alpha=0.8)
    ax2.set_xlabel('X')
    ax2.set_ylabel('Z')
    ax2.set_title(f'{title} - Top View (X-Z plane)')
    ax2.set_aspect('equal')
    plt.colorbar(scatter2, ax=ax2, label='t')

    plt.tight_layout()
    plt.show()


def demonstrate_geodesic_vs_euclidean(X, t):
    """Show how Euclidean distance differs from geodesic distance."""
    # Find two points close in Euclidean distance but far in geodesic
    from scipy.spatial.distance import cdist

    # Choose a point near the beginning of the roll
    idx1 = np.argmin(np.abs(t - 2 * np.pi))  # t ≈ 2π

    # Find nearby point in Euclidean space
    distances = cdist(X[idx1:idx1 + 1], X)[0]
    distances[idx1] = np.inf  # Exclude self

    # Get closest Euclidean neighbor
    idx2_euclidean = np.argmin(distances)

    # The geodesic distance is proportional to |t1 - t2|
    geodesic_dist = np.abs(t[idx1] - t[idx2_euclidean])
    euclidean_dist = distances[idx2_euclidean]

    print(f"Point 1: t = {t[idx1]:.2f}, coords = {X[idx1]}")
    print(f"Point 2: t = {t[idx2_euclidean]:.2f}, coords = {X[idx2_euclidean]}")
    print(f"Euclidean distance: {euclidean_dist:.3f}")
    print(f"Geodesic t-difference: {geodesic_dist:.3f}")
    print("→ Points may be close in 3D but FAR along the manifold!")


# Generate and visualize
X, t = generate_swiss_roll(n_samples=2000, noise=0.1)
visualize_swiss_roll(X, t)
demonstrate_geodesic_vs_euclidean(X, t)
```

A good manifold learning algorithm must 'unroll' the Swiss roll, recovering the 2D rectangle parameterized by (t, s). Algorithms that only use Euclidean distances will fail—they'll connect points on different windings that are close in 3D but far on the manifold. This is the litmus test for manifold-aware methods.
The S-curve (or S-shaped surface) is a simpler alternative to the Swiss roll. It curves but doesn't self-intersect or wind multiple times.
Mathematical Construction:
The S-curve is generated as:
$$x = \sin(t)$$ $$y = s$$ $$z = \operatorname{sign}(t)\,(\cos(t) - 1)$$
where t ∈ [-3π/2, 3π/2] traces the S-shape and s ∈ [0, H] is the height.
Key Properties:
• Intrinsic dimension: 2, topologically a rectangle
• Mild curvature and a single fold: no windings that bring distant points close in 3D
• A gentler test than the Swiss roll—most algorithms handle it
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def generate_s_curve(n_samples=2000, noise=0.0, random_state=42):
    """
    Generate S-curve dataset.

    Returns:
    --------
    X : ndarray of shape (n_samples, 3)
        The 3D coordinates
    t : ndarray of shape (n_samples,)
        The parameter along the S-curve
    """
    np.random.seed(random_state)

    # Parameter t controls position along the S
    t = 3 * np.pi * (np.random.rand(n_samples) - 0.5)

    # Height is uniform
    height = 2 * np.random.rand(n_samples)

    # Embedding into 3D
    x = np.sin(t)
    y = height
    z = np.sign(t) * (np.cos(t) - 1)

    X = np.column_stack([x, y, z])

    if noise > 0:
        X += noise * np.random.randn(n_samples, 3)

    return X, t


# Generate and visualize
X_scurve, t_scurve = generate_s_curve(n_samples=1500, noise=0.05)

fig = plt.figure(figsize=(10, 4))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_scurve[:, 0], X_scurve[:, 1], X_scurve[:, 2],
           c=t_scurve, cmap='coolwarm', s=10, alpha=0.8)
ax.set_xlabel('X'); ax.set_ylabel('Y'); ax.set_zlabel('Z')
ax.set_title('S-Curve: 2D manifold in 3D')
plt.tight_layout()
plt.show()
```

When to Use S-Curve:
The S-curve is useful for quick sanity checks and algorithm development: it is nonlinear enough to defeat linear methods like PCA, yet simple enough that nearly every manifold learner recovers it.
However, passing the S-curve test isn't sufficient. Algorithms should also be tested on Swiss roll, torus, and other challenging cases.
Spheres represent closed manifolds (compact, without boundary) with constant positive curvature. They arise naturally in data normalization and directional statistics.
The n-Sphere Sⁿ:
The n-dimensional sphere Sⁿ is the set of points in ℝⁿ⁺¹ at unit distance from the origin:
$$S^n = \{x \in \mathbb{R}^{n+1} : \|x\|_2 = 1\}$$
Important cases:
• S¹: the circle, a 1-dimensional manifold in ℝ²
• S²: the ordinary sphere in ℝ³
• Sⁿ⁻¹ for large n: the hyperspheres on which length-normalized embeddings live
Key Properties of Spheres:
• Compact and closed: no boundary or edges
• Constant positive curvature
• No global flat parameterization: any planar map of S² must distort distances
• Maximum geodesic distance on the unit sphere: π, attained at antipodal points
Sampling Uniformly on Spheres:
Uniform sampling on Sⁿ is trickier than it appears. Normalizing vectors drawn uniformly from a cube biases samples toward the cube's corners; the correct approach is to normalize standard Gaussian random vectors.
This works because the multivariate Gaussian is spherically symmetric.
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def generate_sphere(n_samples=1000, dim=3, noise=0.0):
    """
    Generate points uniformly on (dim-1)-dimensional sphere in R^dim.

    Parameters:
    -----------
    n_samples : int
        Number of points
    dim : int
        Ambient dimension (sphere is (dim-1)-dimensional)
    noise : float
        Standard deviation of radial noise

    Returns:
    --------
    X : ndarray of shape (n_samples, dim)
        Points on sphere (plus noise)
    """
    # Sample from multivariate normal
    X = np.random.randn(n_samples, dim)

    # Normalize to unit sphere
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / norms

    if noise > 0:
        # Add radial noise (points move in/out from sphere)
        radial_noise = 1 + noise * np.random.randn(n_samples, 1)
        X = X * radial_noise

    return X


def visualize_geodesics_on_sphere():
    """
    Visualize how geodesics (great circles) work on S².
    """
    fig = plt.figure(figsize=(12, 5))

    # Wireframe sphere
    u = np.linspace(0, 2 * np.pi, 50)
    v = np.linspace(0, np.pi, 25)
    x = np.outer(np.cos(u), np.sin(v))
    y = np.outer(np.sin(u), np.sin(v))
    z = np.outer(np.ones(50), np.cos(v))

    ax = fig.add_subplot(121, projection='3d')
    ax.plot_wireframe(x, y, z, alpha=0.2, color='gray')

    # Draw a great circle (geodesic) - the equator
    theta = np.linspace(0, 2 * np.pi, 100)
    ax.plot(np.cos(theta), np.sin(theta), np.zeros_like(theta),
            'r-', linewidth=2, label='Geodesic (great circle)')

    # Mark two antipodal points
    ax.scatter([1, -1], [0, 0], [0, 0], c='blue', s=100,
               label='Antipodal (max geodesic distance)')
    ax.set_title('S² with Geodesic (Great Circle)')
    ax.legend()

    # Geodesic distance demonstration
    ax2 = fig.add_subplot(122)
    angles = np.linspace(0, np.pi, 100)
    geodesic_dist = angles                    # Geodesic distance = angle in radians
    euclidean_dist = 2 * np.sin(angles / 2)   # Chord length

    ax2.plot(np.degrees(angles), geodesic_dist, 'b-',
             linewidth=2, label='Geodesic distance')
    ax2.plot(np.degrees(angles), euclidean_dist, 'r--',
             linewidth=2, label='Euclidean (chord) distance')
    ax2.set_xlabel('Angle between points (degrees)')
    ax2.set_ylabel('Distance (r=1 sphere)')
    ax2.set_title('Geodesic vs Euclidean Distance on Sphere')
    ax2.legend()
    ax2.grid(True)

    plt.tight_layout()
    plt.show()


# Generate sphere samples
X_sphere = generate_sphere(n_samples=1000, dim=3, noise=0.02)

# Visualize
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X_sphere[:, 0], X_sphere[:, 1], X_sphere[:, 2],
           c=X_sphere[:, 2], cmap='coolwarm', s=10, alpha=0.8)
ax.set_title('S²: Points on 2-sphere in ℝ³')
plt.show()

visualize_geodesics_on_sphere()
```

Spheres appear frequently in ML:
• Normalized embeddings (e.g., word2vec, cosine similarity): Live on hyperspheres
• Angular margin losses: Face recognition models project to spheres
• Directional statistics: von Mises-Fisher distributions on spheres
• Hyperbolic alternatives: Some data is better represented on negatively curved spaces (hyperbolic geometry) rather than positively curved spheres
The torus (plural: tori) is a 2-dimensional manifold with nontrivial topology—it has a 'hole' that distinguishes it from a sphere or plane. The torus cannot be continuously deformed into a sphere; its topology is fundamentally different.
Mathematical Construction:
A torus is parameterized by two angles (θ, φ):
$$x = (R + r\cos\theta) \cos\phi$$ $$y = (R + r\cos\theta) \sin\phi$$ $$z = r\sin\theta$$
where:
• R is the major radius (distance from the center of the torus to the center of the tube)
• r is the minor radius (the radius of the tube itself)
• θ runs around the tube, while φ runs around the central axis
Both angles are periodic: moving 2π radians in either direction returns to the same point.
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def generate_torus(n_samples=2000, R=3, r=1, noise=0.0):
    """
    Generate points on a torus.

    Parameters:
    -----------
    n_samples : int
        Number of points
    R : float
        Major radius (torus center to tube center)
    r : float
        Minor radius (tube radius)
    noise : float
        Standard deviation of Gaussian noise

    Returns:
    --------
    X : ndarray of shape (n_samples, 3)
        3D coordinates
    angles : ndarray of shape (n_samples, 2)
        The (theta, phi) parameters
    """
    np.random.seed(42)

    theta = 2 * np.pi * np.random.rand(n_samples)
    phi = 2 * np.pi * np.random.rand(n_samples)

    x = (R + r * np.cos(theta)) * np.cos(phi)
    y = (R + r * np.cos(theta)) * np.sin(phi)
    z = r * np.sin(theta)

    X = np.column_stack([x, y, z])

    if noise > 0:
        X += noise * np.random.randn(n_samples, 3)

    return X, np.column_stack([theta, phi])


def visualize_torus_topology():
    """
    Visualize the periodic nature of torus coordinates.
    """
    fig = plt.figure(figsize=(15, 5))

    # 3D view
    X, angles = generate_torus(n_samples=2000, noise=0.02)

    ax1 = fig.add_subplot(131, projection='3d')
    ax1.scatter(X[:, 0], X[:, 1], X[:, 2], c=angles[:, 0],
                cmap='hsv', s=5, alpha=0.6)
    ax1.set_title('Torus colored by θ (around tube)')

    ax2 = fig.add_subplot(132, projection='3d')
    ax2.scatter(X[:, 0], X[:, 1], X[:, 2], c=angles[:, 1],
                cmap='hsv', s=5, alpha=0.6)
    ax2.set_title('Torus colored by φ (around torus)')

    # Flat parameter space (reveals periodicity)
    ax3 = fig.add_subplot(133)
    ax3.scatter(angles[:, 0], angles[:, 1], c='blue', s=1, alpha=0.3)
    ax3.set_xlabel('θ (around tube)')
    ax3.set_ylabel('φ (around torus)')
    ax3.set_title('Parameter space (flat torus)\nEdges should be "glued"')
    ax3.set_xlim(0, 2 * np.pi)
    ax3.set_ylim(0, 2 * np.pi)

    plt.tight_layout()
    plt.show()


def demonstrate_torus_topology():
    """
    Show why the torus cannot be unrolled like the Swiss roll.
    """
    print("=== Why Torus is Special ===")
    print("1. Two independent 'loops' cannot be contracted:")
    print("   - Going around φ (main circle) returns to start")
    print("   - Going around θ (tube) returns to start")
    print("   - These loops cannot shrink to a point without leaving the surface")
    print("2. Fundamental group: π₁(T²) = ℤ × ℤ")
    print("   - Classifies loops up to continuous deformation")
    print("   - Means there are two 'types' of non-contractible loops")
    print("3. Cannot embed flat in plane:")
    print("   - Unrolling to rectangle would require cutting")
    print("   - Rectangle with edges 'glued' = flat torus (different metric)")
    print("4. Implication for ML:")
    print("   - Algorithms assuming simple topology may fail")
    print("   - t-SNE/UMAP can struggle with toroidal structure")
    print("   - Need topology-aware methods for faithful embedding")


visualize_torus_topology()
demonstrate_torus_topology()
```

The torus tests whether an algorithm respects topology. Methods that try to 'unfold' the torus to a flat rectangle will necessarily fail—they'll either tear the manifold (creating discontinuities) or fold it (creating false proximities). The torus reveals algorithm limitations that simpler manifolds hide.
Beyond the standard test cases, several other synthetic manifolds serve specialized testing purposes.
The Severed Sphere (Hemisphere):
A hemisphere—half of S²—is a manifold with boundary. The boundary creates challenges: points on the rim have one-sided neighborhoods, which biases local density estimates and the neighborhood graphs most algorithms build.
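Sampling a hemisphere is a one-line variant of full-sphere sampling: draw uniform points on S² and reflect them into one half. A minimal sketch (the helper name `generate_hemisphere` is ours, not from a library):

```python
import numpy as np

def generate_hemisphere(n_samples=1000, random_state=0):
    """Sample uniformly on the upper hemisphere of S^2 (a manifold with boundary)."""
    rng = np.random.default_rng(random_state)
    X = rng.standard_normal((n_samples, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # uniform on the full sphere
    X[:, 2] = np.abs(X[:, 2])                      # reflect into z >= 0; stays uniform by symmetry
    return X

X = generate_hemisphere(2000)
# Points near the boundary circle z = 0 have one-sided neighborhoods
near_boundary = np.sum(X[:, 2] < 0.1)
print(f"{near_boundary} of {len(X)} points lie near the boundary")
```

The reflection trick works because the uniform distribution on the sphere is symmetric under z → -z, so folding the lower half onto the upper half preserves uniformity.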
Klein Bottle (Non-orientable):
A 2D manifold that cannot be embedded in ℝ³ without self-intersection. It's non-orientable—there's no consistent 'inside' vs 'outside.' While rare in ML, it tests topological sophistication.
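Although no embedding in ℝ³ exists, the Klein bottle can be *immersed* there—the surface passes through itself. A sketch of the well-known 'figure-8' immersion; the function and parameter names are our own:

```python
import numpy as np

def generate_klein_bottle(n_samples=2000, R=2.0, random_state=0):
    """Figure-8 immersion of the Klein bottle in R^3 (self-intersecting)."""
    rng = np.random.default_rng(random_state)
    u = 2 * np.pi * rng.random(n_samples)  # angle around the 'body'
    v = 2 * np.pi * rng.random(n_samples)  # angle around the figure-8 cross-section

    # Figure-8 cross-section, twisted by u/2 so the surface is non-orientable
    w = R + np.cos(u / 2) * np.sin(v) - np.sin(u / 2) * np.sin(2 * v)
    x = w * np.cos(u)
    y = w * np.sin(u)
    z = np.sin(u / 2) * np.sin(v) + np.cos(u / 2) * np.sin(2 * v)

    return np.column_stack([x, y, z]), np.column_stack([u, v])

X, params = generate_klein_bottle()
print(X.shape)
```

The half-angle twist (u/2) is what glues the cross-section back to itself with a flip after one trip around the body, producing the non-orientability.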
Intersecting Manifolds:
Two manifolds that cross or touch create singularities—points with higher local dimension. Real data often has such structure where multiple data-generating processes coexist.
```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def generate_linked_rings(n_samples_per_ring=500, R=3, r=0.5):
    """
    Generate two interlocking rings (links).
    Tests handling of multiple connected components with entanglement.
    """
    # First ring: in XY plane centered at origin
    theta1 = 2 * np.pi * np.random.rand(n_samples_per_ring)
    x1 = R * np.cos(theta1) + r * np.random.randn(n_samples_per_ring) * 0.2
    y1 = R * np.sin(theta1) + r * np.random.randn(n_samples_per_ring) * 0.2
    z1 = r * np.random.randn(n_samples_per_ring) * 0.2
    ring1 = np.column_stack([x1, y1, z1])

    # Second ring: in XZ plane, offset so they link
    theta2 = 2 * np.pi * np.random.rand(n_samples_per_ring)
    x2 = R * np.cos(theta2) + R + r * np.random.randn(n_samples_per_ring) * 0.2
    y2 = r * np.random.randn(n_samples_per_ring) * 0.2
    z2 = R * np.sin(theta2) + r * np.random.randn(n_samples_per_ring) * 0.2
    ring2 = np.column_stack([x2, y2, z2])

    labels = np.concatenate([np.zeros(n_samples_per_ring),
                             np.ones(n_samples_per_ring)])
    return np.vstack([ring1, ring2]), labels


def generate_trefoil_knot(n_samples=1000, noise=0.0):
    """
    Generate points on a trefoil knot.
    A 1D manifold (curve) with complex 3D embedding.
    """
    t = np.linspace(0, 2 * np.pi, n_samples)

    x = np.sin(t) + 2 * np.sin(2 * t)
    y = np.cos(t) - 2 * np.cos(2 * t)
    z = -np.sin(3 * t)

    X = np.column_stack([x, y, z])

    if noise > 0:
        X += noise * np.random.randn(n_samples, 3)

    return X, t


def generate_two_crossing_planes(n_samples=2000, angle=np.pi / 4):
    """
    Generate two planes that intersect along a line.
    Creates a singularity (higher local dimension at intersection).
    """
    n_half = n_samples // 2

    # Plane 1: XY plane
    x1 = 4 * np.random.rand(n_half) - 2
    y1 = 4 * np.random.rand(n_half) - 2
    z1 = np.zeros(n_half)
    plane1 = np.column_stack([x1, y1, z1])

    # Plane 2: rotated around X axis
    x2 = 4 * np.random.rand(n_half) - 2
    y2_original = 4 * np.random.rand(n_half) - 2
    z2_original = np.zeros(n_half)

    # Rotate
    y2 = y2_original * np.cos(angle) - z2_original * np.sin(angle)
    z2 = y2_original * np.sin(angle) + z2_original * np.cos(angle)
    plane2 = np.column_stack([x2, y2, z2])

    labels = np.concatenate([np.zeros(n_half), np.ones(n_half)])
    return np.vstack([plane1, plane2]), labels


# Visualize these manifolds
fig = plt.figure(figsize=(15, 5))

ax1 = fig.add_subplot(131, projection='3d')
rings, ring_labels = generate_linked_rings()
ax1.scatter(rings[:, 0], rings[:, 1], rings[:, 2],
            c=ring_labels, cmap='bwr', s=5)
ax1.set_title('Interlocked Rings\n(Multiple components)')

ax2 = fig.add_subplot(132, projection='3d')
trefoil, t_knot = generate_trefoil_knot()
ax2.scatter(trefoil[:, 0], trefoil[:, 1], trefoil[:, 2],
            c=t_knot, cmap='viridis', s=10)
ax2.set_title('Trefoil Knot\n(1D manifold, complex embedding)')

ax3 = fig.add_subplot(133, projection='3d')
planes, plane_labels = generate_two_crossing_planes()
ax3.scatter(planes[:, 0], planes[:, 1], planes[:, 2],
            c=plane_labels, cmap='coolwarm', s=3, alpha=0.5)
ax3.set_title('Intersecting Planes\n(Singularity along intersection)')

plt.tight_layout()
plt.show()
```

| Manifold | Intrinsic Dim | Topology | Special Property | Tests |
|---|---|---|---|---|
| Swiss roll | 2 | Rectangle | Geodesic ≠ Euclidean | Unrolling ability |
| S-curve | 2 | Rectangle | Mild curvature | Basic nonlinearity |
| Sphere | n-1 | Closed, no boundary | Constant curvature | Closed manifolds |
| Torus | 2 | Genus 1 (hole) | Cannot unfold | Topology preservation |
| Hemisphere | 2 | Disk (boundary) | Boundary present | Boundary handling |
| Linked rings | 1 each | Two components | Entanglement | Multiple components |
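The table suggests a testing workflow: run a candidate algorithm on each manifold and check whether the recovered coordinates track the known intrinsic parameters. A minimal harness, assuming scikit-learn is available (its built-in `make_swiss_roll` and `make_s_curve` mirror the generators above):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll, make_s_curve
from sklearn.manifold import Isomap

datasets = {
    'swiss_roll': make_swiss_roll(n_samples=1000, random_state=0),
    's_curve': make_s_curve(n_samples=1000, random_state=0),
}

for name, (X, t) in datasets.items():
    # Embed into 2D; a successful unrolling recovers the intrinsic rectangle
    emb = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
    # One embedding axis should track the known intrinsic parameter t
    corr = abs(np.corrcoef(emb[:, 0], t)[0, 1])
    print(f"{name}: |corr(embedding axis 1, t)| = {corr:.2f}")
```

The same loop extends naturally to the torus or linked rings, where no 2D unrolling exists and the correlation diagnostic is expected to break down.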
Synthetic manifolds are valuable for testing, but the ultimate goal is understanding real data. Here we explore manifold structure in actual datasets.
Face Images:
The space of face images exhibits clear manifold structure. Key variations—head pose, lighting direction, and facial expression—are each governed by a few continuous parameters.
The famous Frey faces dataset (1,965 frames from a video) has an estimated intrinsic dimension of roughly 3-5, representing pose variation. Full face datasets (CelebA, etc.) have higher intrinsic dimension, but still far below the ambient pixel dimension.
Handwritten Digit Images:
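Digit images are a standard real-world example: scikit-learn's 8×8 digits live in 64-dimensional pixel space (MNIST in 784), yet even a purely linear analysis concentrates most variance in far fewer directions, and nonlinear intrinsic-dimension estimates are lower still. A quick PCA probe:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data               # shape (1797, 64): 8x8 digit images
pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
k90 = int(np.searchsorted(cum, 0.90)) + 1
print(f"{k90} of 64 components explain 90% of the variance")
```

PCA only bounds the *linear* dimension; the manifold perspective says the true intrinsic dimension of each digit class is smaller still, since strokes vary smoothly with slant, thickness, and style.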
Natural Language (Word Embeddings):
Word embeddings (Word2Vec, GloVe, BERT) map words to vectors in which semantic relationships trace out approximate manifold structure.
The 'geometry' of language is more complex than simple manifolds—it's hierarchical, context-dependent, and has mixed discrete/continuous aspects. But manifold perspectives offer useful intuitions.
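One connection back to the sphere section: once embeddings are length-normalized, cosine similarity, chord (Euclidean) distance, and geodesic angle are monotone transformations of one another, so they induce the same neighbor rankings. A toy check with random stand-in vectors (not real word embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.standard_normal((5, 300))                 # stand-ins for 300-d word vectors
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # project onto the unit hypersphere

cos = emb @ emb.T                                   # cosine similarity matrix
angle = np.arccos(np.clip(cos, -1, 1))              # geodesic distance on the sphere
chord = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)

# For unit vectors, chord^2 = 2 - 2*cos: the two distances carry the same ordering
assert np.allclose(chord ** 2, 2 - 2 * cos)
print(angle.shape)
```

The identity ‖a − b‖² = 2 − 2⟨a, b⟩ for unit vectors is why cosine-similarity retrieval and Euclidean nearest-neighbor search agree on normalized embeddings.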
Molecular Data:
Molecular conformations (the 3D arrangements of atoms in a molecule) form low-dimensional manifolds in high-dimensional descriptor space: atomic positions are tightly coupled, so conformations vary along a few collective coordinates such as backbone torsion angles.
Drug discovery and protein folding heavily rely on manifold learning to explore conformational space.
Time Series and Dynamical Systems:
For systems governed by differential equations, attractors (stable long-term behaviors) are often low-dimensional manifolds, or near-manifolds, embedded in phase space. The Lorenz attractor, with fractal dimension near 2.06, is the canonical example.
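The Lorenz system is three coupled ODEs whose trajectories settle onto a bounded 'butterfly'-shaped region of phase space. A minimal sketch using the standard parameters (σ=10, ρ=28, β=8/3) and a simple forward-Euler integrator—adequate for visualization, not for precision work:

```python
import numpy as np

def lorenz_trajectory(n_steps=10000, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Integrate the Lorenz system with forward Euler; returns an (n_steps, 3) trajectory."""
    xyz = np.empty((n_steps, 3))
    x, y, z = 1.0, 1.0, 1.0
    for i in range(n_steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xyz[i] = (x, y, z)
    return xyz

X = lorenz_trajectory()
# The trajectory stays in a bounded, roughly 2-dimensional region of phase space
print(X.shape, X.min(axis=0).round(1), X.max(axis=0).round(1))
```

Despite living in 3D phase space, the points concentrate near a thin, nearly two-dimensional sheet—exactly the kind of structure manifold learning and delay-embedding methods aim to recover.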
To check if your data has manifold structure:
• Examine the PCA explained-variance spectrum: a sharp decay suggests low linear dimension
• Estimate intrinsic dimension with nearest-neighbor-based methods
• Compare distances along a neighborhood graph to straight-line Euclidean distances
• Inspect a nonlinear embedding for coherent, low-dimensional structure
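A cheap, concrete diagnostic is a nearest-neighbor intrinsic-dimension estimate. The Two-NN estimator infers dimension from the ratio of each point's second to first neighbor distance; a minimal sketch, assuming scikit-learn (the helper name `twonn_dimension` is ours):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import make_swiss_roll

def twonn_dimension(X):
    """Two-NN intrinsic dimension estimate from ratios of 2nd to 1st neighbor distances."""
    # Column 0 is each point itself (distance 0); columns 1 and 2 are the two neighbors
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dists[:, 2] / dists[:, 1]
    # Maximum-likelihood estimate: mu follows a Pareto law with shape d
    return len(mu) / np.sum(np.log(mu))

X, _ = make_swiss_roll(n_samples=3000, random_state=0)
d = twonn_dimension(X)
print(f"Estimated intrinsic dimension: {d:.2f}")  # should be near 2 for the Swiss roll
```

An estimate far below the ambient dimension is evidence of manifold structure; an estimate close to the ambient dimension suggests the data fills space and manifold methods will add little.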
Not all data has manifold structure. Purely random or heavily categorical data may not benefit from manifold learning approaches.
We've surveyed the landscape of manifold examples—synthetic test cases and real-world instances. This 'bestiary' equips you to choose informative test cases, anticipate where algorithms fail, and recognize manifold structure when it appears in real data.
What's Next:
With a solid understanding of manifold examples, we turn to implications for machine learning in the next page. We'll see how the manifold hypothesis fundamentally reshapes our understanding of learning, generalization, and representation.
You now have a library of manifold examples to draw upon. When evaluating a manifold learning algorithm, think: 'How does this handle Swiss roll? Torus? Intersecting manifolds?' This geometric intuition guides algorithm selection and debugging throughout your ML practice.