If you were to learn only one matrix decomposition technique for machine learning, Singular Value Decomposition (SVD) would be the unambiguous choice. It is, without exaggeration, the most versatile and widely-used matrix factorization in applied mathematics and data science.
SVD reveals the fundamental structure of any matrix—not just square matrices with special properties, but any matrix of any shape. It decomposes a matrix into three simpler matrices that expose the underlying geometry: the directions of maximum variance, the magnitudes of stretching along those directions, and how inputs and outputs relate through the transformation.
From Netflix's recommendation engine that powered its famous million-dollar prize competition to Google's latent semantic indexing for search, from image compression algorithms to noise reduction in signal processing, SVD is the mathematical workhorse operating behind countless real-world systems.
By the end of this page, you will understand: (1) The formal definition and existence theorem of SVD, (2) The geometric interpretation of what SVD reveals about linear transformations, (3) How to compute SVD step-by-step, (4) The relationship between SVD and eigendecomposition, (5) Low-rank approximations and the Eckart-Young-Mirsky theorem, and (6) Why SVD is computationally preferred in many applications.
Let's begin with the precise mathematical statement. Given any matrix A ∈ ℝ^(m×n), there exist orthogonal matrices U ∈ ℝ^(m×m) and V ∈ ℝ^(n×n), and a diagonal matrix Σ ∈ ℝ^(m×n) such that:
$$A = U \Sigma V^T$$
This factorization is called the Singular Value Decomposition of A.
Let's unpack each component:
The matrix U (Left Singular Vectors): an m × m orthogonal matrix whose columns u₁, ..., u_m are the left singular vectors; they form an orthonormal basis for the output space ℝ^m.
The matrix Σ (Singular Values): an m × n rectangular diagonal matrix whose diagonal entries σ₁ ≥ σ₂ ≥ ... ≥ 0 are the singular values, conventionally sorted in decreasing order.
The matrix V (Right Singular Vectors): an n × n orthogonal matrix whose columns v₁, ..., v_n are the right singular vectors; they form an orthonormal basis for the input space ℝ^n.
A remarkable property of SVD is that it exists for EVERY matrix—rectangular or square, full-rank or rank-deficient, symmetric or asymmetric. This universality is what makes SVD so powerful. The eigendecomposition, by contrast, applies only to square matrices, fails entirely for defective (non-diagonalizable) ones, and may require complex numbers even for real inputs. SVD always works with real numbers for real matrices.
Understanding the dimensions:
For a matrix A of size m × n with rank r: there are exactly r nonzero singular values, σ₁ ≥ σ₂ ≥ ... ≥ σᵣ > 0, and the remaining min(m, n) − r singular values are zero; only the first r columns of U and V are needed to reconstruct A.
This leads to different forms of SVD:
Full SVD: U is m × m, Σ is m × n, V is n × n
Economy (Thin) SVD: U is m × min(m, n), Σ is min(m, n) × min(m, n), V is n × min(m, n); the compact SVD goes further and keeps only the r columns corresponding to nonzero singular values.
The economy SVD is far more storage-efficient when m and n differ greatly (and the compact form when r << min(m, n)); it is what numpy.linalg.svd and scipy.linalg.svd return when called with full_matrices=False.
```python
import numpy as np
from scipy.linalg import svd

# Create a sample matrix
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

print("Original Matrix A:")
print(A)
print(f"Shape: {A.shape}")  # (4, 3) - 4 rows, 3 columns

# Compute Full SVD
U_full, s_full, Vt_full = svd(A, full_matrices=True)
print(f"\nFull SVD:")
print(f"U shape: {U_full.shape}")    # (4, 4)
print(f"s shape: {s_full.shape}")    # (3,) - just the diagonal values
print(f"Vt shape: {Vt_full.shape}")  # (3, 3)

# Compute Economy (Thin) SVD
U_thin, s_thin, Vt_thin = svd(A, full_matrices=False)
print(f"\nEconomy SVD:")
print(f"U shape: {U_thin.shape}")    # (4, 3)
print(f"s shape: {s_thin.shape}")    # (3,)
print(f"Vt shape: {Vt_thin.shape}")  # (3, 3)

# Verify reconstruction: A = U @ Σ @ V^T
Sigma_full = np.zeros((4, 3))
np.fill_diagonal(Sigma_full, s_full)
A_reconstructed = U_full @ Sigma_full @ Vt_full
print(f"\nReconstruction error: {np.linalg.norm(A - A_reconstructed):.2e}")

# Singular values reveal the rank
print(f"\nSingular values: {s_full}")
print(f"Matrix rank: {np.sum(s_full > 1e-10)}")
```

The power of SVD lies not just in its existence but in what it tells us about the geometry of linear transformations. To understand this, let's think about what happens when we multiply a vector x by the matrix A.
The transformation A = UΣV^T as three steps:
When we compute Ax, the SVD decomposes this into three sequential operations:
Step 1: V^T rotates (and possibly reflects) the input space. The matrix V^T rotates the input vector x from the standard basis to a new coordinate system defined by the right singular vectors. This aligns the vector with the 'principal directions' of the transformation.
Step 2: Σ scales along each axis. The diagonal matrix Σ stretches or compresses along each axis by the corresponding singular value σᵢ. Directions with larger singular values get amplified more; directions with σᵢ = 0 are collapsed entirely (these correspond to the null space).
Step 3: U rotates into the output space. Finally, U rotates the scaled result into the output coordinate system, aligning it with the left singular vectors.
The sphere-to-ellipse transformation:
Perhaps the most illuminating visualization: consider what happens when A acts on the unit sphere. If we take all vectors x with ||x|| = 1 (a sphere in n dimensions), and apply A to each, we get an ellipsoid in m dimensions. The semi-axes of this ellipsoid point along the left singular vectors uᵢ, and their lengths are the singular values σᵢ.
In machine learning, data matrices represent collections of samples (rows) with features (columns). The SVD of a data matrix reveals: (1) The directions of maximum variance in the data, (2) How much variance lies along each direction, and (3) Which features contribute most to each variance direction. This is exactly what we need for dimensionality reduction, noise filtering, and understanding data structure.
Connecting singular vectors to fundamental subspaces:
SVD provides orthonormal bases for all four fundamental subspaces of a matrix A with rank r:
| Subspace | Basis from SVD | Dimension |
|---|---|---|
| Column space (Range) of A | First r columns of U | r |
| Left null space of A | Last m-r columns of U | m-r |
| Row space of A | First r columns of V | r |
| Null space of A | Last n-r columns of V | n-r |
This elegant connection shows why SVD is considered the 'ultimate' matrix decomposition—it reveals the complete structure of any linear transformation.
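To make the table concrete, here is a small sketch (reusing the 4 × 3 rank-2 matrix from the earlier example; the tolerance-based rank estimate is this snippet's own choice) that slices U and V from a full SVD into bases for the four subspaces:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]], dtype=float)  # 4 x 3, rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=True)
m, n = A.shape

# Estimate the numerical rank from the singular values
tol = max(m, n) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))

col_space  = U[:, :r]     # basis for the column space of A (r columns)
left_null  = U[:, r:]     # basis for the left null space (m - r columns)
row_space  = Vt[:r, :].T  # basis for the row space of A (r columns)
null_space = Vt[r:, :].T  # basis for the null space (n - r columns)

# Sanity checks: A annihilates the null space, A^T annihilates the left null space
print("rank:", r)
print("||A @ null_space|| =", np.linalg.norm(A @ null_space))
print("||A.T @ left_null|| =", np.linalg.norm(A.T @ left_null))
```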
Interpretation of singular values: each σᵢ measures how strongly A stretches space along the input direction vᵢ (toward the output direction uᵢ). The largest singular value σ₁ equals the spectral norm ||A||₂, the number of nonzero singular values equals the rank, and for a full-rank matrix the ratio σ₁/σ_min is the condition number that governs numerical sensitivity.
```python
import numpy as np
import matplotlib.pyplot as plt

def visualize_svd_transformation(A, ax1, ax2):
    """Visualize how SVD transforms the unit circle to an ellipse."""
    U, s, Vt = np.linalg.svd(A)

    # Points on unit circle
    theta = np.linspace(0, 2*np.pi, 100)
    circle = np.vstack([np.cos(theta), np.sin(theta)])

    # Transform through A
    ellipse = A @ circle

    # Plot unit circle with right singular vectors
    ax1.plot(circle[0], circle[1], 'b-', label='Unit circle')
    for i in range(len(s)):
        ax1.arrow(0, 0, Vt[i, 0]*0.8, Vt[i, 1]*0.8,
                  head_width=0.1, color=f'C{i+1}',
                  label=f'v{i+1} (input direction)')
    ax1.set_xlim(-1.5, 1.5)
    ax1.set_ylim(-1.5, 1.5)
    ax1.set_aspect('equal')
    ax1.axhline(y=0, color='k', linewidth=0.5)
    ax1.axvline(x=0, color='k', linewidth=0.5)
    ax1.legend(fontsize=8)
    ax1.set_title('Input Space: Unit Circle + Right Singular Vectors')

    # Plot ellipse with left singular vectors scaled by singular values
    ax2.plot(ellipse[0], ellipse[1], 'r-', label='Transformed ellipse')
    for i in range(len(s)):
        ax2.arrow(0, 0, U[0, i]*s[i]*0.8, U[1, i]*s[i]*0.8,
                  head_width=0.2, color=f'C{i+1}',
                  label=f'σ{i+1}·u{i+1} (σ={s[i]:.2f})')
    max_val = max(s) * 1.5
    ax2.set_xlim(-max_val, max_val)
    ax2.set_ylim(-max_val, max_val)
    ax2.set_aspect('equal')
    ax2.axhline(y=0, color='k', linewidth=0.5)
    ax2.axvline(x=0, color='k', linewidth=0.5)
    ax2.legend(fontsize=8)
    ax2.set_title('Output Space: Ellipse + Scaled Left Singular Vectors')

# Example: A transformation that stretches and rotates
A = np.array([[3, 1],
              [1, 2]])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
visualize_svd_transformation(A, ax1, ax2)
plt.suptitle('SVD Geometric Interpretation: Circle → Ellipse', fontsize=14)
plt.tight_layout()
plt.show()

# Verify the geometry
U, s, Vt = np.linalg.svd(A)
print("Singular values (ellipse semi-axis lengths):", s)
print("\nLeft singular vectors (ellipse axis directions):")
print(U)
print("\nRight singular vectors (input directions mapped to axes):")
print(Vt.T)
```

Understanding how SVD is computed illuminates its deep connection to eigenvalue problems. While production implementations use sophisticated numerical methods (like the Golub-Kahan algorithm), the conceptual derivation reveals essential insights.
Deriving SVD from eigenvalues:
The key insight is that if we have A = UΣV^T, then:
$$A^T A = V \Sigma^T U^T U \Sigma V^T = V \Sigma^T \Sigma V^T = V \Lambda_V V^T$$
$$A A^T = U \Sigma V^T V \Sigma^T U^T = U \Sigma \Sigma^T U^T = U \Lambda_U U^T$$
where Λ_V = Σ^TΣ and Λ_U = ΣΣ^T are diagonal matrices containing σ₁², σ₂², ...
This reveals that: the columns of V are the eigenvectors of AᵀA, the columns of U are the eigenvectors of AAᵀ, and the singular values are the square roots of their shared nonzero eigenvalues, σᵢ = √λᵢ.
Since A^TA and AA^T are symmetric positive semi-definite, their eigenvalues are real and non-negative. This guarantees that singular values are always real and non-negative.
The computational algorithm (conceptual): (1) form AᵀA and compute its eigendecomposition AᵀA = VΛVᵀ; (2) sort the eigenvalues in decreasing order and set σᵢ = √λᵢ; (3) recover each left singular vector from uᵢ = Avᵢ/σᵢ for σᵢ > 0 (completing U with any orthonormal basis of the remaining space if the full form is needed).
While the derivation above is mathematically correct, computing SVD by explicitly forming A^TA is numerically unstable—it squares the condition number and can lose precision. Production algorithms like the Golub-Kahan bidiagonalization avoid this by working directly with A. Always use library functions (numpy.linalg.svd, scipy.linalg.svd) rather than implementing via eigendecomposition.
Computational complexity:
For an m × n matrix (assume m ≥ n): computing the SVD costs on the order of O(mn²) floating-point operations (O(n³) for a square matrix); the full form is somewhat more expensive than the thin form because it must also build the complete m × m matrix U.
This makes SVD relatively expensive for large matrices, motivating the use of approximate and randomized algorithms for big data applications.
Uniqueness of SVD:
SVD is essentially unique with the following caveats: the singular values themselves are always uniquely determined; each matched pair of singular vectors (uᵢ, vᵢ) can be flipped in sign simultaneously without changing the product (demonstrated below); and when a singular value is repeated, any orthonormal basis of the corresponding subspace is a valid choice of singular vectors.
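A minimal illustration of the sign ambiguity, using an arbitrary 2 × 2 example matrix: flipping the sign of a matched column of U and row of Vᵀ leaves the product unchanged.

```python
import numpy as np

A = np.array([[3., 1.], [1., 2.]])
U, s, Vt = np.linalg.svd(A)

# Flip the sign of the first left singular vector AND the first right singular vector
U2 = U.copy();   U2[:, 0] *= -1
Vt2 = Vt.copy(); Vt2[0, :] *= -1

A1 = U @ np.diag(s) @ Vt
A2 = U2 @ np.diag(s) @ Vt2
print(np.allclose(A1, A2))  # True: both are valid SVDs of A
```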
```python
import numpy as np

def compute_svd_via_eigendecomposition(A):
    """
    Compute SVD using eigendecomposition of A^T A.

    WARNING: This is for educational purposes only!
    In practice, use numpy.linalg.svd which uses
    numerically stable algorithms.
    """
    m, n = A.shape

    # Form A^T A (n x n symmetric positive semi-definite matrix)
    ATA = A.T @ A

    # Compute eigenvalues and eigenvectors of A^T A
    # eigenvalues are squared singular values
    eigenvalues, V = np.linalg.eigh(ATA)  # eigh for symmetric matrices

    # Sort in descending order (eigh returns ascending)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    V = V[:, idx]

    # Singular values are square roots of eigenvalues
    # Clip to avoid numerical issues with tiny negative values
    singular_values = np.sqrt(np.maximum(eigenvalues, 0))

    # Find rank (number of non-trivially zero singular values)
    tol = max(m, n) * np.finfo(float).eps * singular_values[0]
    rank = np.sum(singular_values > tol)

    # Compute U from the relationship: u_i = (1/σ_i) A v_i
    U = np.zeros((m, rank))
    for i in range(rank):
        U[:, i] = A @ V[:, i] / singular_values[i]

    return U, singular_values[:rank], V[:, :rank].T

# Test with a sample matrix
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]], dtype=float)

# Our implementation
U_ours, s_ours, Vt_ours = compute_svd_via_eigendecomposition(A)

# NumPy's implementation
U_np, s_np, Vt_np = np.linalg.svd(A, full_matrices=False)

print("Our SVD (via eigendecomposition):")
print(f"Singular values: {s_ours}")
print(f"Rank: {len(s_ours)}")

print("\nNumPy's SVD:")
print(f"Singular values: {s_np}")

print(f"\nSingular value match: {np.allclose(s_ours, s_np[:len(s_ours)])}")

# Verify reconstruction
S_diag = np.diag(s_ours)
A_reconstructed = U_ours @ S_diag @ Vt_ours
print(f"Reconstruction error: {np.linalg.norm(A - A_reconstructed):.2e}")
```

One of the most important applications of SVD is finding the best low-rank approximation to a matrix. This is fundamental to dimensionality reduction, data compression, noise reduction, and matrix completion.
The problem statement:
Given a matrix A of rank r, find the rank-k matrix Aₖ (where k < r) that is closest to A. 'Closest' means minimizing either the Frobenius norm ||A − Aₖ||_F or the spectral (operator 2-) norm ||A − Aₖ||₂.
The Eckart-Young-Mirsky Theorem (1936):
The optimal rank-k approximation Aₖ is obtained by truncating the SVD:
$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T = U_k \Sigma_k V_k^T$$
where Uₖ contains the first k columns of U, Σₖ is the k × k diagonal matrix of the top k singular values, and Vₖ contains the first k columns of V.
The approximation errors are:
$$\|A - A_k\|_2 = \sigma_{k+1}, \qquad \|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \sigma_{k+2}^2 + \cdots + \sigma_r^2}$$
This is a profound result: among all rank-k matrices, the truncated SVD is provably optimal in both common matrix norms!
Think of each singular value as measuring the 'importance' of a direction. The first singular vector captures the most significant pattern; the second captures the most significant remaining pattern orthogonal to the first; and so on. By keeping only the top k directions, we retain the most important structure while discarding less significant variations (often noise).
Practical implications:
1. Data Compression: Instead of storing m × n numbers, store: the k columns of Uₖ (mk numbers), the k singular values (k numbers), and the k columns of Vₖ (nk numbers).
Total: k(m + n + 1) numbers. If k << min(m,n), this is massive compression.
2. Noise Reduction: If data = signal + noise, and the signal is effectively low-rank while noise is full-rank, truncating small singular values removes noise while preserving signal.
3. Choosing k: common strategies are keeping enough components to reach a target fraction of the total variance (e.g., 95% of Σσᵢ²), looking for an 'elbow' where the singular value spectrum flattens, or selecting k by cross-validation on the downstream task.
Outer product form of SVD:
The SVD can be written as a sum of rank-1 matrices: $$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$
Each term σᵢuᵢvᵢᵀ is a rank-1 'building block'. The truncated SVD simply takes the first k of these blocks.
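Here is a minimal sketch of the outer-product view on a small random matrix (the matrix and sizes are arbitrary): summing the first k rank-1 blocks reproduces the rank-k truncation, and the leftover spectral error matches the next singular value, as the Eckart-Young-Mirsky theorem predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Build A as a running sum of rank-1 blocks σ_i · u_i v_i^T
A_k = np.zeros_like(A)
for i in range(len(s)):
    A_k += s[i] * np.outer(U[:, i], Vt[i, :])
    # Eckart-Young: the spectral error of the rank-(i+1) truncation is the next singular value
    expected = s[i + 1] if i + 1 < len(s) else 0.0
    print(f"rank {i+1}: ||A - A_k||_2 = {np.linalg.norm(A - A_k, 2):.4f}  (theory: {expected:.4f})")
```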
```python
import numpy as np
import matplotlib.pyplot as plt

def analyze_low_rank_approximation(A, show_plot=True):
    """
    Analyze the low-rank approximation quality for matrix A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    rank = len(s)

    # Compute various approximation metrics
    total_variance = np.sum(s**2)
    cumulative_variance = np.cumsum(s**2) / total_variance

    errors_frobenius = []
    errors_spectral = []
    compression_ratios = []

    m, n = A.shape
    original_size = m * n

    for k in range(1, rank + 1):
        # Reconstruction with rank k
        Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

        # Frobenius error
        error_f = np.linalg.norm(A - Ak, 'fro')
        errors_frobenius.append(error_f)

        # Spectral error (equals σ_{k+1}, or 0 if k=rank)
        error_s = s[k] if k < rank else 0
        errors_spectral.append(error_s)

        # Compression ratio
        compressed_size = k * (m + n + 1)
        compression_ratios.append(original_size / compressed_size)

    if show_plot:
        fig, axes = plt.subplots(2, 2, figsize=(12, 10))

        # Singular value spectrum
        axes[0, 0].semilogy(range(1, rank+1), s, 'bo-')
        axes[0, 0].set_xlabel('Index i')
        axes[0, 0].set_ylabel('Singular value σᵢ (log scale)')
        axes[0, 0].set_title('Singular Value Spectrum')
        axes[0, 0].grid(True)

        # Cumulative explained variance
        axes[0, 1].plot(range(1, rank+1), cumulative_variance * 100, 'go-')
        axes[0, 1].axhline(y=95, color='r', linestyle='--', label='95% threshold')
        axes[0, 1].set_xlabel('Number of components k')
        axes[0, 1].set_ylabel('Cumulative explained variance (%)')
        axes[0, 1].set_title('Explained Variance Ratio')
        axes[0, 1].legend()
        axes[0, 1].grid(True)

        # Approximation error
        axes[1, 0].plot(range(1, rank+1), errors_frobenius, 'r-', label='Frobenius')
        axes[1, 0].plot(range(1, rank+1), errors_spectral, 'b--', label='Spectral')
        axes[1, 0].set_xlabel('Rank k')
        axes[1, 0].set_ylabel('Approximation error ||A - Aₖ||')
        axes[1, 0].set_title('Approximation Error vs Rank')
        axes[1, 0].legend()
        axes[1, 0].grid(True)

        # Compression ratio
        axes[1, 1].plot(range(1, rank+1), compression_ratios, 'm-o')
        axes[1, 1].set_xlabel('Rank k')
        axes[1, 1].set_ylabel('Compression ratio')
        axes[1, 1].set_title('Compression Ratio vs Rank')
        axes[1, 1].grid(True)

        plt.tight_layout()
        plt.show()

    return s, cumulative_variance, errors_frobenius

# Example: Create a matrix with structure + noise
np.random.seed(42)
true_rank = 5
m, n = 100, 50

# Low-rank component (the "signal")
U_true = np.random.randn(m, true_rank)
V_true = np.random.randn(n, true_rank)
signal = U_true @ V_true.T

# Add Gaussian noise
noise_level = 0.5
A = signal + noise_level * np.random.randn(m, n)

print(f"Matrix A: {m} x {n}")
print(f"True underlying rank: {true_rank}")
print(f"Noise level: {noise_level}")

s, var_explained, errors = analyze_low_rank_approximation(A)

# Find k that explains 95% of variance
k_95 = np.argmax(var_explained >= 0.95) + 1
print(f"\nComponents needed for 95% variance: {k_95}")
print(f"Top {k_95} singular values: {s[:k_95].round(2)}")
```

SVD and eigendecomposition are related but distinct factorizations. Understanding when to use each is crucial for ML practitioners.
Eigendecomposition recap:
For a square matrix A ∈ ℝⁿˣⁿ, the eigendecomposition is: $$A = Q \Lambda Q^{-1}$$
where Λ is diagonal containing eigenvalues, and Q contains eigenvectors.
For symmetric matrices A = Aᵀ, this simplifies to: $$A = Q \Lambda Q^T$$
where Q is orthogonal (Q⁻¹ = Qᵀ).
Key differences:
| Property | SVD | Eigendecomposition |
|---|---|---|
| Matrix shape | Any m × n | Square n × n only |
| Existence | Always exists | May not exist (defective matrices) |
| Values | Always real, non-negative (σᵢ ≥ 0) | Can be complex, negative, or zero |
| Vectors | Two sets: U (left) and V (right) | One set: eigenvectors Q |
| Orthogonality | U, V always orthogonal | Q orthogonal only for symmetric A |
| Ordering convention | σ₁ ≥ σ₂ ≥ ... ≥ 0 | Often unordered |
| Computational cost | O(mn²) for m ≥ n | O(n³) |
| Numerical stability | Stable standard algorithms | Can be unstable for non-symmetric |
Relationship for symmetric matrices:
For a symmetric positive semi-definite matrix A: the two factorizations coincide: the singular values equal the eigenvalues, and U = V = Q (up to ordering and sign).
For a symmetric matrix with negative eigenvalues: the singular values are the absolute values of the eigenvalues (σᵢ = |λᵢ|), and for each negative eigenvalue the corresponding left and right singular vectors differ by a sign, so U ≠ V.
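A small sketch of this relationship, using an arbitrary symmetric 2 × 2 matrix with one negative eigenvalue, comparing numpy.linalg.eigh with numpy.linalg.svd:

```python
import numpy as np

# Symmetric matrix with one negative eigenvalue
A = np.array([[2.,  3.],
              [3., -1.]])

eigvals, Q = np.linalg.eigh(A)   # eigendecomposition (ascending eigenvalues)
U, s, Vt = np.linalg.svd(A)      # SVD (descending singular values)

print("eigenvalues:     ", eigvals)   # one negative, one positive
print("singular values: ", s)         # absolute values of the eigenvalues
print("match |λ| vs σ:  ", np.allclose(np.sort(np.abs(eigvals))[::-1], s))

# For the negative eigenvalue, the left and right singular vectors differ by a sign
print("U == V?          ", np.allclose(U, Vt.T))  # False here
```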
When to use SVD: rectangular data matrices, rank-deficient or ill-conditioned problems, least-squares and pseudoinverse computations, low-rank approximation, and any setting where numerical robustness matters.
When to use eigendecomposition: square matrices whose eigenstructure is itself the object of study, such as covariance matrices, Markov transition matrices, graph Laplacians, and analyses of repeated application Aᵏ (power iteration, dynamical systems).
Principal Component Analysis (PCA) can be computed two ways: (1) Eigendecomposition of the covariance matrix C = XᵀX/(n-1), or (2) SVD of the centered data matrix X directly. The SVD approach is preferred because it avoids explicitly forming XᵀX, which can lose numerical precision. The principal components are the right singular vectors of X, and the explained variances are σᵢ²/(n-1).
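A brief sketch with synthetic data (sizes and names are illustrative) verifying that the two routes agree on the explained variances and, up to sign, on the principal directions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # 200 samples, 5 features
X = X - X.mean(axis=0)                                           # center the data
n = X.shape[0]

# Route 1: eigendecomposition of the covariance matrix
C = X.T @ X / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)
variances_svd = s**2 / (n - 1)

print("explained variances match:", np.allclose(eigvals, variances_svd))
# Principal directions agree up to sign
print("components match (up to sign):", np.allclose(np.abs(eigvecs), np.abs(Vt.T)))
```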
SVD is not just a theoretical tool—it's the computational backbone of many ML systems. Let's explore the major applications with practical details.
1. Principal Component Analysis (PCA):
PCA finds the orthogonal directions of maximum variance in data. Given centered data matrix X (n samples × d features), compute SVD: X = UΣVᵀ. The principal directions are the columns of V (rows of Vᵀ), the projected coordinates (scores) are UΣ, and the variance captured by component i is σᵢ²/(n − 1).
2. Latent Semantic Analysis (LSA):
For text analysis, create a term-document matrix A where Aᵢⱼ = frequency of term i in document j. SVD reveals latent 'topics': the left singular vectors group terms that tend to co-occur, the right singular vectors group documents that use similar vocabulary, and the singular values rank these topics by importance. Documents and queries can then be compared in the compact topic space instead of the sparse term space (see the sketch below).
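A minimal LSA sketch using sklearn's CountVectorizer and TruncatedSVD; the four-document toy corpus and the choice of two components are assumptions made purely for illustration. Note that it builds a document-term matrix (the transpose of the term-document convention above), which simply swaps the roles of U and V.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

# Tiny toy corpus: two "topics" (space vs. cooking)
docs = [
    "rocket launch orbit satellite",
    "satellite orbit space rocket",
    "recipe oven bake bread",
    "bread oven flour recipe",
]

X = CountVectorizer().fit_transform(docs)      # document-term count matrix (4 x vocab)
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_topics = lsa.fit_transform(X)              # documents embedded in 2-D topic space

print(np.round(doc_topics, 2))
# Documents about the same subject land close together in topic space
```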
3. Recommendation Systems:
The Netflix Prize made matrix factorization famous. For a user-item rating matrix R: factor R ≈ UₖΣₖVₖᵀ so that each user and each item gets a k-dimensional vector of latent factors, and predicted ratings are inner products of those vectors. Because R is mostly missing, production systems learn the factors with regularized iterative methods rather than one exact SVD, but a truncated SVD of a mean-filled matrix is a useful baseline (sketched below).
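A hedged baseline sketch, not the Netflix-winning method: fill missing ratings with the global mean, take a rank-2 truncated SVD, and read predictions off the reconstruction. The tiny rating matrix and the choice k = 2 are made up for illustration.

```python
import numpy as np

# Toy user-item rating matrix (0 marks a missing rating)
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

mask = R > 0
R_filled = np.where(mask, R, R[mask].mean())   # naive imputation with the global mean

# Rank-2 truncated SVD as a latent-factor model
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(R_hat, 2))  # predictions, including for the missing entries
print("RMSE on observed ratings:",
      round(float(np.sqrt(np.mean((R_hat[mask] - R[mask])**2))), 3))
```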
4. Image Compression:
Treat an image as a matrix (or separately for each color channel). Truncated SVD gives lossy compression: storing k(m+n+1) values instead of mn, with provably minimal reconstruction error.
```python
import numpy as np
import matplotlib.pyplot as plt

def compress_image_svd(image_array, k):
    """
    Compress an image using rank-k SVD approximation.
    Works on grayscale; for color, apply to each channel.
    """
    U, s, Vt = np.linalg.svd(image_array, full_matrices=False)

    # Truncate to rank k
    compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Clip values to valid range
    compressed = np.clip(compressed, 0, 255)
    return compressed.astype(np.uint8)

def analyze_compression(image_array):
    """Analyze compression quality at different ranks."""
    U, s, Vt = np.linalg.svd(image_array, full_matrices=False)

    m, n = image_array.shape
    original_size = m * n

    results = []
    k_values = [1, 5, 10, 20, 50, 100, min(m, n)]

    for k in k_values:
        if k > len(s):
            continue

        compressed = compress_image_svd(image_array, k)

        # Compute metrics (cast to float to avoid uint8 wraparound)
        compressed_size = k * (m + n + 1)
        compression_ratio = original_size / compressed_size
        mse = np.mean((image_array.astype(float) - compressed.astype(float))**2)
        psnr = 10 * np.log10(255**2 / mse) if mse > 0 else float('inf')
        explained_var = np.sum(s[:k]**2) / np.sum(s**2) * 100

        results.append({
            'k': k,
            'compression_ratio': compression_ratio,
            'psnr': psnr,
            'explained_variance': explained_var,
            'image': compressed
        })

    return results

# Create a sample grayscale image (gradient + pattern)
x = np.linspace(0, 255, 256)
y = np.linspace(0, 255, 256)
X, Y = np.meshgrid(x, y)
image = np.clip(0.5 * X + 0.3 * Y + 50 * np.sin(X/20) * np.cos(Y/30), 0, 255).astype(np.uint8)

# Analyze compression
results = analyze_compression(image)

print("Image Compression Analysis")
print("=" * 60)
for r in results:
    print(f"Rank {r['k']:3d}: Compression {r['compression_ratio']:5.1f}x, "
          f"PSNR {r['psnr']:5.1f} dB, Explained var {r['explained_variance']:5.1f}%")

# Visualization
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
axes = axes.flatten()

axes[0].imshow(image, cmap='gray')
axes[0].set_title('Original')
axes[0].axis('off')

for i, r in enumerate(results[:7]):
    axes[i+1].imshow(r['image'], cmap='gray')
    axes[i+1].set_title(f"k={r['k']} ({r['compression_ratio']:.1f}x)")
    axes[i+1].axis('off')

plt.suptitle('SVD Image Compression at Different Ranks', fontsize=14)
plt.tight_layout()
plt.show()
```

5. Pseudoinverse and Least Squares:
The Moore-Penrose pseudoinverse A⁺ is elegantly computed via SVD: $$A^+ = V \Sigma^+ U^T$$
where Σ⁺ has 1/σᵢ on the diagonal for non-zero singular values (and 0 elsewhere).
This solves least squares problems even when A is rank-deficient: $$x = A^+ b$$
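A short sketch on a random overdetermined system (the sizes and seed are arbitrary) showing that the SVD-built pseudoinverse agrees with numpy.linalg.pinv and numpy.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))   # overdetermined: 8 equations, 3 unknowns
b = rng.standard_normal(8)

# Pseudoinverse built directly from the SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_inv = np.where(s > tol, 1.0 / s, 0.0)        # invert only the nonzero singular values
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

x_svd   = A_pinv @ b
x_pinv  = np.linalg.pinv(A) @ b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.allclose(x_svd, x_pinv), np.allclose(x_svd, x_lstsq))  # True True
```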
6. Noise Reduction:
If measurements are corrupted by noise, and the true signal is approximately low-rank: compute the SVD of the noisy data matrix, keep only the components with large singular values, and reconstruct the matrix from them.
The small singular values typically correspond to noise, while large ones capture the signal.
7. Matrix Completion:
Given a matrix with missing entries (e.g., user-movie ratings), find a low-rank matrix that matches the observed entries. Nuclear norm minimization (sum of singular values) is a convex relaxation that often recovers the true matrix. SVD is central to algorithms like Soft-Impute.
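A heavily simplified sketch in the spirit of Soft-Impute, not a reference implementation; the toy rating matrix, the threshold λ = 1.0, and the iteration count are illustrative assumptions. The idea: repeatedly fill the missing entries with the current estimate and soft-threshold the singular values.

```python
import numpy as np

# Toy matrix with missing entries (np.nan marks unobserved values)
R = np.array([
    [5, 4, np.nan, 1],
    [4, np.nan, 1, 1],
    [1, 1, np.nan, 4],
    [np.nan, 1, 4, 5],
], dtype=float)
observed = ~np.isnan(R)

X = np.where(observed, R, 0.0)   # initial guess: zeros in the missing slots
lam = 1.0                        # soft-threshold level for the singular values (illustrative)

for _ in range(100):
    # Fill missing entries with the current estimate, keep observed entries fixed
    filled = np.where(observed, R, X)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)   # singular value soft-thresholding
    X = U @ np.diag(s_shrunk) @ Vt

print(np.round(X, 2))  # low-rank estimate, including the previously missing entries
```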
Across all these applications, SVD provides the same superpower: decomposing complex, high-dimensional data into ordered components of decreasing importance. By focusing on the top components and discarding the rest, we achieve dimensionality reduction, noise removal, and computational efficiency—all with theoretical guarantees of optimality.
Computing full SVD on large matrices is expensive. Understanding computational trade-offs is essential for practical ML applications.
Complexity Analysis:
| Algorithm | Complexity | When to use |
|---|---|---|
| Full SVD | O(mn²) for m ≥ n | Small matrices, need all components |
| Truncated SVD | O(mnk) | Need only top k components |
| Randomized SVD | O(mn log k + (m+n)k²) | Large matrices, approximate top k |
| Incremental SVD | O(nk²) per row update | Streaming data |
Truncated SVD:
When you only need the top k singular values/vectors (typical in ML), use iterative methods that never compute the full decomposition: Lanczos/Arnoldi-style solvers such as scipy.sparse.linalg.svds (ARPACK) or sklearn's TruncatedSVD, both of which work directly on sparse matrices.
Randomized SVD:
For very large matrices, randomized algorithms provide excellent approximations much faster: multiply A by a thin random matrix to sample its dominant range, orthonormalize the result, and then compute an exact SVD of the small projected matrix (see the sketch below).
This trades exact computation for probabilistic guarantees with much lower cost.
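A minimal randomized range-finder sketch along the lines of the Halko-Martinsson-Tropp recipe; the oversampling, power-iteration count, and test matrix here are illustrative choices rather than tuned defaults.

```python
import numpy as np

def randomized_svd_sketch(A, k, oversample=10, n_iter=2, seed=0):
    """Approximate top-k SVD of A via a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # 1. Sample the range of A with a random Gaussian test matrix
    Omega = rng.standard_normal((n, k + oversample))
    Y = A @ Omega

    # 2. A few power iterations sharpen the approximation when the spectrum decays slowly
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)

    # 3. Orthonormal basis Q for the sampled range, then exact SVD of the small matrix B = Q^T A
    Q, _ = np.linalg.qr(Y)
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

# Compare against the exact singular values on a synthetic matrix with decaying spectrum
rng = np.random.default_rng(42)
A = rng.standard_normal((2000, 300)) @ np.diag(np.logspace(0, -3, 300)) @ rng.standard_normal((300, 300))

U, s, Vt = randomized_svd_sketch(A, k=10)
s_exact = np.linalg.svd(A, compute_uv=False)[:10]
print(np.round(s, 3))
print(np.round(s_exact, 3))
```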
```python
import numpy as np
import time
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds
from sklearn.decomposition import TruncatedSVD
from sklearn.utils.extmath import randomized_svd

def benchmark_svd_methods(m, n, k, sparsity=0.01):
    """
    Compare different SVD methods for a sparse matrix.
    """
    print(f"Matrix size: {m} x {n} (sparsity: {sparsity*100:.1f}%)")
    print(f"Target rank: {k}")
    print("=" * 60)

    # Create sparse matrix
    A_sparse = sparse_random(m, n, density=sparsity, format='csr')
    A_dense = A_sparse.toarray()

    results = {}

    # Method 1: Full SVD (only if matrix is small enough)
    if m * n < 1e6:
        start = time.time()
        U_full, s_full, Vt_full = np.linalg.svd(A_dense, full_matrices=False)
        time_full = time.time() - start
        results['Full SVD'] = {
            'time': time_full,
            'top_k_singular': s_full[:k]
        }
        print(f"Full SVD: {time_full:.3f}s")

    # Method 2: Sparse truncated SVD (scipy)
    start = time.time()
    U_sparse, s_sparse, Vt_sparse = svds(A_sparse, k=k, which='LM')
    time_sparse = time.time() - start
    # Note: svds returns in ascending order, need to reverse
    idx = np.argsort(s_sparse)[::-1]
    s_sparse = s_sparse[idx]
    results['Sparse SVDs'] = {'time': time_sparse, 'top_k_singular': s_sparse}
    print(f"Sparse SVDs: {time_sparse:.3f}s")

    # Method 3: TruncatedSVD (sklearn) - works on dense or sparse
    start = time.time()
    tsvd = TruncatedSVD(n_components=k, algorithm='arpack')
    tsvd.fit(A_sparse)
    time_tsvd = time.time() - start
    results['TruncatedSVD'] = {
        'time': time_tsvd,
        'top_k_singular': tsvd.singular_values_
    }
    print(f"TruncatedSVD: {time_tsvd:.3f}s")

    # Method 4: Randomized SVD
    start = time.time()
    U_rand, s_rand, Vt_rand = randomized_svd(
        A_sparse, n_components=k, n_iter=5, random_state=42
    )
    time_rand = time.time() - start
    results['Randomized SVD'] = {'time': time_rand, 'top_k_singular': s_rand}
    print(f"Randomized SVD: {time_rand:.3f}s")

    # Compare singular values
    print("\nTop singular values comparison:")
    for method, data in results.items():
        sv_str = ', '.join(f'{s:.2f}' for s in data['top_k_singular'][:5])
        print(f"  {method:15s}: {sv_str}, ...")

    return results

# Run benchmark
np.random.seed(42)
results = benchmark_svd_methods(m=1000, n=500, k=10, sparsity=0.05)

print("\n" + "=" * 60)
print("For even larger matrices, use out-of-core methods or distributed")
print("computing frameworks like Spark MLlib or Dask.")
```

For an m × n matrix, full SVD requires O(mn + m² + n²) memory to store A, U, and V. For large matrices (e.g., million-user recommendation systems), this is prohibitive. Truncated and streaming methods that only maintain the top-k factors are essential for production ML systems.
We've covered SVD in depth—from its formal definition through geometric interpretation, computation, and practical applications. The key insights: A = UΣVᵀ exists for every matrix; it splits any linear map into rotation, axis-aligned scaling, and rotation; its truncation is the provably optimal low-rank approximation (Eckart-Young-Mirsky); and it underlies PCA, LSA, recommender systems, image compression, the pseudoinverse, and noise reduction.
What's next:
Having mastered SVD, we'll explore Eigendecomposition in depth—understanding when and why to decompose square matrices by their eigenvalues and eigenvectors. While SVD is the general workhorse, eigendecomposition is essential for understanding matrix dynamics, spectral graph theory, and many probabilistic models.
You now have a deep understanding of Singular Value Decomposition—the most versatile matrix factorization in machine learning. From theoretical foundations to practical implementation, you're equipped to apply SVD across dimensionality reduction, recommendation systems, signal processing, and beyond.