Imagine you're analyzing customer behavior data across thousands of dimensions—purchase history, browsing patterns, demographic features, temporal signals. Buried within this high-dimensional chaos lies structure: principal directions along which the data varies most dramatically. These special directions are eigenvectors, and the scales at which they stretch or compress the data are eigenvalues.
The eigenvalue problem isn't merely an abstract mathematical curiosity—it's the lingua franca of dimensionality reduction, spectral analysis, and stability theory in machine learning. From Principal Component Analysis (PCA) that powers recommendation systems, to the PageRank algorithm that once defined web search, to spectral clustering that discovers communities in graphs—eigenanalysis permeates modern ML infrastructure.
By the end of this page, you will understand the precise mathematical definition of eigenvalues and eigenvectors, the fundamental equation Av = λv, the conditions under which eigenvectors exist, and why this seemingly simple equation unlocks profound insights about linear transformations. You'll develop the rigorous foundation needed for PCA, spectral methods, and matrix decompositions.
To understand eigenvalues and eigenvectors, we must first ask a deceptively simple question about linear transformations:
Given a matrix A, are there special vectors that, when transformed by A, simply get scaled—stretched or compressed—without changing direction?
Most vectors, when multiplied by a matrix A, rotate, shear, or transform in complex ways. Their output points in a completely different direction than the input. But certain special vectors maintain their direction—only their magnitude changes.
These special vectors are eigenvectors, and the scaling factors are their corresponding eigenvalues.
The prefix 'eigen' comes from German, meaning 'own' or 'characteristic.' Eigenvectors are the 'characteristic vectors' of a matrix—the directions that reveal the matrix's intrinsic behavior. They are the matrix's 'own' vectors, the directions it 'naturally prefers.'
Consider a 2×2 matrix representing a linear transformation in the plane. Most vectors fed into this transformation emerge pointing somewhere new. But eigenvectors are invariant in direction—along their line, the transformation either stretches (λ > 1), compresses (0 < λ < 1), flips (λ < 0, possibly with stretching or compression), or leaves vectors unchanged (λ = 1).
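This behavior is easy to see numerically. Below is a minimal sketch using a diagonal matrix (the example matrix and vector names are illustrative, not from a particular library):

```python
import numpy as np

# A diagonal matrix stretches the x-axis by 2 and compresses the y-axis by 0.5
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])

e1 = np.array([1.0, 0.0])  # eigenvector: direction preserved, stretched by 2
e2 = np.array([0.0, 1.0])  # eigenvector: direction preserved, compressed by 0.5
w = np.array([1.0, 1.0])   # generic vector: direction changes

print(A @ e1)  # [2.  0. ] -> still along e1
print(A @ e2)  # [0.  0.5] -> still along e2
print(A @ w)   # [2.  0.5] -> no longer along (1, 1)
```

The axis-aligned vectors come out scaled but unrotated, while the diagonal vector is knocked off its original direction.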
This property makes eigenvectors extraordinarily useful. They are the canonical axes along which a transformation acts most simply. Understanding these axes means understanding the transformation's essential character.
We now formalize the intuition. Given a square matrix A of size n×n, a non-zero vector v is an eigenvector of A if there exists a scalar λ (the eigenvalue) such that:
$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$$
This is the eigenvalue equation—the defining relationship of eigenanalysis.
Let's unpack what this equation says:
We require v ≠ 0 (non-zero vector). Why? Because A·0 = λ·0 = 0 for any λ—the zero vector trivially satisfies the equation for every scalar, giving us no useful information. Eigenvectors must be non-zero to be meaningful.
Rearranging the equation:
To find eigenvalues and eigenvectors, we rearrange:
$$\mathbf{A}\mathbf{v} - \lambda\mathbf{v} = \mathbf{0}$$
$$(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$$
Here, I is the identity matrix of the same size as A. This equation says that v lies in the null space (kernel) of the matrix (A - λI).
For a non-zero solution v to exist, the matrix (A - λI) must be singular (non-invertible). A matrix is singular if and only if its determinant is zero:
$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$
This is the characteristic equation—a polynomial equation in λ whose roots are the eigenvalues.
```python
import numpy as np

# Define a 2x2 matrix
A = np.array([
    [4, 1],
    [2, 3]
])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Matrix A:")
print(A)
print()
print("Eigenvalues:", eigenvalues)
print()
print("Eigenvectors (as columns):")
print(eigenvectors)
print()

# Verify the eigenvalue equation: Av = λv
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]  # i-th eigenvector (column)
    λ = eigenvalues[i]      # corresponding eigenvalue

    Av = A @ v  # Apply transformation
    λv = λ * v  # Scale eigenvector

    print(f"Eigenvalue λ_{i+1} = {λ:.4f}")
    print(f"  Av = {Av}")
    print(f"  λv = {λv}")
    print(f"  Av ≈ λv? {np.allclose(Av, λv)}")
    print()
```

With the core equation established, let's define the key terms precisely. These definitions will be referenced throughout your ML journey.
The relationship between algebraic and geometric multiplicity is crucial for diagonalization. Geometric multiplicity is always ≤ algebraic multiplicity. When they're equal for all eigenvalues, the matrix is diagonalizable. We'll explore this in depth in the diagonalization section.
Important Properties:
Eigenvalue scaling: If v is an eigenvector with eigenvalue λ, then cv (for any non-zero scalar c) is also an eigenvector with the same eigenvalue. We typically normalize eigenvectors to unit length.
Number of eigenvalues: An n×n matrix has exactly n eigenvalues (counting multiplicities), though some may be complex or repeated.
Real symmetric matrices: If A is real and symmetric (A = Aᵀ), all eigenvalues are real and eigenvectors corresponding to distinct eigenvalues are orthogonal. This is the Spectral Theorem—critically important for PCA.
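A quick numerical sketch of the Spectral Theorem: for symmetric matrices, NumPy provides `np.linalg.eigh`, which exploits symmetry and returns real eigenvalues (in ascending order) with orthonormal eigenvectors. The example matrix here is illustrative:

```python
import numpy as np

# A real symmetric matrix: S == S.T
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric/Hermitian matrices:
# real eigenvalues in ascending order, orthonormal eigenvectors
eigenvalues, V = np.linalg.eigh(S)

print(eigenvalues)  # [1. 3.] -- all real
print(V.T @ V)      # ≈ identity: eigenvector columns are orthonormal
```

In ML code that works with covariance matrices, `eigh` is generally preferred over the general-purpose `eig`: it is faster and guarantees real output.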
To find eigenvalues, we solve det(A - λI) = 0. The expression det(A - λI) is a polynomial in λ called the characteristic polynomial. Let's see how this works concretely.
For a 2×2 matrix:
Given matrix A = [[a, b], [c, d]], the characteristic polynomial is:
$$\det\begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} = (a-\lambda)(d-\lambda) - bc$$
$$= \lambda^2 - (a+d)\lambda + (ad - bc)$$
$$= \lambda^2 - \text{tr}(A)\lambda + \det(A)$$
where tr(A) = a + d is the trace (sum of diagonal elements).
This gives us an elegant relationship: the sum of eigenvalues equals the trace, and the product of eigenvalues equals the determinant.
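As a concrete check, take the matrix A = [[4, 1], [2, 3]] from the code example above. Its trace is 4 + 3 = 7 and its determinant is 4·3 − 1·2 = 10, so the characteristic equation is:

$$\det(\mathbf{A} - \lambda\mathbf{I}) = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2) = 0$$

giving eigenvalues λ₁ = 5 and λ₂ = 2. Indeed, 5 + 2 = 7 = tr(A) and 5 · 2 = 10 = det(A).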
| Property | Formula | Interpretation |
|---|---|---|
| Sum of eigenvalues | λ₁ + λ₂ + ... + λₙ = tr(A) | Eigenvalues sum to the trace |
| Product of eigenvalues | λ₁ · λ₂ · ... · λₙ = det(A) | Eigenvalues multiply to the determinant |
| Eigenvalues of Aᵏ | λ₁ᵏ, λ₂ᵏ, ..., λₙᵏ | Powers of A raise each eigenvalue to that power |
| Eigenvalues of A⁻¹ | 1/λ₁, 1/λ₂, ..., 1/λₙ | Inverse reciprocates eigenvalues |
| Eigenvalues of A + cI | λ₁+c, λ₂+c, ..., λₙ+c | Shifting matrix shifts eigenvalues |
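Each row of the table can be verified numerically. Below is a sketch using an arbitrary 2×2 example matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lam = np.linalg.eigvals(A)  # eigenvalues of A: {5, 2}

# Sum equals trace; product equals determinant
assert np.isclose(lam.sum(), np.trace(A))
assert np.isclose(lam.prod(), np.linalg.det(A))

# Eigenvalues of A^3 are the cubes λ^3
assert np.allclose(sorted(np.linalg.eigvals(np.linalg.matrix_power(A, 3))),
                   sorted(lam**3))

# Eigenvalues of A^{-1} are the reciprocals 1/λ
assert np.allclose(sorted(np.linalg.eigvals(np.linalg.inv(A))),
                   sorted(1 / lam))

# Eigenvalues of A + cI are shifted by c
c = 2.5
assert np.allclose(sorted(np.linalg.eigvals(A + c * np.eye(2))),
                   sorted(lam + c))

print("All eigenvalue identities verified.")
```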
```python
import numpy as np

def characteristic_polynomial_2x2(A):
    """
    Compute characteristic polynomial coefficients for a 2x2 matrix.
    Returns coefficients for: λ² - tr(A)λ + det(A) = 0
    """
    trace = np.trace(A)
    det = np.linalg.det(A)
    # Coefficients: [constant, linear, quadratic]
    # λ² - trace*λ + det = 0
    return np.array([det, -trace, 1])

def find_eigenvalues_2x2(A):
    """
    Find eigenvalues by solving the characteristic polynomial.
    Uses quadratic formula: λ = (trace ± √(trace² - 4*det)) / 2
    """
    trace = np.trace(A)
    det = np.linalg.det(A)
    discriminant = trace**2 - 4*det

    if discriminant >= 0:
        λ1 = (trace + np.sqrt(discriminant)) / 2
        λ2 = (trace - np.sqrt(discriminant)) / 2
        return λ1, λ2, "Real eigenvalues"
    else:
        real_part = trace / 2
        imag_part = np.sqrt(-discriminant) / 2
        λ1 = complex(real_part, imag_part)
        λ2 = complex(real_part, -imag_part)
        return λ1, λ2, "Complex conjugate eigenvalues"

# Example: Compute eigenvalues from characteristic polynomial
A = np.array([
    [4, 1],
    [2, 3]
])

print("Matrix A:")
print(A)
print()

# Method 1: Via characteristic polynomial
λ1, λ2, msg = find_eigenvalues_2x2(A)
print(f"From characteristic polynomial: {msg}")
print(f"  λ₁ = {λ1}")
print(f"  λ₂ = {λ2}")
print()

# Verify: sum should equal trace, product should equal determinant
print("Verification:")
print(f"  λ₁ + λ₂ = {λ1 + λ2} (trace = {np.trace(A)})")
print(f"  λ₁ × λ₂ = {λ1 * λ2} (det = {np.linalg.det(A)})")
print()

# Compare with NumPy's result
eigenvalues_np = np.linalg.eigvals(A)
print(f"NumPy's eigenvalues: {eigenvalues_np}")
```

For larger matrices:
The characteristic polynomial for an n×n matrix has degree n, so finding eigenvalues algebraically (solving the polynomial exactly in radicals) is impossible in general for n ≥ 5: by the Abel-Ruffini theorem, no general algebraic formula exists for polynomials of degree 5 or higher.
In practice, we use iterative numerical algorithms such as the QR algorithm (the standard workhorse for dense matrices), power iteration (for the dominant eigenvalue), and Lanczos/Arnoldi methods (for large sparse matrices).
Modern libraries (NumPy, LAPACK) implement these highly optimized algorithms. You rarely compute eigenvalues by hand—but understanding the theory is essential for interpreting results.
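To make one of these methods concrete, here is a minimal sketch of power iteration. It assumes the matrix has a unique dominant (largest-magnitude) eigenvalue; the function name and example matrix are illustrative:

```python
import numpy as np

def power_iteration(A, num_iters=100):
    """Estimate the dominant eigenvalue/eigenvector of A by repeated
    multiplication and renormalization."""
    v = np.random.default_rng(0).standard_normal(A.shape[0])
    for _ in range(num_iters):
        v = A @ v
        v = v / np.linalg.norm(v)  # renormalize to avoid overflow
    # Rayleigh quotient gives the eigenvalue estimate (v has unit norm)
    eigenvalue = v @ A @ v
    return eigenvalue, v

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lam, v = power_iteration(A)
print(lam)  # ≈ 5.0, the dominant eigenvalue of A
```

Each multiplication by A amplifies the component of v along the dominant eigenvector relative to all others, so v converges to that direction; the convergence rate depends on the ratio of the two largest eigenvalue magnitudes (here |2/5|).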
Once we have eigenvalues, finding the corresponding eigenvectors is a matter of solving a linear system. For each eigenvalue λ, we solve:
$$(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$$
This is a homogeneous system—we seek non-trivial solutions in the null space of (A - λI).
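For instance, with A = [[4, 1], [2, 3]] and its eigenvalue λ = 5:

$$\mathbf{A} - 5\mathbf{I} = \begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}$$

The second row is −2 times the first, so the system reduces to −v₁ + v₂ = 0, i.e. v₂ = v₁. Any vector (t, t) with t ≠ 0 is an eigenvector; normalizing gives v = (1/√2)(1, 1)ᵀ. Check: Av = (5, 5)ᵀ = 5v.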
Step-by-step procedure:
1. Compute an eigenvalue λ from the characteristic equation det(A - λI) = 0.
2. Form the shifted matrix A - λI.
3. Solve the homogeneous system (A - λI)v = 0 for a non-trivial v (find the null space).
4. Normalize v to unit length.
```python
import numpy as np
from scipy.linalg import null_space

def find_eigenvector(A, eigenvalue):
    """
    Find eigenvector(s) for a given eigenvalue by computing
    the null space of (A - λI).
    """
    n = A.shape[0]
    I = np.eye(n)

    # Form (A - λI)
    shifted = A - eigenvalue * I

    # Find null space
    ns = null_space(shifted)
    return ns

# Example: 2x2 matrix
A = np.array([
    [4, 1],
    [2, 3]
], dtype=float)

print("Matrix A:")
print(A)
print()

# Get eigenvalues
eigenvalues = np.linalg.eigvals(A)
print(f"Eigenvalues: {eigenvalues}")
print()

# Find eigenvectors manually for each eigenvalue
for i, λ in enumerate(eigenvalues):
    print(f"For λ = {λ:.4f}:")

    # Show (A - λI)
    shifted = A - λ * np.eye(2)
    print(f"  A - λI =")
    print(f"  {shifted}")

    # Find null space (eigenvector)
    v = find_eigenvector(A, λ)
    print(f"  Eigenvector (normalized): {v.flatten()}")

    # Verify Av = λv
    Av = A @ v
    λv = λ * v
    print(f"  Verification: Av = {Av.flatten()}, λv = {λv.flatten()}")
    print()

# Compare with NumPy's eigenvectors
print("NumPy's eigenvectors (as columns):")
eigenvalues_np, eigenvectors_np = np.linalg.eig(A)
print(eigenvectors_np)
```

Eigenvectors are only unique up to scalar multiplication—if v is an eigenvector, so is 2v, -v, or 0.5v. By convention, we often normalize eigenvectors to have unit length (‖v‖ = 1). NumPy's np.linalg.eig returns normalized eigenvectors.
Geometric multiplicity and eigenspace dimension:
For each eigenvalue, the eigenspace might have dimension > 1 (geometric multiplicity > 1). This happens when the eigenvalue is a repeated root of the characteristic polynomial and the null space of (A - λI) is more than one-dimensional; for example, the identity matrix has eigenvalue 1 and every non-zero vector is an eigenvector.
When geometric multiplicity > 1, there are infinitely many eigenvector choices—any vector in the eigenspace works. We typically choose an orthonormal basis for the eigenspace.
Eigenvalues reveal deep truths about matrices. Understanding these properties is crucial for ML applications.
Complex eigenvalues:
Even for real-valued matrices, eigenvalues can be complex. This occurs when the characteristic polynomial has complex roots. For real matrices, complex eigenvalues always appear in conjugate pairs: if λ = a + bi is an eigenvalue, so is λ̄ = a - bi.
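A rotation matrix makes this concrete: a pure 90° rotation leaves no real direction invariant, so its eigenvalues must be complex. A quick sketch:

```python
import numpy as np

# 90-degree rotation: no real vector keeps its direction,
# so the eigenvalues are the complex conjugate pair ±i
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigenvalues = np.linalg.eigvals(R)
print(eigenvalues)  # a conjugate pair: ±1j

# The two eigenvalues are conjugates of each other, with |λ| = 1
print(np.isclose(eigenvalues[0], np.conj(eigenvalues[1])))
print(np.allclose(np.abs(eigenvalues), 1.0))
```

Both eigenvalues lie on the unit circle, consistent with the orthogonal-matrix row in the table below.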
| Matrix Type | Eigenvalue Property | Notes / Typical Role |
|---|---|---|
| Real symmetric (A = Aᵀ) | All eigenvalues are real | Eigenvectors are orthogonal |
| Positive definite | All eigenvalues are positive (λ > 0) | Defines an ellipsoid |
| Positive semi-definite | All eigenvalues are non-negative (λ ≥ 0) | Covariance matrices |
| Orthogonal (A⁻¹ = Aᵀ) | All eigenvalues have \|λ\| = 1 | Rotation/reflection |
| Skew-symmetric (A = -Aᵀ) | All eigenvalues are purely imaginary | Relates to rotations |
| Idempotent (A² = A) | Eigenvalues are 0 or 1 | Projection matrices |
| Nilpotent (Aᵏ = 0) | All eigenvalues are 0 | Shift operators |
For real symmetric matrices: (1) All eigenvalues are real, (2) Eigenvectors for distinct eigenvalues are orthogonal, (3) The matrix is always diagonalizable with orthogonal eigenvectors. This is why PCA works—covariance matrices are real and symmetric!
```python
import numpy as np

def analyze_matrix_eigenproperties(A, name="A"):
    """
    Analyze a matrix's eigenvalue properties and verify relationships
    with trace, determinant, and matrix type.
    """
    eigenvalues, eigenvectors = np.linalg.eig(A)

    print(f"{'='*50}")
    print(f"Analysis of {name}:")
    print(f"{'='*50}")
    print(f"Matrix:")
    print(A)
    print()

    # Eigenvalues
    print(f"Eigenvalues: {eigenvalues}")

    # Check if all real
    if np.allclose(eigenvalues.imag, 0):
        print("  → All eigenvalues are REAL")
        eigenvalues = eigenvalues.real
    else:
        print("  → Contains COMPLEX eigenvalues")

    # Trace and determinant relationships
    trace = np.trace(A)
    det = np.linalg.det(A)
    print(f"Trace verification:")
    print(f"  tr(A) = {trace:.6f}")
    print(f"  Σλ = {np.sum(eigenvalues):.6f}")
    print(f"Determinant verification:")
    print(f"  det(A) = {det:.6f}")
    print(f"  Πλ = {np.prod(eigenvalues):.6f}")

    # Check positive definiteness
    if np.allclose(eigenvalues.imag, 0):
        real_eigvals = eigenvalues.real
        if np.all(real_eigvals > 0):
            print("  → Matrix is POSITIVE DEFINITE")
        elif np.all(real_eigvals >= 0):
            print("  → Matrix is POSITIVE SEMI-DEFINITE")
        elif np.all(real_eigvals < 0):
            print("  → Matrix is NEGATIVE DEFINITE")

    # Check orthogonality of eigenvectors (for symmetric matrices)
    if np.allclose(A, A.T):
        print("  → Matrix is SYMMETRIC")
        # Check orthogonality
        V = eigenvectors
        orthogonality = V.T @ V
        print(f"  Eigenvector orthogonality (Vᵀ V should be I):")
        print(f"  {orthogonality}")
        if np.allclose(orthogonality, np.eye(len(eigenvalues))):
            print("  → Eigenvectors are ORTHONORMAL ✓")

    return eigenvalues, eigenvectors

# Example 1: Symmetric positive definite matrix
cov_matrix = np.array([
    [4, 2, 0.5],
    [2, 3, 1],
    [0.5, 1, 2]
])
analyze_matrix_eigenproperties(cov_matrix, "Covariance Matrix (Symmetric PD)")

# Example 2: Rotation matrix (orthogonal)
theta = np.pi / 4  # 45 degrees
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta), np.cos(theta)]
])
analyze_matrix_eigenproperties(rotation, "Rotation Matrix")

# Example 3: Non-symmetric matrix
A = np.array([
    [1, 2],
    [0, 3]
])
analyze_matrix_eigenproperties(A, "Upper Triangular Matrix")
```

A fundamental theorem governs the relationship between eigenvectors:
Theorem: Eigenvectors corresponding to distinct eigenvalues are linearly independent.
This result is profound. It means if a matrix has n distinct eigenvalues, it has n linearly independent eigenvectors—enough to form a basis for ℝⁿ. Such a matrix is diagonalizable.
Proof sketch:
Suppose v₁ and v₂ are eigenvectors with distinct eigenvalues λ₁ ≠ λ₂. If they were linearly dependent, we'd have v₂ = cv₁ for some scalar c. Then:
Av₂ = A(cv₁) = cAv₁ = cλ₁v₁ = λ₁v₂
But also Av₂ = λ₂v₂. So λ₁v₂ = λ₂v₂, which implies (λ₁ - λ₂)v₂ = 0.
Since λ₁ ≠ λ₂ and v₂ ≠ 0, we have a contradiction. Therefore v₁ and v₂ must be linearly independent.
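The consequence for diagonalization can be checked numerically. A sketch with an example matrix that has distinct eigenvalues (5 and 2):

```python
import numpy as np

# Distinct eigenvalues -> linearly independent eigenvectors
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])  # eigenvalues 5 and 2

eigenvalues, V = np.linalg.eig(A)

# The eigenvector matrix has full rank: columns are linearly independent
print(np.linalg.matrix_rank(V))  # 2

# So A is diagonalizable: V diag(λ) V^{-1} reconstructs A
reconstructed = V @ np.diag(eigenvalues) @ np.linalg.inv(V)
print(np.allclose(reconstructed, A))  # True
```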
In PCA, we seek the eigenvectors of the covariance matrix (symmetric, positive semi-definite). The spectral theorem guarantees these eigenvectors are orthogonal—they form a natural coordinate system for the data. We can project high-dimensional data onto these orthogonal directions with no redundancy.
When eigenvectors are NOT linearly independent:
Problems arise when eigenvalues are repeated. A repeated eigenvalue (algebraic multiplicity > 1) might not have enough independent eigenvectors (geometric multiplicity < algebraic multiplicity). Such matrices are called defective and are not diagonalizable.
Example of a defective matrix:
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
This has eigenvalue λ = 1 with algebraic multiplicity 2, but only one linearly independent eigenvector (geometric multiplicity 1). It cannot be diagonalized.
Fortunately, symmetric matrices are never defective—a key reason why covariance matrices (symmetric by construction) are well-behaved for PCA.
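The deficiency of the shear matrix above can be confirmed numerically by comparing multiplicities. A sketch (the geometric multiplicity is computed as n minus the rank of A - λI):

```python
import numpy as np

# The shear matrix [[1, 1], [0, 1]] is defective: eigenvalue 1 has
# algebraic multiplicity 2 but geometric multiplicity 1.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

eigenvalues, V = np.linalg.eig(A)
print(eigenvalues)  # [1. 1.] -- algebraic multiplicity 2

# Geometric multiplicity = dim null(A - λI) = n - rank(A - λI)
geo_mult = A.shape[0] - np.linalg.matrix_rank(A - np.eye(2))
print(geo_mult)  # 1 -- only one independent eigenvector
```

With only a one-dimensional eigenspace, there is no basis of eigenvectors, so this matrix cannot be diagonalized.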
The eigenvalue problem isn't academic—it's the engine behind numerous ML algorithms. Here's a preview of key applications we'll explore in depth later in this module: PCA (the eigenvectors of the covariance matrix are the principal components), spectral clustering (eigenvectors of the graph Laplacian reveal community structure), PageRank (the dominant eigenvector of the web's link matrix ranks pages), and stability analysis (eigenvalue magnitudes determine whether iterative dynamics converge or diverge).
In most ML applications, you won't compute eigenvalues by hand. But understanding eigenanalysis lets you interpret results correctly: Which principal components capture signal vs. noise? Why does spectral clustering find certain groupings? What makes a system stable or unstable?
We've established the rigorous foundation of eigenvalues and eigenvectors. Let's consolidate the key concepts:
- The eigenvalue equation Av = λv defines eigenvectors (non-zero vectors whose direction is preserved) and eigenvalues (their scaling factors).
- Eigenvalues are the roots of the characteristic polynomial det(A - λI) = 0; eigenvectors span the null space of A - λI.
- The eigenvalues sum to tr(A) and multiply to det(A).
- Real symmetric matrices have real eigenvalues and orthogonal eigenvectors (the Spectral Theorem), which is why PCA on covariance matrices is well-behaved.
- Eigenvectors for distinct eigenvalues are linearly independent; defective matrices (geometric multiplicity < algebraic multiplicity) are not diagonalizable.
What's Next:
Now that we understand what eigenvalues and eigenvectors are, the next page explores what they mean geometrically. We'll visualize how eigenvectors define invariant directions, how eigenvalues indicate stretching/compression, and how to 'see' the eigenstructure of transformations. This geometric intuition is essential for developing practical insight into ML algorithms.
You now understand the formal definition of the eigenvalue problem—the equation Av = λv, the characteristic polynomial, and key properties. Next, we'll build geometric intuition by visualizing what eigenvalues and eigenvectors mean for linear transformations in 2D and 3D.