Imagine you're analyzing customer behavior data across thousands of dimensions—purchase history, browsing patterns, demographic features, temporal signals. Buried within this high-dimensional chaos lies structure: principal directions along which the data varies most dramatically. These special directions are eigenvectors, and the scales at which they stretch or compress the data are eigenvalues.
The eigenvalue problem isn't merely an abstract mathematical curiosity—it's the lingua franca of dimensionality reduction, spectral analysis, and stability theory in machine learning. From Principal Component Analysis (PCA) that powers recommendation systems, to the PageRank algorithm that once defined web search, to spectral clustering that discovers communities in graphs—eigenanalysis permeates modern ML infrastructure.
By the end of this page, you will understand the precise mathematical definition of eigenvalues and eigenvectors, the fundamental equation Av = λv, the conditions under which eigenvectors exist, and why this seemingly simple equation unlocks profound insights about linear transformations. You'll develop the rigorous foundation needed for PCA, spectral methods, and matrix decompositions.
To understand eigenvalues and eigenvectors, we must first ask a deceptively simple question about linear transformations:
Given a matrix A, are there special vectors that, when transformed by A, simply get scaled—stretched or compressed—without changing direction?
Most vectors, when multiplied by a matrix A, rotate, shear, or transform in complex ways. Their output points in a completely different direction than the input. But certain special vectors maintain their direction—only their magnitude changes.
These special vectors are eigenvectors, and the scaling factors are their corresponding eigenvalues.
The prefix 'eigen' comes from German, meaning 'own' or 'characteristic.' Eigenvectors are the 'characteristic vectors' of a matrix—the directions that reveal the matrix's intrinsic behavior. They are the matrix's 'own' vectors, the directions it 'naturally prefers.'
Consider a 2×2 matrix representing a linear transformation in the plane. Most vectors fed into this transformation emerge pointing somewhere new. But eigenvectors are invariant in direction—along their line, the transformation either stretches (λ > 1), compresses (0 < λ < 1), flips (λ < 0, possibly with stretching or compression), or leaves vectors unchanged (λ = 1).
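This behavior is easy to see numerically. Below is a minimal sketch using a diagonal matrix (the example matrix and vector names are illustrative, not from a particular library):

```python
import numpy as np

# A diagonal matrix stretches the x-axis by 2 and compresses the y-axis by 0.5
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])

e1 = np.array([1.0, 0.0])  # eigenvector: direction preserved, stretched by 2
e2 = np.array([0.0, 1.0])  # eigenvector: direction preserved, compressed by 0.5
w = np.array([1.0, 1.0])   # generic vector: direction changes

print(A @ e1)  # [2.  0. ] -> still along e1
print(A @ e2)  # [0.  0.5] -> still along e2
print(A @ w)   # [2.  0.5] -> no longer along (1, 1)
```

The axis-aligned vectors come out scaled but unrotated, while the diagonal vector is knocked off its original direction.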
This property makes eigenvectors extraordinarily useful. They are the canonical axes along which a transformation acts most simply. Understanding these axes means understanding the transformation's essential character.
We now formalize the intuition. Given a square matrix A of size n×n, a non-zero vector v is an eigenvector of A if there exists a scalar λ (the eigenvalue) such that:
$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$$
This is the eigenvalue equation—the defining relationship of eigenanalysis.
Let's unpack what this equation says:
We require v ≠ 0 (non-zero vector). Why? Because A·0 = λ·0 = 0 for any λ—the zero vector trivially satisfies the equation for every scalar, giving us no useful information. Eigenvectors must be non-zero to be meaningful.
Rearranging the equation:
To find eigenvalues and eigenvectors, we rearrange:
$$\mathbf{A}\mathbf{v} - \lambda\mathbf{v} = \mathbf{0}$$
$$(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$$
Here, I is the identity matrix of the same size as A. This equation says that v lies in the null space (kernel) of the matrix (A - λI).
For a non-zero solution v to exist, the matrix (A - λI) must be singular (non-invertible). A matrix is singular if and only if its determinant is zero:
$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$
This is the characteristic equation—a polynomial equation in λ whose roots are the eigenvalues.
```python
import numpy as np

# Define a 2x2 matrix
A = np.array([
    [4, 1],
    [2, 3]
])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Matrix A:")
print(A)
print()
print("Eigenvalues:", eigenvalues)
print()
print("Eigenvectors (as columns):")
print(eigenvectors)
print()

# Verify the eigenvalue equation: Av = λv
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]  # i-th eigenvector (column)
    λ = eigenvalues[i]      # corresponding eigenvalue

    Av = A @ v  # Apply transformation
    λv = λ * v  # Scale eigenvector

    print(f"Eigenvalue λ_{i+1} = {λ:.4f}")
    print(f"  Av = {Av}")
    print(f"  λv = {λv}")
    print(f"  Av ≈ λv? {np.allclose(Av, λv)}")
    print()
```

With the core equation established, let's define the key terms precisely. These definitions will be referenced throughout your ML journey.
The relationship between algebraic and geometric multiplicity is crucial for diagonalization. Geometric multiplicity is always ≤ algebraic multiplicity. When they're equal for all eigenvalues, the matrix is diagonalizable. We'll explore this in depth in the diagonalization section.
Important Properties:
Eigenvalue scaling: If v is an eigenvector with eigenvalue λ, then cv (for any non-zero scalar c) is also an eigenvector with the same eigenvalue. We typically normalize eigenvectors to unit length.
Number of eigenvalues: An n×n matrix has exactly n eigenvalues (counting multiplicities), though some may be complex or repeated.
Real symmetric matrices: If A is real and symmetric (A = Aᵀ), all eigenvalues are real and eigenvectors corresponding to distinct eigenvalues are orthogonal. This is the Spectral Theorem—critically important for PCA.
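A quick numerical sketch of the Spectral Theorem: for symmetric matrices, NumPy provides `np.linalg.eigh`, which exploits symmetry and returns real eigenvalues (in ascending order) with orthonormal eigenvectors. The example matrix here is illustrative:

```python
import numpy as np

# A real symmetric matrix: S == S.T
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric/Hermitian matrices:
# real eigenvalues in ascending order, orthonormal eigenvectors
eigenvalues, V = np.linalg.eigh(S)

print(eigenvalues)  # [1. 3.] -- all real
print(V.T @ V)      # ≈ identity: eigenvector columns are orthonormal
```

In ML code that works with covariance matrices, `eigh` is generally preferred over the general-purpose `eig`: it is faster and guarantees real output.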
To find eigenvalues, we solve det(A - λI) = 0. The expression det(A - λI) is a polynomial in λ called the characteristic polynomial. Let's see how this works concretely.
For a 2×2 matrix:
Given matrix A = [[a, b], [c, d]], the characteristic polynomial is:
$$\det\begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} = (a-\lambda)(d-\lambda) - bc$$
$$= \lambda^2 - (a+d)\lambda + (ad - bc)$$
$$= \lambda^2 - \text{tr}(A)\lambda + \det(A)$$
where tr(A) = a + d is the trace (sum of diagonal elements).
This gives us an elegant relationship: the sum of eigenvalues equals the trace, and the product of eigenvalues equals the determinant.
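As a concrete check, take the matrix A = [[4, 1], [2, 3]] from the code example above. Its trace is 4 + 3 = 7 and its determinant is 4·3 − 1·2 = 10, so the characteristic equation is:

$$\det(\mathbf{A} - \lambda\mathbf{I}) = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2) = 0$$

giving eigenvalues λ₁ = 5 and λ₂ = 2. Indeed, 5 + 2 = 7 = tr(A) and 5 · 2 = 10 = det(A).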
| Property | Formula | Interpretation |
|---|---|---|
| Sum of eigenvalues | λ₁ + λ₂ + ... + λₙ = tr(A) | Eigenvalues sum to the trace |
| Product of eigenvalues | λ₁ · λ₂ · ... · λₙ = det(A) | Eigenvalues multiply to the determinant |
| Eigenvalues of Aᵏ | λ₁ᵏ, λ₂ᵏ, ..., λₙᵏ | Powers of A raise each eigenvalue to that power |
| Eigenvalues of A⁻¹ | 1/λ₁, 1/λ₂, ..., 1/λₙ | Inverse reciprocates eigenvalues |
| Eigenvalues of A + cI | λ₁+c, λ₂+c, ..., λₙ+c | Shifting matrix shifts eigenvalues |
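Each row of the table can be verified numerically. Below is a sketch using an arbitrary 2×2 example matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lam = np.linalg.eigvals(A)  # eigenvalues of A: {5, 2}

# Sum equals trace; product equals determinant
assert np.isclose(lam.sum(), np.trace(A))
assert np.isclose(lam.prod(), np.linalg.det(A))

# Eigenvalues of A^3 are the cubes λ^3
assert np.allclose(sorted(np.linalg.eigvals(np.linalg.matrix_power(A, 3))),
                   sorted(lam**3))

# Eigenvalues of A^{-1} are the reciprocals 1/λ
assert np.allclose(sorted(np.linalg.eigvals(np.linalg.inv(A))),
                   sorted(1 / lam))

# Eigenvalues of A + cI are shifted by c
c = 2.5
assert np.allclose(sorted(np.linalg.eigvals(A + c * np.eye(2))),
                   sorted(lam + c))

print("All eigenvalue identities verified.")
```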
```python
import numpy as np

def characteristic_polynomial_2x2(A):
    """
    Compute characteristic polynomial coefficients for a 2x2 matrix.
    Returns coefficients for: λ² - tr(A)λ + det(A) = 0
    """
    trace = np.trace(A)
    det = np.linalg.det(A)
    # Coefficients: [constant, linear, quadratic]
    # λ² - trace*λ + det = 0
    return np.array([det, -trace, 1])

def find_eigenvalues_2x2(A):
    """
    Find eigenvalues by solving the characteristic polynomial.
    Uses quadratic formula: λ = (trace ± √(trace² - 4*det)) / 2
    """
    trace = np.trace(A)
    det = np.linalg.det(A)
    discriminant = trace**2 - 4*det

    if discriminant >= 0:
        λ1 = (trace + np.sqrt(discriminant)) / 2
        λ2 = (trace - np.sqrt(discriminant)) / 2
        return λ1, λ2, "Real eigenvalues"
    else:
        real_part = trace / 2
        imag_part = np.sqrt(-discriminant) / 2
        λ1 = complex(real_part, imag_part)
        λ2 = complex(real_part, -imag_part)
        return λ1, λ2, "Complex conjugate eigenvalues"

# Example: Compute eigenvalues from characteristic polynomial
A = np.array([
    [4, 1],
    [2, 3]
])

print("Matrix A:")
print(A)
print()

# Method 1: Via characteristic polynomial
λ1, λ2, msg = find_eigenvalues_2x2(A)
print(f"From characteristic polynomial: {msg}")
print(f"  λ₁ = {λ1}")
print(f"  λ₂ = {λ2}")
print()

# Verify: sum should equal trace, product should equal determinant
print("Verification:")
print(f"  λ₁ + λ₂ = {λ1 + λ2} (trace = {np.trace(A)})")
print(f"  λ₁ × λ₂ = {λ1 * λ2} (det = {np.linalg.det(A)})")
print()

# Compare with NumPy's result
eigenvalues_np = np.linalg.eigvals(A)
print(f"NumPy's eigenvalues: {eigenvalues_np}")
```

For larger matrices:
The characteristic polynomial for an n×n matrix has degree n, so finding eigenvalues algebraically (solving the polynomial exactly in radicals) is impossible in general for n ≥ 5: by the Abel-Ruffini theorem, no general algebraic formula exists for polynomials of degree 5 or higher.
In practice, we use iterative numerical algorithms such as the QR algorithm (the standard workhorse for dense matrices), power iteration (for the dominant eigenvalue), and Lanczos/Arnoldi methods (for large sparse matrices).
Modern libraries (NumPy, LAPACK) implement these highly optimized algorithms. You rarely compute eigenvalues by hand—but understanding the theory is essential for interpreting results.
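To make one of these methods concrete, here is a minimal sketch of power iteration. It assumes the matrix has a unique dominant (largest-magnitude) eigenvalue; the function name and example matrix are illustrative:

```python
import numpy as np

def power_iteration(A, num_iters=100):
    """Estimate the dominant eigenvalue/eigenvector of A by repeated
    multiplication and renormalization."""
    v = np.random.default_rng(0).standard_normal(A.shape[0])
    for _ in range(num_iters):
        v = A @ v
        v = v / np.linalg.norm(v)  # renormalize to avoid overflow
    # Rayleigh quotient gives the eigenvalue estimate (v has unit norm)
    eigenvalue = v @ A @ v
    return eigenvalue, v

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lam, v = power_iteration(A)
print(lam)  # ≈ 5.0, the dominant eigenvalue of A
```

Each multiplication by A amplifies the component of v along the dominant eigenvector relative to all others, so v converges to that direction; the convergence rate depends on the ratio of the two largest eigenvalue magnitudes (here |2/5|).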
Once we have eigenvalues, finding the corresponding eigenvectors is a matter of solving a linear system. For each eigenvalue λ, we solve:
$$(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$$
This is a homogeneous system—we seek non-trivial solutions in the null space of (A - λI).
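For instance, with A = [[4, 1], [2, 3]] and its eigenvalue λ = 5:

$$\mathbf{A} - 5\mathbf{I} = \begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix}$$

The second row is −2 times the first, so the system reduces to −v₁ + v₂ = 0, i.e. v₂ = v₁. Any vector (t, t) with t ≠ 0 is an eigenvector; normalizing gives v = (1/√2)(1, 1)ᵀ. Check: Av = (5, 5)ᵀ = 5v.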
Step-by-step procedure:
1. Compute an eigenvalue λ from the characteristic equation det(A - λI) = 0.
2. Form the shifted matrix A - λI.
3. Solve the homogeneous system (A - λI)v = 0 for a non-trivial v (find the null space).
4. Normalize v to unit length.
```python
import numpy as np
from scipy.linalg import null_space

def find_eigenvector(A, eigenvalue):
    """
    Find eigenvector(s) for a given eigenvalue by computing
    the null space of (A - λI).
    """
    n = A.shape[0]
    I = np.eye(n)

    # Form (A - λI)
    shifted = A - eigenvalue * I

    # Find null space
    ns = null_space(shifted)
    return ns

# Example: 2x2 matrix
A = np.array([
    [4, 1],
    [2, 3]
], dtype=float)

print("Matrix A:")
print(A)
print()

# Get eigenvalues
eigenvalues = np.linalg.eigvals(A)
print(f"Eigenvalues: {eigenvalues}")
print()

# Find eigenvectors manually for each eigenvalue
for i, λ in enumerate(eigenvalues):
    print(f"For λ = {λ:.4f}:")

    # Show (A - λI)
    shifted = A - λ * np.eye(2)
    print(f"  A - λI =")
    print(f"  {shifted}")

    # Find null space (eigenvector)
    v = find_eigenvector(A, λ)
    print(f"  Eigenvector (normalized): {v.flatten()}")

    # Verify Av = λv
    Av = A @ v
    λv = λ * v
    print(f"  Verification: Av = {Av.flatten()}, λv = {λv.flatten()}")
    print()

# Compare with NumPy's eigenvectors
print("NumPy's eigenvectors (as columns):")
eigenvalues_np, eigenvectors_np = np.linalg.eig(A)
print(eigenvectors_np)
```

Eigenvectors are only unique up to scalar multiplication—if v is an eigenvector, so is 2v, -v, or 0.5v. By convention, we often normalize eigenvectors to have unit length (‖v‖ = 1). NumPy's np.linalg.eig returns normalized eigenvectors.
Geometric multiplicity and eigenspace dimension:
For each eigenvalue, the eigenspace might have dimension > 1 (geometric multiplicity > 1). This happens when the eigenvalue is a repeated root of the characteristic polynomial and the null space of (A - λI) is more than one-dimensional; for example, the identity matrix has eigenvalue 1 and every non-zero vector is an eigenvector.
When geometric multiplicity > 1, there are infinitely many eigenvector choices—any vector in the eigenspace works. We typically choose an orthonormal basis for the eigenspace.
Eigenvalues reveal deep truths about matrices. Understanding these properties is crucial for ML applications.
Complex eigenvalues:
Even for real-valued matrices, eigenvalues can be complex. This occurs when the characteristic polynomial has complex roots. For real matrices, complex eigenvalues always appear in conjugate pairs: if λ = a + bi is an eigenvalue, so is λ̄ = a - bi.
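A rotation matrix makes this concrete: a pure 90° rotation leaves no real direction invariant, so its eigenvalues must be complex. A quick sketch:

```python
import numpy as np

# 90-degree rotation: no real vector keeps its direction,
# so the eigenvalues are the complex conjugate pair ±i
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigenvalues = np.linalg.eigvals(R)
print(eigenvalues)  # a conjugate pair: ±1j

# The two eigenvalues are conjugates of each other, with |λ| = 1
print(np.isclose(eigenvalues[0], np.conj(eigenvalues[1])))
print(np.allclose(np.abs(eigenvalues), 1.0))
```

Both eigenvalues lie on the unit circle, consistent with the orthogonal-matrix row in the table below.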
| Matrix Type | Eigenvalue Property | Notes / Typical Role |
|---|---|---|
| Real symmetric (A = Aᵀ) | All eigenvalues are real | Eigenvectors are orthogonal |
| Positive definite | All eigenvalues are positive (λ > 0) | Defines an ellipsoid |
| Positive semi-definite | All eigenvalues are non-negative (λ ≥ 0) | Covariance matrices |
| Orthogonal (A⁻¹ = Aᵀ) | All eigenvalues have \|λ\| = 1 | Rotation/reflection |
| Skew-symmetric (A = -Aᵀ) | All eigenvalues are purely imaginary | Relates to rotations |
| Idempotent (A² = A) | Eigenvalues are 0 or 1 | Projection matrices |
| Nilpotent (Aᵏ = 0) | All eigenvalues are 0 | Shift operators |
For real symmetric matrices: (1) All eigenvalues are real, (2) Eigenvectors for distinct eigenvalues are orthogonal, (3) The matrix is always diagonalizable with orthogonal eigenvectors. This is why PCA works—covariance matrices are real and symmetric!
```python
import numpy as np

def analyze_matrix_eigenproperties(A, name="A"):
    """
    Analyze a matrix's eigenvalue properties and verify relationships
    with trace, determinant, and matrix type.
    """
    eigenvalues, eigenvectors = np.linalg.eig(A)

    print(f"{'='*50}")
    print(f"Analysis of {name}:")
    print(f"{'='*50}")
    print(f"Matrix:")
    print(A)
    print()

    # Eigenvalues
    print(f"Eigenvalues: {eigenvalues}")

    # Check if all real
    if np.allclose(eigenvalues.imag, 0):
        print("  → All eigenvalues are REAL")
        eigenvalues = eigenvalues.real
    else:
        print("  → Contains COMPLEX eigenvalues")

    # Trace and determinant relationships
    trace = np.trace(A)
    det = np.linalg.det(A)
    print(f"Trace verification:")
    print(f"  tr(A) = {trace:.6f}")
    print(f"  Σλ = {np.sum(eigenvalues):.6f}")
    print(f"Determinant verification:")
    print(f"  det(A) = {det:.6f}")
    print(f"  Πλ = {np.prod(eigenvalues):.6f}")

    # Check positive definiteness
    if np.allclose(eigenvalues.imag, 0):
        real_eigvals = eigenvalues.real
        if np.all(real_eigvals > 0):
            print("  → Matrix is POSITIVE DEFINITE")
        elif np.all(real_eigvals >= 0):
            print("  → Matrix is POSITIVE SEMI-DEFINITE")
        elif np.all(real_eigvals < 0):
            print("  → Matrix is NEGATIVE DEFINITE")

    # Check orthogonality of eigenvectors (for symmetric matrices)
    if np.allclose(A, A.T):
        print("  → Matrix is SYMMETRIC")
        # Check orthogonality
        V = eigenvectors
        orthogonality = V.T @ V
        print(f"  Eigenvector orthogonality (Vᵀ V should be I):")
        print(f"  {orthogonality}")
        if np.allclose(orthogonality, np.eye(len(eigenvalues))):
            print("  → Eigenvectors are ORTHONORMAL ✓")

    return eigenvalues, eigenvectors

# Example 1: Symmetric positive definite matrix
cov_matrix = np.array([
    [4, 2, 0.5],
    [2, 3, 1],
    [0.5, 1, 2]
])
analyze_matrix_eigenproperties(cov_matrix, "Covariance Matrix (Symmetric PD)")

# Example 2: Rotation matrix (orthogonal)
theta = np.pi / 4  # 45 degrees
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta), np.cos(theta)]
])
analyze_matrix_eigenproperties(rotation, "Rotation Matrix")

# Example 3: Non-symmetric matrix
A = np.array([
    [1, 2],
    [0, 3]
])
analyze_matrix_eigenproperties(A, "Upper Triangular Matrix")
```

A fundamental theorem governs the relationship between eigenvectors:
Theorem: Eigenvectors corresponding to distinct eigenvalues are linearly independent.
This result is profound. It means if a matrix has n distinct eigenvalues, it has n linearly independent eigenvectors—enough to form a basis for ℝⁿ. Such a matrix is diagonalizable.
Proof sketch:
Suppose v₁ and v₂ are eigenvectors with distinct eigenvalues λ₁ ≠ λ₂. If they were linearly dependent, we'd have v₂ = cv₁ for some scalar c. Then:
Av₂ = A(cv₁) = cAv₁ = cλ₁v₁ = λ₁v₂
But also Av₂ = λ₂v₂. So λ₁v₂ = λ₂v₂, which implies (λ₁ - λ₂)v₂ = 0.
Since λ₁ ≠ λ₂ and v₂ ≠ 0, we have a contradiction. Therefore v₁ and v₂ must be linearly independent.
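The consequence for diagonalization can be checked numerically. A sketch with an example matrix that has distinct eigenvalues (5 and 2):

```python
import numpy as np

# Distinct eigenvalues -> linearly independent eigenvectors
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])  # eigenvalues 5 and 2

eigenvalues, V = np.linalg.eig(A)

# The eigenvector matrix has full rank: columns are linearly independent
print(np.linalg.matrix_rank(V))  # 2

# So A is diagonalizable: V diag(λ) V^{-1} reconstructs A
reconstructed = V @ np.diag(eigenvalues) @ np.linalg.inv(V)
print(np.allclose(reconstructed, A))  # True
```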
In PCA, we seek the eigenvectors of the covariance matrix (symmetric, positive semi-definite). The spectral theorem guarantees these eigenvectors are orthogonal—they form a natural coordinate system for the data. We can project high-dimensional data onto these orthogonal directions with no redundancy.
When eigenvectors are NOT linearly independent:
Problems arise when eigenvalues are repeated. A repeated eigenvalue (algebraic multiplicity > 1) might not have enough independent eigenvectors (geometric multiplicity < algebraic multiplicity). Such matrices are called defective and are not diagonalizable.
Example of a defective matrix:
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
This has eigenvalue λ = 1 with algebraic multiplicity 2, but only one linearly independent eigenvector (geometric multiplicity 1). It cannot be diagonalized.
Fortunately, symmetric matrices are never defective—a key reason why covariance matrices (symmetric by construction) are well-behaved for PCA.
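The deficiency of the shear matrix above can be confirmed numerically by comparing multiplicities. A sketch (the geometric multiplicity is computed as n minus the rank of A - λI):

```python
import numpy as np

# The shear matrix [[1, 1], [0, 1]] is defective: eigenvalue 1 has
# algebraic multiplicity 2 but geometric multiplicity 1.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

eigenvalues, V = np.linalg.eig(A)
print(eigenvalues)  # [1. 1.] -- algebraic multiplicity 2

# Geometric multiplicity = dim null(A - λI) = n - rank(A - λI)
geo_mult = A.shape[0] - np.linalg.matrix_rank(A - np.eye(2))
print(geo_mult)  # 1 -- only one independent eigenvector
```

With only a one-dimensional eigenspace, there is no basis of eigenvectors, so this matrix cannot be diagonalized.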
The eigenvalue problem isn't academic—it's the engine behind numerous ML algorithms. Here's a preview of key applications we'll explore in depth later in this module: PCA (the eigenvectors of the covariance matrix are the principal components), spectral clustering (eigenvectors of the graph Laplacian reveal community structure), PageRank (the dominant eigenvector of the web's link matrix ranks pages), and stability analysis (eigenvalue magnitudes determine whether iterative dynamics converge or diverge).
In most ML applications, you won't compute eigenvalues by hand. But understanding eigenanalysis lets you interpret results correctly: Which principal components capture signal vs. noise? Why does spectral clustering find certain groupings? What makes a system stable or unstable?
We've established the rigorous foundation of eigenvalues and eigenvectors. Let's consolidate the key concepts:
- The eigenvalue equation Av = λv defines eigenvectors (non-zero vectors whose direction is preserved) and eigenvalues (their scaling factors).
- Eigenvalues are the roots of the characteristic polynomial det(A - λI) = 0; eigenvectors span the null space of A - λI.
- The eigenvalues sum to tr(A) and multiply to det(A).
- Real symmetric matrices have real eigenvalues and orthogonal eigenvectors (the Spectral Theorem), which is why PCA on covariance matrices is well-behaved.
- Eigenvectors for distinct eigenvalues are linearly independent; defective matrices (geometric multiplicity < algebraic multiplicity) are not diagonalizable.
What's Next:
Now that we understand what eigenvalues and eigenvectors are, the next page explores what they mean geometrically. We'll visualize how eigenvectors define invariant directions, how eigenvalues indicate stretching/compression, and how to 'see' the eigenstructure of transformations. This geometric intuition is essential for developing practical insight into ML algorithms.
You now understand the formal definition of the eigenvalue problem—the equation Av = λv, the characteristic polynomial, and key properties. Next, we'll build geometric intuition by visualizing what eigenvalues and eigenvectors mean for linear transformations in 2D and 3D.