While SVD and eigendecomposition often receive more attention for their interpretability, QR decomposition is the unsung hero of computational linear algebra. It's the foundation upon which many critical algorithms are built—from solving least squares problems to computing eigenvalues to maintaining numerical stability in iterative methods.
QR decomposition factors any matrix into an orthogonal matrix Q and an upper triangular matrix R. This seemingly simple factorization has profound implications: it orthonormalizes vectors (Gram-Schmidt on steroids), provides numerically stable least squares solutions, enables efficient eigenvalue computation (QR algorithm), and serves as a building block for more complex decompositions.
If SVD is the Swiss Army knife of matrix factorizations, QR decomposition is the precision screwdriver—less versatile but absolutely essential for the tasks where it excels.
By the end of this page, you will understand: (1) The definition and existence of QR decomposition, (2) Three algorithms for computing QR: Gram-Schmidt, Householder reflections, and Givens rotations, (3) Why QR is preferred over normal equations for least squares, (4) The QR algorithm for eigenvalue computation, and (5) Practical applications in ML including orthogonalization and numerical stability.
The QR Decomposition:
Any matrix A ∈ ℝᵐˣⁿ (with m ≥ n) can be factored as:
$$A = QR$$
where Q has orthonormal columns (QᵀQ = I) and R is upper triangular; the precise shapes depend on the form used.
Forms of QR decomposition:
Full QR: Q ∈ ℝᵐˣᵐ is square orthogonal and R ∈ ℝᵐˣⁿ is upper triangular with m − n zero rows at the bottom.
Reduced (Thin) QR: Q ∈ ℝᵐˣⁿ has orthonormal columns and R ∈ ℝⁿˣⁿ is square upper triangular.
The reduced form is more storage-efficient and sufficient for most applications.
Existence and uniqueness:
QR decomposition always exists for any m × n matrix with m ≥ n.
If A has full column rank (rank = n), then the reduced QR factorization is unique once the diagonal entries of R are required to be positive.
If A is rank-deficient, R will have zero diagonal entries where rank is lost.
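As a quick illustration (a minimal sketch; for robust rank detection in practice, column-pivoted QR such as `scipy.linalg.qr(..., pivoting=True)` is the better tool), the diagonal of R reveals an exactly dependent column:

```python
import numpy as np

# Rank-2 matrix: the third column is the sum of the first two
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])

Q, R = np.linalg.qr(A, mode='reduced')
# The third diagonal entry of R is zero (up to rounding),
# signalling that rank was lost at column 3
print(np.abs(np.diag(R)))
```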
The names Q and R are conventional: R stands for 'right triangular' (upper triangular matrices are also called right triangular), while Q is simply the letter traditionally used for the orthogonal factor. Some sources write A = QU, where U stands for upper triangular.
Geometric interpretation:
QR decomposition performs orthogonalization of the columns of A: q₁ points along a₁, and more generally span(q₁, ..., qⱼ) = span(a₁, ..., aⱼ) for every j. The entries of R are the coordinates of the original columns in this new orthonormal basis.
Connection to Gram-Schmidt:
QR decomposition is essentially the Gram-Schmidt process organized in matrix form:
```python
import numpy as np

# Create a sample matrix
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 10],
    [10, 11, 12]], dtype=float)

print("Matrix A:")
print(A)
print(f"Shape: {A.shape}")

# Full QR decomposition
Q_full, R_full = np.linalg.qr(A, mode='complete')
print(f"=== Full QR ===")
print(f"Q shape: {Q_full.shape}")  # (4, 4)
print(f"R shape: {R_full.shape}")  # (4, 3)

# Reduced (thin) QR decomposition
Q_reduced, R_reduced = np.linalg.qr(A, mode='reduced')
print(f"=== Reduced QR ===")
print(f"Q shape: {Q_reduced.shape}")  # (4, 3)
print(f"R shape: {R_reduced.shape}")  # (3, 3)

# Verify Q is orthogonal/orthonormal
print(f"Q^T Q (should be identity):")
print(np.round(Q_reduced.T @ Q_reduced, 10))

# Verify R is upper triangular
print(f"R (upper triangular):")
print(np.round(R_reduced, 6))

# Verify reconstruction: A = QR
A_reconstructed = Q_reduced @ R_reduced
print(f"Reconstruction error ||A - QR||: {np.linalg.norm(A - A_reconstructed):.2e}")

# Column space interpretation
print(f"=== Column Space Analysis ===")
for j in range(A.shape[1]):
    # Column j of A = first (j+1) columns of Q times column j of R
    a_j = A[:, j]
    reconstruction = Q_reduced[:, :j+1] @ R_reduced[:j+1, j]
    print(f"Column {j+1}: ||a_j - Σ r_{{ij}}q_i||: {np.linalg.norm(a_j - reconstruction):.2e}")
```

The conceptually simplest way to compute QR decomposition is through Gram-Schmidt orthogonalization—a process that takes a set of vectors and produces an orthonormal set spanning the same space.
Classical Gram-Schmidt (CGS):
Given vectors a₁, a₂, ..., aₙ (columns of A), produce orthonormal q₁, q₂, ..., qₙ:
For j = 1 to n:
vⱼ = aⱼ
For i = 1 to j-1:
rᵢⱼ = qᵢᵀ aⱼ # Projection coefficient
vⱼ = vⱼ - rᵢⱼ qᵢ # Subtract projection
rⱼⱼ = ||vⱼ||
qⱼ = vⱼ / rⱼⱼ # Normalize
Interpretation: rᵢⱼ = qᵢᵀaⱼ measures how much of aⱼ lies along qᵢ, so each column decomposes as aⱼ = Σᵢ₌₁ʲ rᵢⱼqᵢ, which is exactly the j-th column of A = QR.
The problem with classical Gram-Schmidt:
CGS is numerically unstable. Due to floating-point errors, the computed qⱼ may have small components in the direction of previous qᵢ vectors. These errors accumulate, and for ill-conditioned matrices, the output vectors can be far from orthogonal.
Classical Gram-Schmidt can produce severely non-orthogonal output for ill-conditioned matrices. For a matrix with condition number κ, the output Q may have orthogonality errors proportional to κ × machine_epsilon. For κ = 10¹⁰ and double precision (ε ≈ 10⁻¹⁶), this can completely destroy orthogonality!
Modified Gram-Schmidt (MGS):
A simple reordering dramatically improves stability:
For j = 1 to n:
vⱼ = aⱼ
For i = 1 to j-1:
rᵢⱼ = qᵢᵀ vⱼ # Note: vⱼ, not aⱼ
vⱼ = vⱼ - rᵢⱼ qᵢ # Update vⱼ immediately
rⱼⱼ = ||vⱼ||
qⱼ = vⱼ / rⱼⱼ
The key difference: In CGS, we compute all projections based on the original aⱼ. In MGS, after each subtraction, we recompute the projection against the updated vector.
This seemingly minor change makes MGS as stable as Householder QR for computing R, though Q may still have some orthogonality loss.
Complexity: both classical and modified Gram-Schmidt require about 2mn² floating-point operations for an m × n matrix.
```python
import numpy as np

def classical_gram_schmidt(A):
    """
    Classical Gram-Schmidt orthogonalization.
    Numerically unstable but conceptually simple.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))

    for j in range(n):
        v = A[:, j].copy()
        # Subtract projections onto all previous q vectors
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]  # Project onto original a_j
            v = v - R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        if R[j, j] > 1e-10:
            Q[:, j] = v / R[j, j]
        else:
            Q[:, j] = 0  # Handle rank deficiency
    return Q, R

def modified_gram_schmidt(A):
    """
    Modified Gram-Schmidt orthogonalization.
    Much more numerically stable than classical version.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    V = A.copy().astype(float)

    for j in range(n):
        # Subtract projections, updating v_j after each
        for i in range(j):
            R[i, j] = Q[:, i] @ V[:, j]            # Project onto current v_j
            V[:, j] = V[:, j] - R[i, j] * Q[:, i]  # Update v_j immediately
        R[j, j] = np.linalg.norm(V[:, j])
        if R[j, j] > 1e-10:
            Q[:, j] = V[:, j] / R[j, j]
        else:
            Q[:, j] = 0
    return Q, R

def test_orthogonality(Q, name):
    """Measure departure from orthogonality."""
    QtQ = Q.T @ Q
    I = np.eye(Q.shape[1])
    error = np.linalg.norm(QtQ - I, 'fro')
    print(f"{name}: ||Q^T Q - I||_F = {error:.2e}")
    return error

# Create an ill-conditioned matrix (challenging case)
np.random.seed(42)
n = 20
# Construct matrix with specific condition number
U, _ = np.linalg.qr(np.random.randn(n, n))
V, _ = np.linalg.qr(np.random.randn(n, n))
singular_values = np.logspace(0, -10, n)  # Condition number = 10^10
A_illcond = U @ np.diag(singular_values) @ V.T

print(f"Matrix condition number: {np.linalg.cond(A_illcond):.2e}")

# Compare methods
Q_cgs, R_cgs = classical_gram_schmidt(A_illcond)
Q_mgs, R_mgs = modified_gram_schmidt(A_illcond)
Q_np, R_np = np.linalg.qr(A_illcond)

print("Orthogonality comparison:")
test_orthogonality(Q_cgs, "Classical Gram-Schmidt")
test_orthogonality(Q_mgs, "Modified Gram-Schmidt")
test_orthogonality(Q_np, "NumPy QR (Householder)")

# Reconstruction accuracy
print("Reconstruction accuracy ||A - QR||:")
print(f"Classical GS: {np.linalg.norm(A_illcond - Q_cgs @ R_cgs):.2e}")
print(f"Modified GS:  {np.linalg.norm(A_illcond - Q_mgs @ R_mgs):.2e}")
print(f"NumPy QR:     {np.linalg.norm(A_illcond - Q_np @ R_np):.2e}")
```

For production-quality numerical software, Householder reflections are the preferred method for QR decomposition. They provide excellent numerical stability and are the default in libraries like NumPy, LAPACK, and MATLAB.
The Householder transformation:
A Householder matrix (or Householder reflector) is an orthogonal matrix of the form: $$H = I - 2\frac{vv^T}{v^Tv}$$
where v ∈ ℝⁿ is a non-zero vector.
Properties: H is symmetric (Hᵀ = H), orthogonal (HᵀH = I), and its own inverse (H² = I). Geometrically, H reflects vectors across the hyperplane orthogonal to v: Hv = −v, while Hx = x for every x perpendicular to v.
The key insight:
Given any vector x, we can find v such that Hx = αe₁ (a vector pointing along the first axis): $$v = x - \alpha e_1 \quad \text{where} \quad \alpha = \pm\|x\|$$
The sign is chosen to avoid numerical cancellation (typically opposite to sign of x₁).
QR via Householder:
To compute QR, we zero out below-diagonal entries column by column:
The product Q = H₁H₂...Hₙ is orthogonal, and Hₙ...H₂H₁A = R.
Householder reflections are 'backward stable': the computed QR satisfies (Q+ΔQ)(R+ΔR) = A+ΔA where ΔA is tiny (machine precision × ||A||). This is much stronger than just accurate output—it guarantees the computed factors are the exact QR of a matrix very close to A.
Efficient implementation:
Naively, applying an m×m Householder matrix costs O(m²). But H = I - 2vvᵀ/(vᵀv) is never formed explicitly!
Instead: Hx = x - (2vᵀx/(vᵀv))v — just O(m) operations.
Column-wise application:
For matrix X: HX = X - v(2vᵀX/(vᵀv))
This is O(mn) for an m×n matrix—much better than O(m²n) for explicit multiplication.
Storing Q implicitly:
In practice, Q is often stored implicitly as the sequence of Householder vectors v₁, v₂, ..., vₙ. When needed, Q can be applied to a vector efficiently without ever forming the full m×m matrix.
Complexity: about 2mn² − 2n³/3 flops to compute R; forming the full Q explicitly costs extra, which is one reason Q is often kept in implicit form.
```python
import numpy as np

def householder_reflection(x):
    """
    Compute Householder vector v such that Hx = -sign(x_1)||x|| e_1.
    H = I - 2vv^T / (v^T v)

    Returns:
        v: Householder vector
        tau: scalar 2/(v^T v) used in H = I - tau*v*v^T
    """
    n = len(x)
    sigma = np.sign(x[0]) if x[0] != 0 else 1
    norm_x = np.linalg.norm(x)
    if norm_x == 0:
        return np.zeros(n), 0
    # v = x + sign(x_1)||x||e_1 (sign chosen to avoid cancellation)
    v = x.copy()
    v[0] = x[0] + sigma * norm_x
    # tau = 2 / (v^T v) = 2 / ||v||^2
    tau = 2.0 / (v @ v)
    return v, tau

def householder_qr(A):
    """
    Compute QR decomposition using Householder reflections.

    Returns:
        Q: m x m orthogonal matrix (explicit form)
        R: m x n upper triangular matrix
    """
    m, n = A.shape
    R = A.copy().astype(float)
    Q = np.eye(m)

    for k in range(min(m-1, n)):
        # Extract subcolumn to reflect
        x = R[k:m, k]
        # Compute Householder reflection for this column
        v, tau = householder_reflection(x)
        if tau > 0:
            # Apply H to remaining columns of R: R[k:, k:] = H R[k:, k:]
            # H * X = X - tau * v * (v^T * X)
            R[k:m, k:n] -= tau * np.outer(v, v @ R[k:m, k:n])
            # Accumulate Q from the right: Q = Q * H
            Q[:, k:m] -= tau * np.outer(Q[:, k:m] @ v, v)
    return Q, R

def householder_qr_implicit(A):
    """
    Compute QR with implicit Q representation (more efficient).
    Returns Householder vectors for later Q application.
    """
    m, n = A.shape
    R = A.copy().astype(float)
    taus = []
    vs = []
    for k in range(min(m-1, n)):
        x = R[k:m, k]
        v, tau = householder_reflection(x)
        taus.append(tau)
        vs.append(v)
        if tau > 0:
            R[k:m, k:n] -= tau * np.outer(v, v @ R[k:m, k:n])
    return vs, taus, R

def apply_Q_implicit(vs, taus, x, transpose=False):
    """Apply Q (or Q^T) to vector x using implicit representation."""
    m = len(x)
    result = x.copy().astype(float)
    if transpose:
        # Apply Q^T = H_n ... H_2 H_1
        for k, (v, tau) in enumerate(zip(vs, taus)):
            # H_k acts on indices k:m
            result[k:] -= tau * v * (v @ result[k:])
    else:
        # Apply Q = H_1 H_2 ... H_n
        for k, (v, tau) in reversed(list(enumerate(zip(vs, taus)))):
            result[k:] -= tau * v * (v @ result[k:])
    return result

# Test implementation
np.random.seed(42)
A = np.random.randn(6, 4)

# Our Householder implementation
Q_ours, R_ours = householder_qr(A)

# NumPy implementation
Q_np, R_np = np.linalg.qr(A, mode='complete')

print("Householder QR Decomposition")
print("=" * 50)
print(f"Matrix A: {A.shape}")
print(f"R (upper triangular):")
print(np.round(R_ours, 6))
print(f"Orthogonality ||Q^T Q - I||: {np.linalg.norm(Q_ours.T @ Q_ours - np.eye(6)):.2e}")
print(f"Reconstruction ||A - QR||: {np.linalg.norm(A - Q_ours[:, :4] @ R_ours[:4, :]):.2e}")

# Compare with NumPy (comparing the top n x n blocks of R)
print(f"Difference from NumPy R: {np.linalg.norm(np.abs(R_ours[:4]) - np.abs(R_np[:4])):.2e}")
print("(Sign differences are allowed due to reflection choice)")
```

While Householder reflections are efficient for dense matrices, Givens rotations offer advantages for sparse and structured matrices by zeroing one element at a time.
The Givens rotation:
A Givens rotation G(i, j, θ) is an n×n identity matrix with four modified entries:
$$G(i,j,\theta) = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & c & \cdots & s & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & -s & \cdots & c & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{pmatrix}$$
where c = cos(θ) and s = sin(θ): c sits at positions (i,i) and (j,j), s at (i,j), and −s at (j,i).
Effect of Givens rotation:
Multiplying Gx rotates vector x in the (i,j) plane by angle θ. We can choose θ to zero out x_j: $$c = \frac{x_i}{\sqrt{x_i^2 + x_j^2}}, \quad s = \frac{x_j}{\sqrt{x_i^2 + x_j^2}}$$
QR via Givens rotations:
Zero below-diagonal elements one at a time, working from bottom to top within each column:
For each column k:
For each row i from m down to k+1:
Find G to zero A[i,k] using A[k,k] as pivot
Apply G to rows k and i of A
Givens rotations excel when: (1) The matrix is sparse—Householder fills in zeros but Givens targets specific elements, (2) The matrix is nearly triangular—only a few rotations needed, (3) Parallel implementations—rotations on independent rows can be concurrent, (4) Updating QR—adding a row requires only a few Givens rotations vs full recomputation.
Complexity comparison:
| Method | Dense m×n | Sparse | Best for |
|---|---|---|---|
| Householder | 2mn² - 2n³/3 | Fills in | Dense matrices |
| Givens | 3mn² - n³ | Preserves sparsity | Sparse, structured |
| Modified GS | 2mn² | Moderate fill | When Q needed explicitly |
Applications of Givens rotations:
Updating QR factorization: When a new row or column is added, QR can be updated in O(mn) rather than O(mn²) recomputation
Sparse least squares: For sparse A, Givens preserves sparsity in R
QR algorithm for eigenvalues: Givens rotations implement the 'shifts' in the implicit QR algorithm
Real-time control systems: Incremental updates as new measurements arrive
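The row-update case can be sketched as follows. This is an illustrative helper (`qr_insert_row` is our own name for the sketch; SciPy ships a production version as `scipy.linalg.qr_insert`): appending a row to A = QR leaves a factor that is triangular except for one extra row, and n Givens rotations restore triangularity in O(n²) work.

```python
import numpy as np

def qr_insert_row(Q, R, new_row):
    """Update a reduced QR factorization of A (m x n) after appending new_row.

    Cost: O(n^2) via n Givens rotations, versus O(m n^2) for a full
    recomputation. Illustrative sketch, not a production routine.
    """
    m, n = Q.shape
    # Bordered factors: [A; a^T] = Q_ext @ R_ext, with Q_ext still orthonormal
    Q_ext = np.zeros((m + 1, n + 1))
    Q_ext[:m, :n] = Q
    Q_ext[m, n] = 1.0
    R_ext = np.zeros((n + 1, n))
    R_ext[:n, :] = R
    R_ext[n, :] = new_row

    # Zero the appended row of R_ext one entry at a time
    for k in range(n):
        a, b = R_ext[k, k], R_ext[n, k]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        # Rotate rows k and n of R_ext (left-multiplication by G)...
        row_k, row_n = R_ext[k, :].copy(), R_ext[n, :].copy()
        R_ext[k, :] = c * row_k + s * row_n
        R_ext[n, :] = -s * row_k + c * row_n
        # ...and columns k and n of Q_ext (right-multiplication by G^T)
        col_k, col_n = Q_ext[:, k].copy(), Q_ext[:, n].copy()
        Q_ext[:, k] = c * col_k + s * col_n
        Q_ext[:, n] = -s * col_k + c * col_n

    return Q_ext[:, :n], R_ext[:n, :]

# Demo: the update matches a from-scratch factorization
np.random.seed(0)
A = np.random.randn(6, 3)
Q, R = np.linalg.qr(A, mode='reduced')
new_row = np.random.randn(3)

Q2, R2 = qr_insert_row(Q, R, new_row)
A2 = np.vstack([A, new_row])
print("||A' - Q'R'|| =", np.linalg.norm(A2 - Q2 @ R2))
print("||Q'^T Q' - I|| =", np.linalg.norm(Q2.T @ Q2 - np.eye(3)))
```

Each rotation mixes the appended row with one row of R, zeroing one entry at a time: exactly the targeted elimination that makes Givens rotations well suited to updates.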
```python
import numpy as np

def givens_rotation(a, b):
    """
    Compute Givens rotation parameters (c, s) such that:
        [ c  s] [a]   [r]
        [-s  c] [b] = [0]
    Returns c = cos(θ), s = sin(θ), r = √(a² + b²)
    """
    if b == 0:
        c, s, r = 1, 0, a
    elif a == 0:
        c, s, r = 0, np.sign(b), abs(b)
    elif abs(a) > abs(b):
        t = b / a
        u = np.sign(a) * np.sqrt(1 + t**2)
        c = 1 / u
        s = t * c
        r = a * u
    else:
        t = a / b
        u = np.sign(b) * np.sqrt(1 + t**2)
        s = 1 / u
        c = t * s
        r = b * u
    return c, s, r

def apply_givens_left(A, i, k, c, s):
    """
    Apply Givens rotation to rows i and k of matrix A.
    Modifies A in place.
    """
    for j in range(A.shape[1]):
        temp = c * A[k, j] + s * A[i, j]
        A[i, j] = -s * A[k, j] + c * A[i, j]
        A[k, j] = temp

def givens_qr(A):
    """
    Compute QR decomposition using Givens rotations.

    Returns:
        Q: m x m orthogonal matrix
        R: m x n upper triangular matrix
    """
    m, n = A.shape
    R = A.copy().astype(float)
    Q = np.eye(m)

    for k in range(n):
        # Zero out entries below diagonal in column k
        for i in range(m-1, k, -1):  # From bottom to top
            if R[i, k] != 0:
                # Compute Givens rotation to zero R[i, k]
                c, s, r = givens_rotation(R[k, k], R[i, k])
                # Apply to R (rows k and i)
                apply_givens_left(R, i, k, c, s)
                # Accumulate in Q (columns k and i)
                # Q = Q @ G^T means operating on columns
                for row in range(m):
                    temp = c * Q[row, k] + s * Q[row, i]
                    Q[row, i] = -s * Q[row, k] + c * Q[row, i]
                    Q[row, k] = temp
    return Q, R

# Test Givens QR
np.random.seed(42)
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 10],
    [11, 12, 13]], dtype=float)

Q_givens, R_givens = givens_qr(A)
Q_np, R_np = np.linalg.qr(A, mode='complete')

print("Givens Rotation QR Decomposition")
print("=" * 50)
print("R (upper triangular):")
print(np.round(R_givens, 6))
print(f"Orthogonality ||Q^T Q - I||: {np.linalg.norm(Q_givens.T @ Q_givens - np.eye(4)):.2e}")
print(f"Reconstruction ||A - Q[:,:3]R[:3,:]||: {np.linalg.norm(A - Q_givens[:, :3] @ R_givens[:3, :]):.2e}")

# Demonstrate sparse advantage
print("=== Sparse Matrix Example ===")
# Nearly upper triangular (only a few entries to zero)
A_sparse = np.array([
    [3, 1, 4],
    [0, 2, 6],
    [0, 0.1, 3],  # Only this small entry needs zeroing
    [0, 0, 0.2]   # And this one
], dtype=float)

print("Nearly triangular matrix:")
print(A_sparse)

Q_sparse, R_sparse = givens_qr(A_sparse)
print("R after Givens (minimal rotations needed):")
print(np.round(R_sparse, 6))
```

One of the most important applications of QR decomposition is solving least squares problems—the foundation of linear regression and many optimization algorithms.
The least squares problem:
Given an overdetermined system Ax = b (m equations, n unknowns, m > n), find x that minimizes ||Ax - b||².
Two approaches:
1. Normal equations: $$A^T A x = A^T b$$ Solve by computing AᵀA, then using Cholesky or LU.
2. QR decomposition: Write A = QR (reduced form). Splitting b into its components inside and outside the column space of Q gives: $$\|Ax - b\|^2 = \|Rx - Q^Tb\|^2 + \|(I - QQ^T)b\|^2$$
(This uses the fact that Q has orthonormal columns, so ||Qy||² = ||y||².) Only the first term depends on x.
Minimized by solving the triangular system: $$Rx = Q^T b$$
Why QR is better:
| Aspect | Normal Equations | QR Method |
|---|---|---|
| Condition number | κ(AᵀA) = κ(A)² | κ(R) = κ(A) |
| Precision loss | ~2× digits lost | ~1× digits lost |
| Rank-deficient A | AᵀA is singular | Detectable from R |
| Extra computation | Form AᵀA (expensive) | QR directly |
| Stability | Unstable for ill-conditioned A | Backward stable |
For AᵀA, the condition number is squared: κ(AᵀA) = κ(A)². If κ(A) = 10⁸, then κ(AᵀA) = 10¹⁶, which exceeds double precision! All digits of accuracy are lost. QR avoids this by never forming AᵀA.
Step-by-step QR solution:
1. Factor A = QR (reduced form)
2. Compute c = Qᵀb
3. Solve the triangular system Rx = c by back-substitution
Total cost: O(mn²) for QR + O(n²) for solve = O(mn²)
Compared to normal equations: O(mn²) for AᵀA + O(n³) for solve = O(mn² + n³)
For m >> n, both are dominated by O(mn²), but QR is more stable.
Residual computation:
The least squares residual r = b - Ax has norm: $$\|r\| = \|b - Ax\| = \|(I - QQ^T)b\|$$
This equals the norm of the 'leftover' part of b not in the column space of A (which equals span of Q).
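This identity is easy to verify numerically; a minimal sketch with random data:

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(8, 3)
b = np.random.randn(8)

Q, R = np.linalg.qr(A, mode='reduced')
x = np.linalg.solve(R, Q.T @ b)                # least squares solution via QR
r_norm = np.linalg.norm(b - A @ x)             # ||b - Ax||
proj_norm = np.linalg.norm(b - Q @ (Q.T @ b))  # ||(I - QQ^T) b||
print(r_norm, proj_norm)  # the two agree to machine precision
```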
```python
import numpy as np
from scipy.linalg import solve_triangular

def least_squares_normal_equations(A, b):
    """Solve least squares via normal equations (unstable!)."""
    ATA = A.T @ A
    ATb = A.T @ b
    x = np.linalg.solve(ATA, ATb)
    return x

def least_squares_qr(A, b):
    """Solve least squares via QR decomposition (stable)."""
    Q, R = np.linalg.qr(A, mode='reduced')
    c = Q.T @ b
    x = solve_triangular(R, c)
    return x

def compare_methods(A, b, x_true):
    """Compare normal equations vs QR for least squares."""
    print(f"Condition number κ(A): {np.linalg.cond(A):.2e}")
    print(f"Condition number κ(A^T A): {np.linalg.cond(A.T @ A):.2e}")

    # Method 1: Normal equations
    x_normal = least_squares_normal_equations(A, b)
    error_normal = np.linalg.norm(x_normal - x_true) / np.linalg.norm(x_true)

    # Method 2: QR
    x_qr = least_squares_qr(A, b)
    error_qr = np.linalg.norm(x_qr - x_true) / np.linalg.norm(x_true)

    # Method 3: NumPy lstsq (uses SVD)
    x_lstsq, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
    error_lstsq = np.linalg.norm(x_lstsq - x_true) / np.linalg.norm(x_true)

    print("Relative errors ||x_computed - x_true|| / ||x_true||:")
    print(f"  Normal equations:  {error_normal:.2e}")
    print(f"  QR decomposition:  {error_qr:.2e}")
    print(f"  NumPy lstsq (SVD): {error_lstsq:.2e}")
    return error_normal, error_qr, error_lstsq

# Test with well-conditioned matrix
np.random.seed(42)
m, n = 100, 10
A_good = np.random.randn(m, n)
x_true = np.random.randn(n)
b_good = A_good @ x_true + 0.01 * np.random.randn(m)  # Add noise

print("=== Well-conditioned matrix ===")
compare_methods(A_good, b_good, x_true)

# Test with ill-conditioned matrix
print("=== Ill-conditioned matrix ===")
U, _ = np.linalg.qr(np.random.randn(m, n))
V, _ = np.linalg.qr(np.random.randn(n, n))
singular_values = np.logspace(0, -10, n)  # κ = 10^10
A_bad = U @ np.diag(singular_values) @ V.T

x_true = np.random.randn(n)
b_bad = A_bad @ x_true + 1e-12 * np.random.randn(m)  # Smaller noise

compare_methods(A_bad, b_bad, x_true)

print("Note: Normal equations fail catastrophically for ill-conditioned A!")
print("QR maintains accuracy where normal equations lose all precision.")
```

One of the most elegant applications of QR decomposition is the QR algorithm for computing matrix eigenvalues—considered one of the top 10 algorithms of the 20th century.
Basic QR iteration:
A₀ = A
For k = 0, 1, 2, ...
Qₖ Rₖ = Aₖ (QR factorization)
Aₖ₊₁ = Rₖ Qₖ (Reverse multiply)
The magic: Under mild conditions, Aₖ converges to an upper triangular matrix (Schur form), with eigenvalues on the diagonal!
Why it works (intuition): each step is an orthogonal similarity transform, since Aₖ₊₁ = RₖQₖ = Qₖᵀ(QₖRₖ)Qₖ = QₖᵀAₖQₖ. Every Aₖ therefore has the same eigenvalues as A, while the iteration implicitly runs power iteration on nested subspaces, driving the below-diagonal entries toward zero.
Convergence rate:
The subdiagonal entries decay proportionally to |λᵢ₊₁/λᵢ|ᵏ. If eigenvalues are well-separated, convergence is fast. For clustered eigenvalues, convergence is slow.
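This decay is easy to watch directly. The sketch below (a small unshifted 2 × 2 example of our own) tracks the subdiagonal entry of Aₖ; its per-iteration shrink factor settles near |λ₂/λ₁|:

```python
import numpy as np

# Symmetric 2x2 example: eigenvalues (3 ± √5)/2 ≈ 2.618 and 0.382
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
lam = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]
print(f"Predicted per-iteration shrink factor |λ2/λ1| ≈ {lam[1] / lam[0]:.4f}")

Ak = A.copy()
prev = abs(Ak[1, 0])
for k in range(8):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q
    cur = abs(Ak[1, 0])
    print(f"iter {k+1}: |subdiagonal| = {cur:.3e}, shrink = {cur / prev:.4f}")
    prev = cur
```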
Shifted QR (practical version):
To accelerate convergence, use shifts:
Qₖ Rₖ = Aₖ - μₖI (Shifted QR)
Aₖ₊₁ = Rₖ Qₖ + μₖI (Shift back)
The shift μₖ is chosen close to an eigenvalue (often using Wilkinson's shift). This dramatically speeds convergence—typically O(n) total iterations instead of O(n²).
Production eigenvalue codes first reduce A to Hessenberg form (tridiagonal for symmetric), then apply implicit shifted QR without forming Q and R explicitly. This reduces each iteration from O(n³) to O(n²), making the full algorithm O(n³) overall.
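Hessenberg reduction itself is available off the shelf; the sketch below uses `scipy.linalg.hessenberg` to show that the reduction is an orthogonal similarity (so eigenvalues are preserved) while everything below the first subdiagonal becomes zero:

```python
import numpy as np
from scipy.linalg import hessenberg

np.random.seed(0)
A = np.random.randn(5, 5)

# A = Q H Q^T with Q orthogonal and H upper Hessenberg
H, Q = hessenberg(A, calc_q=True)
print(np.round(H, 3))  # zeros below the first subdiagonal

# Similarity transform => identical eigenvalues
eig_A = np.sort_complex(np.linalg.eigvals(A))
eig_H = np.sort_complex(np.linalg.eigvals(H))
print("max eigenvalue difference:", np.max(np.abs(eig_A - eig_H)))
```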
Connection to power iteration:
The QR algorithm can be viewed as simultaneous power iteration on all of the standard basis vectors at once: the accumulated product Q₀Q₁⋯Qₖ₋₁ spans the same leading subspaces as the columns of Aᵏ, but is re-orthonormalized at every step, so its columns converge to a Schur basis instead of all collapsing onto the dominant eigenvector.
For symmetric matrices:
The QR algorithm simplifies considerably: the Hessenberg form becomes tridiagonal, symmetry (and hence tridiagonality) is preserved at every iteration, the limit is a diagonal matrix rather than merely triangular, and each iteration costs only O(n).
This is why symmetric eigenproblems (like PCA) are 'easy' computationally.
```python
import numpy as np
import matplotlib.pyplot as plt

def basic_qr_iteration(A, max_iters=100, tol=1e-10):
    """
    Basic QR algorithm for eigenvalue computation.
    Returns sequence of diagonal elements (converging to eigenvalues).
    """
    n = A.shape[0]
    Ak = A.copy().astype(float)
    history = []

    for k in range(max_iters):
        # Store current diagonal (approximate eigenvalues)
        history.append(np.diag(Ak).copy())
        # QR factorization
        Q, R = np.linalg.qr(Ak)
        # Reverse multiply
        Ak = R @ Q
        # Check convergence (sub-diagonal entries → 0)
        off_diag_norm = np.linalg.norm(np.tril(Ak, -1))
        if off_diag_norm < tol:
            print(f"Converged in {k+1} iterations")
            break
    else:
        print(f"Did not converge in {max_iters} iterations")
    return np.diag(Ak), np.array(history), Ak

def shifted_qr_iteration(A, max_iters=100, tol=1e-10):
    """
    Shifted QR algorithm with Rayleigh quotient shift.
    Much faster convergence than basic QR.
    """
    n = A.shape[0]
    Ak = A.copy().astype(float)
    history = []

    for k in range(max_iters):
        history.append(np.diag(Ak).copy())
        # Rayleigh quotient shift: use last diagonal element as shift
        # (More sophisticated shifts, e.g. Wilkinson's, exist but this works well)
        mu = Ak[-1, -1]
        # Shifted QR
        Q, R = np.linalg.qr(Ak - mu * np.eye(n))
        # Reverse multiply and shift back
        Ak = R @ Q + mu * np.eye(n)
        off_diag_norm = np.linalg.norm(np.tril(Ak, -1))
        if off_diag_norm < tol:
            print(f"Shifted QR converged in {k+1} iterations")
            break
    else:
        print(f"Shifted QR did not converge in {max_iters} iterations")
    return np.diag(Ak), np.array(history), Ak

# Test on a symmetric matrix
np.random.seed(42)
n = 5
A_sym = np.random.randn(n, n)
A_sym = (A_sym + A_sym.T) / 2  # Make symmetric

true_eigenvalues = np.sort(np.linalg.eigvalsh(A_sym))[::-1]
print(f"True eigenvalues: {true_eigenvalues.round(4)}")

print("=== Basic QR Iteration ===")
eigs_basic, hist_basic, _ = basic_qr_iteration(A_sym)
eigs_basic_sorted = np.sort(eigs_basic)[::-1]
print(f"Computed eigenvalues: {eigs_basic_sorted.round(4)}")

print("=== Shifted QR Iteration ===")
eigs_shifted, hist_shifted, _ = shifted_qr_iteration(A_sym)
eigs_shifted_sorted = np.sort(eigs_shifted)[::-1]
print(f"Computed eigenvalues: {eigs_shifted_sorted.round(4)}")

# Visualize convergence
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for i in range(n):
    axes[0].plot(hist_basic[:, i], label=f'Diag[{i}]')
axes[0].axhline(y=true_eigenvalues[0], color='k', linestyle='--', alpha=0.3)
axes[0].set_xlabel('Iteration')
axes[0].set_ylabel('Diagonal values')
axes[0].set_title('Basic QR Convergence')
axes[0].legend()

for i in range(min(len(hist_shifted), n)):
    if i < hist_shifted.shape[1]:
        axes[1].plot(hist_shifted[:, i], label=f'Diag[{i}]')
axes[1].set_xlabel('Iteration')
axes[1].set_ylabel('Diagonal values')
axes[1].set_title('Shifted QR Convergence (Much Faster!)')
axes[1].legend()

plt.tight_layout()
plt.show()
```

We've explored QR decomposition in depth—from basic definitions through multiple algorithms to critical applications. Let's consolidate the key insights.
What's next:
We'll explore Cholesky decomposition—a specialized factorization for symmetric positive definite matrices that is twice as fast as LU and essential for covariance matrices, kernel methods, and Gaussian processes in machine learning.
You now understand QR decomposition comprehensively—its definition, three computational approaches, superiority for least squares, and role in eigenvalue computation. QR is the invisible workhorse behind much of numerical linear algebra.