While SVD and eigendecomposition often receive more attention for their interpretability, QR decomposition is the unsung hero of computational linear algebra. It's the foundation upon which many critical algorithms are built—from solving least squares problems to computing eigenvalues to maintaining numerical stability in iterative methods.
QR decomposition factors any matrix into an orthogonal matrix Q and an upper triangular matrix R. This seemingly simple factorization has profound implications: it orthonormalizes vectors (Gram-Schmidt on steroids), provides numerically stable least squares solutions, enables efficient eigenvalue computation (QR algorithm), and serves as a building block for more complex decompositions.
If SVD is the Swiss Army knife of matrix factorizations, QR decomposition is the precision screwdriver—less versatile but absolutely essential for the tasks where it excels.
By the end of this page, you will understand: (1) The definition and existence of QR decomposition, (2) Three algorithms for computing QR: Gram-Schmidt, Householder reflections, and Givens rotations, (3) Why QR is preferred over normal equations for least squares, (4) The QR algorithm for eigenvalue computation, and (5) Practical applications in ML including orthogonalization and numerical stability.
The QR Decomposition:
Any matrix A ∈ ℝᵐˣⁿ (with m ≥ n) can be factored as:
$$A = QR$$
where Q has orthonormal columns (QᵀQ = I) and R is upper triangular; the precise shapes depend on the form used.
Forms of QR decomposition:
Full QR: Q ∈ ℝᵐˣᵐ is square orthogonal and R ∈ ℝᵐˣⁿ is upper triangular with m − n zero rows at the bottom.
Reduced (Thin) QR: Q ∈ ℝᵐˣⁿ has orthonormal columns and R ∈ ℝⁿˣⁿ is square upper triangular.
The reduced form is more storage-efficient and sufficient for most applications.
Existence and uniqueness:
QR decomposition always exists for any m × n matrix with m ≥ n.
If A has full column rank (rank = n), then the reduced QR factorization is unique once the diagonal entries of R are required to be positive.
If A is rank-deficient, R will have zero diagonal entries where rank is lost.
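As a quick illustration (a minimal sketch; for robust rank detection in practice, column-pivoted QR such as `scipy.linalg.qr(..., pivoting=True)` is the better tool), the diagonal of R reveals an exactly dependent column:

```python
import numpy as np

# Rank-2 matrix: the third column is the sum of the first two
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])

Q, R = np.linalg.qr(A, mode='reduced')
# The third diagonal entry of R is zero (up to rounding),
# signalling that rank was lost at column 3
print(np.abs(np.diag(R)))
```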
The names Q and R are conventional: R stands for 'right triangular' (upper triangular matrices are also called right triangular), while Q is simply the letter traditionally used for the orthogonal factor. Some sources write A = QU, where U stands for upper triangular.
Geometric interpretation:
QR decomposition performs orthogonalization of the columns of A: q₁ points along a₁, and more generally span(q₁, ..., qⱼ) = span(a₁, ..., aⱼ) for every j. The entries of R are the coordinates of the original columns in this new orthonormal basis.
Connection to Gram-Schmidt:
QR decomposition is essentially the Gram-Schmidt process organized in matrix form:
```python
import numpy as np

# Create a sample matrix
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 10],
    [10, 11, 12]], dtype=float)

print("Matrix A:")
print(A)
print(f"Shape: {A.shape}")

# Full QR decomposition
Q_full, R_full = np.linalg.qr(A, mode='complete')
print(f"=== Full QR ===")
print(f"Q shape: {Q_full.shape}")  # (4, 4)
print(f"R shape: {R_full.shape}")  # (4, 3)

# Reduced (thin) QR decomposition
Q_reduced, R_reduced = np.linalg.qr(A, mode='reduced')
print(f"=== Reduced QR ===")
print(f"Q shape: {Q_reduced.shape}")  # (4, 3)
print(f"R shape: {R_reduced.shape}")  # (3, 3)

# Verify Q is orthogonal/orthonormal
print(f"Q^T Q (should be identity):")
print(np.round(Q_reduced.T @ Q_reduced, 10))

# Verify R is upper triangular
print(f"R (upper triangular):")
print(np.round(R_reduced, 6))

# Verify reconstruction: A = QR
A_reconstructed = Q_reduced @ R_reduced
print(f"Reconstruction error ||A - QR||: {np.linalg.norm(A - A_reconstructed):.2e}")

# Column space interpretation
print(f"=== Column Space Analysis ===")
for j in range(A.shape[1]):
    # Column j of A = first (j+1) columns of Q times column j of R
    a_j = A[:, j]
    reconstruction = Q_reduced[:, :j+1] @ R_reduced[:j+1, j]
    print(f"Column {j+1}: ||a_j - Σ r_{{ij}}q_i||: {np.linalg.norm(a_j - reconstruction):.2e}")
```

The conceptually simplest way to compute QR decomposition is through Gram-Schmidt orthogonalization—a process that takes a set of vectors and produces an orthonormal set spanning the same space.
Classical Gram-Schmidt (CGS):
Given vectors a₁, a₂, ..., aₙ (columns of A), produce orthonormal q₁, q₂, ..., qₙ:
For j = 1 to n:
vⱼ = aⱼ
For i = 1 to j-1:
rᵢⱼ = qᵢᵀ aⱼ # Projection coefficient
vⱼ = vⱼ - rᵢⱼ qᵢ # Subtract projection
rⱼⱼ = ||vⱼ||
qⱼ = vⱼ / rⱼⱼ # Normalize
Interpretation: rᵢⱼ = qᵢᵀaⱼ measures how much of aⱼ lies along qᵢ, so each column decomposes as aⱼ = Σᵢ₌₁ʲ rᵢⱼqᵢ, which is exactly the j-th column of A = QR.
The problem with classical Gram-Schmidt:
CGS is numerically unstable. Due to floating-point errors, the computed qⱼ may have small components in the direction of previous qᵢ vectors. These errors accumulate, and for ill-conditioned matrices, the output vectors can be far from orthogonal.
Classical Gram-Schmidt can produce severely non-orthogonal output for ill-conditioned matrices. For a matrix with condition number κ, the output Q may have orthogonality errors proportional to κ × machine_epsilon. For κ = 10¹⁰ and double precision (ε ≈ 10⁻¹⁶), this can completely destroy orthogonality!
Modified Gram-Schmidt (MGS):
A simple reordering dramatically improves stability:
For j = 1 to n:
vⱼ = aⱼ
For i = 1 to j-1:
rᵢⱼ = qᵢᵀ vⱼ # Note: vⱼ, not aⱼ
vⱼ = vⱼ - rᵢⱼ qᵢ # Update vⱼ immediately
rⱼⱼ = ||vⱼ||
qⱼ = vⱼ / rⱼⱼ
The key difference: In CGS, we compute all projections based on the original aⱼ. In MGS, after each subtraction, we recompute the projection against the updated vector.
This seemingly minor change makes MGS as stable as Householder QR for computing R, though Q may still have some orthogonality loss.
Complexity: both classical and modified Gram-Schmidt require about 2mn² floating-point operations for an m × n matrix.
```python
import numpy as np

def classical_gram_schmidt(A):
    """
    Classical Gram-Schmidt orthogonalization.
    Numerically unstable but conceptually simple.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))

    for j in range(n):
        v = A[:, j].copy()
        # Subtract projections onto all previous q vectors
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]  # Project onto original a_j
            v = v - R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        if R[j, j] > 1e-10:
            Q[:, j] = v / R[j, j]
        else:
            Q[:, j] = 0  # Handle rank deficiency
    return Q, R

def modified_gram_schmidt(A):
    """
    Modified Gram-Schmidt orthogonalization.
    Much more numerically stable than classical version.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    V = A.copy().astype(float)

    for j in range(n):
        # Subtract projections, updating v_j after each
        for i in range(j):
            R[i, j] = Q[:, i] @ V[:, j]            # Project onto current v_j
            V[:, j] = V[:, j] - R[i, j] * Q[:, i]  # Update v_j immediately
        R[j, j] = np.linalg.norm(V[:, j])
        if R[j, j] > 1e-10:
            Q[:, j] = V[:, j] / R[j, j]
        else:
            Q[:, j] = 0
    return Q, R

def test_orthogonality(Q, name):
    """Measure departure from orthogonality."""
    QtQ = Q.T @ Q
    I = np.eye(Q.shape[1])
    error = np.linalg.norm(QtQ - I, 'fro')
    print(f"{name}: ||Q^T Q - I||_F = {error:.2e}")
    return error

# Create an ill-conditioned matrix (challenging case)
np.random.seed(42)
n = 20
# Construct matrix with specific condition number
U, _ = np.linalg.qr(np.random.randn(n, n))
V, _ = np.linalg.qr(np.random.randn(n, n))
singular_values = np.logspace(0, -10, n)  # Condition number = 10^10
A_illcond = U @ np.diag(singular_values) @ V.T

print(f"Matrix condition number: {np.linalg.cond(A_illcond):.2e}")

# Compare methods
Q_cgs, R_cgs = classical_gram_schmidt(A_illcond)
Q_mgs, R_mgs = modified_gram_schmidt(A_illcond)
Q_np, R_np = np.linalg.qr(A_illcond)

print("Orthogonality comparison:")
test_orthogonality(Q_cgs, "Classical Gram-Schmidt")
test_orthogonality(Q_mgs, "Modified Gram-Schmidt")
test_orthogonality(Q_np, "NumPy QR (Householder)")

# Reconstruction accuracy
print("Reconstruction accuracy ||A - QR||:")
print(f"Classical GS: {np.linalg.norm(A_illcond - Q_cgs @ R_cgs):.2e}")
print(f"Modified GS:  {np.linalg.norm(A_illcond - Q_mgs @ R_mgs):.2e}")
print(f"NumPy QR:     {np.linalg.norm(A_illcond - Q_np @ R_np):.2e}")
```

For production-quality numerical software, Householder reflections are the preferred method for QR decomposition. They provide excellent numerical stability and are the default in libraries like NumPy, LAPACK, and MATLAB.
The Householder transformation:
A Householder matrix (or Householder reflector) is an orthogonal matrix of the form: $$H = I - 2\frac{vv^T}{v^Tv}$$
where v ∈ ℝⁿ is a non-zero vector.
Properties: H is symmetric (Hᵀ = H), orthogonal (HᵀH = I), and its own inverse (H² = I). Geometrically, H reflects vectors across the hyperplane orthogonal to v: Hv = −v, while Hx = x for every x perpendicular to v.
The key insight:
Given any vector x, we can find v such that Hx = αe₁ (a vector pointing along the first axis): $$v = x - \alpha e_1 \quad \text{where} \quad \alpha = \pm\|x\|$$
The sign is chosen to avoid numerical cancellation (typically opposite to sign of x₁).
QR via Householder:
To compute QR, we zero out below-diagonal entries column by column:
The product Q = H₁H₂...Hₙ is orthogonal, and Hₙ...H₂H₁A = R.
Householder reflections are 'backward stable': the computed QR satisfies (Q+ΔQ)(R+ΔR) = A+ΔA where ΔA is tiny (machine precision × ||A||). This is much stronger than just accurate output—it guarantees the computed factors are the exact QR of a matrix very close to A.
Efficient implementation:
Naively, applying an m×m Householder matrix costs O(m²). But H = I - 2vvᵀ/(vᵀv) is never formed explicitly!
Instead: Hx = x - (2vᵀx/(vᵀv))v — just O(m) operations.
Column-wise application:
For matrix X: HX = X - v(2vᵀX/(vᵀv))
This is O(mn) for an m×n matrix—much better than O(m²n) for explicit multiplication.
Storing Q implicitly:
In practice, Q is often stored implicitly as the sequence of Householder vectors v₁, v₂, ..., vₙ. When needed, Q can be applied to a vector efficiently without ever forming the full m×m matrix.
Complexity: about 2mn² − 2n³/3 flops to compute R; forming the full Q explicitly costs extra, which is one reason Q is often kept in implicit form.
```python
import numpy as np

def householder_reflection(x):
    """
    Compute Householder vector v such that Hx = -sign(x_1)||x|| e_1.
    H = I - 2vv^T / (v^T v)

    Returns:
        v: Householder vector
        tau: scalar 2/(v^T v) used in H = I - tau*v*v^T
    """
    n = len(x)
    sigma = np.sign(x[0]) if x[0] != 0 else 1
    norm_x = np.linalg.norm(x)
    if norm_x == 0:
        return np.zeros(n), 0
    # v = x + sign(x_1)||x||e_1 (sign chosen to avoid cancellation)
    v = x.copy()
    v[0] = x[0] + sigma * norm_x
    # tau = 2 / (v^T v) = 2 / ||v||^2
    tau = 2.0 / (v @ v)
    return v, tau

def householder_qr(A):
    """
    Compute QR decomposition using Householder reflections.

    Returns:
        Q: m x m orthogonal matrix (explicit form)
        R: m x n upper triangular matrix
    """
    m, n = A.shape
    R = A.copy().astype(float)
    Q = np.eye(m)

    for k in range(min(m-1, n)):
        # Extract subcolumn to reflect
        x = R[k:m, k]
        # Compute Householder reflection for this column
        v, tau = householder_reflection(x)
        if tau > 0:
            # Apply H to remaining columns of R: R[k:, k:] = H R[k:, k:]
            # H * X = X - tau * v * (v^T * X)
            R[k:m, k:n] -= tau * np.outer(v, v @ R[k:m, k:n])
            # Accumulate Q from the right: Q = Q * H
            Q[:, k:m] -= tau * np.outer(Q[:, k:m] @ v, v)
    return Q, R

def householder_qr_implicit(A):
    """
    Compute QR with implicit Q representation (more efficient).
    Returns Householder vectors for later Q application.
    """
    m, n = A.shape
    R = A.copy().astype(float)
    taus = []
    vs = []
    for k in range(min(m-1, n)):
        x = R[k:m, k]
        v, tau = householder_reflection(x)
        taus.append(tau)
        vs.append(v)
        if tau > 0:
            R[k:m, k:n] -= tau * np.outer(v, v @ R[k:m, k:n])
    return vs, taus, R

def apply_Q_implicit(vs, taus, x, transpose=False):
    """Apply Q (or Q^T) to vector x using implicit representation."""
    m = len(x)
    result = x.copy().astype(float)
    if transpose:
        # Apply Q^T = H_n ... H_2 H_1
        for k, (v, tau) in enumerate(zip(vs, taus)):
            # H_k acts on indices k:m
            result[k:] -= tau * v * (v @ result[k:])
    else:
        # Apply Q = H_1 H_2 ... H_n
        for k, (v, tau) in reversed(list(enumerate(zip(vs, taus)))):
            result[k:] -= tau * v * (v @ result[k:])
    return result

# Test implementation
np.random.seed(42)
A = np.random.randn(6, 4)

# Our Householder implementation
Q_ours, R_ours = householder_qr(A)

# NumPy implementation
Q_np, R_np = np.linalg.qr(A, mode='complete')

print("Householder QR Decomposition")
print("=" * 50)
print(f"Matrix A: {A.shape}")
print(f"R (upper triangular):")
print(np.round(R_ours, 6))
print(f"Orthogonality ||Q^T Q - I||: {np.linalg.norm(Q_ours.T @ Q_ours - np.eye(6)):.2e}")
print(f"Reconstruction ||A - QR||: {np.linalg.norm(A - Q_ours[:, :4] @ R_ours[:4, :]):.2e}")

# Compare with NumPy (comparing the top n x n blocks of R)
print(f"Difference from NumPy R: {np.linalg.norm(np.abs(R_ours[:4]) - np.abs(R_np[:4])):.2e}")
print("(Sign differences are allowed due to reflection choice)")
```

While Householder reflections are efficient for dense matrices, Givens rotations offer advantages for sparse and structured matrices by zeroing one element at a time.
The Givens rotation:
A Givens rotation G(i, j, θ) is an n×n identity matrix with four modified entries:
$$G(i,j,\theta) = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & c & \cdots & s & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & -s & \cdots & c & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{pmatrix}$$
where c = cos(θ) and s = sin(θ): c sits at positions (i,i) and (j,j), s at (i,j), and −s at (j,i).
Effect of Givens rotation:
Multiplying Gx rotates vector x in the (i,j) plane by angle θ. We can choose θ to zero out x_j: $$c = \frac{x_i}{\sqrt{x_i^2 + x_j^2}}, \quad s = \frac{x_j}{\sqrt{x_i^2 + x_j^2}}$$
QR via Givens rotations:
Zero below-diagonal elements one at a time, working from bottom to top within each column:
For each column k:
For each row i from m down to k+1:
Find G to zero A[i,k] using A[k,k] as pivot
Apply G to rows k and i of A
Givens rotations excel when: (1) The matrix is sparse—Householder fills in zeros but Givens targets specific elements, (2) The matrix is nearly triangular—only a few rotations needed, (3) Parallel implementations—rotations on independent rows can be concurrent, (4) Updating QR—adding a row requires only a few Givens rotations vs full recomputation.
Complexity comparison:
| Method | Dense m×n | Sparse | Best for |
|---|---|---|---|
| Householder | 2mn² - 2n³/3 | Fills in | Dense matrices |
| Givens | 3mn² - n³ | Preserves sparsity | Sparse, structured |
| Modified GS | 2mn² | Moderate fill | When Q needed explicitly |
Applications of Givens rotations:
Updating QR factorization: When a new row or column is added, QR can be updated in O(mn) rather than O(mn²) recomputation
Sparse least squares: For sparse A, Givens preserves sparsity in R
QR algorithm for eigenvalues: Givens rotations implement the 'shifts' in the implicit QR algorithm
Real-time control systems: Incremental updates as new measurements arrive
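The row-update case can be sketched as follows. This is an illustrative helper (`qr_insert_row` is our own name for the sketch; SciPy ships a production version as `scipy.linalg.qr_insert`): appending a row to A = QR leaves a factor that is triangular except for one extra row, and n Givens rotations restore triangularity in O(n²) work.

```python
import numpy as np

def qr_insert_row(Q, R, new_row):
    """Update a reduced QR factorization of A (m x n) after appending new_row.

    Cost: O(n^2) via n Givens rotations, versus O(m n^2) for a full
    recomputation. Illustrative sketch, not a production routine.
    """
    m, n = Q.shape
    # Bordered factors: [A; a^T] = Q_ext @ R_ext, with Q_ext still orthonormal
    Q_ext = np.zeros((m + 1, n + 1))
    Q_ext[:m, :n] = Q
    Q_ext[m, n] = 1.0
    R_ext = np.zeros((n + 1, n))
    R_ext[:n, :] = R
    R_ext[n, :] = new_row

    # Zero the appended row of R_ext one entry at a time
    for k in range(n):
        a, b = R_ext[k, k], R_ext[n, k]
        r = np.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        # Rotate rows k and n of R_ext (left-multiplication by G)...
        row_k, row_n = R_ext[k, :].copy(), R_ext[n, :].copy()
        R_ext[k, :] = c * row_k + s * row_n
        R_ext[n, :] = -s * row_k + c * row_n
        # ...and columns k and n of Q_ext (right-multiplication by G^T)
        col_k, col_n = Q_ext[:, k].copy(), Q_ext[:, n].copy()
        Q_ext[:, k] = c * col_k + s * col_n
        Q_ext[:, n] = -s * col_k + c * col_n

    return Q_ext[:, :n], R_ext[:n, :]

# Demo: the update matches a from-scratch factorization
np.random.seed(0)
A = np.random.randn(6, 3)
Q, R = np.linalg.qr(A, mode='reduced')
new_row = np.random.randn(3)

Q2, R2 = qr_insert_row(Q, R, new_row)
A2 = np.vstack([A, new_row])
print("||A' - Q'R'|| =", np.linalg.norm(A2 - Q2 @ R2))
print("||Q'^T Q' - I|| =", np.linalg.norm(Q2.T @ Q2 - np.eye(3)))
```

Each rotation mixes the appended row with one row of R, zeroing one entry at a time: exactly the targeted elimination that makes Givens rotations well suited to updates.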
```python
import numpy as np

def givens_rotation(a, b):
    """
    Compute Givens rotation parameters (c, s) such that:
        [ c  s] [a]   [r]
        [-s  c] [b] = [0]
    Returns c = cos(θ), s = sin(θ), r = √(a² + b²)
    """
    if b == 0:
        c, s, r = 1, 0, a
    elif a == 0:
        c, s, r = 0, np.sign(b), abs(b)
    elif abs(a) > abs(b):
        t = b / a
        u = np.sign(a) * np.sqrt(1 + t**2)
        c = 1 / u
        s = t * c
        r = a * u
    else:
        t = a / b
        u = np.sign(b) * np.sqrt(1 + t**2)
        s = 1 / u
        c = t * s
        r = b * u
    return c, s, r

def apply_givens_left(A, i, k, c, s):
    """
    Apply Givens rotation to rows i and k of matrix A.
    Modifies A in place.
    """
    for j in range(A.shape[1]):
        temp = c * A[k, j] + s * A[i, j]
        A[i, j] = -s * A[k, j] + c * A[i, j]
        A[k, j] = temp

def givens_qr(A):
    """
    Compute QR decomposition using Givens rotations.

    Returns:
        Q: m x m orthogonal matrix
        R: m x n upper triangular matrix
    """
    m, n = A.shape
    R = A.copy().astype(float)
    Q = np.eye(m)

    for k in range(n):
        # Zero out entries below diagonal in column k
        for i in range(m-1, k, -1):  # From bottom to top
            if R[i, k] != 0:
                # Compute Givens rotation to zero R[i, k]
                c, s, r = givens_rotation(R[k, k], R[i, k])
                # Apply to R (rows k and i)
                apply_givens_left(R, i, k, c, s)
                # Accumulate in Q (columns k and i)
                # Q = Q @ G^T means operating on columns
                for row in range(m):
                    temp = c * Q[row, k] + s * Q[row, i]
                    Q[row, i] = -s * Q[row, k] + c * Q[row, i]
                    Q[row, k] = temp
    return Q, R

# Test Givens QR
np.random.seed(42)
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 10],
    [11, 12, 13]], dtype=float)

Q_givens, R_givens = givens_qr(A)
Q_np, R_np = np.linalg.qr(A, mode='complete')

print("Givens Rotation QR Decomposition")
print("=" * 50)
print("R (upper triangular):")
print(np.round(R_givens, 6))
print(f"Orthogonality ||Q^T Q - I||: {np.linalg.norm(Q_givens.T @ Q_givens - np.eye(4)):.2e}")
print(f"Reconstruction ||A - Q[:,:3]R[:3,:]||: {np.linalg.norm(A - Q_givens[:, :3] @ R_givens[:3, :]):.2e}")

# Demonstrate sparse advantage
print("=== Sparse Matrix Example ===")
# Nearly upper triangular (only a few entries to zero)
A_sparse = np.array([
    [3, 1, 4],
    [0, 2, 6],
    [0, 0.1, 3],  # Only this small entry needs zeroing
    [0, 0, 0.2]   # And this one
], dtype=float)

print("Nearly triangular matrix:")
print(A_sparse)

Q_sparse, R_sparse = givens_qr(A_sparse)
print("R after Givens (minimal rotations needed):")
print(np.round(R_sparse, 6))
```

One of the most important applications of QR decomposition is solving least squares problems—the foundation of linear regression and many optimization algorithms.
The least squares problem:
Given an overdetermined system Ax = b (m equations, n unknowns, m > n), find x that minimizes ||Ax - b||².
Two approaches:
1. Normal equations: $$A^T A x = A^T b$$ Solve by computing AᵀA, then using Cholesky or LU.
2. QR decomposition: Write A = QR (reduced form). Splitting b into its components inside and outside the column space of Q gives: $$\|Ax - b\|^2 = \|Rx - Q^Tb\|^2 + \|(I - QQ^T)b\|^2$$
(This uses the fact that Q has orthonormal columns, so ||Qy||² = ||y||².) Only the first term depends on x.
Minimized by solving the triangular system: $$Rx = Q^T b$$
Why QR is better:
| Aspect | Normal Equations | QR Method |
|---|---|---|
| Condition number | κ(AᵀA) = κ(A)² | κ(R) = κ(A) |
| Precision loss | ~2× digits lost | ~1× digits lost |
| Rank-deficient A | AᵀA is singular | Detectable from R |
| Extra computation | Form AᵀA (expensive) | QR directly |
| Stability | Unstable for ill-conditioned A | Backward stable |
For AᵀA, the condition number is squared: κ(AᵀA) = κ(A)². If κ(A) = 10⁸, then κ(AᵀA) = 10¹⁶, which exceeds double precision! All digits of accuracy are lost. QR avoids this by never forming AᵀA.
Step-by-step QR solution:
1. Factor A = QR (reduced form)
2. Compute c = Qᵀb
3. Solve the triangular system Rx = c by back-substitution
Total cost: O(mn²) for QR + O(n²) for solve = O(mn²)
Compared to normal equations: O(mn²) for AᵀA + O(n³) for solve = O(mn² + n³)
For m >> n, both are dominated by O(mn²), but QR is more stable.
Residual computation:
The least squares residual r = b - Ax has norm: $$\|r\| = \|b - Ax\| = \|(I - QQ^T)b\|$$
This equals the norm of the 'leftover' part of b not in the column space of A (which equals span of Q).
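This identity is easy to verify numerically; a minimal sketch with random data:

```python
import numpy as np

np.random.seed(0)
A = np.random.randn(8, 3)
b = np.random.randn(8)

Q, R = np.linalg.qr(A, mode='reduced')
x = np.linalg.solve(R, Q.T @ b)                # least squares solution via QR
r_norm = np.linalg.norm(b - A @ x)             # ||b - Ax||
proj_norm = np.linalg.norm(b - Q @ (Q.T @ b))  # ||(I - QQ^T) b||
print(r_norm, proj_norm)  # the two agree to machine precision
```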
```python
import numpy as np
from scipy.linalg import solve_triangular

def least_squares_normal_equations(A, b):
    """Solve least squares via normal equations (unstable!)."""
    ATA = A.T @ A
    ATb = A.T @ b
    x = np.linalg.solve(ATA, ATb)
    return x

def least_squares_qr(A, b):
    """Solve least squares via QR decomposition (stable)."""
    Q, R = np.linalg.qr(A, mode='reduced')
    c = Q.T @ b
    x = solve_triangular(R, c)
    return x

def compare_methods(A, b, x_true):
    """Compare normal equations vs QR for least squares."""
    print(f"Condition number κ(A): {np.linalg.cond(A):.2e}")
    print(f"Condition number κ(A^T A): {np.linalg.cond(A.T @ A):.2e}")

    # Method 1: Normal equations
    x_normal = least_squares_normal_equations(A, b)
    error_normal = np.linalg.norm(x_normal - x_true) / np.linalg.norm(x_true)

    # Method 2: QR
    x_qr = least_squares_qr(A, b)
    error_qr = np.linalg.norm(x_qr - x_true) / np.linalg.norm(x_true)

    # Method 3: NumPy lstsq (uses SVD)
    x_lstsq, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
    error_lstsq = np.linalg.norm(x_lstsq - x_true) / np.linalg.norm(x_true)

    print("Relative errors ||x_computed - x_true|| / ||x_true||:")
    print(f"  Normal equations:  {error_normal:.2e}")
    print(f"  QR decomposition:  {error_qr:.2e}")
    print(f"  NumPy lstsq (SVD): {error_lstsq:.2e}")
    return error_normal, error_qr, error_lstsq

# Test with well-conditioned matrix
np.random.seed(42)
m, n = 100, 10
A_good = np.random.randn(m, n)
x_true = np.random.randn(n)
b_good = A_good @ x_true + 0.01 * np.random.randn(m)  # Add noise

print("=== Well-conditioned matrix ===")
compare_methods(A_good, b_good, x_true)

# Test with ill-conditioned matrix
print("=== Ill-conditioned matrix ===")
U, _ = np.linalg.qr(np.random.randn(m, n))
V, _ = np.linalg.qr(np.random.randn(n, n))
singular_values = np.logspace(0, -10, n)  # κ = 10^10
A_bad = U @ np.diag(singular_values) @ V.T

x_true = np.random.randn(n)
b_bad = A_bad @ x_true + 1e-12 * np.random.randn(m)  # Smaller noise

compare_methods(A_bad, b_bad, x_true)

print("Note: Normal equations fail catastrophically for ill-conditioned A!")
print("QR maintains accuracy where normal equations lose all precision.")
```

One of the most elegant applications of QR decomposition is the QR algorithm for computing matrix eigenvalues—considered one of the top 10 algorithms of the 20th century.
Basic QR iteration:
A₀ = A
For k = 0, 1, 2, ...
Qₖ Rₖ = Aₖ (QR factorization)
Aₖ₊₁ = Rₖ Qₖ (Reverse multiply)
The magic: Under mild conditions, Aₖ converges to an upper triangular matrix (Schur form), with eigenvalues on the diagonal!
Why it works (intuition): each step is an orthogonal similarity transform, since Aₖ₊₁ = RₖQₖ = Qₖᵀ(QₖRₖ)Qₖ = QₖᵀAₖQₖ. Every Aₖ therefore has the same eigenvalues as A, while the iteration implicitly runs power iteration on nested subspaces, driving the below-diagonal entries toward zero.
Convergence rate:
The subdiagonal entries decay proportionally to |λᵢ₊₁/λᵢ|ᵏ. If eigenvalues are well-separated, convergence is fast. For clustered eigenvalues, convergence is slow.
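This decay is easy to watch directly. The sketch below (a small unshifted 2 × 2 example of our own) tracks the subdiagonal entry of Aₖ; its per-iteration shrink factor settles near |λ₂/λ₁|:

```python
import numpy as np

# Symmetric 2x2 example: eigenvalues (3 ± √5)/2 ≈ 2.618 and 0.382
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
lam = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]
print(f"Predicted per-iteration shrink factor |λ2/λ1| ≈ {lam[1] / lam[0]:.4f}")

Ak = A.copy()
prev = abs(Ak[1, 0])
for k in range(8):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q
    cur = abs(Ak[1, 0])
    print(f"iter {k+1}: |subdiagonal| = {cur:.3e}, shrink = {cur / prev:.4f}")
    prev = cur
```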
Shifted QR (practical version):
To accelerate convergence, use shifts:
Qₖ Rₖ = Aₖ - μₖI (Shifted QR)
Aₖ₊₁ = Rₖ Qₖ + μₖI (Shift back)
The shift μₖ is chosen close to an eigenvalue (often using Wilkinson's shift). This dramatically speeds convergence—typically O(n) total iterations instead of O(n²).
Production eigenvalue codes first reduce A to Hessenberg form (tridiagonal for symmetric), then apply implicit shifted QR without forming Q and R explicitly. This reduces each iteration from O(n³) to O(n²), making the full algorithm O(n³) overall.
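Hessenberg reduction itself is available off the shelf; the sketch below uses `scipy.linalg.hessenberg` to show that the reduction is an orthogonal similarity (so eigenvalues are preserved) while everything below the first subdiagonal becomes zero:

```python
import numpy as np
from scipy.linalg import hessenberg

np.random.seed(0)
A = np.random.randn(5, 5)

# A = Q H Q^T with Q orthogonal and H upper Hessenberg
H, Q = hessenberg(A, calc_q=True)
print(np.round(H, 3))  # zeros below the first subdiagonal

# Similarity transform => identical eigenvalues
eig_A = np.sort_complex(np.linalg.eigvals(A))
eig_H = np.sort_complex(np.linalg.eigvals(H))
print("max eigenvalue difference:", np.max(np.abs(eig_A - eig_H)))
```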
Connection to power iteration:
The QR algorithm can be viewed as simultaneous power iteration on all of the standard basis vectors at once: the accumulated product Q₀Q₁⋯Qₖ₋₁ spans the same leading subspaces as the columns of Aᵏ, but is re-orthonormalized at every step, so its columns converge to a Schur basis instead of all collapsing onto the dominant eigenvector.
For symmetric matrices:
The QR algorithm simplifies considerably: the Hessenberg form becomes tridiagonal, symmetry (and hence tridiagonality) is preserved at every iteration, the limit is a diagonal matrix rather than merely triangular, and each iteration costs only O(n).
This is why symmetric eigenproblems (like PCA) are 'easy' computationally.
```python
import numpy as np
import matplotlib.pyplot as plt

def basic_qr_iteration(A, max_iters=100, tol=1e-10):
    """
    Basic QR algorithm for eigenvalue computation.
    Returns sequence of diagonal elements (converging to eigenvalues).
    """
    n = A.shape[0]
    Ak = A.copy().astype(float)
    history = []

    for k in range(max_iters):
        # Store current diagonal (approximate eigenvalues)
        history.append(np.diag(Ak).copy())
        # QR factorization
        Q, R = np.linalg.qr(Ak)
        # Reverse multiply
        Ak = R @ Q
        # Check convergence (sub-diagonal entries → 0)
        off_diag_norm = np.linalg.norm(np.tril(Ak, -1))
        if off_diag_norm < tol:
            print(f"Converged in {k+1} iterations")
            break
    else:
        print(f"Did not converge in {max_iters} iterations")
    return np.diag(Ak), np.array(history), Ak

def shifted_qr_iteration(A, max_iters=100, tol=1e-10):
    """
    Shifted QR algorithm with Rayleigh quotient shift.
    Much faster convergence than basic QR.
    """
    n = A.shape[0]
    Ak = A.copy().astype(float)
    history = []

    for k in range(max_iters):
        history.append(np.diag(Ak).copy())
        # Rayleigh quotient shift: use last diagonal element as shift
        # (More sophisticated shifts, e.g. Wilkinson's, exist but this works well)
        mu = Ak[-1, -1]
        # Shifted QR
        Q, R = np.linalg.qr(Ak - mu * np.eye(n))
        # Reverse multiply and shift back
        Ak = R @ Q + mu * np.eye(n)
        off_diag_norm = np.linalg.norm(np.tril(Ak, -1))
        if off_diag_norm < tol:
            print(f"Shifted QR converged in {k+1} iterations")
            break
    else:
        print(f"Shifted QR did not converge in {max_iters} iterations")
    return np.diag(Ak), np.array(history), Ak

# Test on a symmetric matrix
np.random.seed(42)
n = 5
A_sym = np.random.randn(n, n)
A_sym = (A_sym + A_sym.T) / 2  # Make symmetric

true_eigenvalues = np.sort(np.linalg.eigvalsh(A_sym))[::-1]
print(f"True eigenvalues: {true_eigenvalues.round(4)}")

print("=== Basic QR Iteration ===")
eigs_basic, hist_basic, _ = basic_qr_iteration(A_sym)
eigs_basic_sorted = np.sort(eigs_basic)[::-1]
print(f"Computed eigenvalues: {eigs_basic_sorted.round(4)}")

print("=== Shifted QR Iteration ===")
eigs_shifted, hist_shifted, _ = shifted_qr_iteration(A_sym)
eigs_shifted_sorted = np.sort(eigs_shifted)[::-1]
print(f"Computed eigenvalues: {eigs_shifted_sorted.round(4)}")

# Visualize convergence
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for i in range(n):
    axes[0].plot(hist_basic[:, i], label=f'Diag[{i}]')
axes[0].axhline(y=true_eigenvalues[0], color='k', linestyle='--', alpha=0.3)
axes[0].set_xlabel('Iteration')
axes[0].set_ylabel('Diagonal values')
axes[0].set_title('Basic QR Convergence')
axes[0].legend()

for i in range(min(len(hist_shifted), n)):
    if i < hist_shifted.shape[1]:
        axes[1].plot(hist_shifted[:, i], label=f'Diag[{i}]')
axes[1].set_xlabel('Iteration')
axes[1].set_ylabel('Diagonal values')
axes[1].set_title('Shifted QR Convergence (Much Faster!)')
axes[1].legend()

plt.tight_layout()
plt.show()
```

We've explored QR decomposition in depth—from basic definitions through multiple algorithms to critical applications. Let's consolidate the key insights.
What's next:
We'll explore Cholesky decomposition—a specialized factorization for symmetric positive definite matrices that is twice as fast as LU and essential for covariance matrices, kernel methods, and Gaussian processes in machine learning.
You now understand QR decomposition comprehensively—its definition, three computational approaches, superiority for least squares, and role in eigenvalue computation. QR is the invisible workhorse behind much of numerical linear algebra.