In the previous page, we discovered a sobering truth: the seemingly simple normal equations β = (XᵀX)⁻¹Xᵀy can catastrophically fail due to condition number squaring. A moderately ill-conditioned problem (κ(X) = 10⁴) becomes severely unstable (κ(XᵀX) = 10⁸) when we form XᵀX.
The solution isn't to find a better way to invert XᵀX—it's to never form XᵀX in the first place.
This is where QR decomposition enters the picture. By factoring X into an orthogonal matrix Q and an upper-triangular matrix R, we transform least squares into a simple triangular system that preserves the original conditioning of X.
By the end of this page, you will: (1) understand why orthogonal matrices are numerically safe, (2) derive how QR decomposition solves least squares, (3) learn multiple algorithms for computing QR (Gram-Schmidt, Householder), and (4) implement a numerically stable least squares solver from scratch.
Before diving into QR decomposition, we must understand why orthogonal matrices are the cornerstone of numerical linear algebra. An orthogonal matrix Q satisfies:
$$Q^\top Q = QQ^\top = I \quad \Leftrightarrow \quad Q^{-1} = Q^\top$$
Geometric Interpretation:
Orthogonal matrices represent rotations and reflections—rigid transformations that preserve lengths and angles. When we multiply a vector by Q, its length is unchanged (‖Qx‖ = ‖x‖ for every x), and angles between vectors are preserved as well.
Since orthogonal matrices preserve lengths, all their singular values equal 1. Therefore κ(Q) = σ_max/σ_min = 1/1 = 1. Orthogonal matrices are perfectly conditioned! Multiplying by Q never amplifies errors.
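As a quick sanity check, here is a minimal NumPy sketch (our own illustration, not part of the lesson code) that builds a random orthogonal Q and confirms both claims:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random orthogonal matrix as the Q factor of a random square matrix
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
x = rng.standard_normal(50)

print(np.linalg.cond(Q))                          # ~1.0: perfectly conditioned
print(np.linalg.norm(Q @ x), np.linalg.norm(x))   # identical norms
print(np.allclose(Q.T @ Q, np.eye(50)))           # Q^T Q = I
```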
Why This Matters for Stability:
Suppose we want to solve Ax = b but A is ill-conditioned. If we can factor A = QR where Q is orthogonal, then:
$$Ax = b \implies QRx = b \implies Rx = Q^\top b$$
The transformation from b to Qᵀb is perfectly conditioned (κ = 1). All the ill-conditioning is isolated in R, and since R is upper-triangular, solving Rx = c is straightforward via back-substitution.
Critically, the condition number we face is κ(R) = κ(A), not κ(AᵀA) = κ(A)². We've avoided the squaring!
| Property | Normal Equations | QR Decomposition |
|---|---|---|
| Solves | (XᵀX)β = Xᵀy | Rβ = Qᵀy |
| Condition experienced | κ(XᵀX) = κ(X)² | κ(R) = κ(X) |
| Matrix to invert | XᵀX (p × p) | None—back-substitution |
| Numerical stability | Unstable for ill-conditioned X | Backward stable |
| Extra computation | Form XᵀX | Compute Q, R |
| Recommended when | X is very well-conditioned | Always (default choice) |
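The "condition experienced" row of this table is easy to verify directly. Here is a small check of our own (names and dimensions chosen for illustration) comparing κ(X), κ(R), and κ(XᵀX) for a moderately ill-conditioned X:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 8

# Build X with condition number about 1e4 from prescribed singular values
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
V, _ = np.linalg.qr(rng.standard_normal((p, p)))
X = U @ np.diag(np.logspace(0, -4, p)) @ V

Q, R = np.linalg.qr(X, mode='reduced')

print(f"κ(X)   = {np.linalg.cond(X):.2e}")        # ~1e4
print(f"κ(R)   = {np.linalg.cond(R):.2e}")        # same as κ(X)
print(f"κ(XᵀX) = {np.linalg.cond(X.T @ X):.2e}")  # ~1e8, the square
```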
The QR Factorization:
For any matrix X ∈ ℝⁿˣᵖ with n ≥ p (more rows than columns), we can compute:
$$X = QR$$
where Q ∈ ℝⁿˣⁿ is orthogonal (QᵀQ = I) and R ∈ ℝⁿˣᵖ is upper triangular, with all entries below row p equal to zero.
The reduced/thin QR is most useful for least squares:
$$X = Q_1 R_1 \quad \text{where } Q_1 \in \mathbb{R}^{n \times p}, R_1 \in \mathbb{R}^{p \times p}$$
Derivation: QR Solution to Least Squares
We want to minimize ‖Xβ - y‖². Substituting X = Q₁R₁:
$$\|X\beta - y\|^2 = \|Q_1 R_1 \beta - y\|^2$$
Now, for any orthogonal matrix Q:
$$\|Qz\| = \|z\| \quad \text{(orthogonal matrices preserve norms)}$$
Multiplying by Qᵀ (which is also orthogonal) preserves the norm:
$$\|Q_1 R_1 \beta - y\|^2 = \|Q^\top(Q_1 R_1 \beta - y)\|^2$$
where Q is the full orthogonal matrix. Partitioning Q = [Q₁ | Q₂]:
$$Q^\top y = \begin{bmatrix} Q_1^\top y \\ Q_2^\top y \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$$
The residual norm becomes:
$$\|R_1\beta - c_1\|^2 + \|c_2\|^2$$
The term ‖c₂‖² = ‖Q₂ᵀy‖² is independent of β! It's the irreducible residual—the part of y orthogonal to the column space of X. We minimize by choosing β such that R₁β = c₁, making the first term zero.
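To make this decomposition concrete, here is a short illustrative check (variable names are our own) that the least squares residual equals ‖Q₂ᵀy‖², the part of y outside the column space of X:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
n, p = 30, 5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Full QR: Q is n x n, split into Q1 (first p columns) and Q2 (the rest)
Q, R = np.linalg.qr(X, mode='complete')
Q1, Q2 = Q[:, :p], Q[:, p:]
R1 = R[:p, :]

beta = solve_triangular(R1, Q1.T @ y)           # R1 @ beta = c1
residual_direct = np.linalg.norm(X @ beta - y)  # ||X beta - y||
residual_c2 = np.linalg.norm(Q2.T @ y)          # ||c2||

print(residual_direct, residual_c2)  # agree to machine precision
```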
The QR Least Squares Algorithm:
1. Compute the thin QR factorization X = Q₁R₁.
2. Form c₁ = Q₁ᵀy.
3. Solve the triangular system R₁β = c₁ by back-substitution.
Why Back-Substitution is Stable:
Solving an upper-triangular system:
$$\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ 0 & r_{22} & \cdots & r_{2p} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & r_{pp} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{bmatrix}$$
is trivial: start from the bottom row and work up:
$$\beta_p = c_p / r_{pp}, \quad \beta_{p-1} = (c_{p-1} - r_{p-1,p}\beta_p) / r_{p-1,p-1}, \quad \ldots$$
Each step involves only additions, multiplications, and a single division—no matrix inversion, no condition-number-squaring.
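A minimal back-substitution routine, written as our own illustrative sketch (in practice you would call scipy.linalg.solve_triangular, as the solver below does), looks like this:

```python
import numpy as np

def back_substitution(R, c):
    """Solve R @ beta = c for upper-triangular R, working from the bottom row up."""
    p = R.shape[0]
    beta = np.zeros(p)
    for i in range(p - 1, -1, -1):
        # Subtract contributions of already-computed components, then divide by the pivot
        beta[i] = (c[i] - R[i, i + 1:] @ beta[i + 1:]) / R[i, i]
    return beta

# Quick check against NumPy's general solver
R = np.triu(np.random.rand(4, 4) + np.eye(4))
c = np.random.rand(4)
print(np.allclose(back_substitution(R, c), np.linalg.solve(R, c)))
```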
```python
import numpy as np
from scipy.linalg import solve_triangular


def qr_least_squares(X, y):
    """
    Solve least squares using QR decomposition.

    This is the numerically stable approach:
    1. Compute thin QR: X = Q @ R
    2. Compute c = Q.T @ y
    3. Solve triangular system: R @ beta = c

    Parameters:
    -----------
    X : ndarray of shape (n, p)
        Design matrix
    y : ndarray of shape (n,)
        Target vector

    Returns:
    --------
    beta : ndarray of shape (p,)
        Least squares solution
    residual : float
        Squared residual norm ||X @ beta - y||^2
    """
    n, p = X.shape

    # Step 1: Compute thin QR factorization
    # mode='reduced' gives Q1 (n x p) and R (p x p)
    Q, R = np.linalg.qr(X, mode='reduced')

    # Step 2: Compute c = Q^T @ y
    c = Q.T @ y

    # Step 3: Solve R @ beta = c via back-substitution
    # solve_triangular is more efficient and numerically stable
    beta = solve_triangular(R, c)

    # Compute residual: ||y - X @ beta||^2
    # Equivalently: ||y||^2 - ||c||^2 (the part orthogonal to col(X))
    residual = np.linalg.norm(y)**2 - np.linalg.norm(c)**2

    return beta, residual


def compare_methods(X, y, beta_true):
    """
    Compare normal equations vs QR for the same problem.
    """
    # Method 1: Normal Equations (unstable)
    XtX = X.T @ X
    Xty = X.T @ y
    try:
        beta_normal = np.linalg.solve(XtX, Xty)
        error_normal = np.linalg.norm(beta_normal - beta_true)
    except np.linalg.LinAlgError:
        beta_normal = None
        error_normal = np.inf

    # Method 2: QR Decomposition (stable)
    beta_qr, residual = qr_least_squares(X, y)
    error_qr = np.linalg.norm(beta_qr - beta_true)

    print(f"Condition number κ(X): {np.linalg.cond(X):.2e}")
    print(f"Condition number κ(XᵀX): {np.linalg.cond(XtX):.2e}")
    print(f"Normal Equations Error: {error_normal:.2e}")
    print(f"QR Decomposition Error: {error_qr:.2e}")
    print(f"QR advantage factor: {error_normal / error_qr:.1f}x")

    return beta_normal, beta_qr


# Demonstration with ill-conditioned problem
np.random.seed(42)
n, p = 100, 10

# Create ill-conditioned X using specific singular values
U, _ = np.linalg.qr(np.random.randn(n, n))
V, _ = np.linalg.qr(np.random.randn(p, p))
singular_values = np.logspace(0, -8, p)  # from 1 down to 1e-8 → κ ≈ 10^8
X = U[:, :p] @ np.diag(singular_values) @ V

# True coefficients and generated data
beta_true = np.ones(p)
y = X @ beta_true + 1e-10 * np.random.randn(n)  # Tiny noise

print("=== Ill-Conditioned Least Squares Comparison ===")
compare_methods(X, y, beta_true)
```

The simplest way to compute QR is the Gram-Schmidt process, an iterative algorithm that orthogonalizes columns one at a time.
The Classical Gram-Schmidt Algorithm:
Given matrix X with columns x₁, x₂, ..., xₚ, we construct orthonormal vectors q₁, q₂, ..., qₚ:
q₁ = x₁ / ‖x₁‖ (normalize first column)
For j = 2, 3, ..., p:
- Compute the projection coefficients rᵢⱼ = qᵢᵀxⱼ for each i < j.
- Subtract the projections: vⱼ = xⱼ − r₁ⱼq₁ − ⋯ − rⱼ₋₁,ⱼqⱼ₋₁.
- Normalize: rⱼⱼ = ‖vⱼ‖ and qⱼ = vⱼ / rⱼⱼ.

The R matrix is built as we go: its strictly upper-triangular entries are the projection coefficients rᵢⱼ, and its diagonal entries are the normalization factors rⱼⱼ.
Why Classical Gram-Schmidt is Numerically Problematic:
While mathematically elegant, CGS suffers from loss of orthogonality when columns are nearly linearly dependent. The problem is that we compute projections onto previously computed q vectors, which may have accumulated small errors. These errors compound, and after several iterations, the q vectors may no longer be orthogonal at all.
The Fix: Modified Gram-Schmidt (MGS)
Modified Gram-Schmidt makes a subtle but crucial change: instead of computing all projections at once using the original xⱼ, we update xⱼ incrementally after each projection:
q₁ = x₁ / ‖x₁‖
For j = 2, 3, ..., p, start from vⱼ = xⱼ and, for each i = 1, ..., j−1 in turn:
- rᵢⱼ = qᵢᵀvⱼ (project the current, partially orthogonalized vⱼ, not the original xⱼ)
- vⱼ ← vⱼ − rᵢⱼqᵢ (update immediately, before the next projection)

Finally set rⱼⱼ = ‖vⱼ‖ and qⱼ = vⱼ / rⱼⱼ.
In exact arithmetic this produces the same Q and R as CGS, but in floating-point arithmetic it is much more stable.
```python
import numpy as np


def classical_gram_schmidt(X):
    """
    Classical Gram-Schmidt orthogonalization.

    Mathematically correct but numerically unstable
    for ill-conditioned matrices.
    """
    n, p = X.shape
    Q = np.zeros((n, p))
    R = np.zeros((p, p))

    for j in range(p):
        # Start with the j-th column
        v = X[:, j].copy()

        # Subtract projections onto all previous q vectors
        for i in range(j):
            R[i, j] = np.dot(Q[:, i], X[:, j])  # Using ORIGINAL X[:, j]
            v = v - R[i, j] * Q[:, i]

        # Normalize
        R[j, j] = np.linalg.norm(v)
        if R[j, j] > 1e-14:
            Q[:, j] = v / R[j, j]
        else:
            Q[:, j] = v  # Near-zero column (rank deficiency)

    return Q, R


def modified_gram_schmidt(X):
    """
    Modified Gram-Schmidt orthogonalization.

    Numerically more stable than classical GS because
    projections are computed incrementally.
    """
    n, p = X.shape
    Q = np.zeros((n, p))
    R = np.zeros((p, p))
    V = X.copy().astype(float)  # Work with a copy

    for j in range(p):
        # First, orthogonalize column j against all previous columns
        for i in range(j):
            R[i, j] = np.dot(Q[:, i], V[:, j])     # Using CURRENT V[:, j]
            V[:, j] = V[:, j] - R[i, j] * Q[:, i]  # Immediate update!

        # Normalize
        R[j, j] = np.linalg.norm(V[:, j])
        if R[j, j] > 1e-14:
            Q[:, j] = V[:, j] / R[j, j]
        else:
            Q[:, j] = V[:, j]

    return Q, R


def measure_orthogonality(Q):
    """
    Measure how close Q^T Q is to identity.
    Returns ||Q^T Q - I||_F / p
    """
    p = Q.shape[1]
    return np.linalg.norm(Q.T @ Q - np.eye(p), 'fro') / p


def compare_gram_schmidt_stability():
    """
    Demonstrates the numerical advantage of Modified Gram-Schmidt.
    """
    np.random.seed(42)

    # Create increasingly ill-conditioned matrices
    n, p = 50, 20
    condition_numbers = [1e2, 1e4, 1e6, 1e8, 1e10]

    print("Gram-Schmidt Orthogonality Comparison")
    print("=" * 60)
    print(f"{'κ(X)':<12} {'CGS Orth. Error':<18} {'MGS Orth. Error':<18}")
    print("-" * 60)

    for kappa in condition_numbers:
        # Construct X with specified condition number
        U, _ = np.linalg.qr(np.random.randn(n, p))
        V, _ = np.linalg.qr(np.random.randn(p, p))
        s = np.logspace(0, -np.log10(kappa), p)
        X = U @ np.diag(s) @ V

        # Classical Gram-Schmidt
        Q_cgs, R_cgs = classical_gram_schmidt(X)
        orth_error_cgs = measure_orthogonality(Q_cgs)

        # Modified Gram-Schmidt
        Q_mgs, R_mgs = modified_gram_schmidt(X)
        orth_error_mgs = measure_orthogonality(Q_mgs)

        print(f"{kappa:<12.0e} {orth_error_cgs:<18.2e} {orth_error_mgs:<18.2e}")

    print()
    print("Observation: MGS maintains better orthogonality as κ(X) grows.")
    print("However, even MGS degrades—Householder QR is needed for extreme cases.")


compare_gram_schmidt_stability()
```

While MGS is more stable than CGS, both versions can lose orthogonality for severely ill-conditioned matrices. For production code, use Householder QR (what np.linalg.qr computes), which is fully backward stable. MGS is valuable for understanding and for cases where you need incremental updates.
Householder reflections are the industry-standard method for computing QR decomposition. They achieve full backward stability and are what LAPACK (and hence NumPy, SciPy, MATLAB) actually compute.
The Key Idea:
Instead of building Q column by column (Gram-Schmidt), Householder works by applying a sequence of reflections that zero out elements below the diagonal.
A Householder reflection (or elementary reflector) is a matrix of the form:
$$H = I - 2vv^\top \quad \text{where } \|v\| = 1$$
This matrix reflects vectors across the hyperplane orthogonal to v.
Properties of Householder Reflectors:
- Symmetric: Hᵀ = H
- Orthogonal: HᵀH = I, so κ(H) = 1
- Involutory: H² = I (reflecting twice returns the original vector)
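These properties are easy to verify numerically. The short sketch below (our own illustration) builds a reflector from a random unit vector and checks all three:

```python
import numpy as np

rng = np.random.default_rng(2)
v = rng.standard_normal(6)
v = v / np.linalg.norm(v)             # unit vector defining the hyperplane

H = np.eye(6) - 2.0 * np.outer(v, v)  # Householder reflector

print(np.allclose(H, H.T))              # symmetric
print(np.allclose(H.T @ H, np.eye(6)))  # orthogonal
print(np.allclose(H @ H, np.eye(6)))    # involutory: H^2 = I
print(np.linalg.cond(H))                # condition number ≈ 1
```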
The Core Problem We Solve:
Given a vector x, find H such that Hx has all zeros below the first entry:
$$Hx = \alpha e_1 = \begin{bmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
Since H preserves lengths: |α| = ‖x‖
The Formula:
$$v = x \pm \|x\| e_1, \quad v = v / \|v\|$$
(Choose the sign that avoids catastrophic cancellation in the first component: take v = x + sign(x₁)‖x‖e₁, so the two terms add rather than nearly cancel.)
```python
import numpy as np


def householder_vector(x):
    """
    Compute Householder vector v such that (I - 2vv^T)x
    is zero below the first entry.

    Uses the numerically stable formula with sign choice.
    """
    n = len(x)

    # Compute norm
    sigma = np.dot(x[1:], x[1:])  # Sum of squares below first element

    v = np.zeros(n)
    v[0] = 1.0
    v[1:] = x[1:]

    if sigma < 1e-30:
        # x is already in e_1 direction
        beta = 0
        return v, beta

    mu = np.sqrt(x[0]**2 + sigma)

    # Choose sign to avoid cancellation
    if x[0] <= 0:
        v[0] = x[0] - mu
    else:
        v[0] = -sigma / (x[0] + mu)

    # beta = 2 / (v^T v), but we normalize v
    beta = 2 * v[0]**2 / (sigma + v[0]**2)
    v = v / v[0]  # Normalize so v[0] = 1

    return v, beta


def householder_qr(A):
    """
    Compute QR decomposition using Householder reflections.

    This is the numerically stable, backward-stable algorithm
    used by LAPACK and numpy.linalg.qr.

    Returns Q explicitly (for demonstration), though efficient
    implementations store the Householder vectors instead.
    """
    m, n = A.shape
    Q = np.eye(m)
    R = A.copy().astype(float)

    for j in range(min(m - 1, n)):
        # Apply Householder to zero out below-diagonal in column j
        x = R[j:, j]

        # Compute Householder vector for this column
        v, beta = householder_vector(x)

        # Apply to R: R[j:, j:] = (I - beta*v*v^T) @ R[j:, j:]
        # Efficient computation: R = R - beta * v * (v^T @ R)
        R[j:, j:] = R[j:, j:] - beta * np.outer(v, v @ R[j:, j:])

        # Accumulate into Q: Q = Q @ H_j^T = Q @ H_j
        # Q[:, j:] = Q[:, j:] @ (I - beta*v*v^T)
        Q[:, j:] = Q[:, j:] - beta * np.outer(Q[:, j:] @ v, v)

    return Q[:, :n], R[:n, :]  # Return thin QR


def verify_householder_qr():
    """
    Verify our implementation against numpy and check stability.
    """
    np.random.seed(42)

    # Test on ill-conditioned matrix
    m, n = 50, 20
    U, _ = np.linalg.qr(np.random.randn(m, m))
    V, _ = np.linalg.qr(np.random.randn(n, n))
    s = np.logspace(0, -8, n)  # κ ≈ 10^8
    A = U[:, :n] @ np.diag(s) @ V

    print("=== Householder QR Verification ===")
    print(f"Matrix condition number: {np.linalg.cond(A):.2e}")

    # Our implementation
    Q_ours, R_ours = householder_qr(A)

    # NumPy implementation
    Q_np, R_np = np.linalg.qr(A, mode='reduced')

    # Check properties
    print(f"\n||Q^T Q - I||_F (ours):  {np.linalg.norm(Q_ours.T @ Q_ours - np.eye(n)):.2e}")
    print(f"||Q^T Q - I||_F (numpy): {np.linalg.norm(Q_np.T @ Q_np - np.eye(n)):.2e}")
    print(f"||QR - A||_F (ours):  {np.linalg.norm(Q_ours @ R_ours - A):.2e}")
    print(f"||QR - A||_F (numpy): {np.linalg.norm(Q_np @ R_np - A):.2e}")

    # Test least squares with ill-conditioned matrix
    beta_true = np.ones(n)
    y = A @ beta_true

    # Solve via our QR
    from scipy.linalg import solve_triangular
    c = Q_ours.T @ y
    beta_qr = solve_triangular(R_ours, c)

    print(f"\nLeast squares ||β - β_true||: {np.linalg.norm(beta_qr - beta_true):.2e}")


verify_householder_qr()
```

Real implementations don't form Q explicitly. They store the Householder vectors compactly in the lower triangle of R (since those values would be zero anyway). When you need Qᵀb, apply each Householder reflection in sequence: Hₚ···H₂H₁b. This is what np.linalg.qr(mode='raw') returns.
Standard QR decomposition assumes X has full column rank. But what if some columns are linearly dependent (or nearly so)? This happens frequently with:
- one-hot encoded categorical features combined with an intercept column (the dummy-variable trap),
- duplicated or highly correlated features,
- features that are exact or near-exact linear combinations of others (e.g., a total alongside its components).
Pivoted QR (also called Column-Pivoted QR or QR with Pivoting) addresses this by reordering columns during factorization:
$$XP = QR$$
where P is a permutation matrix that reorders columns in decreasing importance.
How Pivoting Works:
At each step j, instead of reflecting column j as-is, we first swap in the column with the largest remaining norm. This ensures that the diagonal entries of R appear in decreasing magnitude, so any near-zero pivots (the signature of rank deficiency) are pushed to the bottom-right corner of R rather than contaminating earlier steps.
Rank Determination:
With pivoted QR, we can identify the numerical rank by finding where diagonal elements of R drop sharply:
$$r_{11} \geq r_{22} \geq \cdots \geq r_{kk} \gg r_{k+1,k+1} \approx \cdots \approx r_{pp} \approx 0$$
Columns k+1 through p (after permutation) are numerically in the null space.
```python
import numpy as np
from scipy.linalg import qr, solve_triangular


def analyze_rank_deficiency():
    """
    Demonstrates pivoted QR for detecting and handling rank deficiency.
    """
    np.random.seed(42)
    n, p = 100, 10

    # Create rank-deficient matrix: rank = 7 instead of 10
    true_rank = 7
    A = np.random.randn(n, true_rank) @ np.random.randn(true_rank, p)
    A += 1e-12 * np.random.randn(n, p)  # Tiny noise

    print("=== Rank-Deficient Least Squares ===")
    print(f"Matrix shape: {n} × {p}")
    print(f"True rank: {true_rank}")
    print(f"Numerical rank (SVD): {np.linalg.matrix_rank(A)}")

    # Standard QR (no pivoting)
    Q_std, R_std = np.linalg.qr(A, mode='reduced')

    # Pivoted QR
    Q_piv, R_piv, P = qr(A, pivoting=True, mode='economic')

    print("\nDiagonal of R (standard QR):")
    print(f"  {np.diag(R_std)}")
    print("\nDiagonal of R (pivoted QR, sorted by magnitude):")
    print(f"  {np.diag(R_piv)}")
    print(f"\nColumn permutation P: {P}")

    # Detect numerical rank from pivoted R
    r_diag = np.abs(np.diag(R_piv))
    threshold = r_diag[0] * 1e-10
    detected_rank = np.sum(r_diag > threshold)
    print(f"\nDetected numerical rank from pivoted QR: {detected_rank}")

    # Solve least squares with rank awareness
    y = np.random.randn(n)

    # Full solution (may be ill-behaved for rank-deficient A)
    beta_full = np.linalg.lstsq(A, y, rcond=None)[0]
    print(f"\nFull solution (lstsq) norm: {np.linalg.norm(beta_full):.4f}")

    # Truncated solution using only the reliable components
    k = detected_rank
    R_k = R_piv[:k, :k]
    c = Q_piv.T @ y

    # Solve for the first k (permuted) components, zero out the rest
    beta_trunc = np.zeros(p)
    beta_trunc[P[:k]] = solve_triangular(R_k, c[:k])

    print(f"Truncated solution norm: {np.linalg.norm(beta_trunc):.4f}")
    print(f"Truncated residual: {np.linalg.norm(A @ beta_trunc - y):.4f}")
    print(f"Full residual: {np.linalg.norm(A @ beta_full - y):.4f}")
    print("\nNote: the truncated solution has a similar residual but a much smaller norm!")
    print("It is a well-behaved 'basic' solution; the exact minimum-norm solution comes from the SVD.")


analyze_rank_deficiency()
```

Both pivoted QR and SVD can handle rank deficiency. SVD gives the mathematically optimal minimum-norm solution (the pseudoinverse). Pivoted QR is faster (O(np²) vs O(np² + p³)) and often sufficient. When you need the absolute best numerical properties, use SVD; for practical applications, pivoted QR is usually adequate.
We've covered the essential theory and practice of QR decomposition for solving least squares problems in a numerically stable way:
| Algorithm | Stability | Cost | Use Case |
|---|---|---|---|
| Classical Gram-Schmidt | Poor | O(np²) | Educational only |
| Modified Gram-Schmidt | Good | O(np²) | Incremental updates, understanding |
| Householder QR | Excellent (backward stable) | ≈ 2np² - 2p³/3 flops | Default choice in libraries |
| Givens QR | Excellent | O(np²) | Sparse matrices, parallel |
| Pivoted Householder QR | Excellent + rank-revealing | O(np²) | Rank-deficient problems |
What's Next:
Now that we understand how to solve least squares stably, the next page analyzes the computational complexity of different approaches. We'll compare the operation counts for normal equations, QR, and SVD, and understand when each method is appropriate based on problem dimensions.
You now understand QR decomposition—the backbone of numerically stable least squares. This knowledge is essential for implementing reliable regression systems and understanding what library functions like np.linalg.lstsq actually compute under the hood.