In 1969, German mathematician Volker Strassen published a paper that shocked the computational mathematics community. For decades, the O(n³) complexity of matrix multiplication was considered fundamental and unavoidable—after all, the mathematical definition itself seemed to require exactly n³ multiplications.
Strassen proved everyone wrong with an elegant insight: by computing clever combinations of matrix elements, he could multiply two 2×2 matrices using only 7 multiplications instead of the conventional 8. This single saved multiplication, when applied recursively, yields an algorithm with complexity O(n^log₂7) ≈ O(n^2.807)—a reduction in the exponent itself, so the savings over O(n³) grow without bound as n increases.
By the end of this page, you will understand the historical significance of Strassen's algorithm, why reducing 8 multiplications to 7 matters so profoundly, the high-level structure of the divide-and-conquer approach, and the famous formulas that make it work. You'll see how algebraic cleverness trumps brute force.
Before Strassen, mathematicians and computer scientists believed O(n³) was the inherent cost of matrix multiplication. The reasoning seemed airtight: the product C has n² entries, and each entry is a dot product of a row of A with a column of B, requiring n multiplications—so n³ multiplications appear unavoidable.
This argument has a subtle flaw: it assumes each entry must be computed independently. Strassen realized that matrix entries are not independent—they share algebraic relationships that can be exploited.
Strassen's 1969 paper, titled "Gaussian Elimination is not Optimal," established two revolutionary results: first, that two n×n matrices can be multiplied in O(n^log₂7) time; second, that matrix inversion and Gaussian elimination reduce to matrix multiplication, so they too can be done in subcubic time—hence the paper's title.
Strassen's algorithm was transformational not just for its practical implications, but for what it revealed: that naive algorithms, even those derived directly from mathematical definitions, are not necessarily optimal. This inspired decades of research into fast matrix multiplication, leading to algorithms approaching O(n²) complexity.
| Year | Algorithm | Complexity | Improvement over Previous |
|---|---|---|---|
| Ancient | Standard | O(n³) = O(n^3.0) | Baseline |
| 1969 | Strassen | O(n^2.807) | ~6.4% exponent reduction |
| 1978 | Pan | O(n^2.796) | Additional improvement |
| 1981 | Schönhage | O(n^2.522) | Significant advance |
| 1987 | Coppersmith-Winograd | O(n^2.376) | Major breakthrough |
| 2012 | Williams | O(n^2.3728642) | Refined techniques |
| 2020 | Alman-Williams | O(n^2.3728596) | Current best known |
| ??? | Optimal | Ω(n²) ≤ ? ≤ O(n^2.373) | Open problem |
The progression from n^3.0 to n^2.373 represents an enormous improvement for large matrices. For n = 10,000: n³ = 10¹² operations versus n^2.373 ≈ 3 × 10⁹—roughly 300× fewer, ignoring constant factors.
However, the galactic algorithms (Coppersmith-Winograd and beyond) have such enormous constant factors that they're only faster for matrices larger than the observable universe could store. Strassen's algorithm is the only subcubic algorithm used in practice.
Strassen's algorithm is built on a crucial observation about computational costs:
Multiplications are expensive; additions are cheap.
For matrices, this is especially true: adding two (n/2)×(n/2) blocks takes only Θ(n²) scalar operations, while multiplying them requires a full recursive multiplication—by far the dominant cost.
If we could somehow trade multiplications for additions—even at an unfavorable ratio—we might win overall. Strassen found that by computing clever sums and differences of matrix blocks, he could eliminate one multiplication out of eight.
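To see the asymmetry concretely, here is a quick timing sketch (absolute times are machine- and BLAS-dependent; only the gap between the two matters):

```python
import time
import numpy as np

n = 2048
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
S = A + B                      # Θ(n²) work: one pass over the entries
t1 = time.perf_counter()
P = A @ B                      # Θ(n³) work: n multiply-adds per entry
t2 = time.perf_counter()

print(f"addition:       {t1 - t0:.4f} s")
print(f"multiplication: {t2 - t1:.4f} s")
```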
Why does one multiplication matter?
Let T(n) be the time to multiply two n×n matrices. With the standard algorithm:
Recurrence: T(n) = 8T(n/2) + O(n²)
By the Master Theorem: O(n^log₂8) = O(n³)
With Strassen's 7 multiplications: T(n) = 7T(n/2) + O(n²). By the Master Theorem: O(n^log₂7) = O(n^2.807). Reducing from 8 to 7 recursive calls drops the exponent from 3.0 to 2.807—a permanent improvement at every level of recursion.
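A tiny counting sketch (plain Python) makes the compounding concrete: after k levels of recursion the standard method has spawned 8^k subproblems while Strassen has only 7^k, and at full depth k = log₂ n those counts equal n³ and n^log₂7 respectively:

```python
import math

print(f"standard exponent: {math.log2(8)}")   # 3.0
print(f"strassen exponent: {math.log2(7):.4f}")  # ≈ 2.8074

# Subproblem counts after k levels of recursion (n = 2^k)
for k in (4, 8, 16):
    n = 2 ** k
    standard = 8 ** k   # = n³ leaf multiplications
    strassen = 7 ** k   # = n^(log₂7) ≈ n^2.807
    print(f"n = {n:6d}: standard {standard:.2e}, strassen {strassen:.2e}, "
          f"ratio {standard / strassen:.1f}")
```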
Both the standard and Strassen's algorithms begin by partitioning the input matrices into 2×2 blocks. Consider n×n matrices A and B (assuming n is a power of 2 for simplicity):
A = | A₁₁ A₁₂ |        B = | B₁₁ B₁₂ |
    | A₂₁ A₂₂ |            | B₂₁ B₂₂ |
Each block Aᵢⱼ and Bᵢⱼ is an (n/2)×(n/2) matrix.
The product C = A × B is:
C = | C₁₁ C₁₂ |    where each Cᵢⱼ is (n/2)×(n/2)
    | C₂₁ C₂₂ |
Standard block formulas:
C₁₁ = A₁₁B₁₁ + A₁₂B₂₁
C₁₂ = A₁₁B₁₂ + A₁₂B₂₂
C₂₁ = A₂₁B₁₁ + A₂₂B₂₁
C₂₂ = A₂₁B₁₂ + A₂₂B₂₂
Count: 8 multiplications (A₁₁B₁₁, A₁₂B₂₁, A₁₁B₁₂, A₁₂B₂₂, A₂₁B₁₁, A₂₂B₂₁, A₂₁B₁₂, A₂₂B₂₂) and 4 additions.
The formulas for Cᵢⱼ are exactly what you'd expect from treating blocks as scalars: C₁₁ = A₁₁B₁₁ + A₁₂B₂₁ mirrors c₁₁ = a₁₁b₁₁ + a₁₂b₂₁ for 2×2 scalar matrices. This is the fundamental property that makes divide-and-conquer applicable to matrix multiplication.
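A quick NumPy check confirms the block-scalar analogy (a minimal sketch; the 4×4 size and integer test matrices are arbitrary choices):

```python
import numpy as np

n = 4
A = np.arange(n * n).reshape(n, n).astype(float)
B = np.arange(n * n)[::-1].reshape(n, n).astype(float)

m = n // 2
A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]

# Assemble C from the standard block formulas
C = np.block([
    [A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
    [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22],
])
print(np.allclose(C, A @ B))  # True: blocks behave like scalars
```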
The recursion:
To compute A₁₁B₁₁ (a product of (n/2)×(n/2) matrices), we recursively apply the same algorithm. This continues until we reach matrices small enough to multiply directly (typically 1×1 or when the overhead exceeds the benefit).
The key observation: If we can compute all four blocks of C using only 7 matrix multiplications (instead of 8), we save one recursive call at every level of recursion—and that saving compounds exponentially.
Here is Strassen's magical set of intermediate products. These formulas may seem arbitrary, but they're carefully constructed to allow computing all four Cᵢⱼ blocks with only 7 multiplications.
Define seven auxiliary products:
M₁ = (A₁₁ + A₂₂)(B₁₁ + B₂₂)
M₂ = (A₂₁ + A₂₂)B₁₁
M₃ = A₁₁(B₁₂ - B₂₂)
M₄ = A₂₂(B₂₁ - B₁₁)
M₅ = (A₁₁ + A₁₂)B₂₂
M₆ = (A₂₁ - A₁₁)(B₁₁ + B₁₂)
M₇ = (A₁₂ - A₂₂)(B₂₁ + B₂₂)
Compute C blocks from M products:
C₁₁ = M₁ + M₄ - M₅ + M₇
C₁₂ = M₃ + M₅
C₂₁ = M₂ + M₄
C₂₂ = M₁ - M₂ + M₃ + M₆
Don't try to memorize these formulas by "understanding" them intuitively—they're the result of algebraic manipulation/search, not natural derivation. What matters is that they can be verified: expand each M, then expand each C formula, and you'll recover the standard block formulas exactly.
Verification (C₁₁ as example):
Let's verify that C₁₁ = M₁ + M₄ - M₅ + M₇ gives A₁₁B₁₁ + A₁₂B₂₁:
M₁ = (A₁₁ + A₂₂)(B₁₁ + B₂₂) = A₁₁B₁₁ + A₁₁B₂₂ + A₂₂B₁₁ + A₂₂B₂₂
M₄ = A₂₂(B₂₁ - B₁₁) = A₂₂B₂₁ - A₂₂B₁₁
M₅ = (A₁₁ + A₁₂)B₂₂ = A₁₁B₂₂ + A₁₂B₂₂
M₇ = (A₁₂ - A₂₂)(B₂₁ + B₂₂) = A₁₂B₂₁ + A₁₂B₂₂ - A₂₂B₂₁ - A₂₂B₂₂
M₁ + M₄ - M₅ + M₇ =
(A₁₁B₁₁ + A₁₁B₂₂ + A₂₂B₁₁ + A₂₂B₂₂) [M₁]
+ (A₂₂B₂₁ - A₂₂B₁₁) [M₄]
- (A₁₁B₂₂ + A₁₂B₂₂) [-M₅]
+ (A₁₂B₂₁ + A₁₂B₂₂ - A₂₂B₂₁ - A₂₂B₂₂) [M₇]
Collecting terms:
A₁₁B₁₁: +1 → A₁₁B₁₁ ✓
A₁₁B₂₂: +1 -1 = 0
A₂₂B₁₁: +1 -1 = 0
A₂₂B₂₂: +1 -1 = 0
A₂₂B₂₁: +1 -1 = 0
A₁₂B₂₂: -1 +1 = 0
A₁₂B₂₁: +1 → A₁₂B₂₁ ✓
Result: C₁₁ = A₁₁B₁₁ + A₁₂B₂₁ ✓
The other three blocks (C₁₂, C₂₁, C₂₂) can be verified similarly.
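If you'd rather let the machine do the bookkeeping, here is a short NumPy check of all four combination formulas on random blocks (the 4×4 block size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Random (n/2)×(n/2) blocks stand in for A₁₁..A₂₂ and B₁₁..B₂₂
A11, A12, A21, A22 = (rng.standard_normal((4, 4)) for _ in range(4))
B11, B12, B21, B22 = (rng.standard_normal((4, 4)) for _ in range(4))

M1 = (A11 + A22) @ (B11 + B22)
M2 = (A21 + A22) @ B11
M3 = A11 @ (B12 - B22)
M4 = A22 @ (B21 - B11)
M5 = (A11 + A12) @ B22
M6 = (A21 - A11) @ (B11 + B12)
M7 = (A12 - A22) @ (B21 + B22)

# Each combination must match the standard block formula
assert np.allclose(M1 + M4 - M5 + M7, A11 @ B11 + A12 @ B21)  # C₁₁
assert np.allclose(M3 + M5,           A11 @ B12 + A12 @ B22)  # C₁₂
assert np.allclose(M2 + M4,           A21 @ B11 + A22 @ B21)  # C₂₁
assert np.allclose(M1 - M2 + M3 + M6, A21 @ B12 + A22 @ B22)  # C₂₂
print("All four block formulas verified")
```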
Let's carefully count operations in Strassen's algorithm.
Multiplications: the seven products M₁ through M₇, each a recursive multiplication of (n/2)×(n/2) matrices—7 total.
Additions/Subtractions for computing M products: M₁, M₆, and M₇ each require two block additions/subtractions; M₂, M₃, M₄, and M₅ each require one.
Subtotal: 10 additions/subtractions for M products
Additions/Subtractions for computing C blocks: C₁₁ and C₂₂ each require three; C₁₂ and C₂₁ each require one.
Subtotal: 8 additions/subtractions for C assembly
Total additions/subtractions: 18
| Operation Type | Standard Block Method | Strassen's Method |
|---|---|---|
| Matrix multiplications | 8 | 7 |
| Matrix additions/subtractions | 4 | 18 |
| Recursion depth impact | T(n) = 8T(n/2) | T(n) = 7T(n/2) |
| Asymptotic complexity | O(n³) | O(n^2.807) |
The trade-off:
Strassen trades 1 multiplication for 14 additional additions (18 - 4 = 14). At first glance, this seems like a bad deal. But:
The multiplications dominate because the number of subproblems multiplies by 7 at each of the log₂(n) levels of recursion, and the additions ride along with them.
Addition cost at the top level: 18 × (n/2)² = 4.5n². At depth k there are 7^k subproblems of size n/2^k, so the addition work at that level is 4.5n² × (7/4)^k. Summing this geometric series over all log₂(n) levels gives Θ(n^log₂7)—the additions grow at exactly the same rate as the recursion tree, which is why the total remains O(n^2.807).
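A small numeric check of that geometric series (the constant it settles toward reflects the 18-additions count, nothing deeper):

```python
import math

def addition_work(n: int) -> float:
    """Sum, over depth k, of 7^k subproblems × 18 block additions,
    each addition touching (n/2^(k+1))² elements."""
    levels = int(math.log2(n))
    return sum(7**k * 18 * (n / 2**(k + 1))**2 for k in range(levels))

for n in (2**8, 2**12, 2**16):
    # Ratio approaches a constant, confirming Θ(n^log₂7) growth
    ratio = addition_work(n) / n**math.log2(7)
    print(f"n = {n:6d}: addition work / n^log2(7) = {ratio:.3f}")
```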
Let's derive Strassen's complexity rigorously using recurrence relations.
Strassen's recurrence:
T(n) = 7T(n/2) + Θ(n²)
Where: the 7T(n/2) term counts the seven recursive multiplications of (n/2)×(n/2) blocks, and the Θ(n²) term covers the 18 block additions/subtractions plus partitioning overhead.
Applying the Master Theorem:
For T(n) = aT(n/b) + f(n):
Compare f(n) with n^(log_b(a)) = n^(log₂7) ≈ n^2.807:
Since n² = O(n^(2.807−ε)) for any 0 < ε ≤ 0.807, we're in Case 1 of the Master Theorem.
Result: T(n) = Θ(n^(log₂7)) = Θ(n^2.807...)
Exact exponent:
log₂7 = ln(7)/ln(2) = 1.9459.../0.6931... = 2.8073549...
So Strassen's algorithm has complexity Θ(n^2.807) (or more precisely, Θ(n^log₂7)).
Comparison with standard algorithm:
Standard: T(n) = 8T(n/2) + Θ(n²) → Θ(n^log₂8) = Θ(n³)
Strassen: T(n) = 7T(n/2) + Θ(n²) → Θ(n^log₂7) = Θ(n^2.807)
Ratio for large n: n³ / n^2.807 = n^0.193
For n = 1000: 1000^0.193 ≈ 3.8
For n = 10000: 10000^0.193 ≈ 5.9
For n = 1000000: 1000000^0.193 ≈ 14
The improvement grows as matrices get larger!
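These asymptotic ratios ignore constant factors—Strassen's extra additions raise its constant, so practical speedups are smaller—but the snippet below reproduces the arithmetic:

```python
import math

exponent_gap = 3 - math.log2(7)   # ≈ 0.193
for n in (10**3, 10**4, 10**6):
    print(f"n = {n:>7d}: n^0.193 ≈ {n ** exponent_gap:.1f}")
```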
A reduction from 3.0 to 2.807 might seem modest (6.4%), but it's an EXPONENT reduction: the speedup factor n^0.193 grows without bound as n → ∞, so for large enough matrices Strassen's algorithm outperforms standard multiplication by any fixed factor.
Here's the complete structure of Strassen's algorithm at a high level:
Algorithm: Strassen(A, B, n)
Input: Two n×n matrices A and B (n is a power of 2)
Output: Product C = A × B
1. BASE CASE:
if n ≤ threshold: // Usually 32-128
return StandardMultiply(A, B)
2. DIVIDE:
Split A into quadrants: A₁₁, A₁₂, A₂₁, A₂₂ (each n/2 × n/2)
Split B into quadrants: B₁₁, B₁₂, B₂₁, B₂₂ (each n/2 × n/2)
3. COMPUTE SUMS/DIFFERENCES (10 matrix additions/subtractions):
S₁ = A₁₁ + A₂₂ S₂ = B₁₁ + B₂₂
S₃ = A₂₁ + A₂₂ S₄ = B₁₂ - B₂₂
S₅ = B₂₁ - B₁₁ S₆ = A₁₁ + A₁₂
S₇ = A₂₁ - A₁₁ S₈ = B₁₁ + B₁₂
S₉ = A₁₂ - A₂₂ S₁₀ = B₂₁ + B₂₂
4. RECURSIVE MULTIPLICATIONS (7 calls):
M₁ = Strassen(S₁, S₂, n/2) // (A₁₁ + A₂₂)(B₁₁ + B₂₂)
M₂ = Strassen(S₃, B₁₁, n/2) // (A₂₁ + A₂₂)B₁₁
M₃ = Strassen(A₁₁, S₄, n/2) // A₁₁(B₁₂ - B₂₂)
M₄ = Strassen(A₂₂, S₅, n/2) // A₂₂(B₂₁ - B₁₁)
M₅ = Strassen(S₆, B₂₂, n/2) // (A₁₁ + A₁₂)B₂₂
M₆ = Strassen(S₇, S₈, n/2) // (A₂₁ - A₁₁)(B₁₁ + B₁₂)
M₇ = Strassen(S₉, S₁₀, n/2) // (A₁₂ - A₂₂)(B₂₁ + B₂₂)
5. COMBINE (assemble C from M products; 8 additions/subtractions):
C₁₁ = M₁ + M₄ - M₅ + M₇
C₁₂ = M₃ + M₅
C₂₁ = M₂ + M₄
C₂₂ = M₁ - M₂ + M₃ + M₆
6. return C assembled from C₁₁, C₁₂, C₂₁, C₂₂
```python
import numpy as np

def strassen(A: np.ndarray, B: np.ndarray, threshold: int = 64) -> np.ndarray:
    """
    Strassen's matrix multiplication algorithm.

    Args:
        A, B: Square matrices of size n×n (n should be a power of 2)
        threshold: Size below which to use standard multiplication

    Returns:
        Product C = A × B
    """
    n = A.shape[0]

    # Base case: use standard multiplication for small matrices
    if n <= threshold:
        return A @ B  # NumPy's optimized multiplication

    # Divide matrices into quadrants
    mid = n // 2
    A11, A12 = A[:mid, :mid], A[:mid, mid:]
    A21, A22 = A[mid:, :mid], A[mid:, mid:]
    B11, B12 = B[:mid, :mid], B[:mid, mid:]
    B21, B22 = B[mid:, :mid], B[mid:, mid:]

    # Compute the 7 products (recursively)
    M1 = strassen(A11 + A22, B11 + B22, threshold)  # (A₁₁ + A₂₂)(B₁₁ + B₂₂)
    M2 = strassen(A21 + A22, B11, threshold)        # (A₂₁ + A₂₂)B₁₁
    M3 = strassen(A11, B12 - B22, threshold)        # A₁₁(B₁₂ - B₂₂)
    M4 = strassen(A22, B21 - B11, threshold)        # A₂₂(B₂₁ - B₁₁)
    M5 = strassen(A11 + A12, B22, threshold)        # (A₁₁ + A₁₂)B₂₂
    M6 = strassen(A21 - A11, B11 + B12, threshold)  # (A₂₁ - A₁₁)(B₁₁ + B₁₂)
    M7 = strassen(A12 - A22, B21 + B22, threshold)  # (A₁₂ - A₂₂)(B₂₁ + B₂₂)

    # Combine to get result quadrants
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6

    # Assemble result matrix
    C = np.empty((n, n), dtype=A.dtype)
    C[:mid, :mid] = C11
    C[:mid, mid:] = C12
    C[mid:, :mid] = C21
    C[mid:, mid:] = C22
    return C

# Example usage
n = 128
A = np.random.rand(n, n)
B = np.random.rand(n, n)
C = strassen(A, B)
print(f"Result shape: {C.shape}")
print(f"Verification: {np.allclose(C, A @ B)}")  # Should be True
```

Real-world matrices rarely have dimensions that are exact powers of 2. Several strategies handle this:
Strategy 1: Padding
Pad the matrices with zeros to the next power of 2:
Pros: Simple to implement.
Cons: Can nearly double the matrix size (e.g., n=129 → pad to 256).
Strategy 2: Dynamic Padding
Pad to the nearest even number at each recursion level:
Pros: Minimal overhead.
Cons: More complex implementation.
Strategy 3: Hybrid with Peeling
For odd dimensions, "peel off" one row/column:
Pros: No padding waste.
Cons: Additional bookkeeping.
```python
import numpy as np

def next_power_of_2(n: int) -> int:
    """Find the smallest power of 2 >= n."""
    power = 1
    while power < n:
        power *= 2
    return power

def strassen_general(A: np.ndarray, B: np.ndarray, threshold: int = 64) -> np.ndarray:
    """
    Strassen's algorithm for matrices of any size.
    Pads to the next power of 2 if necessary.
    """
    n = A.shape[0]
    m = next_power_of_2(n)

    if m == n:  # Already a power of 2
        return strassen(A, B, threshold)

    # Pad matrices with zeros
    A_padded = np.zeros((m, m), dtype=A.dtype)
    B_padded = np.zeros((m, m), dtype=B.dtype)
    A_padded[:n, :n] = A
    B_padded[:n, :n] = B

    # Multiply padded matrices
    C_padded = strassen(A_padded, B_padded, threshold)

    # Extract original-sized result
    return C_padded[:n, :n]

# Example with a non-power-of-2 size
n = 100
A = np.random.rand(n, n)
B = np.random.rand(n, n)
C = strassen_general(A, B)
print(f"Original size: {n}, Padded to: {next_power_of_2(n)}")
print(f"Verification: {np.allclose(C, A @ B)}")
```

For n = 1000, padding to 1024 grows each dimension by only 2.4% (about 5% more elements). For n = 1025, padding to 2048 nearly doubles each dimension, roughly quadrupling the element count. Real implementations often fall back to standard multiplication or use hybrid approaches near power-of-2 boundaries.
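Strategy 2 can be sketched as well—a hypothetical strassen_dynamic variant (not part of the reference implementation above) that pads an odd dimension with a single zero row and column at each level, multiplies, and trims:

```python
import numpy as np

def strassen_dynamic(A: np.ndarray, B: np.ndarray, threshold: int = 64) -> np.ndarray:
    """Sketch of dynamic padding: handle odd sizes level by level
    instead of padding to a power of 2 up front."""
    n = A.shape[0]
    if n <= threshold:
        return A @ B
    if n % 2 == 1:
        # Pad to even with one zero row/column; zeros don't affect the product
        Ap = np.pad(A, ((0, 1), (0, 1)))
        Bp = np.pad(B, ((0, 1), (0, 1)))
        return strassen_dynamic(Ap, Bp, threshold)[:n, :n]
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    M1 = strassen_dynamic(A11 + A22, B11 + B22, threshold)
    M2 = strassen_dynamic(A21 + A22, B11, threshold)
    M3 = strassen_dynamic(A11, B12 - B22, threshold)
    M4 = strassen_dynamic(A22, B21 - B11, threshold)
    M5 = strassen_dynamic(A11 + A12, B22, threshold)
    M6 = strassen_dynamic(A21 - A11, B11 + B12, threshold)
    M7 = strassen_dynamic(A12 - A22, B21 + B22, threshold)
    C = np.empty((n, n), dtype=A.dtype)
    C[:m, :m] = M1 + M4 - M5 + M7
    C[:m, m:] = M3 + M5
    C[m:, :m] = M2 + M4
    C[m:, m:] = M1 - M2 + M3 + M6
    return C

# Odd size handled without padding all the way to a power of 2
n = 97
A, B = np.random.rand(n, n), np.random.rand(n, n)
print(np.allclose(strassen_dynamic(A, B, threshold=32), A @ B))  # Should be True
```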
Strassen's algorithm has different numerical properties compared to standard multiplication due to its use of subtractions.
Potential issues:
Catastrophic cancellation: When subtracting nearly equal matrices (A₁₁ ≈ A₂₁), the result has fewer significant digits
Error propagation: Errors in M products propagate through the combination formulas
Condition number sensitivity: Ill-conditioned matrices may produce larger errors
In practice:
For typical matrices with reasonable condition numbers, Strassen's algorithm is sufficiently accurate. High-precision libraries (like MPFR) can be used when exact computation is required. The error growth is bounded and manageable for most applications.
For matrices with very large element ranges, or when the matrices have entries that nearly cancel, standard multiplication may be more numerically stable. Scientific computing libraries often provide both options and recommend standard multiplication for precision-critical applications.
Error bound analysis:
Let U be machine epsilon (≈ 10⁻¹⁶ for double precision). For standard multiplication:
‖C - C_computed‖ ≤ n · U · ‖A‖ · ‖B‖ + O(U²)
For Strassen:
‖C - C_computed‖ ≤ c · n^log₂(12) · U · ‖A‖ · ‖B‖ + O(U²)
Where c is a modest constant. The exponent log₂(12) ≈ 3.58 means error grows slightly faster with n than for standard multiplication, but remains small for reasonable matrix sizes.
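One way to observe this empirically—a quick experiment assuming the strassen function defined earlier, with results that vary by matrices and BLAS—is to run both methods in float32 and compare against a float64 reference; Strassen's relative error is typically modestly larger:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
A64 = rng.standard_normal((n, n))
B64 = rng.standard_normal((n, n))
reference = A64 @ B64  # float64 "ground truth"

A32, B32 = A64.astype(np.float32), B64.astype(np.float32)
std_err = np.linalg.norm(A32 @ B32 - reference) / np.linalg.norm(reference)
strassen_err = np.linalg.norm(
    strassen(A32, B32, threshold=32).astype(np.float64) - reference
) / np.linalg.norm(reference)

print(f"standard relative error: {std_err:.2e}")
print(f"strassen relative error: {strassen_err:.2e}")
```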
Strassen's algorithm stands as one of the most elegant examples of divide-and-conquer thinking, demonstrating that mathematical insight can overcome apparent computational barriers.
What's next:
Having seen the overview of Strassen's algorithm, we'll dive deeper into how the reduction of multiplications actually works on a mechanical level. The next page explores the algebraic identities that make the 7-multiplication trick possible, showing how the M products cleverly share computation across the four output blocks.
You now understand Strassen's algorithm at a high level—its historical significance, the core insight of trading multiplications for additions, the famous formulas, and the resulting O(n^2.807) complexity. Next, we'll examine how reducing multiplications through D&C actually achieves this remarkable improvement.