In 1969, German mathematician Volker Strassen published a paper that shocked the computational mathematics community. For decades, the O(n³) complexity of matrix multiplication was considered fundamental and unavoidable—after all, the mathematical definition itself seemed to require exactly n³ multiplications.
Strassen proved everyone wrong with an elegant insight: by computing clever combinations of matrix elements, he could multiply two 2×2 matrices using only 7 multiplications instead of the conventional 8. This single saved multiplication, when applied recursively, yields an algorithm with complexity O(n^log₂7) ≈ O(n^2.807)—a reduction in the exponent itself, so the savings over O(n³) grow without bound as n increases.
By the end of this page, you will understand the historical significance of Strassen's algorithm, why reducing 8 multiplications to 7 matters so profoundly, the high-level structure of the divide-and-conquer approach, and the famous formulas that make it work. You'll see how algebraic cleverness trumps brute force.
Before Strassen, mathematicians and computer scientists believed O(n³) was the inherent cost of matrix multiplication. The reasoning seemed airtight: the product C has n² entries, and each entry is a dot product of a row of A with a column of B, requiring n multiplications—so n³ multiplications appear unavoidable.
This argument has a subtle flaw: it assumes each entry must be computed independently. Strassen realized that matrix entries are not independent—they share algebraic relationships that can be exploited.
Strassen's 1969 paper, titled "Gaussian Elimination is not Optimal," established two revolutionary results: first, that two n×n matrices can be multiplied in O(n^log₂7) time; second, that matrix inversion and Gaussian elimination reduce to matrix multiplication, so they too can be done in subcubic time—hence the paper's title.
Strassen's algorithm was transformational not just for its practical implications, but for what it revealed: that naive algorithms, even those derived directly from mathematical definitions, are not necessarily optimal. This inspired decades of research into fast matrix multiplication, leading to algorithms approaching O(n²) complexity.
| Year | Algorithm | Complexity | Improvement over Previous |
|---|---|---|---|
| Ancient | Standard | O(n³) = O(n^3.0) | Baseline |
| 1969 | Strassen | O(n^2.807) | ~6.4% exponent reduction |
| 1978 | Pan | O(n^2.796) | Additional improvement |
| 1981 | Schönhage | O(n^2.522) | Significant advance |
| 1987 | Coppersmith-Winograd | O(n^2.376) | Major breakthrough |
| 2012 | Williams | O(n^2.3728642) | Refined techniques |
| 2020 | Alman-Williams | O(n^2.3728596) | Current best known |
| ??? | Optimal | Ω(n²) ≤ ? ≤ O(n^2.373) | Open problem |
The progression from n^3.0 to n^2.373 represents an enormous improvement for large matrices. For n = 10,000: n³ = 10¹² operations versus n^2.373 ≈ 3 × 10⁹—roughly 300× fewer, ignoring constant factors.
However, the galactic algorithms (Coppersmith-Winograd and beyond) have such enormous constant factors that they're only faster for matrices larger than the observable universe could store. Strassen's algorithm is the only subcubic algorithm used in practice.
Strassen's algorithm is built on a crucial observation about computational costs:
Multiplications are expensive; additions are cheap.
For matrices, this is especially true: adding two (n/2)×(n/2) blocks takes only Θ(n²) scalar operations, while multiplying them requires a full recursive multiplication—by far the dominant cost.
If we could somehow trade multiplications for additions—even at an unfavorable ratio—we might win overall. Strassen found that by computing clever sums and differences of matrix blocks, he could eliminate one multiplication out of eight.
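To see the asymmetry concretely, here is a quick timing sketch (absolute times are machine- and BLAS-dependent; only the gap between the two matters):

```python
import time
import numpy as np

n = 2048
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
S = A + B                      # Θ(n²) work: one pass over the entries
t1 = time.perf_counter()
P = A @ B                      # Θ(n³) work: n multiply-adds per entry
t2 = time.perf_counter()

print(f"addition:       {t1 - t0:.4f} s")
print(f"multiplication: {t2 - t1:.4f} s")
```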
Why does one multiplication matter?
Let T(n) be the time to multiply two n×n matrices. With the standard algorithm:
Recurrence: T(n) = 8T(n/2) + O(n²)
By the Master Theorem: O(n^log₂8) = O(n³)
With Strassen's 7 multiplications: T(n) = 7T(n/2) + O(n²). By the Master Theorem: O(n^log₂7) = O(n^2.807). Reducing from 8 to 7 recursive calls drops the exponent from 3.0 to 2.807—a permanent improvement at every level of recursion.
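A tiny counting sketch (plain Python) makes the compounding concrete: after k levels of recursion the standard method has spawned 8^k subproblems while Strassen has only 7^k, and at full depth k = log₂ n those counts equal n³ and n^log₂7 respectively:

```python
import math

print(f"standard exponent: {math.log2(8)}")   # 3.0
print(f"strassen exponent: {math.log2(7):.4f}")  # ≈ 2.8074

# Subproblem counts after k levels of recursion (n = 2^k)
for k in (4, 8, 16):
    n = 2 ** k
    standard = 8 ** k   # = n³ leaf multiplications
    strassen = 7 ** k   # = n^(log₂7) ≈ n^2.807
    print(f"n = {n:6d}: standard {standard:.2e}, strassen {strassen:.2e}, "
          f"ratio {standard / strassen:.1f}")
```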
Both the standard and Strassen's algorithms begin by partitioning the input matrices into 2×2 blocks. Consider n×n matrices A and B (assuming n is a power of 2 for simplicity):
A = | A₁₁ A₁₂ |        B = | B₁₁ B₁₂ |
    | A₂₁ A₂₂ |            | B₂₁ B₂₂ |
Each block Aᵢⱼ and Bᵢⱼ is an (n/2)×(n/2) matrix.
The product C = A × B is:
C = | C₁₁ C₁₂ |    where each Cᵢⱼ is (n/2)×(n/2)
    | C₂₁ C₂₂ |
Standard block formulas:
C₁₁ = A₁₁B₁₁ + A₁₂B₂₁
C₁₂ = A₁₁B₁₂ + A₁₂B₂₂
C₂₁ = A₂₁B₁₁ + A₂₂B₂₁
C₂₂ = A₂₁B₁₂ + A₂₂B₂₂
Count: 8 multiplications (A₁₁B₁₁, A₁₂B₂₁, A₁₁B₁₂, A₁₂B₂₂, A₂₁B₁₁, A₂₂B₂₁, A₂₁B₁₂, A₂₂B₂₂) and 4 additions.
The formulas for Cᵢⱼ are exactly what you'd expect from treating blocks as scalars: C₁₁ = A₁₁B₁₁ + A₁₂B₂₁ mirrors c₁₁ = a₁₁b₁₁ + a₁₂b₂₁ for 2×2 scalar matrices. This is the fundamental property that makes divide-and-conquer applicable to matrix multiplication.
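A quick NumPy check confirms the block-scalar analogy (a minimal sketch; the 4×4 size and integer test matrices are arbitrary choices):

```python
import numpy as np

n = 4
A = np.arange(n * n).reshape(n, n).astype(float)
B = np.arange(n * n)[::-1].reshape(n, n).astype(float)

m = n // 2
A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]

# Assemble C from the standard block formulas
C = np.block([
    [A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
    [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22],
])
print(np.allclose(C, A @ B))  # True: blocks behave like scalars
```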
The recursion:
To compute A₁₁B₁₁ (a product of (n/2)×(n/2) matrices), we recursively apply the same algorithm. This continues until we reach matrices small enough to multiply directly (typically 1×1 or when the overhead exceeds the benefit).
The key observation: If we can compute all four blocks of C using only 7 matrix multiplications (instead of 8), we save one recursive call at every level of recursion—and that saving compounds exponentially.
Here is Strassen's magical set of intermediate products. These formulas may seem arbitrary, but they're carefully constructed to allow computing all four Cᵢⱼ blocks with only 7 multiplications.
Define seven auxiliary products:
M₁ = (A₁₁ + A₂₂)(B₁₁ + B₂₂)
M₂ = (A₂₁ + A₂₂)B₁₁
M₃ = A₁₁(B₁₂ - B₂₂)
M₄ = A₂₂(B₂₁ - B₁₁)
M₅ = (A₁₁ + A₁₂)B₂₂
M₆ = (A₂₁ - A₁₁)(B₁₁ + B₁₂)
M₇ = (A₁₂ - A₂₂)(B₂₁ + B₂₂)
Compute C blocks from M products:
C₁₁ = M₁ + M₄ - M₅ + M₇
C₁₂ = M₃ + M₅
C₂₁ = M₂ + M₄
C₂₂ = M₁ - M₂ + M₃ + M₆
Don't try to memorize these formulas by "understanding" them intuitively—they're the result of algebraic manipulation/search, not natural derivation. What matters is that they can be verified: expand each M, then expand each C formula, and you'll recover the standard block formulas exactly.
Verification (C₁₁ as example):
Let's verify that C₁₁ = M₁ + M₄ - M₅ + M₇ gives A₁₁B₁₁ + A₁₂B₂₁:
M₁ = (A₁₁ + A₂₂)(B₁₁ + B₂₂) = A₁₁B₁₁ + A₁₁B₂₂ + A₂₂B₁₁ + A₂₂B₂₂
M₄ = A₂₂(B₂₁ - B₁₁) = A₂₂B₂₁ - A₂₂B₁₁
M₅ = (A₁₁ + A₁₂)B₂₂ = A₁₁B₂₂ + A₁₂B₂₂
M₇ = (A₁₂ - A₂₂)(B₂₁ + B₂₂) = A₁₂B₂₁ + A₁₂B₂₂ - A₂₂B₂₁ - A₂₂B₂₂
M₁ + M₄ - M₅ + M₇ =
(A₁₁B₁₁ + A₁₁B₂₂ + A₂₂B₁₁ + A₂₂B₂₂) [M₁]
+ (A₂₂B₂₁ - A₂₂B₁₁) [M₄]
- (A₁₁B₂₂ + A₁₂B₂₂) [-M₅]
+ (A₁₂B₂₁ + A₁₂B₂₂ - A₂₂B₂₁ - A₂₂B₂₂) [M₇]
Collecting terms:
A₁₁B₁₁: +1 → A₁₁B₁₁ ✓
A₁₁B₂₂: +1 -1 = 0
A₂₂B₁₁: +1 -1 = 0
A₂₂B₂₂: +1 -1 = 0
A₂₂B₂₁: +1 -1 = 0
A₁₂B₂₂: -1 +1 = 0
A₁₂B₂₁: +1 → A₁₂B₂₁ ✓
Result: C₁₁ = A₁₁B₁₁ + A₁₂B₂₁ ✓
The other three blocks (C₁₂, C₂₁, C₂₂) can be verified similarly.
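If you'd rather let the machine do the bookkeeping, here is a short NumPy check of all four combination formulas on random blocks (the 4×4 block size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Random (n/2)×(n/2) blocks stand in for A₁₁..A₂₂ and B₁₁..B₂₂
A11, A12, A21, A22 = (rng.standard_normal((4, 4)) for _ in range(4))
B11, B12, B21, B22 = (rng.standard_normal((4, 4)) for _ in range(4))

M1 = (A11 + A22) @ (B11 + B22)
M2 = (A21 + A22) @ B11
M3 = A11 @ (B12 - B22)
M4 = A22 @ (B21 - B11)
M5 = (A11 + A12) @ B22
M6 = (A21 - A11) @ (B11 + B12)
M7 = (A12 - A22) @ (B21 + B22)

# Each combination must match the standard block formula
assert np.allclose(M1 + M4 - M5 + M7, A11 @ B11 + A12 @ B21)  # C₁₁
assert np.allclose(M3 + M5,           A11 @ B12 + A12 @ B22)  # C₁₂
assert np.allclose(M2 + M4,           A21 @ B11 + A22 @ B21)  # C₂₁
assert np.allclose(M1 - M2 + M3 + M6, A21 @ B12 + A22 @ B22)  # C₂₂
print("All four block formulas verified")
```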
Let's carefully count operations in Strassen's algorithm.
Multiplications: the seven products M₁ through M₇, each a recursive multiplication of (n/2)×(n/2) matrices—7 total.
Additions/Subtractions for computing M products: M₁, M₆, and M₇ each require two block additions/subtractions; M₂, M₃, M₄, and M₅ each require one.
Subtotal: 10 additions/subtractions for M products
Additions/Subtractions for computing C blocks: C₁₁ and C₂₂ each require three; C₁₂ and C₂₁ each require one.
Subtotal: 8 additions/subtractions for C assembly
Total additions/subtractions: 18
| Operation Type | Standard Block Method | Strassen's Method |
|---|---|---|
| Matrix multiplications | 8 | 7 |
| Matrix additions/subtractions | 4 | 18 |
| Recursion depth impact | T(n) = 8T(n/2) | T(n) = 7T(n/2) |
| Asymptotic complexity | O(n³) | O(n^2.807) |
The trade-off:
Strassen trades 1 multiplication for 14 additional additions (18 - 4 = 14). At first glance, this seems like a bad deal. But:
The multiplications dominate because the number of subproblems multiplies by 7 at each of the log₂(n) levels of recursion, and the additions ride along with them.
Addition cost at the top level: 18 × (n/2)² = 4.5n². At depth k there are 7^k subproblems of size n/2^k, so the addition work at that level is 4.5n² × (7/4)^k. Summing this geometric series over all log₂(n) levels gives Θ(n^log₂7)—the additions grow at exactly the same rate as the recursion tree, which is why the total remains O(n^2.807).
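A small numeric check of that geometric series (the constant it settles toward reflects the 18-additions count, nothing deeper):

```python
import math

def addition_work(n: int) -> float:
    """Sum, over depth k, of 7^k subproblems × 18 block additions,
    each addition touching (n/2^(k+1))² elements."""
    levels = int(math.log2(n))
    return sum(7**k * 18 * (n / 2**(k + 1))**2 for k in range(levels))

for n in (2**8, 2**12, 2**16):
    # Ratio approaches a constant, confirming Θ(n^log₂7) growth
    ratio = addition_work(n) / n**math.log2(7)
    print(f"n = {n:6d}: addition work / n^log2(7) = {ratio:.3f}")
```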
Let's derive Strassen's complexity rigorously using recurrence relations.
Strassen's recurrence:
T(n) = 7T(n/2) + Θ(n²)
Where: the 7T(n/2) term counts the seven recursive multiplications of (n/2)×(n/2) blocks, and the Θ(n²) term covers the 18 block additions/subtractions plus partitioning overhead.
Applying the Master Theorem:
For T(n) = aT(n/b) + f(n):
Compare f(n) with n^(log_b(a)) = n^(log₂7) ≈ n^2.807:
Since n² = O(n^(2.807−ε)) for any 0 < ε ≤ 0.807, we're in Case 1 of the Master Theorem.
Result: T(n) = Θ(n^(log₂7)) = Θ(n^2.807...)
Exact exponent:
log₂7 = ln(7)/ln(2) = 1.9459.../0.6931... = 2.8073549...
So Strassen's algorithm has complexity Θ(n^2.807) (or more precisely, Θ(n^log₂7)).
Comparison with standard algorithm:
Standard: T(n) = 8T(n/2) + Θ(n²) → Θ(n^log₂8) = Θ(n³)
Strassen: T(n) = 7T(n/2) + Θ(n²) → Θ(n^log₂7) = Θ(n^2.807)
Ratio for large n: n³ / n^2.807 = n^0.193
For n = 1000: 1000^0.193 ≈ 3.8
For n = 10000: 10000^0.193 ≈ 5.9
For n = 1000000: 1000000^0.193 ≈ 14
The improvement grows as matrices get larger!
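These asymptotic ratios ignore constant factors—Strassen's extra additions raise its constant, so practical speedups are smaller—but the snippet below reproduces the arithmetic:

```python
import math

exponent_gap = 3 - math.log2(7)   # ≈ 0.193
for n in (10**3, 10**4, 10**6):
    print(f"n = {n:>7d}: n^0.193 ≈ {n ** exponent_gap:.1f}")
```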
A reduction from 3.0 to 2.807 might seem modest (6.4%), but it's an EXPONENT reduction: the speedup factor n^0.193 grows without bound as n → ∞, so for large enough matrices Strassen's algorithm outperforms standard multiplication by any fixed factor.
Here's the complete structure of Strassen's algorithm at a high level:
Algorithm: Strassen(A, B, n)
Input: Two n×n matrices A and B (n is a power of 2)
Output: Product C = A × B
1. BASE CASE:
if n ≤ threshold: // Usually 32-128
return StandardMultiply(A, B)
2. DIVIDE:
Split A into quadrants: A₁₁, A₁₂, A₂₁, A₂₂ (each n/2 × n/2)
Split B into quadrants: B₁₁, B₁₂, B₂₁, B₂₂ (each n/2 × n/2)
3. COMPUTE SUMS/DIFFERENCES (10 matrix additions/subtractions):
S₁ = A₁₁ + A₂₂ S₂ = B₁₁ + B₂₂
S₃ = A₂₁ + A₂₂ S₄ = B₁₂ - B₂₂
S₅ = B₂₁ - B₁₁ S₆ = A₁₁ + A₁₂
S₇ = A₂₁ - A₁₁ S₈ = B₁₁ + B₁₂
S₉ = A₁₂ - A₂₂ S₁₀ = B₂₁ + B₂₂
4. RECURSIVE MULTIPLICATIONS (7 calls):
M₁ = Strassen(S₁, S₂, n/2) // (A₁₁ + A₂₂)(B₁₁ + B₂₂)
M₂ = Strassen(S₃, B₁₁, n/2) // (A₂₁ + A₂₂)B₁₁
M₃ = Strassen(A₁₁, S₄, n/2) // A₁₁(B₁₂ - B₂₂)
M₄ = Strassen(A₂₂, S₅, n/2) // A₂₂(B₂₁ - B₁₁)
M₅ = Strassen(S₆, B₂₂, n/2) // (A₁₁ + A₁₂)B₂₂
M₆ = Strassen(S₇, S₈, n/2) // (A₂₁ - A₁₁)(B₁₁ + B₁₂)
M₇ = Strassen(S₉, S₁₀, n/2) // (A₁₂ - A₂₂)(B₂₁ + B₂₂)
5. COMBINE (assemble C from M products; 8 additions/subtractions):
C₁₁ = M₁ + M₄ - M₅ + M₇
C₁₂ = M₃ + M₅
C₂₁ = M₂ + M₄
C₂₂ = M₁ - M₂ + M₃ + M₆
6. return C assembled from C₁₁, C₁₂, C₂₁, C₂₂
```python
import numpy as np

def strassen(A: np.ndarray, B: np.ndarray, threshold: int = 64) -> np.ndarray:
    """
    Strassen's matrix multiplication algorithm.

    Args:
        A, B: Square matrices of size n×n (n should be a power of 2)
        threshold: Size below which to use standard multiplication

    Returns:
        Product C = A × B
    """
    n = A.shape[0]

    # Base case: use standard multiplication for small matrices
    if n <= threshold:
        return A @ B  # NumPy's optimized multiplication

    # Divide matrices into quadrants
    mid = n // 2
    A11, A12 = A[:mid, :mid], A[:mid, mid:]
    A21, A22 = A[mid:, :mid], A[mid:, mid:]
    B11, B12 = B[:mid, :mid], B[:mid, mid:]
    B21, B22 = B[mid:, :mid], B[mid:, mid:]

    # Compute the 7 products (recursively)
    M1 = strassen(A11 + A22, B11 + B22, threshold)  # (A₁₁ + A₂₂)(B₁₁ + B₂₂)
    M2 = strassen(A21 + A22, B11, threshold)        # (A₂₁ + A₂₂)B₁₁
    M3 = strassen(A11, B12 - B22, threshold)        # A₁₁(B₁₂ - B₂₂)
    M4 = strassen(A22, B21 - B11, threshold)        # A₂₂(B₂₁ - B₁₁)
    M5 = strassen(A11 + A12, B22, threshold)        # (A₁₁ + A₁₂)B₂₂
    M6 = strassen(A21 - A11, B11 + B12, threshold)  # (A₂₁ - A₁₁)(B₁₁ + B₁₂)
    M7 = strassen(A12 - A22, B21 + B22, threshold)  # (A₁₂ - A₂₂)(B₂₁ + B₂₂)

    # Combine to get result quadrants
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6

    # Assemble result matrix
    C = np.empty((n, n), dtype=A.dtype)
    C[:mid, :mid] = C11
    C[:mid, mid:] = C12
    C[mid:, :mid] = C21
    C[mid:, mid:] = C22
    return C

# Example usage
n = 128
A = np.random.rand(n, n)
B = np.random.rand(n, n)
C = strassen(A, B)
print(f"Result shape: {C.shape}")
print(f"Verification: {np.allclose(C, A @ B)}")  # Should be True
```

Real-world matrices rarely have dimensions that are exact powers of 2. Several strategies handle this:
Strategy 1: Padding
Pad the matrices with zeros to the next power of 2:
Pros: Simple to implement.
Cons: Can nearly double the matrix size (e.g., n=129 → pad to 256).
Strategy 2: Dynamic Padding
Pad to the nearest even number at each recursion level:
Pros: Minimal overhead.
Cons: More complex implementation.
Strategy 3: Hybrid with Peeling
For odd dimensions, "peel off" one row/column:
Pros: No padding waste.
Cons: Additional bookkeeping.
```python
import numpy as np

def next_power_of_2(n: int) -> int:
    """Find the smallest power of 2 >= n."""
    power = 1
    while power < n:
        power *= 2
    return power

def strassen_general(A: np.ndarray, B: np.ndarray, threshold: int = 64) -> np.ndarray:
    """
    Strassen's algorithm for matrices of any size.
    Pads to the next power of 2 if necessary.
    """
    n = A.shape[0]
    m = next_power_of_2(n)

    if m == n:  # Already a power of 2
        return strassen(A, B, threshold)

    # Pad matrices with zeros
    A_padded = np.zeros((m, m), dtype=A.dtype)
    B_padded = np.zeros((m, m), dtype=B.dtype)
    A_padded[:n, :n] = A
    B_padded[:n, :n] = B

    # Multiply padded matrices
    C_padded = strassen(A_padded, B_padded, threshold)

    # Extract original-sized result
    return C_padded[:n, :n]

# Example with a non-power-of-2 size
n = 100
A = np.random.rand(n, n)
B = np.random.rand(n, n)
C = strassen_general(A, B)
print(f"Original size: {n}, Padded to: {next_power_of_2(n)}")
print(f"Verification: {np.allclose(C, A @ B)}")
```

For n = 1000, padding to 1024 grows each dimension by only 2.4% (about 5% more elements). For n = 1025, padding to 2048 nearly doubles each dimension, roughly quadrupling the element count. Real implementations often fall back to standard multiplication or use hybrid approaches near power-of-2 boundaries.
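Strategy 2 can be sketched as well—a hypothetical strassen_dynamic variant (not part of the reference implementation above) that pads an odd dimension with a single zero row and column at each level, multiplies, and trims:

```python
import numpy as np

def strassen_dynamic(A: np.ndarray, B: np.ndarray, threshold: int = 64) -> np.ndarray:
    """Sketch of dynamic padding: handle odd sizes level by level
    instead of padding to a power of 2 up front."""
    n = A.shape[0]
    if n <= threshold:
        return A @ B
    if n % 2 == 1:
        # Pad to even with one zero row/column; zeros don't affect the product
        Ap = np.pad(A, ((0, 1), (0, 1)))
        Bp = np.pad(B, ((0, 1), (0, 1)))
        return strassen_dynamic(Ap, Bp, threshold)[:n, :n]
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    M1 = strassen_dynamic(A11 + A22, B11 + B22, threshold)
    M2 = strassen_dynamic(A21 + A22, B11, threshold)
    M3 = strassen_dynamic(A11, B12 - B22, threshold)
    M4 = strassen_dynamic(A22, B21 - B11, threshold)
    M5 = strassen_dynamic(A11 + A12, B22, threshold)
    M6 = strassen_dynamic(A21 - A11, B11 + B12, threshold)
    M7 = strassen_dynamic(A12 - A22, B21 + B22, threshold)
    C = np.empty((n, n), dtype=A.dtype)
    C[:m, :m] = M1 + M4 - M5 + M7
    C[:m, m:] = M3 + M5
    C[m:, :m] = M2 + M4
    C[m:, m:] = M1 - M2 + M3 + M6
    return C

# Odd size handled without padding all the way to a power of 2
n = 97
A, B = np.random.rand(n, n), np.random.rand(n, n)
print(np.allclose(strassen_dynamic(A, B, threshold=32), A @ B))  # Should be True
```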
Strassen's algorithm has different numerical properties compared to standard multiplication due to its use of subtractions.
Potential issues:
Catastrophic cancellation: When subtracting nearly equal matrices (A₁₁ ≈ A₂₁), the result has fewer significant digits
Error propagation: Errors in M products propagate through the combination formulas
Condition number sensitivity: Ill-conditioned matrices may produce larger errors
In practice:
For typical matrices with reasonable condition numbers, Strassen's algorithm is sufficiently accurate. High-precision libraries (like MPFR) can be used when exact computation is required. The error growth is bounded and manageable for most applications.
For matrices with very large element ranges, or when the matrices have entries that nearly cancel, standard multiplication may be more numerically stable. Scientific computing libraries often provide both options and recommend standard multiplication for precision-critical applications.
Error bound analysis:
Let U be machine epsilon (≈ 10⁻¹⁶ for double precision). For standard multiplication:
‖C - C_computed‖ ≤ n · U · ‖A‖ · ‖B‖ + O(U²)
For Strassen:
‖C - C_computed‖ ≤ c · n^log₂(12) · U · ‖A‖ · ‖B‖ + O(U²)
Where c is a modest constant. The exponent log₂(12) ≈ 3.58 means error grows slightly faster with n than for standard multiplication, but remains small for reasonable matrix sizes.
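One way to observe this empirically—a quick experiment assuming the strassen function defined earlier, with results that vary by matrices and BLAS—is to run both methods in float32 and compare against a float64 reference; Strassen's relative error is typically modestly larger:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
A64 = rng.standard_normal((n, n))
B64 = rng.standard_normal((n, n))
reference = A64 @ B64  # float64 "ground truth"

A32, B32 = A64.astype(np.float32), B64.astype(np.float32)
std_err = np.linalg.norm(A32 @ B32 - reference) / np.linalg.norm(reference)
strassen_err = np.linalg.norm(
    strassen(A32, B32, threshold=32).astype(np.float64) - reference
) / np.linalg.norm(reference)

print(f"standard relative error: {std_err:.2e}")
print(f"strassen relative error: {strassen_err:.2e}")
```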
Strassen's algorithm stands as one of the most elegant examples of divide-and-conquer thinking, demonstrating that mathematical insight can overcome apparent computational barriers.
What's next:
Having seen the overview of Strassen's algorithm, we'll dive deeper into how the reduction of multiplications actually works on a mechanical level. The next page explores the algebraic identities that make the 7-multiplication trick possible, showing how the M products cleverly share computation across the four output blocks.
You now understand Strassen's algorithm at a high level—its historical significance, the core insight of trading multiplications for additions, the famous formulas, and the resulting O(n^2.807) complexity. Next, we'll examine how reducing multiplications through D&C actually achieves this remarkable improvement.