We've developed the divide and conquer algorithm for maximum subarray. But how efficient is it really? Claiming "it's faster than brute force" isn't enough—we need precise bounds that let us predict performance and compare with alternatives.
In this page, we'll rigorously derive the O(n log n) time complexity. Along the way, you'll master the technique of recurrence relations and see how the Master Theorem provides a powerful shortcut for analyzing D&C algorithms.
By the end of this page, you'll be able to:
1. Write the recurrence relation for the D&C maximum subarray algorithm.
2. Apply the Master Theorem to solve it.
3. Develop intuition for why O(n log n) arises.
4. Understand the space complexity implications.
5. Compare with brute force and anticipate where further improvement might come from.
A recurrence relation expresses the running time of an algorithm in terms of the running time on smaller inputs. For divide and conquer algorithms, this naturally captures the recursive structure.
Let T(n) denote the running time of our algorithm on an input of size n. We analyze what happens at each level of recursion.
Analyze each step:
1. Divide (compute midpoint): `mid = (low + high) // 2` takes constant time, O(1).
2. Conquer (recursive calls): two recursive calls, each on a half of size n/2, contribute 2T(n/2).
3. Combine (cross-boundary + comparison): finding the maximum crossing subarray scans the full range in O(n), and the final three-way comparison adds O(1).
4. Base case: a single-element subarray is returned directly in O(1) time.
The Recurrence:
Combining these observations:
$$T(n) = \begin{cases} O(1) & \text{if } n = 1 \\ 2T(n/2) + O(n) & \text{if } n > 1 \end{cases}$$
Or more precisely, for a constant c > 0:
$$T(n) = 2T(n/2) + cn$$
where c is a constant: the base case costs at most c, and the divide-and-combine work on an input of size n is at most cn.
This is exactly the form amenable to the Master Theorem.
The O(n) cross-boundary work is what determines the overall complexity. If the cross-boundary step took Θ(n²), the recurrence would solve to Θ(n²), no better than the quadratic brute force. The efficient linear-time cross-boundary computation is what makes D&C competitive.
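To see the recurrence concretely, here is a minimal sketch (a hypothetical helper, taking c = 1) that expands T(n) = 2T(n/2) + cn numerically and compares the result with n log₂ n; for powers of two it produces exactly n(log₂ n + 1):

```python
import math

def recurrence_work(n):
    """Total work generated by T(n) = 2T(n/2) + n with T(1) = 1 (i.e. c = 1)."""
    if n <= 1:
        return 1  # base case cost
    half = n // 2
    # two subproblems of (roughly) half the size, plus linear combine work
    return recurrence_work(half) + recurrence_work(n - half) + n

for n in [2 ** k for k in range(4, 14)]:
    print(f"n = {n:5d}   T(n) = {recurrence_work(n):7d}   n*log2(n) ≈ {n * math.log2(n):9.0f}")
```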
The Master Theorem provides a direct formula for solving recurrences of the form:
$$T(n) = aT(n/b) + f(n)$$
where a ≥ 1 is the number of subproblems, b > 1 is the factor by which the input size shrinks in each subproblem, and f(n) is the cost of dividing the problem and combining the results.
The theorem compares f(n) with n^(log_b(a)) to determine the overall complexity.
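For reference, the standard statement of the three cases (here ε denotes some constant greater than zero, and Case 3 additionally requires the regularity condition a·f(n/b) ≤ k·f(n) for some constant k < 1):

$$T(n) = \begin{cases} \Theta(n^{\log_b a}) & \text{if } f(n) = O(n^{\log_b a - \epsilon}) \\ \Theta(n^{\log_b a} \log n) & \text{if } f(n) = \Theta(n^{\log_b a}) \\ \Theta(f(n)) & \text{if } f(n) = \Omega(n^{\log_b a + \epsilon}) \end{cases}$$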
Applying to Maximum Subarray:
Our recurrence: T(n) = 2T(n/2) + cn
Parameters: a = 2 (two recursive calls), b = 2 (each on half the input), f(n) = cn (the combine work).
Compute the critical exponent: n^(log_b(a)) = n^(log₂2) = n¹ = n.
Compare f(n) with n^(log_b(a)): f(n) = cn = Θ(n) = Θ(n^(log_b(a))), so the combine work and the critical term grow at the same rate.
This is Case 2 of the Master Theorem.
By Case 2 of the Master Theorem: T(n) = Θ(n^(log_b(a)) × log n) = Θ(n × log n) = Θ(n log n). The D&C maximum subarray algorithm runs in O(n log n) time.
The Master Theorem gives us the answer, but let's develop deep intuition for why this complexity arises. Understanding the "why" helps you predict complexity for new algorithms.
The Recursion Tree Perspective:
Visualize the algorithm as a tree where each node represents a subproblem:
```text
                       [n]                 Level 0: 1 problem of size n
                      /   \
               [n/2]         [n/2]         Level 1: 2 problems of size n/2
               /   \         /   \
           [n/4] [n/4]   [n/4] [n/4]       Level 2: 4 problems of size n/4
             ⋮     ⋮       ⋮     ⋮
         [1][1][1]  ...  [1][1][1]         Level log₂n: n problems of size 1
```
Work at each level:
Key observation: Every level does exactly cn work!
| Level | Problems | Size Each | Work per Problem | Total Work |
|---|---|---|---|---|
| 0 | 1 | n | cn | cn |
| 1 | 2 | n/2 | c(n/2) | cn |
| 2 | 4 | n/4 | c(n/4) | cn |
| 3 | 8 | n/8 | c(n/8) | cn |
| ... | ... | ... | ... | cn |
| log₂n | n | 1 | c | cn |
Total work: cn per level × (1 + log₂n) levels = Θ(n log n).
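For n a power of two, the same count written as a sum over the levels of the recursion tree is:

$$T(n) = \sum_{i=0}^{\log_2 n} cn = cn(\log_2 n + 1) = \Theta(n \log n)$$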
This is the essence of Master Theorem Case 2: when the work at each level is the same, we multiply the per-level work by the number of levels.
Case 2 occurs at a "balance point" where the reduction in problem size exactly compensates for the increase in the number of subproblems. This balance (a = b together with linear combine work, in our case) is characteristic of 'n log n' algorithms like merge sort.
For completeness, let's solve the recurrence directly using the substitution method. This builds skills for recurrences where the Master Theorem doesn't apply.
Recurrence: T(n) = 2T(n/2) + cn, T(1) = c
Guess: T(n) = O(n log n)
Proof by strong induction:
Inductive Hypothesis: Assume T(k) ≤ dk log k for all k < n, for some constant d to be determined.
Inductive Step: Prove T(n) ≤ dn log n.
$$T(n) = 2T(n/2) + cn$$ $$\leq 2 \cdot d(n/2)\log(n/2) + cn \quad \text{(by IH)}$$ $$= dn \cdot \log(n/2) + cn$$ $$= dn \cdot (\log n - \log 2) + cn$$ $$= dn \cdot (\log n - 1) + cn$$ $$= dn \log n - dn + cn$$ $$= dn \log n - (d - c)n$$ $$\leq dn \log n \quad \text{(provided } d \geq c \text{)}$$
Base Case: For small n, T(n) is bounded by a constant, so the inequality can be satisfied by choosing d large enough (in practice one takes n = 2 and n = 3 as base cases, since log 1 = 0).
Conclusion: T(n) = O(n log n). ∎
A symmetric argument shows T(n) = Ω(n log n). Together, this gives T(n) = Θ(n log n)—the algorithm is asymptotically tight at this complexity; we can't do better with D&C and O(n) combine work.
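For the matching lower bound, the inductive step mirrors the one above: assuming T(k) ≥ d'k log k for all k < n with a constant d' chosen small enough,

$$T(n) = 2T(n/2) + cn \geq 2 \cdot d'(n/2)\log(n/2) + cn = d'n\log n - d'n + cn \geq d'n\log n \quad \text{(provided } d' \leq c\text{)}$$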
Time complexity isn't the only resource to consider. Let's analyze the space used by our algorithm.
Sources of space usage:
1. Input Storage: O(n)
2. Recursion Stack: O(log n)
3. Cross-Boundary Variables: O(1)
Total Additional Space: O(log n)
This compares favorably to algorithms that require O(n) auxiliary space (like merge sort, which needs an auxiliary array for merging).
| Component | Space | Notes |
|---|---|---|
| Input array | O(n) | Required regardless of algorithm |
| Recursion stack | O(log n) | log₂n depth × O(1) per frame |
| Cross-boundary vars | O(1) | Just counters and running sums |
| Total extra space | O(log n) | Dominated by recursion depth |
The O(log n) stack space comes from recursion. An iterative bottom-up D&C implementation could reduce this to O(1) extra space, though it's more complex to implement. Kadane's algorithm (which we'll see next) achieves O(1) space naturally.
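As a quick check on the stack usage, here is a small sketch (a hypothetical helper, not part of the algorithm) that follows the same splitting pattern and reports the deepest recursion level reached, which matches ⌈log₂n⌉:

```python
import math

def max_recursion_depth(n, depth=0):
    """Deepest level reached when a problem of size n is split as in the D&C algorithm."""
    if n <= 1:
        return depth  # base case: no further recursion
    half = n // 2
    # the two recursive calls mirror the left/right halves of the array
    return max(max_recursion_depth(half, depth + 1),
               max_recursion_depth(n - half, depth + 1))

for n in [10, 100, 1_000, 10_000]:
    print(f"n = {n:6d}   depth = {max_recursion_depth(n):2d}   ceil(log2 n) = {math.ceil(math.log2(n))}")
```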
Let's place the D&C algorithm in context by comparing it with other approaches to maximum subarray.
| Algorithm | Time | Space | Key Insight |
|---|---|---|---|
| Brute Force (cubic) | O(n³) | O(1) | Enumerate all pairs, sum each |
| Brute Force (quadratic) | O(n²) | O(1) | Running sum optimization |
| Divide & Conquer | O(n log n) | O(log n) | Cross-boundary decomposition |
| Kadane's Algorithm | O(n) | O(1) | DP / greedy insight |
Practical performance differences:
For n = 1,000,000, the rough operation counts are:
- O(n³) brute force: about 10¹⁸ operations, completely infeasible (decades even at a billion operations per second)
- O(n²) brute force: about 10¹² operations, on the order of minutes to hours
- O(n log n) divide & conquer: about 2 × 10⁷ operations, tens of milliseconds or less
- O(n) Kadane's algorithm: about 10⁶ operations, around a millisecond or less
The D&C algorithm is practical for large inputs, though Kadane's algorithm still does roughly log₂n ≈ 20 times less work at this size, and that gap keeps growing as n increases.
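The figures above can be reproduced directly; note these are order-of-magnitude operation counts implied by the asymptotic formulas, not measured running times:

```python
import math

n = 1_000_000
print(f"O(n^3)      ≈ {n ** 3:.1e} operations")            # cubic brute force
print(f"O(n^2)      ≈ {n ** 2:.1e} operations")            # quadratic brute force
print(f"O(n log n)  ≈ {n * math.log2(n):.1e} operations")  # divide & conquer
print(f"O(n)        ≈ {n:.1e} operations")                 # Kadane's algorithm
```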
Why isn't D&C optimal here?
The fundamental issue is that D&C solves a problem that's "easier" than the algorithm realizes:
D&C doesn't share information across subproblems: Each recursive call is independent; information discovered in the left half doesn't help solve the right half.
The problem has "overlapping subproblems": Many subproblems share common structure that D&C recomputes.
There's a simpler recurrence: Kadane's algorithm exploits the fact that max subarray ending at position i depends only on max subarray ending at position i-1.
D&C is like using a sledgehammer when a screwdriver suffices—it works, but it's not the right tool.
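The "simpler recurrence" mentioned in the last point, which the next page develops into Kadane's algorithm, can be written as (using best(i) for the maximum sum of a subarray ending at position i and A for the input array):

$$\text{best}(i) = \max\big(A[i],\ \text{best}(i-1) + A[i]\big)$$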
D&C isn't always optimal even when it works. For maximum subarray, the O(n log n) D&C solution is beaten by Kadane's O(n) algorithm. The value of studying D&C here is learning the technique for problems where it IS optimal (like closest pair of points).
To appreciate the D&C approach fully, let's understand when O(n log n) is actually optimal for D&C algorithms.
What makes problems like merge sort or closest pair of points different from maximum subarray?
In problems where D&C is optimal, there's typically:
1. No simpler recurrence relating adjacent positions or prefixes of the input.
2. Genuine independence between the halves: information discovered in one half doesn't shortcut the work in the other.
Maximum subarray lacks these properties: Kadane's algorithm shows a simpler recurrence exists, and information (the running maximum) does propagate usefully across positions.
The fact that D&C isn't optimal for maximum subarray is part of its pedagogical value. It teaches you to always ask: 'Is there a simpler approach?' rather than reflexively applying D&C because the problem can be divided.
Asymptotic complexity tells only part of the story. In practice, constant factors, cache behavior, and recursion overhead matter.
```python
def find_maximum_subarray_optimized(arr, low, high, threshold=32):
    """
    Optimized D&C with threshold for switching to brute force.
    For small subproblems, brute force has lower overhead.
    """
    # Switch to brute force for small subproblems
    if high - low + 1 <= threshold:
        return brute_force_max_subarray(arr, low, high)

    # Standard D&C for larger problems
    mid = (low + high) // 2
    left = find_maximum_subarray_optimized(arr, low, mid, threshold)
    right = find_maximum_subarray_optimized(arr, mid + 1, high, threshold)
    cross = find_max_crossing_subarray(arr, low, mid, high)

    if left[2] >= right[2] and left[2] >= cross[2]:
        return left
    elif right[2] >= cross[2]:
        return right
    else:
        return cross


def brute_force_max_subarray(arr, low, high):
    """O(n²) brute force for small ranges."""
    max_sum = arr[low]
    max_left, max_right = low, low
    for i in range(low, high + 1):
        current_sum = 0
        for j in range(i, high + 1):
            current_sum += arr[j]
            if current_sum > max_sum:
                max_sum = current_sum
                max_left, max_right = i, j
    return (max_left, max_right, max_sum)


# Empirical measurement
import time

def benchmark(func, arr, iterations=100):
    """Benchmark a function."""
    start = time.perf_counter()
    for _ in range(iterations):
        func(arr, 0, len(arr) - 1)
    return (time.perf_counter() - start) / iterations * 1000  # ms


# Compare pure D&C vs hybrid
import random

arr = list(range(-500, 500))  # 1000 elements
random.shuffle(arr)

# Results typically show: hybrid is 10-30% faster for moderate n
```

The optimal threshold depends on hardware and implementation. Common values range from 16 to 64. Profile on your target hardware to tune this parameter. Most standard library implementations (e.g., for sorting) use similar hybrid approaches.
What's Next:
We've established that D&C gives O(n log n) for maximum subarray. But an O(n) algorithm exists! In the next page, we'll explore Kadane's algorithm and understand why D&C, while correct and reasonable, isn't the best approach for this particular problem.
You now understand the rigorous complexity analysis of the D&C maximum subarray algorithm. You can derive O(n log n) using both the Master Theorem and direct substitution, understand why this bound arises, and appreciate its practical implications. Next, we'll see why D&C isn't the final word on this problem.