In divide-and-conquer algorithms, the DIVIDE phase is where you establish the structure of the solution. How you divide the problem determines the algorithm's efficiency, correctness, and applicability. For binary search, the division strategy seems almost trivially simple: find the middle, split in two.
But this simplicity is deceptive. Behind the straightforward midpoint calculation lies a rich set of decisions and considerations that, when understood deeply, illuminate why binary search works and how to adapt it to new problems.
This page explores the DIVIDE phase of binary search in exhaustive detail. You'll understand exactly how the search space is split, why the midpoint is chosen the way it is, how to avoid common pitfalls like integer overflow, and how alternative division strategies affect the algorithm.
Before we can divide, we must precisely define what we're dividing. In binary search, the search space is the range of indices (or values) where the target might exist. This definition is crucial because every later step, from the midpoint calculation to the comparison to the elimination, is stated in terms of it.
Formal Definition:
At any point during binary search, the search space is defined by two boundaries:
- `left`: the smallest index that might contain the target
- `right`: the largest index that might contain the target

The search space is the closed interval [left, right], containing right - left + 1 elements.
```python
def visualize_search_space(arr: list[int], left: int, right: int, target: int):
    """
    Visualize the current search space within the array.

    The search space [left, right] represents our invariant:
    - If target exists in arr, it exists in arr[left:right+1]
    - Elements outside this range have been PROVEN to not be the target
    """
    n = len(arr)

    # Build visualization
    indices = ' '.join(f'{i:3}' for i in range(n))
    values = ' '.join(f'{arr[i]:3}' for i in range(n))

    # Mark search space
    markers = []
    for i in range(n):
        if i < left or i > right:
            markers.append(' × ')  # Eliminated
        elif i == left == right:
            markers.append(' ◆ ')  # Single element remaining
        elif i == left:
            markers.append(' ◄ ')  # Left boundary
        elif i == right:
            markers.append(' ► ')  # Right boundary
        else:
            markers.append(' • ')  # In search space

    print(f"Target:  {target}")
    print(f"Indices: {indices}")
    print(f"Values:  {values}")
    print(f"Space:   {' '.join(markers)}")  # space-joined so columns align
    print(f"Search Space: [{left}, {right}] — {right - left + 1} elements")
    print()


# Example: Trace search space evolution
arr = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
target = 23

print("=== Initial State ===")
visualize_search_space(arr, 0, 9, target)

print("=== After comparing with arr[4]=16 < 23, eliminate left half ===")
visualize_search_space(arr, 5, 9, target)

print("=== After comparing with arr[7]=56 > 23, eliminate right portion ===")
visualize_search_space(arr, 5, 6, target)

print("=== After comparing with arr[5]=23 == 23, FOUND! ===")
visualize_search_space(arr, 5, 5, target)
```

Search Space Invariant:
The correctness of binary search rests on maintaining this invariant:
At all times, if `target` exists in `arr`, then `arr[left] ≤ target ≤ arr[right]`.
This invariant holds initially (when the search space is the entire array) and must be preserved by every operation. The divide phase is responsible for shrinking the search space while maintaining the invariant.
Some implementations use half-open intervals [left, right) where right is exclusive. Both conventions work, but mixing them causes bugs. This curriculum uses closed intervals [left, right] where both endpoints are inclusive. The termination condition becomes left > right (empty interval).
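To make the convention concrete, here is a minimal sketch of the standard closed-interval search, assuming nothing beyond the definitions above (a fully traced version appears later on this page):

```python
def binary_search(arr: list[int], target: int) -> int:
    """Minimal closed-interval binary search. Returns an index or -1."""
    left, right = 0, len(arr) - 1   # search space is [left, right], inclusive

    while left <= right:            # interval is empty once left > right
        mid = left + (right - left) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1          # eliminate [left, mid]
        else:
            right = mid - 1         # eliminate [mid, right]

    return -1

print(binary_search([2, 5, 8, 12, 16], 12))  # 3
print(binary_search([2, 5, 8, 12, 16], 7))   # -1
```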
The core of the divide phase is computing the midpoint. This seemingly trivial operation has surprising depth. Let's examine it carefully.
The Goal:
Given a search space [left, right], select an index mid such that:
- `left ≤ mid ≤ right` (mid is within the search space)
- The two resulting subspaces should be roughly equal in size, so each comparison eliminates about half the candidates

The Naive Approach:
mid = (left + right) / 2
This works mathematically but has a subtle problem: integer overflow.
| Language | left | right | left + right | Result |
|---|---|---|---|---|
| Java (int) | 1,500,000,000 | 2,000,000,000 | 3,500,000,000 | Overflow! Negative number |
| C (int32) | 2,147,483,000 | 500,000,000 | 2,647,483,000 | Overflow! Undefined behavior |
| Python | Any | Any | Correct | Arbitrary precision (no overflow) |
| JavaScript | Large | Large | Usually OK | 53-bit precision only |
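Python integers never overflow, but the Java row of the table can be reproduced by simulating 32-bit wraparound. The `to_int32` helper below is hypothetical, written only for this illustration:

```python
def to_int32(x: int) -> int:
    """Simulate 32-bit signed integer wraparound (illustrative helper)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

left, right = 1_500_000_000, 2_000_000_000

naive_sum = to_int32(left + right)              # what a Java int would hold
safe_mid = left + to_int32(right - left) // 2   # right - left stays in range

print(naive_sum)  # -794967296: the sum wrapped negative, as in the table
print(safe_mid)   # 1750000000: a valid midpoint
```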
The Safe Approach:
mid = left + (right - left) / 2
This formula is mathematically equivalent but overflow-safe:
- `right - left` is always non-negative (since left ≤ right)
- `(right - left) / 2` is at most `right - left`, well within bounds
- Adding this to `left` gives a value between left and right

Floor vs Ceiling:
Integer division discards the fractional part; for the non-negative indices used here, truncation and flooring agree. For the range [5, 6]:
`mid = 5 + (6 - 5) / 2 = 5 + 0 = 5` (left-biased)

Some variants need a right-biased (ceiling) midpoint:
`mid = left + (right - left + 1) / 2 = 5 + 1 = 6`

The choice affects behavior with duplicates and edge cases.
```python
def demonstrate_midpoint_calculation():
    """
    Demonstrate different midpoint calculation methods
    and their implications.
    """
    # Although Python handles arbitrary integers,
    # understanding the safe formula is important
    # for languages with fixed-size integers.

    test_cases = [
        (0, 9),    # Standard case
        (5, 5),    # Single element
        (5, 6),    # Two elements (odd behavior)
        (0, 1),    # Two elements at start
        (0, 100),  # Larger range
    ]

    print("Midpoint Calculations:")
    print("-" * 60)

    for left, right in test_cases:
        # Method 1: Naive (overflow-prone in fixed-width languages)
        mid_naive = (left + right) // 2

        # Method 2: Overflow-safe (floor/left-biased)
        mid_safe_floor = left + (right - left) // 2

        # Method 3: Overflow-safe (ceiling/right-biased)
        mid_safe_ceil = left + (right - left + 1) // 2

        # Method 4: Bit shift (floor division by 2; safe in Python,
        # but overflow-prone with fixed-width signed ints elsewhere)
        mid_bit = (left + right) >> 1

        # mid_naive and mid_bit equal mid_safe_floor here; they are
        # computed only to contrast the formulas.
        assert mid_naive == mid_bit == mid_safe_floor

        print(f"[{left:3}, {right:3}] → "
              f"floor: {mid_safe_floor}, "
              f"ceil: {mid_safe_ceil}, "
              f"elements: {right - left + 1}")

    print()
    print("Key insight: For [5, 6]:")
    print("  Floor midpoint:   5 + (6-5)//2   = 5 (left element)")
    print("  Ceiling midpoint: 5 + (6-5+1)//2 = 6 (right element)")
    print()
    print("This matters when searching for boundaries in duplicate arrays!")

demonstrate_midpoint_calculation()


# Why this matters in practice:
def find_first_vs_last_occurrence_demo():
    """
    Show how midpoint choice affects first/last occurrence search.
    """
    arr = [1, 2, 2, 2, 3]
    print("Array: [1, 2, 2, 2, 3]")
    print()
    print("Finding FIRST occurrence of 2:")
    print("  - We need floor midpoint (left-biased)")
    print("  - When arr[mid] == target, search LEFT")
    print()
    print("Finding LAST occurrence of 2:")
    print("  - We might use ceiling midpoint (right-biased)")
    print("  - When arr[mid] == target, search RIGHT")
    print("  - Or use floor midpoint with careful boundary handling")

find_first_vs_last_occurrence_demo()
```

In some low-level contexts, you might see `mid = (left + right) >>> 1`, Java's unsigned right shift. The unsigned shift reinterprets the wrapped sum as a non-negative value, so the result is correct even when `left + right` overflows; a plain signed shift does not have this property. The `left + (right - left) / 2` formula is clearer and equally efficient.
We always split the search space at the midpoint—dividing it roughly in half. But why is halving optimal? Could we divide into thirds (ternary search) or some other ratio?
Information-Theoretic Perspective:
Each comparison in binary search yields one bit of information: either the target is in the left half or the right half (or we found it). To distinguish between n elements, we need at least log₂(n) bits of information. Binary search achieves this lower bound—it's information-theoretically optimal for comparison-based searching.
Worst-Case Analysis:
The worst case occurs when the target is not in the array or is at an extreme position. In n elements, after k comparisons at most n/2^k candidates remain, so the search space is exhausted once 2^k ≥ n:
Binary search guarantees finding the answer (or confirming absence) in at most ⌈log₂(n)⌉ comparisons.
| Array Size (n) | Max Comparisons (log₂ n) | Linear Search Worst Case |
|---|---|---|
| 10 | 4 | 10 |
| 100 | 7 | 100 |
| 1,000 | 10 | 1,000 |
| 1,000,000 | 20 | 1,000,000 |
| 1,000,000,000 | 30 | 1,000,000,000 |
| 10^18 | 60 | 10^18 |
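The ⌈log₂(n)⌉ bound is easy to verify empirically. This sketch counts loop iterations for a target larger than every element, a worst case under the floor midpoint because the search repeatedly keeps the larger half:

```python
import math

def worst_case_comparisons(n: int) -> int:
    """Count iterations when the target is beyond the last element."""
    left, right = 0, n - 1
    count = 0
    while left <= right:
        count += 1
        mid = left + (right - left) // 2
        left = mid + 1  # target is larger than everything: always go right
    return count

for n in [10, 100, 1000, 1_000_000]:
    print(f"n = {n:>9}: {worst_case_comparisons(n):>2} comparisons "
          f"(ceil(log2 n) = {math.ceil(math.log2(n))})")
```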
What About Ternary Search?
Suppose we divide into three parts instead of two. We'd need two comparisons to determine which third contains the target. After k 'rounds' of ternary search, n/3^k candidates remain, so log₃(n) rounds suffice; at two comparisons per round, the total is 2·log₃(n) = (2/log₂3)·log₂(n) ≈ 1.26·log₂(n) comparisons.
Ternary search uses about 26% MORE comparisons than binary search! The extra information from dividing into three parts doesn't compensate for the extra comparison required.
General Result:
For k-ary search (dividing into k parts), we need (k-1) comparisons to identify which part contains the target, for a total of (k-1)·log_k(n) = ((k-1)/log₂k)·log₂(n) comparisons.
This is minimized when k = 2 (binary search). Higher k values increase comparison count.
Ternary search is useful for finding extrema (min/max) of unimodal functions, NOT for searching sorted arrays. For sorted data, binary is always optimal. The confusion arises because both 'ternary' applications exist, but they solve different problems.
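To keep the two 'ternary' ideas separate, here is a sketch of that other ternary search: finding the maximum of a unimodal function on a real interval, where probing two interior points genuinely earns its cost:

```python
def ternary_search_max(f, lo: float, hi: float, iters: int = 60) -> float:
    """Locate the maximum of a unimodal function f on [lo, hi].
    This does NOT apply to sorted-array lookup."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1   # the maximum cannot lie left of m1
        else:
            hi = m2   # the maximum cannot lie right of m2
    return (lo + hi) / 2

# f(x) = -(x - 2)^2 is unimodal with its peak at x = 2
print(ternary_search_max(lambda x: -(x - 2) ** 2, -10.0, 10.0))  # ≈ 2.0
```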
```python
import math

def compare_search_strategies():
    """
    Compare the number of comparisons needed by different
    search strategies for various array sizes.
    """
    print("Comparisons needed for searching (worst case):")
    print("-" * 70)
    print(f"{'Array Size':<20} {'Binary (k=2)':<15} "
          f"{'Ternary (k=3)':<15} {'10-ary (k=10)':<15}")
    print("-" * 70)

    for n in [10, 100, 1000, 1_000_000, 1_000_000_000]:
        # Binary: 1 comparison per division, log_2(n) divisions
        binary = math.ceil(math.log2(n))

        # Ternary: 2 comparisons per division, log_3(n) divisions
        ternary = math.ceil(2 * math.log(n, 3))

        # 10-ary: 9 comparisons per division, log_10(n) divisions
        ten_ary = math.ceil(9 * math.log10(n))

        print(f"{n:<20} {binary:<15} {ternary:<15} {ten_ary:<15}")

    print()
    print("Conclusion: Binary search is optimal for sorted array lookup!")
    print()

    # Mathematical argument
    print("Mathematical proof that k=2 is optimal:")
    print("-" * 50)
    print("For k-ary search, total comparisons = (k-1) × log_k(n)")
    print("                                    = (k-1) × log_2(n) / log_2(k)")
    print()
    print("Define f(k) = (k-1) / log_2(k)")
    print()
    for k in range(2, 11):
        ratio = (k - 1) / math.log2(k)
        print(f"k = {k}: ratio = {ratio:.4f} (×{ratio:.2f} of binary search)")
    print()
    print("Minimum at k=2 with ratio = 1.0")

compare_search_strategies()
```

When we split the search space [left, right] at midpoint mid, we create two potential subspaces:
- `[left, mid - 1]`, containing elements less than `arr[mid]`
- `[mid + 1, right]`, containing elements greater than `arr[mid]`

Note that mid itself is excluded from both subspaces—we've already examined it!
Subspace Properties:

- The two subspaces are disjoint: no index belongs to both.
- Together with mid itself, they cover the entire original search space.
- They are ordered: everything in the left subspace is less than arr[mid], and everything in the right subspace is greater.
The Comparison Determines Selection:
After computing mid, we compare arr[mid] with target:
| Comparison | Result | Action |
|---|---|---|
| `arr[mid] == target` | Found | Return mid |
| `arr[mid] < target` | Target > mid element | Search right subspace [mid+1, right] |
| `arr[mid] > target` | Target < mid element | Search left subspace [left, mid-1] |
This is where the sorted property is crucial. Because the array is sorted:
- If `target > arr[mid]`, target cannot be in `arr[left..mid]`
- If `target < arr[mid]`, target cannot be in `arr[mid..right]`

We can safely eliminate half the array with a single comparison!
```python
def trace_subspace_creation(arr: list[int], target: int) -> int:
    """
    Binary search with detailed tracing of subspace creation.

    Shows how each division creates two potential subspaces
    and how the comparison determines which one to pursue.
    """
    left, right = 0, len(arr) - 1
    iteration = 1

    print(f"Array: {arr}")
    print(f"Target: {target}")
    print("=" * 60)

    while left <= right:
        mid = left + (right - left) // 2

        # Current state
        print(f"\nIteration {iteration}:")
        print(f"  Search space: [{left}, {right}] = {arr[left:right+1]}")
        print(f"  Midpoint: mid = {mid}, arr[mid] = {arr[mid]}")

        # Show the two potential subspaces
        left_subspace = arr[left:mid] if mid > left else []
        right_subspace = arr[mid+1:right+1] if mid < right else []
        print(f"  Left subspace:  [{left}, {mid-1}] = {left_subspace}")
        print(f"  Right subspace: [{mid+1}, {right}] = {right_subspace}")

        if arr[mid] == target:
            print(f"  Compare: arr[{mid}] = {arr[mid]} == {target}")
            print(f"  ✓ FOUND at index {mid}")
            return mid
        elif arr[mid] < target:
            print(f"  Compare: arr[{mid}] = {arr[mid]} < {target}")
            print("  → Target must be in RIGHT subspace")
            print("  → Eliminating left subspace and mid")
            left = mid + 1
        else:
            print(f"  Compare: arr[{mid}] = {arr[mid]} > {target}")
            print("  → Target must be in LEFT subspace")
            print("  → Eliminating right subspace and mid")
            right = mid - 1

        iteration += 1

    print(f"\nSearch space exhausted: [{left}, {right}]")
    print(f"✗ Target {target} NOT FOUND")
    return -1


# Example traces
arr = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]

print("\n" + "=" * 60)
print("EXAMPLE 1: Target exists")
print("=" * 60)
trace_subspace_creation(arr, 23)

print("\n" + "=" * 60)
print("EXAMPLE 2: Target does not exist")
print("=" * 60)
trace_subspace_creation(arr, 40)
```

The key insight isn't just that we 'pick a subspace to search.' It's that we ELIMINATE the other subspace completely. We prove, using the sorted property, that the target cannot be there. This is what makes binary search so efficient—we don't just ignore elements, we provably exclude them.
The divide phase seems straightforward, but edge cases lurk. Understanding and correctly handling these is what separates robust implementations from buggy ones.
Edge Case 1: Single Element
When left == right, the search space contains exactly one element. The division produces:
- `mid = left = right`
- Left subspace: `[left, mid-1] = [left, left-1]` — empty!
- Right subspace: `[mid+1, right] = [right+1, right]` — empty!

Both subspaces are empty, which is correct—if the single element isn't the target, the search should terminate.
Edge Case 2: Two Elements
When left + 1 == right, we have exactly two elements:
```python
def analyze_two_element_case():
    """
    Detailed analysis of the two-element edge case.
    This is where many off-by-one bugs originate!
    """
    left, right = 5, 6  # Two elements

    # Floor (left-biased) midpoint
    mid_floor = left + (right - left) // 2
    # = 5 + (6 - 5) // 2 = 5 + 0 = 5

    # Ceiling (right-biased) midpoint
    mid_ceil = left + (right - left + 1) // 2
    # = 5 + (6 - 5 + 1) // 2 = 5 + 1 = 6

    print("Two-element case: [left=5, right=6]")
    print()
    print("Using FLOOR midpoint (standard):")
    print(f"  mid = {mid_floor}")
    print(f"  If target < arr[{mid_floor}]: new right = {mid_floor - 1} = 4")
    print("    → left=5 > right=4 → TERMINATE (not found)")
    print(f"  If target > arr[{mid_floor}]: new left = {mid_floor + 1} = 6")
    print("    → left=6, right=6 → ONE ELEMENT LEFT")
    print()
    print("Using CEILING midpoint (alternative):")
    print(f"  mid = {mid_ceil}")
    print(f"  If target < arr[{mid_ceil}]: new right = {mid_ceil - 1} = 5")
    print("    → left=5, right=5 → ONE ELEMENT LEFT")
    print(f"  If target > arr[{mid_ceil}]: new left = {mid_ceil + 1} = 7")
    print("    → left=7 > right=6 → TERMINATE (not found)")
    print()
    print("Both work correctly, but choice affects loop behavior!")

analyze_two_element_case()


def demonstrate_empty_array():
    """Edge case: empty array."""
    arr = []
    left, right = 0, len(arr) - 1  # left=0, right=-1

    print()
    print("Empty array case:")
    print(f"  left = {left}, right = {right}")
    print(f"  Condition 'left <= right' is {left <= right}")
    print("  Loop never executes → return -1 (not found)")
    print("  ✓ Correct behavior!")

demonstrate_empty_array()


def demonstrate_single_element():
    """Edge case: single element array."""
    arr = [42]

    print()
    print("Single element array: [42]")
    print()
    for target in [42, 7]:  # one hit, one miss
        left, right = 0, 0
        mid = left + (right - left) // 2  # = 0

        print(f"Searching for {target}:")
        print("  left=0, right=0, mid=0")
        print(f"  arr[0]={arr[0]}, target={target}")

        if arr[mid] == target:
            print("  Found! Return 0")
        elif arr[mid] < target:
            print(f"  {arr[mid]} < {target}, so left = mid + 1 = 1")
            print("  Now left=1 > right=0 → TERMINATE, not found")
        else:
            print(f"  {arr[mid]} > {target}, so right = mid - 1 = -1")
            print("  Now left=0 > right=-1 → TERMINATE, not found")
        print()

demonstrate_single_element()
```

A correct divide phase must guarantee progress toward termination. Specifically, the new search space must be STRICTLY SMALLER than the previous one. If you update left = mid or right = mid (without the +1 or -1), you risk infinite loops when the search space has two elements!
Edge Case 3: Off-by-One Errors
The most common bugs in binary search involve off-by-one errors:
| Mistake | Consequence |
|---|---|
| `left = mid` instead of `left = mid + 1` | Infinite loop possible |
| `right = mid` instead of `right = mid - 1` | Infinite loop possible |
| `left < right` instead of `left <= right` | Misses the single-element search space |
| `right = n` instead of `right = n - 1` | Index out of bounds |
Each of these represents a failure in the divide phase—either failing to shrink the search space or shrinking it incorrectly.
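The first row of the table is easy to reproduce. The `buggy_binary_search` below is a hypothetical function written for illustration; it caps the iteration count so the bug surfaces as a sentinel value instead of a hung program:

```python
def buggy_binary_search(arr: list[int], target: int, max_iters: int = 20):
    """Binary search with the classic 'left = mid' mistake."""
    left, right = 0, len(arr) - 1
    for _ in range(max_iters):  # safety cap: a correct version needs no cap
        if left > right:
            return -1
        mid = left + (right - left) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid   # BUG: should be mid + 1; the space stops shrinking
        else:
            right = mid - 1
    return None  # None signals the loop never terminated on its own

print(buggy_binary_search([1, 3], 1))  # 0: found before the bug triggers
print(buggy_binary_search([1, 3], 3))  # None: mid stays 0, left stays 0 forever
```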
The Invariant Protection:
To avoid these bugs, always think in terms of the invariant: the target, if present, lies in [left, right]. Every update must preserve this invariant while strictly shrinking the interval; since arr[mid] has already been examined, mid can always be excluded from the new search space.
Different binary search variants modify the division strategy. Understanding these modifications as adjustments to the DIVIDE phase provides a unified view.
Variant 1: Find Insertion Position (Lower Bound)
Instead of finding exact match, find the leftmost position where we could insert target while maintaining sorted order.
```python
def binary_search_lower_bound(arr: list[int], target: int) -> int:
    """
    Find the leftmost position where target could be inserted.
    Returns the first index where arr[index] >= target.

    Division modification:
    - We don't return early on finding target
    - We always narrow to find the LEFTMOST valid position
    """
    left, right = 0, len(arr)  # Note: right = len(arr), not len(arr) - 1

    while left < right:  # Note: not 'left <= right'
        mid = left + (right - left) // 2

        if arr[mid] < target:
            # arr[mid] is too small, search right
            left = mid + 1
        else:
            # arr[mid] >= target, could be answer, keep searching left
            right = mid  # Note: not mid - 1 (mid could be the answer!)

    return left  # First position where arr[left] >= target


def binary_search_upper_bound(arr: list[int], target: int) -> int:
    """
    Find the rightmost position after all occurrences of target.
    Returns the first index where arr[index] > target.

    Division modification:
    - We continue narrowing even when arr[mid] == target
    - We search RIGHT of equal elements
    """
    left, right = 0, len(arr)

    while left < right:
        mid = left + (right - left) // 2

        if arr[mid] <= target:  # Note: <= not <
            # arr[mid] is not past target yet, search right
            left = mid + 1
        else:
            # arr[mid] > target, could be answer, keep searching left
            right = mid

    return left  # First position where arr[left] > target


# Demonstration
arr = [1, 3, 3, 3, 5, 7]

print(f"Array: {arr}")
print()

for target in [0, 1, 3, 4, 7, 10]:
    lower = binary_search_lower_bound(arr, target)
    upper = binary_search_upper_bound(arr, target)
    count = upper - lower

    print(f"Target {target}:")
    print(f"  Lower bound: {lower} (first position >= {target})")
    print(f"  Upper bound: {upper} (first position > {target})")
    print(f"  Count of {target}s: {count}")
    print()
```

Key Differences in Division:
| Standard Search | Lower/Upper Bound |
|---|---|
| `right = n - 1` (closed interval) | `right = n` (half-open interval) |
| `left <= right` (include single element) | `left < right` (different termination) |
| `right = mid - 1` (exclude mid on left search) | `right = mid` (mid could be answer) |
| Returns early when found | Always narrows completely |
Why Half-Open Intervals:
For boundary-finding variants, half-open intervals [left, right) are often cleaner:
- The interval size is simply `right - left`
- The interval is empty exactly when `left == right`
- `right` can start at `n`, naturally representing insertion past the last element

Both closed intervals [left, right] and half-open intervals [left, right) work correctly. The key is consistency: choose one convention and stick to it. Closed intervals match the intuitive 'all elements from left to right' semantics. Half-open intervals simplify boundary calculations. Most libraries (C++ STL, Python bisect) use half-open intervals.
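Python's standard library implements exactly these operations using half-open intervals; the following checks that `bisect` agrees with the `binary_search_lower_bound` and `binary_search_upper_bound` functions defined above:

```python
import bisect

arr = [1, 3, 3, 3, 5, 7]

print(bisect.bisect_left(arr, 3))   # 1, matching binary_search_lower_bound
print(bisect.bisect_right(arr, 3))  # 4, matching binary_search_upper_bound
print(bisect.bisect_right(arr, 3) - bisect.bisect_left(arr, 3))  # 3 occurrences
```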
A visual understanding of how division works helps build intuition. Consider binary search as a decision tree where each node represents a comparison, and each edge represents a direction (left or right).
Tree Properties:
Height is log₂(n): The tree has at most ⌈log₂(n)⌉ levels, matching our time complexity analysis.
Each level halves the space: At each level, the search space size roughly halves.
Leaves are termination points: Either we find the target or determine it doesn't exist.
Path length = comparisons: The number of comparisons equals the path length from root to termination.
Space Reduction Over Levels:
For an array of 16 elements:

- Level 0: 16 candidates
- Level 1: 8 candidates
- Level 2: 4 candidates
- Level 3: 2 candidates
- Level 4: 1 candidate
After just 4 comparisons, we can search through 16 elements. After 20 comparisons, we could search through over a million elements!
Viewing binary search as a tree traversal shows that the DIVIDE phase corresponds to choosing an edge at each node. The midpoint calculation determines the node's pivot value, and the comparison determines which edge to follow. The tree structure is implicit—we don't build it, we just navigate it.
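The implicit tree can be made visible for a small array. The `print_decision_tree` helper below is hypothetical, written for this illustration; each line shows a node's pivot and the search space it governs:

```python
def print_decision_tree(arr: list[int], left: int = 0,
                        right: int | None = None, depth: int = 0) -> None:
    """Print the implicit binary search decision tree, indented by depth."""
    if right is None:
        right = len(arr) - 1
    if left > right:
        return
    mid = left + (right - left) // 2
    print("    " * depth + f"arr[{mid}] = {arr[mid]:2}  (space [{left}, {right}])")
    print_decision_tree(arr, left, mid - 1, depth + 1)   # left edge
    print_decision_tree(arr, mid + 1, right, depth + 1)  # right edge

print_decision_tree([2, 5, 8, 12, 16, 23, 38])
```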
We've explored the DIVIDE phase of binary search in comprehensive detail. Let's consolidate the key insights:
- The search space [left, right] — This interval contains all candidates. The invariant is that the target, if present, lies within.
- The midpoint — Computed as left + (right - left) / 2 to avoid integer overflow. Choose floor or ceiling based on variant needs.
- The subspaces — The left subspace [left, mid-1] and right subspace [mid+1, right] are disjoint and ordered.
With the DIVIDE phase understood, we'll examine the CONQUER phase in the next page. We'll see how binary search recursively (or iteratively) processes the selected subspace, why only one subproblem needs solving, and how this leads to the decrease-and-conquer classification.
You now deeply understand the DIVIDE phase of binary search—how search spaces are defined, how midpoints are calculated safely, why binary division is optimal, and how variants modify the division strategy. This foundation prepares you for understanding the CONQUER phase next.