Binary search is often called a "divide and conquer" algorithm, but this label obscures its true nature. Unlike merge sort or quick sort, which genuinely divide problems into independent subproblems, binary search does something more elegant: it eliminates.
Each step of binary search doesn't create new subproblems to solve—it discards half of all remaining possibilities. It's not "divide and conquer"; it's divide and eliminate. This distinction matters because it reveals the algorithm's fundamental insight: we don't need to examine most of the data to find what we're looking for.
Consider searching through a telephone book with 1,000,000 names. Linear search might require examining all million entries. Binary search guarantees you'll find any name in at most 20 comparisons. This isn't magic—it's the power of strategic elimination.
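As a quick sanity check of that "20 comparisons" figure (an added snippet, not part of the original lesson), the worst-case count is just the base-2 logarithm rounded up:

```python
import math

# Worst-case binary search comparisons for 1,000,000 sorted entries
print(math.ceil(math.log2(1_000_000)))  # -> 20
```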
By the end of this page, you will understand:
• The divide-and-eliminate strategy as the conceptual heart of binary search
• How selecting a midpoint enables maximum information gain per comparison
• The geometric intuition behind logarithmic efficiency
• Why this strategy is optimal for sorted data search
• The connection to information theory and decision trees
Let's trace the reasoning that leads to binary search.
The Naive Approach: Linear Search
Given a sorted array and a target value, linear search examines elements one by one:
```
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]   Target: 13
 ↑
 Check 1: not 13
    ↑
    Check 2: not 13
       ↑
       Check 3: not 13
... (continue)
```
Each comparison eliminates exactly one element. For n elements, we might need n comparisons. But we're ignoring crucial information: the data is sorted.
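For contrast, here is a minimal linear search sketch (the helper `linear_search` is added for illustration), showing that each comparison rules out exactly one element:

```python
def linear_search(arr: list[int], target: int) -> int:
    """Scan left to right; each comparison eliminates exactly one candidate."""
    for i, value in enumerate(arr):
        if value == target:
            return i          # found after i + 1 comparisons
    return -1                 # not found after len(arr) comparisons

arr = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(linear_search(arr, 13))  # -> 6, reached only on the 7th comparison
```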
The Key Question:
If we can only make one comparison before deciding which direction to search, which element should we examine to eliminate the most possibilities?
The Answer: The Middle Element
In sorted data, examining the middle element is optimal because exactly one of three cases occurs:

1. The middle element equals the target: we are done.
2. The middle element is less than the target: the target (if present) lies in the right half, so the entire left half can be discarded.
3. The middle element is greater than the target: the target (if present) lies in the left half, so the entire right half can be discarded.

Crucially, in cases 2 and 3, we eliminate half of all remaining elements with a single comparison. No other element choice guarantees as much elimination.
If we pick an element near the beginning, we might eliminate very few elements (if the target is larger). If we pick near the end, we might eliminate very few (if the target is smaller). The middle guarantees we eliminate at least half, regardless of which half contains the target.
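A small sketch of that guarantee (the helper `worst_case_remaining` is added for illustration): probing index i leaves, in the worst case, everything strictly to its left or strictly to its right.

```python
def worst_case_remaining(n: int, probe: int) -> int:
    """Worst-case candidates left after probing index `probe` in a sorted array of n elements."""
    return max(probe, n - 1 - probe)   # all strictly left, or all strictly right

n = 16
for probe in (0, 3, 7, 12, 15):
    print(f"probe index {probe:2d} -> at most {worst_case_remaining(n, probe):2d} candidates remain")
# The middle (index 7) minimizes the worst case: at most 8 of 16 candidates survive.
```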
Let's trace binary search step by step to see elimination in action.
Example: Searching for 14 in a sorted array
```
Initial array (16 elements):

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31]
                        ↑
                      mid=15

Step 1: Compare 15 with 14
        15 > 14, so target is in LEFT half
        Eliminate 8 elements (indices 8-15)

[1, 3, 5, 7, 9, 11, 13, 15] | eliminated: [17, 19, 21, 23, 25, 27, 29, 31]
          ↑
        mid=7

Step 2: Compare 7 with 14
        7 < 14, so target is in RIGHT half
        Eliminate 4 elements (indices 0-3)

eliminated: [1, 3, 5, 7] | [9, 11, 13, 15]
                               ↑
                             mid=11

Step 3: Compare 11 with 14
        11 < 14, so target is in RIGHT half
        Eliminate 2 elements

[13, 15]
 ↑
 mid=13

Step 4: Compare 13 with 14
        13 < 14, so target is in RIGHT half

[15]
 ↑
 mid=15

Step 5: Compare 15 with 14
        15 > 14, left half is empty
        Target NOT FOUND!
```
Observations:

• Each comparison cut the remaining candidates at least in half.
• Once a half was eliminated, it was never examined again.
• Even this unsuccessful search ended quickly: after 5 comparisons we know 14 is not present.

The beauty is in the efficiency: we examined only 5 of 16 elements. For 1 million elements, we'd examine only ~20.
| Step | Elements Remaining | Eliminated This Step | Cumulative Comparisons |
|---|---|---|---|
| Start | n | 0 | 0 |
| 1 | n/2 | n/2 | 1 |
| 2 | n/4 | n/4 | 2 |
| 3 | n/8 | n/8 | 3 |
| k | n/2^k | n/2^k | k |
| log₂(n) | 1 | 1 | log₂(n) |
Each step halves the search space. Starting with n elements, after k steps we have n/2^k elements. We're done when n/2^k ≤ 1, which happens when k ≥ log₂(n). This is why binary search takes O(log n) time.
Let's formalize the divide-and-eliminate strategy mathematically.
The Recurrence Relation:
Let T(n) represent the maximum number of comparisons needed for binary search on n elements. Each step makes one comparison and then recurses on at most half of the elements.

This gives us:

T(n) = T(n/2) + 1
T(1) = 1   (base case: one element requires one comparison)
Solving the Recurrence:
Unrolling the recurrence:

T(n) = T(n/2) + 1
     = T(n/4) + 2
     = T(n/8) + 3
     = ...
     = T(n/2^k) + k

We stop when n/2^k = 1, i.e., k = log₂(n):

T(n) = T(1) + log₂(n) = 1 + log₂(n) = O(log n)
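A quick check of this closed form (an added sketch; it assumes n is a power of two so the halving is exact):

```python
import math

def T(n: int) -> int:
    """Worst-case comparisons from the recurrence T(n) = T(n/2) + 1, T(1) = 1."""
    return 1 if n == 1 else T(n // 2) + 1

for n in (1, 2, 16, 1024, 2**20):
    print(n, T(n), 1 + int(math.log2(n)))  # recurrence vs. closed form 1 + log₂(n)
```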
Logarithmic Growth:
Logarithmic functions grow incredibly slowly. To appreciate this:
| n (elements) | log₂(n) (comparisons) | Ratio improvement over linear |
|---|---|---|
| 10 | ~3.3 | 3x faster |
| 100 | ~6.6 | 15x faster |
| 1,000 | ~10 | 100x faster |
| 1,000,000 | ~20 | 50,000x faster |
| 1,000,000,000 | ~30 | 33,000,000x faster |
For a billion elements, binary search needs about 30 comparisons. Linear search might need a billion. The gap is astronomical.
The number of binary search steps equals the number of times you can halve n before reaching 1. This is precisely what logarithms measure: log₂(n) answers 'how many times must I multiply 2 by itself to get n?' or equivalently 'how many times can I divide n by 2?'
There's a beautiful connection between binary search and information theory that reveals why log n is optimal.
The Search as Information Game:
Before searching, the target could be at any of n positions—we have log₂(n) bits of uncertainty. Each yes/no comparison provides at most 1 bit of information. Therefore, we need at least log₂(n) comparisons to uniquely identify the target's position.
Binary Search is Optimal:
Binary search achieves this theoretical minimum. Each comparison ("Is target ≤ mid?") perfectly splits our uncertainty in half—exactly 1 bit of information per comparison. No comparison-based algorithm can do better on worst-case inputs.
```python
import math

def uncertainty_bits(n: int) -> float:
    """Calculate bits of uncertainty for n equally likely positions."""
    return math.log2(n)

def remaining_uncertainty(n: int, comparisons: int) -> float:
    """After k comparisons, remaining uncertainty bits."""
    remaining_elements = n / (2 ** comparisons)
    if remaining_elements <= 1:
        return 0.0
    return math.log2(remaining_elements)

# Trace information gain through binary search
n = 1024  # 2^10 = 1024 elements

print(f"Searching in {n} elements")
print(f"Initial uncertainty: {uncertainty_bits(n):.1f} bits")
print()

for k in range(11):
    remaining = remaining_uncertainty(n, k)
    elements_left = max(1, n // (2 ** k))
    info_gained = uncertainty_bits(n) - remaining
    print(f"After {k:2d} comparisons: "
          f"{elements_left:4d} candidates, "
          f"{remaining:.1f} bits uncertainty, "
          f"{info_gained:.1f} bits gained")
    if remaining == 0:
        print(f"✓ Position uniquely identified in {k} comparisons")
        print(f"  (log₂({n}) = {math.log2(n):.1f})")
        break
```

Decision Tree Interpretation:
Binary search can be visualized as traversing a binary decision tree:
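As an illustration (an added sketch, not the original figure), here is the decision tree for a sorted array of 7 elements; each node names the index compared at that step, and each root-to-leaf path is one possible search:

```
                 arr[3]
               /        \
          arr[1]        arr[5]
         /     \        /     \
     arr[0]  arr[2]  arr[4]  arr[6]
```

Seven positions are distinguished with at most 3 comparisons on any path, consistent with ⌈log₂(7)⌉ = 3.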
The depth of this tree is log₂(n), meaning any search follows a path of at most log₂(n) decisions. This is optimal: a binary tree of depth d has at most 2^d leaves, so to distinguish n positions, we need depth at least log₂(n).
Any comparison-based search algorithm on n elements requires Ω(log n) comparisons in the worst case. Binary search achieves O(log n). Therefore, binary search is asymptotically optimal for searching sorted arrays using comparisons.
Let's crystallize the divide-and-eliminate strategy into precise steps.
Core Strategy:

1. Maintain two pointers, left and right, that bound the current search space.
2. Compute the midpoint and compare it with the target.
3. If the midpoint equals the target, stop; otherwise eliminate the half that cannot contain the target.
4. Repeat until the target is found or the search space is empty.
The Loop Invariant:
At every iteration, we maintain the invariant:
If the target exists in the array, it must be within the current search bounds [left, right]
This invariant is:

• Established before the first iteration: the bounds cover the entire array.
• Preserved by every iteration: we only discard halves that provably cannot contain the target.
• Decisive at termination: if the bounds become empty, the target is not in the array.
```python
def binary_search_explained(arr: list[int], target: int) -> int:
    """
    Binary search with detailed tracing of the divide-and-eliminate strategy.
    """
    left = 0
    right = len(arr) - 1
    step = 0

    print(f"Searching for {target} in {arr}")
    print(f"{'='*60}")

    while left <= right:
        step += 1
        mid = (left + right) // 2
        current_space = arr[left:right+1]

        print(f"\nStep {step}:")
        print(f"  Search space: indices [{left}, {right}] → {current_space}")
        print(f"  Midpoint: index {mid}, value {arr[mid]}")

        if arr[mid] == target:
            print(f"  Comparison: {arr[mid]} == {target} ✓ FOUND!")
            print(f"\nResult: Found at index {mid} in {step} comparisons")
            return mid
        elif arr[mid] < target:
            print(f"  Comparison: {arr[mid]} < {target}")
            print(f"  Decision: Target must be in RIGHT half")
            eliminated = arr[left:mid+1]
            print(f"  Eliminated: indices [{left}, {mid}] → {eliminated}")
            left = mid + 1
        else:
            print(f"  Comparison: {arr[mid]} > {target}")
            print(f"  Decision: Target must be in LEFT half")
            eliminated = arr[mid:right+1]
            print(f"  Eliminated: indices [{mid}, {right}] → {eliminated}")
            right = mid - 1

    print(f"\nStep {step + 1}:")
    print(f"  Search space: empty (left={left} > right={right})")
    print(f"\nResult: Not found after {step} comparisons")
    return -1

# Demo
arr = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]

print("\n" + "="*60)
print("EXAMPLE 1: Searching for an existing element")
print("="*60)
binary_search_explained(arr, 23)

print("\n" + "="*60)
print("EXAMPLE 2: Searching for a non-existing element")
print("="*60)
binary_search_explained(arr, 25)
```

Binary search always terminates because the search space shrinks by at least 1 element each iteration. When left > right, the space is empty and we return 'not found'. This guarantee holds even for empty arrays (the loop never executes, returning -1 immediately).
The midpoint calculation seems trivial: mid = (left + right) / 2. But this simple formula hides a critical bug that has plagued binary search implementations for decades.
The Overflow Problem:
Consider searching in an array with 2 billion elements (within 32-bit integer range). If left = 1,500,000,000 and right = 2,000,000,000:
left + right = 3,500,000,000
This exceeds the maximum 32-bit signed integer (~2.1 billion), causing integer overflow. The result wraps around to a negative number, producing a negative index—a catastrophic bug.
```python
# Python handles arbitrary-precision integers, so overflow isn't a problem.
# However, understanding the issue is crucial for other languages.

def demonstrate_overflow_concept():
    """
    Demonstrates the midpoint overflow problem conceptually.
    In languages with fixed-width integers, this would overflow.
    """
    # Simulating 32-bit signed integer limits
    INT32_MAX = 2**31 - 1  # 2,147,483,647

    left = 1_500_000_000
    right = 2_000_000_000

    print(f"left      = {left:,}")
    print(f"right     = {right:,}")
    print(f"INT32_MAX = {INT32_MAX:,}")
    print()

    # Problematic approach
    naive_sum = left + right
    print(f"left + right = {naive_sum:,}")
    print(f"Exceeds INT32_MAX? {naive_sum > INT32_MAX}")

    if naive_sum > INT32_MAX:
        # Simulate 32-bit overflow (wrap-around)
        overflowed = naive_sum - 2**32
        print(f"After 32-bit overflow: {overflowed:,}")
        print(f"Divide by 2: {overflowed // 2:,} (NEGATIVE INDEX!)")
    print()

    # Safe approach: avoid overflow
    safe_mid = left + (right - left) // 2
    print(f"Safe calculation: left + (right - left) // 2")
    print(f"= {left} + ({right} - {left}) // 2")
    print(f"= {left} + {right - left} // 2")
    print(f"= {left} + {(right - left) // 2}")
    print(f"= {safe_mid:,}")

demonstrate_overflow_concept()

# THE CORRECT FORMULAS:

def midpoint_safe(left: int, right: int) -> int:
    """
    Overflow-safe midpoint calculation.
    Works for any language with fixed-width integers.
    """
    return left + (right - left) // 2

def midpoint_bitwise(left: int, right: int) -> int:
    """
    Alternative using bitwise operations.
    Equivalent to (left + right) // 2 but avoids overflow when using
    an unsigned right shift in languages that support it.
    """
    return (left + right) >> 1  # Safe in Python, but use >>> in Java/C#

# Both methods work correctly
left, right = 1_500_000_000, 2_000_000_000
print(f"\nSafe method:    {midpoint_safe(left, right)}")
print(f"Bitwise method: {midpoint_bitwise(left, right)}")
```

This overflow bug existed in Java's Arrays.binarySearch() for nearly a decade before being discovered in 2006. The lesson: even 'trivial' code can hide subtle bugs. Always use the safe formula: mid = left + (right - left) / 2
Why the Safe Formula Works:
mid = left + (right - left) / 2
• right - left never overflows: since left ≥ 0, it is at most right.
• (right - left) / 2 is at most half of right.
• left + (right - left) / 2 is at most left + (right - left) = right, so the final value never exceeds right either.

Every intermediate value stays at or below right, which is already a valid quantity. The safe formula rearranges the arithmetic to keep intermediate values small.
Binary search divides by 2, but what about other division factors?
Ternary Search (Divide by 3):
Instead of one midpoint, use two: at the 1/3 and 2/3 positions. Each iteration:

• Compares the target against the first midpoint, and possibly the second (up to 2 comparisons).
• Keeps only the one third of the array that could still contain the target.

After k iterations: n/3^k elements remain
Steps to reach 1 element: log₃(n)
Comparisons: 2 × log₃(n) ≈ 2 × (log₂(n) / log₂(3)) ≈ 1.26 × log₂(n)
That's 26% more comparisons than binary search! Ternary search is worse, not better.
| Strategy | Comparisons per Step | Reduction per Step | Total Comparisons (n=1M) |
|---|---|---|---|
| Binary (k=2) | 1 | 50% | ~20 |
| Ternary (k=3) | 2 | 67% | ~25 |
| Quaternary (k=4) | 3 | 75% | ~30 |
| Linear (k=n) | 1 | 0.0001% | ~1,000,000 |
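The "Total Comparisons" column can be reproduced in a few lines (an added sketch assuming (k - 1) comparisons per step and log_k(n) steps for n = 1,000,000):

```python
import math

n = 1_000_000
for k, name in ((2, "Binary"), (3, "Ternary"), (4, "Quaternary")):
    steps = math.log(n, k)       # iterations needed to shrink n candidates to 1
    total = (k - 1) * steps      # k - 1 comparisons per iteration
    print(f"{name:10s} ~{total:.0f} comparisons")
print(f"{'Linear':10s} ~{n:,} comparisons")
```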
Why Binary is Optimal:
The key insight is information per comparison:

• A single yes/no comparison distinguishes at most 2 outcomes, so it carries at most 1 bit of information.
• Binary search extracts that full bit every time: each comparison halves the candidates.
• k-ary search (k > 2) spends k - 1 comparisons per step but only shrinks the space by a factor of k, so each comparison eliminates less than half on average.

Binary search perfectly matches the information content of yes/no comparisons.
Ternary search is useful for finding extrema of unimodal functions (one peak or valley), where comparisons aren't simple yes/no but rather 'increasing' vs 'decreasing'. In this context, its characteristics differ from searching sorted arrays.
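To make that distinction concrete, here is an added sketch of ternary search over a unimodal function (the example function and bounds are purely illustrative); note the comparison is between f(m1) and f(m2), not a yes/no test against a target value:

```python
def ternary_search_max(f, lo: float, hi: float, iters: int = 100) -> float:
    """Approximate the argmax of a unimodal function f on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1          # the peak cannot lie left of m1
        else:
            hi = m2          # the peak cannot lie right of m2
    return (lo + hi) / 2

# Example: f(x) = -(x - 2)^2 peaks at x = 2
print(round(ternary_search_max(lambda x: -(x - 2) ** 2, -10.0, 10.0), 6))  # ≈ 2.0
```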
The divide-and-eliminate strategy transforms searching from a linear scan into a logarithmic descent. By choosing the midpoint, we maximize information gain and guarantee that half the remaining possibilities are eliminated with each comparison.
Let's consolidate the key insights:
• Binary search doesn't divide and conquer; it divides and eliminates, discarding half of all remaining candidates with every comparison.
• Probing the middle element is optimal: it guarantees at least half the search space is eliminated, regardless of where the target lies.
• Repeated halving gives the recurrence T(n) = T(n/2) + 1, which solves to O(log n) comparisons.
• Information theory shows this is optimal: each yes/no comparison yields at most 1 bit, and distinguishing n positions requires log₂(n) bits.
• Compute the midpoint safely as left + (right - left) / 2 to avoid integer overflow in languages with fixed-width integers.

What's Next:
Now that we understand the strategic foundation, we're ready to translate this insight into actual code. The next page presents both iterative and recursive implementations of binary search, analyzing their trade-offs and helping you write bug-free implementations on your first attempt.
You now understand the divide-and-eliminate strategy—the conceptual heart of binary search. This isn't just an algorithm; it's a way of thinking: every decision should maximally reduce uncertainty. With this mental model, you're ready to implement binary search with confidence.