In the previous page, we discovered a frustrating truth: searching unsorted data requires O(n) time. Check every element. No shortcuts.
But what if the data is sorted?
Here's a remarkable fact: searching 1 billion sorted elements takes at most 30 comparisons. Not 30 million. Not 30 thousand. Thirty.
This is the power of binary search—one of the most important algorithms in computer science. It transforms the impossible into the trivial.
By the end of this page, you will understand the intuition behind binary search, why sorted order enables exponential speedup, and the mathematical foundation of O(log n) complexity. This is a conceptual preview—full implementation details come in dedicated modules.
Before formal algorithms, let's think about how humans naturally search sorted data.
The phone book problem:
Imagine you have a physical phone book with millions of names, sorted alphabetically. You want to find 'Nakamura.'
Do you start at page 1 and flip through every page? Of course not. You might:

- Open the book roughly in the middle and check the names on that page.
- If those names come before 'Nakamura' alphabetically, discard the front half; otherwise discard the back half.
- Repeat on the remaining half until you land on the right page.
Each step eliminates half of the remaining possibilities.
When data is sorted, a single comparison tells you not just one fact ('this element isn't the target') but DIRECTION ('the target is in the left/right half'). Each comparison eliminates 50% of the remaining search space. This is exponentially more powerful than linear search.
Contrast with unsorted data:
If the phone book were shuffled randomly, checking 'Johnson' in the middle tells you nothing about where 'Nakamura' might be. It could be anywhere in the remaining pages. You'd have no choice but to check pages one by one.
Sortedness creates relationships between elements that unsorted data lacks. Each comparison leverages these relationships to eliminate massive portions of the search space.
Binary search formalizes the phone book intuition into a precise algorithm.
The algorithm:

1. Compare the target with the middle element of the current search range.
2. If they match, you're done.
3. If the target is smaller, repeat on the left half; if larger, repeat on the right half.
4. Stop when the range becomes empty (the target isn't present).
Each iteration halves the search space. After k iterations, only n/2^k elements remain.
```python
def binary_search(arr, target):
    """
    Search for target in sorted array.
    Returns index if found, -1 otherwise.
    Time: O(log n), Space: O(1)
    """
    low = 0
    high = len(arr) - 1
    while low <= high:
        mid = (low + high) // 2   # Find middle index
        if arr[mid] == target:
            return mid            # Found!
        elif target < arr[mid]:
            high = mid - 1        # Search left half
        else:
            low = mid + 1         # Search right half
    return -1                     # Not found

# Example: Find 23 in sorted array
sorted_arr = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
result = binary_search(sorted_arr, 23)
print(f"Found at index: {result}")  # Output: 5

# Trace for finding 23:
# Step 1: low=0, high=9, mid=4 → arr[4]=16 < 23 → search right
# Step 2: low=5, high=9, mid=7 → arr[7]=56 > 23 → search left
# Step 3: low=5, high=6, mid=5 → arr[5]=23 == 23 → Found!
# Total: 3 comparisons for 10 elements
```

Binary search ONLY works on sorted arrays. If you apply it to unsorted data, it will return incorrect results (false negatives or wrong indices). Always verify sortedness before using binary search, or ensure your data pipeline maintains sorted order.
What does O(log n) actually mean, and why is it so powerful?
The logarithm counts halvings:
log₂(n) answers the question: "How many times must I divide n by 2 to get to 1?"
Binary search halves the search space each iteration. So searching n elements takes at most log₂(n) iterations.
| Array Size (n) | Linear Search (O(n)) | Binary Search (O(log n)) | Speedup Factor |
|---|---|---|---|
| 10 | 10 | 4 | 2.5× |
| 100 | 100 | 7 | 14× |
| 1,000 | 1,000 | 10 | 100× |
| 1,000,000 | 1,000,000 | 20 | 50,000× |
| 1,000,000,000 | 1,000,000,000 | 30 | 33,000,000× |
Notice how the speedup factor grows with n. For a billion elements, binary search is 33 MILLION times faster than linear search. This isn't incremental improvement—it's a completely different scale of performance. O(log n) is why sorted data is so valuable.
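The binary-search column of the table can be checked directly: the worst-case comparison count is ⌈log₂(n)⌉. A quick sketch using Python's standard `math` module:

```python
import math

# Worst-case comparisons for binary search: ceil(log2(n))
for n in [10, 100, 1_000, 1_000_000, 1_000_000_000]:
    comparisons = math.ceil(math.log2(n))
    print(f"n = {n:>13,}: at most {comparisons} comparisons")
# n =            10: at most 4 comparisons
# ...
# n = 1,000,000,000: at most 30 comparisons
```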
Why logarithms grow so slowly:
Logarithms are the inverse of exponentials. Just as 2³⁰ ≈ 1 billion (exponentially fast growth), log₂(1 billion) ≈ 30 (logarithmically slow growth).
This means:

- Doubling the data size adds only one extra comparison: log₂(2n) = log₂(n) + 1.
- Squaring the data size merely doubles the comparisons: log₂(n²) = 2·log₂(n).
O(log n) scales incredibly well. You could search the entire internet's URLs (hundreds of billions) in about 38 comparisons.
Let's trace binary search visually to see the halving in action.
Example: Find 23 in [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
```text
Initial:
  [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
   ^                                ^
  low=0                          high=9

Step 1: mid = (0+9)//2 = 4 → arr[4] = 16
        16 < 23 → target is RIGHT of mid

  [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
                    ^               ^
                 low=5           high=9
  (Left half eliminated: 5 elements gone)

Step 2: mid = (5+9)//2 = 7 → arr[7] = 56
        56 > 23 → target is LEFT of mid

  [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
                    ^   ^
                 low=5  high=6
  (Right portion eliminated: 3 more elements gone)

Step 3: mid = (5+6)//2 = 5 → arr[5] = 23
        23 == 23 → FOUND at index 5!

Total comparisons: 3 (not 10, as linear search could need)
```
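The trace above can be reproduced by instrumenting the search to report its iteration count. A minimal sketch (`binary_search_verbose` is an illustrative name, not from this page):

```python
def binary_search_verbose(arr, target):
    """Binary search that also reports how many comparisons it made."""
    low, high = 0, len(arr) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        print(f"Step {comparisons}: low={low}, high={high}, mid={mid} -> arr[mid]={arr[mid]}")
        if arr[mid] == target:
            return mid, comparisons
        elif target < arr[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return -1, comparisons

index, steps = binary_search_verbose([2, 5, 8, 12, 16, 23, 38, 56, 72, 91], 23)
print(f"Found at index {index} in {steps} comparisons")  # index 5, 3 comparisons
```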
What about searching for 50 (not in array)?
```text
Step 1: mid=4, arr[4]=16 < 50 → search right (low=5)
Step 2: mid=7, arr[7]=56 > 50 → search left  (high=6)
Step 3: mid=5, arr[5]=23 < 50 → search right (low=6)
Step 4: mid=6, arr[6]=38 < 50 → search right (low=7)

Now low (7) > high (6) → NOT FOUND
Total comparisons: 4
```
Even unsuccessful searches take at most log₂(n) comparisons.
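In everyday Python code you rarely write binary search by hand: the standard library's `bisect` module provides it, and for a missing value it hands you the insertion point instead of a failure flag. A quick sketch:

```python
import bisect

sorted_arr = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]

# bisect_left returns the leftmost position where target could be
# inserted while keeping the array sorted.
pos = bisect.bisect_left(sorted_arr, 23)
if pos < len(sorted_arr) and sorted_arr[pos] == 23:
    print(f"Found at index {pos}")             # Found at index 5

# For a missing value, the position is where it WOULD go.
pos = bisect.bisect_left(sorted_arr, 50)
print(f"50 would be inserted at index {pos}")  # index 7 (before 56)
```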
The magic is that each comparison removes ~50% of remaining elements. After 10 comparisons, you've reduced 1,024 elements to 1. After 20 comparisons, you've handled 1 million. After 30, a billion. The reduction is exponential.
Binary search isn't a free lunch. It has strict requirements that must be satisfied.
The trade-off equation:
Total cost = Sorting cost + (Number of searches × Search cost)
Linear search: 0 + (k × n) = k×n
Binary search: n log n + (k × log n) = n log n + k log n
If k (number of searches) is large, binary search wins. If k is small, the sorting overhead may not be worth it.
Binary search pays off when you search the same data multiple times. If you search once and discard the data, linear search on unsorted data may be faster (no sorting overhead). If you search thousands of times, sort once and enjoy O(log n) per search.
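The trade-off equation can be made concrete. This hypothetical sketch (the helper names `linear_total` and `binary_total` are illustrative) compares total comparison counts for k searches over n elements, ignoring constant factors just as the formulas above do:

```python
import math

def linear_total(n, k):
    # k searches, each up to n comparisons; no sorting needed
    return k * n

def binary_total(n, k):
    # Sort once (~n log n comparisons), then k searches of ~log n each
    log_n = math.ceil(math.log2(n))
    return n * log_n + k * log_n

n = 1_000_000
for k in [1, 10, 1_000, 1_000_000]:
    winner = "binary" if binary_total(n, k) < linear_total(n, k) else "linear"
    print(f"k={k:>9,}: linear={linear_total(n, k):,}  "
          f"binary={binary_total(n, k):,}  -> {winner}")
```

For a million elements, linear search wins for a handful of one-off queries, but binary search pulls ahead well before a thousand repeated searches.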
Binary search is famously tricky to implement correctly. Even experienced programmers make subtle errors. Here are the most common mistakes:

- Off-by-one bounds: using `high = mid` instead of `high = mid - 1` (or vice versa), which skips or revisits elements.
- Infinite loops: updating `low = mid` instead of `low = mid + 1`, so a two-element range never shrinks.
- Wrong loop condition: `while low < high` versus `while low <= high`, which can miss the final candidate.
- Integer overflow: computing `mid = (low + high) / 2` in fixed-width-integer languages when `low + high` exceeds the integer maximum.
"Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky." — Donald Knuth. Even the legendary Knuth acknowledged this. The first correct binary search was published in 1946, but the first bug-free implementation in a major programming language took until 1962. Test your implementations carefully!
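One classic bug is worth showing explicitly: the midpoint overflow. Python's integers are arbitrary precision, so the code on this page is safe, but in Java, C, or C++ the naive `(low + high) / 2` can overflow. The portable idiom, as a sketch:

```python
# Portable midpoint computation, written here in Python for illustration.
# In fixed-width-integer languages, (low + high) can overflow when the sum
# exceeds the integer maximum; low + (high - low) // 2 cannot, because
# (high - low) never exceeds the valid index range.
def safe_mid(low, high):
    return low + (high - low) // 2

assert safe_mid(0, 9) == 4   # Same result as (0 + 9) // 2
assert safe_mid(5, 6) == 5
```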
Binary search isn't just for finding exact values. The core idea—eliminating half the search space—applies to many problems.
Finding boundaries: locating the first or last occurrence of a value, or the insertion point that keeps an array sorted.

Searching on the answer: binary searching over a range of candidate answers rather than an explicit array, using a yes/no feasibility check at each midpoint.
Example: First and Last Position
In [1, 2, 2, 2, 2, 3, 4], standard binary search might find any of the 2s. But what if you need:

- The first occurrence of 2 (index 1)?
- The last occurrence of 2 (index 4)?
These require modified binary search algorithms that continue searching even after finding a match, to find the boundary.
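A minimal sketch of the leftmost-occurrence variant (the rightmost one is symmetric; `binary_search_leftmost` is an illustrative name):

```python
def binary_search_leftmost(arr, target):
    """Return the index of the FIRST occurrence of target, or -1."""
    low, high = 0, len(arr) - 1
    result = -1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            result = mid          # Record the match...
            high = mid - 1        # ...but keep searching the left half
        elif target < arr[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return result

print(binary_search_leftmost([1, 2, 2, 2, 2, 3, 4], 2))  # 1 (not 3)
```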
Example: Square Root
Find √x without a sqrt function. Binary search between 0 and x for the largest number whose square is ≤ x. The 'sorted array' here is the implicit sequence 0, 1, 2, 3, ..., x and the 'search' is finding the boundary where n² > x.
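The square-root idea as a sketch, binary searching for the largest m with m² ≤ x (`int_sqrt` is an illustrative name):

```python
def int_sqrt(x):
    """Largest integer m such that m*m <= x, found by binary search."""
    low, high = 0, x
    answer = 0
    while low <= high:
        mid = (low + high) // 2
        if mid * mid <= x:
            answer = mid      # mid is feasible; try something larger
            low = mid + 1
        else:
            high = mid - 1    # mid overshoots; try something smaller
    return answer

print(int_sqrt(10))   # 3
print(int_sqrt(144))  # 12
```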
Binary search is more than an algorithm—it's a paradigm. Whenever you have a monotonic condition (something that changes from false to true or vice versa once), you can binary search to find the transition point. This insight unlocks dozens of advanced problems.
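The paradigm in its most general form: find the first index where a monotonic predicate flips from False to True. A hedged sketch under that assumption (`first_true` is an illustrative name):

```python
def first_true(lo, hi, pred):
    """Smallest i in [lo, hi] with pred(i) True, assuming pred is
    monotonic (False...False True...True).
    Returns hi + 1 if pred is never True on the range."""
    while lo <= hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid - 1      # mid works; look for an earlier flip
        else:
            lo = mid + 1      # mid fails; the flip must be later
    return lo

# Example: first m whose square exceeds 10 -> 4 (since 4*4 = 16 > 10)
print(first_true(0, 100, lambda m: m * m > 10))
```

Exact-match search, first/last occurrence, and square root are all special cases of choosing the right predicate.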
Here's a decision framework for choosing between linear and binary search:
| Criterion | Choose Linear Search | Choose Binary Search |
|---|---|---|
| Data is sorted? | No / Not feasible to sort | Yes / Can be sorted |
| Search frequency | One-time or rare | Frequent / Repeated |
| Data size | Small (< 100) | Large (> 100) |
| Data structure | Any (linked list OK) | Array (O(1) access) |
| Insertion frequency | High (order hard to maintain) | Low / Static data |
| Find exact match only? | Either | Yes (or variations) |
Common scenarios for binary search:

- Repeated lookups in large, sorted, mostly-static datasets
- Database indexes and dictionary-style lookups
- Finding boundaries, ranges, or insertion points in sorted data

Common scenarios for linear search:

- Small collections, where simplicity beats asymptotics
- Unsorted data searched only once (the sorting overhead isn't worth it)
- Linked lists and other structures without O(1) random access
We've explored one of the most powerful ideas in algorithm design. Let's consolidate:

- Sorted order turns each comparison into a direction, eliminating half the remaining search space.
- Binary search finds any element among n in at most ⌈log₂(n)⌉ comparisons—about 30 for a billion elements.
- The speedup isn't free: it requires sorted data and O(1) random access, and sorting costs O(n log n) up front.
- The halving idea generalizes: any monotonic condition can be binary searched for its transition point.
What's next:
We've covered access (O(1)) and search (O(n) and O(log n)). But arrays have another critical aspect: modification. What happens when you need to insert or delete elements?
Next, we explore insertion and deletion in arrays, and discover why these operations cost O(n) time—even though access is instant.
You now understand the power of binary search and O(log n) complexity. This conceptual foundation prepares you for deep dives into binary search variations, which you'll encounter in dedicated modules. For now, you know: sorted arrays + binary search = incredibly fast lookup.