Loading learning content...
Standard binary search answers a simple question: "Is this element present in the sorted array?" But real-world problems rarely ask such simple questions. Consider these variations:
These questions all share a common structure: they require finding a boundary in a sorted sequence. Standard binary search falls short because it arbitrarily returns any matching position—but we need the specific position that marks the boundary.
This page introduces leftmost binary search (also called lower bound search), the foundational technique for finding the first occurrence of a target value. Mastering this pattern unlocks a family of powerful variations that form the backbone of advanced binary search applications.
By the end of this page, you will understand: (1) Why standard binary search fails for boundary-finding problems, (2) The loop invariant technique for designing correct binary search variants, (3) The leftmost binary search algorithm with rigorous proof of correctness, (4) How to handle all edge cases including empty arrays, missing targets, and all-matching arrays.
Let's begin by understanding precisely why standard binary search is insufficient for finding the first occurrence of a target. This understanding will motivate our solution.
Standard Binary Search Behavior:
Consider the sorted array [1, 2, 3, 3, 3, 3, 4, 5] and target 3. Standard binary search might return:
Which index gets returned depends entirely on how the array happens to partition during the search. Standard binary search makes no guarantee about which matching element it finds—it only guarantees finding some matching element in O(log n) time.
123456789101112131415161718192021222324252627282930
def standard_binary_search(arr, target): """ Standard binary search - returns ANY index containing target. Returns -1 if target is not found. PROBLEM: No guarantee about WHICH occurrence is returned! """ left, right = 0, len(arr) - 1 while left <= right: mid = left + (right - left) // 2 if arr[mid] == target: return mid # Returns immediately upon finding ANY match elif arr[mid] < target: left = mid + 1 else: right = mid - 1 return -1 # Demonstration of the problemarr = [1, 2, 3, 3, 3, 3, 4, 5]target = 3 result = standard_binary_search(arr, target)print(f"Array: {arr}")print(f"Target: {target}")print(f"Found at index: {result}") # Could be 2, 3, 4, or 5!print(f"First occurrence is at: 2") # But we need THISThe Core Issue:
The problem lies in the early termination on line where we return mid immediately when arr[mid] == target. The algorithm doesn't know whether there are more matching elements to the left of this position. Perhaps there are, perhaps there aren't—but the algorithm never checks.
Naive Fix Attempt:
A tempting but flawed approach is to find any match and then scan leftward:
12345678910111213141516171819202122232425262728293031323334
def naive_first_occurrence(arr, target): """ FLAWED APPROACH: Find any match, then scan left. Time Complexity: O(log n + k) where k is the number of duplicates WORST CASE: O(n) when ALL elements match the target! This destroys the O(log n) guarantee that made binary search valuable! """ left, right = 0, len(arr) - 1 while left <= right: mid = left + (right - left) // 2 if arr[mid] == target: # Found a match! But is it the first? # Naive solution: scan leftward while mid > 0 and arr[mid - 1] == target: mid -= 1 # O(k) additional work! return mid elif arr[mid] < target: left = mid + 1 else: right = mid - 1 return -1 # Worst case demonstrationarr = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5] # All same valuetarget = 5 # Standard binary search is O(log 10) = O(4) comparisons# But then we scan ALL elements left - that's O(n) linear scan!# Total: O(log n + n) = O(n) - we've lost all benefits!When all elements in the array equal the target, the naive approach degrades to O(n) linear time. This is unacceptable—we need an algorithm that maintains O(log n) time complexity regardless of how many duplicates exist. The solution requires fundamentally rethinking how binary search should behave when it finds a match.
To design a correct leftmost binary search, we need to shift our mental model from element finding to boundary finding. This conceptual shift is crucial for understanding all binary search variants.
The Key Insight:
Instead of asking "Where is the target?", we ask:
"Where is the boundary between elements less than the target and elements greater than or equal to the target?"
This boundary is precisely where the first occurrence of the target would be (if it exists). Let's visualize this:
Partitioning the Array:
For any sorted array and target value, we can conceptually partition the array into two regions:
| Region | Elements | Indices (for target=3) |
|---|---|---|
| Left Region | All elements < target | Indices 0, 1 (values 1, 2) |
| Right Region | All elements ≥ target | Indices 2, 3, 4, 5, 6, 7 (values 3, 3, 3, 3, 4, 5) |
The boundary is the first index of the right region—which is exactly what we want!
Critical Observations:
Boundary-finding binary search works by evaluating a predicate function at each position. For leftmost search, the predicate is arr[i] >= target. The predicate is False for all elements in the left region and True for all elements in the right region. We're searching for the first True (the transition point from False to True).
| Index | Value | Predicate: value ≥ 3 | Region |
|---|---|---|---|
| 0 | 1 | False | Left (< target) |
| 1 | 2 | False | Left (< target) |
| 2 | 3 | True | Right (≥ target) ← FIRST TRUE |
| 3 | 3 | True | Right (≥ target) |
| 4 | 3 | True | Right (≥ target) |
| 5 | 3 | True | Right (≥ target) |
| 6 | 4 | True | Right (≥ target) |
| 7 | 5 | True | Right (≥ target) |
Why This Model Works:
The power of this model is that it transforms an ambiguous question ("which matching element?") into an unambiguous one ("where is the boundary?"). Every sorted array has exactly one boundary for any given target—including arrays with zero, one, or many occurrences of the target.
This model also reveals why binary search works on this problem: the predicate function is monotonic (once it becomes True, it stays True). Any monotonic predicate can be searched in O(log n) time using binary search.
Binary search variants are notoriously error-prone. Even experienced engineers struggle with off-by-one errors, infinite loops, and incorrect boundary handling. The key to writing correct binary search code is using loop invariants—precise mathematical statements that remain true throughout the algorithm's execution.
What is a Loop Invariant?
A loop invariant is a condition that:
For leftmost binary search, we'll use two pointers: left and right. The invariant precisely describes what these pointers represent throughout the algorithm.
Invariant: At all times during execution:
• All elements at indices < left are definitely < target (the left region)
• All elements at indices ≥ right are definitely ≥ target (the right region)
• Elements in the range [left, right) are unexplored
When the loop terminates (left == right), the unexplored region is empty, and left points to the boundary—the first position where arr[i] ≥ target.
Visualizing the Invariant:
At any point during execution, the array looks like this:
[ definitely < target ][ unexplored ][ definitely >= target ]
| | |
0 left right len(arr)
Let's trace through an example to see how the invariant is maintained:
| Iteration | left | right | mid | arr[mid] | Comparison | Action | Invariant Status |
|---|---|---|---|---|---|---|---|
| Init | 0 | 8 | Setup | Trivially true: unexplored = [0,8) | |||
| 1 | 0 | 8 | 4 | 3 | 3 >= 3 | right = 4 | Now [4,8) is >= target ✓ |
| 2 | 0 | 4 | 2 | 3 | 3 >= 3 | right = 2 | Now [2,8) is >= target ✓ |
| 3 | 0 | 2 | 1 | 2 | 2 < 3 | left = 2 | Now [0,2) is < target ✓ |
| End | 2 | 2 | Exit loop | left == right == 2, boundary found! |
Why the Invariant Guarantees Correctness:
Initialization: When left = 0 and right = len(arr), the unexplored region is [0, len(arr)) which is the entire array. The invariant is trivially satisfied—there are no elements to the left of left and no elements at or after right.
Maintenance: Each iteration shrinks the unexplored region while preserving the invariant:
arr[mid] < target: We've proven that mid belongs to the left region. Setting left = mid + 1 excludes it from unexplored while maintaining that all indices < left are < target.arr[mid] >= target: We've proven that mid belongs to the right region. Setting right = mid marks it as explored while maintaining that all indices >= right are >= target.Termination: When left == right, the unexplored region [left, right) is empty. By the invariant:
< left are < target≥ left are >= targetleft is exactly the boundary—the first index where arr[left] >= target.This is a crucial detail. When arr[mid] >= target, we know mid is in the right region, but mid might be the FIRST element of the right region—it could be our answer! We can't exclude it. Setting right = mid marks it as explored (the new right boundary) without losing it as a potential answer. Setting right = mid - 1 would be incorrect and could skip the answer entirely.
Now we can implement the algorithm with confidence, knowing exactly why each line is correct based on our loop invariant.
Algorithm Overview:
left = 0, right = len(arr) (note: right is exclusive)left < right (unexplored region is non-empty):
mid = left + (right - left) // 2arr[mid] < target: left = mid + 1arr[mid] >= target): right = midleft (the boundary position)arr[left] == target to confirm the target exists1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889
def leftmost_binary_search(arr: list, target: int) -> int: """ Find the leftmost (first) occurrence of target in a sorted array. Returns the index of the first occurrence of target. Returns -1 if target is not in the array. Time Complexity: O(log n) - guaranteed, regardless of duplicates Space Complexity: O(1) Invariant maintained throughout: - All elements at indices < left are definitely < target - All elements at indices >= right are definitely >= target - Elements in [left, right) are unexplored """ if not arr: # Handle empty array edge case return -1 left = 0 right = len(arr) # Note: right is exclusive (len, not len-1) # Loop until unexplored region is empty while left < right: # Compute midpoint (avoiding integer overflow) mid = left + (right - left) // 2 if arr[mid] < target: # arr[mid] is in the left region (< target) # We can exclude it from consideration left = mid + 1 else: # arr[mid] >= target, so it's in the right region # It might be the first occurrence, so don't exclude it! right = mid # left now points to the boundary: # - the first index where arr[left] >= target # - could be len(arr) if all elements are < target # Check if we actually found the target if left < len(arr) and arr[left] == target: return left else: return -1 # Target not found # Comprehensive test casesdef test_leftmost_binary_search(): # Test case 1: Multiple occurrences arr1 = [1, 2, 3, 3, 3, 3, 4, 5] assert leftmost_binary_search(arr1, 3) == 2, "Should find first 3" # Test case 2: Single occurrence arr2 = [1, 2, 3, 4, 5] assert leftmost_binary_search(arr2, 3) == 2, "Should find the only 3" # Test case 3: Target not present arr3 = [1, 2, 4, 5] assert leftmost_binary_search(arr3, 3) == -1, "3 not present" # Test case 4: All same elements arr4 = [3, 3, 3, 3, 3] assert leftmost_binary_search(arr4, 3) == 0, "First of all 3s" # Test case 5: Target at beginning arr5 = [1, 1, 1, 2, 3] assert leftmost_binary_search(arr5, 1) == 0, "First element" # Test case 6: Target at end arr6 = [1, 2, 3, 5, 5, 5] assert leftmost_binary_search(arr6, 5) == 3, "First 5" # Test case 7: Empty array assert leftmost_binary_search([], 3) == -1, "Empty array" # Test case 8: Single element - found assert leftmost_binary_search([3], 3) == 0, "Single element found" # Test case 9: Single element - not found (less) assert leftmost_binary_search([5], 3) == -1, "Single element less" # Test case 10: Single element - not found (greater) assert leftmost_binary_search([1], 3) == -1, "Single element greater" print("All tests passed! ✓") if __name__ == "__main__": test_leftmost_binary_search()Three details often cause bugs:
right = len(arr) (not len(arr) - 1) — The boundary could be at index len(arr) if all elements are less than target.
Loop condition: left < right (not left <= right) — We stop when the unexplored region is empty.
right = mid (not mid - 1) — When arr[mid] >= target, mid could be the answer; don't skip it!
Most programming languages provide built-in functions for leftmost binary search. Understanding these library functions—and their subtle differences from our implementation—is essential for practical programming.
Python: bisect.bisect_left
Python's bisect module provides bisect_left, which finds the leftmost position where an element could be inserted to maintain sorted order. This is exactly our boundary-finding operation!
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152
import bisect def find_first_occurrence_bisect(arr: list, target: int) -> int: """ Find first occurrence using Python's bisect_left. bisect_left returns the leftmost position where target could be inserted while maintaining sorted order. This is EXACTLY our boundary position! """ pos = bisect.bisect_left(arr, target) # bisect_left returns a position, not -1 for "not found" # We need to verify that the target actually exists at this position if pos < len(arr) and arr[pos] == target: return pos return -1 # Key insight: bisect_left always returns a valid insertion pointarr = [1, 2, 3, 3, 3, 3, 4, 5] # For existing element: returns index of first occurrenceprint(bisect.bisect_left(arr, 3)) # 2 (first 3) # For non-existing element: returns where it WOULD goprint(bisect.bisect_left(arr, 3.5)) # 6 (between 3s and 4) # For element less than all: returns 0print(bisect.bisect_left(arr, 0)) # 0 # For element greater than all: returns len(arr)print(bisect.bisect_left(arr, 100)) # 8 # IMPORTANT: bisect_left vs our function# - bisect_left ALWAYS returns a valid index (0 to len(arr))# - Our function returns -1 when target doesn't exist# - Choose based on what your problem needs! # Counting elements less than targetdef count_less_than(arr: list, target: int) -> int: """ Count how many elements are strictly less than target. This is just bisect_left(arr, target)! """ return bisect.bisect_left(arr, target) print(f"Elements less than 3: {count_less_than(arr, 3)}") # 2print(f"Elements less than 4: {count_less_than(arr, 4)}") # 6| Language | Function | Returns | Notes |
|---|---|---|---|
| Python | bisect.bisect_left(a, x) | Insertion point (index) | Always returns valid index 0 to len(a) |
| C++ | std::lower_bound(begin, end, x) | Iterator to first >= x | Returns end iterator if all elements < x |
| Java | Arrays.binarySearch(a, x) | Index or -(insertion point + 1) | Negative means not found; decode with -(result + 1) |
| C# | Array.BinarySearch(a, x) | Index or ~(insertion point) | Use bitwise NOT to get insertion point if negative |
| Go | sort.SearchInts(a, x) | Insertion point (index) | Returns len(a) if all elements < x |
Java's Arrays.binarySearch uses a clever encoding: if the element is found, it returns the index; if not found, it returns -(insertionPoint + 1). To extract the insertion point for a 'not found' result: insertionPoint = -(result + 1) or equivalently insertionPoint = ~result (bitwise NOT). However, note that Java's binarySearch doesn't guarantee finding the FIRST occurrence—you may need to implement leftmost search yourself.
A robust implementation must handle all edge cases correctly. Let's analyze each case systematically, verifying our algorithm's behavior at the extremes.
| Edge Case | Array | Target | Expected Result | Boundary Position | Notes |
|---|---|---|---|---|---|
| Empty array | [] | 3 | -1 | 0 | No elements to search |
| Single element - found | [3] | 3 | 0 | 0 | Only element matches |
| Single element - too small | [5] | 3 | -1 | 0 | Target would go before |
| Single element - too large | [1] | 3 | -1 | 1 | Target would go after |
| Target at start | [3, 4, 5] | 3 | 0 | 0 | First element matches |
| Target at end | [1, 2, 3] | 3 | 2 | 2 | Last element matches |
| All same (matches) | [3, 3, 3, 3] | 3 | 0 | 0 | First of many matches |
| All same (no match) | [5, 5, 5, 5] | 3 | -1 | 0 | None match, all greater |
| Target smaller than all | [5, 6, 7, 8] | 3 | -1 | 0 | Would insert at start |
| Target larger than all | [1, 2, 3, 4] | 10 | -1 | 4 | Would insert at end |
| Large gap | [1, 100] | 50 | -1 | 1 | Target falls in gap |
| Duplicates at boundaries | [3, 3, 5, 5] | 5 | 2 | 2 | First of last group |
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283
import bisect def leftmost_binary_search(arr: list, target: int) -> int: """Leftmost binary search with comprehensive edge case handling.""" if not arr: return -1 left, right = 0, len(arr) while left < right: mid = left + (right - left) // 2 if arr[mid] < target: left = mid + 1 else: right = mid if left < len(arr) and arr[left] == target: return left return -1 def verify_edge_cases(): """ Exhaustive edge case verification with explanations. Each case tests a specific boundary condition. """ test_cases = [ # (array, target, expected, description) ([], 3, -1, "Empty array"), ([3], 3, 0, "Single element - exact match"), ([5], 3, -1, "Single element - target smaller"), ([1], 3, -1, "Single element - target larger"), ([3, 4, 5], 3, 0, "Target is first element"), ([1, 2, 3], 3, 2, "Target is last element"), ([3, 3, 3, 3], 3, 0, "All elements match - return first"), ([5, 5, 5, 5], 3, -1, "All elements greater"), ([1, 1, 1, 1], 3, -1, "All elements smaller"), ([5, 6, 7, 8], 3, -1, "Target smaller than all elements"), ([1, 2, 3, 4], 10, -1, "Target larger than all elements"), ([1, 100], 50, -1, "Target falls in large gap"), ([3, 3, 5, 5], 3, 0, "Duplicates at start"), ([3, 3, 5, 5], 5, 2, "Duplicates at end"), ([1, 2, 3, 3, 3, 3, 4, 5], 3, 2, "Multiple occurrences in middle"), ([1, 1, 1, 1, 1, 1, 1, 2], 2, 7, "Single occurrence at very end"), ([1] * 1000 + [2], 2, 1000, "Large array with single match at end"), ([2] + [3] * 1000, 2, 0, "Large array with single match at start"), (list(range(1000)), 500, 500, "Sequential array"), (list(range(0, 1000, 2)), 500, 250, "Even numbers only - exact match"), (list(range(0, 1000, 2)), 501, -1, "Even numbers only - no match"), ] passed = 0 failed = 0 for arr, target, expected, description in test_cases: result = leftmost_binary_search(arr, target) if result == expected: passed += 1 print(f"✓ {description}") else: failed += 1 print(f"✗ {description}") print(f" Array: {arr[:10]}{'...' if len(arr) > 10 else ''}") print(f" Target: {target}, Expected: {expected}, Got: {result}") print(f"\n{passed}/{passed + failed} tests passed") # Also verify against bisect.bisect_left for consistency print("\nVerifying against bisect.bisect_left...") for arr, target, expected, description in test_cases: boundary = bisect.bisect_left(arr, target) our_result = leftmost_binary_search(arr, target) # Compare: bisect gives boundary, we give index or -1 bisect_based = boundary if (boundary < len(arr) and arr[boundary] == target) else -1 if our_result != bisect_based: print(f" Mismatch! {description}") print(f" Our result: {our_result}, bisect-based: {bisect_based}") if __name__ == "__main__": verify_edge_cases()A single-element array is surprisingly tricky to reason about. With arr = [5] and target = 3, our algorithm sets left=0, right=1. First iteration: mid=0, arr[0]=5 >= 3, so right=0. Now left == right == 0. Check: arr[0]=5 != 3, so return -1. Correct! The algorithm works because the invariant holds even for the smallest non-empty arrays.
Leftmost binary search is a fundamental building block for many practical problems. Here are several important applications:
[1,2,3,3,3,4,5] with target 3, boundary = 2 = count of elements < 3.WHERE date >= '2024-01-01'.123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104
import bisectfrom typing import List, Tuple, Optional # Application 1: Count elements in a rangedef count_in_range(arr: List[int], low: int, high: int) -> int: """ Count elements where low <= element <= high. Uses leftmost search for lower bound and rightmost for upper bound. Time: O(log n) """ left_boundary = bisect.bisect_left(arr, low) right_boundary = bisect.bisect_right(arr, high) return right_boundary - left_boundary # Example: Count scores between 70 and 90scores = [55, 60, 70, 72, 78, 85, 90, 92, 95, 100]print(f"Scores in [70, 90]: {count_in_range(scores, 70, 90)}") # 5 # Application 2: First event after timestampdef first_event_after(events: List[Tuple[int, str]], timestamp: int) -> Optional[Tuple[int, str]]: """ Find the first event that occurred at or after the given timestamp. Events are sorted by timestamp (first element of tuple). """ if not events: return None # Extract timestamps for binary search timestamps = [e[0] for e in events] pos = bisect.bisect_left(timestamps, timestamp) if pos < len(events): return events[pos] return None events = [ (100, "start"), (150, "process"), (200, "validate"), (250, "complete"),]print(f"First event after 175: {first_event_after(events, 175)}") # (200, 'validate') # Application 3: IP Address Range Lookupdef find_network(ip_ranges: List[Tuple[int, int, str]], ip: int) -> Optional[str]: """ Given sorted IP ranges (start, end, network_name), find which network an IP belongs to. IP ranges are sorted by start address. """ if not ip_ranges: return None # Find the last range whose start <= ip starts = [r[0] for r in ip_ranges] pos = bisect.bisect_right(starts, ip) - 1 if pos >= 0: start, end, name = ip_ranges[pos] if start <= ip <= end: return name return None # Example IP ranges (using simple integers for clarity)networks = [ (0, 99, "Network A"), (100, 199, "Network B"), (200, 299, "Network C"),]print(f"IP 150 is in: {find_network(networks, 150)}") # Network B # Application 4: Grade assignmentdef assign_grade(score: int) -> str: """ Assign letter grade based on score. Uses binary search on thresholds. """ thresholds = [60, 70, 80, 90] # F, D, C, B, A grades = ['F', 'D', 'C', 'B', 'A'] # Find first threshold >= score... wait, we want first threshold > score # Actually: find how many thresholds the score has exceeded pos = bisect.bisect_right(thresholds, score - 1) # -1 because >= threshold gets that grade # Simpler: bisect_right tells us how many thresholds are <= score pos = bisect.bisect_right(thresholds, score) return grades[pos] if pos < len(grades) else grades[-1] # Actually, cleaner version:def assign_grade_v2(score: int) -> str: """Assign letter grade - cleaner implementation.""" thresholds = [0, 60, 70, 80, 90] # Minimum score for F, D, C, B, A grades = ['F', 'D', 'C', 'B', 'A'] pos = bisect.bisect_right(thresholds, score) - 1 return grades[pos] for score in [55, 65, 75, 85, 95]: print(f"Score {score} => {assign_grade_v2(score)}")Finding the first occurrence of a target in a sorted array is a fundamental skill that extends far beyond simple element lookup. Let's consolidate what we've learned:
left is < target, and everything >= right is >= target. This guarantees correctness.right = len(arr) (exclusive), left < right for loop condition, and right = mid (not mid - 1) when arr[mid] >= target.bisect_left, C++'s lower_bound, and Java's binarySearch behave differently. Know your language's semantics.| Property | Value |
|---|---|
| Time Complexity | O(log n) |
| Space Complexity | O(1) |
| Loop Condition | left < right |
| Initial left | 0 |
| Initial right | len(arr) (exclusive) |
| When arr[mid] < target | left = mid + 1 |
| When arr[mid] >= target | right = mid |
| Result | left = boundary position |
| Final Check | arr[left] == target? |
What's Next:
In the next page, we'll explore Finding Last Occurrence (rightmost binary search). You'll see how a simple modification to our predicate and update rules gives us the symmetric operation—finding the rightmost position of a target. Together, leftmost and rightmost search form a powerful toolkit for range queries and duplicate handling.
You've mastered the foundational technique of leftmost binary search. You understand why standard binary search fails, how to reason with loop invariants, and how to handle all edge cases. This pattern—finding a boundary in a sorted sequence—will appear repeatedly as we explore more binary search variations.