Loading content...
In an ideal world, sorted arrays would contain unique elements. But reality is messy—duplicates are everywhere:
Duplicates fundamentally change the nature of binary search. The question shifts from "Where is the element?" to "Which one of the matching elements do I want?" This page consolidates everything we've learned about leftmost and rightmost search into a unified framework for handling any duplicate scenario.
The Problem with Standard Binary Search:
Standard binary search returns some matching element but provides no guarantee about which one. For unique elements, this is fine. For duplicates, it's a source of subtle bugs.
By the end of this page, you will: (1) Understand the complete taxonomy of duplicate-related queries, (2) Know which binary search variant to use for each case, (3) Implement robust solutions that handle all edge cases, (4) Recognize common interview patterns involving duplicates.
When an array contains duplicates, there are many distinct questions you might want to answer. Understanding this taxonomy helps you map problems to the correct algorithm.
| Query Type | Question | Algorithm | Result for [1,2,3,3,3,4] target=3 |
|---|---|---|---|
| Existence | Does target exist? | Standard BS or check boundaries | True |
| Any Index | An index containing target? | Standard binary search | 2, 3, or 4 (any) |
| First Index | Index of first occurrence? | Leftmost (bisect_left) | 2 |
| Last Index | Index of last occurrence? | Rightmost (bisect_right) - 1 | 4 |
| Count | How many occurrences? | bisect_right - bisect_left | 3 |
| Range | [first, last] indices? | Two binary searches | [2, 4] |
| Insertion Left | Where to insert (before equals)? | bisect_left | 2 |
| Insertion Right | Where to insert (after equals)? | bisect_right | 5 |
The Key Insight:
Every duplicate-related query can be answered with at most two boundary-finding binary searches (leftmost and rightmost). No query requires more than O(log n) time, regardless of how many duplicates exist.
Let's visualize the structure of duplicates in a sorted array:
Boundaries Fully Characterize Duplicates:
For target = 3 in [1, 2, 3, 3, 3, 3, 4, 5]:
From these two values, we can derive everything:
Rather than implementing ad-hoc binary searches for each query type, we can build a unified helper that answers all duplicate-related queries. This approach reduces bugs and increases code reuse.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148
import bisectfrom typing import List, Tuple, Optional class DuplicateHelper: """ A helper class that answers all duplicate-related queries on a sorted array. All operations are O(log n). """ def __init__(self, arr: List[int]): """ Initialize with a sorted array. Does NOT copy the array - mutations will affect this helper. """ self.arr = arr # ===================================================== # Core boundaries - everything builds on these # ===================================================== def left_boundary(self, target: int) -> int: """ First position where arr[i] >= target. Same as bisect_left(arr, target). """ return bisect.bisect_left(self.arr, target) def right_boundary(self, target: int) -> int: """ First position where arr[i] > target. Same as bisect_right(arr, target). """ return bisect.bisect_right(self.arr, target) # ===================================================== # Derived queries # ===================================================== def exists(self, target: int) -> bool: """Check if target exists in array.""" pos = self.left_boundary(target) return pos < len(self.arr) and self.arr[pos] == target def first_index(self, target: int) -> int: """ Index of first occurrence of target. Returns -1 if not found. """ pos = self.left_boundary(target) if pos < len(self.arr) and self.arr[pos] == target: return pos return -1 def last_index(self, target: int) -> int: """ Index of last occurrence of target. Returns -1 if not found. """ pos = self.right_boundary(target) if pos > 0 and self.arr[pos - 1] == target: return pos - 1 return -1 def count(self, target: int) -> int: """Count occurrences of target.""" return self.right_boundary(target) - self.left_boundary(target) def range(self, target: int) -> Tuple[int, int]: """ Return [first, last] indices of target. Returns (-1, -1) if not found. """ left = self.left_boundary(target) if left >= len(self.arr) or self.arr[left] != target: return (-1, -1) right = self.right_boundary(target) return (left, right - 1) # ===================================================== # Advanced queries # ===================================================== def count_less_than(self, target: int) -> int: """Count elements strictly less than target.""" return self.left_boundary(target) def count_less_or_equal(self, target: int) -> int: """Count elements less than or equal to target.""" return self.right_boundary(target) def count_greater_than(self, target: int) -> int: """Count elements strictly greater than target.""" return len(self.arr) - self.right_boundary(target) def count_greater_or_equal(self, target: int) -> int: """Count elements greater than or equal to target.""" return len(self.arr) - self.left_boundary(target) def count_in_range(self, low: int, high: int) -> int: """Count elements in [low, high] inclusive.""" return self.right_boundary(high) - self.left_boundary(low) def ceiling(self, target: int) -> Optional[int]: """Smallest element >= target, or None.""" pos = self.left_boundary(target) return self.arr[pos] if pos < len(self.arr) else None def floor(self, target: int) -> Optional[int]: """Largest element <= target, or None.""" pos = self.right_boundary(target) return self.arr[pos - 1] if pos > 0 else None # Comprehensive demonstrationdef demo(): arr = [1, 2, 3, 3, 3, 3, 4, 5] helper = DuplicateHelper(arr) print(f"Array: {arr}") print(f"Target: 3") print() print("Basic queries:") print(f" Exists? {helper.exists(3)}") print(f" First index: {helper.first_index(3)}") print(f" Last index: {helper.last_index(3)}") print(f" Count: {helper.count(3)}") print(f" Range: {helper.range(3)}") print() print("Counting queries:") print(f" Count < 3: {helper.count_less_than(3)}") print(f" Count <= 3: {helper.count_less_or_equal(3)}") print(f" Count > 3: {helper.count_greater_than(3)}") print(f" Count >= 3: {helper.count_greater_or_equal(3)}") print(f" Count in [2,4]: {helper.count_in_range(2, 4)}") print() print("Non-existent target (3.5):") print(f" Exists? {helper.exists(3.5)}") print(f" First index: {helper.first_index(3.5)}") print(f" Count: {helper.count(3.5)}") print(f" Ceiling: {helper.ceiling(3.5)}") # 4 print(f" Floor: {helper.floor(3.5)}") # 3 if __name__ == "__main__": demo()The DuplicateHelper class demonstrates a clean design: implement only the two core operations (left_boundary and right_boundary), then derive everything else. This minimizes the code you need to test and debug. If your boundaries are correct, all derived operations are automatically correct.
Even experienced developers make mistakes when handling duplicates. Here are the most common pitfalls and how to avoid them.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125
import bisect # =====================================================# PITFALL 1: Linear scan after binary search# ===================================================== def first_occurrence_WRONG(arr, target): """ WRONG: Finds any match then scans left. O(n) in worst case when all elements match! """ left, right = 0, len(arr) - 1 while left <= right: mid = (left + right) // 2 if arr[mid] == target: # BUG: Linear scan destroys complexity while mid > 0 and arr[mid - 1] == target: mid -= 1 return mid elif arr[mid] < target: left = mid + 1 else: right = mid - 1 return -1 def first_occurrence_CORRECT(arr, target): """ CORRECT: Pure binary search, O(log n) guaranteed. """ pos = bisect.bisect_left(arr, target) if pos < len(arr) and arr[pos] == target: return pos return -1 # Demonstrate the differencearr = [5] * 1000000 # 1 million 5simport time # WRONG approach (commented out - too slow!)# start = time.time()# first_occurrence_WRONG(arr, 5)# print(f"Wrong approach: {time.time() - start:.4f}s") # ~0.5 seconds # CORRECT approachstart = time.time()for _ in range(10000): # Run 10000 times! first_occurrence_CORRECT(arr, 5)print(f"Correct approach (10000 runs): {time.time() - start:.4f}s") # ~0.1 seconds # =====================================================# PITFALL 2: Confusing left and right boundaries# ===================================================== def last_occurrence_WRONG_V1(arr, target): """ WRONG: Uses bisect_left instead of bisect_right. Only works if target appears exactly once! """ pos = bisect.bisect_left(arr, target) if pos < len(arr) and arr[pos] == target: return pos # Bug: This is FIRST occurrence, not last! return -1 def last_occurrence_WRONG_V2(arr, target): """ WRONG: Forgets to subtract 1 from right boundary. Always returns index AFTER last occurrence! """ pos = bisect.bisect_right(arr, target) return pos # Bug: This is one PAST the last occurrence! def last_occurrence_CORRECT(arr, target): """ CORRECT: Uses bisect_right and subtracts 1. """ pos = bisect.bisect_right(arr, target) if pos > 0 and arr[pos - 1] == target: return pos - 1 return -1 # Testarr = [1, 3, 3, 3, 5]print(f"\nLast 3 in {arr}:")print(f" WRONG V1: {last_occurrence_WRONG_V1(arr, 3)}") # 1 (first, not last!)print(f" WRONG V2: {last_occurrence_WRONG_V2(arr, 3)}") # 4 (one past last!)print(f" CORRECT: {last_occurrence_CORRECT(arr, 3)}") # 3 (correct!) # =====================================================# PITFALL 3: Missing bounds check# ===================================================== def ceiling_WRONG(arr, target): """ WRONG: Doesn't check if position is valid. Crashes when target > all elements! """ pos = bisect.bisect_left(arr, target) return arr[pos] # IndexError when target > all! def ceiling_CORRECT(arr, target): """ CORRECT: Checks bounds before accessing. """ pos = bisect.bisect_left(arr, target) if pos < len(arr): return arr[pos] return None # No ceiling exists # Testarr = [1, 3, 5]print(f"\nCeiling of 10 in {arr}:")try: print(f" WRONG: {ceiling_WRONG(arr, 10)}")except IndexError: print(f" WRONG: IndexError!")print(f" CORRECT: {ceiling_CORRECT(arr, 10)}") # NoneThe worst mistake is adding a linear scan after finding a match. It feels intuitive ("just move left until you find the first one") but destroys your O(log n) guarantee. When ALL elements match the target, you scan the ENTIRE array—becoming O(n). Always use the proper leftmost/rightmost binary search instead.
Several classic coding interview problems test your ability to handle duplicates correctly. Let's examine the most common patterns.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172
import bisectfrom typing import List # =====================================================# Pattern 1: LeetCode 34 - Find First and Last Position# ===================================================== def searchRange(nums: List[int], target: int) -> List[int]: """ Given a sorted array with duplicates, find the starting and ending position of a given target value. Return [-1, -1] if target is not found. Time: O(log n), Space: O(1) """ left = bisect.bisect_left(nums, target) # Check if target exists if left >= len(nums) or nums[left] != target: return [-1, -1] right = bisect.bisect_right(nums, target) return [left, right - 1] # Testprint("LeetCode 34: Find First and Last Position")print(f" searchRange([5,7,7,8,8,10], 8) = {searchRange([5,7,7,8,8,10], 8)}") # [3, 4]print(f" searchRange([5,7,7,8,8,10], 6) = {searchRange([5,7,7,8,8,10], 6)}") # [-1, -1] # =====================================================# Pattern 2: Count Occurrences in Sorted Array# ===================================================== def countOccurrences(nums: List[int], target: int) -> int: """ Count how many times target appears in a sorted array. Time: O(log n), Space: O(1) """ return bisect.bisect_right(nums, target) - bisect.bisect_left(nums, target) print(f"\nCount of 3 in [1,2,3,3,3,4,5]: {countOccurrences([1,2,3,3,3,4,5], 3)}") # 3 # =====================================================# Pattern 3: Search in Sorted Array (with duplicates consideration)# ===================================================== def search(nums: List[int], target: int) -> bool: """ Determine if target exists in sorted array (may have duplicates). Note: When deciding between bisect_left and standard binary search, bisect_left is more robust because it gives you the definitive position. """ pos = bisect.bisect_left(nums, target) return pos < len(nums) and nums[pos] == target # =====================================================# Pattern 4: Median of Sorted Arrays (duplicate-aware)# ===================================================== def findDuplicatePairs(nums: List[int]) -> int: """ Count pairs (i, j) where i < j and nums[i] == nums[j]. Key insight: For each unique value that appears k times, there are k*(k-1)/2 pairs. Using binary search to find count of each value: O(n log n) (Better approach: single pass with sortedcontainers) """ if not nums: return 0 total_pairs = 0 i = 0 while i < len(nums): # Find range of current value left = i right = bisect.bisect_right(nums, nums[i]) count = right - left # k values create k*(k-1)/2 pairs total_pairs += count * (count - 1) // 2 # Move to next distinct value i = right return total_pairs print(f"\nDuplicate pairs in [1,2,2,2,3,3]: {findDuplicatePairs([1,2,2,2,3,3])}") # 3+1=4 # =====================================================# Pattern 5: Removing Duplicates (find boundaries)# ===================================================== def removeDuplicatesInRange(nums: List[int], left: int, right: int) -> List[int]: """ Remove all occurrences of a specific value from array. Uses binary search to find the range efficiently. Returns new array without values in [left, right] range. """ if not nums: return [] start = bisect.bisect_left(nums, left) end = bisect.bisect_right(nums, right) # Remove elements in [start, end) return nums[:start] + nums[end:] arr = [1, 2, 3, 3, 3, 4, 5, 5, 6]print(f"\nRemove [3,5] from {arr}: {removeDuplicatesInRange(arr, 3, 5)}") # [1, 2, 6] # =====================================================# Pattern 6: Find K Closest Elements# ===================================================== def findKClosest(nums: List[int], k: int, x: int) -> List[int]: """ Find k elements closest to x in a sorted array. Approach: 1. Find insertion position of x 2. Expand window outward, comparing distances Handles duplicates naturally - returns first k in sorted order. Time: O(log n + k) """ if not nums: return [] # Find the position where x would be inserted pos = bisect.bisect_left(nums, x) # Two pointers expanding outward left = pos - 1 right = pos result = [] while len(result) < k and (left >= 0 or right < len(nums)): if left < 0: result.append(nums[right]) right += 1 elif right >= len(nums): result.insert(0, nums[left]) left -= 1 elif x - nums[left] <= nums[right] - x: result.insert(0, nums[left]) left -= 1 else: result.append(nums[right]) right += 1 return result arr = [1, 2, 3, 3, 4, 5, 6]print(f"\n3 closest to 3.5 in {arr}: {findKClosest(arr, 3, 4)}") # [3, 3, 4] or [3, 4, 5]"Find First and Last Position of Element in Sorted Array" (LeetCode 34) is THE canonical duplicate-handling problem. If you can solve it correctly, you understand duplicate handling. It requires exactly two binary searches: leftmost for the first position, rightmost for the last.
Duplicates can appear in arrays with special properties (rotated, sorted in segments, etc.). These require careful adaptation of binary search to preserve correctness.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152
from typing import List # =====================================================# Rotated Sorted Array WITH Duplicates# (LeetCode 81)# ===================================================== def searchRotatedWithDuplicates(nums: List[int], target: int) -> bool: """ Search in a rotated sorted array that may contain duplicates. The key challenge: when nums[left] == nums[mid] == nums[right], we can't determine which half is sorted! Solution: shrink the search space by one when this happens. Time: O(log n) average, O(n) worst case (all elements equal) """ if not nums: return False left, right = 0, len(nums) - 1 while left <= right: mid = left + (right - left) // 2 if nums[mid] == target: return True # THE CRITICAL CASE: can't determine sorted half if nums[left] == nums[mid] == nums[right]: # Shrink from both ends left += 1 right -= 1 # Left half is sorted elif nums[left] <= nums[mid]: if nums[left] <= target < nums[mid]: right = mid - 1 else: left = mid + 1 # Right half is sorted else: if nums[mid] < target <= nums[right]: left = mid + 1 else: right = mid - 1 return False # Test casesprint("Rotated array with duplicates:")print(f" [2,5,6,0,0,1,2], target=0: {searchRotatedWithDuplicates([2,5,6,0,0,1,2], 0)}") # Trueprint(f" [2,5,6,0,0,1,2], target=3: {searchRotatedWithDuplicates([2,5,6,0,0,1,2], 3)}") # Falseprint(f" [1,1,1,1,1,2,1], target=2: {searchRotatedWithDuplicates([1,1,1,1,1,2,1], 2)}") # True # =====================================================# Find Minimum in Rotated Sorted Array with Duplicates# (LeetCode 154)# ===================================================== def findMinWithDuplicates(nums: List[int]) -> int: """ Find minimum element in a rotated sorted array with duplicates. Same challenge as search: when nums[mid] == nums[right], we can't determine if minimum is in left or right half. Solution: shrink right by 1 when this happens. Time: O(log n) average, O(n) worst case """ left, right = 0, len(nums) - 1 while left < right: mid = left + (right - left) // 2 if nums[mid] > nums[right]: # Minimum is in right half (excluding mid) left = mid + 1 elif nums[mid] < nums[right]: # Minimum is in left half (including mid) right = mid else: # nums[mid] == nums[right], can't determine # Safe to shrink right by 1 right -= 1 return nums[left] print("\nFind minimum in rotated array with duplicates:")print(f" [2,2,2,0,1]: {findMinWithDuplicates([2,2,2,0,1])}") # 0print(f" [3,3,1,3]: {findMinWithDuplicates([3,3,1,3])}") # 1print(f" [1,1,1,1]: {findMinWithDuplicates([1,1,1,1])}") # 1 # =====================================================# Peak Element with Plateau (Duplicates)# ===================================================== def findPeakWithPlateau(nums: List[int]) -> int: """ Find a peak element in array that may have plateaus (duplicates). A peak is an element strictly greater than its neighbors. Edge values are considered peaks if greater than their single neighbor. Note: Standard binary search doesn't work with plateaus. We need to handle the equal case specially. """ if not nums: return -1 n = len(nums) # Edge cases if n == 1: return 0 if nums[0] > nums[1]: return 0 if nums[n-1] > nums[n-2]: return n - 1 left, right = 1, n - 2 while left <= right: mid = left + (right - left) // 2 # Check if mid is a peak if nums[mid] > nums[mid-1] and nums[mid] > nums[mid+1]: return mid # If in a plateau, try to escape it if nums[mid-1] == nums[mid] == nums[mid+1]: # Plateau - move in any direction left += 1 elif nums[mid] < nums[mid+1]: # Ascending - peak is to the right left = mid + 1 else: # Descending - peak is to the left right = mid - 1 return -1 # No peak found (shouldn't happen with valid input) print("\nPeak element with plateau:")print(f" [1,2,2,2,3,2,1]: {findPeakWithPlateau([1,2,2,2,3,2,1])}") # 4 (value 3)In some special cases (like rotated arrays with duplicates), the presence of duplicates can force us into O(n) worst-case behavior. When elements are identical, we can't determine which half to search—we have to shrink by one at a time. This is an inherent limitation, not a bug in the algorithm. The average case is still O(log n) for random inputs.
Handling duplicates correctly is what separates robust binary search implementations from buggy ones. With the boundary-based approach, you can confidently solve any duplicate-related query.
| Query | Implementation |
|---|---|
| First occurrence | bisect_left, then check if target exists |
| Last occurrence | bisect_right - 1, then check if target exists |
| Count | bisect_right - bisect_left |
| Range [first, last] | [bisect_left, bisect_right - 1] |
| Exists? | bisect_left != bisect_right OR check arr[pos] |
| Insert keeping sorted | Use bisect.insort or insert at bisect_left/right |
What's Next:
In the final page of this module, we'll tackle Off-by-One Errors and How to Avoid Them—the most common source of bugs in binary search implementations. You'll learn systematic techniques for getting the boundary conditions right every time.
You've mastered the complete taxonomy of duplicate-related queries and know how to handle each one correctly. With the unified DuplicateHelper pattern, you can confidently tackle any duplicate scenario in interviews and production code.