Consider three superficially similar questions:
"Is the word 'algorithm' in this book?" — A yes/no answer suffices.
"Where is the word 'algorithm' in this book?" — We need a page number.
"Show me every occurrence of 'algorithm' in this book." — We need all page numbers.
These questions look alike, but they have fundamentally different computational requirements. The first can sometimes be answered in O(1) time with preprocessing; the second typically requires O(log n) with sorted data; the third inherently requires O(k) where k is the number of matches—we cannot avoid examining each match at least once to report it.
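To make the contrast concrete, here is a minimal sketch of all three questions asked of the same data (the `pages` list of page numbers is invented for illustration, not taken from any real index):

```python
import bisect

# Hypothetical, pre-sorted list of pages on which 'algorithm' appears
pages = [12, 47, 47, 113, 250]

# 1. Existence: boolean answer, can stop at the first hit
has_term = 47 in pages                        # True

# 2. Location: one position; O(log n) on sorted data
first_at = bisect.bisect_left(pages, 47)      # 1 (index of the first 47)

# 3. Enumeration: every position; output alone costs O(k)
all_at = [i for i, p in enumerate(pages) if p == 47]  # [1, 2]
```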
Understanding these distinctions is crucial for designing efficient algorithms and choosing appropriate data structures.
By the end of this page, you will understand the three fundamental result types in search—existence, location, and enumeration—and how each type affects algorithm design, complexity analysis, and data structure selection. You'll see why these aren't just variations on a theme but fundamentally different computational problems.
Every search algorithm answers at least one of three fundamental questions:
1. Existence: "Does the target exist?"
   Output: Boolean (true/false). Example query: "Is the word 'algorithm' in this book?"
2. Location: "Where is the target?"
   Output: Index, position, pointer, or path. Example query: "Where is the word 'algorithm' in this book?"
3. All Occurrences (enumeration): "Where is every instance of the target?"
   Output: Collection of all matches (indices, elements, or paths). Example query: "Show me every occurrence of 'algorithm' in this book."

The algorithmic implications of each question are examined in the sections that follow.
| Characteristic | Existence | Location | All Occurrences |
|---|---|---|---|
| Return Type | Boolean | Index/Position (or null) | Collection of indices |
| Can Early Terminate? | Yes, on first match | Yes, on first match | No—must find all |
| Minimum Work (target present) | O(1) best case | O(1) best case | O(k) where k = count |
| Minimum Work (target absent) | Depends on structure | Depends on structure | Must verify exhaustively |
| Output Size | O(1) fixed | O(1) fixed | O(k) variable |
| Memory for Output | Constant | Constant | Proportional to matches |
Existence search asks the simplest question: is the target present? Yet this simplicity enables sophisticated optimizations impossible for other search types.
Because we only need a boolean answer, we can:

- Terminate on the first match, without recording where it was found
- Use hash-based structures that discard position information entirely
- Use probabilistic structures such as Bloom filters, which store only a few bits per element and cannot report locations at all

The three approaches below illustrate these options.
```python
from typing import List, Set
import hashlib


class ExistenceSearchPatterns:
    """
    Three approaches to existence search with different trade-offs.
    """

    # Approach 1: Hash Set — O(1) average, O(n) space
    @staticmethod
    def hash_set_existence(collection: List[int], target: int) -> bool:
        """
        Convert to hash set for O(1) lookups.

        Trade-off: O(n) preprocessing, O(n) space
        Best when: Many queries on same collection
        """
        lookup_set: Set[int] = set(collection)
        return target in lookup_set  # O(1) average case

    # Approach 2: Sorted + Binary Search — O(log n) time, O(1) extra space
    @staticmethod
    def binary_search_existence(sorted_arr: List[int], target: int) -> bool:
        """
        Binary search on sorted data.

        Trade-off: Requires sorted input or O(n log n) preprocessing
        Best when: Data naturally sorted, memory constrained
        """
        left, right = 0, len(sorted_arr) - 1
        while left <= right:
            mid = (left + right) // 2
            if sorted_arr[mid] == target:
                return True  # Found! Early termination
            elif sorted_arr[mid] < target:
                left = mid + 1
            else:
                right = mid - 1
        return False

    # Approach 3: Bloom Filter — O(1), O(m) space, probabilistic
    @staticmethod
    def bloom_filter_existence(
        items: List[str],
        target: str,
        filter_size: int = 1000
    ) -> str:
        """
        Probabilistic existence with no false negatives.

        Trade-off: May have false positives, but never false negatives
        Best when: Huge datasets, can tolerate false positives
        Returns: "definitely_not" or "probably_yes"
        """
        # Initialize bit array
        bit_array = [False] * filter_size

        # Hash functions (simplified - real implementations use several)
        def hash1(s: str) -> int:
            return int(hashlib.md5(s.encode()).hexdigest(), 16) % filter_size

        def hash2(s: str) -> int:
            return int(hashlib.sha1(s.encode()).hexdigest(), 16) % filter_size

        # Build filter
        for item in items:
            bit_array[hash1(item)] = True
            bit_array[hash2(item)] = True

        # Query
        if bit_array[hash1(target)] and bit_array[hash2(target)]:
            return "probably_yes"
        return "definitely_not"  # 100% certain of non-existence
```

Bloom filters are remarkable for existence queries: they use only bits (not full elements), support O(1) operations, and guarantee no false negatives. If the filter says 'not present,' it's 100% correct. If it says 'present,' there's a small probability of a false positive. This trade-off is perfect for caches, spell checkers, and network routers, where false positives are acceptable but false negatives are not.
Location search returns where a target exists, not merely that it exists. This seemingly small difference has significant algorithmic implications.
| Variant | Description | Use Case | Example |
|---|---|---|---|
| First Occurrence | Return leftmost/earliest position | Canonical position needed | First mention of term in document |
| Last Occurrence | Return rightmost/latest position | Most recent version | Last error in log file |
| Any Occurrence | Return any valid position | Existence proof with evidence | Find any matching record |
| Closest Occurrence | Return nearest match to given position | Context-aware search | Nearest gas station to location |
| Best Occurrence | Return optimal by some metric | Ranked results | Highest-rated match |
```python
from typing import List


def find_first_occurrence(sorted_arr: List[int], target: int) -> int:
    """
    Find the leftmost (first) occurrence of target in sorted array.
    Returns -1 if not found.

    Key insight: Don't stop when found—continue searching left.

    Time: O(log n)
    Space: O(1)
    """
    left, right = 0, len(sorted_arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if sorted_arr[mid] == target:
            result = mid      # Record this occurrence
            right = mid - 1   # But continue searching LEFT for an earlier one
        elif sorted_arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result


def find_last_occurrence(sorted_arr: List[int], target: int) -> int:
    """
    Find the rightmost (last) occurrence of target in sorted array.
    Returns -1 if not found.

    Mirror of first occurrence: continue searching right after finding.

    Time: O(log n)
    Space: O(1)
    """
    left, right = 0, len(sorted_arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if sorted_arr[mid] == target:
            result = mid     # Record this occurrence
            left = mid + 1   # But continue searching RIGHT for a later one
        elif sorted_arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result


def find_closest_occurrence(sorted_arr: List[int], target: int) -> int:
    """
    Find the position of the element closest in value to target.
    If tied, return the smaller element's position.

    Uses binary search to find the insertion point, then checks neighbors.

    Time: O(log n)
    Space: O(1)
    """
    if not sorted_arr:
        return -1

    left, right = 0, len(sorted_arr) - 1

    # Binary search for the insertion point
    while left < right:
        mid = (left + right) // 2
        if sorted_arr[mid] < target:
            left = mid + 1
        else:
            right = mid

    # Now 'left' is the insertion point (capped at the last index)
    # Compare with neighbors to find the closest
    if left == 0:
        return 0
    if left == len(sorted_arr):
        return len(sorted_arr) - 1

    # Compare distances to left and right neighbors
    if abs(sorted_arr[left] - target) < abs(sorted_arr[left - 1] - target):
        return left
    return left - 1
```

Notice how finding the first occurrence differs from standard binary search: when we find a match, we don't stop—we continue searching the left half for an even earlier occurrence. This 'don't stop on success' pattern is counterintuitive but essential for precise location searches with duplicates.
When we need all matches, the problem fundamentally changes. We cannot early-terminate, and our complexity becomes inherently tied to the output size.
For existence and location searches, complexity is independent of how many matches exist: binary search answers "is the target present?" in O(log n) whether the array contains one copy or a million, and a hash lookup stays O(1) either way.

For all occurrences, if there are k matches, merely writing the output takes at least k steps, so every algorithm carries a +k term: O(log n + k) with binary-search bounds on sorted data, or O(n) for a linear scan (which dominates, since k ≤ n).
```python
from typing import List, Generator
from bisect import bisect_left, bisect_right


def find_all_linear(arr: List[int], target: int) -> List[int]:
    """
    Find all occurrences via linear scan.

    Time: O(n) regardless of structure
    Space: O(k) where k = number of matches

    Works on any array, sorted or not.
    """
    return [i for i, val in enumerate(arr) if val == target]


def find_all_sorted(sorted_arr: List[int], target: int) -> List[int]:
    """
    Find all occurrences in sorted array using binary search bounds.

    Time: O(log n + k) where k = number of matches
    Space: O(k) for output

    Key insight: Find first and last occurrence, then enumerate the range.
    """
    # Find leftmost occurrence
    left_idx = bisect_left(sorted_arr, target)

    # Check if target exists
    if left_idx == len(sorted_arr) or sorted_arr[left_idx] != target:
        return []

    # Find rightmost occurrence + 1
    right_idx = bisect_right(sorted_arr, target)

    # Enumerate all positions in [left_idx, right_idx)
    return list(range(left_idx, right_idx))


def find_all_generator(arr: List[int], target: int) -> Generator[int, None, None]:
    """
    Generator version for memory-efficient enumeration.

    Yields indices lazily instead of building a full list.
    Useful when k might be very large or we might stop early.

    Time: O(n) total if consumed fully
    Space: O(1) constant—doesn't store results
    """
    for i, val in enumerate(arr):
        if val == target:
            yield i  # Lazy production of results


# Example: Difference in memory behavior
def demonstrate_memory_difference():
    """
    Shows why generators matter for large result sets.
    """
    huge_array = [42] * 10_000_000  # 10 million copies of 42
    target = 42

    # Approach 1: Build full list (dangerous for memory!)
    # all_indices = find_all_linear(huge_array, target)
    # This would create a list of 10 million integers

    # Approach 2: Generator (memory safe)
    for idx in find_all_generator(huge_array, target):
        if idx >= 5:
            break  # Only needed first few—saved from creating huge list
        print(f"Found at {idx}")
```

When asking for 'all matches,' consider: how many could there be? In an array of n elements all equal to the target, finding all means returning n indices—the 'search' becomes copying the entire array. Design your interface to handle this: consider pagination, streaming (generators), or early-termination limits.
1. Pagination: Return results in chunks (LIMIT/OFFSET in SQL)
```python
def find_all_paginated(arr, target, offset=0, limit=100):
    results = []
    found = 0
    for i, val in enumerate(arr):
        if val == target:
            if found >= offset:
                results.append(i)
                if len(results) >= limit:
                    break
            found += 1
    return results
```
2. Streaming: Yield results as generated (Python generators, iterators)
3. Count First: Return count before enumeration to set expectations
4. Sampling: Return random sample of matches for very large result sets
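As a sketch of strategy 4 (the helper name and parameters are ours, not from the original), reservoir sampling returns a uniform random sample of match indices in a single O(n) pass, without ever materializing the full result set:

```python
import random
from typing import List

def sample_matches(arr: List[int], target: int, sample_size: int = 10) -> List[int]:
    """Uniformly sample up to sample_size match indices in one pass (Algorithm R)."""
    reservoir: List[int] = []
    matches_seen = 0
    for i, val in enumerate(arr):
        if val != target:
            continue
        matches_seen += 1
        if len(reservoir) < sample_size:
            reservoir.append(i)      # Fill the reservoir first
        else:
            j = random.randrange(matches_seen)
            if j < sample_size:
                reservoir[j] = i     # Replace with decreasing probability
    return reservoir
```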
Understanding which question to ask is itself a skill. Often, the stated requirement can be satisfied with a simpler question than assumed.
| Stated Requirement | Minimum Sufficient Question | Why? |
|---|---|---|
| Check if user exists before creating | Existence | Don't need their location, just yes/no |
| Retrieve user by ID | Location | Need the position to access data |
| Count matching records | All occurrences (but only count) | Can modify to count without storing positions |
| Display search results page | Top-K of all occurrences | Paginated subset, not true 'all' |
| Validate no duplicates exist | First two occurrences sufficient | If count > 1, reject—don't need full list |
| Merge duplicate entries | All occurrences | Must know every location to merge |
Always ask: 'Can I answer with less?' Existence is cheaper than location, location is cheaper than enumeration. If you can satisfy requirements with a simpler question, do so. Complexity saved at the API level propagates through the entire system.
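For instance, the 'validate no duplicates' row reduces to an existence question about a second occurrence. A minimal sketch (the helper name is ours, invented for illustration):

```python
from typing import List

def has_duplicate(arr: List[int]) -> bool:
    """Reject on the second occurrence of any value: no full enumeration needed."""
    seen = set()
    for x in arr:
        if x in seen:
            return True   # Second occurrence found: the answer is known, stop here
        seen.add(x)
    return False          # The exhaustive pass happens only when no duplicate exists
```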
Different programming languages provide different primitives for each search type. Understanding these built-ins prevents reinventing wheels poorly.
```python
# Python search idioms

arr = [10, 20, 30, 20, 40, 20, 50]
target = 20

# EXISTENCE: 'in' operator (linear for lists, O(1) for sets)
exists = target in arr                 # True
exists_set = target in set(arr)        # O(1) after set construction

# SINGLE LOCATION: index() method (first occurrence)
try:
    first_idx = arr.index(target)      # 1
except ValueError:
    first_idx = -1

# ALL OCCURRENCES: list comprehension or filter
all_indices = [i for i, x in enumerate(arr) if x == target]  # [1, 3, 5]

# COUNT (variant of all): count() method
occurrence_count = arr.count(target)   # 3

# For sorted data: bisect module
from bisect import bisect_left, bisect_right

sorted_arr = sorted(arr)               # [10, 20, 20, 20, 30, 40, 50]

left = bisect_left(sorted_arr, target)     # 1 (first 20)
right = bisect_right(sorted_arr, target)   # 4 (after last 20)
count_sorted = right - left                # 3 occurrences
```
```javascript
// JavaScript search idioms

const arr = [10, 20, 30, 20, 40, 20, 50];
const target = 20;

// EXISTENCE: includes() for arrays, has() for Sets
const exists = arr.includes(target);          // true
const existsSet = new Set(arr).has(target);   // true (O(1) lookup)

// SINGLE LOCATION: indexOf() (first), lastIndexOf() (last)
const firstIdx = arr.indexOf(target);      // 1
const lastIdx = arr.lastIndexOf(target);   // 5

// ALL OCCURRENCES: reduce or filter with indices
const allIndices = arr.reduce((acc, val, idx) => {
  if (val === target) acc.push(idx);
  return acc;
}, []); // [1, 3, 5]

// ALTERNATIVE: filter on entries
const allIndices2 = [...arr.entries()]
  .filter(([_, val]) => val === target)
  .map(([idx, _]) => idx);

// FIND (first element matching predicate, not index)
const found = arr.find(x => x > 25);         // 30
const foundIdx = arr.findIndex(x => x > 25); // 2

// ES2023: findLast, findLastIndex (last matching predicate)
const lastOver25 = arr.findLast(x => x > 25);         // 50
const lastOver25Idx = arr.findLastIndex(x => x > 25); // 6
```

Know your language's standard library. Reimplementing indexOf() incorrectly is a common source of bugs. Just as important, understand the complexity: JavaScript's includes() is O(n) on arrays, while Set's has() is O(1). Python's 'in' is O(n) for lists but O(1) for sets and dicts. Using the wrong collection type can hide linear scans.
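One hedged way to see that hidden linear scan for yourself is a quick timeit comparison (a sketch; absolute numbers will vary by machine):

```python
import timeit

data_list = list(range(1_000_000))
data_set = set(data_list)
missing = -1  # Absent target: worst case, forces a full scan of the list

t_list = timeit.timeit(lambda: missing in data_list, number=100)  # O(n) per query
t_set = timeit.timeit(lambda: missing in data_set, number=100)    # O(1) per query
print(f"list: {t_list:.3f}s   set: {t_set:.4f}s")
```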
Let's consolidate the complexity analysis for each search type across common data structures.
| Data Structure | Existence | First Location | All Occurrences (k matches) |
|---|---|---|---|
| Unsorted Array | O(n) | O(n) | O(n) |
| Sorted Array | O(log n) | O(log n) | O(log n + k) |
| Hash Set/Map | O(1) avg | O(1) avg | O(k) if bucketed |
| Balanced BST | O(log n) | O(log n) | O(log n + k) |
| Linked List | O(n) | O(n) | O(n) |
| Bloom Filter | O(1) | N/A | N/A |
| Trie (strings) | O(m)* | O(m)* | O(m + k)* |
*m = length of search string, not number of elements
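The trie row is the only structure in this table not illustrated elsewhere on this page, so here is a minimal sketch (assumed, not from the original) of why existence costs O(m) in the query length m, independent of how many strings are stored:

```python
class Trie:
    """Minimal trie supporting insert and O(m) existence checks."""

    def __init__(self):
        self.children: dict = {}   # char -> child Trie node
        self.is_word = False

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def exists(self, word: str) -> bool:
        node = self
        for ch in word:                  # One dictionary hop per character: O(m)
            if ch not in node.children:
                return False             # Early termination on first mismatch
            node = node.children[ch]
        return node.is_word

# Usage:
# t = Trie(); t.insert("algorithm")
# t.exists("algorithm")  # True
# t.exists("algo")       # False (a prefix, not a stored word)
```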
Key Observations:

- Existence ≤ Location ≤ All: complexity never decreases as we demand more information.
- Structure dictates the floor: unsorted data cannot be searched faster than O(n) for any question type without preprocessing.
- Hash excels at existence: when you only need yes/no, hash-based structures provide unmatched O(1) average performance.
- 'All' carries an output-dependent term: the k in O(log n + k) reflects that reporting k results requires at least k time. No algorithm can do better.
- Specialized structures answer specialized questions: Bloom filters sacrifice location capability for space-efficient existence.
Be cautious when a requirement says 'find all.' Even with O(log n + k) complexity, if k = n (all elements match), you're back to O(n). Always bound or validate expected result sizes. Unbounded 'find all' in production systems can cause resource exhaustion.
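One defensive pattern (a sketch under our own naming, not from the original) is to fail loudly when a result set exceeds an explicit bound, rather than silently consuming memory:

```python
from typing import List

def find_all_bounded(arr: List[int], target: int, max_results: int = 1000) -> List[int]:
    """Enumerate matches, but refuse to return an unexpectedly large result set."""
    results = []
    for i, val in enumerate(arr):
        if val == target:
            if len(results) >= max_results:
                raise ValueError(
                    f"more than {max_results} matches; paginate or refine the query"
                )
            results.append(i)
    return results
```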
The distinction between existence, location, and all-occurrences searches is fundamental to algorithm design. Let's consolidate the key insights:

- Existence asks the cheapest question and permits the most aggressive optimizations: early termination, hashing, and probabilistic filters.
- Location adds positional information, and with duplicates it splits into variants (first, last, closest, best) whose algorithms deliberately keep searching after a match.
- Enumeration is output-bound: no algorithm can report k matches in fewer than k steps, so production interfaces should bound results with pagination, streaming, or limits.
- Always ask the minimum sufficient question; a cheaper result type often satisfies the stated requirement.
What's Next:
With existence, location, and enumeration understood, we'll next examine success and failure conditions—how to precisely define when a search has succeeded, when it has definitively failed, and how to communicate these outcomes reliably.
You now understand the three fundamental types of search results and how each affects algorithm design, complexity, and implementation. This knowledge helps you choose the right search primitive for each requirement.