Consider three superficially similar questions:
"Is the word 'algorithm' in this book?" — A yes/no answer suffices.
"Where is the word 'algorithm' in this book?" — We need a page number.
"Show me every occurrence of 'algorithm' in this book." — We need all page numbers.
These questions look alike, but they have fundamentally different computational requirements. The first can sometimes be answered in O(1) time with preprocessing; the second typically requires O(log n) with sorted data; the third inherently requires O(k) where k is the number of matches—we cannot avoid examining each match at least once to report it.
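To make the contrast concrete, here is a minimal sketch of all three questions asked of the same data (the `pages` list of page numbers is invented for illustration, not taken from any real index):

```python
import bisect

# Hypothetical, pre-sorted list of pages on which 'algorithm' appears
pages = [12, 47, 47, 113, 250]

# 1. Existence: boolean answer, can stop at the first hit
has_term = 47 in pages                        # True

# 2. Location: one position; O(log n) on sorted data
first_at = bisect.bisect_left(pages, 47)      # 1 (index of the first 47)

# 3. Enumeration: every position; output alone costs O(k)
all_at = [i for i, p in enumerate(pages) if p == 47]  # [1, 2]
```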
Understanding these distinctions is crucial for designing efficient algorithms and choosing appropriate data structures.
By the end of this page, you will understand the three fundamental result types in search—existence, location, and enumeration—and how each type affects algorithm design, complexity analysis, and data structure selection. You'll see why these aren't just variations on a theme but fundamentally different computational problems.
Every search algorithm answers at least one of three fundamental questions:
1. Existence: "Does the target exist?"
   Output: Boolean (true/false). Example query: "Is the word 'algorithm' in this book?"
2. Location: "Where is the target?"
   Output: Index, position, pointer, or path. Example query: "Where is the word 'algorithm' in this book?"
3. All Occurrences (enumeration): "Where is every instance of the target?"
   Output: Collection of all matches (indices, elements, or paths). Example query: "Show me every occurrence of 'algorithm' in this book."

The algorithmic implications of each question are examined in the sections that follow.
| Characteristic | Existence | Location | All Occurrences |
|---|---|---|---|
| Return Type | Boolean | Index/Position (or null) | Collection of indices |
| Can Early Terminate? | Yes, on first match | Yes, on first match | No—must find all |
| Minimum Work (target present) | O(1) best case | O(1) best case | O(k) where k = count |
| Minimum Work (target absent) | Depends on structure | Depends on structure | Must verify exhaustively |
| Output Size | O(1) fixed | O(1) fixed | O(k) variable |
| Memory for Output | Constant | Constant | Proportional to matches |
Existence search asks the simplest question: is the target present? Yet this simplicity enables sophisticated optimizations impossible for other search types.
Because we only need a boolean answer, we can:

- Terminate on the first match, without recording where it was found
- Use hash-based structures that discard position information entirely
- Use probabilistic structures such as Bloom filters, which store only a few bits per element and cannot report locations at all

The three approaches below illustrate these options.
```python
from typing import List, Set
import hashlib


class ExistenceSearchPatterns:
    """
    Three approaches to existence search with different trade-offs.
    """

    # Approach 1: Hash Set — O(1) average, O(n) space
    @staticmethod
    def hash_set_existence(collection: List[int], target: int) -> bool:
        """
        Convert to hash set for O(1) lookups.

        Trade-off: O(n) preprocessing, O(n) space
        Best when: Many queries on same collection
        """
        lookup_set: Set[int] = set(collection)
        return target in lookup_set  # O(1) average case

    # Approach 2: Sorted + Binary Search — O(log n) time, O(1) extra space
    @staticmethod
    def binary_search_existence(sorted_arr: List[int], target: int) -> bool:
        """
        Binary search on sorted data.

        Trade-off: Requires sorted input or O(n log n) preprocessing
        Best when: Data naturally sorted, memory constrained
        """
        left, right = 0, len(sorted_arr) - 1
        while left <= right:
            mid = (left + right) // 2
            if sorted_arr[mid] == target:
                return True  # Found! Early termination
            elif sorted_arr[mid] < target:
                left = mid + 1
            else:
                right = mid - 1
        return False

    # Approach 3: Bloom Filter — O(1), O(m) space, probabilistic
    @staticmethod
    def bloom_filter_existence(
        items: List[str],
        target: str,
        filter_size: int = 1000
    ) -> str:
        """
        Probabilistic existence with no false negatives.

        Trade-off: May have false positives, but never false negatives
        Best when: Huge datasets, can tolerate false positives
        Returns: "definitely_not" or "probably_yes"
        """
        # Initialize bit array
        bit_array = [False] * filter_size

        # Hash functions (simplified - real implementations use several)
        def hash1(s: str) -> int:
            return int(hashlib.md5(s.encode()).hexdigest(), 16) % filter_size

        def hash2(s: str) -> int:
            return int(hashlib.sha1(s.encode()).hexdigest(), 16) % filter_size

        # Build filter
        for item in items:
            bit_array[hash1(item)] = True
            bit_array[hash2(item)] = True

        # Query
        if bit_array[hash1(target)] and bit_array[hash2(target)]:
            return "probably_yes"
        return "definitely_not"  # 100% certain of non-existence
```

Bloom filters are remarkable for existence queries: they use only bits (not full elements), support O(1) operations, and guarantee no false negatives. If the filter says 'not present,' it's 100% correct. If it says 'present,' there's a small probability of a false positive. This trade-off is perfect for caches, spell checkers, and network routers, where false positives are acceptable but false negatives are not.
Location search returns where a target exists, not merely that it exists. This seemingly small difference has significant algorithmic implications.
| Variant | Description | Use Case | Example |
|---|---|---|---|
| First Occurrence | Return leftmost/earliest position | Canonical position needed | First mention of term in document |
| Last Occurrence | Return rightmost/latest position | Most recent version | Last error in log file |
| Any Occurrence | Return any valid position | Existence proof with evidence | Find any matching record |
| Closest Occurrence | Return nearest match to given position | Context-aware search | Nearest gas station to location |
| Best Occurrence | Return optimal by some metric | Ranked results | Highest-rated match |
```python
from typing import List


def find_first_occurrence(sorted_arr: List[int], target: int) -> int:
    """
    Find the leftmost (first) occurrence of target in sorted array.
    Returns -1 if not found.

    Key insight: Don't stop when found—continue searching left.

    Time: O(log n)
    Space: O(1)
    """
    left, right = 0, len(sorted_arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if sorted_arr[mid] == target:
            result = mid      # Record this occurrence
            right = mid - 1   # But continue searching LEFT for an earlier one
        elif sorted_arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result


def find_last_occurrence(sorted_arr: List[int], target: int) -> int:
    """
    Find the rightmost (last) occurrence of target in sorted array.
    Returns -1 if not found.

    Mirror of first occurrence: continue searching right after finding.

    Time: O(log n)
    Space: O(1)
    """
    left, right = 0, len(sorted_arr) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if sorted_arr[mid] == target:
            result = mid     # Record this occurrence
            left = mid + 1   # But continue searching RIGHT for a later one
        elif sorted_arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return result


def find_closest_occurrence(sorted_arr: List[int], target: int) -> int:
    """
    Find the position of the element closest in value to target.
    If tied, return the smaller element's position.

    Uses binary search to find the insertion point, then checks neighbors.

    Time: O(log n)
    Space: O(1)
    """
    if not sorted_arr:
        return -1

    left, right = 0, len(sorted_arr) - 1

    # Binary search for the insertion point
    while left < right:
        mid = (left + right) // 2
        if sorted_arr[mid] < target:
            left = mid + 1
        else:
            right = mid

    # Now 'left' is the insertion point (capped at the last index)
    # Compare with neighbors to find the closest
    if left == 0:
        return 0
    if left == len(sorted_arr):
        return len(sorted_arr) - 1

    # Compare distances to left and right neighbors
    if abs(sorted_arr[left] - target) < abs(sorted_arr[left - 1] - target):
        return left
    return left - 1
```

Notice how finding the first occurrence differs from standard binary search: when we find a match, we don't stop—we continue searching the left half for an even earlier occurrence. This 'don't stop on success' pattern is counterintuitive but essential for precise location searches with duplicates.
When we need all matches, the problem fundamentally changes. We cannot early-terminate, and our complexity becomes inherently tied to the output size.
For existence and location searches, complexity is independent of how many matches exist: binary search answers "is the target present?" in O(log n) whether the array contains one copy or a million, and a hash lookup stays O(1) either way.

For all occurrences, if there are k matches, merely writing the output takes at least k steps, so every algorithm carries a +k term: O(log n + k) with binary-search bounds on sorted data, or O(n) for a linear scan (which dominates, since k ≤ n).
```python
from typing import List, Generator
from bisect import bisect_left, bisect_right


def find_all_linear(arr: List[int], target: int) -> List[int]:
    """
    Find all occurrences via linear scan.

    Time: O(n) regardless of structure
    Space: O(k) where k = number of matches

    Works on any array, sorted or not.
    """
    return [i for i, val in enumerate(arr) if val == target]


def find_all_sorted(sorted_arr: List[int], target: int) -> List[int]:
    """
    Find all occurrences in sorted array using binary search bounds.

    Time: O(log n + k) where k = number of matches
    Space: O(k) for output

    Key insight: Find first and last occurrence, then enumerate the range.
    """
    # Find leftmost occurrence
    left_idx = bisect_left(sorted_arr, target)

    # Check if target exists
    if left_idx == len(sorted_arr) or sorted_arr[left_idx] != target:
        return []

    # Find rightmost occurrence + 1
    right_idx = bisect_right(sorted_arr, target)

    # Enumerate all positions in [left_idx, right_idx)
    return list(range(left_idx, right_idx))


def find_all_generator(arr: List[int], target: int) -> Generator[int, None, None]:
    """
    Generator version for memory-efficient enumeration.

    Yields indices lazily instead of building a full list.
    Useful when k might be very large or we might stop early.

    Time: O(n) total if consumed fully
    Space: O(1) constant—doesn't store results
    """
    for i, val in enumerate(arr):
        if val == target:
            yield i  # Lazy production of results


# Example: Difference in memory behavior
def demonstrate_memory_difference():
    """
    Shows why generators matter for large result sets.
    """
    huge_array = [42] * 10_000_000  # 10 million copies of 42
    target = 42

    # Approach 1: Build full list (dangerous for memory!)
    # all_indices = find_all_linear(huge_array, target)
    # This would create a list of 10 million integers

    # Approach 2: Generator (memory safe)
    for idx in find_all_generator(huge_array, target):
        if idx >= 5:
            break  # Only needed first few—saved from creating huge list
        print(f"Found at {idx}")
```

When asking for 'all matches,' consider: how many could there be? In an array of n elements all equal to the target, finding all means returning n indices—the 'search' becomes copying the entire array. Design your interface to handle this: consider pagination, streaming (generators), or early-termination limits.
1. Pagination: Return results in chunks (LIMIT/OFFSET in SQL)
```python
def find_all_paginated(arr, target, offset=0, limit=100):
    results = []
    found = 0
    for i, val in enumerate(arr):
        if val == target:
            if found >= offset:
                results.append(i)
                if len(results) >= limit:
                    break
            found += 1
    return results
```
2. Streaming: Yield results as generated (Python generators, iterators)
3. Count First: Return count before enumeration to set expectations
4. Sampling: Return random sample of matches for very large result sets
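As a sketch of strategy 4 (the helper name and parameters are ours, not from the original), reservoir sampling returns a uniform random sample of match indices in a single O(n) pass, without ever materializing the full result set:

```python
import random
from typing import List

def sample_matches(arr: List[int], target: int, sample_size: int = 10) -> List[int]:
    """Uniformly sample up to sample_size match indices in one pass (Algorithm R)."""
    reservoir: List[int] = []
    matches_seen = 0
    for i, val in enumerate(arr):
        if val != target:
            continue
        matches_seen += 1
        if len(reservoir) < sample_size:
            reservoir.append(i)      # Fill the reservoir first
        else:
            j = random.randrange(matches_seen)
            if j < sample_size:
                reservoir[j] = i     # Replace with decreasing probability
    return reservoir
```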
Understanding which question to ask is itself a skill. Often, the stated requirement can be satisfied with a simpler question than assumed.
| Stated Requirement | Minimum Sufficient Question | Why? |
|---|---|---|
| Check if user exists before creating | Existence | Don't need their location, just yes/no |
| Retrieve user by ID | Location | Need the position to access data |
| Count matching records | All occurrences (but only count) | Can modify to count without storing positions |
| Display search results page | Top-K of all occurrences | Paginated subset, not true 'all' |
| Validate no duplicates exist | First two occurrences sufficient | If count > 1, reject—don't need full list |
| Merge duplicate entries | All occurrences | Must know every location to merge |
Always ask: 'Can I answer with less?' Existence is cheaper than location, location is cheaper than enumeration. If you can satisfy requirements with a simpler question, do so. Complexity saved at the API level propagates through the entire system.
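For instance, the 'validate no duplicates' row reduces to an existence question about a second occurrence. A minimal sketch (the helper name is ours, invented for illustration):

```python
from typing import List

def has_duplicate(arr: List[int]) -> bool:
    """Reject on the second occurrence of any value: no full enumeration needed."""
    seen = set()
    for x in arr:
        if x in seen:
            return True   # Second occurrence found: the answer is known, stop here
        seen.add(x)
    return False          # The exhaustive pass happens only when no duplicate exists
```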
Different programming languages provide different primitives for each search type. Understanding these built-ins prevents reinventing wheels poorly.
```python
# Python search idioms

arr = [10, 20, 30, 20, 40, 20, 50]
target = 20

# EXISTENCE: 'in' operator (linear for lists, O(1) for sets)
exists = target in arr                 # True
exists_set = target in set(arr)        # O(1) after set construction

# SINGLE LOCATION: index() method (first occurrence)
try:
    first_idx = arr.index(target)      # 1
except ValueError:
    first_idx = -1

# ALL OCCURRENCES: list comprehension or filter
all_indices = [i for i, x in enumerate(arr) if x == target]  # [1, 3, 5]

# COUNT (variant of all): count() method
occurrence_count = arr.count(target)   # 3

# For sorted data: bisect module
from bisect import bisect_left, bisect_right

sorted_arr = sorted(arr)               # [10, 20, 20, 20, 30, 40, 50]

left = bisect_left(sorted_arr, target)     # 1 (first 20)
right = bisect_right(sorted_arr, target)   # 4 (after last 20)
count_sorted = right - left                # 3 occurrences
```
```javascript
// JavaScript search idioms

const arr = [10, 20, 30, 20, 40, 20, 50];
const target = 20;

// EXISTENCE: includes() for arrays, has() for Sets
const exists = arr.includes(target);          // true
const existsSet = new Set(arr).has(target);   // true (O(1) lookup)

// SINGLE LOCATION: indexOf() (first), lastIndexOf() (last)
const firstIdx = arr.indexOf(target);      // 1
const lastIdx = arr.lastIndexOf(target);   // 5

// ALL OCCURRENCES: reduce or filter with indices
const allIndices = arr.reduce((acc, val, idx) => {
  if (val === target) acc.push(idx);
  return acc;
}, []); // [1, 3, 5]

// ALTERNATIVE: filter on entries
const allIndices2 = [...arr.entries()]
  .filter(([_, val]) => val === target)
  .map(([idx, _]) => idx);

// FIND (first element matching predicate, not index)
const found = arr.find(x => x > 25);         // 30
const foundIdx = arr.findIndex(x => x > 25); // 2

// ES2023: findLast, findLastIndex (last matching predicate)
const lastOver25 = arr.findLast(x => x > 25);         // 50
const lastOver25Idx = arr.findLastIndex(x => x > 25); // 6
```

Know your language's standard library. Reimplementing indexOf() incorrectly is a common source of bugs. Just as important, understand the complexity: JavaScript's includes() is O(n) on arrays, while Set's has() is O(1). Python's 'in' is O(n) for lists but O(1) for sets and dicts. Using the wrong collection type can hide linear scans.
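One hedged way to see that hidden linear scan for yourself is a quick timeit comparison (a sketch; absolute numbers will vary by machine):

```python
import timeit

data_list = list(range(1_000_000))
data_set = set(data_list)
missing = -1  # Absent target: worst case, forces a full scan of the list

t_list = timeit.timeit(lambda: missing in data_list, number=100)  # O(n) per query
t_set = timeit.timeit(lambda: missing in data_set, number=100)    # O(1) per query
print(f"list: {t_list:.3f}s   set: {t_set:.4f}s")
```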
Let's consolidate the complexity analysis for each search type across common data structures.
| Data Structure | Existence | First Location | All Occurrences (k matches) |
|---|---|---|---|
| Unsorted Array | O(n) | O(n) | O(n) |
| Sorted Array | O(log n) | O(log n) | O(log n + k) |
| Hash Set/Map | O(1) avg | O(1) avg | O(k) if bucketed |
| Balanced BST | O(log n) | O(log n) | O(log n + k) |
| Linked List | O(n) | O(n) | O(n) |
| Bloom Filter | O(1) | N/A | N/A |
| Trie (strings) | O(m)* | O(m)* | O(m + k)* |
*m = length of search string, not number of elements
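The trie row is the only structure in this table not illustrated elsewhere on this page, so here is a minimal sketch (assumed, not from the original) of why existence costs O(m) in the query length m, independent of how many strings are stored:

```python
class Trie:
    """Minimal trie supporting insert and O(m) existence checks."""

    def __init__(self):
        self.children: dict = {}   # char -> child Trie node
        self.is_word = False

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def exists(self, word: str) -> bool:
        node = self
        for ch in word:                  # One dictionary hop per character: O(m)
            if ch not in node.children:
                return False             # Early termination on first mismatch
            node = node.children[ch]
        return node.is_word

# Usage:
# t = Trie(); t.insert("algorithm")
# t.exists("algorithm")  # True
# t.exists("algo")       # False (a prefix, not a stored word)
```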
Key Observations:

- Existence ≤ Location ≤ All: complexity never decreases as we demand more information.
- Structure dictates the floor: unsorted data cannot be searched faster than O(n) for any question type without preprocessing.
- Hash excels at existence: when you only need yes/no, hash-based structures provide unmatched O(1) average performance.
- 'All' carries an output-dependent term: the k in O(log n + k) reflects that reporting k results requires at least k time. No algorithm can do better.
- Specialized structures answer specialized questions: Bloom filters sacrifice location capability for space-efficient existence.
Be cautious when a requirement says 'find all.' Even with O(log n + k) complexity, if k = n (all elements match), you're back to O(n). Always bound or validate expected result sizes. Unbounded 'find all' in production systems can cause resource exhaustion.
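One defensive pattern (a sketch under our own naming, not from the original) is to fail loudly when a result set exceeds an explicit bound, rather than silently consuming memory:

```python
from typing import List

def find_all_bounded(arr: List[int], target: int, max_results: int = 1000) -> List[int]:
    """Enumerate matches, but refuse to return an unexpectedly large result set."""
    results = []
    for i, val in enumerate(arr):
        if val == target:
            if len(results) >= max_results:
                raise ValueError(
                    f"more than {max_results} matches; paginate or refine the query"
                )
            results.append(i)
    return results
```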
The distinction between existence, location, and all-occurrences searches is fundamental to algorithm design. Let's consolidate the key insights:

- Existence asks the cheapest question and permits the most aggressive optimizations: early termination, hashing, and probabilistic filters.
- Location adds positional information, and with duplicates it splits into variants (first, last, closest, best) whose algorithms deliberately keep searching after a match.
- Enumeration is output-bound: no algorithm can report k matches in fewer than k steps, so production interfaces should bound results with pagination, streaming, or limits.
- Always ask the minimum sufficient question; a cheaper result type often satisfies the stated requirement.
What's Next:
With existence, location, and enumeration understood, we'll next examine success and failure conditions—how to precisely define when a search has succeeded, when it has definitively failed, and how to communicate these outcomes reliably.
You now understand the three fundamental types of search results and how each affects algorithm design, complexity, and implementation. This knowledge helps you choose the right search primitive for each requirement.