Loading content...
Every search algorithm must answer a critical question: When are we done? This seems trivial—we're done when we find what we're looking for or when we've exhausted all possibilities. But consider the subtleties:
Precisely defining success and failure conditions is essential for correctness (does the algorithm actually work?), reliability (can callers trust the result?), and debuggability (when something goes wrong, can we diagnose it?).
By the end of this page, you will understand how to define precise termination conditions for searches, distinguish between different types of 'not found' results, handle edge cases robustly, and design APIs that communicate search outcomes clearly and unambiguously.
Success in search means finding an element that satisfies the target specification. But 'satisfies' requires careful definition.
For value-based targets, success is straightforward:
Search for: x where x = 42
Success: Found element equal to 42
Precision: Element must be *identical*, not merely similar
Pitfalls:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657
import mathfrom typing import Any, Protocol, Callable # Pitfall 1: Floating-point comparisondef naive_float_search(arr: list[float], target: float) -> int: """PROBLEMATIC: Direct float equality often fails.""" for i, val in enumerate(arr): if val == target: # Dangerous for floats! return i return -1 def robust_float_search( arr: list[float], target: float, epsilon: float = 1e-9) -> int: """BETTER: Use epsilon-based comparison.""" for i, val in enumerate(arr): if abs(val - target) < epsilon: # Within tolerance return i return -1 # Pitfall 2: String case sensitivitydef search_case_sensitive(words: list[str], target: str) -> int: return next((i for i, w in enumerate(words) if w == target), -1) def search_case_insensitive(words: list[str], target: str) -> int: target_lower = target.lower() return next((i for i, w in enumerate(words) if w.lower() == target_lower), -1) # Pitfall 3: Object identity vs equalityclass Point: def __init__(self, x: int, y: int): self.x, self.y = x, y def __eq__(self, other): return isinstance(other, Point) and self.x == other.x and self.y == other.y def __hash__(self): return hash((self.x, self.y)) p1 = Point(1, 2)p2 = Point(1, 2)# p1 == p2 is True (value equality via __eq__)# p1 is p2 is False (different objects in memory) # Pitfall 4: None/null handlingdef search_with_none(arr: list[Any], target: Any) -> int: """Searching for None requires explicit handling.""" for i, val in enumerate(arr): # Special case: target is None if target is None: if val is None: return i elif val == target: # Regular equality return i return -1Before implementing any search, explicitly define your equality relation. Document whether comparisons are case-sensitive, whether floating-point tolerance is used, whether null is a valid target, and whether you're comparing by reference or value. Ambiguity here causes bugs that are notoriously difficult to diagnose.
For predicate-based targets, success means finding an element where the predicate evaluates to true. This introduces additional complexity.
1. Definedness: What happens if the predicate throws an exception on some elements?
def risky_predicate(x):
return x['value'] > 10 # Throws KeyError if 'value' missing
2. Purity: Does the predicate have side effects? Can it return different results on the same input?
call_count = 0
def impure_predicate(x):
global call_count
call_count += 1
return call_count > 5 # Returns True only after 5 calls!
3. Totality: Is the predicate defined for all possible elements, or are there invalid inputs?
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778
from typing import Callable, TypeVar, Optional, Listfrom dataclasses import dataclass T = TypeVar('T') @dataclassclass SearchResult: """Rich result type for predicate search.""" found: bool index: int element: Optional[any] error: Optional[Exception] def safe_predicate_search( arr: List[T], predicate: Callable[[T], bool], on_predicate_error: str = "skip" # "skip", "fail", "treat_as_false") -> SearchResult: """ Search with robust predicate error handling. Args: arr: Elements to search predicate: Condition to satisfy on_predicate_error: How to handle predicate exceptions - "skip": Ignore elements where predicate fails - "fail": Abort search, return error - "treat_as_false": Treat exception as predicate returning False Returns: SearchResult with found status, index, element, and any error """ for i, element in enumerate(arr): try: if predicate(element): return SearchResult( found=True, index=i, element=element, error=None ) except Exception as e: if on_predicate_error == "fail": return SearchResult( found=False, index=-1, element=None, error=e ) elif on_predicate_error == "skip": continue # Try next element else: # treat_as_false continue # Element doesn't satisfy predicate return SearchResult( found=False, index=-1, element=None, error=None ) # Example: Searching with a predicate that might faildata = [ {"name": "Alice", "age": 30}, {"name": "Bob"}, # Missing 'age' key {"name": "Charlie", "age": 25},] def age_over_28(person): return person["age"] > 28 # Throws KeyError on Bob # Different behaviors based on error policyresult_skip = safe_predicate_search(data, age_over_28, "skip")# Finds Alice at index 0, skips Bob, never reaches Charlie result_fail = safe_predicate_search(data, age_over_28, "fail")# Returns error when predicate fails on BobA well-designed search predicate should be: (1) Total — defined for all possible inputs, (2) Pure — no side effects, same input yields same output, (3) Terminating — always returns in bounded time. Violating these properties leads to subtle bugs and non-deterministic behavior.
Failure in search seems simple: we didn't find it. But there are actually multiple types of failure, and conflating them causes real problems.
Type 1: Definitive Absence The target does not exist in the search space. We have conclusive proof.
Search space exhausted, target not found.
Guarantee: Target is definitely not present.
Type 2: Inconclusive Search We stopped before exhausting all possibilities. Target may or may not exist.
Timeout reached, search aborted.
Guarantee: None. Target might exist in unexplored regions.
Type 3: Invalid Search The search itself is malformed or cannot be executed.
Search space is empty, target cannot be evaluated, access denied.
Guarantee: The search operation did not complete meaningfully.
| Failure Type | Cause | Guarantee | Appropriate Response |
|---|---|---|---|
| Definitive Absence | Target not in space | 100% certain not present | Report not found, proceed safely |
| Search Timeout | Time limit exceeded | Unknown—might exist | Retry with more time, or report uncertainty |
| Resource Exhaustion | Memory/stack limit | Unknown—might exist | Use different algorithm, report failure mode |
| Access Failure | Permission denied, network error | Unknown—couldn't check | Retry, escalate, or fail explicitly |
| Invalid Query | Malformed target, empty space | Search couldn't run | Report input error, don't confuse with not-found |
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152
// TypeScript: Distinguishing failure types with discriminated unions type SearchSuccess<T> = { status: 'found'; index: number; element: T;}; type DefinitiveNotFound = { status: 'not_found'; reason: 'exhaustive_search_complete';}; type InconclusiveSearch = { status: 'inconclusive'; reason: 'timeout' | 'resource_limit' | 'interrupted'; searchedCount: number; totalCount: number;}; type SearchError = { status: 'error'; reason: 'empty_space' | 'invalid_query' | 'access_denied' | 'internal_error'; message: string;}; type SearchResult<T> = | SearchSuccess<T> | DefinitiveNotFound | InconclusiveSearch | SearchError; // Usage: Caller can handle each case appropriatelyfunction handleSearchResult<T>(result: SearchResult<T>): void { switch (result.status) { case 'found': console.log(`Found at index ${result.index}`); break; case 'not_found': console.log('Definitively not present'); break; case 'inconclusive': console.log(`Searched ${result.searchedCount}/${result.totalCount}, ` + `stopped due to ${result.reason}`); // Maybe retry with different parameters break; case 'error': console.error(`Search failed: ${result.message}`); // Log, alert, or propagate error break; }}Returning -1 or null for all failure types loses critical information. 'Not found after exhaustive search' means subsequent code can safely assume absence. 'Not found due to timeout' does NOT provide this guarantee. Conflating them leads to incorrect decisions downstream.
A search algorithm must eventually stop. The termination condition is the logical predicate that, when true, causes the algorithm to halt.
Early Exit on First Match: For existence or single-location searches, stop immediately upon finding a match.
for elem in collection:
if predicate(elem):
return Success(elem) # TERMINATE: success condition met
# If loop completes without return, continue to failure termination
Exhaustive Search for All Matches: For enumeration searches, continue until all elements examined.
results = []
for elem in collection:
if predicate(elem):
results.append(elem) # Don't terminate—continue searching
return results # TERMINATE: space exhausted
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192
from typing import Iterator, Optional, TypeVarimport time T = TypeVar('T') def search_with_termination_conditions( iterator: Iterator[T], predicate, max_items: Optional[int] = None, timeout_seconds: Optional[float] = None, stop_on_first: bool = True) -> dict: """ Demonstrates various termination conditions. Termination triggers: 1. SUCCESS: Found match (if stop_on_first) 2. EXHAUSTED: Iterator depleted 3. LIMIT_REACHED: Examined max_items elements 4. TIMEOUT: Exceeded time limit """ results = [] examined = 0 start_time = time.time() for item in iterator: # Check timeout condition if timeout_seconds and (time.time() - start_time) > timeout_seconds: return { 'status': 'timeout', 'examined': examined, 'partial_results': results } # Check item limit condition if max_items and examined >= max_items: return { 'status': 'limit_reached', 'examined': examined, 'partial_results': results } examined += 1 # Check match condition if predicate(item): results.append(item) # Early termination for single-match searches if stop_on_first: return { 'status': 'found', 'examined': examined, 'result': item } # Iterator exhausted if results: return { 'status': 'complete', 'examined': examined, 'results': results } else: return { 'status': 'not_found', 'examined': examined } # Special case: Binary search terminationdef binary_search_termination_explained(arr: list, target) -> int: """ Binary search terminates when: 1. Target found (success) 2. Search space empty: left > right (definitive failure) The space halves each iteration, guaranteeing termination. """ left, right = 0, len(arr) - 1 while left <= right: # Termination: when left > right, space is empty mid = (left + right) // 2 if arr[mid] == target: return mid # SUCCESS TERMINATION elif arr[mid] < target: left = mid + 1 # Eliminate left half else: right = mid - 1 # Eliminate right half return -1 # FAILURE TERMINATION: space exhaustedFor a search algorithm to be correct, it must guarantee termination. Prove this by showing either: (1) a finite search space that shrinks each iteration, or (2) a monotonically decreasing counter that reaches zero. Infinite loops in search are unacceptable—even 'infinite' streams need explicit stopping conditions.
Edge cases reveal the robustness of a search implementation. The most dangerous is often the simplest: what happens when the search space is empty?
# Searching an empty array
def naive_search(arr, target):
for i, elem in enumerate(arr):
if elem == target:
return i
return -1 # Returns -1 for empty array
# But is -1 "not found in empty space" or "space checked, not present"?
# Semantically different!
| Edge Case | Naive Behavior | Robust Behavior |
|---|---|---|
| Empty search space | Return not-found | Distinguish: 'empty space' vs 'not present' |
| Single-element space | Works, but verify bounds | Test explicitly—common off-by-one source |
| All elements match | Return first/all as expected | Consider performance implications |
| No elements match | Return not-found | Ensure exhaustive—don't miss corner positions |
| Target at boundaries | First/last positions | Test i=0 and i=n-1 explicitly |
| Duplicate targets | Define which to return | Document: first? last? any? all? |
| Null/undefined in space | May crash or misbehave | Handle null elements gracefully |
| Target is null | Often crashes or wrong | Explicit null-target handling |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114
from typing import List, Optional, Unionfrom enum import Enum, auto class SearchSpaceStatus(Enum): EMPTY = auto() SINGLE_ELEMENT = auto() NORMAL = auto() class EdgeCaseAwareSearch: """ Demonstrates robust edge case handling for search. """ @staticmethod def analyze_search_space(arr: List) -> SearchSpaceStatus: """Classify the search space before searching.""" if len(arr) == 0: return SearchSpaceStatus.EMPTY elif len(arr) == 1: return SearchSpaceStatus.SINGLE_ELEMENT return SearchSpaceStatus.NORMAL @staticmethod def robust_linear_search( arr: List[Optional[int]], target: Optional[int] ) -> dict: """ Linear search with comprehensive edge case handling. """ # Edge case 1: Empty space if len(arr) == 0: return { 'status': 'empty_space', 'message': 'Cannot search empty collection', 'index': None } # Edge case 2: Target is None (requires special comparison) if target is None: for i, elem in enumerate(arr): if elem is None: return { 'status': 'found', 'index': i, 'message': 'Found null element' } return { 'status': 'not_found', 'index': None, 'message': 'Null not present in collection' } # Normal search (target is not None) for i, elem in enumerate(arr): # Edge case 3: Element is None (skip safely) if elem is None: continue if elem == target: return { 'status': 'found', 'index': i, 'message': f'Found at index {i}' } return { 'status': 'not_found', 'index': None, 'message': f'Target {target} not present' } @staticmethod def robust_binary_search(sorted_arr: List[int], target: int) -> dict: """ Binary search with boundary edge case handling. """ n = len(sorted_arr) # Edge case: Empty array if n == 0: return {'status': 'empty_space', 'index': None} # Edge case: Single element if n == 1: if sorted_arr[0] == target: return {'status': 'found', 'index': 0} return {'status': 'not_found', 'index': None} # Edge case: Target outside array bounds if target < sorted_arr[0]: return { 'status': 'not_found', 'index': None, 'message': 'Target smaller than all elements' } if target > sorted_arr[-1]: return { 'status': 'not_found', 'index': None, 'message': 'Target larger than all elements' } # Normal binary search left, right = 0, n - 1 while left <= right: mid = left + (right - left) // 2 # Overflow-safe midpoint if sorted_arr[mid] == target: return {'status': 'found', 'index': mid} elif sorted_arr[mid] < target: left = mid + 1 else: right = mid - 1 return {'status': 'not_found', 'index': None}When implementing or testing search, start with edge cases: empty array, single element, target at first position, target at last position, target not present, all elements matching. If these work, the normal case usually follows. Edge cases reveal off-by-one errors, boundary condition bugs, and assumption violations.
How you communicate search results through your API profoundly affects usability, correctness, and debuggability. Let's examine common patterns and their trade-offs.
Return a special value indicating "not found" — commonly -1 for indices or null for elements.
Pros: Simple, no special types needed Cons: Sentinel might be a valid result; requires checking; loses failure type information
123456789101112131415
// Sentinel value patternfunction indexOf(arr, target) { for (let i = 0; i < arr.length; i++) { if (arr[i] === target) return i; } return -1; // Sentinel: "not found"} // Caller must check for sentinelconst idx = indexOf(data, 42);if (idx === -1) { console.log("Not found");} else { console.log("Found at", idx);}Return a wrapper indicating presence or absence of result.
Pros: Type-safe; compiler prevents using absent value; explicit handling Cons: Requires unwrapping; can be verbose
123456789101112131415161718192021222324
// Option/Maybe pattern in TypeScripttype Option<T> = { some: true; value: T } | { some: false }; function find<T>(arr: T[], predicate: (x: T) => boolean): Option<T> { for (const item of arr) { if (predicate(item)) { return { some: true, value: item }; } } return { some: false };} // Usage: Compiler enforces handling both casesconst result = find([1, 2, 3], x => x > 5);if (result.some) { console.log("Found:", result.value); // Type-safe access} else { console.log("Not found");} // Or with pattern matching (TypeScript 4.6+)const message = result.some ? `Found: ${result.value}` : "Not found";Throw when target not found.
Pros: Forces handling; can include detailed failure information Cons: Expensive; breaks control flow; 'not found' often isn't exceptional
12345678910111213141516171819202122232425262728293031323334
# Exception pattern (use sparingly for search) class ElementNotFoundError(Exception): """Raised when search target is not found.""" def __init__(self, target, space_size): self.target = target self.space_size = space_size super().__init__(f"Target {target} not found in {space_size} elements") def strict_find(arr, target): """ Find element or raise exception. Use when: Not finding is truly exceptional (programmer error). Avoid when: Not finding is a normal, expected outcome. """ for i, elem in enumerate(arr): if elem == target: return i raise ElementNotFoundError(target, len(arr)) # Usagetry: idx = strict_find(data, 42) print(f"Found at {idx}")except ElementNotFoundError as e: print(f"Error: {e}") # WARNING: Don't use exceptions for normal control flow!# This is anti-pattern in most search scenarios:try: idx = strict_find(user_input_list, query) # Bad if query often missesexcept ElementNotFoundError: idx = -1 # Exceptions as control flow = slow and unclearFor search APIs: (1) Use Optional types for single-result searches, (2) Return empty collections for all-matches searches, (3) Reserve exceptions for truly exceptional conditions (malformed input, system errors), (4) Consider Result types that distinguish 'not found' from 'error'. Match your API to caller expectations.
Not all searches demand exact matches. Many real-world scenarios accept 'close enough' as success. This introduces a spectrum between success and failure.
Fuzzy String Search: Find "Jhon" when "John" exists Nearest Neighbor: Find the closest point when exact match doesn't exist Threshold Search: Find elements within acceptable tolerance Ranked Results: Return best matches even if none are perfect
| Match Quality | Definition | Example |
|---|---|---|
| Exact | Perfect match on all criteria | Search 'apple', find 'apple' |
| Case-insensitive | Match ignoring letter case | Search 'Apple', find 'apple' |
| Prefix match | Target starts with query | Search 'app', find 'apple' |
| Contains | Target contains query | Search 'ppl', find 'apple' |
| Fuzzy (edit distance) | Within k character changes | Search 'aple', find 'apple' |
| Phonetic | Sounds alike | Search 'appel', find 'apple' |
| Semantic | Conceptually similar | Search 'fruit', find 'apple' |
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283
from typing import List, Tuple, NamedTuplefrom enum import Enum class MatchQuality(Enum): EXACT = 1.0 CASE_INSENSITIVE = 0.95 PREFIX = 0.8 CONTAINS = 0.7 FUZZY = 0.5 NONE = 0.0 class ApproximateMatch(NamedTuple): """Result of approximate search with quality score.""" found: bool element: str index: int quality: MatchQuality score: float def approximate_string_search( strings: List[str], query: str, min_quality: MatchQuality = MatchQuality.FUZZY) -> List[ApproximateMatch]: """ Search for strings with varying match quality. Returns matches meeting minimum quality threshold, ranked by score. """ results = [] query_lower = query.lower() for i, s in enumerate(strings): s_lower = s.lower() # Check match types from exact to fuzzy if s == query: quality = MatchQuality.EXACT score = 1.0 elif s_lower == query_lower: quality = MatchQuality.CASE_INSENSITIVE score = 0.95 elif s_lower.startswith(query_lower): quality = MatchQuality.PREFIX score = 0.8 * (len(query) / len(s)) elif query_lower in s_lower: quality = MatchQuality.CONTAINS score = 0.7 * (len(query) / len(s)) elif levenshtein_distance(s_lower, query_lower) <= 2: quality = MatchQuality.FUZZY score = 0.5 else: continue # No match if quality.value >= min_quality.value: results.append(ApproximateMatch( found=True, element=s, index=i, quality=quality, score=score )) # Sort by score descending return sorted(results, key=lambda x: x.score, reverse=True) def levenshtein_distance(s1: str, s2: str) -> int: """Compute edit distance between two strings.""" if len(s1) < len(s2): return levenshtein_distance(s2, s1) if len(s2) == 0: return len(s1) previous_row = range(len(s2) + 1) for i, c1 in enumerate(s1): current_row = [i + 1] for j, c2 in enumerate(s2): insertions = previous_row[j + 1] + 1 deletions = current_row[j] + 1 substitutions = previous_row[j] + (c1 != c2) current_row.append(min(insertions, deletions, substitutions)) previous_row = current_row return previous_row[-1]When implementing approximate search, return both the match AND its quality/score. This lets callers decide: accept any match, require high confidence, or show ranked results. A 60% match might be acceptable for suggestions but insufficient for authentication.
We've explored the crucial question of when a search ends and how to communicate its outcome. Let's consolidate the key insights:
Module Complete:
With this page, you've completed Module 1: The Search Problem — Formalization. You now possess the rigorous conceptual framework for understanding search: formal problem definition, search space characterization, target specification, result types, and success/failure conditions.
What's Next:
In Module 2, we'll apply this framework to Linear Search, the simplest but foundational search algorithm. We'll see how even 'trivial' algorithms have subtleties, and establish the baseline against which all other search algorithms are measured.
Congratulations! You've completed the formal foundations of search. You now understand search not as 'looking for things' but as a precisely defined computational problem with formal components. This rigorous understanding separates engineers who merely implement algorithms from those who can analyze, select, and design them. Next, we'll begin with the most fundamental algorithm: Linear Search.