Imagine you're building a word puzzle game—something like Boggle or a crossword validator. You have a 2D grid of characters and a dictionary of valid words. Your task is to find all words from the dictionary that can be formed by starting at any cell and traversing adjacent cells (horizontally, vertically, or diagonally) without revisiting cells.
This problem appears everywhere: word games, text mining, pattern matching, and is a favorite in technical interviews because it tests multiple skills simultaneously—graph traversal, backtracking, data structure selection, and algorithmic optimization.
The naive approach—check each word individually—quickly becomes catastrophically slow. But with tries, we transform this from an intractable problem into an elegant, efficient solution. This is one of the most powerful and instructive applications of the trie data structure.
By the end of this page, you will understand: (1) Why naive word search fails at scale, (2) How tries enable searching for multiple words simultaneously, (3) The complete implementation of trie-optimized grid search, (4) Time and space complexity analysis, and (5) Optimization techniques used in production systems.
Let's formalize the Word Search in Grid problem, often called Word Search II in competitive programming contexts.
Problem Statement:
Given:
- An m × n grid board of characters
- k words from a dictionary

Find all words that can be constructed by sequentially connecting adjacent cells. Adjacent cells are those horizontally, vertically, or diagonally neighboring (8 directions, or 4 directions in simpler variants). Each cell may be used at most once per word.
Example:
Grid (3×4):

```
┌───┬───┬───┬───┐
│ o │ a │ a │ n │
├───┼───┼───┼───┤
│ e │ t │ a │ e │
├───┼───┼───┼───┤
│ i │ h │ k │ r │
└───┴───┴───┴───┘
```

Dictionary: ["oath", "pea", "eat", "rain", "oat", "eta"]

Found words: ["oath", "eat", "oat", "eta"]

Paths:
- "oath": (0,0)→(0,1)→(1,1)→(2,1) [o→a→t→h] ✓
- "eat": (1,0)→(0,1)→(1,1) [e→a→t] ✓
- "oat": (0,0)→(0,1)→(1,1) [o→a→t] ✓
- "eta": (1,0)→(1,1)→(0,1) [e→t→a] ✓

Why This Problem Is Challenging:
The challenge lies in the combinatorial explosion of possible paths. From each cell, you can move to up to 8 neighbors. For a path of length L, the number of possible paths from a single starting cell is roughly 8^L. Multiply this across all m × n starting cells and all k words in the dictionary.
With a 10×10 grid and 1000 words averaging 5 characters each, the naive approach performs approximately:
100 (cells) × 1000 (words) × 8^5 (paths) ≈ 3.3 billion operations
This is clearly unacceptable. We need a smarter approach.
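As a quick sanity check, the estimate above can be computed directly (a throwaway sketch, not part of the solution):

```python
# Back-of-the-envelope check of the naive cost estimate above.
cells = 10 * 10            # 10×10 grid
num_words = 1000           # dictionary size
paths_per_start = 8 ** 5   # 8 directions, average word length 5

total_ops = cells * num_words * paths_per_start
print(f"{total_ops:,}")    # 3,276,800,000, i.e. roughly 3.3 billion
```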
Before appreciating the trie optimization, let's understand exactly why the straightforward approach fails.
The Naive Strategy:
For each word in the dictionary:
```python
# Naive Approach: O(k × m × n × 8^L) - INEFFICIENT
def find_words_naive(board: list[list[str]], words: list[str]) -> list[str]:
    """
    Naive approach: Check each word individually using DFS.

    Time: O(k × m × n × 8^L) where L = max word length
    Space: O(L) for recursion stack
    """
    m, n = len(board), len(board[0])
    directions = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    found = []

    def dfs(row: int, col: int, word_index: int, word: str, visited: set) -> bool:
        # Base case: word completely matched
        if word_index == len(word):
            return True

        # Bounds check and character match
        if (row < 0 or row >= m or col < 0 or col >= n or
                (row, col) in visited or board[row][col] != word[word_index]):
            return False

        # Mark as visited
        visited.add((row, col))

        # Try all 8 directions
        for dr, dc in directions:
            if dfs(row + dr, col + dc, word_index + 1, word, visited):
                visited.remove((row, col))  # Backtrack
                return True

        # Backtrack
        visited.remove((row, col))
        return False

    # Check each word individually - THIS IS THE PROBLEM
    for word in words:
        word_found = False
        for i in range(m):
            for j in range(n):
                if board[i][j] == word[0]:  # Start only if first char matches
                    if dfs(i, j, 0, word, set()):
                        word_found = True
                        break
            if word_found:
                break
        if word_found:
            found.append(word)

    return found
```

The naive approach performs a complete DFS for every single word. If 100 words share the prefix 'pre', we perform 100 separate, redundant searches to find 'pre' in the grid. This duplication of effort is catastrophic at scale.
Complexity Analysis of Naive Approach:
| Component | Complexity |
|---|---|
| Words to check | O(k) |
| Starting positions | O(m × n) |
| Paths per position | O(8^L) where L = word length |
| Total Time | O(k × m × n × 8^L) |
| Space (recursion) | O(L) |
The Fundamental Problem:
We're treating each word as an isolated search. But words share structure—they share prefixes! The words 'cat', 'car', 'card', 'care' all start with 'ca'. When we find 'ca' in the grid, we should be able to check all four words simultaneously, not search four separate times.
The key insight that transforms this problem is recognizing that tries naturally represent shared prefixes as shared paths.
Instead of searching for words one at a time, we: (1) build a single trie containing every dictionary word, (2) run one DFS from each grid cell, moving through the trie in lockstep with the grid, and (3) record a word whenever the walk reaches an end-of-word node.
This is a paradigm shift: we go from k separate word searches to one grid traversal with k-word awareness.
Visual: Trie Enables Prefix Sharing
Consider the dictionary: ['cat', 'car', 'card', 'care', 'cards']
```
(root)
   │
   c
   │
   a
 ┌─┴──┐
 t*   r*
    ┌─┴──┐
    d*   e*
    │
    s*
```

Nodes marked with * indicate end-of-word. When exploring the grid:

- Find 'c' → enter trie at 'c'
- Find 'a' → move to the 'a' node (both the 'cat' and 'car' branches remain live)
- Find 't' → move to 't*' → found "cat"! Continue searching...
- Find 'r' → move to 'r*' → found "car"! Continue for 'd' and 'e'...

ONE traversal checks ALL words simultaneously!

When the trie node has no child for the current character, we immediately prune that search path. If no dictionary word starts with 'cx', we never explore past 'cx' in the grid, regardless of how many cells we'd otherwise visit. This pruning is the source of the dramatic speedup.
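To make the prefix sharing concrete, here is a minimal sketch using a plain dict-based trie (illustrative only; the specialized TrieNode class introduced below uses a richer node type). It counts how many character nodes the dictionary ['cat', 'car', 'card', 'care', 'cards'] actually needs:

```python
# A tiny dict-based trie: each node is a dict of children; the '$' key
# marks end-of-word. (Illustrative sketch, not the lesson's TrieNode.)
def build(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node['$'] = w  # end-of-word marker
    return root

def count_nodes(node):
    """Count character nodes in the trie (excluding the root)."""
    return sum(1 + count_nodes(c) for k, c in node.items() if k != '$')

words = ['cat', 'car', 'card', 'care', 'cards']
trie = build(words)
print(sum(len(w) for w in words))  # 19 characters across the five words
print(count_nodes(trie))           # 7 trie nodes, thanks to shared prefixes
```

Nineteen characters collapse into seven nodes: every word reuses the 'c'→'a' path, so a grid search checks that prefix once instead of five times.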
The first step is constructing a trie from the dictionary. For word search problems, we need a trie that stores the complete word at each end-of-word node (for easy result collection), offers O(1) child lookup, and tracks how many words pass through each node (to enable pruning).
Let's build this specialized trie:
```python
class TrieNode:
    """
    A trie node optimized for word search problems.

    Key features:
    - 'children' dict for O(1) child lookup
    - 'word' stores the complete word at end nodes (for easy result collection)
    - 'count' tracks number of words passing through (for optimization)
    """
    __slots__ = ['children', 'word', 'count']  # Memory optimization

    def __init__(self):
        self.children: dict[str, 'TrieNode'] = {}
        self.word: str | None = None  # Non-None if this node marks end of a word
        self.count: int = 0           # Number of words passing through this node


class Trie:
    """
    Trie data structure for word search optimization.

    Supports:
    - insert(word): Add a word to the trie
    - Word storage at end nodes for result collection
    - Count tracking for optimization during search
    """

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """
        Insert a word into the trie.

        Time: O(L) where L = len(word)
        Space: O(L) in worst case (no prefix sharing)
        """
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            node.count += 1  # Track words passing through
        node.word = word  # Store complete word at end node

    def build_from_words(self, words: list[str]) -> None:
        """
        Build trie from a list of words.

        Time: O(sum of all word lengths)
        """
        for word in words:
            self.insert(word)


# Example usage:
words = ["oath", "pea", "eat", "rain", "oat", "eta"]
trie = Trie()
trie.build_from_words(words)

# The trie structure now looks like:
#
#        (root)
#       /   |   \
#      o    p    e ...
#      |    |
#      a    e
#      |    |
#      t*   a*
#     /
#    h*
#
# Where * indicates node.word is set to the complete word
```

By storing the complete word at end-of-word nodes (node.word), we avoid reconstructing the word during DFS. When we reach an end-of-word node during grid traversal, we simply add node.word to our results. This is cleaner and more efficient than tracking the path.
The count Field Explained:
The count field tracks how many not-yet-found dictionary words pass through each node. When a word is discovered, we decrement count along its path; once a node's count reaches zero, no undiscovered words remain below it, and the entire branch can be deleted from the trie. Combined with clearing node.word after a match, which prevents reporting the same word twice, this makes the search accelerate as words are discovered.
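Here is a minimal, self-contained sketch of count-based pruning (hypothetical helper names, simpler than the full solution that follows):

```python
class Node:
    """Trie node with a pass-through counter (illustrative sketch)."""
    def __init__(self):
        self.children: dict[str, "Node"] = {}
        self.word: str | None = None
        self.count = 0  # words not yet found that pass through this node

def insert(root: Node, word: str) -> None:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
        node.count += 1
    node.word = word

def mark_found(root: Node, word: str) -> None:
    """Decrement counts along the word's path after it is found."""
    node = root
    for ch in word:
        node = node.children[ch]
        node.count -= 1

root = Node()
for w in ["oat", "oath"]:
    insert(root, w)

mark_found(root, "oat")   # 'oath' still pending: counts on o→a→t stay positive
mark_found(root, "oath")  # now every node on the path has count 0
print(root.children['o'].count)  # 0: the whole 'o' branch can be pruned
```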
Now we combine everything into the complete, optimized word search algorithm. The key insight is that we traverse the grid while simultaneously navigating the trie. The trie node acts as our "state" of which words are still possible.
```python
class TrieNode:
    __slots__ = ['children', 'word', 'count']

    def __init__(self):
        self.children: dict[str, 'TrieNode'] = {}
        self.word: str | None = None
        self.count: int = 0


class Solution:
    """
    Word Search II - Trie-Optimized Solution

    Time Complexity: O(m × n × 4^L) where L = max word length,
                     but much faster in practice with aggressive pruning
    Space Complexity: O(k × L) for trie + O(L) for recursion,
                      where k = number of words
    """

    def find_words(self, board: list[list[str]], words: list[str]) -> list[str]:
        if not board or not board[0] or not words:
            return []

        # Step 1: Build trie from all dictionary words
        root = TrieNode()
        for word in words:
            node = root
            for char in word:
                if char not in node.children:
                    node.children[char] = TrieNode()
                node = node.children[char]
                node.count += 1
            node.word = word

        # Step 2: Set up grid traversal
        m, n = len(board), len(board[0])
        # 4-directional movement (can be 8 to allow diagonals)
        directions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        result = []

        def dfs(row: int, col: int, node: TrieNode) -> None:
            """
            DFS traversal of grid, synchronized with trie navigation.

            Args:
                row, col: Current position in grid
                node: Current position in trie (prefix matched so far)
            """
            char = board[row][col]

            # Pruning: if current char not in trie children, no word continues
            if char not in node.children:
                return

            # Move to child node in trie
            child_node = node.children[char]

            # Check if we've found a word
            if child_node.word is not None:
                found_word = child_node.word
                result.append(found_word)
                # OPTIMIZATION: Clear word to prevent duplicates
                child_node.word = None
                # Decrement counts along the path to enable pruning
                self._decrement_counts(root, found_word)

            # OPTIMIZATION: If no words remain through this path, prune
            if child_node.count == 0:
                # Remove this child from parent to speed up future searches
                node.children.pop(char, None)
                return

            # Mark cell as visited (in-place modification)
            board[row][col] = '#'  # Use special marker

            # Explore all 4 directions
            for dr, dc in directions:
                new_row, new_col = row + dr, col + dc
                if 0 <= new_row < m and 0 <= new_col < n:
                    if board[new_row][new_col] != '#':  # Not visited
                        dfs(new_row, new_col, child_node)

            # Backtrack: restore cell
            board[row][col] = char

        # Step 3: Start DFS from every cell
        for i in range(m):
            for j in range(n):
                if board[i][j] in root.children:
                    dfs(i, j, root)

        return result

    def _decrement_counts(self, root: TrieNode, word: str) -> None:
        """
        Decrement counts along the path for a found word. Once a node's
        count reaches 0, dfs() prunes that branch from the trie.
        """
        node = root
        for char in word:
            child = node.children.get(char)
            if child is None:  # Branch already pruned
                return
            child.count -= 1
            node = child


# Example usage
solution = Solution()
board = [
    ['o', 'a', 'a', 'n'],
    ['e', 't', 'a', 'e'],
    ['i', 'h', 'k', 'r']
]
words = ["oath", "pea", "eat", "rain", "oat", "eta"]

result = solution.find_words(board, words)
print(result)  # ['oath', 'oat', 'eat', 'eta'] (order may vary)
```

Algorithm Walkthrough:
1. Build the trie: Insert all dictionary words. Each end-of-word node stores the complete word.
2. Start DFS from each cell: For each cell whose character exists as a child of the root, begin DFS.
3. Synchronized traversal: As we move through the grid, we simultaneously move through the trie. The current trie node tells us which words are still possible.
4. Word discovery: When we reach a node with word set, we've found a dictionary word. Add it to the results and clear word so it is reported only once.
5. Backtracking: Restore the cell and return, allowing other paths to use this cell.
6. Pruning: When a branch is exhausted (count reaches 0), remove it from the trie to prevent future wasteful exploration.
The basic trie-optimized algorithm is already dramatically faster than the naive approach. However, several additional optimizations can make it production-ready:
1. In-place visited marking: Instead of a visited set (O(1) operations, but with high constant overhead), modify the board directly by replacing characters with a marker like '#'. Restore during backtracking. This avoids set operations entirely.
2. Dynamic trie pruning: Remove found words (set word = None) and decrement counts. When counts hit zero, delete the branch. This shrinks the trie as we find words.
3. Start-cell filtering: Only begin DFS from cells whose character appears in root.children. Skip cells that can't start any word.
4. Word filtering: Discard words longer than m × n (impossible to form). Track the maximum word length to limit DFS depth.
```python
def build_trie(words: list[str]) -> TrieNode:
    """Build a trie (using the TrieNode class above) from the given words."""
    root = TrieNode()
    for word in words:
        node = root
        for char in word:
            node = node.children.setdefault(char, TrieNode())
            node.count += 1
        node.word = word
    return root


def find_words_optimized(board: list[list[str]], words: list[str]) -> list[str]:
    """
    Fully optimized word search with all production-level enhancements.
    """
    if not board or not board[0] or not words:
        return []

    m, n = len(board), len(board[0])

    # OPTIMIZATION 1: Pre-compute the set of characters in the grid
    grid_chars = set()
    for row in board:
        grid_chars.update(row)

    # OPTIMIZATION 2: Filter out words with impossible characters or lengths
    valid_words = []
    for word in words:
        if all(c in grid_chars for c in word) and len(word) <= m * n:
            valid_words.append(word)

    if not valid_words:
        return []

    # Build trie only with valid words
    root = build_trie(valid_words)

    # OPTIMIZATION 3: Pre-compute which cells can start a word
    starting_cells = []
    for i in range(m):
        for j in range(n):
            if board[i][j] in root.children:
                starting_cells.append((i, j))

    result = []
    directions = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def dfs(row: int, col: int, node: TrieNode) -> None:
        char = board[row][col]
        if char not in node.children:
            return

        child = node.children[char]

        if child.word:
            result.append(child.word)
            child.word = None  # Prevent duplicates

        # OPTIMIZATION 4: In-place marking; 'char' keeps the original value
        board[row][col] = '#'

        for dr, dc in directions:
            nr, nc = row + dr, col + dc
            if 0 <= nr < m and 0 <= nc < n and board[nr][nc] != '#':
                if board[nr][nc] in child.children:  # Extra pruning
                    dfs(nr, nc, child)

        board[row][col] = char  # Restore

        # OPTIMIZATION 5: Prune exhausted branches
        if not child.children and child.word is None:
            del node.children[char]

    # Only start from viable cells
    for i, j in starting_cells:
        dfs(i, j, root)

    return result
```

In worst-case scenarios, these optimizations can provide 10-100x speedup. The character frequency pre-check alone can eliminate 50%+ of words in many real-world dictionaries. Dynamic pruning ensures the algorithm gets faster as it progresses: the trie shrinks with each found word.
Understanding the complexity of the trie-optimized solution requires careful analysis of both the trie construction and the grid search phases.
| Aspect | Naive Approach | Trie-Optimized |
|---|---|---|
| Time: Build Phase | N/A | O(W × L) where W = total words, L = avg length |
| Time: Search Phase | O(k × m × n × 4^L) | O(m × n × 4^L) |
| Time: Overall | O(k × m × n × 4^L) | O(W × L + m × n × 4^L) |
| Space: Data Structure | O(1) | O(W × L) for trie |
| Space: Recursion | O(L) | O(L) |
| Pruning Benefit | None | Massive (early termination) |
Detailed Analysis:
Time Complexity:
Trie Construction: O(W × L), where W is the number of words and L the average word length; equivalently, linear in the total number of characters across all words
Grid Search: O(m × n × 4^L) in the worst case
Key Improvement: The factor of k (number of words) is eliminated!
In Practice:
The 4^L factor is a loose upper bound. In practice, most paths terminate after only a few steps because no dictionary word continues with that prefix, the no-revisit rule limits branching, and dynamic pruning shrinks the trie as words are found.
Eliminating the k factor is the crucial win. With 10,000 dictionary words, the naive approach is 10,000× slower. The trie-optimized approach handles 10 words or 10 million words with the same grid traversal—only the trie build time increases linearly.
Space Complexity:
Trie Storage: O(W × L) in the worst case
Recursion Stack: O(L) where L is the maximum word length
Result Storage: O(k) for storing found words
Space Optimization Note:
For memory-constrained environments, consider processing words in batches—build smaller tries, search, then rebuild. This trades time for space.
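A sketch of that batching idea, assuming a search_fn(board, words) callable such as find_words_optimized above (the wrapper itself and its parameter names are hypothetical):

```python
def find_words_batched(search_fn, board, words, batch_size=1000):
    """Search the dictionary in batches so only one small trie is alive at a time."""
    found = []
    for i in range(0, len(words), batch_size):
        # Each batch builds its own (smaller) trie inside search_fn,
        # which becomes garbage before the next batch starts.
        found.extend(search_fn(board, words[i:i + batch_size]))
    return found
```

Peak memory drops from O(W × L) for the full dictionary to O(batch_size × L), at the cost of repeating the grid traversal once per batch.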
The trie-optimized word search pattern extends far beyond puzzle games. Understanding this pattern unlocks solutions to many practical problems:
Whenever you face a problem involving: (1) a 2D grid of characters, (2) a dictionary of patterns to find, and (3) path-based matching, immediately consider the trie-optimized approach. It's the canonical solution for this problem class.
We've covered one of the most powerful applications of the trie data structure: optimizing word search in grids. The key insights to consolidate: a single trie lets one grid traversal check every dictionary word at once, shared prefixes are explored exactly once, and aggressive pruning (clearing found words, deleting exhausted branches) makes the search faster as it progresses.
You now understand how to use tries to optimize word search in grids—a classic and frequently-tested pattern. Next, we'll explore another essential trie pattern: finding the longest common prefix among strings.