The insert operation is the foundational building block of every trie. Without insertion, there is no trie—just an empty root node waiting to be populated. Yet the elegance of trie insertion lies not in complexity, but in its beautiful simplicity: we trace a path through the tree, character by character, creating new nodes only when necessary.
Unlike hash tables that compute a hash and store the entire string at once, or binary search trees that compare whole strings at each node, a trie constructs a character-level path from root to a termination marker. This architectural decision has profound implications for prefix-based operations, memory sharing, and lexicographic ordering—advantages we explored in previous modules.
In this page, we will dissect the insert operation with surgical precision. You'll understand not just how to implement it, but why each step exists, what invariants must be maintained, and when different implementation choices affect performance.
By the end of this page, you will be able to: (1) Explain the conceptual model of trie insertion and why it differs from other data structures, (2) Implement insertion for both array-based and hash map-based trie nodes, (3) Trace through insertions step-by-step, visualizing the tree structure, (4) Analyze the time and space complexity with rigorous justification, and (5) Handle edge cases including empty strings, duplicate insertions, and special characters.
To truly understand trie insertion, we must first internalize the fundamental trie invariant:
Every path from the root to an end-of-word marker represents exactly one string in the trie.
This invariant is the lens through which all trie operations should be viewed. When we insert a string, we are not storing the string in the conventional sense—we are creating a path that encodes the string through the sequence of edges traversed.
Imagine walking through a forest of infinite possibilities. Starting at the root (the forest entrance), each step you take is determined by the next character in your string. If a path exists for that character, you follow it. If no path exists, you carve a new one—creating a new node and establishing a new edge.
When you've consumed all characters in your string, you plant a flag at your current position: the end-of-word marker. This flag says, "A complete string ends here." Without this marker, we couldn't distinguish between strings that are prefixes of other strings. For example, we need to differentiate between having inserted just "cat" versus having inserted both "cat" and "catastrophe".
Consider inserting "app" and "apple" into a trie. Both share the path a→p→p. Without an end-of-word marker at the node reached after processing "app", we'd have no way to know that "app" is a valid word—we'd only see it as a prefix of "apple". The marker transforms a path from being a "prefix observation" into a "complete string declaration".
This path-based representation has several structural consequences that shape the insert algorithm:
Root is never a character: The root node is a sentinel—an entry point that doesn't correspond to any character. The first character of every string labels the edge from the root to its first child.
Shared prefixes share nodes: If we insert "team" and "tear", the nodes for 't', 'e', and 'a' are shared. The paths diverge only at the fourth character ('m' vs 'r'). This prefix sharing is automatic—a natural consequence of following existing paths.
Depth equals string length: A string of length m creates a path of exactly m nodes below the root. The end-of-word marker is placed at depth m.
Character set determines branching factor: Each node can have as many children as there are characters in the alphabet. For lowercase English, that's 26 potential children per node—though most nodes will have far fewer.
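These structural consequences can be seen directly in a minimal node definition. This is a sketch for the lowercase-English case; the field names are illustrative:

```typescript
// Sketch of a trie node for a fixed 26-letter alphabet.
// The root is an ordinary node that corresponds to no character.
class TrieNode {
  // One slot per letter 'a'..'z': the branching factor equals the alphabet size.
  children: (TrieNode | null)[] = new Array(26).fill(null);
  // The end-of-word flag; a string of length m sets it at depth m.
  isEndOfWord = false;
}

const root = new TrieNode(); // sentinel entry point, not a character
```

Note that the root carries the same fields as every other node; it is "not a character" only because no edge points to it.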
Let's formalize the insertion process with a rigorous, step-by-step algorithm. We'll then trace through concrete examples to cement understanding.
Given a trie with root node root and a string word to insert:
1. Initialize current = root. This pointer will walk down the trie as we process each character.
2. Iterate over each character c in word (from first to last).
3. Check whether a child exists at current corresponding to character c.
4. If no such child exists, create a new child of current for character c.
5. Move current to the child node corresponding to c (whether it existed or was just created).
6. After the loop, set current.isEndOfWord = true.

Step 1 (Initialize): We must start at the root because every string path begins there. The root is the universal prefix—the empty string "" that prefixes all strings.
Steps 2-5 (Character Loop): This is the core of the algorithm. We're essentially doing a multi-level lookup/create operation. For each character, we either follow an existing edge (the character was seen before in some prefix) or create a new edge (this character at this position is new to the trie).
Step 6 (Mark End): This final step is often overlooked but is absolutely critical. Without it, we're just creating paths in the trie without declaring which paths represent complete strings. The end-of-word marker is the semantic payload that transforms a structural path into a meaningful dictionary entry.
Forgetting to set the end-of-word marker is the #1 bug in trie implementations. Your insert will appear to work—the nodes are created—but search and startsWith will fail for some queries because the trie doesn't know which paths are complete words versus mere prefixes.
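A minimal sketch of this failure mode follows. The `search` helper anticipates the next page and is included here only to make the bug observable; `markEnd` is a hypothetical flag used to simulate forgetting step 6:

```typescript
class Node {
  children = new Map<string, Node>();
  isEndOfWord = false;
}

// markEnd=false simulates the common bug of skipping the final step.
function insert(root: Node, word: string, markEnd: boolean): void {
  let cur = root;
  for (const ch of word) {
    if (!cur.children.has(ch)) cur.children.set(ch, new Node());
    cur = cur.children.get(ch)!;
  }
  if (markEnd) cur.isEndOfWord = true; // the critical final step
}

function search(root: Node, word: string): boolean {
  let cur = root;
  for (const ch of word) {
    const next = cur.children.get(ch);
    if (next === undefined) return false;
    cur = next;
  }
  return cur.isEndOfWord; // a path alone is not enough
}

const good = new Node();
insert(good, "cat", true);

const buggy = new Node();
insert(buggy, "cat", false); // nodes exist, but no word is declared

console.log(search(good, "cat"));  // true
console.log(search(buggy, "cat")); // false
```

Both tries contain identical node structures; only the marker distinguishes "a word ends here" from "merely a prefix."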
Let's trace through inserting multiple words to see how the trie evolves. We'll insert, in order: "cat", "car", "card", "care", "careful".
We start with just the root node—an empty trie:
```
root [isEndOfWord: false]
└── (no children)
```
Processing characters: c → a → t
- c: No child at root for 'c'. Create node. Move to it.
- a: No child for 'a'. Create node. Move to it.
- t: No child for 't'. Create node. Move to it.
- All characters consumed. Set isEndOfWord = true.

```
root
└── c
    └── a
        └── t [END]
```
Processing: c → a → r
- c: Child exists at root! Follow it.
- a: Child exists! Follow it.
- r: No child for 'r'. Create node. Move to it.
- All characters consumed. Set isEndOfWord = true.

```
root
└── c
    └── a
        ├── t [END]
        └── r [END]
```
Notice: We reused the existing path for "ca". Only 'r' required a new node. This is prefix sharing in action.
Processing: c → a → r → d
- c: Follow existing.
- a: Follow existing.
- r: Follow existing (we created it for "car").
- d: No child for 'd'. Create node. Move to it.
- All characters consumed. Set isEndOfWord = true.

```
root
└── c
    └── a
        ├── t [END]
        └── r [END]
            └── d [END]
```
Key Insight: "car" is now both a complete word AND a prefix of "card". Both 'r' node and 'd' node have isEndOfWord = true.
"care": c(follow) → a(follow) → r(follow) → e(create) → mark END
"careful": c → a → r → e(follow!) → f(create) → u(create) → l(create) → mark END
Final structure:
```
root
└── c
    └── a
        ├── t [END]
        └── r [END]
            ├── d [END]
            └── e [END]
                └── f
                    └── u
                        └── l [END]
```
Five words, but the total node count is small thanks to shared prefixes (c-a is shared by all, c-a-r is shared by car/card/care/careful).
Multiple words can share the same node: The 'r' node is part of four different word paths.
Some nodes have isEndOfWord=true even with children: The 'r' node marks the end of "car" but also has children 'd' and 'e'. This is perfectly normal.
Insertion order doesn't matter for final structure: Whether we insert "careful" before "care" or after, the trie structure is the same.
Unlike binary search trees where insertion order dramatically affects shape, trie structure is determined solely by the set of strings inserted, not the order. This property simplifies reasoning about trie behavior and means you never need to "balance" a trie.
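This order-independence can be checked directly. The sketch below builds two tries from the same five words in different orders and compares a canonical serialization; the `serialize` helper is illustrative, not part of the standard trie API:

```typescript
class Node {
  children = new Map<string, Node>();
  isEndOfWord = false;
}

function insert(root: Node, word: string): void {
  let cur = root;
  for (const ch of word) {
    if (!cur.children.has(ch)) cur.children.set(ch, new Node());
    cur = cur.children.get(ch)!;
  }
  cur.isEndOfWord = true;
}

// Canonical serialization: children visited in sorted order, '$' marks a word end.
function serialize(node: Node): string {
  const parts = [...node.children.keys()].sort()
    .map(ch => ch + serialize(node.children.get(ch)!));
  return (node.isEndOfWord ? "$" : "") + "(" + parts.join(",") + ")";
}

const a = new Node();
for (const w of ["careful", "car", "care", "cat", "card"]) insert(a, w);

const b = new Node();
for (const w of ["cat", "car", "card", "care", "careful"]) insert(b, w);

console.log(serialize(a) === serialize(b)); // true: same set, same structure
```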
The insert algorithm is the same regardless of how we represent children, but the implementation details differ. Let's examine both the array-based and hash map-based approaches, understanding the trade-offs that inform our choice.
When the alphabet is fixed and small (e.g., 26 lowercase letters), we can use an array of child pointers:
```typescript
class TrieNode {
  children: (TrieNode | null)[];
  isEndOfWord: boolean;

  constructor() {
    // 26 slots for 'a' through 'z'
    this.children = new Array(26).fill(null);
    this.isEndOfWord = false;
  }
}

class Trie {
  private root: TrieNode;

  constructor() {
    this.root = new TrieNode();
  }

  /**
   * Inserts a word into the trie.
   * Time: O(m) where m is the length of the word
   * Space: O(m * 26) in worst case for all new nodes
   */
  insert(word: string): void {
    let current = this.root;

    for (const char of word) {
      // Map 'a'-'z' to indices 0-25
      const index = char.charCodeAt(0) - 'a'.charCodeAt(0);

      // Create node if it doesn't exist
      if (current.children[index] === null) {
        current.children[index] = new TrieNode();
      }

      // Move to the child
      current = current.children[index]!;
    }

    // Mark the end of the word
    current.isEndOfWord = true;
  }
}
```

Array Approach Analysis: child lookup is simple index arithmetic, but every node allocates all 26 slots whether or not they are used, which wastes memory in sparse nodes.

When the alphabet is large or variable (e.g., Unicode), a hash map of children stores only the children that actually exist:
```typescript
class TrieNode {
  children: Map<string, TrieNode>;
  isEndOfWord: boolean;

  constructor() {
    this.children = new Map();
    this.isEndOfWord = false;
  }
}

class Trie {
  private root: TrieNode;

  constructor() {
    this.root = new TrieNode();
  }

  /**
   * Inserts a word into the trie.
   * Time: O(m) where m is the length of the word
   * Space: O(m) for new nodes (each node only stores actual children)
   */
  insert(word: string): void {
    let current = this.root;

    for (const char of word) {
      // Check if child exists, create if not
      if (!current.children.has(char)) {
        current.children.set(char, new TrieNode());
      }

      // Move to the child
      current = current.children.get(char)!;
    }

    // Mark the end of the word
    current.isEndOfWord = true;
  }
}
```

Hash Map Approach Analysis: lookups pay a small hashing overhead, but memory scales with the number of actual children, making this the safer choice for large alphabets or untrusted input.
| Criterion | Array-Based | Hash Map-Based |
|---|---|---|
| Lookup Speed | O(1) with simple arithmetic | O(1) average with hash overhead |
| Memory Efficiency | Wastes space for sparse nodes | Only stores actual children |
| Best Alphabet Size | Small, fixed (26-128) | Large, variable (Unicode) |
| Implementation Complexity | Simpler index math | Slightly more code |
| Cache Locality | Better (contiguous array) | Worse (hash buckets) |
Let's rigorously analyze the complexity of the insert operation. Understanding these bounds is essential for making informed data structure choices.
Time complexity: O(m), where m is the length of the string being inserted.
The algorithm iterates through each character exactly once. At each character, we perform:

- One child lookup (array index arithmetic or hash map access): O(1)
- At most one node allocation: O(1)
- One pointer move to the child: O(1)
Since we do O(1) work for each of m characters, total time is O(m).
Crucially, this is independent of the number of strings already in the trie. Whether the trie contains 10 words or 10 million words, inserting a string of length m always takes O(m) time. This is a remarkable property that hash tables share for insertion, but balanced BSTs do not (their O(log n) factor depends on total elements).
- Hash set insert: O(m) to hash + O(1) to store = O(m). Similar to a trie! But tries give you prefix operations.
- Balanced BST with string keys: O(m log n), because each comparison takes O(m) and there are O(log n) comparisons.

Tries win when you need prefix matching or have many strings with shared prefixes.
In the worst case, no prefix of the new word exists in the trie, so we create m new nodes. The space for each node depends on the implementation:
Array-based (alphabet size Σ): each new node allocates Σ child slots up front, so a single insert costs O(m × Σ) space in the worst case.

Hash map-based: each new node stores only the children that actually exist, so a single insert costs O(m) space in the worst case.
If we insert n strings with total combined length L (sum of all string lengths), the trie uses at most O(L) nodes (each character in all strings could require its own node in the worst case). In practice, prefix sharing means actual node count is often much less than L.
Best case (maximum sharing): If all strings are permutations of each other or highly overlapping prefixes, node count approaches the size of the longest string.
Worst case (no sharing): All strings are unique from the first character (e.g., "apple", "banana", "cherry"). Node count equals L.
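These two extremes can be measured with a node counter. This is a sketch; `countNodes` is a hypothetical helper, not part of the trie API:

```typescript
class Node {
  children = new Map<string, Node>();
  isEndOfWord = false;
}

function insert(root: Node, word: string): void {
  let cur = root;
  for (const ch of word) {
    if (!cur.children.has(ch)) cur.children.set(ch, new Node());
    cur = cur.children.get(ch)!;
  }
  cur.isEndOfWord = true;
}

// Count every node in the trie, including the root.
function countNodes(node: Node): number {
  let total = 1;
  for (const child of node.children.values()) total += countNodes(child);
  return total;
}

const shared = new Node();
for (const w of ["car", "card", "care", "careful"]) insert(shared, w);
// L = 3 + 4 + 4 + 7 = 18 characters, but prefix sharing collapses the paths:
// c-a-r (3) + d (1) + e (1) + f-u-l (3) = 8 nodes below the root.
console.log(countNodes(shared)); // 9, counting the root

const disjoint = new Node();
for (const w of ["apple", "banana", "cherry"]) insert(disjoint, w);
// No shared prefixes: 5 + 6 + 6 = 17 nodes below the root.
console.log(countNodes(disjoint)); // 18, counting the root
```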
| Metric | Complexity | Notes |
|---|---|---|
| Time (single insert) | O(m) | m = length of string being inserted |
| Space (single insert, worst) | O(m) | When no prefix exists in trie |
| Space (single insert, best) | O(1) | When entire string is already a prefix of existing word |
| Aggregate time (n inserts) | O(L) | L = total characters across all strings |
| Aggregate space (n inserts) | O(L × Σ) array / O(L) hash | Σ = alphabet size |
Production-quality code must handle edge cases gracefully. Let's examine the scenarios that trip up naive implementations.
What happens if we insert an empty string ""?
Algorithm behavior: The character loop executes zero times. We immediately mark root.isEndOfWord = true.
Is this correct? Depends on your requirements. If empty strings should be valid dictionary entries, this is correct. If not, guard against it:
```typescript
insert(word: string): void {
  if (word.length === 0) {
    return; // or throw, depending on API contract
  }
  // ... rest of insertion
}
```
Implication: If you allow empty string insertion, search("") should return true, and startsWith("") should always return true (every string starts with the empty prefix).
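A quick sketch of this behavior, assuming empty strings are allowed:

```typescript
class Node {
  children = new Map<string, Node>();
  isEndOfWord = false;
}

function insert(root: Node, word: string): void {
  let cur = root;
  for (const ch of word) {
    if (!cur.children.has(ch)) cur.children.set(ch, new Node());
    cur = cur.children.get(ch)!;
  }
  cur.isEndOfWord = true;
}

const trie = new Node();
insert(trie, ""); // the character loop runs zero times

console.log(trie.isEndOfWord);   // true: the root itself is marked
console.log(trie.children.size); // 0: no nodes were created
```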
What if we insert the same word twice?
Algorithm behavior: The second insertion follows the existing path (no new nodes needed) and sets isEndOfWord = true again on a node that's already marked.
Result: The trie is unchanged. Insertion is idempotent.
Performance implication: Duplicate insertions are actually faster than new insertions because no memory allocation occurs. This makes tries excellent for building dictionaries from potentially duplicated input (e.g., word counts from text).
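Idempotence can be verified by counting nodes before and after a duplicate insert. This is a sketch; `countNodes` is a hypothetical helper:

```typescript
class Node {
  children = new Map<string, Node>();
  isEndOfWord = false;
}

function insert(root: Node, word: string): void {
  let cur = root;
  for (const ch of word) {
    if (!cur.children.has(ch)) cur.children.set(ch, new Node());
    cur = cur.children.get(ch)!;
  }
  cur.isEndOfWord = true;
}

function countNodes(node: Node): number {
  let total = 1;
  for (const child of node.children.values()) total += countNodes(child);
  return total;
}

const trie = new Node();
insert(trie, "cat");
const before = countNodes(trie); // root + c + a + t = 4

insert(trie, "cat"); // duplicate: follows the existing path, allocates nothing
console.log(countNodes(trie) === before); // true: the trie is unchanged
```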
Insert "apple", then insert "app".
Algorithm behavior: When inserting "app", we follow existing nodes for a→p→p. The 'p' node (at depth 3) is already created as part of "apple's" path. We simply mark this node as isEndOfWord = true.
Result: Both "apple" and "app" are now valid words. The node for the third 'p' has:
- isEndOfWord = true (because "app" ends here)
- a child for 'l' (the path toward "apple" continues through it)

This demonstrates that any node can be both a word termination and a prefix node.
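The resulting node state can be inspected directly. A sketch using the hash-map representation:

```typescript
class Node {
  children = new Map<string, Node>();
  isEndOfWord = false;
}

function insert(root: Node, word: string): void {
  let cur = root;
  for (const ch of word) {
    if (!cur.children.has(ch)) cur.children.set(ch, new Node());
    cur = cur.children.get(ch)!;
  }
  cur.isEndOfWord = true;
}

const trie = new Node();
insert(trie, "apple");
insert(trie, "app"); // follows a→p→p, creates nothing new

// Walk to the node for the third 'p'.
let cur = trie;
for (const ch of "app") cur = cur.children.get(ch)!;

console.log(cur.isEndOfWord);       // true: "app" ends here
console.log(cur.children.has("l")); // true: the path to "apple" continues
```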
Insert "app", then insert "apple".
Algorithm behavior: When inserting "apple", we follow existing nodes for a→p→p. After that, we create new nodes for l→e and mark 'e' as end of word.
Result: Same trie structure as Case 3. Insertion order doesn't affect final structure.
For array-based tries with a fixed alphabet, inserting characters outside that alphabet produces out-of-range or negative indices. Always validate input or use hash map-based children for untrusted input. Example: inserting "café" into a lowercase-only trie computes an index far outside 0-25 for 'é'; the `=== null` check fails on the `undefined` slot, no node is created, and the subsequent dereference throws a TypeError.
```typescript
/**
 * Robust insert with input validation
 */
insert(word: string): void {
  // Handle empty string based on requirements
  if (word.length === 0) {
    return; // Silently ignore; alternatively throw
  }

  let current = this.root;

  for (const char of word) {
    // Validate character is in expected range
    const index = char.charCodeAt(0) - 'a'.charCodeAt(0);
    if (index < 0 || index >= 26) {
      throw new Error(`Invalid character '${char}' - only lowercase a-z allowed`);
    }

    if (current.children[index] === null) {
      current.children[index] = new TrieNode();
    }
    current = current.children[index]!;
  }

  current.isEndOfWord = true;
}
```

The basic insert operation can be extended to support additional functionality. Here are variations you'll encounter in practice.
Instead of just marking end-of-word, store an associated value. This transforms the trie from a set into a map.
```typescript
class TrieNode<T> {
  children: Map<string, TrieNode<T>>;
  value: T | null; // null if no value at this node

  constructor() {
    this.children = new Map();
    this.value = null;
  }
}

class TrieMap<T> {
  private root: TrieNode<T>;

  constructor() {
    this.root = new TrieNode<T>();
  }

  /**
   * Insert a key-value pair into the trie.
   * Returns the previous value if key existed, null otherwise.
   */
  insert(key: string, value: T): T | null {
    let current = this.root;

    for (const char of key) {
      if (!current.children.has(char)) {
        current.children.set(char, new TrieNode<T>());
      }
      current = current.children.get(char)!;
    }

    const previousValue = current.value;
    current.value = value;
    return previousValue;
  }
}
```

Track how many times each word has been inserted. Useful for autocomplete ranking.
```typescript
class TrieNode {
  children: Map<string, TrieNode>;
  count: number; // 0 if not a complete word, >0 for word count

  constructor() {
    this.children = new Map();
    this.count = 0;
  }
}

class FrequencyTrie {
  private root: TrieNode;

  constructor() {
    this.root = new TrieNode();
  }

  /**
   * Insert a word and increment its frequency.
   */
  insert(word: string): void {
    let current = this.root;

    for (const char of word) {
      if (!current.children.has(char)) {
        current.children.set(char, new TrieNode());
      }
      current = current.children.get(char)!;
    }

    current.count++; // Increment, don't just set to 1
  }

  /**
   * Get the frequency of a word (0 if not present).
   */
  getFrequency(word: string): number {
    let current = this.root;

    for (const char of word) {
      if (!current.children.has(char)) {
        return 0;
      }
      current = current.children.get(char)!;
    }

    return current.count;
  }
}
```

Track how many words pass through each node. Enables efficient prefix counting.
```typescript
class TrieNode {
  children: Map<string, TrieNode>;
  isEndOfWord: boolean;
  prefixCount: number; // How many words pass through this node

  constructor() {
    this.children = new Map();
    this.isEndOfWord = false;
    this.prefixCount = 0;
  }
}

class PrefixCountTrie {
  private root: TrieNode;

  constructor() {
    this.root = new TrieNode();
  }

  /**
   * Insert and update prefix counts along the path.
   */
  insert(word: string): void {
    let current = this.root;

    for (const char of word) {
      if (!current.children.has(char)) {
        current.children.set(char, new TrieNode());
      }
      current = current.children.get(char)!;
      current.prefixCount++; // Increment at each node along path
    }

    current.isEndOfWord = true;
  }

  /**
   * Count words that start with the given prefix.
   * O(m) where m is prefix length.
   */
  countWordsWithPrefix(prefix: string): number {
    let current = this.root;

    for (const char of prefix) {
      if (!current.children.has(char)) {
        return 0;
      }
      current = current.children.get(char)!;
    }

    return current.prefixCount;
  }
}
```

We've thoroughly explored the trie insert operation—from conceptual model to implementation details to edge cases. Let's consolidate the key takeaways:

- Insertion walks the trie character by character from the root, creating nodes only when no child exists for the current character.
- The end-of-word marker is essential: without it, complete words are indistinguishable from mere prefixes.
- Time is O(m) per insert, independent of how many strings the trie already holds.
- The final structure depends only on the set of inserted strings, never on insertion order.
- Duplicate insertions are idempotent, and shared prefixes share nodes automatically.
What's Next:
Now that we can build tries through insertion, we need to query them. The next page explores the search operation—how to determine if a complete word exists in the trie. We'll see how search follows the same path-walking logic as insert, but with different termination conditions.
You now have a comprehensive understanding of trie insertion. You can implement it with either array or hash map children, analyze its complexity, handle edge cases, and extend it for advanced use cases like frequency counting and prefix statistics. Next: searching for exact matches.