The word trie comes from "retrieval"—and that single etymological fact captures the essence of this data structure. A trie is designed from the ground up for one purpose: retrieving strings efficiently.
But here's what makes tries special: they don't just retrieve strings. They retrieve them by their prefixes. This seemingly simple twist unlocks capabilities that other data structures simply cannot match. When you type "prog" into a search bar and instantly see "programming," "progress," and "program," there's a very good chance a trie (or its variant) is doing the heavy lifting behind the scenes.
By the end of this page, you will understand why a trie is called a 'prefix tree,' how it fundamentally differs from other tree structures you've encountered, and why that difference matters for string-intensive applications. You'll develop an intuition for how tries organize data that will make all subsequent operations seem natural and inevitable.
A trie is a tree-based data structure, but it's fundamentally different from the binary search trees (BSTs) you've studied. The key insight is this:
In a BST, each node holds a complete key. In a trie, the key is distributed across a path from root to node.
This distinction is profound. When you store the word "tree" in a BST, the node contains the entire string "tree." When you store "tree" in a trie, there is no single node that holds "tree"—instead, there's a path through nodes labeled 't', 'r', 'e', 'e' that collectively represents the word.
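To make the path idea concrete, here is a minimal sketch in Python (the `TrieNode` class and `insert` helper are illustrative names, not a standard API):

```python
# Minimal sketch: no node stores "tree"; the word exists only as the
# path t -> r -> e -> e ending at a node flagged as end-of-word.
class TrieNode:
    def __init__(self):
        self.children = {}      # character -> child TrieNode
        self.is_end = False     # True if a stored word ends here

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

root = TrieNode()
insert(root, "tree")

# Walk the path: each step consumes exactly one character.
node = root
for ch in "tree":
    node = node.children[ch]
print(node.is_end)  # True: the path t-r-e-e spells a stored word
```

Notice that searching for a prefix like "tr" lands on a perfectly valid node; it simply isn't flagged as end-of-word.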
Let's formalize this:
A trie (also called a prefix tree or digital tree) is a k-ary search tree where:

- Each edge is labeled with a single character from the alphabet.
- Each node represents the string spelled by the edge labels on the path from the root to that node.
- The root represents the empty string.
- Nodes where a stored string ends carry an end-of-word marker.
Why "prefix tree" is the perfect name:
Consider storing these words: "car," "card," "care," "careful," "cars," "cat."
In a trie, the path from root spelling "c-a-r" is shared by ALL words starting with "car." That path is the prefix "car." The tree literally organizes itself around prefixes—longer words extend shorter prefixes, and all words with the same prefix share the same path up to that prefix.
This is not a design choice. It's the defining characteristic. A trie is a physical manifestation of prefix relationships in the data.
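A small sketch makes the sharing measurable. Assuming the same minimal dict-based node, the six example words contain 25 characters but need only 10 trie nodes:

```python
# Sketch: count how many nodes the six example words actually need.
# Shared prefixes share nodes, so 25 characters collapse into 10 nodes.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def count_nodes(node):
    # One for this node, plus all nodes in every child subtrie.
    return 1 + sum(count_nodes(child) for child in node.children.values())

words = ["car", "card", "care", "careful", "cars", "cat"]
root = TrieNode()
for w in words:
    insert(root, w)

total_chars = sum(len(w) for w in words)
print(total_chars)            # 25 characters across the six words
print(count_nodes(root) - 1)  # 10 nodes (excluding the root)
```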
| Aspect | Binary Search Tree | Trie |
|---|---|---|
| Key storage | Complete key in each node | Key distributed across path |
| Node meaning | Node = one key-value pair | Node = one character/unit |
| Search process | Compare entire key at each node | Match one character per node |
| Branching factor | 2 (binary) | Alphabet size (e.g., 26 for lowercase letters) |
| Key representation | Explicit in node | Implicit in path from root |
| Prefix queries | Requires traversal + comparison | Natural—just follow the path |
In a trie, the root node is special: it represents the empty string (ε). This might seem like a trivial detail, but it's actually fundamental to understanding how tries work.
Think about it: every string has the empty string as a prefix. The string "cat" starts with "", then "c", then "ca", then "cat". By making the root represent the empty string, we establish that every string's path starts at the root—which is exactly what we want for a prefix tree.
This design choice has practical implications: every search and insertion starts at the root with no special case for the first character, and even the empty string itself can be stored simply by marking the root as end-of-word.
Here's a powerful mental model: the trie rooted at any node N represents all strings that share the prefix spelled by the path from root to N. The entire trie (rooted at root) represents all strings that share the prefix '' (empty string)—which is all strings. Subtries are just 'filtered' views.
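This "filtered view" idea can be sketched directly. The `collect` helper below is a hypothetical name for gathering every word stored in a subtrie:

```python
# Sketch of the 'filtered view' idea: the subtrie below the node reached
# by a prefix contains exactly the stored words with that prefix.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def collect(node, prefix):
    """All stored words in the subtrie rooted at node."""
    words = [prefix] if node.is_end else []
    for ch, child in node.children.items():
        words.extend(collect(child, prefix + ch))
    return words

root = TrieNode()
for w in ["car", "card", "cat", "dog"]:
    insert(root, w)

# The empty prefix leaves us at the root: the 'filter' matches everything.
print(sorted(collect(root, "")))    # ['car', 'card', 'cat', 'dog']

# Descend along 'c', then 'a': that subtrie holds only the ca- words.
node = root.children["c"].children["a"]
print(sorted(collect(node, "ca")))  # ['car', 'card', 'cat']
```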
If you've studied hash tables, you might wonder: "Can't I just store strings in a hash set and be done with it?"
For simple membership queries ("Is 'programming' in the dictionary?"), yes—hash tables work beautifully. They give O(1) average-case lookup for exact matches. But the moment you need prefix-aware operations (autocomplete, counting words with a given prefix, longest-prefix matching), hash tables fall apart.
The fundamental problem with hashing for prefixes:
Hash functions are designed to spread similar inputs across the hash table. The words "program" and "programming" hash to completely different buckets—they're as unrelated to the hash table as "program" and "elephant."
But these words aren't unrelated. "Programming" is an extension of "program." A data structure for string operations should know this. Tries do. Hash tables don't.
Consider autocomplete: given the prefix "pro", find all matching words among n stored strings. Compare the two approaches:
| Data Structure | Algorithm | Time Complexity | For 1 million strings, prefix 'pro' |
|---|---|---|---|
| Hash Table | Iterate all strings, check each for prefix | O(n × p) where p = prefix length | ~1,000,000 comparisons |
| Trie | Navigate to 'p'-'r'-'o' node, collect all descendants | O(p + k) where k = matching strings | 3 steps + only matching results |
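A sketch of both approaches side by side (names like `starts_with` are illustrative, not a standard API):

```python
# Contrast the two rows of the table above. The hash-set version must
# scan every stored string; the trie walks p characters and then
# touches only the matching subtrie.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def starts_with(self, prefix):
        node = self.root
        for ch in prefix:              # O(p): one step per prefix character
            if ch not in node.children:
                return []
            node = node.children[ch]
        out = []
        def walk(n, acc):              # O(k): visits only matching words
            if n.is_end:
                out.append(acc)
            for c, child in n.children.items():
                walk(child, acc + c)
        walk(node, prefix)
        return out

words = {"program", "programming", "progress", "elephant"}

# Hash set: O(n * p) scan over every stored string.
hash_matches = sorted(w for w in words if w.startswith("pro"))

trie = Trie()
for w in words:
    trie.insert(w)
trie_matches = sorted(trie.starts_with("pro"))

print(hash_matches == trie_matches)  # True: same answers, different costs
```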
Hash tables and tries are not competitors—they solve different problems. Use hash tables for exact-match lookups. Use tries when prefixes matter. The choice should be obvious once you understand what each structure is designed for.
Here's a concept that distinguishes tries from binary trees: the branching factor.
In a binary tree, each node has at most 2 children. In a trie storing lowercase English words, each node can have up to 26 children—one for each letter. This high branching factor is both a superpower and a challenge.
Different alphabets, different tries:
The alphabet size profoundly affects trie design: a DNA trie needs only 4 children per node, a lowercase-English trie needs 26, a byte-level trie needs 256, and a full Unicode trie would need vastly more.
This isn't just trivia—it determines whether you use a fixed-size array or a hash map for children, which affects every operation's performance.
For typical English word applications, nodes use fixed-size arrays of 26 pointers. This wastes space for sparse nodes but provides O(1) child access—a trade-off usually worth making. We'll explore alternatives in the node representation module.
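A minimal sketch of such an array-based node, assuming lowercase ASCII input:

```python
# Sketch of the array-based node described above: 26 fixed slots, one
# per lowercase letter, giving O(1) child access at the cost of unused
# slots in sparse nodes.
class ArrayTrieNode:
    __slots__ = ("children", "is_end")

    def __init__(self):
        self.children = [None] * 26   # fixed slots for 'a'..'z'
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        i = ord(ch) - ord('a')        # direct index: no hashing, no search
        if node.children[i] is None:
            node.children[i] = ArrayTrieNode()
        node = node.children[i]
    node.is_end = True

root = ArrayTrieNode()
insert(root, "cat")
# 'c' lives at index 2; the other 25 slots of the root stay None.
print(root.children[ord('c') - ord('a')] is not None)  # True
print(sum(c is not None for c in root.children))       # 1
```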
For those with a computer science theory background, here's an elegant way to understand tries: a trie is a deterministic finite automaton (DFA) for recognizing a set of strings.
If that sentence made sense to you, the connection is powerful:

- Each trie node is a DFA state, with the root as the start state.
- Each labeled edge is a transition consumed by one input character.
- End-of-word nodes are the accepting states.
If automata theory is new to you, don't worry—here's the intuition:
The recognition process:
Imagine you're a robot reading the string "care" character by character. You start at the root, read 'c', and follow the 'c' edge; then 'a', then 'r', then 'e', each time following exactly one edge. If the edge you need is ever missing, you reject immediately. When the input runs out, you accept if and only if you're standing on a node marked end-of-word.
This is exactly how trie search works! The trie is a machine for recognizing whether strings belong to your stored set. Each path from root is a sequence of state transitions that recognizes a specific prefix.
The automaton view explains why tries are so efficient for string matching: there's no backtracking. Each character deterministically leads to exactly one next state. You process each character exactly once. This is fundamentally different from algorithms that might need to reconsider earlier decisions.
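The no-backtracking property is easy to see in code. This sketch frames trie search as state transitions; the name `accepts` echoes automata terminology and is illustrative:

```python
# Sketch of search as deterministic state transitions: current node =
# current state; each character either moves to exactly one next state
# or rejects immediately. No backtracking; each character is read once.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def accepts(root, s):
    state = root
    for ch in s:                      # one transition per character
        state = state.children.get(ch)
        if state is None:             # dead state: reject immediately
            return False
    return state.is_end               # accept only in an accepting state

root = TrieNode()
for w in ["car", "care"]:
    insert(root, w)

print(accepts(root, "care"))  # True
print(accepts(root, "ca"))    # False: valid prefix, but not a stored word
print(accepts(root, "cab"))   # False: rejected at the third character
```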
Here's a remarkable property of tries: the height of a trie is determined by the longest string stored, not the number of strings.
In a balanced BST with n strings, height is O(log n). Store a million strings, and you're looking at ~20 levels of tree.
In a trie storing the same million strings, if the longest word is 15 characters, the trie's height is 15. Period. Whether you store 100 strings or 100 million, if the longest is 15 characters, maximum depth is 15.
| Strings Stored | Longest String | Trie Height | Balanced BST Height |
|---|---|---|---|
| 1,000 | 10 chars | 10 | ~10 |
| 10,000 | 10 chars | 10 | ~13 |
| 100,000 | 10 chars | 10 | ~17 |
| 1,000,000 | 10 chars | 10 | ~20 |
| 100,000,000 | 10 chars | 10 | ~27 |
What this means for performance:
Lookup time in a trie is O(m), where m is the length of the search string—completely independent of how many strings are stored. Looking up a 10-character word takes the same 10 steps whether the trie holds a hundred strings or a hundred million.
The BST, meanwhile, requires O(log n) comparisons, and each comparison is itself O(m) for string comparison—giving O(m log n) total. The trie's O(m) wins as n grows, and wins decisively.
A trie's lookup time depends only on the query string length, not on the dataset size. This makes tries ideal for scenarios with massive datasets and bounded-length strings—like word dictionaries, IP addresses, or short codes.
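A quick sketch can demonstrate this independence by counting pointer-following steps; the filler words are arbitrary:

```python
# Sketch demonstrating the O(m) claim: the number of child-pointer steps
# in a lookup equals the query length, however many strings are stored.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def lookup_steps(root, word):
    node, steps = root, 0
    for ch in word:
        node = node.children.get(ch)
        steps += 1
        if node is None:              # stop early if the path breaks off
            break
    return steps

small, large = TrieNode(), TrieNode()
insert(small, "cat")
for i in range(10000):                # arbitrary filler words
    insert(large, f"w{i:04d}")
insert(large, "cat")

# Same query, wildly different trie sizes, identical step count.
print(lookup_steps(small, "cat"))  # 3
print(lookup_steps(large, "cat"))  # 3
```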
Tries are not memory-free. In fact, a naive trie implementation can be a memory hog. Understanding the space trade-offs is essential for making informed decisions about when to use tries.
The optimistic case: lots of shared prefixes
If your strings share many prefixes, tries are space-efficient. Consider storing all English words starting with "pre-": "precise," "predict," "premium," "prepare," etc. The prefix "p-r-e" is stored once, shared by all these words. The more sharing, the less memory.
The pessimistic case: no shared prefixes
If your strings are random with no shared prefixes, a trie degenerates into storing each string along a separate path. Worse, each node might allocate space for 26 child pointers while only using 1. Memory explodes.
Rough memory estimation:
For a trie with n nodes, where each node uses an array of 26 pointers (8 bytes each on 64-bit) plus a one-byte end-of-word flag, each node costs 26 × 8 + 1 = 209 bytes.
For 1 million nodes: ~209 MB. That's significant!
With hash-map children, you only store actual edges. If average out-degree is 3, each node needs roughly 3 × (8 + 1) + 1 ≈ 28 bytes (ignoring the map's own overhead), or about 30 MB for a million nodes.
That's roughly a 7× memory reduction—but with slower child access: both the array and the hash map are O(1), but the map's constant factors are higher.
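The arithmetic behind these figures, as a quick sanity check (64-bit pointers assumed; real per-node overhead varies by language and allocator):

```python
# Back-of-the-envelope check of the figures above. Assumes 8-byte
# pointers and a 1-byte end-of-word flag; runtime overheads are ignored.
POINTER_BYTES = 8
ALPHABET = 26

# Array-of-26 node: 26 child pointers plus the flag.
array_node = ALPHABET * POINTER_BYTES + 1
print(array_node)                         # 209 bytes per node
print(array_node * 1_000_000 / 1e6)       # 209.0 MB for a million nodes

# Hash-map node with average out-degree 3: store only real edges
# (1-byte key + pointer each), ignoring the map's own overhead.
avg_children = 3
map_node = avg_children * (POINTER_BYTES + 1) + 1
print(map_node)                           # 28 bytes per node
print(round(array_node / map_node))       # roughly a 7x reduction
```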
Don't default to tries without considering memory. For small datasets or memory-constrained environments, a sorted array with binary search might be simpler and more efficient. Tries shine for large, prefix-heavy datasets where the query pattern justifies the memory cost.
Now that you understand what a trie is, let's establish when it's the right choice. This decision-making framework will serve you throughout your career:

- Reach for a trie when prefix queries (autocomplete, prefix counting, longest-prefix matching) are central to your workload.
- Reach for a hash table when you only need exact-match membership.
- Prefer a sorted array with binary search for small or memory-constrained datasets.
- Expect tries to pay off most when strings share many prefixes and the alphabet is small and bounded.
Here's a simple heuristic: if your use case involves 'typing the first few characters to find matches,' you probably want a trie. The structure is literally designed for that workflow—finding all strings that start with what the user has typed so far.
Let's consolidate everything into a clear mental model you can carry forward:

- A trie stores strings as paths; no single node holds a whole key.
- The root represents the empty string, so every string's path starts there.
- Strings with a common prefix share the path that spells that prefix.
- Lookup costs O(m) in the query length, independent of how many strings are stored.
- Memory scales with node count times the per-node child-representation cost, so prefix sharing and alphabet size drive the space bill.
What's next:
You now understand the trie's identity as a prefix tree. The next page dives into one of the most powerful properties this creates: shared prefixes mean shared paths. We'll see exactly how multiple strings overlay onto the same structure, and why this enables tries to compress redundant information naturally.
You've mastered the core concept: a trie is a tree where strings are stored as paths, organized by their prefixes. This single insight—'prefix tree'—explains every trie operation you'll learn. Next, we'll see the elegance of shared prefixes in action.