Loading content...
Throughout our exploration of Binary Search Trees, we've implicitly assumed that every key in the tree is unique. This assumption simplifies our understanding of the BST property: for every node, all keys in the left subtree are strictly less than the node's key, and all keys in the right subtree are strictly greater than the node's key.
But what happens when we need to store duplicate keys? This isn't an edge case—it's a fundamental requirement in countless real-world applications:
The moment we introduce duplicates, our clean BST property faces an existential question: Where does a duplicate go? Left? Right? Does it matter?
By the end of this page, you will understand multiple strategies for handling duplicate keys in BSTs, evaluate their tradeoffs in terms of complexity, space, and use cases, and implement production-ready solutions. You'll learn why 'just pick left or right' isn't sufficient and how different strategies impact tree balance, search semantics, and deletion correctness.
Let's precisely define the problem. The standard BST property states:
For every node N with key k:
- All keys in the left subtree of N are < k
- All keys in the right subtree of N are > k
This definition uses strict inequalities. When we insert a duplicate key (a new node with key = k where k already exists), we have several questions to answer:
These questions reveal that duplicate handling isn't a minor detail—it's a design decision that permeates every BST operation.
Simply saying 'put duplicates on the left' or 'put duplicates on the right' without careful analysis leads to subtle bugs. Consider: if duplicates always go right, repeated insertion of the same key creates a linked list hanging off that node—O(n) degeneracy from a single repeated value!
The semantic question:
Beyond implementation, we must consider what our BST means with duplicates:
search(k) return any node with key k, or a specific one?Different applications need different answers. A database index needs to find all matching records. A priority queue only needs to extract one minimum-priority item. A multiset needs accurate counts. Our strategy must match our use case.
The most straightforward approach is to modify our BST property to accommodate duplicates. We have two variants:
Variant A: Duplicates go left
For every node N with key k:
- All keys in the left subtree are ≤ k
- All keys in the right subtree are > k
Variant B: Duplicates go right
For every node N with key k:
- All keys in the left subtree are < k
- All keys in the right subtree are ≥ k
Both variants are valid and maintain the searchability of the BST. The key insight is that we've replaced one strict inequality with a non-strict one, creating a 'direction' for duplicates to flow.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879
interface BSTNode<T> { key: number; value: T; left: BSTNode<T> | null; right: BSTNode<T> | null;} /** * Insert with duplicates going right. * BST Property: left < key, right >= key * * This means when we encounter an equal key, we continue * searching in the right subtree until we find the correct * leaf position. */function insertWithDuplicatesRight<T>( root: BSTNode<T> | null, key: number, value: T): BSTNode<T> { if (root === null) { return { key, value, left: null, right: null }; } if (key < root.key) { // Strictly less: go left root.left = insertWithDuplicatesRight(root.left, key, value); } else { // Greater OR EQUAL: go right // This is where duplicates get placed root.right = insertWithDuplicatesRight(root.right, key, value); } return root;} /** * Search for ANY node with the given key. * Returns the first match found (not necessarily the first inserted). */function searchAny<T>( root: BSTNode<T> | null, key: number): BSTNode<T> | null { if (root === null) return null; if (key === root.key) { return root; // Found one occurrence } else if (key < root.key) { return searchAny(root.left, key); } else { return searchAny(root.right, key); }} /** * Find ALL nodes with the given key. * With duplicates-right, after finding a match, more matches * can exist in the right subtree. */function searchAll<T>( root: BSTNode<T> | null, key: number, results: BSTNode<T>[] = []): BSTNode<T>[] { if (root === null) return results; if (key === root.key) { results.push(root); // More duplicates may exist in the right subtree searchAll(root.right, key, results); } else if (key < root.key) { searchAll(root.left, key, results); } else { searchAll(root.right, key, results); } return results;}Analysis of this approach:
| Aspect | Evaluation |
|---|---|
| Insertion | O(h) - straightforward traversal |
| Search (any) | O(h) - stops at first match |
| Search (all) | O(h + k) - where k is number of duplicates |
| Deletion | Complex - must handle chains of duplicates |
| Space | One node per key occurrence |
| Balance | Risk of degenerate chains for repeated keys |
If we insert the same key n times, all n nodes form a chain in one direction. For 'duplicates-right', inserting key=5 ten times creates a linked list of 10 nodes hanging to the right. This transforms our O(log n) tree into O(n) for those operations touching the duplicate chain. In extreme cases, this can devastate performance.
A more space-efficient and balance-friendly approach is to augment each node with a count field that tracks how many times the key appears. Instead of creating new nodes for duplicates, we simply increment the counter.
Modified node structure:
Node {
key: K
value: V // or a list of values if values differ
count: number // number of occurrences of this key
left: Node | null
right: Node | null
}
This approach maintains the original strict BST property (left < root < right) while supporting duplicates through the count mechanism.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104
interface CountBSTNode<T> { key: number; values: T[]; // All values associated with this key count: number; // Number of occurrences left: CountBSTNode<T> | null; right: CountBSTNode<T> | null;} /** * Insert with count tracking. * Maintains strict BST property: left < key < right * Duplicates increase the count rather than creating new nodes. */function insertWithCount<T>( root: CountBSTNode<T> | null, key: number, value: T): CountBSTNode<T> { if (root === null) { return { key, values: [value], count: 1, left: null, right: null, }; } if (key === root.key) { // Duplicate found: increment count, store value root.count++; root.values.push(value); } else if (key < root.key) { root.left = insertWithCount(root.left, key, value); } else { root.right = insertWithCount(root.right, key, value); } return root;} /** * Search returns the node with count information. * Caller can see how many duplicates exist. */function searchWithCount<T>( root: CountBSTNode<T> | null, key: number): { found: boolean; count: number; values: T[] } { if (root === null) { return { found: false, count: 0, values: [] }; } if (key === root.key) { return { found: true, count: root.count, values: root.values }; } else if (key < root.key) { return searchWithCount(root.left, key); } else { return searchWithCount(root.right, key); }} /** * Delete one occurrence of a key. * Only removes node when count reaches zero. */function deleteOne<T>( root: CountBSTNode<T> | null, key: number): CountBSTNode<T> | null { if (root === null) return null; if (key < root.key) { root.left = deleteOne(root.left, key); } else if (key > root.key) { root.right = deleteOne(root.right, key); } else { // Found the key if (root.count > 1) { // Multiple occurrences: just decrement root.count--; root.values.pop(); // Remove last value (LIFO) return root; } // count === 1: perform actual node deletion // Standard BST deletion logic follows if (root.left === null) return root.right; if (root.right === null) return root.left; // Two children: find inorder successor let successor = root.right; while (successor.left !== null) { successor = successor.left; } root.key = successor.key; root.values = successor.values; root.count = successor.count; root.right = deleteOne(root.right, successor.key); } return root;}| Operation | Complexity | Notes |
|---|---|---|
| Insert (new key) | O(h) | Standard BST insertion |
| Insert (duplicate) | O(h) | Find node, increment count |
| Search | O(h) | Returns count and all values |
| Delete one | O(h) | Decrement count or remove node |
| Delete all of key | O(h) | Just remove the node |
| Count of key k | O(h) | Direct access via count field |
| Total elements | O(n) | Sum all counts in tree |
The count approach is balance-friendly. Inserting the same key 1,000 times doesn't create 1,000 nodes—it creates one node with count=1000. This means duplicate-heavy workloads don't degrade tree structure. Combined with self-balancing (AVL, Red-Black), this approach guarantees O(log n) operations where n is the number of distinct keys.
When to use the count field approach:
When to avoid:
A natural extension of the count field approach is to store all values associated with a key in a list. This is the most flexible solution and is commonly used in database indexes where a single index key (e.g., 'age=25') maps to multiple records.
Key insight: The BST is indexed by key, but each key maps to a collection of values rather than a single value. The tree structure remains strictly a BST, but the 'value' at each node is a list.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160
interface ListBSTNode<T> { key: number; values: T[]; // All values with this key, preserving insertion order left: ListBSTNode<T> | null; right: ListBSTNode<T> | null;} /** * A BST-backed multimap: keys to lists of values. * Perfect for database indexes on non-unique columns. */class BSTMultiMap<T> { private root: ListBSTNode<T> | null = null; private totalCount = 0; // Total values across all keys /** * Insert a key-value pair. * If key exists, appends value to existing list. */ insert(key: number, value: T): void { this.root = this.insertNode(this.root, key, value); this.totalCount++; } private insertNode( node: ListBSTNode<T> | null, key: number, value: T ): ListBSTNode<T> { if (node === null) { return { key, values: [value], left: null, right: null }; } if (key === node.key) { node.values.push(value); // Append to existing list } else if (key < node.key) { node.left = this.insertNode(node.left, key, value); } else { node.right = this.insertNode(node.right, key, value); } return node; } /** * Get all values associated with a key. * Returns empty array if key not found. */ get(key: number): T[] { const node = this.findNode(this.root, key); return node ? [...node.values] : []; // Return copy } /** * Get the count of values for a specific key. */ count(key: number): number { const node = this.findNode(this.root, key); return node ? node.values.length : 0; } /** * Remove a specific value from a key's list. * Uses deep equality check for objects. */ removeValue(key: number, value: T): boolean { const node = this.findNode(this.root, key); if (!node) return false; const index = node.values.findIndex((v) => this.deepEquals(v, value) ); if (index === -1) return false; node.values.splice(index, 1); this.totalCount--; // If no values remain, remove the node entirely if (node.values.length === 0) { this.root = this.deleteNode(this.root, key); } return true; } /** * Range query: get all values where key is in [minKey, maxKey]. * This is where BST shines over hash tables. */ range(minKey: number, maxKey: number): { key: number; values: T[] }[] { const results: { key: number; values: T[] }[] = []; this.rangeQuery(this.root, minKey, maxKey, results); return results; } private rangeQuery( node: ListBSTNode<T> | null, minKey: number, maxKey: number, results: { key: number; values: T[] }[] ): void { if (node === null) return; // If current key > minKey, left subtree may have valid keys if (node.key > minKey) { this.rangeQuery(node.left, minKey, maxKey, results); } // If current key is in range, include it if (node.key >= minKey && node.key <= maxKey) { results.push({ key: node.key, values: [...node.values] }); } // If current key < maxKey, right subtree may have valid keys if (node.key < maxKey) { this.rangeQuery(node.right, minKey, maxKey, results); } } private findNode( node: ListBSTNode<T> | null, key: number ): ListBSTNode<T> | null { if (node === null) return null; if (key === node.key) return node; if (key < node.key) return this.findNode(node.left, key); return this.findNode(node.right, key); } private deleteNode( node: ListBSTNode<T> | null, key: number ): ListBSTNode<T> | null { if (node === null) return null; if (key < node.key) { node.left = this.deleteNode(node.left, key); } else if (key > node.key) { node.right = this.deleteNode(node.right, key); } else { // Standard BST delete if (!node.left) return node.right; if (!node.right) return node.left; let successor = node.right; while (successor.left) successor = successor.left; node.key = successor.key; node.values = successor.values; node.right = this.deleteNode(node.right, successor.key); } return node; } private deepEquals(a: T, b: T): boolean { return JSON.stringify(a) === JSON.stringify(b); }}This is essentially how database secondary indexes work on non-unique columns. An index on 'customer_age' creates a BST (often a B-tree variant) where each age value maps to a list of record IDs. A query like 'SELECT * WHERE age BETWEEN 25 AND 35' becomes an efficient range query on this structure.
Comparison with Strategy 2 (Count Field):
| Aspect | Count Field | Value List |
|---|---|---|
| Memory per duplicate | Just +1 integer | +1 pointer/value |
| Access individual values | ❌ Not stored | ✅ Preserved |
| Insertion order | ❌ Lost | ✅ Maintained |
| Remove specific value | ❌ Can only decrement | ✅ Find and remove |
| Best for | Multiset (counting) | Multi-map (indexing) |
A hybrid approach chains duplicate nodes together using an additional pointer. Each node with a duplicate key links to the next node with the same key. This preserves individual node identity while avoiding the balance problems of Strategy 1.
Node structure:
Node {
key: K
value: V
left: Node | null
right: Node | null
next: Node | null // Link to next duplicate
}
The BST structure maintains strict ordering (left < root < right), and duplicates form a singly-linked list attached to the first occurrence.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127
interface LinkedBSTNode<T> { key: number; value: T; left: LinkedBSTNode<T> | null; right: LinkedBSTNode<T> | null; next: LinkedBSTNode<T> | null; // Chain of duplicates} /** * Insert with linked duplicate chain. * First occurrence is in tree structure; duplicates chain off it. */function insertLinked<T>( root: LinkedBSTNode<T> | null, key: number, value: T): LinkedBSTNode<T> { if (root === null) { return { key, value, left: null, right: null, next: null }; } if (key === root.key) { // Found existing node with same key: chain the duplicate const newNode: LinkedBSTNode<T> = { key, value, left: null, right: null, next: root.next, // Point to current chain }; root.next = newNode; // New node becomes head of chain } else if (key < root.key) { root.left = insertLinked(root.left, key, value); } else { root.right = insertLinked(root.right, key, value); } return root;} /** * Find all values for a key by traversing the chain. */function findAllLinked<T>( root: LinkedBSTNode<T> | null, key: number): T[] { const node = findNodeLinked(root, key); if (!node) return []; const results: T[] = []; let current: LinkedBSTNode<T> | null = node; // Traverse the chain while (current !== null) { results.push(current.value); current = current.next; } return results;} function findNodeLinked<T>( root: LinkedBSTNode<T> | null, key: number): LinkedBSTNode<T> | null { if (root === null) return null; if (key === root.key) return root; if (key < root.key) return findNodeLinked(root.left, key); return findNodeLinked(root.right, key);} /** * Delete one occurrence of a key. * If it's the primary node and has a chain, promote next in chain. */function deleteOneLinked<T>( root: LinkedBSTNode<T> | null, key: number): LinkedBSTNode<T> | null { if (root === null) return null; if (key < root.key) { root.left = deleteOneLinked(root.left, key); return root; } if (key > root.key) { root.right = deleteOneLinked(root.right, key); return root; } // Found the key if (root.next !== null) { // Has duplicates in chain: promote next to current position const promoted = root.next; promoted.left = root.left; promoted.right = root.right; return promoted; } // No duplicates: standard BST delete if (root.left === null) return root.right; if (root.right === null) return root.left; // Two children: find inorder successor let successor = root.right; let successorParent = root; while (successor.left !== null) { successorParent = successor; successor = successor.left; } // Replace current with successor root.key = successor.key; root.value = successor.value; root.next = successor.next; if (successorParent === root) { root.right = successor.right; } else { successorParent.left = successor.right; } return root;}Advantages of the linked chain approach:
Disadvantages:
This approach shines when duplicates are common, each duplicate must be individually addressable, and you need to maintain compatibility with tree-balancing algorithms. It's used in some database implementations where each 'duplicate' entry has its own transaction metadata or version information.
Let's synthesize our strategies into a decision framework. The right choice depends on your specific requirements:
| Strategy | Balance Impact | Space per Dup | Use Case |
|---|---|---|---|
| Left/Right Placement | ❌ Can degrade to O(n) | Full node | Simple, few duplicates |
| Count Field | ✅ No impact | 1 integer | Counting, multiset |
| Value List | ✅ No impact | 1 pointer + value | Multi-map, indexing |
| Linked Chain | ✅ Minimal impact | Full node + 1 ptr | Individual node identity |
Decision flowchart:
Start
│
├─ Do duplicates need individual identity?
│ │
│ ├─ Yes → Do you need to preserve tree balance?
│ │ │
│ │ ├─ Yes → Use Linked Chain (Strategy 4)
│ │ │
│ │ └─ No, duplicates are rare → Use Left/Right (Strategy 1)
│ │
│ └─ No → Is the value different for each duplicate?
│ │
│ ├─ Yes → Use Value List (Strategy 3)
│ │
│ └─ No → Use Count Field (Strategy 2)
For most production systems, start with Strategy 2 (Count Field) or Strategy 3 (Value List). These preserve tree balance, have excellent performance characteristics, and handle the majority of use cases. Only use Strategy 1 for quick prototypes or when duplicates are extremely rare. Use Strategy 4 when individual node identity is truly required.
Understanding duplicate handling isn't just academic—it directly impacts how you design systems. Let's examine concrete applications:
We've thoroughly explored the duplicate handling problem in Binary Search Trees. Let's consolidate the key insights:
What's next:
Now that we understand duplicate handling, we're ready to explore more sophisticated BST extensions. The next page covers Order Statistics Trees—a powerful augmentation that adds rank-based operations to BSTs, enabling queries like 'find the k-th smallest element' in O(log n) time.
You now understand the full spectrum of duplicate handling strategies in BSTs. You can evaluate tradeoffs, implement production-ready solutions, and recognize which strategy fits your application requirements. This foundation prepares you for the augmented BST patterns that follow.