Duplicate detection is one of the most practical hashing patterns you'll encounter. It's not glamorous, but it's everywhere: cleaning imported records, validating user input, flagging repeated login attempts, and enforcing exactly-once processing in distributed systems.
The core question is simple: Have we seen this before?
Without hash tables, answering this question for n items requires O(n²) time—checking every pair. With hash tables, it's O(n) time. For datasets with billions of records, this isn't just an optimization—it's the difference between possible and impossible.
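To make that contrast concrete, here is a rough sketch (not part of the original lesson) of the quadratic pairwise check that the hash set replaces:

```typescript
// Brute force: compare every pair — O(n²) time, O(1) space
function containsDuplicateBruteForce<T>(elements: T[]): boolean {
  for (let i = 0; i < elements.length; i++) {
    for (let j = i + 1; j < elements.length; j++) {
      if (elements[i] === elements[j]) return true;
    }
  }
  return false;
}

// At n = 1,000,000 this is roughly 5 × 10^11 comparisons,
// versus about 10^6 hash lookups for the set-based version below.
```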
By the end of this page, you will understand duplicate detection in all its forms: finding any duplicate, finding all duplicates, finding the first duplicate, detecting near-duplicates, and handling duplicate detection at scale. You'll learn both the algorithmic patterns and the production considerations.
The simplest duplicate detection asks: Does this collection contain any duplicates?
The hash-based solution uses a set—the fundamental data structure for tracking membership:
Algorithm: scan the collection once, keeping a set of every element seen so far. If the current element is already in the set, report a duplicate; otherwise add it and keep going. If the scan finishes without a hit, there are no duplicates.
This is sometimes called the "seen set" pattern.
```typescript
/**
 * Contains Duplicate: Check if any element appears twice
 * Time: O(n) average, O(n²) worst case (hash collisions)
 * Space: O(n)
 */
function containsDuplicate<T>(elements: T[]): boolean {
  const seen = new Set<T>();

  for (const element of elements) {
    if (seen.has(element)) {
      return true; // Duplicate found!
    }
    seen.add(element);
  }

  return false; // No duplicates
}

// Alternative: One-liner using Set size comparison
// If set size < array length, duplicates must exist
function containsDuplicateOneLiner<T>(elements: T[]): boolean {
  return new Set(elements).size !== elements.length;
}

// Examples
console.log(containsDuplicate([1, 2, 3, 1]));    // true
console.log(containsDuplicate([1, 2, 3, 4]));    // false
console.log(containsDuplicate([]));              // false (empty array)
console.log(containsDuplicate(["a", "b", "a"])); // true (works with strings)
```

In Java and some other languages, Set.add() returns false if the element already exists. This combines the check and the add into a single call—cleaner code and sometimes better performance, since it needs one hash lookup instead of two.
Sometimes we need more than a boolean—we need to identify which elements are duplicates. This requires tracking counts or using multiple sets.
Variant 1: Find All Duplicate Elements
Return a list of all elements that appear more than once:
```typescript
/**
 * Find All Duplicate Elements
 * Returns each duplicate value exactly once
 * Time: O(n), Space: O(n)
 */
function findAllDuplicates<T>(elements: T[]): T[] {
  const seen = new Set<T>();
  const duplicates = new Set<T>();

  for (const element of elements) {
    if (seen.has(element)) {
      duplicates.add(element); // Already seen → it's a duplicate
    } else {
      seen.add(element);
    }
  }

  return [...duplicates];
}

/**
 * Alternative: Using frequency map
 * More flexible if you need counts later
 */
function findAllDuplicatesWithCounts<T>(elements: T[]): Map<T, number> {
  const freq = new Map<T, number>();

  // Count all occurrences
  for (const element of elements) {
    freq.set(element, (freq.get(element) ?? 0) + 1);
  }

  // Filter to only duplicates
  const duplicates = new Map<T, number>();
  for (const [key, count] of freq) {
    if (count > 1) {
      duplicates.set(key, count);
    }
  }

  return duplicates;
}

// Examples
console.log(findAllDuplicates([1, 2, 3, 2, 4, 3, 5]));              // [2, 3]
console.log(findAllDuplicatesWithCounts([1, 2, 3, 2, 4, 3, 3, 5])); // Map { 2 => 2, 3 => 3 }
```

Variant 2: Find Duplicates in Constrained Space
A classic interview problem: Given an array of n integers where each integer is in the range [1, n], find duplicates using O(1) extra space.
The trick: use the array itself as a hash table by negating values at indices.
```typescript
/**
 * Find Duplicates with O(1) Extra Space
 *
 * Constraint: Values are in range [1, n] for array of length n
 *
 * Key insight: Use value as index to mark "seen" by negating
 * If we encounter an already-negative value, it's a duplicate
 *
 * Time: O(n), Space: O(1) (modifies input array in-place)
 */
function findDuplicatesConstantSpace(nums: number[]): number[] {
  const duplicates: number[] = [];

  for (let i = 0; i < nums.length; i++) {
    // Get the index corresponding to this value
    // Use absolute value because we might have negated it
    const index = Math.abs(nums[i]) - 1; // -1 because values are 1-indexed, array is 0-indexed

    if (nums[index] < 0) {
      // Already visited → this value is a duplicate
      duplicates.push(Math.abs(nums[i]));
    } else {
      // Mark as visited by negating
      nums[index] = -nums[index];
    }
  }

  // Optional: restore the array
  for (let i = 0; i < nums.length; i++) {
    nums[i] = Math.abs(nums[i]);
  }

  return duplicates;
}

// Example walkthrough: nums = [4, 3, 2, 7, 8, 2, 3, 1]
// i=0: value=4, index=3, nums[3]=7>0  → negate → [4, 3, 2, -7, 8, 2, 3, 1]
// i=1: value=3, index=2, nums[2]=2>0  → negate → [4, 3, -2, -7, 8, 2, 3, 1]
// i=2: value=2, index=1, nums[1]=3>0  → negate → [4, -3, -2, -7, 8, 2, 3, 1]
// i=3: value=7, index=6, nums[6]=3>0  → negate → [4, -3, -2, -7, 8, 2, -3, 1]
// i=4: value=8, index=7, nums[7]=1>0  → negate → [4, -3, -2, -7, 8, 2, -3, -1]
// i=5: value=2, index=1, nums[1]=-3<0 → duplicate found! → duplicates = [2]
// i=6: value=3, index=2, nums[2]=-2<0 → duplicate found! → duplicates = [2, 3]
// i=7: value=1, index=0, nums[0]=4>0  → negate → [-4, -3, -2, -7, 8, 2, -3, -1]

console.log(findDuplicatesConstantSpace([4, 3, 2, 7, 8, 2, 3, 1])); // [2, 3]
```

The constant-space technique modifies the input array, which may not be acceptable in production code. Use it when:
• Memory is critically constrained
• Input can be modified (or you restore it afterward)
• Values are in the required range [1, n]
For general-purpose code, the O(n) space hash set is usually preferred for clarity.
Sometimes the position of duplicates matters—not just whether they exist.
Find First Duplicate (by second occurrence)
Return the first element that appears for the second time as we scan left to right:
```typescript
/**
 * Find First Duplicate: Return value of first repeated element
 * "First" = earliest second occurrence while scanning left to right
 *
 * Time: O(n), Space: O(n)
 */
function findFirstDuplicate<T>(elements: T[]): T | undefined {
  const seen = new Set<T>();

  for (const element of elements) {
    if (seen.has(element)) {
      return element; // First duplicate found!
    }
    seen.add(element);
  }

  return undefined;
}

/**
 * Find First Duplicate with Index
 * Return the index of the second occurrence
 */
function findFirstDuplicateIndex<T>(elements: T[]): number {
  const seen = new Set<T>();

  for (let i = 0; i < elements.length; i++) {
    if (seen.has(elements[i])) {
      return i; // Index of second occurrence
    }
    seen.add(elements[i]);
  }

  return -1;
}

/**
 * Find First Duplicate with Both Indices
 * Return indices of both occurrences
 */
function findFirstDuplicatePair<T>(elements: T[]): [number, number] | null {
  const firstOccurrence = new Map<T, number>();

  for (let i = 0; i < elements.length; i++) {
    if (firstOccurrence.has(elements[i])) {
      return [firstOccurrence.get(elements[i])!, i];
    }
    firstOccurrence.set(elements[i], i);
  }

  return null;
}

// Examples
console.log(findFirstDuplicate([2, 1, 3, 5, 3, 2]));      // 3 (3 repeats first)
console.log(findFirstDuplicateIndex([2, 1, 3, 5, 3, 2])); // 4 (index of second 3)
console.log(findFirstDuplicatePair([2, 1, 3, 5, 3, 2]));  // [2, 4] (indices of both 3s)
```

Find First Duplicate (by first occurrence of duplicate value)
A different interpretation: return the element whose first occurrence is earliest among all duplicates:
```python
from typing import TypeVar, List, Optional

T = TypeVar('T')


def find_earliest_duplicate(elements: List[T]) -> Optional[T]:
    """
    Find duplicate whose FIRST occurrence is earliest.

    Example: [2, 1, 3, 5, 3, 2]
    - 3's first occurrence: index 2
    - 2's first occurrence: index 0
    - Answer: 2 (even though 3 repeats first)

    Time: O(n), Space: O(n)
    """
    first_occurrence = {}
    duplicates = set()

    for i, element in enumerate(elements):
        if element in first_occurrence:
            duplicates.add(element)
        else:
            first_occurrence[element] = i

    if not duplicates:
        return None

    # Find duplicate with earliest first occurrence
    earliest = min(duplicates, key=lambda x: first_occurrence[x])
    return earliest


# Example
print(find_earliest_duplicate([2, 1, 3, 5, 3, 2]))  # 2 (first duplicate by first occurrence)


def find_kth_duplicate(elements: List[T], k: int) -> Optional[T]:
    """
    Find the k-th distinct duplicate value (1-indexed).

    Example: [1, 2, 3, 2, 1, 4, 3, 5, 1], k=2
    - First duplicate detected: 2 (at index 3)
    - Second duplicate detected: 1 (at index 4)
    - Answer: 1
    """
    seen = set()
    duplicates_found = []

    for element in elements:
        if element in seen and element not in duplicates_found:
            duplicates_found.append(element)
            if len(duplicates_found) == k:
                return element
        seen.add(element)

    return None  # Fewer than k duplicates


# Example
print(find_kth_duplicate([1, 2, 3, 2, 1, 4, 3, 5, 1], 2))  # 1
```

A common variant asks: Are there duplicates within a sliding window of size k?
This is useful for detecting suspicious patterns—like multiple login attempts from the same IP within a short time window.
Classic Problem: Nearby Duplicate
Given an array and integer k, return true if there are two distinct indices i and j such that nums[i] == nums[j] and |i - j| <= k.
```typescript
/**
 * Contains Duplicate II: Duplicates within distance k
 *
 * Approach 1: Sliding Window Set - O(n) time, O(k) space
 * Maintain a set of the last k elements
 */
function containsNearbyDuplicate(nums: number[], k: number): boolean {
  const window = new Set<number>();

  for (let i = 0; i < nums.length; i++) {
    // Check if current element is in the window
    if (window.has(nums[i])) {
      return true;
    }

    // Add current element to window
    window.add(nums[i]);

    // Remove element that's now outside the window
    if (window.size > k) {
      window.delete(nums[i - k]);
    }
  }

  return false;
}

/**
 * Approach 2: Hash Map with Index - O(n) time, O(n) space
 * Store the most recent index of each value
 */
function containsNearbyDuplicateMap(nums: number[], k: number): boolean {
  const lastIndex = new Map<number, number>();

  for (let i = 0; i < nums.length; i++) {
    if (lastIndex.has(nums[i])) {
      const prevIndex = lastIndex.get(nums[i])!;
      if (i - prevIndex <= k) {
        return true;
      }
    }
    lastIndex.set(nums[i], i);
  }

  return false;
}

// Example walkthrough: nums = [1, 2, 3, 1], k = 3
// Sliding window approach:
// i=0: window = {}        → add 1 → {1}
// i=1: window = {1}       → 2 not in window → add 2 → {1, 2}
// i=2: window = {1, 2}    → 3 not in window → add 3 → {1, 2, 3}
// i=3: window = {1, 2, 3} → 1 IS in window  → return true!

console.log(containsNearbyDuplicate([1, 2, 3, 1], 3));       // true
console.log(containsNearbyDuplicate([1, 2, 3, 1, 2, 3], 2)); // false
```

Classic Problem: Nearby Almost Duplicate
A more complex variant: check if there are two distinct indices i and j such that |nums[i] - nums[j]| <= t and |i - j| <= k.
This requires both proximity in position AND proximity in value. The bucket-based solution is elegant:
```typescript
/**
 * Contains Duplicate III: Nearby Almost Duplicate
 * |nums[i] - nums[j]| <= t AND |i - j| <= k
 *
 * Key insight: Use buckets of size (t+1)
 * - Numbers in the same bucket differ by at most t
 * - Numbers in adjacent buckets might differ by at most t
 *
 * Time: O(n), Space: O(k)
 */
function containsNearbyAlmostDuplicate(
  nums: number[],
  k: number,
  t: number
): boolean {
  if (k <= 0 || t < 0) return false;

  const buckets = new Map<number, number>();
  const bucketSize = t + 1; // +1 to avoid division by zero when t=0

  // Helper: Get bucket ID for a number
  // Handle negative numbers correctly
  const getBucketId = (num: number): number => {
    return num >= 0
      ? Math.floor(num / bucketSize)
      : Math.floor((num + 1) / bucketSize) - 1;
  };

  for (let i = 0; i < nums.length; i++) {
    const bucketId = getBucketId(nums[i]);

    // Check same bucket (guaranteed |diff| <= t)
    if (buckets.has(bucketId)) {
      return true;
    }

    // Check adjacent buckets (might have |diff| <= t)
    if (buckets.has(bucketId - 1)) {
      if (Math.abs(nums[i] - buckets.get(bucketId - 1)!) <= t) {
        return true;
      }
    }
    if (buckets.has(bucketId + 1)) {
      if (Math.abs(nums[i] - buckets.get(bucketId + 1)!) <= t) {
        return true;
      }
    }

    // Add current number to its bucket
    buckets.set(bucketId, nums[i]);

    // Remove element outside window
    if (i >= k) {
      const oldBucketId = getBucketId(nums[i - k]);
      buckets.delete(oldBucketId);
    }
  }

  return false;
}

// Example: nums = [1, 5, 9, 1, 5, 9], k = 2, t = 3
// Bucket size = 4 (so 0-3 in bucket 0, 4-7 in bucket 1, etc.)
// i=0: 1 → bucket 0 → buckets = {0: 1}
// i=1: 5 → bucket 1 → check bucket 0: |5-1|=4 > 3 → buckets = {0: 1, 1: 5}
// i=2: 9 → bucket 2 → check bucket 1: |9-5|=4 > 3 → buckets = {0: 1, 1: 5, 2: 9}
//      Remove bucket for nums[0]=1 → buckets = {1: 5, 2: 9}
// i=3: 1 → bucket 0 → no adjacent match → buckets = {0: 1, 1: 5, 2: 9}
// ... continues

console.log(containsNearbyAlmostDuplicate([1, 2, 1, 1], 1, 0)); // true
```

Production duplicate detection involves challenges beyond the basic algorithm.
Challenge 1: Near-Duplicates
'John Smith' and 'JOHN SMITH' are duplicates, but exact matching won't find them. A common solution is to normalize each value into a canonical form before hashing:
```typescript
/**
 * Near-Duplicate Detection with Normalization
 */
function findNearDuplicates(strings: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();

  for (const str of strings) {
    // Normalize: lowercase, remove extra whitespace, etc.
    const normalized = str.toLowerCase().trim().replace(/\s+/g, ' ');

    if (!groups.has(normalized)) {
      groups.set(normalized, []);
    }
    groups.get(normalized)!.push(str);
  }

  // Return only groups with duplicates
  return new Map(
    [...groups.entries()].filter(([_, values]) => values.length > 1)
  );
}

// Example
const names = [
  "John Smith",
  "JOHN SMITH",
  "John  Smith", // Extra space
  "Jane Doe",
  "jane doe"
];

console.log(findNearDuplicates(names));
// Map {
//   "john smith" => ["John Smith", "JOHN SMITH", "John  Smith"],
//   "jane doe" => ["Jane Doe", "jane doe"]
// }
```

Challenge 2: Scale and Memory
For billions of records, keeping every value in memory isn't feasible. One common solution is a probabilistic structure such as a Bloom filter, which trades a small false-positive rate for dramatically less memory:
```typescript
/**
 * Conceptual Bloom Filter for Duplicate Detection
 *
 * A Bloom filter uses k hash functions and a bit array
 * - Adding: Set bits at positions hash1(x), hash2(x), ..., hashk(x)
 * - Checking: If ALL bits at hash positions are set → might be present
 *             If ANY bit is 0 → definitely not present
 *
 * Trade-off: Small memory, but false positives possible
 */
class SimpleBloomFilter {
  private bits: boolean[];
  private size: number;

  constructor(size: number = 1000) {
    this.size = size;
    this.bits = new Array(size).fill(false);
  }

  // Simplified: using two hash functions
  private hash1(value: string): number {
    let hash = 0;
    for (const char of value) {
      hash = (hash * 31 + char.charCodeAt(0)) % this.size;
    }
    return hash;
  }

  private hash2(value: string): number {
    let hash = 0;
    for (const char of value) {
      hash = (hash * 37 + char.charCodeAt(0)) % this.size;
    }
    return hash;
  }

  add(value: string): void {
    this.bits[this.hash1(value)] = true;
    this.bits[this.hash2(value)] = true;
  }

  mightContain(value: string): boolean {
    return this.bits[this.hash1(value)] && this.bits[this.hash2(value)];
  }
}

// Usage for deduplication with a space constraint
function deduplicateWithBloom(stream: Iterable<string>): string[] {
  const bloom = new SimpleBloomFilter(10000);
  const result: string[] = [];

  for (const item of stream) {
    if (!bloom.mightContain(item)) {
      // Definitely new
      result.push(item);
      bloom.add(item);
    }
    // If mightContain returns true, it MIGHT be a duplicate (could be a false positive)
    // For guaranteed deduplication, combine with an exact check for "maybe" cases
  }

  return result;
}
```

Challenge 3: Distributed Systems
In distributed systems, duplicate detection becomes the problem of exactly-once delivery.
In distributed systems, the "have we seen this before?" question is hard because of network partitions: different nodes might disagree. Solutions typically trade off:
• Accuracy (false positives/negatives)
• Latency (how long to wait for consensus)
• Memory (how much state to keep)
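As one illustration of how this plays out—an assumed, simplified sketch rather than this lesson's own code—a consumer often deduplicates by message ID (an "idempotency key"), keeping only a bounded window of recently seen IDs and accepting that very old duplicates may slip through:

```typescript
// Hedged sketch with hypothetical names: deduplicating deliveries by message ID.
// The bounded window is the memory/accuracy trade-off described above.
interface Message {
  id: string;      // unique ID assigned by the producer
  payload: string;
}

class IdempotentConsumer {
  private seenIds = new Set<string>();

  constructor(private maxIds: number = 100_000) {}

  /** Returns true if the message was processed, false if it was a duplicate. */
  process(msg: Message, handle: (payload: string) => void): boolean {
    if (this.seenIds.has(msg.id)) {
      return false; // duplicate delivery — skip
    }
    handle(msg.payload);
    this.seenIds.add(msg.id);

    // Evict the oldest ID once the window is full (Sets iterate in insertion order)
    if (this.seenIds.size > this.maxIds) {
      const oldest = this.seenIds.values().next().value as string;
      this.seenIds.delete(oldest);
    }
    return true;
  }
}

// Usage
const consumer = new IdempotentConsumer();
consumer.process({ id: "msg-1", payload: "charge card" }, console.log); // processed
consumer.process({ id: "msg-1", payload: "charge card" }, console.log); // duplicate, skipped
```

Real systems typically back the seen-ID state with shared or persistent storage so it survives restarts and is visible to all nodes.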
Let's examine some classic duplicate-related interview problems that go beyond basic detection.
Problem 1: Single Number
Every element appears twice except one. Find that single element.
This can be solved with a hash set, but there's an elegant O(1) space solution using XOR:
```typescript
/**
 * Single Number: One element appears once, all others appear twice
 *
 * Hash Set Approach: O(n) time, O(n) space
 */
function singleNumberSet(nums: number[]): number {
  const set = new Set<number>();

  for (const num of nums) {
    if (set.has(num)) {
      set.delete(num); // Seen twice = remove
    } else {
      set.add(num);    // First occurrence = add
    }
  }

  return [...set][0]; // Only one element remains
}

/**
 * XOR Approach: O(n) time, O(1) space
 *
 * Key insight: a XOR a = 0 and a XOR 0 = a
 * So duplicates cancel out, leaving the single number
 */
function singleNumberXOR(nums: number[]): number {
  let result = 0;
  for (const num of nums) {
    result ^= num;
  }
  return result;
}

// Example: [4, 1, 2, 1, 2]
// 4 XOR 1 XOR 2 XOR 1 XOR 2
// = (1 XOR 1) XOR (2 XOR 2) XOR 4
// = 0 XOR 0 XOR 4
// = 4

console.log(singleNumberXOR([4, 1, 2, 1, 2])); // 4
```

Problem 2: Find Missing Number
Given an array containing n distinct numbers in range [0, n], find the missing one.
```python
from typing import List


def missing_number_set(nums: List[int]) -> int:
    """
    Hash Set Approach: O(n) time, O(n) space
    """
    all_nums = set(range(len(nums) + 1))
    present = set(nums)
    missing = all_nums - present
    return missing.pop()


def missing_number_xor(nums: List[int]) -> int:
    """
    XOR Approach: O(n) time, O(1) space

    XOR all numbers 0 to n with all numbers in the array.
    Duplicates cancel, leaving the missing number.
    """
    result = len(nums)  # Start with n (since range is [0, n])
    for i, num in enumerate(nums):
        result ^= i ^ num
    return result


def missing_number_math(nums: List[int]) -> int:
    """
    Mathematical Approach: O(n) time, O(1) space

    Sum(0 to n) - Sum(nums) = missing number
    """
    n = len(nums)
    expected_sum = n * (n + 1) // 2
    actual_sum = sum(nums)
    return expected_sum - actual_sum


# Example: [3, 0, 1] → missing 2
print(missing_number_xor([3, 0, 1]))  # 2
```

Problem 3: Find All Numbers Disappeared in an Array
Given an array of integers where 1 ≤ a[i] ≤ n, find all integers in [1, n] that don't appear.
```typescript
/**
 * Find Disappeared Numbers
 *
 * Hash Set Approach: O(n) time, O(n) space
 */
function findDisappearedNumbersSet(nums: number[]): number[] {
  const present = new Set(nums);
  const missing: number[] = [];

  for (let i = 1; i <= nums.length; i++) {
    if (!present.has(i)) {
      missing.push(i);
    }
  }

  return missing;
}

/**
 * In-Place Marking: O(n) time, O(1) space
 *
 * Use negation to mark indices as "seen"
 */
function findDisappearedNumbersInPlace(nums: number[]): number[] {
  // Mark indices corresponding to present values
  for (let i = 0; i < nums.length; i++) {
    const index = Math.abs(nums[i]) - 1;
    if (nums[index] > 0) {
      nums[index] = -nums[index];
    }
  }

  // Indices still positive are missing
  const missing: number[] = [];
  for (let i = 0; i < nums.length; i++) {
    if (nums[i] > 0) {
      missing.push(i + 1);
    }
  }

  // Restore (optional)
  for (let i = 0; i < nums.length; i++) {
    nums[i] = Math.abs(nums[i]);
  }

  return missing;
}

console.log(findDisappearedNumbersInPlace([4, 3, 2, 7, 8, 2, 3, 1])); // [5, 6]
```

Duplicate detection, while conceptually simple, underpins critical systems across computing. Let's consolidate the key insights:
| Problem Variant | Approach | Time | Space |
|---|---|---|---|
| Contains any duplicate | Hash Set | O(n) | O(n) |
| Find all duplicates | Set + Set or Map | O(n) | O(n) |
| Duplicates within k | Sliding Window Set | O(n) | O(min(n,k)) |
| Almost duplicates | Bucketing (bucket size t+1) | O(n) | O(k) |
| In-place (values 1-n) | Index negation | O(n) | O(1) |
| Single non-duplicate | XOR | O(n) | O(1) |
You now have a comprehensive understanding of duplicate detection—from basic membership checking to advanced distance-constrained and probabilistic methods. This pattern appears everywhere from data validation to distributed systems. Next, we'll explore grouping by characteristic.