If you were to analyze every algorithm problem ever solved using hash tables, one pattern would appear more frequently than any other: frequency counting. It is so fundamental, so universally applicable, that mastering it unlocks a vast portion of algorithmic problem-solving.
Frequency counting answers the simplest yet most powerful question: How many times does each thing appear?
This deceptively simple question underlies countless real-world applications, from search-engine ranking and network rate limiting to genome analysis, recommendation systems, and log monitoring.
In every case, the naive approach—iterating through the data repeatedly for each unique element—yields O(n²) complexity. The hash-based approach? O(n). This single pattern optimization has saved more computing hours than perhaps any other algorithmic technique.
By the end of this page, you will deeply understand frequency counting as a pattern: when to apply it, how to implement it efficiently in multiple languages, and how to recognize disguised frequency problems. You will see how this single pattern forms the foundation for dozens of classic interview problems and real-world applications.
At its heart, frequency counting is a mapping problem. We need to map each unique element to its count of occurrences. This is precisely what hash tables excel at: mapping keys to values in O(1) average time.
The Mental Model:
Imagine you're a librarian cataloging books. As each book arrives, you check if you've seen this title before: if the title is already in your catalog, you add one to its tally; if it's new, you create an entry with a count of one.
After processing all books, you have a complete count of every title. The hash table is your catalog index—providing instant lookup by title.
The Invariant:
At any point during frequency counting, the hash table maintains a critical invariant:
For every key k in the map, map[k] equals the exact number of times k appeared in the processed portion of the input.
This invariant is what makes frequency counting so reliable. It's self-correcting—any element we encounter either creates or updates an entry, never leaving the count in an inconsistent state.
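One way to make the invariant tangible is to assert it after every step with a brute-force recount. This is a test-harness sketch, far too slow for real use, but it demonstrates that the counting loop never leaves the map in an inconsistent state:

```typescript
// Sketch: verify the frequency-counting invariant after each step.
// After processing elements[0..i], map[k] must equal the number of
// times k appears in that prefix, for every key k in the map.
function countWithInvariantCheck<T>(elements: T[]): Map<T, number> {
  const freq = new Map<T, number>();

  for (let i = 0; i < elements.length; i++) {
    const element = elements[i];
    freq.set(element, (freq.get(element) ?? 0) + 1);

    // Check the invariant against a brute-force recount of the prefix
    for (const [key, count] of freq) {
      const actual = elements.slice(0, i + 1).filter(e => e === key).length;
      if (count !== actual) {
        throw new Error(`Invariant violated for key ${String(key)}`);
      }
    }
  }

  return freq;
}

const checked = countWithInvariantCheck(["a", "b", "a", "c", "a"]);
console.log(checked.get("a")); // 3
```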
```typescript
/**
 * Basic frequency counting: count occurrences of each element
 * Time: O(n) where n is the number of elements
 * Space: O(k) where k is the number of unique elements
 */
function countFrequencies<T>(elements: T[]): Map<T, number> {
  const frequencyMap = new Map<T, number>();

  for (const element of elements) {
    // Get current count (defaults to 0 if not present)
    const currentCount = frequencyMap.get(element) ?? 0;

    // Increment and store
    frequencyMap.set(element, currentCount + 1);
  }

  return frequencyMap;
}

// Example usage
const words = ["apple", "banana", "apple", "cherry", "banana", "apple"];
const wordCounts = countFrequencies(words);

// Result: Map { "apple" => 3, "banana" => 2, "cherry" => 1 }
console.log(wordCounts);

// Converting to array for analysis
const sorted = [...wordCounts.entries()]
  .sort((a, b) => b[1] - a[1]); // Sort by count descending

console.log("Most frequent:", sorted[0]); // ["apple", 3]
```

Understanding the complexity of frequency counting requires examining both the best and worst cases, as well as understanding why this pattern is so efficient.
Time Complexity: O(n). Each of the n input elements triggers one hash lookup and one insertion, both O(1) on average.

Space Complexity: O(k), where k is the number of unique elements. In the worst case every element is distinct and k = n.

Why O(n) Matters:
Consider the alternative without hash tables. To count frequencies, you might iterate over the array once per unique element, scanning the entire input each time to tally that element's occurrences.

This approach is O(n²). On a million elements, that means roughly a trillion operations instead of a million.
The speedup factor is 1,000,000x. This is why frequency counting with hash tables is considered one of the most important algorithmic patterns.
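For contrast, the naive approach can be sketched like this; every unique element triggers a full scan of the array:

```typescript
// Naive frequency counting: for each not-yet-counted element,
// re-scan the entire array to count its occurrences.
// With n elements and k unique values this costs O(n * k),
// which degrades to O(n^2) when all elements are distinct.
function countFrequenciesNaive<T>(elements: T[]): Map<T, number> {
  const freq = new Map<T, number>();

  for (const element of elements) {
    if (freq.has(element)) continue; // already counted in a prior scan

    let count = 0;
    for (const other of elements) {
      if (other === element) count++;
    }
    freq.set(element, count);
  }

  return freq;
}

const naiveCounts = countFrequenciesNaive(["a", "b", "a"]);
console.log(naiveCounts); // Map { "a" => 2, "b" => 1 }
```

The hash-based version produces the same map, but each element is touched a constant number of times instead of once per unique value.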
| Metric | Hash Table Approach | Naive Approach | Improvement Factor |
|---|---|---|---|
| Time (n=1,000) | O(1,000) | O(1,000,000) | 1,000x |
| Time (n=1,000,000) | O(1,000,000) | O(1,000,000,000,000) | 1,000,000x |
| Space (worst) | O(n) | O(n) | Same |
| Space (typical) | O(k) where k << n | O(n) | Better |
In real-world applications, the space complexity O(k) is often much better than O(n). Counting word frequencies across a large text corpus may involve millions of words but typically only 10,000-30,000 unique words. Counting customer purchase categories for a retailer might involve billions of transactions but only hundreds of product categories.
The basic frequency counting pattern extends to many powerful variations. Each builds on the core concept while adding specific capabilities.
Variation 1: Bounded Frequency Counting
Sometimes we only care about counts up to some threshold. For example, when detecting duplicates, we only need to know whether a count exceeds 1, which enables early termination:
```typescript
/**
 * Find elements that appear more than k times
 * Optimization: Track only until we exceed threshold
 */
function findFrequentElements<T>(elements: T[], k: number): T[] {
  const counts = new Map<T, number>();
  const result: T[] = [];
  const alreadyAdded = new Set<T>();

  for (const element of elements) {
    const newCount = (counts.get(element) ?? 0) + 1;
    counts.set(element, newCount);

    // Add to result when threshold exceeded (only once)
    if (newCount > k && !alreadyAdded.has(element)) {
      result.push(element);
      alreadyAdded.add(element);
    }
  }

  return result;
}

/**
 * Find the first duplicate element (early termination)
 */
function findFirstDuplicate<T>(elements: T[]): T | undefined {
  const seen = new Set<T>();

  for (const element of elements) {
    if (seen.has(element)) {
      return element; // Early exit!
    }
    seen.add(element);
  }

  return undefined;
}
```

Variation 2: Frequency Comparison
Comparing frequencies between two collections reveals relationships like anagrams, permutations, or subset containment:
```typescript
/**
 * Check if two strings are anagrams
 * Key insight: Anagrams have identical character frequencies
 */
function areAnagrams(s1: string, s2: string): boolean {
  if (s1.length !== s2.length) return false;

  const freq = new Map<string, number>();

  // Build frequency map for first string
  for (const char of s1) {
    freq.set(char, (freq.get(char) ?? 0) + 1);
  }

  // Subtract frequencies for second string
  for (const char of s2) {
    const count = freq.get(char);
    if (count === undefined || count === 0) {
      return false; // Character not in s1 or depleted
    }
    freq.set(char, count - 1);
  }

  return true; // All counts are now 0
}

/**
 * Alternative: Compare two frequency maps
 */
function mapsEqual<K, V>(map1: Map<K, V>, map2: Map<K, V>): boolean {
  if (map1.size !== map2.size) return false;

  for (const [key, value] of map1) {
    if (map2.get(key) !== value) return false;
  }

  return true;
}

// Examples
console.log(areAnagrams("listen", "silent")); // true
console.log(areAnagrams("hello", "world"));   // false
```

Variation 3: Rolling/Sliding Window Frequency
Maintaining frequencies over a sliding window requires careful updates as elements enter and leave:
```typescript
/**
 * Find all anagram occurrences of pattern in text
 * Uses sliding window with frequency comparison
 */
function findAnagramIndices(text: string, pattern: string): number[] {
  if (pattern.length > text.length) return [];

  const result: number[] = [];
  const patternFreq = new Map<string, number>();
  const windowFreq = new Map<string, number>();

  // Build pattern frequency map
  for (const char of pattern) {
    patternFreq.set(char, (patternFreq.get(char) ?? 0) + 1);
  }

  // Track how many characters have matching counts
  let matchCount = 0;
  const targetMatchCount = patternFreq.size;

  for (let i = 0; i < text.length; i++) {
    // Add character entering window
    const entering = text[i];
    const newEnteringCount = (windowFreq.get(entering) ?? 0) + 1;
    windowFreq.set(entering, newEnteringCount);

    // Update match count for entering character
    const patternEnteringCount = patternFreq.get(entering) ?? 0;
    if (newEnteringCount === patternEnteringCount) {
      matchCount++;
    } else if (newEnteringCount === patternEnteringCount + 1) {
      matchCount--;
    }

    // Remove character leaving window (if window is full)
    if (i >= pattern.length) {
      const leaving = text[i - pattern.length];
      const oldLeavingCount = windowFreq.get(leaving)!;
      const newLeavingCount = oldLeavingCount - 1;

      if (newLeavingCount === 0) {
        windowFreq.delete(leaving);
      } else {
        windowFreq.set(leaving, newLeavingCount);
      }

      // Update match count for leaving character
      const patternLeavingCount = patternFreq.get(leaving) ?? 0;
      if (oldLeavingCount === patternLeavingCount) {
        matchCount--;
      } else if (newLeavingCount === patternLeavingCount) {
        matchCount++;
      }
    }

    // Check if current window is an anagram
    if (i >= pattern.length - 1 && matchCount === targetMatchCount) {
      result.push(i - pattern.length + 1);
    }
  }

  return result;
}

// Example
console.log(findAnagramIndices("cbaebabacd", "abc")); // [0, 6]
```

Frequency counting isn't just an interview pattern—it's a cornerstone of real production systems across every industry.
Let's examine how this pattern manifests in different domains:
1. Search Engines: TF-IDF and Document Indexing
When you search for "best pizza near me," search engines don't match your query against every document sequentially. Instead, they've pre-computed term frequencies for every document in their index.
Computing both term frequencies (TF) and inverse document frequencies (IDF) requires frequency counting at massive scale. Google processes over 8.5 billion searches per day, each matching against billions of indexed documents—all powered by frequency-based ranking.
```python
from collections import Counter
from math import log
from typing import List, Dict

def compute_tf(document: str) -> Dict[str, float]:
    """
    Compute term frequency for each word in a document.
    TF(word) = count(word) / total_words
    """
    words = document.lower().split()
    word_counts = Counter(words)
    total_words = len(words)
    return {word: count / total_words for word, count in word_counts.items()}

def compute_idf(documents: List[str]) -> Dict[str, float]:
    """
    Compute inverse document frequency across corpus.
    IDF(word) = log(total_docs / docs_containing_word)
    """
    total_docs = len(documents)
    doc_counts: Counter = Counter()

    for doc in documents:
        # Count each word once per document (use set)
        unique_words = set(doc.lower().split())
        doc_counts.update(unique_words)

    return {word: log(total_docs / count) for word, count in doc_counts.items()}

def compute_tfidf(document: str, idf: Dict[str, float]) -> Dict[str, float]:
    """Compute TF-IDF scores for a document."""
    tf = compute_tf(document)
    return {word: tf_score * idf.get(word, 0) for word, tf_score in tf.items()}

# Example: Simple search corpus
corpus = [
    "The quick brown fox jumps over the lazy dog",
    "The lazy cat sleeps all day long",
    "A quick solution requires quick thinking",
]

idf = compute_idf(corpus)
document_scores = compute_tfidf(corpus[0], idf)

# Words unique to document 0 have higher TF-IDF
print(sorted(document_scores.items(), key=lambda x: x[1], reverse=True)[:5])
```

2. Network Security: DDoS Detection and Rate Limiting
Frequency counting is essential for detecting abuse patterns in network traffic: counting requests per client or per IP address reveals when a single source suddenly exceeds normal rates, which is the core signal behind rate limiting and many DDoS detection schemes.
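As an illustration, here is a minimal fixed-window rate limiter built on a per-client frequency map. This is a hypothetical sketch (class and parameter names are mine); production limiters typically add sliding windows, per-entry expiry, and distributed state:

```typescript
// Sketch: fixed-window rate limiting via per-client request counts.
class FixedWindowRateLimiter {
  private counts = new Map<string, number>();
  private windowStart: number;

  constructor(
    private limit: number,                // max requests per window
    private windowMs: number,             // window length in milliseconds
    private now: () => number = Date.now  // injectable clock for testing
  ) {
    this.windowStart = this.now();
  }

  /** Returns true if the request is allowed, false if rate-limited. */
  allow(clientId: string): boolean {
    const t = this.now();
    if (t - this.windowStart >= this.windowMs) {
      this.counts.clear(); // new window: reset all counters
      this.windowStart = t;
    }

    const count = (this.counts.get(clientId) ?? 0) + 1;
    this.counts.set(clientId, count);
    return count <= this.limit;
  }
}

// Allow at most 3 requests per client per second
const limiter = new FixedWindowRateLimiter(3, 1000);
console.log(limiter.allow("10.0.0.1")); // true
console.log(limiter.allow("10.0.0.1")); // true
console.log(limiter.allow("10.0.0.1")); // true
console.log(limiter.allow("10.0.0.1")); // false (4th request in window)
```

The map here is exactly a frequency counter keyed by client, reset at each window boundary.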
3. Bioinformatics: Genome Analysis
Counting nucleotide (A, C, G, T) frequencies helps identify properties such as GC content, which varies across organisms and genomic regions, and over-represented k-mers (fixed-length subsequences) that can signal functional motifs.
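A small sketch of both ideas, assuming the input is a plain A/C/G/T string:

```typescript
// Count occurrences of each nucleotide in a DNA string
function nucleotideFrequencies(dna: string): Map<string, number> {
  const freq = new Map<string, number>();
  for (const base of dna) {
    freq.set(base, (freq.get(base) ?? 0) + 1);
  }
  return freq;
}

// GC content: fraction of bases that are G or C
function gcContent(dna: string): number {
  const freq = nucleotideFrequencies(dna);
  return ((freq.get("G") ?? 0) + (freq.get("C") ?? 0)) / dna.length;
}

// Count every substring of length k (a "k-mer")
function kmerFrequencies(dna: string, k: number): Map<string, number> {
  const freq = new Map<string, number>();
  for (let i = 0; i + k <= dna.length; i++) {
    const kmer = dna.slice(i, i + k);
    freq.set(kmer, (freq.get(kmer) ?? 0) + 1);
  }
  return freq;
}

const seq = "ACGTACGTGG";
console.log(gcContent(seq)); // 0.6
console.log(kmerFrequencies(seq, 3).get("ACG")); // 2
```

Note that k-mer counting is the same frequency pattern with composite string keys, which is why hash maps dominate genome-analysis tooling.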
4. Recommendation Systems
Frequency-based collaborative filtering counts how often items are viewed or purchased together; pairs that co-occur unusually often become candidate recommendations.
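A minimal sketch of co-occurrence counting, assuming purchases arrive as baskets of item names (the names and the `||` delimiter are illustrative):

```typescript
// Sketch: count how often each pair of items appears in the same basket.
// Frequently co-purchased pairs become recommendation candidates.
function coOccurrenceCounts(baskets: string[][]): Map<string, number> {
  const counts = new Map<string, number>();

  for (const basket of baskets) {
    // Dedupe and sort so (a, b) and (b, a) map to the same key
    const items = [...new Set(basket)].sort();

    for (let i = 0; i < items.length; i++) {
      for (let j = i + 1; j < items.length; j++) {
        const key = `${items[i]}||${items[j]}`; // composite string key
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }

  return counts;
}

const baskets = [
  ["bread", "butter", "milk"],
  ["bread", "butter"],
  ["milk", "cereal"],
];
console.log(coOccurrenceCounts(baskets).get("bread||butter")); // 2
```

The composite-key trick here is the same one used later in this page to work around reference-equality pitfalls with non-primitive keys.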
5. Log Analysis and Observability
At extreme scale (billions of events), exact frequency counting becomes impractical due to memory constraints. Production systems often use probabilistic data structures like Count-Min Sketch or HyperLogLog for approximate counting with bounded error guarantees. These trade exact accuracy for O(1) space complexity.
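To make the idea concrete, here is a heavily simplified Count-Min Sketch. This is a sketch under stated assumptions: the seeded FNV-style hash mixing is illustrative rather than a vetted hash family, and real implementations size width and depth from formal error bounds:

```typescript
// Simplified Count-Min Sketch: approximate counts in fixed space.
// Estimates never under-count; hash collisions can only over-count.
class CountMinSketch {
  private table: number[][];

  constructor(private width: number, private depth: number) {
    this.table = Array.from({ length: depth }, () =>
      new Array<number>(width).fill(0)
    );
  }

  // Simple seeded string hash (FNV-style mixing; illustrative only)
  private hash(item: string, seed: number): number {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < item.length; i++) {
      h ^= item.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.width;
  }

  add(item: string): void {
    for (let row = 0; row < this.depth; row++) {
      this.table[row][this.hash(item, row)]++;
    }
  }

  // Estimate: the minimum across rows bounds the over-count
  count(item: string): number {
    let min = Infinity;
    for (let row = 0; row < this.depth; row++) {
      min = Math.min(min, this.table[row][this.hash(item, row)]);
    }
    return min;
  }
}

const sketch = new CountMinSketch(1024, 4);
for (const w of ["error", "warn", "error", "info", "error"]) sketch.add(w);
console.log(sketch.count("error")); // 3 (exact unless "error" collides in every row)
```

The memory footprint is fixed at width × depth counters regardless of how many distinct items stream through, which is precisely the trade-off the paragraph above describes.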
Frequency counting appears in numerous interview problems, often in disguised forms. Learning to recognize the pattern is key to solving them efficiently.
Problem 1: Top K Frequent Elements
Given an array of integers, return the k most frequent elements. This is a direct application of frequency counting followed by selection.
```typescript
/**
 * Top K Frequent Elements
 *
 * Approach 1: Sort by frequency - O(n log n)
 * Approach 2: Min-heap of size k - O(n log k)
 * Approach 3: Bucket sort - O(n) (when k can be large)
 */

// Approach 1: Simple sorting (O(n log n))
function topKFrequentSort(nums: number[], k: number): number[] {
  const freq = new Map<number, number>();
  for (const num of nums) {
    freq.set(num, (freq.get(num) ?? 0) + 1);
  }

  return [...freq.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(entry => entry[0]);
}

// Approach 3: Bucket sort (O(n))
// Key insight: frequency can be at most n, so we can use an array of buckets
function topKFrequentBucket(nums: number[], k: number): number[] {
  const freq = new Map<number, number>();
  for (const num of nums) {
    freq.set(num, (freq.get(num) ?? 0) + 1);
  }

  // Create buckets: bucket[i] contains elements with frequency i
  // Index 0 unused since no element has frequency 0
  const buckets: number[][] = Array.from(
    { length: nums.length + 1 },
    () => []
  );

  for (const [num, count] of freq) {
    buckets[count].push(num);
  }

  // Collect from highest frequency buckets
  const result: number[] = [];
  for (let i = buckets.length - 1; i >= 0 && result.length < k; i--) {
    result.push(...buckets[i]);
  }

  return result.slice(0, k);
}

// Example
console.log(topKFrequentBucket([1, 1, 1, 2, 2, 3], 2)); // [1, 2]
```

Problem 2: Valid Anagram
Given two strings s and t, return true if t is an anagram of s. This is pure frequency comparison.
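When the alphabet is known and small, such as 26 lowercase letters, a fixed-size count array is a common constant-factor alternative to the hash map used in `areAnagrams` earlier. A sketch assuming lowercase ASCII input:

```typescript
// Variant: fixed-size count array for lowercase a-z.
// One pass increments for s and decrements for t;
// all counts ending at zero means the strings are anagrams.
function isAnagramArray(s: string, t: string): boolean {
  if (s.length !== t.length) return false;

  const counts = new Array<number>(26).fill(0);
  const aCode = "a".charCodeAt(0);

  for (let i = 0; i < s.length; i++) {
    counts[s.charCodeAt(i) - aCode]++;
    counts[t.charCodeAt(i) - aCode]--;
  }

  return counts.every(c => c === 0);
}

console.log(isAnagramArray("listen", "silent")); // true
console.log(isAnagramArray("rat", "car")); // false
```

The array plays the same role as the frequency map; it simply exploits the bounded key space for faster, allocation-free counting.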
Problem 3: First Unique Character in a String
Given a string, find the first non-repeating character and return its index. If it doesn't exist, return -1.
```typescript
/**
 * First Unique Character in a String
 *
 * Key insight: Count frequencies first, then scan for count === 1
 * Time: O(n), Space: O(1) if alphabet is fixed (26 lowercase letters)
 */
function firstUniqChar(s: string): number {
  const freq = new Map<string, number>();

  // First pass: count frequencies
  for (const char of s) {
    freq.set(char, (freq.get(char) ?? 0) + 1);
  }

  // Second pass: find first with frequency 1
  for (let i = 0; i < s.length; i++) {
    if (freq.get(s[i]) === 1) {
      return i;
    }
  }

  return -1;
}

/**
 * Variant: Track order of appearance for single-pass streaming
 * Useful when you can't re-scan the input
 */
function firstUniqCharStream(s: string): number {
  const freq = new Map<string, number>();
  const order: number[] = []; // Index of each character's first occurrence

  for (let i = 0; i < s.length; i++) {
    const char = s[i];
    const count = (freq.get(char) ?? 0) + 1;
    freq.set(char, count);

    if (count === 1) {
      order.push(i);
    }
  }

  // Find the first recorded index whose character stayed unique
  for (const idx of order) {
    if (freq.get(s[idx]) === 1) {
      return idx;
    }
  }

  return -1;
}

console.log(firstUniqChar("onenoughtone")); // 3 ("u" is the first unique character)
console.log(firstUniqChar("algorithm")); // 0 ("a" is the first unique character)
```

Problem 4: Ransom Note
Given two strings ransomNote and magazine, return true if ransomNote can be constructed from the letters in magazine. Each letter in magazine can only be used once.
```python
from collections import Counter

def can_construct(ransom_note: str, magazine: str) -> bool:
    """
    Can ransom_note be constructed from magazine letters?
    Key insight: magazine must have >= frequency of each char in note
    """
    note_freq = Counter(ransom_note)
    magazine_freq = Counter(magazine)

    for char, count in note_freq.items():
        if magazine_freq[char] < count:
            return False

    return True

def can_construct_subtract(ransom_note: str, magazine: str) -> bool:
    """Alternative: subtract and check for negatives"""
    magazine_freq = Counter(magazine)

    for char in ransom_note:
        magazine_freq[char] -= 1
        if magazine_freq[char] < 0:
            return False

    return True

def can_construct_pythonic(ransom_note: str, magazine: str) -> bool:
    """
    Most Pythonic: Counter subtraction leaves only the shortage
    A non-empty result indicates characters that are missing
    """
    return not (Counter(ransom_note) - Counter(magazine))

# Examples
print(can_construct("aa", "aab"))  # True
print(can_construct("aa", "ab"))   # False
```

Even experienced engineers make mistakes with frequency counting. Here are the most common pitfalls and how to avoid them:
- Missing-key defaults differ by language: JavaScript's `map.get(key)` returns `undefined` for missing keys, Python's `dict[key]` raises `KeyError`, and Java's `map.get(key)` returns `null`. Always use `get(key, default)` or equivalent: `map.get(key) ?? 0`.
- Case sensitivity silently splits counts: normalize keys with `str.toLowerCase()` or `str.lower()` when case should not matter.
- Floating-point keys are unreliable: `0.1 + 0.2 !== 0.3` due to floating-point representation, so avoid floats as map keys.
- Mutable or composite keys compare by reference: `[1,2,3] !== [1,2,3]` - arrays are compared by reference. Convert to a canonical immutable form first with `JSON.stringify()` or `tuple()`.
```typescript
// BAD: Using arrays as Map keys
const badMap = new Map<number[], number>();
const key1 = [1, 2, 3];
const key2 = [1, 2, 3];
badMap.set(key1, 1);
console.log(badMap.get(key2)); // undefined! (different reference)

// GOOD: Use string representation
const goodMap = new Map<string, number>();
goodMap.set(JSON.stringify([1, 2, 3]), 1);
console.log(goodMap.get(JSON.stringify([1, 2, 3]))); // 1

// GOOD: Group by creating composite key strings
function groupByPair(items: [string, string][]): Map<string, number> {
  const counts = new Map<string, number>();

  for (const [a, b] of items) {
    // Create deterministic composite key
    const key = `${a}||${b}`; // Use delimiter unlikely in data
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  return counts;
}
```

When frequency counting in production code, normalize keys up front, lean on battle-tested built-ins, and reach for approximate structures only when exact counting no longer fits in memory.
We've explored frequency counting from first principles to advanced applications. Let's consolidate the key insights:
- Frequency counting maps each unique element to its count, turning O(n²) repeated scans into a single O(n) pass.
- The invariant (every stored count exactly matches the processed portion of the input) makes the pattern self-correcting and easy to reason about.
- Language built-ins like `Counter`, `defaultdict`, and `Map.merge()` make code cleaner and less error-prone.

You now deeply understand frequency counting—the most fundamental hashing pattern. This single concept underpins countless algorithms and appears in nearly every technical interview. Next, we'll explore the Two-Sum pattern and how hashing enables pair-finding problems.