Design Concurrent Bloom Filter

Advanced

Design a thread-safe Bloom filter — a space-efficient probabilistic data structure for set membership testing that guarantees no false negatives but allows a controllable false positive rate (FPR).

Implement three variants:
(1) Standard Bloom Filter with optimal parameter calculation (m = −(n·ln p)/(ln 2)², k = (m/n)·ln 2), double hashing via the Kirsch-Mitzenmacher optimisation (h_i(x) = (h1(x) + i·h2(x)) mod m), thread safety via a read-write lock (shared reads, exclusive writes), and support for merge (bitwise OR union), fill ratio, and runtime FPR estimation.
(2) Counting Bloom Filter using counter arrays instead of bits, enabling deletion (decrement) with overflow protection (4-bit counters capped at 15), at 4× the memory cost.
(3) Scalable Bloom Filter with dynamic layer growth: each new layer has a tighter FPR (p_i = p₀·r^i, with 0 < r < 1) so the combined FPR across all layers stays within the target.

Core Use Cases

add() — insert element

Hash item with k hash functions; set the corresponding k bit positions to 1

mightContain() — membership test

Returns DEFINITELY NOT (if any bit is 0) or POSSIBLY YES (if all k bits are 1); false positives possible, false negatives impossible
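The two operations above can be sketched as follows. This is a minimal single-threaded illustration, not a reference implementation: the class name `SimpleBloomFilter`, the helper `_positions`, and the choice of salted SHA-256 as the k hash functions are all assumptions made for the example.

```python
import hashlib


class SimpleBloomFilter:
    """Single-threaded sketch: m bits, k probe positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # packed bit array, 8 bits per byte

    def _positions(self, item: str):
        # Hypothetical helper: simulate k hash functions by salting one hash with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Set all k bit positions to 1.
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item: str) -> bool:
        # False means DEFINITELY NOT; True means POSSIBLY YES.
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )
```

An item that was added always tests as present, because its k bits can only be set, never cleared; an empty filter rejects everything.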

Optimal parameter calculation

Compute optimal bit array size m = −(n·ln p)/(ln 2)² and hash count k = (m/n)·ln 2 from expected items n and target FPR p
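As a worked illustration of these two formulas, the sketch below rounds m up and k to the nearest integer. The function name `optimal_parameters` and the rounding choices are assumptions for the example; other implementations may round differently.

```python
import math


def optimal_parameters(n: int, p: float) -> tuple[int, int]:
    """Return (m, k) for n expected items and target false positive rate p."""
    if n <= 0 or not 0.0 < p < 1.0:
        raise ValueError("n must be positive and p must be in (0, 1)")
    # m = -(n * ln p) / (ln 2)^2, rounded up so the target FPR is not exceeded
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest integer and at least 1
    k = max(1, round((m / n) * math.log(2)))
    return m, k
```

For n = 1000 and p = 0.01 this gives m = 9586 bits (about 9.6 bits per item) and k = 7.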

Double hashing (Kirsch-Mitzenmacher)

Generate k hash values from two base hashes: h_i(x) = (h1(x) + i·h2(x)) mod m, for i = 0 … k−1. This avoids computing k independent hash functions.
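A minimal sketch of the scheme, assuming the two base hashes are taken from the two halves of a single MD5 digest (one of the hash choices listed under Assumptions). The function name `double_hash_positions` and the decision to force h2 odd are illustrative choices, not requirements of the technique.

```python
import hashlib


def double_hash_positions(item: str, k: int, m: int) -> list[int]:
    """Derive k probe positions in [0, m) from two base hashes."""
    digest = hashlib.md5(item.encode("utf-8")).digest()  # 16 bytes
    h1 = int.from_bytes(digest[:8], "big")
    # h2 is forced odd so it is nonzero (and coprime with m when m is a power of two).
    h2 = int.from_bytes(digest[8:], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]
```

Consecutive positions differ by the same stride (h2 mod m), so only one digest is computed per item regardless of k.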

Thread safety via RW lock

mightContain() acquires shared/read lock (concurrent reads); add() acquires exclusive/write lock
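The locking discipline might look like the sketch below. Python's standard library has no readers-writer lock, so `ReadWriteLock` here is a hand-rolled, hypothetical helper built on `threading.Condition` (in C++ the equivalent would be `std::shared_mutex`). It has no writer preference, so writers can starve under heavy read load; treat it as an illustration of the shared/exclusive split rather than a production lock.

```python
import hashlib
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Minimal readers-writer lock (hypothetical helper, no writer preference)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


class ConcurrentBloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self._m, self._k = m, k
        self._bits = bytearray(m)  # one byte per bit, for brevity
        self._lock = ReadWriteLock()

    def _positions(self, item: str) -> list[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self._m for i in range(self._k)]

    def add(self, item: str) -> None:
        positions = self._positions(item)  # hash outside the critical section
        with self._lock.write():  # exclusive: no readers or other writers
            for pos in positions:
                self._bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        positions = self._positions(item)
        with self._lock.read():  # shared: many readers may hold this at once
            return all(self._bits[pos] for pos in positions)
```

Hashing is done before taking the lock so the critical section only touches the bit array.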

Counting Bloom Filter — deletion support

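Replace each bit with a small counter (4 bits is typical; a plain int array is simpler but larger); add increments, remove decrements. With 4-bit counters this enables deletion at 4× the memory cost of a standard filter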
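A minimal single-threaded sketch of the counting variant follows. Several details are assumptions for illustration: counters are stored one per byte rather than packed two per byte, probe positions are de-duplicated so add and remove touch each counter once, and a counter that has saturated at 15 is left pinned on remove (decrementing it could undercount and cause false negatives).

```python
import hashlib


class CountingBloomFilter:
    MAX_COUNT = 15  # cap for a 4-bit counter

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        # One byte per counter for clarity; packing two counters per byte
        # would give the true 4-bit layout.
        self.counters = bytearray(m)

    def _positions(self, item: str) -> set[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        # A set, so each counter is touched at most once per operation.
        return {(h1 + i * h2) % self.m for i in range(self.k)}

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            if self.counters[pos] < self.MAX_COUNT:  # cap instead of wrapping
                self.counters[pos] += 1

    def might_contain(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def remove(self, item: str) -> bool:
        positions = self._positions(item)
        if any(self.counters[pos] == 0 for pos in positions):
            return False  # item was definitely never added
        for pos in positions:
            if self.counters[pos] < self.MAX_COUNT:  # saturated counters stay pinned
                self.counters[pos] -= 1
        return True
```

Note that remove() can still succeed for an item that was never added if it happens to be a false positive, which is a known limitation of counting filters.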

Scalable Bloom Filter — dynamic growth

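Stack of Bloom filter layers; when the current layer is full, add a new layer with a tighter FPR (p_i = p₀·r^i, with 0 < r < 1). The combined FPR is bounded by the geometric sum p₀/(1 − r), so choosing p₀ = P·(1 − r) keeps it within an overall target P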
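The layering idea can be sketched as below, under stated assumptions: a layer counts as "full" once it has received its design capacity of items, every layer has the same capacity (growing capacity geometrically is also common), and the first layer's FPR is set to target·(1 − r) so the sum of all layer FPRs stays under the target. The names `_Layer`, `layer_capacity`, and `ratio` are hypothetical.

```python
import hashlib
import math


class _Layer:
    """One fixed-size Bloom filter sized for `capacity` items at rate `fpr`."""

    def __init__(self, capacity: int, fpr: float) -> None:
        self.capacity = capacity
        self.fpr = fpr
        self.count = 0
        self.m = math.ceil(-capacity * math.log(fpr) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray(self.m)  # one byte per bit, for brevity

    def _positions(self, item: str) -> list[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1
        self.count += 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


class ScalableBloomFilter:
    def __init__(self, target_fpr: float = 0.01, layer_capacity: int = 100,
                 ratio: float = 0.5) -> None:
        if not 0.0 < ratio < 1.0:
            raise ValueError("ratio must be in (0, 1)")
        self.ratio = ratio
        self.layer_capacity = layer_capacity
        # p0 * (1 + r + r^2 + ...) = p0 / (1 - r) <= target_fpr
        self.p0 = target_fpr * (1.0 - ratio)
        self.layers = [_Layer(layer_capacity, self.p0)]

    def add(self, item: str) -> None:
        layer = self.layers[-1]
        if layer.count >= layer.capacity:
            # New layer i gets the tighter rate p_i = p0 * r^i.
            layer = _Layer(self.layer_capacity, self.p0 * self.ratio ** len(self.layers))
            self.layers.append(layer)
        layer.add(item)

    def might_contain(self, item: str) -> bool:
        # An item may live in any layer, so every layer is consulted.
        return any(layer.might_contain(item) for layer in self.layers)
```

Inserts always go to the newest layer, while lookups check all layers, so query cost grows with the number of layers.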

Merge (union) of filters

Bitwise OR of two Bloom filters with the same m, k, and hash functions produces a filter representing the union of both sets
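A minimal sketch of the union, operating directly on bit positions so that hashing is left out. The class name `BloomBits` and its `set`/`test` helpers are hypothetical; the sketch also assumes both filters were built with the same hash functions, which the m and k check alone cannot verify.

```python
class BloomBits:
    """Bare bit array plus the parameters that must match for a merge."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def set(self, pos: int) -> None:
        self.bits[pos >> 3] |= 1 << (pos & 7)

    def test(self, pos: int) -> bool:
        return bool(self.bits[pos >> 3] & (1 << (pos & 7)))

    def merge(self, other: "BloomBits") -> "BloomBits":
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must have identical m and k to be merged")
        merged = BloomBits(self.m, self.k)
        # Byte-wise OR: a bit is set in the union if it is set in either input.
        merged.bits = bytearray(x | y for x, y in zip(self.bits, other.bits))
        return merged
```

The merged filter answers POSSIBLY YES for anything either input would, so its false positive rate is at least as high as that of either input.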

Estimated FPR and fill ratio

Runtime FPR estimate: (1 − e^(−kn/m))^k, where n is the number of items inserted so far; fill ratio = bits set / total bits, used to monitor saturation
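Both statistics are short to compute. The sketch below assumes a packed `bytearray` bit array and uses hypothetical function names; `n` is the number of items inserted so far.

```python
import math


def estimated_fpr(n: int, m: int, k: int) -> float:
    """Expected false positive rate after n insertions: (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def fill_ratio(bits: bytearray, m: int) -> float:
    """Fraction of the m bits that are set to 1."""
    set_bits = sum(bin(byte).count("1") for byte in bits)
    return set_bits / m
```

With the parameters m = 9586 and k = 7 sized for 1000 items at a 1% target, `estimated_fpr(1000, 9586, 7)` comes out close to 0.01, and it climbs as n grows past the design capacity.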

Constraints

False positives are possible; false negatives are NEVER possible
Standard Bloom filter does not support deletion (clearing a bit may cause false negatives for other items that share it)
Counting Bloom filter counters are capped at max_count (e.g., 15 for 4-bit) to prevent overflow
Counting Bloom filter remove() only succeeds if all k counters are > 0
Scalable Bloom filter tightens per-layer FPR geometrically so combined FPR stays within target
Merge is only valid between filters with identical m and k parameters and the same hash functions
Thread safety: reads use shared lock, writes use exclusive lock

Assumptions

Items can be serialized to strings for hashing
MD5 or FNV1a/DJB2 provide sufficient hash quality for the double-hashing scheme
In-memory only (no persistence)
Counter overflow in Counting BF is handled by capping (not wrapping)
Scalable BF does not compact/merge layers

In Scope

Standard Bloom Filter: add, mightContain, clear, merge, fill ratio, estimated FPR
Optimal m and k calculation from expected items and target FPR
Double hashing (Kirsch-Mitzenmacher) for k hash values from 2 base hashes
Thread safety via read-write lock / shared_mutex
Counting Bloom Filter: add, remove, mightContain with counter array
Scalable Bloom Filter: dynamic layer growth with FPR tightening
Runtime statistics: count, fill ratio, estimated FPR

Out of Scope

Cuckoo filter (alternative that supports deletion and is typically more space-efficient at low target FPRs)
Quotient filter
Distributed Bloom filter (across multiple nodes)
Persistence / serialization to disk
Compressed Bloom filter for network transfer
Aging / time-decaying Bloom filter

Approach Guide

Before diving into code, clarify the use cases and edge cases. Understanding the problem deeply leads to better class design.

List Core Use Cases

Identify the primary actions users will perform. For a parking lot: park vehicle, exit vehicle, check availability. Each becomes a method.

Identify Actors

Who interacts with the system? Customers, admins, automated systems? Each actor type may need different interfaces.

Clarify Constraints

What are the limits? Max vehicles, supported vehicle types, payment methods. Constraints drive your data structures.

Ask About Edge Cases

What happens on overflow? Concurrent access? Payment failures? Thinking about edge cases reveals hidden complexity.



Gather Requirements (~3 min)

Identify Classes & Entities (~4 min)

Define Relationships (~3 min)

Apply Design Patterns (~4 min)

Apply SOLID Principles (~3 min)

Code Organization (~2 min)

Follow-up Questions (questions an interviewer might ask)

1. How would you choose between a Bloom filter and a Cuckoo filter?

2. How would you make the Bloom filter lock-free for reads?

3. How would you handle counter overflow in the Counting Bloom Filter?

4. How is a Bloom filter used in real systems like Cassandra, BigTable, or Chrome?

5. How would you implement a partitioned (blocked) Bloom filter for better cache performance?
