Design a Unique ID Generator

Design a distributed unique ID generation system inspired by Twitter's Snowflake. The system must generate globally unique, roughly time-ordered, 64-bit numeric IDs at massive scale with zero per-ID coordination between servers.

Scale Estimates

Metric	Value
IDs generated	Millions per second cluster-wide
Per-worker throughput	4,096 IDs per millisecond (12-bit sequence)
Workers (machines)	Up to 1,024 (10-bit worker ID: 5 DC + 5 machine)
Cluster capacity	1,024 × 4,096 IDs/ms ≈ 4.19 billion IDs/sec (theoretical maximum)
ID size	64 bits (fits in a long / bigint)
Timestamp range	41 bits = 2⁴¹ ms ≈ 69.7 years from custom epoch
Latency per ID	< 1μs (in-process computation; no network call)
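
The figures in the table follow directly from the field widths, and the arithmetic can be checked in a few lines. This is a minimal sketch: the constant names are hypothetical, and the single unused sign bit is an assumption made so that the 41 + 10 + 12 bits in the table add up to the stated 64.

```python
# Field widths from the table above. The names are illustrative only.
SIGN_BITS = 1          # assumption: one unused bit so IDs stay positive
TIMESTAMP_BITS = 41
DATACENTER_BITS = 5
MACHINE_BITS = 5
SEQUENCE_BITS = 12

total_bits = (SIGN_BITS + TIMESTAMP_BITS + DATACENTER_BITS
              + MACHINE_BITS + SEQUENCE_BITS)

# Each worker can issue 2^12 distinct sequence numbers per millisecond.
ids_per_ms_per_worker = 1 << SEQUENCE_BITS

# 5 datacenter bits + 5 machine bits give 2^10 distinct worker IDs.
max_workers = 1 << (DATACENTER_BITS + MACHINE_BITS)

# Theoretical ceiling if every worker saturates its sequence every millisecond.
cluster_ids_per_sec = ids_per_ms_per_worker * max_workers * 1000

# How long a 41-bit millisecond counter lasts before it wraps.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25
timestamp_range_years = (1 << TIMESTAMP_BITS) / MS_PER_YEAR

print(total_bits, ids_per_ms_per_worker, max_workers)
print(f"{cluster_ids_per_sec:,} IDs/sec, {timestamp_range_years:.1f} years")
```

The cluster figure is an upper bound; real throughput depends on how many worker IDs are actually assigned and how evenly load is spread.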

Non-Functional Requirements

Ultra-low latency: ID generation is a simple bitwise operation — sub-microsecond; no network round-trip
High availability: Each worker generates IDs independently, with zero coordination per ID and no single point of failure (SPOF)
Time-ordered: Timestamp is the most significant field → IDs sort roughly chronologically (exactly within one worker; across workers, only to within clock skew); critical for DB indexing, event ordering, and time-range queries
64-bit compact: Fits in a database BIGINT column; efficient B-tree indexing (unlike random 128-bit UUIDs such as UUIDv4, whose non-sequential inserts fragment indexes)
Scalable: Add workers by assigning unused worker IDs (up to the 1,024 limit); throughput grows linearly
Clock-resilient: Handles NTP clock adjustments without generating duplicate IDs
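
To make the requirements above concrete, here is a minimal single-process sketch of a Snowflake-style generator. Several things are assumptions rather than anything stated in this document: the class and parameter names, the epoch value, the injectable `clock` argument (used only to make the behaviour testable), and the policy of raising an error when the clock moves backward. Waiting for the clock to catch up is another common policy.

```python
import threading
import time

# Assumed custom epoch: 2020-01-01T00:00:00Z in milliseconds (illustrative).
CUSTOM_EPOCH_MS = 1_577_836_800_000

SEQUENCE_BITS = 12
MACHINE_BITS = 5
DATACENTER_BITS = 5
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1
MACHINE_SHIFT = SEQUENCE_BITS
DATACENTER_SHIFT = SEQUENCE_BITS + MACHINE_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS + DATACENTER_BITS


class SnowflakeGenerator:
    """Hypothetical generator: timestamp | datacenter | machine | sequence."""

    def __init__(self, datacenter_id, machine_id, clock=None):
        if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
            raise ValueError("datacenter_id out of range")
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self._datacenter_id = datacenter_id
        self._machine_id = machine_id
        # Wall-clock milliseconds by default; a fake clock can be injected.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Assumed policy: refuse to issue IDs rather than risk duplicates.
                raise RuntimeError(
                    f"clock moved backward by {self._last_ms - now} ms")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - CUSTOM_EPOCH_MS) << TIMESTAMP_SHIFT)
                    | (self._datacenter_id << DATACENTER_SHIFT)
                    | (self._machine_id << MACHINE_SHIFT)
                    | self._sequence)


generator = SnowflakeGenerator(datacenter_id=1, machine_id=1)
print(generator.next_id())
```

Because the only shared state is local to the process, no network call is involved; the lock protects the timestamp and sequence from concurrent threads on the same worker. The sketch does not check for timestamp overflow past the 41-bit range.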

Approach Guide

Non-Functional Requirements (~3 min)
Core Entities (~2 min)
API Design (~3 min)
High-Level Design (~5 min)

Follow-up Deep Dives (questions an interviewer might ask)

1. Compare the major approaches: UUID, database auto-increment, ticket server, Snowflake. When would you pick each?
2. Explain the Twitter Snowflake ID layout in detail. How does each bit field work?
3. What happens if the system clock goes backward (clock skew / NTP adjustment)?
4. How would you assign unique worker IDs to each server?
5. How would you handle the sequence number exhaustion within a single millisecond?
6. How does this system achieve high availability?
7. How would you extend or modify Snowflake for specific use cases?
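
For the bit-layout question, it can help to show how an ID is taken apart again. The sketch below assumes the 41/5/5/12 layout from the Scale Estimates table; the epoch value, `SnowflakeFields`, and `decode` are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple

# Assumed custom epoch: 2020-01-01T00:00:00Z in milliseconds (illustrative).
CUSTOM_EPOCH_MS = 1_577_836_800_000


class SnowflakeFields(NamedTuple):
    timestamp_ms: int   # Unix time in ms, recovered by adding the epoch back
    datacenter_id: int  # 5 bits
    machine_id: int     # 5 bits
    sequence: int       # 12 bits


def decode(snowflake_id: int) -> SnowflakeFields:
    """Split a 64-bit ID into its fields using shifts and masks."""
    return SnowflakeFields(
        timestamp_ms=(snowflake_id >> 22) + CUSTOM_EPOCH_MS,
        datacenter_id=(snowflake_id >> 17) & 0x1F,
        machine_id=(snowflake_id >> 12) & 0x1F,
        sequence=snowflake_id & 0xFFF,
    )


# Build an ID by hand, then take it apart again.
raw = (123_456 << 22) | (3 << 17) | (7 << 12) | 42
print(decode(raw))
```

Since the timestamp occupies the highest bits, comparing two IDs numerically compares their timestamps first, which is what makes the IDs roughly time-ordered.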
