Loading system design...
Design a distributed unique ID generation system inspired by Twitter's Snowflake. The system must generate globally unique, roughly time-ordered, 64-bit numeric IDs at massive scale with zero per-ID coordination between servers.
| Metric | Value |
|---|---|
| IDs generated | Millions per second cluster-wide |
| Per-worker throughput | 4,096 IDs per millisecond (12-bit sequence) |
| Workers (machines) | Up to 1,024 (10-bit worker ID: 5 DC + 5 machine) |
| Cluster capacity | 1,024 × 4,096/ms = ~4 billion IDs/sec |
| ID size | 64 bits (fits in a long / bigint) |
| Timestamp range | 41 bits = 2⁴¹ ms ≈ 69 years from custom epoch |
| Latency per ID | < 1μs (in-process computation; no network call) |
Generate globally unique 64-bit numeric IDs across all servers with zero coordination between nodes for every ID
IDs must be roughly time-ordered (sortable by creation time) — newer IDs are always larger than older ones
Generate at least 10,000 unique IDs per second per server (and millions cluster-wide)
IDs must fit in 64 bits (fit in a long / bigint) for efficient indexing and storage
The system must remain available even when individual nodes fail — no single point of failure
IDs must not leak sensitive information (e.g., exact generation rate, number of servers)
Support multiple independent ID namespaces / sequences (e.g., user IDs, order IDs, transaction IDs)
Provide an HTTP/gRPC API for ID generation: single ID or batch of N IDs per request
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'Low latency search under 100ms' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?