Design a scalable notification system that delivers notifications to users across multiple channels: push (APNs/FCM), email, SMS, and in-app. Notifications are triggered by events published by various backend services. The system must respect user preferences, support templating, handle batching and grouping, and deliver reliably at massive scale.
| Metric | Value |
|---|---|
| Users | 500 million |
| Notifications sent per day | 10 billion (across all channels) |
| Push notifications / day | 5 billion (~58,000 per second) |
| Emails / day | 3 billion |
| SMS / day | 500 million |
| In-app notifications / day | 1.5 billion |
| Peak throughput (Black Friday) | 10× normal → 580,000 pushes/sec |
| Notification event size | ~500 bytes |
| Template count | ~5,000 (type × channel × locale) |
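The per-second figures in the table follow directly from the daily volumes. The sketch below reproduces that arithmetic so the numbers can be re-derived when an input changes. The daily-ingest estimate at the end assumes one ~500-byte event per notification, which the table does not state; a single event that fans out to several channels would lower it.

```python
# Back-of-the-envelope capacity estimate from the scale table.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

DAILY_VOLUME = {
    "push": 5_000_000_000,
    "email": 3_000_000_000,
    "sms": 500_000_000,
    "in_app": 1_500_000_000,
}
PEAK_MULTIPLIER = 10      # Black Friday, per the table
EVENT_SIZE_BYTES = 500    # approximate event size, per the table

total_daily = sum(DAILY_VOLUME.values())

# Average sustained rate per channel, in notifications per second.
avg_rate = {ch: count / SECONDS_PER_DAY for ch, count in DAILY_VOLUME.items()}

# Peak push rate that the push workers and APNs/FCM connections must absorb.
peak_push_rate = avg_rate["push"] * PEAK_MULTIPLIER

# ASSUMPTION: one event per notification (not stated in the table).
daily_event_bytes = total_daily * EVENT_SIZE_BYTES

if __name__ == "__main__":
    for channel, rate in avg_rate.items():
        print(f"{channel:>6}: {rate:,.0f}/sec average")
    print(f"peak push: {peak_push_rate:,.0f}/sec")
    print(f"event ingest: {daily_event_bytes / 1e12:.1f} TB/day")
```

The exact averages are about 57,870 pushes per second and about 578,700 at peak, which the table rounds to 58,000 and 580,000.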
Send notifications via multiple channels: push notification (iOS APNs / Android FCM), SMS, email, and in-app notification feed
Support event-driven triggering: any backend service can publish a notification event (e.g., order shipped, new follower, payment received)
Template-based notifications: maintain templates with placeholders (e.g., 'Hi {{name}}, your order {{orderId}} has shipped') per channel per locale
User preference management: users can opt in or out per notification type and per channel (e.g., receive 'order updates' via push but not email)
Respect quiet hours: do not send push/SMS during user-configured quiet hours (e.g., 10 PM – 8 AM local time); queue and deliver after quiet hours end
Notification grouping/batching: aggregate similar notifications (e.g., '5 people liked your photo' instead of 5 separate pushes)
In-app notification feed: a persistent, queryable list of all notifications with read/unread status
Delivery tracking and analytics: track sent, delivered, opened, clicked per notification per channel; provide dashboards and APIs
Schedule notifications for future delivery (e.g., reminder 1 hour before event, daily digest at 9 AM)
Priority levels: critical notifications (security alerts, OTP) bypass quiet hours, batching, and rate limits, and are delivered immediately
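Several of the requirements above interact at send time: template rendering, per-type and per-channel opt-outs, quiet hours for push and SMS, the critical-priority bypass, and grouping. The following is a minimal sketch of how those checks might fit together, not a definitive design. All names (`UserPrefs`, `render`, `decide`, `group_summary`) are hypothetical. It also assumes that opt-outs are checked before the critical bypass and that the user's time zone is stored as a fixed UTC offset; the requirements do not specify either.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta, timezone

QUIET_CHANNELS = {"push", "sms"}  # quiet hours apply to push/SMS only


@dataclass
class UserPrefs:
    # (notification_type, channel) pairs the user has opted out of
    opted_out: set = field(default_factory=set)
    quiet_start: time = time(22, 0)
    quiet_end: time = time(8, 0)
    utc_offset_hours: int = 0  # ASSUMPTION: fixed offset, ignores DST


def render(template: str, params: dict) -> str:
    """Fill {{placeholder}} slots; raises KeyError if a parameter is missing."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params[m.group(1)]), template)


def in_quiet_hours(t: time, start: time, end: time) -> bool:
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # window wraps past midnight


def decide(prefs: UserPrefs, ntype: str, channel: str,
           priority: str, now_utc: datetime):
    """Return ("send", None), ("drop", None), or ("defer", deliver_at_utc)."""
    # ASSUMPTION: opt-outs apply even to critical notifications.
    if (ntype, channel) in prefs.opted_out:
        return ("drop", None)
    if priority == "critical" or channel not in QUIET_CHANNELS:
        return ("send", None)
    local_now = now_utc.astimezone(
        timezone(timedelta(hours=prefs.utc_offset_hours)))
    if not in_quiet_hours(local_now.time(), prefs.quiet_start, prefs.quiet_end):
        return ("send", None)
    # Queue until the next occurrence of quiet_end in the user's local time.
    deliver = local_now.replace(hour=prefs.quiet_end.hour,
                                minute=prefs.quiet_end.minute,
                                second=0, microsecond=0)
    if deliver <= local_now:
        deliver += timedelta(days=1)
    return ("defer", deliver.astimezone(timezone.utc))


def group_summary(actors: list, action: str) -> str:
    """Collapse several similar events into one message."""
    if len(actors) == 1:
        return f"{actors[0]} {action}"
    return f"{len(actors)} people {action}"
```

For example, with a user at UTC-5 and the default 10 PM to 8 AM window, a normal-priority push evaluated at 04:30 UTC (11:30 PM local) would be deferred to 13:00 UTC (8 AM local), while a critical push at the same instant would be sent immediately.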
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'Low latency search under 100ms' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?