Every modern application, from social networks to banking apps, relies on notifications to keep users informed and engaged. When you receive a push notification about a friend's message, an email receipt for your purchase, or an SMS verification code, you're witnessing a sophisticated distributed system in action—one that must deliver billions of messages per day with remarkable reliability.
Notification systems appear deceptively simple on the surface. Send a message when something happens—how hard can that be? But beneath this simplicity lies extraordinary complexity: coordinating multiple delivery channels, respecting user preferences, handling failures gracefully, preventing abuse, and scaling to millions of concurrent users while maintaining sub-second delivery latency.
This page establishes the comprehensive requirements for designing a notification system at scale. You'll understand the functional and non-functional requirements, explore real-world use cases across different industries, and learn the key constraints that make notification system design one of the most challenging and interesting problems in system design.
A notification system is a platform that enables applications to communicate with users through various channels in a timely, reliable, and personalized manner. Unlike direct request-response communication, notifications are asynchronous, event-driven messages that inform users about events, actions, or information they care about.
The Core Mission:
At its heart, a notification system must accomplish three fundamental objectives:
| Channel | Latency | Reach | Cost | Best For |
|---|---|---|---|---|
| Push (Mobile) | Milliseconds | Users with app installed | Very Low | Real-time alerts, time-sensitive updates |
| Push (Web) | Milliseconds | Users with browser opt-in | Very Low | Web app engagement, breaking news |
| Email | Seconds to Minutes | All users with email | Low | Detailed content, receipts, digests |
| SMS | Seconds | All users with phone | High | Critical alerts, 2FA, verification |
| In-App | Milliseconds | Active app users only | Minimal | Feature announcements, contextual tips |
| Voice Call | Seconds | All users with phone | Very High | Emergency alerts, urgent verification |
Multi-Channel Requirement:
Modern notification systems must support multiple channels simultaneously because:
This multi-channel nature introduces significant complexity. A single notification event might need to be routed to multiple channels, formatted differently for each, delivered with channel-specific retry logic, and tracked for engagement across all channels.
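To make the routing step concrete, here is a minimal Python sketch. It fans one event out to each channel a user has enabled, renders channel-specific text, and attaches a per-channel retry policy. The names `Event`, `RetryPolicy`, `OutboundMessage`, and `route` are hypothetical, as are the message templates and the retry numbers. None of them are values from any real provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int
    base_delay_s: float


# Hypothetical per-channel retry settings; real values depend on each provider.
RETRY_POLICIES = {
    "push": RetryPolicy(max_attempts=3, base_delay_s=0.5),
    "email": RetryPolicy(max_attempts=5, base_delay_s=30.0),
    "sms": RetryPolicy(max_attempts=2, base_delay_s=5.0),
}


@dataclass(frozen=True)
class Event:
    user_id: str
    kind: str
    data: dict


@dataclass(frozen=True)
class OutboundMessage:
    channel: str
    user_id: str
    body: str
    retry: RetryPolicy


def format_for_channel(channel: str, event: Event) -> str:
    """Render the same event differently for each channel (illustrative templates)."""
    actor, text = event.data["actor"], event.data["text"]
    if channel == "push":
        # Push payloads are small, so include only a short preview of the text.
        return f"{actor} commented: {text[:40]}"
    if channel == "sms":
        # SMS is costly per message, so keep it to one short line.
        return f"{actor} commented on your post."
    if channel == "email":
        # Email has room for the full content.
        return f"Hi,\n\n{actor} commented on your post:\n\n{text}\n"
    raise ValueError(f"unsupported channel: {channel}")


def route(event: Event, enabled_channels: list[str]) -> list[OutboundMessage]:
    """Produce one outbound message per channel the user has enabled."""
    return [
        OutboundMessage(ch, event.user_id, format_for_channel(ch, event), RETRY_POLICIES[ch])
        for ch in enabled_channels
    ]


event = Event("u42", "comment", {"actor": "Ana", "text": "Great photo!"})
for msg in route(event, ["push", "email"]):
    print(msg.channel, msg.retry.max_attempts, repr(msg.body))
```

In a real system, `enabled_channels` would come from the user's stored preferences rather than being passed in by hand.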
Before diving into architecture, we must precisely define what the notification system must do. Functional requirements describe the features and capabilities the system must provide.
Critical notifications (security alerts, transaction confirmations) must have priority delivery paths that bypass batching and rate limiting, ensuring delivery even under high load. Non-critical notifications (marketing, social updates) can be batched and rate-limited to prevent user fatigue.
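The sketch below splits traffic between a priority path and a batched, rate-limited path. Critical types skip both batching and throttling. Everything else must pass a per-user token bucket before it enters the batching queue. Several details are assumptions made for illustration: the type names in `CRITICAL`, the token-bucket parameters, and the in-memory lists standing in for queues. A production system would use durable queues and a shared rate-limit store.

```python
import time
from collections import defaultdict

# Hypothetical notification types that must never be delayed or throttled.
CRITICAL = {"security_alert", "transaction_confirmation"}


class TokenBucket:
    """Allows bursts of up to `capacity` sends, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.clock = clock
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PriorityRouter:
    def __init__(self, capacity=3, rate=3 / 3600, clock=time.monotonic):
        # One bucket per user: by default, a burst of 3, then about 3 per hour.
        self.buckets = defaultdict(lambda: TokenBucket(capacity, rate, clock))
        self.immediate = []  # stand-in for the high-priority delivery queue
        self.batched = []    # stand-in for the batching/digest queue

    def submit(self, user_id, kind, body):
        if kind in CRITICAL:
            # Priority path: skip both batching and rate limiting.
            self.immediate.append((user_id, kind, body))
            return "immediate"
        if self.buckets[user_id].allow():
            self.batched.append((user_id, kind, body))
            return "batched"
        # Over the per-user budget; a real system might fold this into a digest instead.
        return "rate_limited"


router = PriorityRouter(capacity=2, rate=0.0)
for kind in ["like", "like", "like", "security_alert"]:
    print(kind, router.submit("u1", kind, "..."))
```

With a budget of two, the third `like` is rate-limited, while the `security_alert` still goes straight to the immediate path.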
Non-functional requirements define the quality attributes of the system—how well it performs its functions. For notification systems, these requirements are particularly demanding due to the high volume, real-time nature, and user-facing impact of the service.
| Requirement | Target | Rationale |
|---|---|---|
| Throughput | 1M+ notifications/second | Handle peak loads during major events, promotions, or viral content |
| Latency (P99) | < 500ms for high-priority | Time-sensitive notifications lose value if delayed significantly |
| Availability | 99.99% (52 min downtime/year) | Notifications are critical; outages impact user trust and business |
| Durability | Zero message loss | Every notification request must be persisted before acknowledgment |
| Delivery Rate | 99.9% successful delivery | Accounting for unreachable devices, invalid tokens, bounced emails |
| Scalability | 10x growth without re-architecture | System must handle organic growth and sudden spikes |
Reliability Considerations:
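Notification systems face a reliability challenge that purely internal systems do not: external dependencies. Delivery depends on third-party providers. These include mobile push services such as Apple Push Notification service (APNs) and Firebase Cloud Messaging (FCM), email delivery providers, and SMS and voice gateways.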
Each external provider has its own rate limits, failure modes, delivery semantics, and SLAs. The notification system must abstract these differences while providing consistent reliability guarantees to internal clients.
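One common way to hide provider differences is a small provider interface plus a shared retry wrapper. The wrapper separates transient failures, which it retries with backoff, from permanent ones, where it gives up immediately. The sketch assumes each provider adapter does that classification. `TransientError`, `PermanentError`, `send_with_retry`, and the `FlakyPush` test double are hypothetical names. The backoff uses exponential delay with full jitter.

```python
import random
import time
from typing import Callable, Protocol


class TransientError(Exception):
    """Retryable provider failure, such as a timeout or a throttling response."""


class PermanentError(Exception):
    """Non-retryable failure, such as an invalid device token or email address."""


class Provider(Protocol):
    name: str

    def send(self, recipient: str, body: str) -> str:
        """Return a provider message ID, or raise TransientError / PermanentError."""
        ...


def send_with_retry(provider: Provider, recipient: str, body: str,
                    max_attempts: int = 4, base_delay_s: float = 0.2,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return provider.send(recipient, body)
        except PermanentError:
            raise  # retrying cannot help, so surface it immediately
        except TransientError:
            if attempt == max_attempts:
                raise  # retry budget exhausted
            # Exponential backoff with full jitter: wait up to base * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")


class FlakyPush:
    """Test double that times out `failures` times before succeeding."""
    name = "flaky-push"

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def send(self, recipient: str, body: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise TransientError("timeout")
        return f"msg-{self.calls}"


provider = FlakyPush(failures=2)
print(send_with_retry(provider, "device-token-123", "Hello", sleep=lambda s: None))
```

Passing a no-op `sleep` keeps the example fast. In production, the default `time.sleep` (or an async equivalent) would apply the backoff.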
Most notification systems provide at-least-once delivery semantics because exactly-once is extremely difficult with external providers. This means duplicate notifications are possible during failure recovery. The system must implement deduplication at the application layer to minimize duplicates while prioritizing deliverability.
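Application-layer deduplication is often built on an idempotency key derived from the notification's identity. The key is checked before each send and recorded only after the send succeeds. A failed send therefore stays retryable, which favors deliverability. A crash between a successful send and the record step can still produce a duplicate, consistent with at-least-once semantics. In this sketch, keys live in an in-memory dict with a time-to-live. A multi-node deployment would need a shared store instead, for example Redis with per-key expiry. The key format, the 24-hour default TTL, and the names `idempotency_key`, `Deduplicator`, and `deliver_once` are all assumptions.

```python
import hashlib
import time


def idempotency_key(user_id: str, event_id: str, channel: str) -> str:
    """Same user, event, and channel always map to the same key, so retries collide."""
    return hashlib.sha256(f"{user_id}:{event_id}:{channel}".encode()).hexdigest()


class Deduplicator:
    """Remembers delivered keys for `ttl_s` seconds (expired keys are never purged here)."""

    def __init__(self, ttl_s: float = 24 * 3600, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self.expiry = {}  # key -> time after which the key may be reused

    def already_sent(self, key: str) -> bool:
        return self.expiry.get(key, float("-inf")) > self.clock()

    def mark_sent(self, key: str) -> None:
        self.expiry[key] = self.clock() + self.ttl_s


def deliver_once(dedup: Deduplicator, key: str, send) -> bool:
    """Skip keys already delivered; record the key only after the send succeeds."""
    if dedup.already_sent(key):
        return False
    send()  # if this raises, the key stays unrecorded so a retry can still deliver
    dedup.mark_sent(key)
    return True


dedup = Deduplicator()
key = idempotency_key("u42", "evt-1001", "push")
sent = []
print(deliver_once(dedup, key, lambda: sent.append("push")))  # True
print(deliver_once(dedup, key, lambda: sent.append("push")))  # False: duplicate suppressed
```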
Understanding the scale of notification systems is crucial for making informed architectural decisions. Let's analyze the volume characteristics of a notification system serving a platform like Facebook, Twitter, or a major e-commerce site.
Assume we're designing for a social platform with 500 million monthly active users (MAU), 100 million daily active users (DAU), averaging 10 notifications received per user per day. This gives us a baseline of 1 billion notifications per day, or approximately 12,000 notifications per second on average, with peak loads of 5-10x during major events.
| Metric | Calculation | Result |
|---|---|---|
| Daily Notifications | 100M DAU × 10 notifications/user | 1 billion/day |
| Average QPS | 1B / 86,400 seconds | ~12,000 notifications/second |
| Peak QPS (10x) | 12,000 × 10 | ~120,000 notifications/second |
| Storage (1 month) | 1B × 30 days × 500 bytes/notification | ~15 TB/month |
| Device Tokens | 500M users × 2 devices avg | 1 billion device tokens |
| Email Addresses | 500M users × 1 email | 500 million emails |
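The arithmetic in the table can be reproduced in a few lines of Python, which makes it easy to re-run the estimate under different assumptions. The function and parameter names are illustrative. Storage is measured in decimal terabytes (10^12 bytes) and ignores indexes and replication.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, notifications_per_user_per_day, peak_factor,
             bytes_per_notification, retention_days):
    """Back-of-envelope volume estimate mirroring the table above."""
    daily = dau * notifications_per_user_per_day
    avg_qps = daily / SECONDS_PER_DAY
    return {
        "daily": daily,
        "avg_qps": avg_qps,
        "peak_qps": avg_qps * peak_factor,
        # Decimal terabytes (10**12 bytes); ignores indexes and replication.
        "storage_tb": daily * retention_days * bytes_per_notification / 10**12,
    }


e = estimate(dau=100_000_000, notifications_per_user_per_day=10, peak_factor=10,
             bytes_per_notification=500, retention_days=30)
print(f"{e['daily']:,}/day, {e['avg_qps']:,.0f} avg/s, "
      f"{e['peak_qps']:,.0f} peak/s, {e['storage_tb']:.0f} TB")
```

With the table's inputs, this reports about 11,574 average and 115,741 peak notifications per second, which the table rounds to 12,000 and 120,000. It also reports 15 TB of storage for 30 days.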
Traffic Patterns:
Notification traffic is highly variable and exhibits distinct patterns:
Diurnal Patterns — Traffic peaks during waking hours in major time zones, with valleys overnight. A global platform sees rolling peaks as different regions wake up.
Event-Driven Spikes — Breaking news, viral content, or major sports events can cause 10-100x normal traffic within seconds.
Scheduled Campaigns — Marketing email blasts at 9 AM can create massive synchronized load.
Cascade Effects — A viral post generates comments, which generate notifications, which generate more engagement—a positive feedback loop.
Infrastructure Implications:
These patterns require:
Notification systems serve vastly different use cases across industries. Understanding these use cases helps prioritize features and make appropriate trade-offs during design.
Social Engagement Notifications (Facebook, Twitter, Instagram)
Social platforms generate massive notification volumes from user interactions:
Challenges:
Designing a notification system involves navigating numerous constraints and challenges that don't exist in simpler systems. Understanding these upfront prevents costly re-architecture later.
When a celebrity with 50 million followers posts content, the notification system must generate 50 million notifications instantly. This 'fanout' problem is one of the hardest challenges in notification system design. Do you generate all 50 million notifications synchronously (hot path becomes slow) or asynchronously (delivery delay)? Neither option is ideal, requiring sophisticated fanout strategies.
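One common mitigation keeps the hot path cheap. When the post is created, the system enqueues a small number of page-sized fanout jobs, and background workers expand each page into per-follower notifications. This is one possible strategy, not the only answer to the trade-off. The Python below sketches it under stated assumptions. The call `get_followers(author_id, offset, limit)` is a hypothetical follower-store lookup. The in-process `deque` stands in for a durable queue consumed by many parallel workers. The page size of 10,000 is arbitrary.

```python
import math
from collections import deque


def plan_fanout(post_id, author_id, follower_count, page_size=10_000):
    """Hot path: create one job per page of followers instead of one per follower."""
    pages = math.ceil(follower_count / page_size)
    return deque((post_id, author_id, i * page_size, page_size) for i in range(pages))


def run_worker(jobs, get_followers, deliver):
    """Background worker: expand each page job into individual notifications."""
    delivered = 0
    while jobs:
        post_id, author_id, offset, limit = jobs.popleft()
        # Offset paging keeps the sketch short; cursor-based paging scales better.
        for user_id in get_followers(author_id, offset, limit):
            deliver(user_id, post_id)
            delivered += 1
    return delivered


# Usage with an in-memory follower list standing in for the follower store.
followers = [f"user{i}" for i in range(25_000)]
jobs = plan_fanout("post-1", "celebrity", len(followers))
print(len(jobs))  # 3 page jobs on the hot path
inbox = []
count = run_worker(jobs, lambda author, off, lim: followers[off:off + lim],
                   lambda user, post: inbox.append((user, post)))
print(count)  # 25000
print(len(plan_fanout("post-2", "celebrity", 50_000_000)))  # 5000
```

For the 50-million-follower case, the hot path enqueues 5,000 jobs rather than 50 million messages. Delivery delay then depends on how many workers consume the queue in parallel.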
Compliance and Legal Constraints:
Notification systems must navigate a complex regulatory landscape:
These regulations affect data retention, consent management, and unsubscribe handling—all of which must be built into the system from the start.
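Consent and unsubscribe handling usually acts as a gate in front of every send. The sketch below assumes a simple policy chosen for illustration, not taken from any particular regulation. Transactional messages are always allowed, and every other category needs an explicit opt-in that has not since been revoked. Decisions go into an append-only log, which keeps an audit trail of when consent was given or withdrawn. `ConsentLog`, `may_send`, and the category names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentLog:
    # Append-only history of (timestamp, channel, category, granted) decisions.
    entries: list = field(default_factory=list)

    def record(self, channel: str, category: str, granted: bool) -> None:
        self.entries.append((datetime.now(timezone.utc), channel, category, granted))

    def allows(self, channel: str, category: str) -> bool:
        # The most recent decision wins; no decision at all means no consent.
        for _, ch, cat, granted in reversed(self.entries):
            if ch == channel and cat == category:
                return granted
        return False


def may_send(log: ConsentLog, channel: str, category: str) -> bool:
    """Gate every send under the assumed policy: only transactional messages skip opt-in."""
    if category == "transactional":
        return True
    return log.allows(channel, category)


log = ConsentLog()
print(may_send(log, "email", "marketing"))   # False: never opted in
log.record("email", "marketing", granted=True)
print(may_send(log, "email", "marketing"))   # True
log.record("email", "marketing", granted=False)  # user unsubscribed
print(may_send(log, "email", "marketing"))   # False
print(len(log.entries))                      # 2: both decisions retained for audit
```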
A well-designed notification system must be measurable. Defining success metrics upfront ensures alignment between engineering decisions and business outcomes.
| Metric | Definition | Target |
|---|---|---|
| Delivery Rate | Notifications successfully delivered / Total sent | 99.9% for transactional, > 95% for marketing |
| Time-to-Delivery (P50) | Median time from request to delivery | < 100ms for push, < 5s for email |
| Time-to-Delivery (P99) | 99th percentile delivery time | < 1s for push, < 30s for email |
| Open Rate | Notifications opened / Delivered | Varies by type (10-40% typical) |
| Click-Through Rate | Action clicks / Opens | 5% for actionable notifications |
| Opt-Out Rate | Unsubscribes / Delivered (rolling 7 days) | < 0.1% (lower is better) |
| Bounce Rate | Hard bounces / Email sent | < 2% (above indicates list hygiene issues) |
| Duplicate Rate | Duplicate deliveries / Total deliveries | < 0.01% (near-zero target) |
Delivery rate and time-to-delivery are leading indicators—they measure system health directly. Open rate and click-through rate are lagging indicators—they measure content and targeting effectiveness. A high-performing notification system requires excellence in both categories.
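Here is a sketch of how several of the metrics in the table can be computed from per-delivery records. The `Record` shape is an assumption. Percentiles use the nearest-rank method. Delivery rate is computed over unique notification IDs, and the duplicate rate counts extra deliveries of the same `notification_id`.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    notification_id: str
    delivered: bool
    latency_ms: Optional[float] = None  # request-to-delivery time, if delivered
    opened: bool = False
    clicked: bool = False


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


def summarize(records):
    """Assumes at least one delivered record; empty inputs would need guarding."""
    sent_ids = {r.notification_id for r in records}
    delivered = [r for r in records if r.delivered]
    delivered_ids = {r.notification_id for r in delivered}
    opened = [r for r in delivered if r.opened]
    latencies = [r.latency_ms for r in delivered]
    return {
        # Leading indicators: system health.
        "delivery_rate": len(delivered_ids) / len(sent_ids),
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        "duplicate_rate": (len(delivered) - len(delivered_ids)) / len(delivered),
        # Lagging indicators: content and targeting effectiveness.
        "open_rate": len(opened) / len(delivered),
        "ctr": sum(r.clicked for r in opened) / len(opened) if opened else 0.0,
    }


records = [
    Record("n1", True, 80.0, opened=True, clicked=True),
    Record("n2", True, 120.0, opened=True),
    Record("n3", False),
    Record("n4", True, 60.0),
]
print(summarize(records))
```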
Let's consolidate everything we've covered into a clear requirements specification that will guide our architecture decisions in subsequent pages.
What's Next:
With requirements clearly defined, we'll now dive into the channels themselves. The next page explores Push, Email, SMS, and In-App delivery channels in detail—their protocols, providers, best practices, and failure modes. Understanding each channel's characteristics is essential for designing an effective multi-channel routing strategy.
You now have a comprehensive understanding of notification system requirements. You've learned about functional and non-functional requirements, scale analysis, industry use cases, design constraints, and success metrics. This foundation prepares you for the architectural deep dives ahead.