The most sophisticated notification system in the world is worthless if users mute it because they're being bombarded, if external providers block you for exceeding rate limits, or if a sudden traffic spike brings down the entire platform. Rate limiting is the discipline of knowing when NOT to send a notification.
Rate limiting in notification systems operates at multiple levels: protecting individual users from notification fatigue, preventing malicious actors from abusing the system, respecting the quotas imposed by external providers (APNs, FCM, email providers), and safeguarding system resources during traffic spikes. Each level requires different strategies and algorithms.
This page covers the algorithms and architectures for rate limiting notifications: per-user limits to prevent fatigue, per-sender limits to prevent abuse, provider-aware throttling, system-wide protection mechanisms, and strategies for graceful degradation when limits are reached.
Rate limiting in notification systems must operate across multiple dimensions simultaneously. Each dimension addresses different concerns and uses different techniques.
| Dimension | Purpose | Typical Limits | Action on Exceed |
|---|---|---|---|
| Per-User | Prevent notification fatigue | 10 push/hour, 3 email/day | Queue for later or discard |
| Per-Sender | Prevent spam/abuse | 1000 notifications/hour | Rate limit API response |
| Per-Notification-Type | Balance notification mix | 5 marketing/day per user | Queue or drop |
| Per-Provider | Respect external limits | APNs: varies, FCM: 500/sec | Throttle output queue |
| Global/System | Protect infrastructure | 1M notifications/minute | Backpressure, shedding |
Multi-Dimensional Rate Limiting:
A single notification may be subject to multiple rate limits:
def check_rate_limits(self, notification: Notification) -> RateLimitResult:
    checks = [
        # Per-user limits
        self.check_per_user_limit(
            notification.recipient_id,
            notification.channel,
            limit=10, window_minutes=60
        ),
        # Per-user per-type limits
        self.check_per_user_type_limit(
            notification.recipient_id,
            notification.type,
            notification.channel,
            limit=5, window_minutes=60
        ),
        # Per-sender limits (for user-generated notifications)
        self.check_per_sender_limit(
            notification.sender_id,
            limit=1000, window_minutes=60
        ) if notification.sender_id else None,
        # Provider limits
        self.check_provider_limit(
            notification.channel,
            provider=self.get_provider(notification)
        ),
        # Global system limits
        self.check_global_limit(),
    ]
    # Return first failing limit
    for check in checks:
        if check and check.exceeded:
            return check
    return RateLimitResult(exceeded=False)
Each check can have different consequences: per-user limits may queue or drop, per-sender limits may return errors to the API caller, and provider limits must throttle the output queue.
Critical notifications (security alerts, fraud warnings, 2FA codes) must bypass most rate limits. A user receiving their 11th push notification today should still get a fraud alert. Implement priority classes that skip per-user limits while still respecting provider limits.
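One way to wire in priority classes is a small lookup that decides which rate-limit dimensions apply before any check runs. This is an illustrative sketch; the class names and dimension labels are not from a specific library:

```python
# Priority classes: which limits each class may bypass.
# Provider and global limits always apply: bypassing them risks a
# block from APNs/FCM, which hurts all notifications, not just this one.
PRIORITY_CLASSES = {
    "critical": {"bypass_per_user": True},   # fraud alerts, 2FA codes
    "high":     {"bypass_per_user": False},
    "normal":   {"bypass_per_user": False},
}

def limits_to_check(priority: str) -> list:
    """Return the rate-limit dimensions that apply for a priority class."""
    checks = ["provider", "global"]
    if not PRIORITY_CLASSES.get(priority, {}).get("bypass_per_user"):
        checks = ["per_user", "per_type"] + checks
    return checks
```

With this shape, an unknown priority defaults to the safe path (all limits apply), and `limits_to_check("critical")` skips only the per-user dimensions.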
Several algorithms can implement rate limiting, each with different characteristics. The choice depends on your consistency requirements, fairness goals, and implementation constraints.
Token Bucket Algorithm
Tokens are added to a bucket at a fixed rate. Each request consumes a token. If no tokens are available, the request is rate-limited.
class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens per second
        self.capacity = capacity  # maximum tokens
        self.tokens = capacity
        self.last_update = time.time()

    def consume(self, tokens: int = 1) -> bool:
        now = time.time()
        # Add tokens based on time elapsed
        elapsed = now - self.last_update
        self.tokens = min(
            self.capacity,
            self.tokens + elapsed * self.rate
        )
        self.last_update = now
        # Try to consume
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
Characteristics: the token bucket allows bursts of up to `capacity` requests while still enforcing a long-term average of `rate` requests per second, and it needs only O(1) state per key. That burst tolerance is why it is the default choice for API rate limiting.
Algorithm Selection Guide:
| Use Case | Recommended Algorithm |
|---|---|
| Per-user notification limits | Sliding Window Counter |
| API rate limiting | Token Bucket |
| Provider output throttling | Leaky Bucket |
| Exact rate enforcement | Sliding Window Log |
| Burst-tolerant limiting | Token Bucket with high capacity |
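The table recommends a sliding window counter for per-user limits. A minimal in-memory sketch follows; it weights the previous fixed window by how much of it still overlaps the trailing window, which approximates a true sliding window without storing a timestamp per event (in production the counts would live in Redis rather than a local dict):

```python
import time
from typing import Optional

class SlidingWindowCounter:
    """Approximate sliding window: blend the previous and current
    fixed-window counts in proportion to the overlap."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # window_id -> count; a Redis hash in production

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window_id = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        prev = self.counts.get(window_id - 1, 0)
        curr = self.counts.get(window_id, 0)
        # The previous window contributes proportionally to how much
        # of it still overlaps the trailing window
        estimated = prev * (1 - elapsed_fraction) + curr
        if estimated >= self.limit:
            return False
        self.counts[window_id] = curr + 1
        return True
```

This smooths out the boundary-burst problem of a plain fixed window at the cost of a slight approximation.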
Per-user rate limiting prevents notification fatigue by capping the number of notifications any single user can receive. This is perhaps the most important rate limiting dimension for user experience.
| Channel | Limit | Window | Rationale |
|---|---|---|---|
| Push | 15 notifications | 1 hour | Prevent badge count explosion |
| Push | 50 notifications | 24 hours | Daily cap for very active users |
| Email | 5 emails | 24 hours | Prevent inbox flooding |
| SMS | 3 messages | 24 hours | SMS is intrusive and expensive |
| In-App | 100 items | 24 hours | Less intrusive, higher tolerance |
Implementation with Redis:
class PerUserRateLimiter:
    def __init__(self, redis_client):
        self.redis = redis_client

    def check_and_increment(
        self,
        user_id: str,
        channel: str,
        notification_type: str,
        priority: str
    ) -> RateLimitResult:
        # Critical notifications bypass per-user limits
        if priority == 'critical':
            return RateLimitResult(allowed=True, bypassed=True)
        # Get limit configuration
        config = self.get_limit_config(channel, notification_type)
        # Build rate limit key
        window_id = int(time.time() / config.window_seconds)
        key = f"rate:{user_id}:{channel}:{window_id}"
        # Atomic increment and check
        current = self.redis.incr(key)
        # Set expiry on first increment
        if current == 1:
            self.redis.expire(key, config.window_seconds + 60)  # Buffer
        if current > config.limit:
            # Exceeded - decide action
            return RateLimitResult(
                allowed=False,
                current=current,
                limit=config.limit,
                retry_after=self.calculate_retry_after(key, config),
                action=config.exceed_action,  # 'queue' or 'drop'
            )
        return RateLimitResult(allowed=True, current=current)

    def calculate_retry_after(self, key: str, config) -> int:
        """Calculate seconds until limit resets."""
        ttl = self.redis.ttl(key)
        return max(0, ttl)
Handling Exceeded Limits:
When per-user limits are exceeded, you have several options: queue the notification for the next window or fold it into a daily digest, drop it outright (with logging) for low-value types, or downgrade it to a less intrusive channel such as in-app.
When dropping notifications due to rate limits, drop oldest first (they're already stale) and prioritize notifications with higher engagement probability. Use historical data to identify which notification types the user typically engages with.
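One way to implement that drop policy is to score queued items and keep only the top of the ranking. This sketch assumes a hypothetical `engagement_score` helper backed by historical open/click rates per notification type:

```python
from datetime import datetime, timedelta, timezone

def select_drops(queued, keep, engagement_score):
    """Pick which queued notifications to drop when over a per-user cap.

    `engagement_score(n) -> float` is a hypothetical helper backed by
    historical engagement data. Items older than the staleness cutoff
    score 0 regardless of type, so they are dropped first.
    """
    now = datetime.now(timezone.utc)

    def score(n):
        if now - n.created_at > timedelta(hours=6):  # staleness cutoff (tunable)
            return 0.0
        return engagement_score(n)

    ranked = sorted(queued, key=score, reverse=True)
    return ranked[keep:]  # everything beyond the keep budget is dropped
```

The staleness cutoff and `keep` budget are tuning knobs; the key idea is that recency and engagement probability jointly decide what survives.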
External providers impose rate limits that your system must respect. Exceeding these limits can result in temporary blocks, degraded service, or even permanent bans.
| Provider | Limit Type | Details | Consequences of Exceeding |
|---|---|---|---|
| Apple APNs | Per-device | Undocumented but exists | HTTP 429, exponential backoff required |
| Firebase FCM | Per-project | 500 msg/sec (burstable to 1000) | HTTP 429, may queue or drop |
| Twilio SMS | Account-based | Configurable, varies by number type | HTTP 429, queuing on their side |
| SendGrid | Tier-based | 100-1M/month depending on plan | Emails queued, may bounce |
| Amazon SES | Account-based | Sending quota (starts at 200/day) | Throttling exception, emails bounced |
Adaptive Throttling:
class AdaptiveProviderThrottler:
    def __init__(self, provider: str):
        self.provider = provider
        self.base_rate = self.get_provider_base_rate(provider)
        self.current_rate = self.base_rate
        self.error_count = 0
        self.success_count = 0
        self.last_adjustment = time.time()

    def record_result(self, success: bool, error_code: Optional[int] = None):
        if success:
            self.success_count += 1
            self.error_count = 0
        else:
            self.success_count = 0  # recovery requires *sustained* success
            if error_code == 429:  # Rate limited
                self.error_count += 1
                self.reduce_rate()
            elif error_code in [500, 502, 503]:  # Server error
                self.error_count += 1
                self.reduce_rate(factor=0.5)  # More aggressive
        # Gradually recover rate after sustained success
        if self.success_count > 100 and self.current_rate < self.base_rate:
            self.increase_rate()

    def reduce_rate(self, factor: float = 0.8):
        self.current_rate = max(
            self.current_rate * factor,
            self.base_rate * 0.1  # Never go below 10% of base
        )
        self.last_adjustment = time.time()
        self.log_rate_adjustment('decrease', self.current_rate)

    def increase_rate(self, factor: float = 1.1):
        now = time.time()
        # Only increase if enough time has passed since the last adjustment
        if now - self.last_adjustment > 60:
            self.current_rate = min(
                self.current_rate * factor,
                self.base_rate
            )
            self.last_adjustment = now
            self.success_count = 0

    def get_delay(self) -> float:
        """Get delay in seconds between sends."""
        return 1.0 / self.current_rate
Provider-Aware Queue Processing:
class ProviderQueueProcessor:
    def __init__(self, provider: str):
        self.throttler = AdaptiveProviderThrottler(provider)
        self.queue = ProviderQueue(provider)

    async def process_loop(self):
        while True:
            # Fetch batch (size based on current rate)
            batch_size = max(1, int(self.throttler.current_rate / 10))
            notifications = await self.queue.fetch_batch(batch_size)
            if not notifications:
                await asyncio.sleep(1)
                continue
            # Send batch, pacing each send at the throttler's current rate
            for notification in notifications:
                try:
                    result = await self.send_to_provider(notification)
                    self.throttler.record_result(True)
                except ProviderError as e:
                    self.throttler.record_result(False, e.code)
                    if e.code == 429:
                        # Re-queue with delay
                        await self.queue.requeue(notification, delay=60)
                # Sleep per send, not per batch, so the rate cap holds
                await asyncio.sleep(self.throttler.get_delay())
Many providers return rate limit information in response headers (X-RateLimit-Remaining, X-RateLimit-Reset). Parse these to proactively slow down before hitting limits. This is more efficient than waiting for 429 errors.
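A minimal sketch of that proactive approach: convert the headers into a per-send delay instead of waiting for a 429. The header names follow the common X-RateLimit-* convention; adjust the keys to what your provider actually returns:

```python
import time
from typing import Optional

def throttle_from_headers(headers: dict, now: Optional[float] = None) -> float:
    """Derive a proactive per-send delay from rate-limit response headers.

    Returns 0.0 when the headers are absent or unparseable, so the
    caller falls back to reactive 429 handling.
    """
    now = time.time() if now is None else now
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_epoch = int(headers["X-RateLimit-Reset"])
    except (KeyError, ValueError):
        return 0.0
    seconds_left = max(reset_epoch - now, 0)
    if remaining <= 0:
        return seconds_left  # budget exhausted: wait out the window
    # Spread the remaining budget evenly across the rest of the window
    return seconds_left / remaining
```

Calling this after every provider response and sleeping for the returned delay keeps the send rate just under the provider's budget.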
In distributed systems, rate limiting must be coordinated across multiple workers. A user's rate limit must be enforced consistently regardless of which worker processes their notifications.
Redis Lua Script for Atomic Rate Limiting:
-- rate_limit.lua
-- KEYS[1] = rate limit key
-- ARGV[1] = limit
-- ARGV[2] = window in seconds
-- ARGV[3] = current timestamp

local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Clean old entries
local window_start = now - window
redis.call('ZREMRANGEBYSCORE', key, 0, window_start)

-- Count current entries
local count = redis.call('ZCARD', key)

if count < limit then
    -- Add this request
    redis.call('ZADD', key, now, now .. ':' .. math.random())
    redis.call('EXPIRE', key, window + 1)
    return {1, count + 1, limit, 0}  -- allowed, current, limit, retry_after
else
    -- Find oldest entry to calculate retry time
    local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
    local retry_after = 0
    if oldest[2] then
        retry_after = oldest[2] + window - now
    end
    return {0, count, limit, retry_after}  -- denied, current, limit, retry_after
end
Using the Script:
class DistributedRateLimiter:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.script = self.redis.register_script(RATE_LIMIT_LUA)

    def check_rate_limit(
        self,
        key: str,
        limit: int,
        window_seconds: int
    ) -> RateLimitResult:
        result = self.script(
            keys=[key],
            args=[limit, window_seconds, int(time.time())]
        )
        allowed, current, limit, retry_after = result
        return RateLimitResult(
            allowed=bool(allowed),
            current=current,
            limit=limit,
            retry_after=retry_after
        )
Distributed rate limiting depends on consistent time across servers. Use NTP to keep clocks synchronized. Alternatively, use Redis server time (TIME command) instead of local time for rate limit calculations.
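If you opt for the Redis server clock, a small helper can centralize it. This assumes redis-py, whose `time()` wrapper for the TIME command returns a (seconds, microseconds) pair:

```python
def redis_now(redis_client) -> float:
    """Use the Redis server clock as the shared time source for rate
    limiting, so worker clock skew cannot shift window boundaries.

    Works with any client whose time() returns (seconds, microseconds),
    as redis-py's does.
    """
    seconds, microseconds = redis_client.time()
    return seconds + microseconds / 1_000_000
```

Passing `redis_now(client)` as ARGV[3] to the Lua script above makes every worker compute windows against the same clock.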
Beyond per-user and per-provider limits, the system needs global protection mechanisms to survive traffic spikes, prevent cascade failures, and maintain service quality.
Circuit Breaker Implementation:
class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: int = 60,
        half_open_requests: int = 3
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_requests = half_open_requests
        self.state = 'closed'  # closed, open, half-open
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = 0

    def can_execute(self) -> bool:
        if self.state == 'closed':
            return True
        elif self.state == 'open':
            # Check if recovery timeout has passed
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = 'half-open'
                self.success_count = 0
                return True
            return False
        elif self.state == 'half-open':
            # Allow limited requests to test recovery
            return self.success_count < self.half_open_requests

    def record_success(self):
        self.failure_count = 0
        if self.state == 'half-open':
            self.success_count += 1
            if self.success_count >= self.half_open_requests:
                self.state = 'closed'

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = 'open'
        if self.state == 'half-open':
            self.state = 'open'
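A usage sketch showing how a breaker like the one above could wrap provider sends. `ProviderUnavailable` and `guarded_send` are illustrative names; the breaker is duck-typed, needing only `can_execute`, `record_success`, and `record_failure`:

```python
class ProviderUnavailable(Exception):
    """Raised when the breaker is open; the caller should re-queue
    the notification and retry after the recovery timeout."""

def guarded_send(breaker, send_fn, notification):
    """Wrap a provider call with a circuit breaker.

    Failures are recorded before re-raising so the breaker can trip;
    successes are recorded so a half-open breaker can close again.
    """
    if not breaker.can_execute():
        raise ProviderUnavailable("circuit open; re-queue notification")
    try:
        result = send_fn(notification)
    except Exception:
        breaker.record_failure()
        raise
    breaker.record_success()
    return result
```

Keeping the breaker logic outside the send function means the same wrapper works for every provider client.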
Load Shedding Strategy:
class LoadShedder:
    def __init__(self):
        self.queue_depths = QueueMonitor()
        self.priority_thresholds = {
            'critical': float('inf'),  # Never shed
            'high': 100000,            # Shed above 100K queue depth
            'medium': 50000,           # Shed above 50K
            'low': 20000,              # Shed above 20K
            'bulk': 10000,             # Shed above 10K
        }

    def should_shed(self, notification: Notification) -> bool:
        queue_depth = self.queue_depths.get(notification.channel)
        threshold = self.priority_thresholds.get(
            notification.priority,
            10000
        )
        return queue_depth > threshold
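A hypothetical intake gate shows where a shedder like this sits in the pipeline: it is consulted before a notification costs queue capacity, and every shed decision passes through a hook so the dropped volume stays visible in metrics. All names here are illustrative:

```python
def admit(notification, shedder, queue, on_shed=lambda n: None):
    """Intake gate: consult the load shedder before enqueueing.

    `shedder` needs only a should_shed() method; `queue` is anything
    with append(); `on_shed` is a metrics/logging hook so that shed
    volume is observable rather than silently lost.
    """
    if shedder.should_shed(notification):
        on_shed(notification)
        return False
    queue.append(notification)
    return True
```

Returning a boolean lets the API layer translate a shed into an explicit response rather than a silent drop.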
When rate limits are exceeded, the system must communicate this clearly to both internal services (API callers) and end users (via preference UI).
API Response Format:
Follow industry standards for rate limit response headers:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 3600
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1609459200
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded for this sender",
    "details": {
      "limit": 1000,
      "window": "1h",
      "current": 1000,
      "retry_after": 3600,
      "reset_at": "2024-01-01T12:00:00Z"
    }
  }
}
Internal Metrics and Alerting:
class RateLimitMetrics:
    def record_check(self, result: RateLimitResult, context: dict):
        labels = {
            'dimension': context['dimension'],  # 'per_user', 'per_sender', etc.
            'action': result.action,            # 'allowed', 'queued', 'dropped'
            'priority': context.get('priority'),
        }
        # Counter for all rate limit checks
        self.metrics.increment('rate_limit_checks_total', labels)
        if not result.allowed:
            # Counter for exceeded limits
            self.metrics.increment('rate_limit_exceeded_total', labels)
            # Histogram of overage amount
            overage = result.current - result.limit
            self.metrics.histogram(
                'rate_limit_overage',
                overage,
                labels
            )

    def alert_on_thresholds(self):
        # Alert if too many users hitting rate limits
        pct_limited = self.calculate_limited_percentage()
        if pct_limited > 5:  # More than 5% of users being limited
            self.alert(
                'high_rate_limit_percentage',
                f'{pct_limited}% of users hitting rate limits'
            )
| Audience | Communication Method | Content |
|---|---|---|
| API Caller | HTTP 429 + headers | Limit, remaining, reset time |
| Dashboard User | Usage graphs | Current usage vs limits over time |
| End User | Settings UI | 'You've reached your daily limit for...' |
| Ops Team | Metrics/Alerts | % of requests rate-limited by dimension |
Don't wait for limits to be hit. When a sender is at 80% of their limit, include a warning header. This gives callers the opportunity to back off before they're fully blocked.
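A sketch of building response headers with that early warning at 80% of the limit. There is no standard header for "approaching limit", so the `X-RateLimit-Warning` name here is made up for illustration; pick a name and document it for your callers:

```python
def rate_limit_headers(limit: int, current: int, reset_epoch: int) -> dict:
    """Build rate-limit response headers, adding a warning at 80% usage.

    The X-RateLimit-* names follow the common convention; the warning
    header name is illustrative, not a standard.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(limit - current, 0)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if current >= 0.8 * limit:
        headers["X-RateLimit-Warning"] = (
            f"{current}/{limit} used; approaching limit"
        )
    return headers
```

Callers that parse the warning can start backing off while they still have budget, which is cheaper for both sides than handling 429s.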
Rate limiting is essential for a sustainable notification system. It protects users, respects external providers, prevents abuse, and keeps the system stable under load.
Module Complete:
You have now completed the comprehensive study of Notification System design. From requirements gathering through multi-channel delivery, intelligent routing, batching, user preferences, and rate limiting, you understand all the components needed to build a notification system that serves millions of users reliably.
This knowledge prepares you to design notification systems for any scale, make informed trade-offs between user experience and system complexity, and handle the unique challenges of multi-channel delivery at scale.
Congratulations! You've mastered notification system design from end to end. You can now architect systems that deliver billions of notifications while maintaining excellent user experience, compliance, and operational stability. These skills are directly applicable to roles at companies like Facebook, Uber, WhatsApp, and any platform with significant user engagement requirements.