The most sophisticated notification system in the world is worthless if users mute it because they're being bombarded, if external providers block you for exceeding rate limits, or if a sudden traffic spike brings down the entire platform. Rate limiting is the discipline of knowing when NOT to send a notification.
Rate limiting in notification systems operates at multiple levels: protecting individual users from notification fatigue, preventing malicious actors from abusing the system, respecting the quotas imposed by external providers (APNs, FCM, email providers), and safeguarding system resources during traffic spikes. Each level requires different strategies and algorithms.
This page covers the algorithms and architectures for rate limiting notifications: per-user limits to prevent fatigue, per-sender limits to prevent abuse, provider-aware throttling, system-wide protection mechanisms, and strategies for graceful degradation when limits are reached.
Rate limiting in notification systems must operate across multiple dimensions simultaneously. Each dimension addresses different concerns and uses different techniques.
| Dimension | Purpose | Typical Limits | Action on Exceed |
|---|---|---|---|
| Per-User | Prevent notification fatigue | 10 push/hour, 3 email/day | Queue for later or discard |
| Per-Sender | Prevent spam/abuse | 1000 notifications/hour | Rate limit API response |
| Per-Notification-Type | Balance notification mix | 5 marketing/day per user | Queue or drop |
| Per-Provider | Respect external limits | APNs: varies, FCM: 500/sec | Throttle output queue |
| Global/System | Protect infrastructure | 1M notifications/minute | Backpressure, shedding |
Multi-Dimensional Rate Limiting:
A single notification may be subject to multiple rate limits:
def check_rate_limits(self, notification: Notification) -> RateLimitResult:
    checks = [
        # Per-user limits
        self.check_per_user_limit(
            notification.recipient_id,
            notification.channel,
            limit=10, window_minutes=60
        ),
        # Per-user per-type limits
        self.check_per_user_type_limit(
            notification.recipient_id,
            notification.type,
            notification.channel,
            limit=5, window_minutes=60
        ),
        # Per-sender limits (for user-generated notifications)
        self.check_per_sender_limit(
            notification.sender_id,
            limit=1000, window_minutes=60
        ) if notification.sender_id else None,
        # Provider limits
        self.check_provider_limit(
            notification.channel,
            provider=self.get_provider(notification)
        ),
        # Global system limits
        self.check_global_limit(),
    ]
    # Return first failing limit
    for check in checks:
        if check and check.exceeded:
            return check
    return RateLimitResult(exceeded=False)
Each check can have different consequences: per-user limits may queue or drop, per-sender limits may return errors to the API caller, and provider limits must throttle the output queue.
Critical notifications (security alerts, fraud warnings, 2FA codes) must bypass most rate limits. A user receiving their 11th push notification today should still get a fraud alert. Implement priority classes that skip per-user limits while still respecting provider limits.
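One way to wire in priority classes is a small lookup that decides which rate-limit dimensions apply before any check runs. This is an illustrative sketch; the class names and dimension labels are not from a specific library:

```python
# Priority classes: which limits each class may bypass.
# Provider and global limits always apply: bypassing them risks a
# block from APNs/FCM, which hurts all notifications, not just this one.
PRIORITY_CLASSES = {
    "critical": {"bypass_per_user": True},   # fraud alerts, 2FA codes
    "high":     {"bypass_per_user": False},
    "normal":   {"bypass_per_user": False},
}

def limits_to_check(priority: str) -> list:
    """Return the rate-limit dimensions that apply for a priority class."""
    checks = ["provider", "global"]
    if not PRIORITY_CLASSES.get(priority, {}).get("bypass_per_user"):
        checks = ["per_user", "per_type"] + checks
    return checks
```

With this shape, an unknown priority defaults to the safe path (all limits apply), and `limits_to_check("critical")` skips only the per-user dimensions.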
Several algorithms can implement rate limiting, each with different characteristics. The choice depends on your consistency requirements, fairness goals, and implementation constraints.
Token Bucket Algorithm
Tokens are added to a bucket at a fixed rate. Each request consumes a token. If no tokens are available, the request is rate-limited.
class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens per second
        self.capacity = capacity  # maximum tokens
        self.tokens = capacity
        self.last_update = time.time()

    def consume(self, tokens: int = 1) -> bool:
        now = time.time()
        # Add tokens based on time elapsed
        elapsed = now - self.last_update
        self.tokens = min(
            self.capacity,
            self.tokens + elapsed * self.rate
        )
        self.last_update = now
        # Try to consume
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False
Characteristics: the token bucket allows bursts of up to `capacity` requests while still enforcing a long-term average of `rate` requests per second, and it needs only O(1) state per key. That burst tolerance is why it is the default choice for API rate limiting.
Algorithm Selection Guide:
| Use Case | Recommended Algorithm |
|---|---|
| Per-user notification limits | Sliding Window Counter |
| API rate limiting | Token Bucket |
| Provider output throttling | Leaky Bucket |
| Exact rate enforcement | Sliding Window Log |
| Burst-tolerant limiting | Token Bucket with high capacity |
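The table recommends a sliding window counter for per-user limits. A minimal in-memory sketch follows; it weights the previous fixed window by how much of it still overlaps the trailing window, which approximates a true sliding window without storing a timestamp per event (in production the counts would live in Redis rather than a local dict):

```python
import time
from typing import Optional

class SlidingWindowCounter:
    """Approximate sliding window: blend the previous and current
    fixed-window counts in proportion to the overlap."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # window_id -> count; a Redis hash in production

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window_id = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        prev = self.counts.get(window_id - 1, 0)
        curr = self.counts.get(window_id, 0)
        # The previous window contributes proportionally to how much
        # of it still overlaps the trailing window
        estimated = prev * (1 - elapsed_fraction) + curr
        if estimated >= self.limit:
            return False
        self.counts[window_id] = curr + 1
        return True
```

This smooths out the boundary-burst problem of a plain fixed window at the cost of a slight approximation.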
Per-user rate limiting prevents notification fatigue by capping the number of notifications any single user can receive. This is perhaps the most important rate limiting dimension for user experience.
| Channel | Limit | Window | Rationale |
|---|---|---|---|
| Push | 15 notifications | 1 hour | Prevent badge count explosion |
| Push | 50 notifications | 24 hours | Daily cap for very active users |
| Email | 5 emails | 24 hours | Prevent inbox flooding |
| SMS | 3 messages | 24 hours | SMS is intrusive and expensive |
| In-App | 100 items | 24 hours | Less intrusive, higher tolerance |
Implementation with Redis:
class PerUserRateLimiter:
    def __init__(self, redis_client):
        self.redis = redis_client

    def check_and_increment(
        self,
        user_id: str,
        channel: str,
        notification_type: str,
        priority: str
    ) -> RateLimitResult:
        # Critical notifications bypass per-user limits
        if priority == 'critical':
            return RateLimitResult(allowed=True, bypassed=True)
        # Get limit configuration
        config = self.get_limit_config(channel, notification_type)
        # Build rate limit key
        window_id = int(time.time() / config.window_seconds)
        key = f"rate:{user_id}:{channel}:{window_id}"
        # Atomic increment and check
        current = self.redis.incr(key)
        # Set expiry on first increment
        if current == 1:
            self.redis.expire(key, config.window_seconds + 60)  # Buffer
        if current > config.limit:
            # Exceeded - decide action
            return RateLimitResult(
                allowed=False,
                current=current,
                limit=config.limit,
                retry_after=self.calculate_retry_after(key, config),
                action=config.exceed_action,  # 'queue' or 'drop'
            )
        return RateLimitResult(allowed=True, current=current)

    def calculate_retry_after(self, key: str, config) -> int:
        """Calculate seconds until limit resets."""
        ttl = self.redis.ttl(key)
        return max(0, ttl)
Handling Exceeded Limits:
When per-user limits are exceeded, you have several options: queue the notification for the next window or fold it into a daily digest, drop it outright (with logging) for low-value types, or downgrade it to a less intrusive channel such as in-app.
When dropping notifications due to rate limits, drop oldest first (they're already stale) and prioritize notifications with higher engagement probability. Use historical data to identify which notification types the user typically engages with.
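One way to implement that drop policy is to score queued items and keep only the top of the ranking. This sketch assumes a hypothetical `engagement_score` helper backed by historical open/click rates per notification type:

```python
from datetime import datetime, timedelta, timezone

def select_drops(queued, keep, engagement_score):
    """Pick which queued notifications to drop when over a per-user cap.

    `engagement_score(n) -> float` is a hypothetical helper backed by
    historical engagement data. Items older than the staleness cutoff
    score 0 regardless of type, so they are dropped first.
    """
    now = datetime.now(timezone.utc)

    def score(n):
        if now - n.created_at > timedelta(hours=6):  # staleness cutoff (tunable)
            return 0.0
        return engagement_score(n)

    ranked = sorted(queued, key=score, reverse=True)
    return ranked[keep:]  # everything beyond the keep budget is dropped
```

The staleness cutoff and `keep` budget are tuning knobs; the key idea is that recency and engagement probability jointly decide what survives.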
External providers impose rate limits that your system must respect. Exceeding these limits can result in temporary blocks, degraded service, or even permanent bans.
| Provider | Limit Type | Details | Consequences of Exceeding |
|---|---|---|---|
| Apple APNs | Per-device | Undocumented but exists | HTTP 429, exponential backoff required |
| Firebase FCM | Per-project | 500 msg/sec (burstable to 1000) | HTTP 429, may queue or drop |
| Twilio SMS | Account-based | Configurable, varies by number type | HTTP 429, queuing on their side |
| SendGrid | Tier-based | 100-1M/month depending on plan | Emails queued, may bounce |
| Amazon SES | Account-based | Sending quota (starts at 200/day) | Throttling exception, emails bounced |
Adaptive Throttling:
class AdaptiveProviderThrottler:
    def __init__(self, provider: str):
        self.provider = provider
        self.base_rate = self.get_provider_base_rate(provider)
        self.current_rate = self.base_rate
        self.error_count = 0
        self.success_count = 0
        self.last_adjustment = time.time()

    def record_result(self, success: bool, error_code: Optional[int] = None):
        if success:
            self.success_count += 1
            self.error_count = 0
        else:
            self.success_count = 0  # recovery requires *sustained* success
            if error_code == 429:  # Rate limited
                self.error_count += 1
                self.reduce_rate()
            elif error_code in [500, 502, 503]:  # Server error
                self.error_count += 1
                self.reduce_rate(factor=0.5)  # More aggressive
        # Gradually recover rate after sustained success
        if self.success_count > 100 and self.current_rate < self.base_rate:
            self.increase_rate()

    def reduce_rate(self, factor: float = 0.8):
        self.current_rate = max(
            self.current_rate * factor,
            self.base_rate * 0.1  # Never go below 10% of base
        )
        self.last_adjustment = time.time()
        self.log_rate_adjustment('decrease', self.current_rate)

    def increase_rate(self, factor: float = 1.1):
        now = time.time()
        # Only increase if enough time has passed since the last adjustment
        if now - self.last_adjustment > 60:
            self.current_rate = min(
                self.current_rate * factor,
                self.base_rate
            )
            self.last_adjustment = now
            self.success_count = 0

    def get_delay(self) -> float:
        """Get delay in seconds between sends."""
        return 1.0 / self.current_rate
Provider-Aware Queue Processing:
class ProviderQueueProcessor:
    def __init__(self, provider: str):
        self.throttler = AdaptiveProviderThrottler(provider)
        self.queue = ProviderQueue(provider)

    async def process_loop(self):
        while True:
            # Fetch batch (size based on current rate)
            batch_size = max(1, int(self.throttler.current_rate / 10))
            notifications = await self.queue.fetch_batch(batch_size)
            if not notifications:
                await asyncio.sleep(1)
                continue
            # Send batch, pacing each send at the throttler's current rate
            for notification in notifications:
                try:
                    result = await self.send_to_provider(notification)
                    self.throttler.record_result(True)
                except ProviderError as e:
                    self.throttler.record_result(False, e.code)
                    if e.code == 429:
                        # Re-queue with delay
                        await self.queue.requeue(notification, delay=60)
                # Sleep per send, not per batch, so the rate cap holds
                await asyncio.sleep(self.throttler.get_delay())
Many providers return rate limit information in response headers (X-RateLimit-Remaining, X-RateLimit-Reset). Parse these to proactively slow down before hitting limits. This is more efficient than waiting for 429 errors.
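A minimal sketch of that proactive approach: convert the headers into a per-send delay instead of waiting for a 429. The header names follow the common X-RateLimit-* convention; adjust the keys to what your provider actually returns:

```python
import time
from typing import Optional

def throttle_from_headers(headers: dict, now: Optional[float] = None) -> float:
    """Derive a proactive per-send delay from rate-limit response headers.

    Returns 0.0 when the headers are absent or unparseable, so the
    caller falls back to reactive 429 handling.
    """
    now = time.time() if now is None else now
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_epoch = int(headers["X-RateLimit-Reset"])
    except (KeyError, ValueError):
        return 0.0
    seconds_left = max(reset_epoch - now, 0)
    if remaining <= 0:
        return seconds_left  # budget exhausted: wait out the window
    # Spread the remaining budget evenly across the rest of the window
    return seconds_left / remaining
```

Calling this after every provider response and sleeping for the returned delay keeps the send rate just under the provider's budget.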
In distributed systems, rate limiting must be coordinated across multiple workers. A user's rate limit must be enforced consistently regardless of which worker processes their notifications.
Redis Lua Script for Atomic Rate Limiting:
-- rate_limit.lua
-- KEYS[1] = rate limit key
-- ARGV[1] = limit
-- ARGV[2] = window in seconds
-- ARGV[3] = current timestamp

local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Clean old entries
local window_start = now - window
redis.call('ZREMRANGEBYSCORE', key, 0, window_start)

-- Count current entries
local count = redis.call('ZCARD', key)

if count < limit then
    -- Add this request
    redis.call('ZADD', key, now, now .. ':' .. math.random())
    redis.call('EXPIRE', key, window + 1)
    return {1, count + 1, limit, 0}  -- allowed, current, limit, retry_after
else
    -- Find oldest entry to calculate retry time
    local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
    local retry_after = 0
    if oldest[2] then
        retry_after = oldest[2] + window - now
    end
    return {0, count, limit, retry_after}  -- denied, current, limit, retry_after
end
Using the Script:
class DistributedRateLimiter:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.script = self.redis.register_script(RATE_LIMIT_LUA)

    def check_rate_limit(
        self,
        key: str,
        limit: int,
        window_seconds: int
    ) -> RateLimitResult:
        result = self.script(
            keys=[key],
            args=[limit, window_seconds, int(time.time())]
        )
        allowed, current, limit, retry_after = result
        return RateLimitResult(
            allowed=bool(allowed),
            current=current,
            limit=limit,
            retry_after=retry_after
        )
Distributed rate limiting depends on consistent time across servers. Use NTP to keep clocks synchronized. Alternatively, use Redis server time (TIME command) instead of local time for rate limit calculations.
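If you opt for the Redis server clock, a small helper can centralize it. This assumes redis-py, whose `time()` wrapper for the TIME command returns a (seconds, microseconds) pair:

```python
def redis_now(redis_client) -> float:
    """Use the Redis server clock as the shared time source for rate
    limiting, so worker clock skew cannot shift window boundaries.

    Works with any client whose time() returns (seconds, microseconds),
    as redis-py's does.
    """
    seconds, microseconds = redis_client.time()
    return seconds + microseconds / 1_000_000
```

Passing `redis_now(client)` as ARGV[3] to the Lua script above makes every worker compute windows against the same clock.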
Beyond per-user and per-provider limits, the system needs global protection mechanisms to survive traffic spikes, prevent cascade failures, and maintain service quality.
Circuit Breaker Implementation:
class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: int = 60,
        half_open_requests: int = 3
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_requests = half_open_requests
        self.state = 'closed'  # closed, open, half-open
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = 0

    def can_execute(self) -> bool:
        if self.state == 'closed':
            return True
        elif self.state == 'open':
            # Check if recovery timeout has passed
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = 'half-open'
                self.success_count = 0
                return True
            return False
        elif self.state == 'half-open':
            # Allow limited requests to test recovery
            return self.success_count < self.half_open_requests

    def record_success(self):
        self.failure_count = 0
        if self.state == 'half-open':
            self.success_count += 1
            if self.success_count >= self.half_open_requests:
                self.state = 'closed'

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = 'open'
        if self.state == 'half-open':
            self.state = 'open'
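A usage sketch showing how a breaker like the one above could wrap provider sends. `ProviderUnavailable` and `guarded_send` are illustrative names; the breaker is duck-typed, needing only `can_execute`, `record_success`, and `record_failure`:

```python
class ProviderUnavailable(Exception):
    """Raised when the breaker is open; the caller should re-queue
    the notification and retry after the recovery timeout."""

def guarded_send(breaker, send_fn, notification):
    """Wrap a provider call with a circuit breaker.

    Failures are recorded before re-raising so the breaker can trip;
    successes are recorded so a half-open breaker can close again.
    """
    if not breaker.can_execute():
        raise ProviderUnavailable("circuit open; re-queue notification")
    try:
        result = send_fn(notification)
    except Exception:
        breaker.record_failure()
        raise
    breaker.record_success()
    return result
```

Keeping the breaker logic outside the send function means the same wrapper works for every provider client.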
Load Shedding Strategy:
class LoadShedder:
    def __init__(self):
        self.queue_depths = QueueMonitor()
        self.priority_thresholds = {
            'critical': float('inf'),  # Never shed
            'high': 100000,            # Shed above 100K queue depth
            'medium': 50000,           # Shed above 50K
            'low': 20000,              # Shed above 20K
            'bulk': 10000,             # Shed above 10K
        }

    def should_shed(self, notification: Notification) -> bool:
        queue_depth = self.queue_depths.get(notification.channel)
        threshold = self.priority_thresholds.get(
            notification.priority,
            10000
        )
        return queue_depth > threshold
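A hypothetical intake gate shows where a shedder like this sits in the pipeline: it is consulted before a notification costs queue capacity, and every shed decision passes through a hook so the dropped volume stays visible in metrics. All names here are illustrative:

```python
def admit(notification, shedder, queue, on_shed=lambda n: None):
    """Intake gate: consult the load shedder before enqueueing.

    `shedder` needs only a should_shed() method; `queue` is anything
    with append(); `on_shed` is a metrics/logging hook so that shed
    volume is observable rather than silently lost.
    """
    if shedder.should_shed(notification):
        on_shed(notification)
        return False
    queue.append(notification)
    return True
```

Returning a boolean lets the API layer translate a shed into an explicit response rather than a silent drop.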
When rate limits are exceeded, the system must communicate this clearly to both internal services (API callers) and end users (via preference UI).
API Response Format:
Follow industry standards for rate limit response headers:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 3600
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1609459200
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded for this sender",
    "details": {
      "limit": 1000,
      "window": "1h",
      "current": 1000,
      "retry_after": 3600,
      "reset_at": "2024-01-01T12:00:00Z"
    }
  }
}
Internal Metrics and Alerting:
class RateLimitMetrics:
    def record_check(self, result: RateLimitResult, context: dict):
        labels = {
            'dimension': context['dimension'],  # 'per_user', 'per_sender', etc.
            'action': result.action,            # 'allowed', 'queued', 'dropped'
            'priority': context.get('priority'),
        }
        # Counter for all rate limit checks
        self.metrics.increment('rate_limit_checks_total', labels)
        if not result.allowed:
            # Counter for exceeded limits
            self.metrics.increment('rate_limit_exceeded_total', labels)
            # Histogram of overage amount
            overage = result.current - result.limit
            self.metrics.histogram(
                'rate_limit_overage',
                overage,
                labels
            )

    def alert_on_thresholds(self):
        # Alert if too many users hitting rate limits
        pct_limited = self.calculate_limited_percentage()
        if pct_limited > 5:  # More than 5% of users being limited
            self.alert(
                'high_rate_limit_percentage',
                f'{pct_limited}% of users hitting rate limits'
            )
| Audience | Communication Method | Content |
|---|---|---|
| API Caller | HTTP 429 + headers | Limit, remaining, reset time |
| Dashboard User | Usage graphs | Current usage vs limits over time |
| End User | Settings UI | 'You've reached your daily limit for...' |
| Ops Team | Metrics/Alerts | % of requests rate-limited by dimension |
Don't wait for limits to be hit. When a sender is at 80% of their limit, include a warning header. This gives callers the opportunity to back off before they're fully blocked.
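A sketch of building response headers with that early warning at 80% of the limit. There is no standard header for "approaching limit", so the `X-RateLimit-Warning` name here is made up for illustration; pick a name and document it for your callers:

```python
def rate_limit_headers(limit: int, current: int, reset_epoch: int) -> dict:
    """Build rate-limit response headers, adding a warning at 80% usage.

    The X-RateLimit-* names follow the common convention; the warning
    header name is illustrative, not a standard.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(limit - current, 0)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if current >= 0.8 * limit:
        headers["X-RateLimit-Warning"] = (
            f"{current}/{limit} used; approaching limit"
        )
    return headers
```

Callers that parse the warning can start backing off while they still have budget, which is cheaper for both sides than handling 429s.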
Rate limiting is essential for a sustainable notification system. It protects users, respects external providers, prevents abuse, and keeps the system stable under load.
Module Complete:
You have now completed the comprehensive study of Notification System design. From requirements gathering through multi-channel delivery, intelligent routing, batching, user preferences, and rate limiting, you understand all the components needed to build a notification system that serves millions of users reliably.
This knowledge prepares you to design notification systems for any scale, make informed trade-offs between user experience and system complexity, and handle the unique challenges of multi-channel delivery at scale.
Congratulations! You've mastered notification system design from end to end. You can now architect systems that deliver billions of notifications while maintaining excellent user experience, compliance, and operational stability. These skills are directly applicable to roles at companies like Facebook, Uber, WhatsApp, and any platform with significant user engagement requirements.