You've built the infrastructure, integrated with APNs, FCM, and web push services, and your server is sending notifications. But here's the uncomfortable truth: sending a notification is not the same as delivering it. The journey from your server to the user's eyeballs involves multiple systems, any of which can delay, drop, or modify your notification.
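Consider the path a notification takes: from your application server to the platform push service (APNs, FCM, or a web push service), across the network to the device, and finally through processing on the device itself.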
At each step, things can go wrong. Devices can be offline. Users can have notifications disabled. Battery optimization can delay delivery. The notification might be silently dropped or grouped with others. Understanding these delivery mechanics is essential for building reliable push notification systems.
Industry benchmarks suggest that only 50-80% of push notifications are actually seen by users. The rest are dropped, delayed past relevance, or lost to disabled permissions. Your goal isn't 100% delivery—it's understanding your delivery funnel and optimizing each step.
Push notification platforms provide best-effort delivery—they try to deliver notifications but do not guarantee success. Understanding what this means in practice is crucial for designing systems that work reliably.
What APNs Guarantees (and Doesn't):
What FCM Guarantees (and Doesn't):
Messages with the same collapse_key replace each other, reducing redundancy.
| Aspect | APNs | FCM | Web Push |
|---|---|---|---|
| Messages stored offline | 1 per app | Up to 100 | Varies by service |
| Message expiration | Up to 30 days | Up to 28 days | Up to 28 days |
| Delivery confirmation | Not available | Via Analytics | Not available |
| Collapse/replace | apns-collapse-id | collapse_key | Topic header |
| Priority levels | 5 or 10 | normal or high | very-low to high |
Implications for System Design:
Don't rely on push for critical data delivery: Push should trigger the app to fetch data, not be the sole data transport.
Design for idempotency: Users may receive old notifications after app updates or device restarts. Clicking a notification about an already-resolved issue creates confusion.
Use appropriate TTL: Time-sensitive notifications (e.g., "Your driver is arriving") should have short TTL. Stale notifications are worse than no notification.
Leverage collapse keys wisely: For notifications that supersede each other (unread count, status updates), use collapse keys to prevent notification spam.
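The TTL and collapse-key advice above can be sketched as a small helper that maps one policy onto each platform's delivery options. This is a minimal sketch: the function name `delivery_options` and the shape of the returned dict are assumptions for illustration, while the option names themselves (`apns-expiration`, `apns-collapse-id`, the FCM Android `ttl` and `collapse_key` fields, and the Web Push `TTL` and `Topic` headers) are the ones each platform defines.

```python
import time
from typing import Optional


def delivery_options(
    ttl_seconds: int,
    collapse_id: Optional[str] = None,
    now: Optional[float] = None,
) -> dict:
    """Translate one TTL/collapse policy into per-platform delivery options."""
    now = time.time() if now is None else now

    # APNs expects an absolute UNIX timestamp; FCM and Web Push take a duration.
    apns_headers = {"apns-expiration": str(int(now) + ttl_seconds)}
    fcm_android = {"ttl": f"{ttl_seconds}s"}
    web_push_headers = {"TTL": str(ttl_seconds)}

    if collapse_id:
        # Keep the ID short: apns-collapse-id is limited to 64 bytes and the
        # Web Push Topic header to 32 URL-safe base64 characters.
        apns_headers["apns-collapse-id"] = collapse_id
        fcm_android["collapse_key"] = collapse_id
        web_push_headers["Topic"] = collapse_id

    return {
        "apns_headers": apns_headers,
        "fcm_android": fcm_android,
        "web_push_headers": web_push_headers,
    }


# A "driver is arriving" update: stale after one minute, newer updates replace older ones.
options = delivery_options(60, collapse_id="driver-eta-42")
```

A short TTL with a collapse ID suits status updates that supersede each other; a marketing message would typically use a longer TTL and no collapse ID.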
Best practice: treat notifications as triggers, not payloads. Send minimal data in the notification ('You have a new message') and fetch the actual content when the app opens. This ensures the user always sees current data, handles offline scenarios gracefully, and reduces payload size.
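As an illustration of the trigger-not-payload approach, the sketch below builds a notification that carries only a display title and the identifiers the app needs to fetch current content. The field names (`kind`, `resource_id`) and the helper `build_trigger_payload` are hypothetical; the 4096-byte ceiling reflects the payload limit APNs and FCM apply to regular notifications.

```python
import json

MAX_PAYLOAD_BYTES = 4096  # APNs and FCM both cap regular payloads at 4 KB


def build_trigger_payload(kind: str, resource_id: str, title: str) -> dict:
    """Build a minimal payload: enough to display and to fetch, nothing more."""
    payload = {
        "title": title,
        # The app uses these identifiers to fetch fresh content when opened.
        "data": {"kind": kind, "resource_id": resource_id},
    }
    size = len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    if size > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {size} bytes, limit is {MAX_PAYLOAD_BYTES}")
    return payload


payload = build_trigger_payload("message", "conv-123", "You have a new message")
```

Because the message body is never embedded, a notification that arrives late still leads the user to whatever the conversation looks like at the moment they open it.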
To optimize delivery, you must understand where notifications can fail. Let's trace the complete pipeline and identify failure points at each stage:
Stage 1: Your Application Server
Stage 2: Platform Push Service
Stage 3: Network Transit
Stage 4: Device Processing
```python
# Comprehensive delivery tracking implementation

from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class DeliveryStatus(Enum):
    # Server-side states
    PENDING = "pending"                    # Queued, not yet sent
    SENDING = "sending"                    # Currently being sent
    SENT = "sent"                          # Accepted by push service
    FAILED_PERMANENT = "failed_permanent"  # Won't retry
    FAILED_TRANSIENT = "failed_transient"  # Will retry

    # Delivery states (where trackable)
    DELIVERED = "delivered"                # Confirmed delivered (FCM only)
    DISPLAYED = "displayed"                # Confirmed displayed
    CLICKED = "clicked"                    # User clicked notification
    DISMISSED = "dismissed"                # User dismissed notification

    # Terminal failure states
    EXPIRED = "expired"                    # TTL exceeded before delivery
    INVALID_TOKEN = "invalid_token"        # Token no longer valid
    UNSUBSCRIBED = "unsubscribed"          # User unsubscribed


@dataclass
class NotificationRecord:
    """
    Complete record of a notification's lifecycle.
    Used for tracking, debugging, and analytics.
    """
    id: str
    user_id: str
    device_token: str
    platform: str  # 'ios', 'android', 'web'

    # Content
    title: str
    body: str
    payload_hash: str  # For deduplication

    # State tracking
    status: DeliveryStatus
    created_at: datetime
    sent_at: datetime | None = None
    delivered_at: datetime | None = None
    clicked_at: datetime | None = None

    # Platform responses
    platform_message_id: str | None = None  # APNs ID, FCM message_id, etc.
    error_code: str | None = None
    error_message: str | None = None

    # Retry tracking
    attempt_count: int = 0
    next_retry_at: datetime | None = None


class DeliveryTracker:
    """
    Tracks notification delivery through the entire pipeline.

    Provides:
    - Per-notification lifecycle tracking
    - Aggregated delivery metrics
    - Failure analysis
    """

    def __init__(self, db, metrics):
        self.db = db
        self.metrics = metrics

    async def record_send_attempt(
        self,
        notification_id: str,
        platform_response: dict
    ) -> None:
        """Record the result of a send attempt."""
        # Load the stored record; get_notification is an assumed method
        # on the db layer, like the other db calls in this class.
        record = await self.db.get_notification(notification_id)

        if platform_response.get('success'):
            await self.db.update_notification(
                notification_id,
                status=DeliveryStatus.SENT,
                sent_at=datetime.utcnow(),
                platform_message_id=platform_response.get('message_id'),
                attempt_count=record.attempt_count + 1,
            )
            self.metrics.increment('notification.sent', tags={
                'platform': record.platform
            })
            return

        error_code = platform_response.get('error_code')
        is_permanent = self._is_permanent_failure(error_code)

        if is_permanent:
            status = DeliveryStatus.FAILED_PERMANENT
            # Handle token invalidation
            if error_code in ('UNREGISTERED', 'Unregistered', 'BadDeviceToken', '410'):
                status = DeliveryStatus.INVALID_TOKEN
                await self._handle_invalid_token(record.device_token)
        else:
            status = DeliveryStatus.FAILED_TRANSIENT

        await self.db.update_notification(
            notification_id,
            status=status,
            error_code=error_code,
            error_message=platform_response.get('error_message'),
            attempt_count=record.attempt_count + 1,
            next_retry_at=(
                self._compute_retry_time(record.attempt_count)
                if status == DeliveryStatus.FAILED_TRANSIENT else None
            ),
        )
        self.metrics.increment('notification.failed', tags={
            'platform': record.platform,
            'error': error_code,
            'permanent': is_permanent,
        })

    async def record_delivery_event(
        self,
        platform_message_id: str,
        event_type: str,  # 'delivered', 'displayed', 'clicked', 'dismissed'
        timestamp: datetime
    ) -> None:
        """
        Record delivery/engagement events from client or platform.

        For FCM: Uses Firebase Analytics delivery receipts
        For client confirmation: App reports receipt to your server
        """
        notification = await self.db.find_by_platform_id(platform_message_id)
        if not notification:
            return  # Unknown notification, may be older than retention

        update_fields = {}
        if event_type == 'delivered':
            update_fields['status'] = DeliveryStatus.DELIVERED
            update_fields['delivered_at'] = timestamp
        elif event_type == 'displayed':
            update_fields['status'] = DeliveryStatus.DISPLAYED
        elif event_type == 'clicked':
            update_fields['status'] = DeliveryStatus.CLICKED
            update_fields['clicked_at'] = timestamp
        elif event_type == 'dismissed':
            update_fields['status'] = DeliveryStatus.DISMISSED

        await self.db.update_notification(notification.id, **update_fields)
        self.metrics.increment(f'notification.{event_type}', tags={
            'platform': notification.platform
        })

    async def _handle_invalid_token(self, device_token: str) -> None:
        """Stop targeting a dead token (mark_token_invalid is an assumed db method)."""
        await self.db.mark_token_invalid(device_token)

    def _is_permanent_failure(self, error_code: str) -> bool:
        """Determine if error indicates permanent failure."""
        permanent_codes = {
            # APNs
            'BadDeviceToken', 'Unregistered', 'DeviceTokenNotForTopic',
            'InvalidProviderToken', 'Forbidden',
            # FCM
            'UNREGISTERED', 'INVALID_ARGUMENT', 'SENDER_ID_MISMATCH',
            # Web Push
            '404', '410',
        }
        return error_code in permanent_codes

    def _compute_retry_time(self, attempt_count: int) -> datetime:
        """Exponential backoff for retry timing."""
        # 1min, 2min, 4min, 8min, max 30min
        delay_seconds = min(60 * (2 ** attempt_count), 1800)
        return datetime.utcnow() + timedelta(seconds=delay_seconds)
```
Platform push services provide limited delivery visibility. To truly understand whether notifications reach users, you need to implement additional tracking mechanisms.
Platform-Native Options:
APNs:
FCM:
Web Push:
Click tracking via the service worker's notificationclick handler
Client-Side Confirmation:
The most reliable delivery confirmation comes from the client device itself reporting receipt. This requires modifying your app to send acknowledgments:
iOS Implementation:
```swift
// iOS: Report notification delivery to backend
// Add to your UNUserNotificationCenterDelegate

import UIKit
import UserNotifications

extension AppDelegate: UNUserNotificationCenterDelegate {

    // Called when a notification is delivered while the app is in the foreground
    func userNotificationCenter(
        _ center: UNUserNotificationCenter,
        willPresent notification: UNNotification,
        withCompletionHandler completionHandler: @escaping (UNNotificationPresentationOptions) -> Void
    ) {
        // Report delivery.
        // Note: for remote notifications this identifier may not match your
        // backend's ID; consider sending your own ID in the payload and
        // reading it from notification.request.content.userInfo.
        reportNotificationDelivered(notification.request.identifier)

        // Show notification (.banner requires iOS 14+)
        completionHandler([.banner, .sound, .badge])
    }

    // Called when the user interacts with a notification
    func userNotificationCenter(
        _ center: UNUserNotificationCenter,
        didReceive response: UNNotificationResponse,
        withCompletionHandler completionHandler: @escaping () -> Void
    ) {
        let notificationId = response.notification.request.identifier

        switch response.actionIdentifier {
        case UNNotificationDefaultActionIdentifier:
            // User tapped notification
            reportNotificationClicked(notificationId)
        case UNNotificationDismissActionIdentifier:
            // User dismissed notification (only reported when the category
            // was registered with the .customDismissAction option)
            reportNotificationDismissed(notificationId)
        default:
            // Custom action
            reportNotificationAction(notificationId, action: response.actionIdentifier)
        }

        completionHandler()
    }

    private func reportNotificationDelivered(_ notificationId: String) {
        // Fire-and-forget HTTP request to your backend
        let url = URL(string: "https://api.yourapp.com/notifications/delivered")!
        var request = URLRequest(url: url)
        request.httpMethod = "POST"
        request.httpBody = try? JSONEncoder().encode([
            "notification_id": notificationId,
            "timestamp": ISO8601DateFormatter().string(from: Date()),
            "device_id": getDeviceId()  // app-defined helper, not shown here
        ])
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")

        URLSession.shared.dataTask(with: request).resume()
    }

    private func reportNotificationClicked(_ notificationId: String) {
        // Similar to above, with event type "clicked"
    }

    private func reportNotificationDismissed(_ notificationId: String) {
        // Similar to above, with event type "dismissed"
    }

    private func reportNotificationAction(_ notificationId: String, action: String) {
        // Similar to above, with the custom action identifier as the event type
    }
}

// For background notifications using a Notification Service Extension.
// The extension lives in its own target and only runs for payloads
// that set "mutable-content": 1.
class NotificationService: UNNotificationServiceExtension {

    override func didReceive(
        _ request: UNNotificationRequest,
        withContentHandler contentHandler: @escaping (UNNotificationContent) -> Void
    ) {
        // Report delivery before modifying content
        reportBackgroundDelivery(request.identifier)

        // Continue with content modification...
        contentHandler(request.content)
    }

    private func reportBackgroundDelivery(_ notificationId: String) {
        // Note: Limited network access in extension
        // Use background URL session for reliability
    }
}
```
Android Implementation:
```kotlin
// Android: Report notification delivery via FirebaseMessagingService

import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent
import androidx.work.Constraints
import androidx.work.CoroutineWorker
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.WorkerParameters
import androidx.work.workDataOf
import com.google.firebase.messaging.FirebaseMessagingService
import com.google.firebase.messaging.RemoteMessage

// showNotification, showDataNotification, getDeviceId, reportNotificationEvent,
// RetrofitClient, and NotificationEvent are app-defined and not shown here.

class MyFirebaseMessagingService : FirebaseMessagingService() {

    override fun onMessageReceived(remoteMessage: RemoteMessage) {
        // Report delivery immediately
        val notificationId = remoteMessage.data["notification_id"]
        if (notificationId != null) {
            reportNotificationDelivered(notificationId)
        }

        // Process and display notification
        if (remoteMessage.notification != null) {
            // Display notification (system will handle if app in background)
            showNotification(remoteMessage)
        } else {
            // Data-only message, create notification manually
            showDataNotification(remoteMessage.data)
        }
    }

    private fun reportNotificationDelivered(notificationId: String) {
        // Use WorkManager for reliable delivery (survives process death)
        val data = workDataOf(
            "notification_id" to notificationId,
            "event_type" to "delivered",
            "timestamp" to System.currentTimeMillis()
        )

        val request = OneTimeWorkRequestBuilder<DeliveryReportWorker>()
            .setInputData(data)
            .setConstraints(
                Constraints.Builder()
                    .setRequiredNetworkType(NetworkType.CONNECTED)
                    .build()
            )
            .build()

        WorkManager.getInstance(this).enqueue(request)
    }
}

// Worker for reliable delivery reporting
class DeliveryReportWorker(
    context: Context,
    params: WorkerParameters
) : CoroutineWorker(context, params) {

    override suspend fun doWork(): Result {
        val notificationId = inputData.getString("notification_id")
            ?: return Result.failure()
        val eventType = inputData.getString("event_type")
            ?: return Result.failure()
        val timestamp = inputData.getLong("timestamp", System.currentTimeMillis())

        return try {
            // Send to your backend
            val api = RetrofitClient.notificationApi
            api.reportDelivery(
                NotificationEvent(
                    notificationId = notificationId,
                    eventType = eventType,
                    timestamp = timestamp,
                    deviceId = getDeviceId()
                )
            )
            Result.success()
        } catch (e: Exception) {
            // Retry a few times, then give up on this report
            if (runAttemptCount < 3) {
                Result.retry()
            } else {
                Result.failure()
            }
        }
    }
}

// Track notification clicks via BroadcastReceiver or Activity handling
class NotificationClickReceiver : BroadcastReceiver() {
    override fun onReceive(context: Context, intent: Intent) {
        val notificationId = intent.getStringExtra("notification_id")
        val action = intent.action

        notificationId?.let {
            reportNotificationEvent(context, it, action ?: "clicked")
        }
    }
}
```
Delivery rate = (Confirmed Delivered / Total Sent) × 100%. Aim for 70-90% for mobile apps. Lower rates indicate token hygiene issues, permission problems, or device targeting issues. Track by platform and segment to identify specific problem areas.
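The delivery-rate formula extends naturally to the rest of the funnel. The sketch below is one way to compute it from per-stage counts; the function name `funnel_rates` and the stage names are assumptions chosen to match the statuses used earlier, and the sample numbers are made up for illustration.

```python
def funnel_rates(counts: dict) -> dict:
    """Compute percentage conversion between consecutive funnel stages."""

    def pct(numerator: int, denominator: int) -> float:
        # Guard against empty stages so a quiet hour does not raise.
        return round(100.0 * numerator / denominator, 1) if denominator else 0.0

    sent = counts.get("sent", 0)
    delivered = counts.get("delivered", 0)
    displayed = counts.get("displayed", 0)
    clicked = counts.get("clicked", 0)

    return {
        "delivery_rate": pct(delivered, sent),      # Confirmed Delivered / Total Sent
        "display_rate": pct(displayed, delivered),  # of delivered, how many were shown
        "click_rate": pct(clicked, displayed),      # of shown, how many were tapped
    }


# Illustrative counts for one platform over one day.
android = funnel_rates({"sent": 1000, "delivered": 780, "displayed": 700, "clicked": 70})
```

Computing the same dict per platform and per user segment makes it easier to see which stage is losing notifications.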
Device state significantly impacts when (and whether) notifications are delivered. Understanding these states helps set realistic expectations and design appropriate fallback strategies.
iOS Device States:
Active (Foreground): App is in use. Notifications delivered immediately; your app's delegate handles them.
Background: App is in memory but not visible. Notifications displayed by system. Silent notifications wake app briefly.
Suspended: App is in memory but not running. Notifications displayed. App wakes only when user interacts.
Terminated: App is not in memory. Notifications displayed. App launches fresh on interaction.
Device Locked: Notification appears on lock screen based on notification settings.
Do Not Disturb / Focus Mode: Notifications may be silenced or delayed based on Focus mode configuration.
Android Device States and Power Management:
Android's power management introduces additional complexity:
Doze Mode (Android 6.0+):
App Standby:
App-Specific Battery Optimization:
Manufacturer Customizations:
| Priority | iOS Behavior | Android Behavior |
|---|---|---|
| High/10 | Immediate delivery attempt | Wakes device from Doze, immediate delivery |
| Normal/5 | May be batched by system | Delivered during maintenance window if in Doze |
| Background | Silent, wakes app briefly | May be delayed significantly |
Don't abuse high priority for non-urgent notifications. Platforms track priority usage and may throttle or deprioritize senders who overuse high priority. FCM explicitly limits high-priority data messages to approximately 10 per minute per device.
Time-of-Day Considerations:
Notification timing significantly impacts engagement:
When notifications aren't being delivered, systematic debugging is essential. Follow this troubleshooting guide to identify and resolve issues:
Step 1: Verify Server-Side Success
First, confirm your server is successfully sending to the push service:
```python
# Diagnostic queries and health checks for push delivery

from __future__ import annotations

from datetime import datetime, timedelta

# DiagnosticReport, HealthMetrics, Issue, and DeliveryStatus are assumed
# to be defined elsewhere in your codebase.


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely; an empty window yields 0.0 instead of raising."""
    return numerator / denominator if denominator else 0.0


class PushDeliveryDiagnostics:
    """
    Comprehensive diagnostics for push notification delivery issues.
    """

    def __init__(self, db, push_client, metrics):
        self.db = db
        self.push = push_client
        self.metrics = metrics

    async def diagnose_user_delivery(self, user_id: str) -> DiagnosticReport:
        """
        Complete diagnostic for a specific user's notification delivery.
        """
        report = DiagnosticReport(user_id=user_id)

        # 1. Check token status
        tokens = await self.db.get_user_tokens(user_id)
        if not tokens:
            report.add_issue(
                severity="critical",
                category="tokens",
                message="No device tokens registered for user"
            )
            return report

        for token in tokens:
            # Check token freshness
            if token.last_seen < datetime.utcnow() - timedelta(days=30):
                report.add_issue(
                    severity="warning",
                    category="tokens",
                    message=f"Stale token (last seen: {token.last_seen})",
                    token_id=token.id
                )
            if not token.is_valid:
                report.add_issue(
                    severity="critical",
                    category="tokens",
                    message=f"Token marked invalid: {token.invalidation_reason}",
                    token_id=token.id
                )

        # 2. Check recent notification history
        recent_notifications = await self.db.get_recent_notifications(
            user_id=user_id,
            limit=20
        )
        failed = sum(
            1 for n in recent_notifications
            if n.status in (DeliveryStatus.FAILED_PERMANENT, DeliveryStatus.EXPIRED)
        )
        failure_rate = _ratio(failed, len(recent_notifications))

        if failure_rate > 0.3:
            report.add_issue(
                severity="warning",
                category="delivery",
                message=f"High failure rate: {failure_rate:.0%}",
                details=[n.to_dict() for n in recent_notifications[:5]]
            )

        # 3. Check common error patterns
        error_counts = await self.db.get_error_counts(
            user_id=user_id,
            since=datetime.utcnow() - timedelta(days=7)
        )
        if error_counts.get('UNREGISTERED', 0) > 0:
            report.add_issue(
                severity="critical",
                category="tokens",
                message="Token reported as unregistered by push service",
                recommendation="User may have uninstalled app"
            )

        # 4. Verify current token is valid with push service
        # (_validate_token_with_service is app-defined and not shown here)
        for token in tokens:
            if token.is_valid:
                validation = await self._validate_token_with_service(token)
                if not validation.valid:
                    report.add_issue(
                        severity="critical",
                        category="tokens",
                        message=f"Token validation failed: {validation.error}",
                        token_id=token.id
                    )

        return report

    async def get_delivery_health_metrics(self) -> HealthMetrics:
        """
        System-wide delivery health assessment.
        """
        window = datetime.utcnow() - timedelta(hours=1)
        metrics = await self.db.get_aggregated_metrics(since=window)

        return HealthMetrics(
            total_sent=metrics['total_sent'],
            success_rate=_ratio(metrics['success_count'], metrics['total_sent']),
            # Breakdown by platform
            ios_success_rate=_ratio(metrics['ios_success'], metrics['ios_total']),
            android_success_rate=_ratio(metrics['android_success'], metrics['android_total']),
            web_success_rate=_ratio(metrics['web_success'], metrics['web_total']),
            # Error analysis
            top_errors=metrics['error_breakdown'],
            invalid_token_rate=_ratio(metrics['invalid_tokens'], metrics['total_sent']),
            # Latency
            p50_latency_ms=metrics['p50_send_latency'],
            p99_latency_ms=metrics['p99_send_latency'],
        )

    def check_common_issues(self, metrics: HealthMetrics) -> list[Issue]:
        """Automated detection of common delivery problems."""
        issues = []

        # High invalid token rate
        if metrics.invalid_token_rate > 0.05:
            issues.append(Issue(
                severity="warning",
                title="High invalid token rate",
                message=f"{metrics.invalid_token_rate:.1%} of sends to invalid tokens",
                recommendation="Review token cleanup job; ensure app sends fresh tokens"
            ))

        # Platform-specific problems
        if metrics.ios_success_rate < 0.8:
            issues.append(Issue(
                severity="critical",
                title="iOS delivery degradation",
                message=f"iOS success rate: {metrics.ios_success_rate:.1%}",
                recommendation="Check APNs credentials, entitlements, certificate expiry"
            ))

        # High latency
        if metrics.p99_latency_ms > 5000:
            issues.append(Issue(
                severity="warning",
                title="High send latency",
                message=f"P99 latency: {metrics.p99_latency_ms}ms",
                recommendation="Check connection pooling, rate limiting, or service issues"
            ))

        return issues
```
Step 2: Verify Device-Side Configuration
If server-side shows success, the issue is between the push service and device:
iOS Checklist:
Android Checklist:
Web Checklist:
Build a diagnostic endpoint that sends a test notification to a specific device token and returns detailed success/failure information. Include this in your admin tools for customer support and debugging.
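The core of such a diagnostic endpoint could look like the sketch below. It is framework-agnostic and makes several assumptions: `run_test_send` and `DiagnosticSendResult` are hypothetical names, and `send` stands in for your own push client, assumed to return a dict with `success` and `error_code` keys as in the tracking code earlier.

```python
import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class DiagnosticSendResult:
    token_suffix: str          # Only the tail of the token, to keep logs safe
    accepted: bool             # True if the push service accepted the message
    error_code: Optional[str]  # Platform error code or exception name
    latency_ms: float


def run_test_send(token: str, send: Callable[[str, dict], dict]) -> dict:
    """Send one test notification and return a support-friendly summary."""
    payload = {"title": "Test notification", "data": {"kind": "diagnostic"}}
    start = time.perf_counter()
    try:
        response = send(token, payload)
        accepted = bool(response.get("success"))
        error_code = response.get("error_code")
    except Exception as exc:  # Network errors, auth failures, etc.
        accepted, error_code = False, type(exc).__name__
    latency_ms = round((time.perf_counter() - start) * 1000, 1)
    return asdict(DiagnosticSendResult(token[-6:], accepted, error_code, latency_ms))
```

An admin route would call `run_test_send` with the real push client and return the dict as JSON. Remember that `accepted` only means the push service took the message, not that the device displayed it.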
Effective push notification systems require comprehensive observability. You need to know what's happening at every stage, catch problems early, and have the data to investigate issues.
Essential Metrics:
Alerting Strategy:
Set up alerts for critical delivery issues:
| Metric | Warning | Critical | Alert Action |
|---|---|---|---|
| Send Success Rate | < 95% | < 80% | Page on-call engineer |
| Invalid Token Rate | > 10% | > 25% | Investigate token hygiene |
| Queue Depth | > 10,000 | > 100,000 | Scale workers or investigate bottleneck |
| P99 Latency | > 2s | > 10s | Check connection pooling, rate limits |
| Authentication Errors | Any (> 0) | Sustained > 1 min | Check credentials rotation |
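The thresholds in the table can be encoded as data and evaluated in one place. This is a minimal sketch: the metric keys and the `classify` function are hypothetical names, and the authentication-error row is left out because its critical level depends on duration rather than a single value.

```python
# metric name -> (direction of "bad", warning threshold, critical threshold)
THRESHOLDS = {
    "send_success_rate": ("below", 0.95, 0.80),
    "invalid_token_rate": ("above", 0.10, 0.25),
    "queue_depth": ("above", 10_000, 100_000),
    "p99_latency_s": ("above", 2.0, 10.0),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning', or 'critical' for one metric reading."""
    direction, warning, critical = THRESHOLDS[metric]

    def breached(limit: float) -> bool:
        return value < limit if direction == "below" else value > limit

    # Check the more severe threshold first.
    if breached(critical):
        return "critical"
    if breached(warning):
        return "warning"
    return "ok"


readings = {"send_success_rate": 0.90, "queue_depth": 4_200, "p99_latency_s": 12.0}
alerts = {name: classify(name, value) for name, value in readings.items()}
```

In practice these rules usually live in the alerting system itself (for example as Prometheus alert rules); keeping a copy in code is mainly useful for dashboards and tests.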
```python
# Comprehensive metrics collection for push notifications

from __future__ import annotations

import time

from prometheus_client import Counter, Gauge, Histogram

# SendResult and InvalidTokenError are assumed to come from your push client.

# Define metrics
notifications_sent = Counter(
    'push_notifications_sent_total',
    'Total notifications sent',
    ['platform', 'priority', 'status']
)

notification_latency = Histogram(
    'push_notification_send_latency_seconds',
    'Time to send notification to push service',
    ['platform'],
    buckets=[.01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]
)

queue_depth = Gauge(
    'push_notification_queue_depth',
    'Number of notifications waiting to send',
    ['priority']
)

token_operations = Counter(
    'push_token_operations_total',
    'Token registry operations',
    ['operation', 'result']  # operation: register, invalidate; result: success, failure
)

delivery_funnel = Counter(
    'push_notification_funnel_total',
    'Notification funnel stages',
    ['stage', 'platform']  # stage: sent, delivered, displayed, clicked, dismissed
)


class MetricsMiddleware:
    """
    Wraps push sending with metrics collection.
    """

    def __init__(self, push_service, metrics_backend):
        self.push = push_service
        self.metrics = metrics_backend

    async def send(
        self,
        token: str,
        platform: str,
        notification: dict,
        priority: str = 'normal'
    ) -> SendResult:
        """Send with full metrics instrumentation."""
        start_time = time.time()

        try:
            result = await self.push.send(token, notification)

            # Record success
            notifications_sent.labels(
                platform=platform, priority=priority, status='success'
            ).inc()
            delivery_funnel.labels(stage='sent', platform=platform).inc()

        except InvalidTokenError:
            notifications_sent.labels(
                platform=platform, priority=priority, status='invalid_token'
            ).inc()
            token_operations.labels(
                operation='invalidate',
                result='success'  # The invalidation itself succeeded
            ).inc()
            raise

        except Exception:
            notifications_sent.labels(
                platform=platform, priority=priority, status='error'
            ).inc()
            raise

        finally:
            # Always record latency, on success and on failure
            duration = time.time() - start_time
            notification_latency.labels(platform=platform).observe(duration)

        return result

    def record_delivery_event(
        self,
        platform: str,
        event: str  # 'delivered', 'displayed', 'clicked', 'dismissed'
    ):
        """Record downstream delivery events."""
        delivery_funnel.labels(stage=event, platform=platform).inc()


# Dashboard queries (Prometheus/Grafana)
"""
# Send success rate (last 5 min)
sum(rate(push_notifications_sent_total{status="success"}[5m]))
/ sum(rate(push_notifications_sent_total[5m])) * 100

# Click-through rate
sum(rate(push_notification_funnel_total{stage="clicked"}[1h]))
/ sum(rate(push_notification_funnel_total{stage="delivered"}[1h])) * 100

# Invalid token rate
sum(rate(push_notifications_sent_total{status="invalid_token"}[5m]))
/ sum(rate(push_notifications_sent_total[5m])) * 100

# P99 latency
histogram_quantile(0.99,
  sum(rate(push_notification_send_latency_seconds_bucket[5m])) by (le, platform))
"""
```
With observability in place, you can actively optimize delivery. Here are proven strategies used by high-scale notification systems:
Strategy 1: Intelligent Retry with Backoff
Not all failures are permanent. Transient network issues, temporary rate limits, and overloaded services warrant retry:
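A common way to schedule those retries is exponential backoff with jitter, so that many failed sends do not all retry at the same instant. The sketch below uses the same 1-minute base and 30-minute cap as the tracker's backoff; the function name `retry_delay`, the "equal jitter" variant, and the `retry_after` parameter (for a server-provided Retry-After value) are assumptions for illustration.

```python
import random
from typing import Callable, Optional


def retry_delay(
    attempt: int,
    base: float = 60.0,
    cap: float = 1800.0,
    retry_after: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # If the push service told us when to come back, honor that.
    if retry_after is not None:
        return max(retry_after, 0.0)

    # Exponential growth, capped: 60s, 120s, 240s, ... up to 1800s.
    ceiling = min(base * (2 ** attempt), cap)

    # Equal jitter: keep half the delay fixed, randomize the other half.
    return ceiling / 2 + rng() * (ceiling / 2)


delays = [retry_delay(attempt) for attempt in range(5)]
```

Only transient failures should go through this path; permanent errors such as an unregistered token should never be retried.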
Strategy 2: Proactive Token Maintenance
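One possible shape for a periodic token maintenance job is sketched below. It assumes token records are dicts with `id`, `is_valid`, and `last_seen` fields, mirroring the attributes used in the diagnostics code; the function name and the 30-day staleness window are illustrative choices, not platform rules.

```python
from datetime import datetime, timedelta


def plan_token_maintenance(
    tokens: list,
    now: datetime,
    stale_after: timedelta = timedelta(days=30),
) -> dict:
    """Split tokens into those to remove and those to ask the app to refresh."""
    plan = {"remove": [], "refresh": []}
    for token in tokens:
        if not token["is_valid"]:
            # The push service already rejected this token; stop targeting it.
            plan["remove"].append(token["id"])
        elif now - token["last_seen"] > stale_after:
            # Still nominally valid but not seen recently; have the app
            # re-register on next launch rather than deleting outright.
            plan["refresh"].append(token["id"])
    return plan


now = datetime(2024, 6, 1)
plan = plan_token_maintenance(
    [
        {"id": "t1", "is_valid": True, "last_seen": datetime(2024, 5, 30)},
        {"id": "t2", "is_valid": True, "last_seen": datetime(2024, 3, 1)},
        {"id": "t3", "is_valid": False, "last_seen": datetime(2024, 5, 31)},
    ],
    now,
)
```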
Strategy 3: Segment and Prioritize
Not all notifications are equally important:
Focus optimization efforts on your most impactful notifications. If 80% of business value comes from 20% of notification types (e.g., order updates, driver assignments), prioritize reliable delivery for those. Marketing notifications can tolerate higher loss rates.
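One way to make that prioritization explicit is a small policy table keyed by notification type. Everything in this sketch is an assumption for illustration: the tier names, the type-to-tier mapping, and the specific priority, TTL, and retry numbers should be replaced with values that fit your product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryPolicy:
    priority: str      # 'high' or 'normal'
    ttl_seconds: int   # How long the push service may hold the message
    max_attempts: int  # How many send attempts before giving up


POLICIES = {
    "transactional": DeliveryPolicy(priority="high", ttl_seconds=300, max_attempts=5),
    "marketing": DeliveryPolicy(priority="normal", ttl_seconds=86_400, max_attempts=1),
}

TIER_BY_TYPE = {
    "order_update": "transactional",
    "driver_assigned": "transactional",
    "weekly_promo": "marketing",
}


def policy_for(notification_type: str) -> DeliveryPolicy:
    """Look up the delivery policy; unknown types get the low-cost tier."""
    tier = TIER_BY_TYPE.get(notification_type, "marketing")
    return POLICIES[tier]


policy = policy_for("driver_assigned")
```

Defaulting unknown types to the marketing tier keeps new notification types from silently consuming the high-priority budget.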
Notification delivery is where push notification theory meets reality. Here are the essential takeaways:
What's Next:
With delivery mechanics understood, we'll complete our push notification coverage by examining batching and throttling strategies—essential techniques for sending notifications at scale without overwhelming users or hitting platform rate limits.
You now understand the complexities of notification delivery—from platform guarantees and device states to troubleshooting strategies and observability patterns. This knowledge enables you to build push notification systems that deliver reliably at scale.