Design an Ad Click Aggregation System

Design a real-time ad click aggregation system that ingests billions of ad click and impression events per day and aggregates them across multiple dimensions (campaign, geography, device, time window) using stream processing (Apache Flink). The system must count each click exactly once for billing accuracy, detect and filter click fraud, store results in an OLAP database for interactive analytics, and reconcile real-time aggregates against batch-recomputed counts for billing integrity.

Scale Estimates

Metric	Value
Ad impressions per day	10 billion
Ad clicks per day	500 million
Click rate (avg)	~10,000 clicks/sec (rounded up; 500M/day ≈ 5,800/sec)
Click rate (peak)	100,000 clicks/sec
Unique ads	10 million
Active campaigns	1 million
Advertisers	500,000
Aggregation dimensions	50+ (e.g., ad_id × campaign × country × device × time)
Aggregation window	1 minute (primary), 5-min, hourly, daily rollups
Click-to-dashboard latency	< 1 minute
Billing accuracy	99.99%+ (exactly-once)
Fraud rate (industry)	10–30% of clicks
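
The per-second rates follow from the daily totals by simple division. The sketch below reproduces that arithmetic so the headroom between average and peak is explicit. The 500-byte event size is an assumption introduced here for illustration and does not come from the table.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures taken from the table above.
impressions_per_day = 10_000_000_000
clicks_per_day = 500_000_000
peak_clicks_per_sec = 100_000
fraud_rate_low, fraud_rate_high = 0.10, 0.30

# Uniform averages: daily total spread evenly over the day.
avg_clicks_per_sec = clicks_per_day / SECONDS_PER_DAY
avg_impressions_per_sec = impressions_per_day / SECONDS_PER_DAY

# Click-through rate implied by the two daily totals.
click_through_rate = clicks_per_day / impressions_per_day

# How far the stated peak sits above the uniform average.
peak_to_avg_ratio = peak_clicks_per_sec / avg_clicks_per_sec

# Clicks left for billing after removing the stated fraud range.
billable_clicks_low = clicks_per_day * (1 - fraud_rate_high)
billable_clicks_high = clicks_per_day * (1 - fraud_rate_low)

# ASSUMPTION (not in the table): ~500 bytes per raw click event.
ASSUMED_BYTES_PER_EVENT = 500
raw_click_bytes_per_day = clicks_per_day * ASSUMED_BYTES_PER_EVENT

print(f"avg clicks/sec:      {avg_clicks_per_sec:,.0f}")
print(f"avg impressions/sec: {avg_impressions_per_sec:,.0f}")
print(f"implied CTR:         {click_through_rate:.1%}")
print(f"peak / avg:          {peak_to_avg_ratio:.1f}x")
print(f"raw click data/day:  {raw_click_bytes_per_day / 1e9:,.0f} GB (assumed event size)")
```

The uniform average comes out near 5,800 clicks/sec, so the 10,000 clicks/sec figure in the table is best read as a rounded planning number with headroom. The pipeline still has to be sized for the 100,000 clicks/sec peak.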

Non-Functional Requirements

Exactly-once: Each valid click is counted exactly once in billing aggregates, via click_id dedup (Redis SETNX), Flink checkpointing, and idempotent UPSERTs to the OLAP store; target < 0.01% discrepancy against batch reconciliation
Low latency: Click captured → aggregated → visible on dashboard in < 1 minute, using Flink 1-minute tumbling windows; Redis serves real-time counters (< 5 second visibility)
Late events: Events arriving up to 5 minutes late are handled by Flink watermarks + allowed lateness; later events are picked up by batch reconciliation
Fraud resilience: Real-time rule-based and ML fraud detection; quarantine pipeline for suspicious clicks; IP/device reputation database; 24-hour deep analysis before confirming fraud
Scale: 100K clicks/sec peak; Kafka handles ingestion; Flink distributes aggregation; ClickHouse serves interactive queries over TB-scale data with sub-second latency; click trackers are globally distributed
Billing accuracy: Nightly batch reconciliation (Spark over raw events in S3) produces authoritative counts, which are compared with the real-time aggregates; discrepancies are corrected, and billing invoices are generated from the reconciled data
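
The exactly-once and late-event requirements above combine three mechanisms: dedup on click_id, event-time tumbling windows with allowed lateness, and an idempotent write to the OLAP store. The following is a minimal in-memory sketch of how they fit together. It is plain Python, not the Flink, Redis, or ClickHouse API: the `Click` and `ClickAggregator` names are hypothetical, a set stands in for Redis SETNX, a dict stands in for the OLAP table, and the watermark is simplified to the highest event time seen so far.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SEC = 60             # 1-minute tumbling window
ALLOWED_LATENESS_SEC = 300  # accept events up to 5 minutes late


@dataclass(frozen=True)
class Click:
    click_id: str
    ad_id: str
    country: str
    event_time: int  # epoch seconds at which the click happened


class ClickAggregator:
    """Hypothetical stand-in for the dedup store, windowed job, and OLAP sink."""

    def __init__(self):
        self.seen_ids = set()           # stand-in for Redis SETNX on click_id
        self.counts = defaultdict(int)  # (window_start, ad_id, country) -> count
        self.olap = {}                  # stand-in for the OLAP table
        self.too_late = []              # left for nightly batch reconciliation
        self.watermark = 0              # simplified: max event_time seen so far

    def process(self, click: Click) -> str:
        # Dedup: only the first sighting of a click_id is counted.
        if click.click_id in self.seen_ids:
            return "duplicate"
        self.seen_ids.add(click.click_id)

        self.watermark = max(self.watermark, click.event_time)

        # Assign the click to its tumbling window by event time.
        window_start = click.event_time - click.event_time % WINDOW_SEC
        window_end = window_start + WINDOW_SEC

        # Past window end + allowed lateness: defer to batch reconciliation.
        if window_end + ALLOWED_LATENESS_SEC <= self.watermark:
            self.too_late.append(click)
            return "too_late"

        key = (window_start, click.ad_id, click.country)
        self.counts[key] += 1
        # Idempotent upsert: write the absolute count for the key, so
        # replaying the same write after a failure cannot double count.
        self.olap[key] = self.counts[key]
        return "counted"


agg = ClickAggregator()
for c in [
    Click("c1", "ad1", "US", 1000),
    Click("c1", "ad1", "US", 1000),  # retry of the same click
    Click("c2", "ad1", "US", 1500),  # advances the watermark
    Click("c3", "ad1", "US", 1010),  # more than 5 minutes behind
    Click("c4", "ad1", "US", 1250),  # late, but within allowed lateness
]:
    print(c.click_id, agg.process(c))
```

The sketch upserts on every event rather than firing once per window, which keeps it short. The property that matters for billing is that the sink receives absolute counts keyed by window and dimensions, so a replay overwrites instead of incrementing.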
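
The billing-accuracy requirement above treats the nightly batch counts as authoritative and compares the real-time aggregates against them. A minimal sketch of that comparison follows, assuming both sides have already been reduced to dictionaries keyed by (day, campaign_id). The `reconcile` function, its key shape, and its return fields are hypothetical. The 0.01% tolerance mirrors the discrepancy target stated in the exactly-once requirement.

```python
def reconcile(realtime: dict, batch: dict, tolerance: float = 0.0001) -> dict:
    """Compare real-time counts with batch counts, key by key.

    Batch counts are treated as authoritative. The discrepancy rate is the
    sum of absolute per-key differences divided by the batch total.
    """
    discrepancies = {}
    for key in realtime.keys() | batch.keys():
        rt_count = realtime.get(key, 0)
        batch_count = batch.get(key, 0)
        if rt_count != batch_count:
            discrepancies[key] = {"realtime": rt_count, "batch": batch_count}

    total_batch = sum(batch.values())
    abs_diff = sum(abs(d["realtime"] - d["batch"]) for d in discrepancies.values())
    rate = abs_diff / total_batch if total_batch else 0.0

    return {
        "corrected": dict(batch),        # what billing should use
        "discrepancies": discrepancies,  # keys to investigate or overwrite
        "discrepancy_rate": rate,
        "within_tolerance": rate < tolerance,
    }


# Example keys are (day, campaign_id); values are click counts.
realtime_counts = {("2024-01-01", "camp_1"): 10_000, ("2024-01-01", "camp_2"): 5_000}
batch_counts = {("2024-01-01", "camp_1"): 10_001, ("2024-01-01", "camp_2"): 5_000}

result = reconcile(realtime_counts, batch_counts)
print(result["discrepancies"])
print(f"discrepancy rate: {result['discrepancy_rate']:.6%}")
print("within tolerance:", result["within_tolerance"])
```

In this example the real-time path missed one click for camp_1, for instance an event that arrived after the allowed lateness. The batch count replaces it, and the overall rate stays under the 0.01% target.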

Approach Guide

Non-Functional Requirements (~3 min)
Core Entities (~2 min)
API Design (~3 min)
High-Level Design (~5 min)

Follow-up Deep Dives (questions an interviewer might ask)

1. How would you design the real-time click aggregation pipeline?
2. How would you ensure exactly-once counting for billing accuracy?
3. How would you handle late-arriving and out-of-order events?
4. How would you detect and filter click fraud?
5. How would you design the OLAP storage for multi-dimensional ad analytics?
6. How would you implement data reconciliation (Lambda architecture)?
7. How would you design the end-to-end system architecture?
