Design a distributed stream processing system like Apache Flink or Spark Streaming that continuously processes unbounded data streams with low latency. The system supports windowed aggregations (tumbling, sliding, and session windows) over event time with watermark-based progress tracking, and maintains large distributed keyed per-event state (backed by RocksDB at TB scale). It achieves exactly-once semantics via Chandy-Lamport-style distributed snapshot checkpointing (sources capture their offsets and inject barriers → operators snapshot state as the barriers flow through), distributes computation as parallel subtasks across a cluster, and integrates end-to-end exactly-once with external systems via two-phase-commit sinks or idempotent writes.
| Metric | Value |
|---|---|
| Events processed per second | 1–10 million |
| End-to-end latency | Milliseconds (Flink) to seconds (Spark micro-batch) |
| Checkpoint interval | 30 seconds – 5 minutes |
| Checkpoint duration | 1–30 seconds |
| State size (per job) | GB to TB (RocksDB backend) |
| Parallelism (per job) | 10–10,000 subtasks |
| TaskManagers per cluster | 10–1,000 |
| Task slots per TaskManager | 4–16 |
| Jobs per cluster (multi-tenant) | 10–100 |
| Watermark lag (event time) | Seconds to minutes |
Continuous data processing: ingest unbounded streams of events (no beginning or end — events arrive continuously) from sources like Kafka, Kinesis, or Pub/Sub; process each event with low latency (milliseconds to seconds); emit results to sinks (databases, dashboards, downstream topics) in near real-time
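As a rough, framework-agnostic sketch of the core ingest → process → emit loop each parallel task runs (the `SourceReader` and `Sink` interfaces below are hypothetical stand-ins for a Kafka consumer and a downstream producer, not any real engine's API):

```java
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of one parallel pipeline instance: poll a source,
// transform each event, emit to a sink. Real engines wrap this core
// loop with batching, checkpointing, and backpressure.
public class PipelineTask {

    interface SourceReader { List<String> poll(); }   // e.g. a Kafka consumer
    interface Sink extends Consumer<String> {}        // e.g. a downstream producer

    private final SourceReader source;
    private final Sink sink;

    PipelineTask(SourceReader source, Sink sink) {
        this.source = source;
        this.sink = sink;
    }

    public void run() {
        while (true) {                        // unbounded stream: no natural end
            for (String event : source.poll()) {
                String result = process(event);
                sink.accept(result);          // emit with low per-event latency
            }
        }
    }

    private String process(String event) {
        return event.toUpperCase();           // placeholder per-event transformation
    }
}
```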
Windowed aggregation: group events into time-based windows — tumbling (fixed, non-overlapping), sliding (overlapping, specified by size + slide), session (activity-based, gap-triggered close); compute aggregations (count, sum, avg, min, max, top-K) per window; emit results when window closes
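A minimal sketch of per-event window assignment for the three window types; the class and window representation below are illustrative assumptions, not any engine's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative window assignment: given an event timestamp, which
// window(s) does it belong to? Windows are [start, end) in millis.
public class WindowAssigners {

    record Window(long start, long end) {}

    // Tumbling: exactly one fixed, non-overlapping window per event.
    static Window tumbling(long ts, long sizeMs) {
        long start = ts - (ts % sizeMs);
        return new Window(start, start + sizeMs);
    }

    // Sliding: one window per slide step that still covers the event
    // (roughly size / slide overlapping windows per event).
    static List<Window> sliding(long ts, long sizeMs, long slideMs) {
        List<Window> windows = new ArrayList<>();
        long lastStart = ts - (ts % slideMs);
        for (long start = lastStart; start > ts - sizeMs; start -= slideMs) {
            windows.add(new Window(start, start + sizeMs));
        }
        return windows;
    }

    // Session: each event opens a candidate window [ts, ts + gap);
    // overlapping candidates are merged, and the session closes once
    // no event arrives within the gap.
    static Window session(long ts, long gapMs) {
        return new Window(ts, ts + gapMs);
    }

    static Window mergeIfOverlapping(Window a, Window b) {
        if (a.end() >= b.start() && b.end() >= a.start()) {
            return new Window(Math.min(a.start(), b.start()), Math.max(a.end(), b.end()));
        }
        return null;  // disjoint: separate sessions
    }
}
```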
Event time processing: process events based on when they occurred (event time), not when they arrive (processing time); handle out-of-order events; support watermarks — a mechanism to track event-time progress and determine when a window can be closed
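A small sketch of the common bounded-out-of-orderness watermark strategy, under the assumption that the watermark trails the maximum observed event time by a fixed allowed lateness (all names here are illustrative):

```java
// Sketch of a bounded-out-of-orderness watermark: the watermark trails
// the max event time seen so far by a fixed allowed lateness, and a
// window can be closed once the watermark passes its end timestamp.
public class BoundedOutOfOrdernessWatermark {

    private final long maxOutOfOrdernessMs;
    private long maxEventTime = Long.MIN_VALUE;

    public BoundedOutOfOrdernessWatermark(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    // Called for every event; watermarks are typically emitted periodically.
    public void onEvent(long eventTimestamp) {
        maxEventTime = Math.max(maxEventTime, eventTimestamp);
    }

    // Meaning: "no event with timestamp <= watermark is expected anymore."
    public long currentWatermark() {
        return maxEventTime == Long.MIN_VALUE ? Long.MIN_VALUE
                                              : maxEventTime - maxOutOfOrdernessMs;
    }

    // An event-time window [start, end) can fire once the watermark
    // reaches or passes its end.
    public boolean canCloseWindow(long windowEnd) {
        return currentWatermark() >= windowEnd;
    }
}
```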
Exactly-once semantics: guarantee that each input event affects the output exactly once — no duplicates, no lost events; achieved end-to-end (source → processing → sink); critical for financial calculations, billing, and accurate counting
Stateful processing: operators maintain state across events — e.g., running count per user, session state, ML model state; state persisted durably (survives failures); state can grow large (terabytes across the cluster); efficient state access (in-memory with disk spill) for high-throughput processing
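A simplified sketch of keyed operator state as a per-task map with snapshot/restore hooks; a production backend would use RocksDB (hot entries in memory, the rest spilled to local disk) rather than an in-memory map, but the shape of the idea is the same:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of keyed operator state: each parallel task owns the state for
// its share of the key space (here a plain in-memory map).
public class RunningCountOperator {

    private final Map<String, Long> countPerKey = new HashMap<>();

    // Process one keyed event and emit the updated running count.
    public long onEvent(String key) {
        return countPerKey.merge(key, 1L, Long::sum);
    }

    // Snapshot hook used by checkpointing: this map is what gets
    // persisted to durable storage (e.g. S3/HDFS).
    public Map<String, Long> snapshotState() {
        return new HashMap<>(countPerKey);
    }

    // Restore hook used after a failure: reload the checkpointed map.
    public void restoreState(Map<String, Long> snapshot) {
        countPerKey.clear();
        countPerKey.putAll(snapshot);
    }
}
```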
Fault tolerance via checkpointing: periodically snapshot the entire distributed state (operator state + source offsets) to durable storage (S3/HDFS); on failure → restart from latest checkpoint → replay source from checkpointed offsets → recover exact pre-failure state; no data loss, no duplicates
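A coordinator-level sketch of the checkpoint/recovery cycle, assuming hypothetical `Source`, `Operator`, and `DurableStore` interfaces; the key invariant is that state snapshots and source offsets are persisted together, so replay after restore is consistent:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the checkpoint/recovery cycle from a coordinator's point of
// view: a checkpoint pairs the operators' state snapshots with the source
// offsets that produced that state.
public class CheckpointCoordinator {

    interface Source   { Map<Integer, Long> currentOffsets(); void seek(Map<Integer, Long> offsets); }
    interface Operator { Map<String, Long> snapshotState(); void restoreState(Map<String, Long> s); }
    interface DurableStore { void write(long checkpointId, Object payload); Object readLatest(); }

    record Checkpoint(long id, Map<Integer, Long> sourceOffsets, Map<String, Long> operatorState) {}

    private long nextId = 1;

    // Periodically: capture offsets and state together, persist durably.
    Checkpoint takeCheckpoint(Source source, Operator operator, DurableStore store) {
        Checkpoint cp = new Checkpoint(nextId++,
                new HashMap<>(source.currentOffsets()),
                operator.snapshotState());
        store.write(cp.id(), cp);   // only complete checkpoints are usable for recovery
        return cp;
    }

    // On failure: restore state, rewind the source, and reprocess.
    void recover(Source source, Operator operator, DurableStore store) {
        Checkpoint latest = (Checkpoint) store.readLatest();
        operator.restoreState(latest.operatorState());
        source.seek(latest.sourceOffsets());  // replay from the checkpointed offsets
    }
}
```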
Parallel and distributed execution: a stream processing job is a DAG (directed acyclic graph) of operators; each operator parallelised across multiple tasks running on different machines; data partitioned (keyed) across parallel tasks; the framework handles distribution, communication, and load balancing
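A sketch of stable key-to-subtask routing via key groups (similar in spirit to Flink's key-group scheme, though the exact formulas here are illustrative):

```java
// Sketch of keyed data partitioning: a key is hashed into a fixed number
// of key groups (max parallelism), and key groups are assigned to the
// currently running parallel subtasks, which keeps routing stable when
// a job is rescaled.
public class KeyPartitioner {

    private final int maxParallelism;   // number of key groups, fixed per job
    private final int parallelism;      // number of running subtasks

    public KeyPartitioner(int maxParallelism, int parallelism) {
        this.maxParallelism = maxParallelism;
        this.parallelism = parallelism;
    }

    // key → key group (stable, independent of current parallelism)
    public int keyGroupFor(Object key) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // key group → subtask index (contiguous ranges of key groups per subtask)
    public int subtaskFor(Object key) {
        int keyGroup = keyGroupFor(key);
        return keyGroup * parallelism / maxParallelism;
    }
}
```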
Stream-table joins and enrichment: join a fast-moving event stream with a slowly-changing reference table (e.g., join click events with user profile data); support temporal joins (join with table state at event time); broadcast tables to all parallel tasks for efficient lookup
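A sketch of broadcast-state enrichment, with illustrative `UserProfile`/`Click` types: the reference table is replicated to every parallel task and updated from a changelog stream, so each event is enriched with a purely local lookup:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a broadcast stream-table join: every parallel task holds a
// local copy of the slowly-changing reference table and enriches the
// fast-moving event stream without remote calls.
public class BroadcastEnrichmentOperator {

    record UserProfile(String userId, String country, String tier) {}
    record Click(String userId, String url, long eventTime) {}
    record EnrichedClick(Click click, UserProfile profile) {}

    // Broadcast state: replicated on every parallel task.
    private final Map<String, UserProfile> profileTable = new ConcurrentHashMap<>();

    // Called for each update on the broadcast (table/changelog) side.
    public void onProfileUpdate(UserProfile profile) {
        profileTable.put(profile.userId(), profile);
    }

    // Called for each event on the keyed/high-throughput side.
    public EnrichedClick onClick(Click click) {
        UserProfile profile = profileTable.get(click.userId());  // local lookup, no RPC
        return new EnrichedClick(click, profile);                // profile may be null if unknown
    }
}
```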
Backpressure handling: if a downstream operator is slower than upstream → the system must slow down upstream rather than dropping events or running out of memory; backpressure propagated from sink to source; prevents data loss during load spikes
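A minimal sketch of backpressure via bounded buffers between operators; real engines use network buffer pools and credit-based flow control, but the blocking behaviour below is the essential idea:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of backpressure via bounded buffers: each operator pair is
// connected by a fixed-capacity queue, so a slow consumer makes the
// producer's put() block instead of dropping events or growing memory.
// The blocking propagates hop by hop back to the source, which then
// stops pulling from Kafka/Kinesis.
public class BackpressuredChannel<T> {

    private final BlockingQueue<T> buffer;

    public BackpressuredChannel(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Upstream side: blocks when the buffer is full (backpressure signal).
    public void send(T event) throws InterruptedException {
        buffer.put(event);
    }

    // Downstream side: blocks when the buffer is empty.
    public T receive() throws InterruptedException {
        return buffer.take();
    }
}
```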
Exactly-once sinks: output results to external systems with exactly-once guarantees — write to Kafka (transactional producer), write to databases (idempotent upserts), write to file systems (commit on checkpoint); two-phase commit protocol for transactional sinks; at-least-once + idempotent = exactly-once
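A sketch of a checkpoint-aligned two-phase-commit sink, with a hypothetical `TransactionalClient` standing in for, e.g., a transactional Kafka producer; writes become visible to consumers only after the checkpoint that covers them completes:

```java
// Sketch of a two-phase-commit sink aligned with checkpoints: writes go
// into an open transaction, the transaction is pre-committed (flushed)
// when a checkpoint barrier arrives, and committed only after the
// checkpoint is confirmed complete. On failure, the uncommitted
// transaction is aborted and the data is replayed from the checkpoint.
public class TwoPhaseCommitSink {

    interface TransactionalClient {
        void begin();
        void write(String record);
        void flush();     // pre-commit: make the data durable but not visible
        void commit();    // make the data visible to consumers
        void abort();     // discard the in-flight transaction
    }

    private final TransactionalClient client;

    public TwoPhaseCommitSink(TransactionalClient client) {
        this.client = client;
        client.begin();
    }

    public void invoke(String record) {
        client.write(record);                 // buffered in the open transaction
    }

    // Phase 1: on checkpoint barrier, pre-commit (flush) the open transaction.
    // A real sink would also open a fresh transaction for post-barrier records;
    // this sketch keeps one transaction at a time for brevity.
    public void preCommitOnCheckpoint() {
        client.flush();
    }

    // Phase 2: once the checkpoint is globally complete, commit and start
    // the next transaction.
    public void commitOnCheckpointComplete() {
        client.commit();
        client.begin();
    }

    // On failure/restart: abort anything not yet committed; the source
    // replays from the last checkpoint, so nothing is lost or duplicated.
    public void abortOnFailure() {
        client.abort();
    }
}
```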
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'End-to-end latency under one second from event ingestion to emitted result' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?