Problem Description
Design a distributed message streaming platform like Apache Kafka. Support high-throughput message ingestion, durable storage, consumer groups, and exactly-once semantics.
Use Cases
1. A producer publishes a message so that it is stored durably.
2. A consumer reads messages so that it receives them in order (within each partition).
3. A consumer group scales out its consumers so that partitions are distributed among them.
4. A stream processor transforms data and writes the results to a new topic.
Functional Requirements
- Publish messages to topics
- Topic partitioning for parallelism
- Consumer groups with rebalancing
- Message retention (time or size based)
- Seek to offset/timestamp
- Replication for durability
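To make "topic partitioning for parallelism" concrete, here is a minimal sketch of how a producer might map a message key to a partition. The function name `choose_partition` and its parameters are hypothetical, and CRC32 is only a stand-in hash (Kafka's default partitioner uses murmur2 for keyed messages).

```python
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int,
                     round_robin_counter: int = 0) -> int:
    """Pick a partition index for a message (illustrative only)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        # No key: spread messages evenly; ordering across them is not needed.
        return round_robin_counter % num_partitions
    # Same key -> same partition, which is what preserves per-key ordering.
    # CRC32 is an assumption made for this sketch, not Kafka's actual hash.
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    for k in (b"user-1", b"user-2", b"user-1"):
        print(k, "->", choose_partition(k, num_partitions=8))
```

Because the mapping depends on `num_partitions`, adding partitions to a topic later changes where existing keys land, which is worth mentioning when discussing ordering guarantees.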
Non-Functional Requirements
- Throughput: 1M messages/sec
- Latency: < 10ms for publish
- Durability: No loss of acknowledged messages
- Ordering: Per-partition guarantee
Constraints & Assumptions
- Ordering only within partition
- Must handle broker failures
- Consumer rebalancing can cause delays
Capacity Estimation
Users
1,000 producers, 10,000 consumers
Storage
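About 605 TB of uncompressed log data for 7-day retention (1M messages/sec at 1KB each), or about 1.8 PB at replication factor 3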
QPS
Writes: 1M/sec, Reads: 5M/sec
Assumptions
- 1M messages/sec ingestion
- Average message: 1KB
- 7-day retention
- 10,000 partitions across 100 topics
- Replication factor: 3
- 5:1 read-to-write ratio (consumers replay)
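The assumptions above can be turned into rough numbers with a few lines of arithmetic. This is a back-of-the-envelope sketch only: it treats 1KB as 1,000 bytes and ignores compression, indexes, and per-message overhead, all of which would shift the totals.

```python
# Inputs taken from the assumptions list above.
MESSAGES_PER_SEC = 1_000_000
MESSAGE_BYTES = 1_000          # assumption: 1KB treated as 1,000 bytes
RETENTION_DAYS = 7
REPLICATION_FACTOR = 3
READ_TO_WRITE_RATIO = 5
PARTITIONS = 10_000

SECONDS_PER_DAY = 86_400

ingress_bytes_per_sec = MESSAGES_PER_SEC * MESSAGE_BYTES              # 1 GB/s
egress_bytes_per_sec = ingress_bytes_per_sec * READ_TO_WRITE_RATIO    # 5 GB/s
retained_bytes = ingress_bytes_per_sec * SECONDS_PER_DAY * RETENTION_DAYS
replicated_bytes = retained_bytes * REPLICATION_FACTOR
messages_per_partition_per_sec = MESSAGES_PER_SEC // PARTITIONS

print(f"ingress:            {ingress_bytes_per_sec / 1e9:.1f} GB/s")
print(f"egress:             {egress_bytes_per_sec / 1e9:.1f} GB/s")
print(f"retained (1 copy):  {retained_bytes / 1e12:.1f} TB")
print(f"retained (RF=3):    {replicated_bytes / 1e15:.2f} PB")
print(f"per-partition rate: {messages_per_partition_per_sec} msgs/s")
```

The per-partition rate (100 messages/sec on average) is low, so in this setup the partition count is driven more by consumer parallelism than by single-partition throughput limits.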
Key Concepts
CRITICAL
Log-Structured Storage
Append-only log segments for sequential I/O.
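As an illustration of the append-only idea, the following in-memory sketch models one partition as a list of segments. The class names `Segment` and `PartitionLog`, the tiny segment size, and the retention method are all assumptions made for the example; a real broker writes segments to disk files and keeps index files alongside them.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    base_offset: int                       # offset of the first record in this segment
    records: List[Tuple[int, bytes]] = field(default_factory=list)  # (timestamp_ms, payload)


class PartitionLog:
    """Toy append-only log for a single partition (illustrative only)."""

    def __init__(self, max_records_per_segment: int = 3):
        self.max_records_per_segment = max_records_per_segment
        self.segments = [Segment(base_offset=0)]
        self.next_offset = 0

    def append(self, timestamp_ms: int, payload: bytes) -> int:
        active = self.segments[-1]
        if len(active.records) >= self.max_records_per_segment:
            # Roll a new segment; older segments are never modified again.
            active = Segment(base_offset=self.next_offset)
            self.segments.append(active)
        active.records.append((timestamp_ms, payload))
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset: int) -> Tuple[int, bytes]:
        if not (self.segments[0].base_offset <= offset < self.next_offset):
            raise IndexError(f"offset {offset} is not in the log")
        # Binary search on base offsets finds the segment holding the offset.
        bases = [s.base_offset for s in self.segments]
        seg = self.segments[bisect.bisect_right(bases, offset) - 1]
        return seg.records[offset - seg.base_offset]

    def drop_segments_older_than(self, cutoff_ms: int) -> None:
        # Time-based retention deletes whole segments, never the active one.
        while len(self.segments) > 1 and self.segments[0].records[-1][0] < cutoff_ms:
            self.segments.pop(0)
```

Deleting whole segments keeps retention cheap: there is no per-message bookkeeping, only a check of each old segment's newest timestamp.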
CRITICAL
ISR (In-Sync Replicas)
Replicas that are caught up with the partition leader.
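One way to picture the ISR is as the set of replicas whose log end offset is close enough to the leader's, with the high watermark (the highest offset consumers may read) being the minimum log end offset across that set. The sketch below assumes an offset-based lag threshold purely for simplicity; Kafka itself decides ISR membership by how long a follower has been behind (`replica.lag.time.max.ms`). The function names are hypothetical.

```python
from typing import Dict, Set


def in_sync_replicas(leader_id: str, log_end_offsets: Dict[str, int],
                     max_lag: int) -> Set[str]:
    """Replicas within max_lag records of the leader (simplified criterion)."""
    leader_leo = log_end_offsets[leader_id]
    return {replica for replica, leo in log_end_offsets.items()
            if leader_leo - leo <= max_lag}


def high_watermark(log_end_offsets: Dict[str, int], isr: Set[str]) -> int:
    """Records below this offset exist on every in-sync replica."""
    return min(log_end_offsets[replica] for replica in isr)


if __name__ == "__main__":
    leos = {"broker-1": 100, "broker-2": 98, "broker-3": 40}
    isr = in_sync_replicas("broker-1", leos, max_lag=5)
    print(sorted(isr), high_watermark(leos, isr))
```

In this example `broker-3` has fallen out of the ISR, so it no longer holds back the high watermark, but the partition is also one failure closer to losing durability.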
HIGH
Consumer Offset
Tracks each consumer group's read position per partition.
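A small sketch of offset tracking, assuming an in-memory store keyed by (group, topic, partition). The names `OffsetStore` and `consume_batch` are hypothetical; Kafka persists committed offsets in the internal `__consumer_offsets` topic. By convention the committed value is the next offset to read, and committing only after processing gives at-least-once delivery.

```python
from typing import Callable, Dict, Sequence, Tuple


class OffsetStore:
    """Toy committed-offset store (illustrative only)."""

    def __init__(self) -> None:
        self._committed: Dict[Tuple[str, str, int], int] = {}

    def commit(self, group: str, topic: str, partition: int, next_offset: int) -> None:
        self._committed[(group, topic, partition)] = next_offset

    def fetch(self, group: str, topic: str, partition: int, default: int = 0) -> int:
        return self._committed.get((group, topic, partition), default)


def consume_batch(records: Sequence, store: OffsetStore, group: str, topic: str,
                  partition: int, handler: Callable, max_records: int) -> int:
    """Process up to max_records from the committed position, then commit."""
    start = store.fetch(group, topic, partition)
    batch = records[start:start + max_records]
    for record in batch:
        handler(record)
    # Commit after processing: a crash before this line means the batch is
    # re-read on restart (duplicates possible, but nothing is skipped).
    store.commit(group, topic, partition, start + len(batch))
    return len(batch)
```

Because offsets are stored per group, a second consumer group reading the same partition starts from its own position and does not interfere with the first.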
HIGH
Partition Rebalancing
Redistributes partitions among a group's consumers when group membership changes.
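Rebalancing can be sketched as recomputing an assignment whenever the member list changes. The round-robin strategy below is one simple option, shown under the assumption that all consumers subscribe to the same single topic; the function name is hypothetical, and production assignors also try to minimize partition movement.

```python
from typing import Dict, Iterable, List


def assign_round_robin(partitions: Iterable[int],
                       consumers: Iterable[str]) -> Dict[str, List[int]]:
    """Deal partitions out to consumers one at a time (illustrative only)."""
    members = sorted(consumers)            # deterministic order on every node
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    if not members:
        return assignment
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


if __name__ == "__main__":
    print(assign_round_robin(range(6), ["c1", "c2"]))
    # A third consumer joins: the assignment is recomputed from scratch.
    print(assign_round_robin(range(6), ["c1", "c2", "c3"]))
```

Note that a consumer count above the partition count leaves some consumers idle, since a partition is owned by at most one consumer in a group.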
Interview Tips
- Start with the core concepts: topics, partitions, consumer groups
- Emphasize the log-based architecture and its benefits
- Discuss exactly-once semantics and idempotent producers
- Be prepared to explain consumer group rebalancing
- Know the tradeoffs between throughput and durability
- Understand the difference between Kafka and traditional message queues
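For the idempotent-producer tip, it can help to have a concrete picture in mind. The sketch below assumes the broker remembers the last accepted sequence number per producer ID for a partition and drops retried sends it has already written. The class name `DedupingPartition` is hypothetical, and the real protocol also handles batches, producer epochs, and a window of recent sequences.

```python
from typing import Dict, List


class DedupingPartition:
    """Toy partition that rejects duplicate producer retries (illustrative only)."""

    def __init__(self) -> None:
        self.log: List[bytes] = []
        self._last_sequence: Dict[int, int] = {}   # producer_id -> last accepted seq

    def append(self, producer_id: int, sequence: int, payload: bytes) -> bool:
        """Return True if the record was written, False if it was a duplicate."""
        last = self._last_sequence.get(producer_id, -1)
        if sequence <= last:
            # Retry of a send that already succeeded: acknowledge without rewriting.
            return False
        if sequence != last + 1:
            # A gap means an earlier record was lost; refuse to reorder.
            raise ValueError(f"expected sequence {last + 1}, got {sequence}")
        self.log.append(payload)
        self._last_sequence[producer_id] = sequence
        return True


if __name__ == "__main__":
    partition = DedupingPartition()
    partition.append(7, 0, b"order-created")
    partition.append(7, 0, b"order-created")   # network retry, ignored
    partition.append(7, 1, b"order-paid")
    print(partition.log)
```

Idempotence on its own only removes duplicates within one partition; exactly-once across a read-process-write pipeline additionally relies on transactions that commit output records and consumer offsets together.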