📝 Problem Description
Design a distributed message queue like Kafka, SQS, or RabbitMQ. Support publish/subscribe, guaranteed delivery, ordering, and horizontal scaling. Handle producer backpressure and consumer groups. Key challenges include:
- **Durability**: Ensuring no message loss even during failures
- **Ordering**: Maintaining message order within partitions
- **Scalability**: Handling millions of messages per second
- **Fault Tolerance**: Surviving broker failures without data loss
- **Consumer Groups**: Load balancing messages across consumers
- **Exactly-Once Semantics**: Preventing duplicate processing
👤 Use Cases
1. A producer publishes a message to a topic so that it is stored durably and replicated.
2. A consumer polls for messages so that it receives messages from its assigned partitions.
3. A consumer acknowledges a message so that its offset is committed and the message is marked as processed (a sketch of this produce/consume/ack flow follows the list).
4. A consumer fails mid-processing, so the message is redelivered to another consumer in the group.
5. An admin creates a topic with partitions so that the partitions are distributed across brokers.
6. A consumer group member joins or leaves, so partitions are rebalanced among the members.
7. A broker fails, so the system performs leader election and fails over to a replica.
8. A producer sends a batch of messages so that the batch is persisted atomically.
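A minimal sketch of the produce/consume/acknowledge flow above, with a single in-memory broker standing in for the cluster. The `MiniBroker` class and its `produce`/`poll`/`commit` methods are illustrative names, not a real client API:

```python
from collections import defaultdict

class MiniBroker:
    """Illustrative single-node broker: topics -> partitions -> append-only logs."""

    def __init__(self, partitions_per_topic=3):
        self.partitions_per_topic = partitions_per_topic
        self.logs = defaultdict(list)       # (topic, partition) -> list of messages
        self.committed = defaultdict(int)   # (group, topic, partition) -> next offset to read

    def produce(self, topic, key, value):
        # Keyed messages always map to the same partition, preserving per-key order.
        partition = hash(key) % self.partitions_per_topic
        log = self.logs[(topic, partition)]
        log.append((key, value))
        return partition, len(log) - 1      # (partition, offset) acknowledgment

    def poll(self, group, topic, partition, max_records=10):
        # Return records starting at the group's committed offset for this partition.
        start = self.committed[(group, topic, partition)]
        return self.logs[(topic, partition)][start:start + max_records]

    def commit(self, group, topic, partition, offset):
        # Acknowledge processing up to (and including) `offset`.
        self.committed[(group, topic, partition)] = offset + 1


broker = MiniBroker()
p, off = broker.produce("orders", key="user-42", value="order-created")
records = broker.poll("billing", "orders", p)
broker.commit("billing", "orders", p, off)   # marks the message as processed
```

Real systems use a stable hash (e.g. murmur2) instead of Python's process-local `hash()`, and persist both the log and the committed offsets.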
✅ Functional Requirements
- Publish messages to topics with optional key for partitioning
- Subscribe and consume messages from topics
- At-least-once delivery guarantee (configurable to exactly-once)
- Message ordering within partition (strict FIFO)
- Consumer groups for parallel consumption and load distribution
- Dead letter queue for messages that fail processing (see the sketch after this list)
- Message retention (time-based or size-based)
- Message replay from any offset
- Batch produce and consume for efficiency
- Message compression (gzip, snappy, lz4, zstd)
- Transaction support for atomic multi-partition writes
- Schema registry integration for message validation
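One way to implement the dead-letter-queue requirement on the consumer side. The topic name, retry limit, and the `consumer`/`producer` interfaces here are assumptions for illustration only:

```python
MAX_ATTEMPTS = 3              # assumed retry budget before dead-lettering
DLQ_TOPIC = "orders.dlq"      # assumed dead-letter topic name

def handle_with_dlq(consumer, producer, process):
    """Poll, retry failed messages a few times, then route them to the DLQ."""
    for topic, partition, offset, key, value in consumer.poll():
        attempts = 0
        while True:
            try:
                process(key, value)
                break                                   # success: fall through to commit
            except Exception as err:
                attempts += 1
                if attempts >= MAX_ATTEMPTS:
                    # Preserve the original coordinates so the failure can be replayed later.
                    producer.produce(DLQ_TOPIC, key, {
                        "value": value,
                        "source": (topic, partition, offset),
                        "error": str(err),
                    })
                    break
        # Commit either way; the poison message now lives in the DLQ, not the main topic.
        consumer.commit(topic, partition, offset)
```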
⚡ Non-Functional Requirements
- Throughput: 1M messages/sec per cluster
- Latency: < 10ms p99 for produce, < 5ms p99 for consume
- Durability: Zero message loss (RPO = 0)
- Availability: 99.99% uptime (52 min downtime/year)
- Horizontal scaling: Add brokers without downtime
- Storage efficiency: Compression + log compaction
- Recovery time: < 30 seconds for leader failover
⚠️ Constraints & Assumptions
- Messages must be persisted to disk and replicated before acknowledgment
- Network partitions must be handled (prefer consistency over availability)
- Cannot rely on a single node; minimum 3 brokers for production
- Consumer offset commits must be durable
- Partition count cannot be decreased after creation
- Message ordering only guaranteed within a single partition
📊 Capacity Estimation
- 👥 Users: 1000 producer/consumer services
- 💾 Storage: 100TB (7-day retention)
- ⚡ QPS: Writes: 1M/sec, Reads: 2M/sec
📐 Assumptions
- 1M messages per second peak throughput (a back-of-envelope check follows this list)
- Average message size: 1KB (range: 100B - 1MB)
- 7-day default retention period
- Replication factor: 3 (one leader, two followers)
- 100 topics with average 10 partitions each = 1000 partitions
- 3:1 read-to-write ratio (consumers often replay)
- Consumer groups: 500 active groups
- Average consumer lag: < 1000 messages
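A quick back-of-envelope script over the listed assumptions (pure arithmetic; the interpretation in the last print is an inference worth raising in an interview, not a stated fact):

```python
MSG_SIZE_B = 1_000            # 1 KB average message
PEAK_MSGS_PER_SEC = 1_000_000
REPLICATION = 3
RETENTION_DAYS = 7
STATED_STORAGE_TB = 100

peak_ingest_mb_s = PEAK_MSGS_PER_SEC * MSG_SIZE_B / 1e6
print(f"peak ingest: {peak_ingest_mb_s:.0f} MB/s per replica, "
      f"{peak_ingest_mb_s * REPLICATION:.0f} MB/s including replication")

raw_week_tb = PEAK_MSGS_PER_SEC * MSG_SIZE_B * 86_400 * RETENTION_DAYS * REPLICATION / 1e12
print(f"7 days at peak, x{REPLICATION} replication: {raw_week_tb:,.0f} TB raw")

# Working backwards: what sustained rate does the 100 TB budget actually support?
sustained = STATED_STORAGE_TB * 1e12 / (REPLICATION * RETENTION_DAYS * 86_400 * MSG_SIZE_B)
print(f"100 TB over 7 days at x3 replication supports ~{sustained:,.0f} msg/s sustained "
      "(before compression), so the 1M/s figure is a burst peak or data is heavily compressed")
```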
💡 Key Concepts
**Partitioning** (CRITICAL): Topic divided into partitions for parallelism. Each partition is an ordered, immutable sequence of messages. Partition count determines max parallelism.
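A minimal partitioner sketch. Real brokers use a stable hash such as murmur2; CRC32 stands in here so results are reproducible across processes, and the class name is illustrative:

```python
import itertools
import zlib

class Partitioner:
    """Route keyed messages to a stable partition; spread keyless messages round-robin."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.count()

    def partition_for(self, key):
        if key is None:
            return next(self._round_robin) % self.num_partitions
        # Stable hash: the same key always lands on the same partition,
        # which is what gives per-key ordering.
        return zlib.crc32(key.encode()) % self.num_partitions


p = Partitioner(num_partitions=10)
assert p.partition_for("user-42") == p.partition_for("user-42")
```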
**Replication** (CRITICAL): Each partition replicated to N brokers (replication factor). One leader handles all reads/writes; followers replicate passively. Provides fault tolerance.
**Consumer Groups** (HIGH): Partitions distributed among group members. Each partition assigned to exactly one consumer in a group. Multiple groups can consume the same topic independently.
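A sketch of range-style partition assignment within one group; this is one of several strategies (round-robin and sticky variants also exist), and the function name is illustrative:

```python
def range_assign(partitions, members):
    """Split a topic's partitions contiguously across group members.

    Each partition goes to exactly one member; extra partitions go to the
    first few members when the counts don't divide evenly.
    """
    members = sorted(members)
    per_member, extra = divmod(len(partitions), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment

# 6 partitions, 4 consumers: two consumers get 2 partitions, two get 1.
print(range_assign(list(range(6)), ["c1", "c2", "c3", "c4"]))
# {'c1': [0, 1], 'c2': [2, 3], 'c3': [4], 'c4': [5]}
```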
**Offset Tracking** (HIGH): Consumer position tracked per partition as an offset (64-bit integer). Stored in the __consumer_offsets topic. Enables replay from any point.
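Committed offsets boil down to a map from (group, topic, partition) to an integer. A minimal sketch (class and method names are illustrative), with `seek` standing in for replay-from-any-offset:

```python
class OffsetStore:
    """Offset bookkeeping keyed by (group, topic, partition).

    In Kafka this lives in the internal __consumer_offsets topic; here a dict
    stands in for that compacted log.
    """

    def __init__(self):
        self._offsets = {}

    def commit(self, group, topic, partition, offset):
        self._offsets[(group, topic, partition)] = offset

    def position(self, group, topic, partition, default=0):
        # Where the group should resume reading after a restart or rebalance.
        return self._offsets.get((group, topic, partition), default)

    def seek(self, group, topic, partition, offset):
        # Replay support: rewind (or fast-forward) the group to any offset.
        self._offsets[(group, topic, partition)] = offset
```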
**In-Sync Replicas (ISR)** (HIGH): Set of replicas caught up with the leader (within replica.lag.time.max.ms). Writes acked only after all ISR have the message. Guarantees durability.
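A sketch of how a leader might track ISR membership from follower fetch progress, under an assumed lag threshold analogous to replica.lag.time.max.ms (class and method names are illustrative):

```python
import time

REPLICA_LAG_TIME_MAX_S = 30.0   # assumed threshold, analogous to replica.lag.time.max.ms

class IsrTracker:
    """Leader-side view of which followers are in sync."""

    def __init__(self, replica_ids):
        now = time.monotonic()
        # Last time each replica was fully caught up to the leader's log end offset.
        self.caught_up_at = {r: now for r in replica_ids}

    def record_fetch(self, replica_id, replica_leo, leader_leo):
        if replica_leo >= leader_leo:
            self.caught_up_at[replica_id] = time.monotonic()

    def isr(self):
        now = time.monotonic()
        return {r for r, t in self.caught_up_at.items()
                if now - t <= REPLICA_LAG_TIME_MAX_S}
```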
**High Watermark** (HIGH): Offset of the last message replicated to all ISR. Consumers only see messages up to the HW. Prevents reading uncommitted data.
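The high watermark falls out of the replicas' log end offsets: a record at offset o is committed, consumer-visible, and safe to ack for acks=all once o < min(LEO across the ISR). A sketch with illustrative function names:

```python
def high_watermark(leader_leo, follower_leos, isr):
    """HW = minimum log end offset across the leader and in-sync followers."""
    leos = [leader_leo] + [leo for replica, leo in follower_leos.items() if replica in isr]
    return min(leos)

def visible_records(log, hw):
    # Consumers only ever see offsets below the high watermark.
    return log[:hw]

def is_committed(offset, hw):
    # acks=all: the produce request for `offset` can be acknowledged once this is true.
    return offset < hw


hw = high_watermark(leader_leo=105, follower_leos={"b2": 103, "b3": 101}, isr={"b2", "b3"})
assert hw == 101 and is_committed(100, hw) and not is_committed(101, hw)
```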
**Log Compaction** (MEDIUM): Retention mode that keeps only the latest value per key. Useful for changelog topics. Enables infinite retention with bounded storage.
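A toy compaction pass: keep only the newest record per key, with None-valued records acting as tombstones. This mirrors the idea, not the actual segment-level cleaner:

```python
def compact(log):
    """Keep the newest value per key; a None value is a tombstone that deletes the key."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        if value is None:
            latest.pop(key, None)       # tombstone removes the key entirely
        else:
            latest[key] = (offset, value)
    # Rebuild a smaller log, preserving the original offsets of the survivors.
    return sorted((off, key, value) for key, (off, value) in latest.items())


log = [("user-1", "a"), ("user-2", "b"), ("user-1", "c"), ("user-2", None)]
print(compact(log))   # [(2, 'user-1', 'c')]
```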
**Idempotent Producer** (MEDIUM): Producer assigns sequence numbers to messages. Broker deduplicates based on producer ID + sequence. Prevents duplicates on retry.
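Broker-side deduplication for the idempotent producer, keyed by (producer id, partition) plus a monotonically increasing sequence number. A sketch with illustrative names and return codes:

```python
class DedupLog:
    """Accept each (producer_id, sequence) at most once per partition."""

    def __init__(self):
        self.last_seq = {}               # (producer_id, partition) -> highest sequence seen
        self.records = []

    def append(self, producer_id, partition, sequence, value):
        key = (producer_id, partition)
        last = self.last_seq.get(key, -1)
        if sequence <= last:
            return "duplicate"           # retry of something already persisted: ack, don't re-append
        if sequence != last + 1:
            return "out_of_sequence"     # a gap means an earlier message was lost; reject
        self.records.append(value)
        self.last_seq[key] = sequence
        return "ok"


log = DedupLog()
assert log.append("p1", 0, 0, "a") == "ok"
assert log.append("p1", 0, 0, "a") == "duplicate"   # network retry is absorbed
assert log.append("p1", 0, 1, "b") == "ok"
```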
**Transactions** (MEDIUM): Atomic writes across multiple partitions. All-or-nothing semantics. Enables exactly-once stream processing.
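A heavily simplified sketch of atomic multi-partition writes: records are staged locally, then appended and followed by a commit marker on every touched partition. Real implementations add a transaction coordinator, markers in the log, and read_committed consumer filtering; the `broker.append` and `broker.write_commit_marker` calls are assumed interfaces:

```python
class Transaction:
    """Stage writes across partitions, then commit or abort them as a unit."""

    def __init__(self, broker):
        self.broker = broker
        self.staged = []                 # [(topic, partition, key, value), ...]

    def send(self, topic, partition, key, value):
        self.staged.append((topic, partition, key, value))

    def commit(self):
        # Append all staged records, then mark every touched partition committed
        # so read_committed consumers know the whole batch is visible.
        for topic, partition, key, value in self.staged:
            self.broker.append(topic, partition, key, value)
        for topic, partition in {(t, p) for t, p, _, _ in self.staged}:
            self.broker.write_commit_marker(topic, partition)
        self.staged.clear()

    def abort(self):
        self.staged.clear()              # nothing was exposed, so dropping the buffer suffices
```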
💡 Interview Tips
- 💡 Start with core concepts: topics, partitions, offsets, consumer groups
- 💡 Emphasize partitioning as the key to scalability and parallelism
- 💡 Explain the leader/follower replication model and ISR
- 💡 Discuss the tradeoff between durability (acks=all) and latency (acks=1)
- 💡 Be prepared to explain exactly-once semantics with idempotent producers
- 💡 Know the difference between log retention and log compaction
- 💡 Understand consumer group rebalancing and its impact
- 💡 Discuss the role of ZooKeeper/KRaft in coordination
- 💡 Be ready to calculate capacity: storage, bandwidth, broker count
- 💡 Know common failure scenarios and how the system recovers
- 💡 Explain why message ordering is only guaranteed within a partition
- 💡 Discuss the zero-copy optimization for fetch requests