📝 Problem Description
Design a distributed message queue like Kafka, SQS, or RabbitMQ. Support publish/subscribe, guaranteed delivery, ordering, and horizontal scaling. Handle producer backpressure and consumer groups. Key challenges include:
- **Durability**: Ensuring no message loss even during failures
- **Ordering**: Maintaining message order within partitions
- **Scalability**: Handling millions of messages per second
- **Fault Tolerance**: Surviving broker failures without data loss
- **Consumer Groups**: Load balancing messages across consumers
- **Exactly-Once Semantics**: Preventing duplicate processing
👤 Use Cases
1. A producer publishes a message to a topic so that it is stored durably and replicated.
2. A consumer polls for messages so that it receives messages from its assigned partitions.
3. A consumer acknowledges a message so that its offset is committed and the message is marked as processed (a sketch of this produce/consume/ack flow follows the list).
4. A consumer fails mid-processing, so the message is redelivered to another consumer in the group.
5. An admin creates a topic with partitions so that the partitions are distributed across brokers.
6. A consumer group member joins or leaves, so partitions are rebalanced among the members.
7. A broker fails, so the system performs leader election and fails over to a replica.
8. A producer sends a batch of messages so that the batch is persisted atomically.
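A minimal sketch of the produce/consume/acknowledge flow above, with a single in-memory broker standing in for the cluster. The `MiniBroker` class and its `produce`/`poll`/`commit` methods are illustrative names, not a real client API:

```python
from collections import defaultdict

class MiniBroker:
    """Illustrative single-node broker: topics -> partitions -> append-only logs."""

    def __init__(self, partitions_per_topic=3):
        self.partitions_per_topic = partitions_per_topic
        self.logs = defaultdict(list)       # (topic, partition) -> list of messages
        self.committed = defaultdict(int)   # (group, topic, partition) -> next offset to read

    def produce(self, topic, key, value):
        # Keyed messages always map to the same partition, preserving per-key order.
        partition = hash(key) % self.partitions_per_topic
        log = self.logs[(topic, partition)]
        log.append((key, value))
        return partition, len(log) - 1      # (partition, offset) acknowledgment

    def poll(self, group, topic, partition, max_records=10):
        # Return records starting at the group's committed offset for this partition.
        start = self.committed[(group, topic, partition)]
        return self.logs[(topic, partition)][start:start + max_records]

    def commit(self, group, topic, partition, offset):
        # Acknowledge processing up to (and including) `offset`.
        self.committed[(group, topic, partition)] = offset + 1


broker = MiniBroker()
p, off = broker.produce("orders", key="user-42", value="order-created")
records = broker.poll("billing", "orders", p)
broker.commit("billing", "orders", p, off)   # marks the message as processed
```

Real systems use a stable hash (e.g. murmur2) instead of Python's process-local `hash()`, and persist both the log and the committed offsets.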
✅ Functional Requirements
- Publish messages to topics with optional key for partitioning
- Subscribe and consume messages from topics
- At-least-once delivery guarantee (configurable to exactly-once)
- Message ordering within partition (strict FIFO)
- Consumer groups for parallel consumption and load distribution
- Dead letter queue for messages that fail processing (see the sketch after this list)
- Message retention (time-based or size-based)
- Message replay from any offset
- Batch produce and consume for efficiency
- Message compression (gzip, snappy, lz4, zstd)
- Transaction support for atomic multi-partition writes
- Schema registry integration for message validation
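One way to implement the dead-letter-queue requirement on the consumer side. The topic name, retry limit, and the `consumer`/`producer` interfaces here are assumptions for illustration only:

```python
MAX_ATTEMPTS = 3              # assumed retry budget before dead-lettering
DLQ_TOPIC = "orders.dlq"      # assumed dead-letter topic name

def handle_with_dlq(consumer, producer, process):
    """Poll, retry failed messages a few times, then route them to the DLQ."""
    for topic, partition, offset, key, value in consumer.poll():
        attempts = 0
        while True:
            try:
                process(key, value)
                break                                   # success: fall through to commit
            except Exception as err:
                attempts += 1
                if attempts >= MAX_ATTEMPTS:
                    # Preserve the original coordinates so the failure can be replayed later.
                    producer.produce(DLQ_TOPIC, key, {
                        "value": value,
                        "source": (topic, partition, offset),
                        "error": str(err),
                    })
                    break
        # Commit either way; the poison message now lives in the DLQ, not the main topic.
        consumer.commit(topic, partition, offset)
```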
⚡ Non-Functional Requirements
- Throughput: 1M messages/sec per cluster
- Latency: < 10ms p99 for produce, < 5ms p99 for consume
- Durability: Zero message loss (RPO = 0)
- Availability: 99.99% uptime (52 min downtime/year)
- Horizontal scaling: Add brokers without downtime
- Storage efficiency: Compression + log compaction
- Recovery time: < 30 seconds for leader failover
⚠️ Constraints & Assumptions
- Messages must be persisted to disk and replicated before acknowledgment
- Network partitions must be handled (prefer consistency over availability)
- Cannot rely on a single node; minimum 3 brokers for production
- Consumer offset commits must be durable
- Partition count cannot be decreased after creation
- Message ordering only guaranteed within a single partition
📊 Capacity Estimation
- 👥 Users: 1000 producer/consumer services
- 💾 Storage: 100TB (7-day retention)
- ⚡ QPS: Writes: 1M/sec, Reads: 2M/sec
📐 Assumptions
- 1M messages per second peak throughput (a back-of-envelope check follows this list)
- Average message size: 1KB (range: 100B - 1MB)
- 7-day default retention period
- Replication factor: 3 (one leader, two followers)
- 100 topics with average 10 partitions each = 1000 partitions
- 3:1 read-to-write ratio (consumers often replay)
- Consumer groups: 500 active groups
- Average consumer lag: < 1000 messages
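A quick back-of-envelope script over the listed assumptions (pure arithmetic; the interpretation in the last print is an inference worth raising in an interview, not a stated fact):

```python
MSG_SIZE_B = 1_000            # 1 KB average message
PEAK_MSGS_PER_SEC = 1_000_000
REPLICATION = 3
RETENTION_DAYS = 7
STATED_STORAGE_TB = 100

peak_ingest_mb_s = PEAK_MSGS_PER_SEC * MSG_SIZE_B / 1e6
print(f"peak ingest: {peak_ingest_mb_s:.0f} MB/s per replica, "
      f"{peak_ingest_mb_s * REPLICATION:.0f} MB/s including replication")

raw_week_tb = PEAK_MSGS_PER_SEC * MSG_SIZE_B * 86_400 * RETENTION_DAYS * REPLICATION / 1e12
print(f"7 days at peak, x{REPLICATION} replication: {raw_week_tb:,.0f} TB raw")

# Working backwards: what sustained rate does the 100 TB budget actually support?
sustained = STATED_STORAGE_TB * 1e12 / (REPLICATION * RETENTION_DAYS * 86_400 * MSG_SIZE_B)
print(f"100 TB over 7 days at x3 replication supports ~{sustained:,.0f} msg/s sustained "
      "(before compression), so the 1M/s figure is a burst peak or data is heavily compressed")
```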
💡 Key Concepts
**Partitioning** (CRITICAL): Topic divided into partitions for parallelism. Each partition is an ordered, immutable sequence of messages. Partition count determines max parallelism.
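A minimal partitioner sketch. Real brokers use a stable hash such as murmur2; CRC32 stands in here so results are reproducible across processes, and the class name is illustrative:

```python
import itertools
import zlib

class Partitioner:
    """Route keyed messages to a stable partition; spread keyless messages round-robin."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.count()

    def partition_for(self, key):
        if key is None:
            return next(self._round_robin) % self.num_partitions
        # Stable hash: the same key always lands on the same partition,
        # which is what gives per-key ordering.
        return zlib.crc32(key.encode()) % self.num_partitions


p = Partitioner(num_partitions=10)
assert p.partition_for("user-42") == p.partition_for("user-42")
```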
**Replication** (CRITICAL): Each partition replicated to N brokers (replication factor). One leader handles all reads/writes; followers replicate passively. Provides fault tolerance.
**Consumer Groups** (HIGH): Partitions distributed among group members. Each partition assigned to exactly one consumer in a group. Multiple groups can consume the same topic independently.
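A sketch of range-style partition assignment within one group; this is one of several strategies (round-robin and sticky variants also exist), and the function name is illustrative:

```python
def range_assign(partitions, members):
    """Split a topic's partitions contiguously across group members.

    Each partition goes to exactly one member; extra partitions go to the
    first few members when the counts don't divide evenly.
    """
    members = sorted(members)
    per_member, extra = divmod(len(partitions), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment

# 6 partitions, 4 consumers: two consumers get 2 partitions, two get 1.
print(range_assign(list(range(6)), ["c1", "c2", "c3", "c4"]))
# {'c1': [0, 1], 'c2': [2, 3], 'c3': [4], 'c4': [5]}
```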
**Offset Tracking** (HIGH): Consumer position tracked per partition as an offset (64-bit integer). Stored in the __consumer_offsets topic. Enables replay from any point.
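Committed offsets boil down to a map from (group, topic, partition) to an integer. A minimal sketch (class and method names are illustrative), with `seek` standing in for replay-from-any-offset:

```python
class OffsetStore:
    """Offset bookkeeping keyed by (group, topic, partition).

    In Kafka this lives in the internal __consumer_offsets topic; here a dict
    stands in for that compacted log.
    """

    def __init__(self):
        self._offsets = {}

    def commit(self, group, topic, partition, offset):
        self._offsets[(group, topic, partition)] = offset

    def position(self, group, topic, partition, default=0):
        # Where the group should resume reading after a restart or rebalance.
        return self._offsets.get((group, topic, partition), default)

    def seek(self, group, topic, partition, offset):
        # Replay support: rewind (or fast-forward) the group to any offset.
        self._offsets[(group, topic, partition)] = offset
```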
**In-Sync Replicas (ISR)** (HIGH): Set of replicas caught up with the leader (within replica.lag.time.max.ms). Writes acked only after all ISR have the message. Guarantees durability.
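A sketch of how a leader might track ISR membership from follower fetch progress, under an assumed lag threshold analogous to replica.lag.time.max.ms (class and method names are illustrative):

```python
import time

REPLICA_LAG_TIME_MAX_S = 30.0   # assumed threshold, analogous to replica.lag.time.max.ms

class IsrTracker:
    """Leader-side view of which followers are in sync."""

    def __init__(self, replica_ids):
        now = time.monotonic()
        # Last time each replica was fully caught up to the leader's log end offset.
        self.caught_up_at = {r: now for r in replica_ids}

    def record_fetch(self, replica_id, replica_leo, leader_leo):
        if replica_leo >= leader_leo:
            self.caught_up_at[replica_id] = time.monotonic()

    def isr(self):
        now = time.monotonic()
        return {r for r, t in self.caught_up_at.items()
                if now - t <= REPLICA_LAG_TIME_MAX_S}
```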
**High Watermark** (HIGH): Offset of the last message replicated to all ISR. Consumers only see messages up to the HW. Prevents reading uncommitted data.
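The high watermark falls out of the replicas' log end offsets: a record at offset o is committed, consumer-visible, and safe to ack for acks=all once o < min(LEO across the ISR). A sketch with illustrative function names:

```python
def high_watermark(leader_leo, follower_leos, isr):
    """HW = minimum log end offset across the leader and in-sync followers."""
    leos = [leader_leo] + [leo for replica, leo in follower_leos.items() if replica in isr]
    return min(leos)

def visible_records(log, hw):
    # Consumers only ever see offsets below the high watermark.
    return log[:hw]

def is_committed(offset, hw):
    # acks=all: the produce request for `offset` can be acknowledged once this is true.
    return offset < hw


hw = high_watermark(leader_leo=105, follower_leos={"b2": 103, "b3": 101}, isr={"b2", "b3"})
assert hw == 101 and is_committed(100, hw) and not is_committed(101, hw)
```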
**Log Compaction** (MEDIUM): Retention mode that keeps only the latest value per key. Useful for changelog topics. Enables infinite retention with bounded storage.
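A toy compaction pass: keep only the newest record per key, with None-valued records acting as tombstones. This mirrors the idea, not the actual segment-level cleaner:

```python
def compact(log):
    """Keep the newest value per key; a None value is a tombstone that deletes the key."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        if value is None:
            latest.pop(key, None)       # tombstone removes the key entirely
        else:
            latest[key] = (offset, value)
    # Rebuild a smaller log, preserving the original offsets of the survivors.
    return sorted((off, key, value) for key, (off, value) in latest.items())


log = [("user-1", "a"), ("user-2", "b"), ("user-1", "c"), ("user-2", None)]
print(compact(log))   # [(2, 'user-1', 'c')]
```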
**Idempotent Producer** (MEDIUM): Producer assigns sequence numbers to messages. Broker deduplicates based on producer ID + sequence. Prevents duplicates on retry.
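Broker-side deduplication for the idempotent producer, keyed by (producer id, partition) plus a monotonically increasing sequence number. A sketch with illustrative names and return codes:

```python
class DedupLog:
    """Accept each (producer_id, sequence) at most once per partition."""

    def __init__(self):
        self.last_seq = {}               # (producer_id, partition) -> highest sequence seen
        self.records = []

    def append(self, producer_id, partition, sequence, value):
        key = (producer_id, partition)
        last = self.last_seq.get(key, -1)
        if sequence <= last:
            return "duplicate"           # retry of something already persisted: ack, don't re-append
        if sequence != last + 1:
            return "out_of_sequence"     # a gap means an earlier message was lost; reject
        self.records.append(value)
        self.last_seq[key] = sequence
        return "ok"


log = DedupLog()
assert log.append("p1", 0, 0, "a") == "ok"
assert log.append("p1", 0, 0, "a") == "duplicate"   # network retry is absorbed
assert log.append("p1", 0, 1, "b") == "ok"
```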
**Transactions** (MEDIUM): Atomic writes across multiple partitions. All-or-nothing semantics. Enables exactly-once stream processing.
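A heavily simplified sketch of atomic multi-partition writes: records are staged locally, then appended and followed by a commit marker on every touched partition. Real implementations add a transaction coordinator, markers in the log, and read_committed consumer filtering; the `broker.append` and `broker.write_commit_marker` calls are assumed interfaces:

```python
class Transaction:
    """Stage writes across partitions, then commit or abort them as a unit."""

    def __init__(self, broker):
        self.broker = broker
        self.staged = []                 # [(topic, partition, key, value), ...]

    def send(self, topic, partition, key, value):
        self.staged.append((topic, partition, key, value))

    def commit(self):
        # Append all staged records, then mark every touched partition committed
        # so read_committed consumers know the whole batch is visible.
        for topic, partition, key, value in self.staged:
            self.broker.append(topic, partition, key, value)
        for topic, partition in {(t, p) for t, p, _, _ in self.staged}:
            self.broker.write_commit_marker(topic, partition)
        self.staged.clear()

    def abort(self):
        self.staged.clear()              # nothing was exposed, so dropping the buffer suffices
```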
💡 Interview Tips
- 💡 Start with core concepts: topics, partitions, offsets, consumer groups
- 💡 Emphasize partitioning as the key to scalability and parallelism
- 💡 Explain the leader/follower replication model and ISR
- 💡 Discuss the tradeoff between durability (acks=all) and latency (acks=1)
- 💡 Be prepared to explain exactly-once semantics with idempotent producers
- 💡 Know the difference between log retention and log compaction
- 💡 Understand consumer group rebalancing and its impact
- 💡 Discuss the role of ZooKeeper/KRaft in coordination
- 💡 Be ready to calculate capacity: storage, bandwidth, broker count
- 💡 Know common failure scenarios and how the system recovers
- 💡 Explain why message ordering is only guaranteed within a partition
- 💡 Discuss the zero-copy optimization for fetch requests