← Back to All Questions
Medium~50 minDatabase Design

Design Distributed Key-Value Store

AmazonMetaGoogleNetflixLinkedIn

📝 Problem Description

Design a distributed key-value store like DynamoDB, Redis Cluster, or Cassandra. Support high availability, partition tolerance, and tunable consistency. Handle node failures, data replication, and efficient lookups.

👤 Use Cases

1.
Application wants to puts key-value pair so that data is stored durably across nodes
2.
Application wants to gets value by key so that receives value or not found
3.
Operator wants to adds nodes to cluster so that data rebalances automatically
4.
System wants to node fails so that data remains available from replicas

✅ Functional Requirements

  • PUT(key, value) - store data
  • GET(key) - retrieve data
  • DELETE(key) - remove data
  • Support variable-length keys and values
  • TTL (time-to-live) support
  • Atomic compare-and-swap operations

⚡ Non-Functional Requirements

  • Read latency p99 < 10ms
  • Write latency p99 < 50ms
  • Handle 1M ops/sec
  • 99.99% availability
  • Support petabytes of data
  • Survive N-1 replica failures

⚠️ Constraints & Assumptions

  • CAP theorem - must choose between CP or AP
  • Key size max: 256 bytes
  • Value size max: 1MB
  • Cannot use external databases

📊 Capacity Estimation

👥 Users
Thousands of application instances
💾 Storage
1 PB across cluster
⚡ QPS
1M ops/sec (70% reads, 30% writes)
📐 Assumptions
  • 100 nodes in cluster
  • 10TB per node
  • Replication factor: 3
  • Average key: 100 bytes, value: 1KB

💡 Key Concepts

CRITICAL
Consistent Hashing
Distribute keys across nodes with minimal redistribution on node changes. Use virtual nodes for balance.
CRITICAL
Quorum (R + W > N)
Read R replicas, write W replicas. If R + W > N, guaranteed to see latest write.
HIGH
LSM Tree
Write to in-memory buffer, flush to sorted files. Great for write-heavy workloads.
HIGH
Gossip Protocol
Nodes periodically exchange state to detect failures and spread membership info.
MEDIUM
Vector Clocks
Track causality between versions for conflict detection in eventually consistent systems.

💡 Interview Tips

  • 💡Start with single-node design, then distribute
  • 💡Discuss CAP theorem trade-offs early
  • 💡Consistent hashing is essential - explain thoroughly
  • 💡Cover failure scenarios and recovery