← Back to All Questions
Easy~35 minData Processing

Design Top K / Heavy Hitters System

GoogleFacebookTwitterAmazonNetflix

📝 Problem Description

Design a system to find the top K most frequent items in a data stream. Examples: top 10 trending hashtags, most searched queries, most viewed products. Handle high-throughput streams with memory constraints.

👤 Use Cases

1.
System wants to processes event stream so that identifies top K frequent items
2.
User wants to views trending page so that sees top 10 hashtags
3.
Analyst wants to queries top products so that gets real-time ranking

✅ Functional Requirements

  • Track frequency of items in stream
  • Return top K items by frequency
  • Support time-windowed queries (last hour, day)
  • Handle high-cardinality streams (millions of unique items)

⚡ Non-Functional Requirements

  • Process 1M events per second
  • Query top K in < 10ms
  • Memory efficient (can't store all counts)
  • Approximate results acceptable (within 5% error)

⚠️ Constraints & Assumptions

  • Cannot store exact count for every item
  • Stream is unbounded and continuous
  • Memory limited (e.g., 10GB for the system)

📊 Capacity Estimation

👥 Users
N/A (internal system)
💾 Storage
10GB for approximate data structures
⚡ QPS
Ingest: 1M events/sec, Queries: 1K/sec
📐 Assumptions
  • 1M events per second
  • 100M unique items per day
  • Only top 1000 items needed accurately
  • Exact counts not required for long tail

💡 Key Concepts

CRITICAL
Count-Min Sketch
Probabilistic data structure for approximate frequency counting in sublinear space.
HIGH
Min Heap
Track top K items efficiently with O(log K) updates.
HIGH
Space-Saving Algorithm
Alternative to CM Sketch - tracks top K with guaranteed error bounds.
MEDIUM
Time Windowing
Maintain separate counts for different time windows (tumbling, sliding).

💡 Interview Tips

  • 💡Start with the streaming constraints
  • 💡Discuss Count-Min Sketch and Space-Saving algorithms
  • 💡Emphasize the accuracy vs memory tradeoff
  • 💡Be prepared to discuss heap-based solutions
  • 💡Know the difference between exact and approximate
  • 💡Understand the merging problem for distributed systems