Design Top K / Heavy Hitters System - System Design Interview

📝 Problem Description

Design a system to find the top K most frequent items in a data stream. Examples: top 10 trending hashtags, most searched queries, most viewed products. Handle high-throughput streams with memory constraints.

👤 Use Cases

System wants to processes event stream so that identifies top K frequent items

User wants to views trending page so that sees top 10 hashtags

Analyst wants to queries top products so that gets real-time ranking

✅ Functional Requirements

•Track frequency of items in stream
•Return top K items by frequency
•Support time-windowed queries (last hour, day)
•Handle high-cardinality streams (millions of unique items)

⚡ Non-Functional Requirements

•Process 1M events per second
•Query top K in < 10ms
•Memory efficient (can't store all counts)
•Approximate results acceptable (within 5% error)

⚠️ Constraints & Assumptions

•Cannot store exact count for every item
•Stream is unbounded and continuous
•Memory limited (e.g., 10GB for the system)

📊 Capacity Estimation

👥 Users

N/A (internal system)

💾 Storage

10GB for approximate data structures

⚡ QPS

Ingest: 1M events/sec, Queries: 1K/sec

📐 Assumptions

• 1M events per second
• 100M unique items per day
• Only top 1000 items needed accurately
• Exact counts not required for long tail

💡 Key Concepts

CRITICAL

Count-Min Sketch

Probabilistic data structure for approximate frequency counting in sublinear space.

HIGH

Min Heap

Track top K items efficiently with O(log K) updates.

HIGH

Space-Saving Algorithm

Alternative to CM Sketch - tracks top K with guaranteed error bounds.

MEDIUM

Time Windowing

Maintain separate counts for different time windows (tumbling, sliding).

💡 Interview Tips

💡Start with the streaming constraints
💡Discuss Count-Min Sketch and Space-Saving algorithms
💡Emphasize the accuracy vs memory tradeoff
💡Be prepared to discuss heap-based solutions
💡Know the difference between exact and approximate
💡Understand the merging problem for distributed systems