← Back to All Questions
Medium~40 minData Processing

Design Top K Shared Articles System

FacebookTwitterLinkedInRedditPinterest

📝 Problem Description

Design a system to compute and display the most shared articles in real-time. Handle millions of share events per minute and maintain accurate top-K rankings with minimal latency.

👤 Use Cases

1.
User wants to shares article so that share event recorded
2.
System wants to updates ranking so that top-K list refreshed
3.
User wants to views trending page so that sees current top articles

✅ Functional Requirements

  • Track article share counts in real-time
  • Compute top-K for multiple time windows (1h, 24h, 7d)
  • Handle different share types (FB, Twitter, Email)
  • Display trending articles with share counts
  • Decay older shares for freshness

⚡ Non-Functional Requirements

  • Process 1M shares/minute
  • Top-K update latency < 5 seconds
  • Serve trending page < 100ms
  • High accuracy (eventual consistency acceptable)

⚠️ Constraints & Assumptions

  • Exact counts expensive at scale
  • Top-K must be fresh
  • Millions of articles compete

📊 Capacity Estimation

👥 Users
100M daily shares
💾 Storage
10GB (article metadata, counts)
⚡ QPS
Shares: 15K/sec, Trending: 100K/sec
📐 Assumptions
  • 100M shares per day
  • 10M unique articles shared
  • Top 100 displayed
  • Multiple time windows

💡 Key Concepts

HIGH
Count-Min Sketch
Approximate counting with bounded error.
CRITICAL
Heap-based Top-K
Maintain top K using min-heap of size K.
HIGH
Sliding Windows
Time-decayed windows for freshness.
MEDIUM
Approximate Algorithms
Trade accuracy for speed/memory.

💡 Interview Tips

  • 💡Start with the streaming constraint
  • 💡Discuss Count-Min Sketch and Space-Saving
  • 💡Emphasize the memory efficiency
  • 💡Be prepared to discuss windowing
  • 💡Know the accuracy vs memory tradeoff
  • 💡Understand the heap-based top K extraction