Problem Description
Design a system to count and display likes for posts at Facebook scale. Handle billions of posts, millions of likes per second, and display accurate counts. This problem explores the classic "hot key" challenge in distributed systems and introduces sharded counters as a solution. Focus on write scalability, read efficiency, and the tradeoffs between consistency and performance.
Use Cases
1. User wants to like a post so that the like count increments and the UI shows the liked state
2. User wants to view a post so that they see the current like count formatted (e.g., "1.2M")
3. User wants to unlike a post so that the like count decrements and the UI shows the unliked state
4. User wants to view who liked a post so that they see a paginated list of likers (friends first)
5. User wants to double-tap on mobile so that the like is registered with animation feedback
6. Celebrity wants to post viral content so that the system handles millions of likes without degradation
7. System wants to reconcile counts so that the counter matches the actual likes in the source of truth
8. Admin wants to view analytics so that they see like velocity and trending posts
Functional Requirements
- Like/unlike posts with immediate UI feedback
- Display like count on posts (exact or approximate)
- Show if current user liked the post
- List users who liked (paginated, friends prioritized)
- Prevent duplicate likes per user per post
- Support multiple reaction types (like, love, haha, etc.)
- Show like velocity for trending detection
- Provide analytics on likes over time
Non-Functional Requirements
- Handle 1M+ likes/sec during viral events
- Like action < 100ms latency (user-facing)
- Count display < 50ms latency
- Support billions of posts
- Eventual consistency acceptable (< 5 second lag)
- 99.99% availability for like operations
- Zero duplicate likes even under race conditions
Constraints & Assumptions
- Cannot update a single counter on every like (hot key problem)
- Must prevent duplicate likes per user per post (see the sketch after this list)
- Exact count not required for very large numbers (show "1.2M", not "1,234,567")
- Unlike must succeed even if the original like is not found in cache
- Counter must never go negative
- Counts should be recoverable from the source of truth (likes table)
- Must handle celebrity posts with millions of likes/hour
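A minimal sketch of the duplicate-prevention and never-go-negative constraints, using a per-post Redis set as the dedup structure. At real scale the uniqueness guarantee would live in the likes table itself; the key names, shard count, and redis-py calls here are illustrative assumptions, not the actual implementation (the shard-spread increment is explained under Key Concepts below).

```python
import random
import redis

NUM_SHARDS = 100       # illustrative; matches the shard count used elsewhere in this doc
r = redis.Redis()      # assumes a local Redis instance

def like(user_id: str, post_id: str) -> bool:
    """Idempotent like: SADD returns 1 only the first time this user likes
    the post, so double-taps and retries never double-count."""
    if r.sadd(f"likers:{post_id}", user_id) == 0:
        return False   # duplicate like -- counter untouched
    r.incrby(f"likes:{post_id}:shard:{random.randrange(NUM_SHARDS)}", 1)
    return True

def unlike(user_id: str, post_id: str) -> bool:
    """Only decrement if the like actually existed, so the summed counter
    cannot be driven below zero by repeated unlikes."""
    if r.srem(f"likers:{post_id}", user_id) == 0:
        return False   # nothing to undo
    r.incrby(f"likes:{post_id}:shard:{random.randrange(NUM_SHARDS)}", -1)
    return True
```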
Capacity Estimation
Users
2B users, 100M DAU
Storage
~250TB (5 trillion like records × 50 bytes, plus indices)
QPS
Writes: 1M/sec peak, Reads: 10M/sec peak
Assumptions
- 2B registered users
- 100M daily active users
- 100B posts total
- Average 50 likes per post
- 5 trillion total likes
- 1M likes per second during peak (viral events)
- 10:1 read to write ratio on counts
- Average like record: 50 bytes
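A quick back-of-the-envelope check of the figures above; the inputs are the listed assumptions, the script just makes the arithmetic explicit.

```python
# Inputs are the assumptions listed above.
total_likes = 100e9 * 50            # 100B posts x 50 likes/post = 5e12 likes
raw_storage = total_likes * 50      # 50 bytes per like record
read_qps    = 1e6 * 10              # 1M writes/sec x 10:1 read:write ratio

print(f"total likes:   {total_likes:.0e}")                 # 5e+12
print(f"raw like data: {raw_storage / 1e12:.0f} TB")       # ~250 TB before replication
print(f"peak count reads: {read_qps / 1e6:.0f}M/sec")      # 10M/sec
```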
Key Concepts
CRITICAL
Sharded Counters
Split a single counter across N shards (e.g., 100) to distribute writes. Write to random shard, read by summing all shards. Avoids hot key problem where one Redis key receives millions of updates/sec.
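A minimal sketch of the sharded-counter pattern with redis-py, assuming a single Redis endpoint; the `likes:{post_id}:shard:{i}` key scheme and the 100-shard constant are illustrative choices, not a prescribed layout.

```python
import random
import redis

NUM_SHARDS = 100       # e.g., 100 shards spreads 1M writes/sec to ~10K/sec per key
r = redis.Redis()      # assumes a local Redis instance; cluster setup omitted

def increment_like_count(post_id: str, delta: int = 1) -> None:
    """Write to one randomly chosen shard so no single key becomes hot."""
    shard = random.randrange(NUM_SHARDS)
    r.incrby(f"likes:{post_id}:shard:{shard}", delta)

def get_like_count(post_id: str) -> int:
    """Read by summing all shards (in practice the sum is cached briefly)."""
    keys = [f"likes:{post_id}:shard:{i}" for i in range(NUM_SHARDS)]
    values = r.mget(keys)              # one round trip for all shards
    return sum(int(v) for v in values if v is not None)
```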
HIGH
Async Counting
Counter updates happen asynchronously via Kafka. The like operation returns immediately without waiting for counter update. This provides eventual consistency with high availability.
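A rough sketch of this write path using kafka-python; the `like-events` topic name and broker address are assumptions, and the synchronous write to the likes table (source of truth) is elided.

```python
import json
from kafka import KafkaProducer  # kafka-python; broker address is an assumption

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def handle_like(user_id: str, post_id: str) -> None:
    # 1. The durable like record would be written synchronously to the likes
    #    table (source of truth) here -- omitted in this sketch.
    # 2. The counter update is fire-and-forget: publish an event and return,
    #    so the user-facing call never waits on the counter.
    producer.send("like-events", {"user_id": user_id, "post_id": post_id, "delta": 1})

# A separate consumer process reads "like-events" and applies the deltas to the
# sharded Redis counters (see the sharded-counter sketch above), which is what
# makes the count eventually consistent.
```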
HIGH
Bloom Filter
Space-efficient probabilistic data structure for "has user liked this post" checks. Returns "definitely not liked" or "possibly liked". 99% of checks are "not liked" so this saves most DB reads.
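A self-contained sketch of the idea; a production system would more likely use a shared filter (e.g., RedisBloom) sized for the real key space, so this in-memory class and its parameters are purely illustrative.

```python
import hashlib

class LikeBloomFilter:
    """Tiny in-memory Bloom filter sketch for 'has user liked this post' checks."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, user_id: str, post_id: str) -> None:
        for pos in self._positions(f"{user_id}:{post_id}"):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def possibly_liked(self, user_id: str, post_id: str) -> bool:
        # False means "definitely not liked" (skip the DB read);
        # True means "possibly liked" (fall through to the likes table).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(f"{user_id}:{post_id}"))
```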
MEDIUM
Approximate Display
Show "1.2M" instead of exact count for large numbers. Users don't care about exact count for popular posts. Allows for eventual consistency without confusing users.
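A small formatting helper along these lines, assuming K/M/B suffixes and one decimal place below 10; the exact rounding rules are an assumption.

```python
def format_count(n: int) -> str:
    """Format large counts for display, e.g. 1_234_567 -> '1.2M'."""
    if n < 1_000:
        return str(n)
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            value = n / threshold
            # One decimal place below 10 ("1.2M"), none above ("12M").
            return f"{value:.1f}{suffix}" if value < 10 else f"{value:.0f}{suffix}"
```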
MEDIUM
Counter Reconciliation
Daily job that compares Redis counters with actual Cassandra counts and fixes drift. Ensures counters stay accurate despite lost events or failures.
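A sketch of one per-post reconciliation step, assuming the sharded Redis layout above and a Cassandra `likes` table partitioned by `post_id`; the contact point, keyspace, and schema are assumptions, and in practice the authoritative count would come from a batch snapshot rather than a live COUNT over a huge partition.

```python
import redis
from cassandra.cluster import Cluster  # DataStax Python driver

NUM_SHARDS = 100
r = redis.Redis()
session = Cluster(["127.0.0.1"]).connect("likes_keyspace")  # names are illustrative

def reconcile_post(post_id: str) -> None:
    # Authoritative count from the likes table (source of truth).
    row = session.execute(
        "SELECT COUNT(*) FROM likes WHERE post_id = %s", (post_id,)
    ).one()
    true_count = row[0]

    # Overwrite the sharded counter: put the full value in shard 0 and zero
    # the rest, so the summed read matches the source of truth again.
    pipe = r.pipeline()
    pipe.set(f"likes:{post_id}:shard:0", true_count)
    for i in range(1, NUM_SHARDS):
        pipe.set(f"likes:{post_id}:shard:{i}", 0)
    pipe.execute()
```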
MEDIUM
Optimistic UI
Show the like/unlike state immediately before server confirms. If server fails, revert the UI. Provides instant feedback for better UX.
Interview Tips
- Start by explaining the hot key problem - a single counter can't handle 1M writes/sec
- Introduce sharded counters as the core solution - this shows distributed systems knowledge
- Discuss the sync write (Cassandra) + async count update (Kafka) architecture
- Explain why eventual consistency is acceptable for like counts
- Mention the Bloom filter optimization for "has user liked" checks
- Be ready to discuss counter reconciliation and drift correction
- Know the numbers: 1M writes/sec, 10M reads/sec, 100 shards, 5s cache TTL
- Discuss the tradeoff between accuracy and scalability