📝 Problem Description
Design a web analytics platform like Google Analytics. Track page views, user sessions, events, and conversions. Handle billions of events per day with near-real-time dashboards and flexible querying.
👤 Use Cases
1. A website sends a page view event so that the event is recorded for analytics.
2. An analyst views a dashboard to see traffic metrics in real time.
3. An analyst creates a custom report to query data by dimensions and metrics.
4. The system processes events and aggregates them for fast queries.
✅ Functional Requirements
- •Track page views, sessions, events
- •Real-time dashboard (a few seconds of delay)
- •Historical reports (up to 2 years)
- •Segmentation and filtering
- •Funnel analysis
- •User demographics and behavior
- •Custom events and dimensions
⚡ Non-Functional Requirements
- •Handle 1M events/sec
- •Dashboard query latency < 1 second
- •Store 2 years of data
- •99.9% availability
⚠️ Constraints & Assumptions
- •Events are high-volume and small
- •Real-time views need a faster path than batch processing
- •Queries must stay flexible across many dimensions
📊 Capacity Estimation
👥 Users
10M tracked websites
💾 Storage
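~31.5 PB of raw events over 2 years (1M events/sec × 500 bytes); a 1 PB budget holds only if raw data is compressed, sampled, or rolled up into aggregates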
⚡ QPS
Events: 1M/sec, Queries: 10K/sec
📐 Assumptions
- • 1M events per second
- • Average event: 500 bytes
- • 10M tracked websites
- • 100M unique users per day
- • Real-time latency (event received to visible on dashboard): < 5 seconds
- • Dashboard query latency: < 1 second
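The assumptions above can be sanity-checked with a quick back-of-envelope calculation. The sketch below uses only the listed event rate and event size, and assumes raw, uncompressed, unreplicated storage with decimal units (1 PB = 10^15 bytes). The result is far above 1 PB, so a 1 PB storage budget implicitly relies on compression, sampling, or keeping rollups instead of raw events.

```python
# Back-of-envelope capacity check from the listed assumptions.
# Assumes raw, uncompressed, unreplicated events and decimal units.
EVENTS_PER_SEC = 1_000_000
EVENT_BYTES = 500
SECONDS_PER_DAY = 86_400
RETENTION_DAYS = 2 * 365

ingest_bytes_per_sec = EVENTS_PER_SEC * EVENT_BYTES         # write bandwidth
events_per_day = EVENTS_PER_SEC * SECONDS_PER_DAY           # daily event count
raw_bytes_per_day = ingest_bytes_per_sec * SECONDS_PER_DAY  # daily raw volume
raw_bytes_retained = raw_bytes_per_day * RETENTION_DAYS     # 2-year raw volume

print(f"ingest:    {ingest_bytes_per_sec / 1e6:.0f} MB/s")
print(f"events:    {events_per_day / 1e9:.1f} billion/day")
print(f"raw/day:   {raw_bytes_per_day / 1e12:.1f} TB")
print(f"raw/2 yrs: {raw_bytes_retained / 1e15:.1f} PB")
```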
💡 Key Concepts
CRITICAL
Lambda Architecture
Run separate real-time (speed) and batch (accuracy) paths; a query layer merges their results.
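One way to picture the merge step is a query function that trusts the batch view for time buckets the batch job has finished and the speed view for everything newer. This is a minimal sketch: `merge_views`, the `(bucket, page)` key layout, and the `batch_watermark` parameter are hypothetical names chosen for illustration, not part of any specific framework.

```python
from collections import Counter

def merge_views(batch_view, speed_view, batch_watermark):
    """Merge page-view counts keyed by (bucket_start_ts, page).

    Buckets before batch_watermark come from the batch view (accurate);
    buckets at or after it come from the speed view (fresh, approximate).
    """
    merged = Counter()
    for (bucket, page), count in batch_view.items():
        if bucket < batch_watermark:
            merged[(bucket, page)] += count
    for (bucket, page), count in speed_view.items():
        if bucket >= batch_watermark:
            merged[(bucket, page)] += count
    return dict(merged)

# Hypothetical data: the batch job has finished everything before t=120.
batch_view = {(0, "/home"): 120, (60, "/home"): 95}
speed_view = {(60, "/home"): 90, (120, "/home"): 40}
merged = merge_views(batch_view, speed_view, batch_watermark=120)
```

Here the speed layer's count for bucket 60 is discarded in favor of the batch count, while bucket 120 is served from the speed layer until the next batch run covers it.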
CRITICAL
Pre-aggregation
Aggregate metrics ahead of time by dimension (time, page, country) so that queries read small rollups instead of raw events.
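As a rough illustration, the sketch below rolls raw events up into per-minute counts keyed by (minute, page, country) and then answers a dashboard-style query from the rollup alone. The event field names (`ts`, `page`, `country`) and the one-minute granularity are assumptions made for this example.

```python
from collections import defaultdict

def minute_bucket(ts):
    """Round a Unix timestamp (seconds) down to the start of its minute."""
    return ts - ts % 60

def build_rollup(events):
    """Count events per (minute, page, country)."""
    counts = defaultdict(int)
    for event in events:
        key = (minute_bucket(event["ts"]), event["page"], event["country"])
        counts[key] += 1
    return dict(counts)

def page_views(rollup, page, start, end):
    """Sum views for one page over buckets in [start, end), across countries."""
    return sum(
        count
        for (bucket, p, _country), count in rollup.items()
        if p == page and start <= bucket < end
    )

# Hypothetical raw events.
events = [
    {"ts": 5, "page": "/home", "country": "US"},
    {"ts": 42, "page": "/home", "country": "US"},
    {"ts": 61, "page": "/home", "country": "DE"},
    {"ts": 75, "page": "/pricing", "country": "US"},
]
rollup = build_rollup(events)
home_views = page_views(rollup, "/home", start=0, end=120)
```

The query touches three rollup rows rather than four raw events; at production volume the gap between rollup rows and raw events is what makes sub-second dashboards feasible.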
HIGH
Session Stitching
Link page views into sessions using cookies or visitor IDs, closing a session after an inactivity timeout.
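A minimal batch-style sketch of this idea groups page views by visitor and starts a new session whenever the gap since the previous view exceeds a timeout. The 30-minute value and the `(visitor_id, ts, url)` tuple layout are assumptions for illustration; a streaming implementation would keep per-visitor state instead of sorting.

```python
SESSION_TIMEOUT_SEC = 30 * 60  # assumed inactivity timeout

def stitch_sessions(page_views):
    """Group (visitor_id, ts, url) tuples into sessions per visitor.

    Returns {visitor_id: [session, ...]}, where each session is a
    time-ordered list of (ts, url) pairs.
    """
    sessions = {}
    for visitor_id, ts, url in sorted(page_views, key=lambda v: (v[0], v[1])):
        visitor_sessions = sessions.setdefault(visitor_id, [])
        last_ts = visitor_sessions[-1][-1][0] if visitor_sessions else None
        if last_ts is not None and ts - last_ts <= SESSION_TIMEOUT_SEC:
            visitor_sessions[-1].append((ts, url))  # continue current session
        else:
            visitor_sessions.append([(ts, url)])    # start a new session
    return sessions

# Hypothetical page views; visitor "a" goes idle for more than 30 minutes.
views = [
    ("a", 0, "/"),
    ("b", 100, "/"),
    ("a", 600, "/pricing"),
    ("a", 2401, "/docs"),
]
sessions = stitch_sessions(views)
```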
MEDIUM
Sampling
For high-traffic sites, sample events to reduce processing cost.
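One common way to sample is to hash a stable visitor identifier so that a given visitor is either always kept or always dropped, which keeps their sessions intact. The sketch below assumes that approach; `keep_event` and `estimate_total` are hypothetical helper names, and the choice of SHA-256 is only a convenient, uniformly distributed hash from the standard library.

```python
import hashlib

def keep_event(visitor_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly sample_rate of visitors (0.0 to 1.0)."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

def estimate_total(sampled_count: int, sample_rate: float) -> float:
    """Scale a count observed in the sample back up to the full population."""
    return sampled_count / sample_rate

# Hypothetical usage: keep about 10% of visitors, then extrapolate.
visitor_ids = [f"visitor-{i}" for i in range(10_000)]
kept = [v for v in visitor_ids if keep_event(v, 0.10)]
estimated_visitors = estimate_total(len(kept), 0.10)
```

Sampled metrics are estimates, so reports built on them should make the sampling rate visible to the analyst.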
💡 Interview Tips
- 💡Start with the event collection pipeline
- 💡Discuss the stream processing architecture
- 💡Emphasize the real-time vs batch tradeoff
- 💡Be prepared to discuss data modeling
- 💡Know the OLAP database options
- 💡Understand session attribution