Hard | ~55 min | Data Processing

Design Google Analytics - Web Analytics

Google, Adobe, Mixpanel, Amplitude, Heap

📝 Problem Description

Design a web analytics platform like Google Analytics. Track page views, user sessions, events, and conversions. Handle billions of events per day with near-real-time dashboards and flexible querying.

👤 Use Cases

1. A website sends a page view event so that it is recorded for analytics
2. An analyst views a dashboard to see traffic metrics in real time
3. An analyst creates a custom report to query data by dimensions and metrics
4. The system processes events and aggregates them so that queries are fast

✅ Functional Requirements

  • Track page views, sessions, events
  • Real-time dashboard (new events visible within seconds)
  • Historical reports (up to 2 years)
  • Segmentation and filtering
  • Funnel analysis
  • User demographics and behavior
  • Custom events and dimensions

⚡ Non-Functional Requirements

  • Handle 1M events/sec
  • Dashboard query latency < 1 second
  • Store 2 years of data
  • 99.9% availability

⚠️ Constraints & Assumptions

  • Events are high volume, small size
  • Real-time views need a faster path than batch processing
  • Flexible queries on many dimensions

📊 Capacity Estimation

👥 Users: 10M tracked websites
💾 Storage: 1 PB (2 years of events)
⚡ QPS: Events 1M/sec, Queries 10K/sec
📐 Assumptions
  • 1M events per second
  • Average event: 500 bytes
  • 10M tracked websites
  • 100M unique users per day
  • Real-time latency: < 5 seconds
  • Dashboard query latency: < 1 second
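
The assumptions above can be turned into a quick back-of-envelope calculation. The sketch below uses only the stated event rate, event size, and retention; the 10x compression ratio is an assumption added for illustration and is not part of the problem statement.

```python
# Back-of-envelope sizing from the stated assumptions.
EVENTS_PER_SEC = 1_000_000
EVENT_BYTES = 500
RETENTION_DAYS = 2 * 365
SECONDS_PER_DAY = 86_400
COMPRESSION_RATIO = 10  # assumed for illustration; not given in the problem

ingest_bytes_per_sec = EVENTS_PER_SEC * EVENT_BYTES           # 500 MB/s
events_per_day = EVENTS_PER_SEC * SECONDS_PER_DAY             # 86.4 billion
raw_bytes_per_day = ingest_bytes_per_sec * SECONDS_PER_DAY    # 43.2 TB
raw_bytes_total = raw_bytes_per_day * RETENTION_DAYS          # ~31.5 PB
compressed_bytes_total = raw_bytes_total / COMPRESSION_RATIO  # ~3.2 PB

TB = 10**12
PB = 10**15
print(f"ingest:      {ingest_bytes_per_sec / 10**6:.0f} MB/s")
print(f"events/day:  {events_per_day / 10**9:.1f} billion")
print(f"raw/day:     {raw_bytes_per_day / TB:.1f} TB")
print(f"raw/2 years: {raw_bytes_total / PB:.1f} PB")
print(f"compressed:  {compressed_bytes_total / PB:.1f} PB")
```

Raw events at this rate come to roughly 31.5 PB over two years, well above the 1 PB storage figure. A 1 PB budget would presumably depend on some mix of compression, sampling, and keeping only roll-ups for older data, which is worth stating explicitly in an interview.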

💡 Key Concepts

CRITICAL
Lambda Architecture
Separate the real-time (speed) and batch (accuracy) paths; a query layer merges their results.
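
A minimal sketch of the merge step, assuming both paths expose page-view counts keyed by page. The names `batch_view`, `speed_view`, and `merge_views` are hypothetical, and the sketch assumes the speed view holds only events newer than the last batch cutoff, so nothing is counted twice.

```python
from collections import Counter

def merge_views(batch_view, speed_view):
    """Serving-layer merge: batch totals plus counts the batch job has not covered yet."""
    merged = Counter(batch_view)
    merged.update(speed_view)  # Counter.update adds counts key by key
    return dict(merged)

# Recomputed periodically from the full event log (accurate, but stale).
batch_view = {"/home": 10_000, "/pricing": 2_500}
# Incremental counts for events since the last batch cutoff (fresh, approximate).
speed_view = {"/home": 42, "/signup": 7}

page_views = merge_views(batch_view, speed_view)
print(page_views)
```

When a batch run finishes, the speed view for the period it covered can be discarded, which is what lets the batch path correct any errors the real-time path made.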
CRITICAL
Pre-aggregation
Aggregate metrics by dimensions (time, page, country) for fast queries.
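
As an illustration, the sketch below rolls raw events up into per-minute counts keyed by (minute, page, country). The event fields `ts`, `page`, and `country` and the helper names are assumptions made for this example, not a prescribed schema.

```python
from collections import defaultdict

BUCKET_SEC = 60  # assumed roll-up granularity: one minute

def pre_aggregate(events):
    """Count page views per (minute bucket start, page, country)."""
    rollup = defaultdict(int)
    for e in events:
        bucket_start = e["ts"] // BUCKET_SEC * BUCKET_SEC
        rollup[(bucket_start, e["page"], e["country"])] += 1
    return dict(rollup)

def query(rollup, page=None, country=None):
    """Sum pre-aggregated counts, optionally filtered by page and/or country."""
    return sum(
        count
        for (_, p, c), count in rollup.items()
        if (page is None or p == page) and (country is None or c == country)
    )

events = [
    {"ts": 1_700_000_000, "page": "/home", "country": "US"},
    {"ts": 1_700_000_030, "page": "/home", "country": "US"},
    {"ts": 1_700_000_065, "page": "/home", "country": "DE"},
]
rollup = pre_aggregate(events)
print(query(rollup, page="/home"))
print(query(rollup, country="US"))
```

Queries then scan roll-up rows instead of raw events. The cost is that only dimensions chosen ahead of time can be filtered on, and high-cardinality dimensions make the roll-up table grow quickly.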
HIGH
Session Stitching
Link page views into sessions using cookies or visitor IDs, closing a session after an inactivity timeout.
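
A minimal batch-style sketch of timeout-based stitching. It assumes each page view arrives as a `(visitor_id, timestamp_in_seconds)` pair and uses a 30-minute inactivity timeout; both the input shape and the timeout value are assumptions for illustration.

```python
SESSION_TIMEOUT_SEC = 30 * 60  # assumed inactivity timeout

def stitch_sessions(page_views, timeout=SESSION_TIMEOUT_SEC):
    """Group (visitor_id, ts) page views into sessions split on inactivity gaps."""
    sessions = []
    last_seen = {}  # visitor_id -> (timestamp of last view, index into sessions)
    for visitor_id, ts in sorted(page_views, key=lambda v: v[1]):
        prev = last_seen.get(visitor_id)
        if prev is not None and ts - prev[0] <= timeout:
            idx = prev[1]  # still within the timeout: extend the open session
            sessions[idx]["views"].append(ts)
        else:
            sessions.append({"visitor_id": visitor_id, "views": [ts]})
            idx = len(sessions) - 1
        last_seen[visitor_id] = (ts, idx)
    return sessions

views = [("a", 0), ("a", 600), ("b", 700), ("a", 2401), ("b", 900)]
sessions = stitch_sessions(views)
for s in sessions:
    print(s)
```

In a streaming pipeline the same logic would typically run as a keyed session window with per-visitor state, and late or out-of-order events become the main complication.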
MEDIUM
Sampling
For high-traffic sites, sample events to reduce processing cost.
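
One way to sample is deterministically by visitor ID, so that a visitor's events are either all kept or all dropped and sessions stay intact. The sketch below is one such approach, not the only one; `keep_event` and `scaled_count` are hypothetical helper names.

```python
import hashlib

def keep_event(visitor_id, sample_rate):
    """Keep roughly `sample_rate` of visitors, always the same ones."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < int(sample_rate * 2**64)

def scaled_count(sampled_count, sample_rate):
    """Estimate the true count from a sampled count."""
    return round(sampled_count / sample_rate)

visitor_ids = [f"visitor-{i}" for i in range(10_000)]
kept = sum(keep_event(v, 0.1) for v in visitor_ids)
print(kept, scaled_count(kept, 0.1))
```

Scaled-up counts are estimates, so sampled reports should be labeled as such, and small segments may need a higher sample rate to stay meaningful.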

💡 Interview Tips

  • 💡 Start with the event collection pipeline
  • 💡 Discuss the stream processing architecture
  • 💡 Emphasize the real-time vs. batch tradeoff
  • 💡 Be prepared to discuss data modeling
  • 💡 Know the OLAP database options (e.g. ClickHouse, Apache Druid, Apache Pinot, BigQuery)
  • 💡 Understand session attribution