π Problem Description
Design a URL shortening service like TinyURL or bit.ly that takes long URLs and creates much shorter aliases for them. When users access the short URL, they should be redirected to the original URL. This is a classic system design interview problem that tests fundamentals of distributed systems.
π€ Use Cases
1.
User wants to submit a long URL so that they receive a short, shareable link
2.
User wants to click on a short URL so that they are redirected to the original long URL
3.
User wants to create a custom short link so that they can use a memorable alias for their URL
4.
User wants to view analytics for their short URL so that they can see click counts, geographic data, and referrers
5.
User wants to set an expiration time so that the short URL becomes invalid after the specified time
β Functional Requirements
- β’Given a URL, generate a shorter unique alias (short link)
- β’When users access the short link, redirect to the original URL
- β’Users should optionally be able to pick a custom short link
- β’Links should expire after a configurable timespan
- β’Track analytics: click count, location, device type
- β’Users should be able to delete their short URLs
β‘ Non-Functional Requirements
- β’System should be highly available (URL redirection should not fail)
- β’URL redirection should happen in real-time with minimal latency (<100ms)
- β’Shortened links should not be predictable (security)
- β’System should handle 100M URL shortenings per month
- β’The system should be scalable to handle billions of redirects
β οΈ Constraints & Assumptions
- β’Short URLs should be 7 characters or less (excluding domain)
- β’URLs must be unique - no collisions allowed
- β’System must support 1000+ write requests per second
- β’Read-to-write ratio is approximately 100:1
- β’Short URLs cannot be reused after deletion (for security)
- β’Maximum original URL length: 2048 characters
π Capacity Estimation
π₯ Users
500M monthly active users
πΎ Storage
~15TB for 5 years (100 bytes per URL Γ 500M URLs/month Γ 60 months)
β‘ QPS
Write: ~40 URLs/sec, Read: ~4000 redirects/sec
π Bandwidth
Read: 50GB/day, Write: 5GB/day (10:1 read/write ratio)
π Assumptions
- β’ 100M new URLs shortened per month
- β’ 100:1 read to write ratio (more redirects than shortenings)
- β’ Each URL entry: ~500 bytes (URL + metadata)
- β’ 5 year data retention
- β’ Average original URL length: 200 characters
- β’ 7 characters for short key using Base62 = 62^7 = 3.5 trillion combinations
π‘ Key Concepts
CRITICAL
Base62 Encoding
Use [a-zA-Z0-9] to encode numeric IDs into short strings. 62^7 = 3.5 trillion combinations is enough for our scale.
CRITICAL
Key Generation Service (KGS)
Pre-generate keys to avoid collisions and improve performance. KGS maintains a pool of unused keys.
CRITICAL
Read-Heavy Optimization
Use caching extensively since reads >> writes (100:1 ratio). Cache at multiple layers: CDN, Redis, application.
HIGH
301 vs 302 Redirects
301 (permanent) is cacheable by browsers - good for SEO but reduces analytics accuracy. 302 (temporary) allows tracking each click.
HIGH
Consistent Hashing
For database sharding, use consistent hashing on short key to distribute data and minimize redistribution on node changes.
MEDIUM
Cache-Aside Pattern
Application checks cache first, on miss queries DB and populates cache. Works well for read-heavy workloads.
π‘ Interview Tips
- π‘Start with clarifying requirements - ask about scale, features, and constraints
- π‘Do back-of-envelope calculations early to understand scale (QPS, storage)
- π‘Discuss the key generation algorithm in detail - it's the core problem
- π‘Mention caching strategy - it's critical for read-heavy systems
- π‘Don't forget to discuss monitoring and rate limiting
- π‘Proactively mention security (malware detection, abuse prevention)
- π‘Draw diagrams - visual communication is important in interviews
- π‘Discuss trade-offs for each decision you make