📝 Problem Description
Design a cloud file storage and synchronization service like Dropbox or Google Drive. Users can upload, download, and sync files across multiple devices. The system should handle large files efficiently and support real-time sync, file versioning, and sharing.
👤 Use Cases
1. User wants to upload a file from their laptop so that the file is stored in the cloud and synced to other devices
2. User wants to edit a file on their phone so that changes sync to all their devices automatically
3. User wants to share a folder with colleagues so that colleagues can view/edit files based on permissions
4. User wants to restore a previous version of a file so that they recover accidentally deleted content
5. User wants to work offline so that changes sync when they reconnect to the internet
6. System wants to detect file conflicts so that it notifies users and helps resolve them
✅ Functional Requirements
- Upload, download, and delete files/folders
- Sync files across multiple devices in real time
- Support large files (up to 50GB)
- File versioning with the ability to restore
- Share files/folders with permissions (view/edit)
- Work offline with later synchronization
- Efficient sync using delta/incremental updates
- Conflict detection and resolution
⚡ Non-Functional Requirements
- High availability (99.99%)
- Data durability (11 nines: 99.999999999%)
- Low-latency sync (< 5 seconds for small files)
- Efficient bandwidth usage (only transfer changes)
- Support millions of concurrent users
- Secure file storage (encryption at rest and in transit)
⚠️ Constraints & Assumptions
- Maximum file size: 50GB
- Maximum files per account: 1 million
- Storage quota per user: 2GB free, up to 2TB paid
- Acceptable sync delay: up to 30 seconds for large files
- Support desktop (Windows, Mac, Linux), mobile (iOS, Android), and web
📊 Capacity Estimation
👥 Users: 500M total users, 100M DAU
💾 Storage: 1 exabyte total (500M users × 2GB average)
⚡ QPS: metadata 500K/sec, file uploads 50K/sec
🌐 Bandwidth: 100 PB/month data transfer
📐 Assumptions
- 500 million registered users
- 100 million daily active users
- Average 200 files per user
- Average file size: 1MB (varies from KB to GB)
- 5 devices per user on average
- 10% of files modified daily
- Average 3 versions per file
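A quick back-of-envelope check of these figures; this is a sketch, and the ~2x peak factor and the download fan-out to other devices are assumptions layered on top of the numbers above, not part of the original estimate.

```python
# Back-of-envelope check of the capacity figures above.
# The 2x peak factor and the download fan-out are illustrative assumptions.
TOTAL_USERS = 500_000_000
DAU = 100_000_000
FILES_PER_USER = 200
AVG_FILE_SIZE_MB = 1
DAILY_MODIFY_RATE = 0.10      # 10% of files modified daily
SECONDS_PER_DAY = 86_400

# Storage: ~2 GB per user (files, versions, and overhead) across all users.
total_storage_eb = TOTAL_USERS * 2 / 1e9                      # GB -> EB
print(f"Total storage: ~{total_storage_eb:.0f} EB")

# Uploads: files modified per day, averaged over the day, with an assumed ~2x peak.
daily_uploads = DAU * FILES_PER_USER * DAILY_MODIFY_RATE      # ~2 billion/day
avg_upload_qps = daily_uploads / SECONDS_PER_DAY
print(f"Upload QPS: avg ~{avg_upload_qps:,.0f}, peak ~{avg_upload_qps * 2:,.0f}")

# Upload volume alone is ~2 PB/day (~60 PB/month); downloads to a user's other
# devices push total transfer toward the 100 PB/month figure before delta-sync savings.
upload_pb_per_day = daily_uploads * AVG_FILE_SIZE_MB / 1e9    # MB -> PB
print(f"Upload volume: ~{upload_pb_per_day:.0f} PB/day, ~{upload_pb_per_day * 30:.0f} PB/month")
```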
💡 Key Concepts
Chunking (CRITICAL)
Split files into fixed-size chunks (4MB is typical). Chunking enables parallel upload, incremental sync, and deduplication.
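A minimal sketch of chunking, assuming 4 MB fixed-size chunks identified by SHA-256 hashes (the `chunk_file` helper is illustrative):

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB

def chunk_file(path):
    """Split a file into fixed-size chunks, each identified by its SHA-256 hash."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            chunks.append((hashlib.sha256(data).hexdigest(), data))
    return chunks  # list of (chunk_id, bytes); chunks can be uploaded in parallel
```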
Content-Addressable Storage (CRITICAL)
Chunks are identified by their hash (SHA-256). Identical content is stored once, enabling deduplication.
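A toy in-memory chunk store, standing in for object storage, to show the dedup behavior; the `ChunkStore` class is hypothetical:

```python
import hashlib

class ChunkStore:
    """Chunks are keyed by their SHA-256 hash, so identical content is stored once."""
    def __init__(self):
        self._chunks = {}  # chunk_id -> bytes (object storage in a real system)

    def put(self, data: bytes) -> str:
        chunk_id = hashlib.sha256(data).hexdigest()
        if chunk_id not in self._chunks:   # dedup: skip if this content already exists
            self._chunks[chunk_id] = data
        return chunk_id

    def get(self, chunk_id: str) -> bytes:
        return self._chunks[chunk_id]
```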
Delta Sync (CRITICAL)
Only transfer changed chunks, not entire files. An rsync-like algorithm computes deltas efficiently.
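A simplified sketch of delta sync that compares chunk-hash lists; a real client would use an rsync-style rolling hash so an insertion near the start of a file doesn't shift every chunk boundary, which this fixed-boundary version ignores:

```python
def chunks_to_upload(local_chunk_ids, remote_chunk_ids):
    """Return only the chunk IDs the server doesn't already have."""
    remote = set(remote_chunk_ids)
    return [cid for cid in local_chunk_ids if cid not in remote]

# Example: one chunk changed out of three -> only that chunk is transferred.
local = ["a1", "b2-modified", "c3"]
remote = ["a1", "b2", "c3"]
print(chunks_to_upload(local, remote))   # ['b2-modified']
```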
Version Vectors (HIGH)
Track file versions per device to detect conflicts and ensure consistency.
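A minimal version-vector comparison, assuming each device increments its own counter on every local write (function name and return values are illustrative):

```python
def compare(vv_a: dict, vv_b: dict) -> str:
    """Compare two version vectors (device_id -> counter)."""
    devices = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(d, 0) > vv_b.get(d, 0) for d in devices)
    b_ahead = any(vv_b.get(d, 0) > vv_a.get(d, 0) for d in devices)
    if a_ahead and b_ahead:
        return "conflict"        # concurrent edits on different devices
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"

print(compare({"laptop": 3, "phone": 1}, {"laptop": 2, "phone": 2}))  # conflict
```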
Conflict Resolution (HIGH)
When two devices modify the same file offline, the system detects the conflict and creates a "conflicted copy" for the user to resolve.
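Building on the comparison above, one common policy is to keep both versions and let the user merge. A sketch of naming the losing copy; the helper and naming format are illustrative, loosely modeled on the familiar "conflicted copy" convention:

```python
from datetime import date

def conflicted_copy_name(path: str, device_name: str) -> str:
    """Keep the winning version at `path`; save the losing edit under a new name."""
    stem, dot, ext = path.rpartition(".")
    suffix = f" ({device_name}'s conflicted copy {date.today().isoformat()})"
    return f"{stem}{suffix}.{ext}" if dot else f"{path}{suffix}"

print(conflicted_copy_name("report.docx", "laptop"))
# e.g. report (laptop's conflicted copy 2025-01-01).docx
```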
Metadata vs. Data Separation (MEDIUM)
Metadata lives in an RDBMS for consistency; file chunks live in object storage for scale. The two sides have different scaling strategies.
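A sketch of the split, assuming a relational metadata row that references content-addressed chunks stored separately in object storage (the class and field names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class FileMetadata:
    """Stored in an RDBMS: small rows, transactions, strong consistency."""
    file_id: str
    owner_id: str
    path: str
    version: int
    chunk_ids: list[str] = field(default_factory=list)  # ordered SHA-256 chunk hashes

# The chunk bytes themselves go to object storage, keyed by hash, and scale
# independently of the metadata database.
meta = FileMetadata("f1", "u42", "/docs/report.docx", version=3,
                    chunk_ids=["9f2c...", "ab10..."])
```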
💡 Interview Tips
- Start with separating metadata from file storage - it's the key insight
- Explain chunking early - essential for large files and dedup
- Draw the sync flow between multiple devices
- Discuss the delta sync / rsync algorithm to show depth
- Mention a conflict resolution strategy
- Cover both online sync and offline scenarios