📝 Problem Description
Design a distributed file system like Google File System (GFS) or Hadoop Distributed File System (HDFS) that can store petabytes of data across thousands of commodity servers with fault tolerance and high throughput for large sequential reads/writes.
👤 Use Cases
1. A MapReduce job reads a 1 TB file with high-throughput parallel reads across chunks (read path sketched below).
2. A client writes a large file; the file is split into fixed-size chunks and replicated across chunk servers.
3. A chunk server fails; automatic re-replication maintains redundancy without client involvement.
4. Multiple writers append to the same file; atomic record append keeps concurrent appends consistent.
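For use case 1, here is a minimal sketch of the client read path under the GFS/HDFS model: the master answers metadata lookups only, and chunk data flows directly from chunk servers. All class and function names (`Master.lookup`, `ChunkServer.read`, `dfs_read`) are illustrative assumptions, not a real API.

```python
# Minimal sketch of a GFS/HDFS-style read path, with an in-memory "master" and
# "chunk servers" standing in for RPC services. Names are illustrative, not a real API.
from dataclasses import dataclass
from typing import Dict, List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks (GFS); HDFS commonly uses 128 MB blocks

@dataclass
class ChunkLocation:
    handle: str           # globally unique chunk id assigned by the master
    replicas: List[str]   # addresses of chunk servers holding a copy

class Master:
    """Control plane only: namespace and chunk-to-server mapping, never file data."""
    def __init__(self) -> None:
        self.chunk_map: Dict[Tuple[str, int], ChunkLocation] = {}

    def lookup(self, path: str, chunk_index: int) -> ChunkLocation:
        return self.chunk_map[(path, chunk_index)]

class ChunkServer:
    """Data plane: serves chunk bytes from local storage (a dict stands in for disk)."""
    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def read(self, handle: str, offset: int, length: int) -> bytes:
        return self.chunks[handle][offset:offset + length]

def dfs_read(master: Master, servers: Dict[str, ChunkServer],
             path: str, offset: int, length: int) -> bytes:
    """Translate a byte range into chunk indices, ask the master where each chunk
    lives, then read the data directly from a chunk server for each chunk."""
    out, end = bytearray(), offset + length
    while offset < end:
        index, within = divmod(offset, CHUNK_SIZE)
        n = min(end - offset, CHUNK_SIZE - within)
        location = master.lookup(path, index)      # real clients cache these lookups
        replica = location.replicas[0]             # real clients pick the closest replica
        out += servers[replica].read(location.handle, within, n)
        offset += n
    return bytes(out)
```

A real client caches chunk locations and reads from the nearest replica (same node or rack) rather than the first one, which is what makes parallel reads across many chunks cheap for the master.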
✅ Functional Requirements
- Store and retrieve large files (multi-GB to TB)
- Support append operations efficiently
- Provide namespace operations (create, delete, rename)
- Replicate data for fault tolerance
- Handle chunk server failures transparently
- Support atomic record append for concurrent writers (sketched after this list)
- Provide strong consistency for file metadata
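For the atomic record append requirement, a hedged sketch in the spirit of the GFS paper: the primary replica for a chunk serializes concurrent appends, picks the offset itself, and has every replica apply the record at that same offset; if the record would not fit, the chunk is padded and the client retries on the next chunk. The class names and padding rule below are simplifications for illustration.

```python
# Sketch of GFS-style atomic record append: the primary picks the offset and has
# every replica apply the record there, so concurrent appends never interleave.
import threading
from typing import List, Optional, Tuple

CHUNK_SIZE = 64 * 1024 * 1024

class Replica:
    """Holds one chunk's bytes and applies writes at offsets chosen by the primary."""
    def __init__(self) -> None:
        self.data = bytearray()

    def apply_at(self, offset: int, record: bytes) -> None:
        if len(self.data) < offset:                      # fill any gap with padding
            self.data.extend(b"\x00" * (offset - len(self.data)))
        self.data[offset:offset + len(record)] = record

class Primary(Replica):
    """The primary serializes concurrent appends for its chunk (lease-based in GFS)."""
    def __init__(self, secondaries: List[Replica]) -> None:
        super().__init__()
        self.secondaries = secondaries
        self.lock = threading.Lock()

    def record_append(self, record: bytes) -> Tuple[str, Optional[int]]:
        with self.lock:
            if len(self.data) + len(record) > CHUNK_SIZE:
                # Chunk is full: pad it out on all replicas and make the client
                # retry the append on a freshly allocated chunk.
                pad_offset = len(self.data)
                padding = b"\x00" * (CHUNK_SIZE - pad_offset)
                for replica in [self] + self.secondaries:
                    replica.apply_at(pad_offset, padding)
                return ("RETRY_NEXT_CHUNK", None)
            offset = len(self.data)
            for replica in [self] + self.secondaries:    # pipelined over the network in practice
                replica.apply_at(offset, record)
            return ("OK", offset)                        # every replica has the record at `offset`
```

Because the primary, not the client, chooses the offset, many writers can append to the same file without coordinating with each other; they only learn where their record landed from the returned offset.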
⚡ Non-Functional Requirements
- Store 100+ petabytes of data
- Support thousands of concurrent clients
- High throughput (100+ GB/sec aggregate)
- Tolerate simultaneous failures of multiple servers
- Automatic recovery without data loss
- Optimize for large sequential I/O
⚠️ Constraints & Assumptions
- Optimize for large sequential reads/writes; small-file workloads are not the primary target
- Fixed chunk/block size (e.g., 64MB/128MB); clients read/write by chunk
- Replication factor typically 3 with rack/zone awareness to survive rack failures (placement sketched below)
- Commodity servers fail frequently; system must detect failures via heartbeats and re-replicate automatically
- Metadata must be strongly consistent (namespace + chunk mapping); data replicas may be eventually consistent within write pipeline rules
- High write throughput via append and pipelined replication; tail latency is secondary to throughput
- Master/NameNode is logically single-writer for metadata; must support standby/failover
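As a sketch of the rack-aware placement constraint, the HDFS default policy is a useful reference point: first replica on the writer's node (or a random node), second replica on a different rack, third replica on the same rack as the second but a different node. The code below is an illustrative simplification (it assumes racks with enough nodes for distinct picks), not the actual HDFS BlockPlacementPolicy.

```python
# Illustrative rack-aware placement for replication factor 3, loosely following the
# HDFS default policy; the data structures and function name are assumptions.
import random
from typing import Dict, List, Optional

def place_replicas(nodes_by_rack: Dict[str, List[str]],
                   writer_node: Optional[str] = None,
                   writer_rack: Optional[str] = None) -> List[str]:
    """Pick 3 chunk servers: one "local" to the writer, one on a different rack,
    and one more on that second rack, so a single rack failure never loses all copies."""
    racks = list(nodes_by_rack)

    # 1st replica: the writer's own node if it runs a chunk server, else any node.
    rack1 = writer_rack if writer_rack in nodes_by_rack else random.choice(racks)
    nodes1 = nodes_by_rack[rack1]
    first = writer_node if writer_node in nodes1 else random.choice(nodes1)

    # 2nd replica: any node on a different rack (fall back to rack1 if only one rack exists).
    other_racks = [r for r in racks if r != rack1] or [rack1]
    rack2 = random.choice(other_racks)
    second = random.choice([n for n in nodes_by_rack[rack2] if n != first])

    # 3rd replica: a different node on the same rack as the 2nd replica.
    third = random.choice([n for n in nodes_by_rack[rack2] if n not in (first, second)])

    return [first, second, third]

# Example: 3 racks with 3 chunk servers each.
cluster = {f"rack{r}": [f"cs-{r}{i}" for i in range(3)] for r in range(3)}
print(place_replicas(cluster, writer_node="cs-00", writer_rack="rack0"))
```

Two of the three replicas end up on one remote rack, which trades a small durability margin for halving the cross-rack write traffic; that is the same trade-off the HDFS default policy makes.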
📊 Capacity Estimation
👥 Users: 10,000 concurrent clients
💾 Storage: 100 PB across 10,000 chunk servers
⚡ QPS: metadata 10K/sec; chunk operations 1M/sec
🌐 Bandwidth: aggregate 100 GB/sec reads, 50 GB/sec writes
📐 Assumptions
- Average file size: 1GB (many are 100GB+)
- Chunk size: 64MB (GFS) or 128MB (HDFS)
- Replication factor: 3
- Chunk server: 10TB storage each (back-of-envelope check below)
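A back-of-envelope check of these numbers, assuming 64 MB chunks, 3x replication, and roughly 64 bytes of master metadata per chunk (a figure in the spirit of the GFS paper, used here as an assumption):

```python
# Back-of-envelope capacity math under the stated assumptions.
PB, TB, MB = 10**15, 10**12, 10**6

logical_data    = 100 * PB                       # user-visible data
chunk_size      = 64 * MB
replication     = 3

chunks          = logical_data // chunk_size     # number of distinct chunks
raw_storage     = logical_data * replication     # bytes on disk after 3x replication
servers_needed  = raw_storage // (10 * TB)       # at 10 TB of disk per chunk server
master_metadata = chunks * 64                    # assumed ~64 B of chunk metadata each

print(f"chunks:          {chunks:,}")                     # 1,562,500,000
print(f"raw storage:     {raw_storage // PB} PB")         # 300 PB
print(f"servers @ 10TB:  {servers_needed:,}")             # 30,000
print(f"master metadata: {master_metadata // 10**9} GB")  # ~100 GB, fits in RAM
```

Note the tension in the stated figures: at 3x replication, 100 PB of logical data needs about 300 PB of raw disk, so 10,000 chunk servers would need roughly 30 TB each, or the 100 PB should be read as raw capacity (about 33 PB of logical data).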
💡 Key Concepts
- Master/NameNode (CRITICAL): manages the file system namespace and the chunk-to-server mapping
- ChunkServer/DataNode (CRITICAL): stores and serves file chunks on local disk
- Write Pipeline (HIGH): chain replication for fault-tolerant writes (sketched below)
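A minimal sketch of the write pipeline concept: the client streams data only to the first replica, and each replica writes locally while forwarding to the next, so no single machine's outbound bandwidth is split three ways and acks flow back up the chain. Class and function names are illustrative.

```python
# Sketch of a pipelined (chained) write through a replication chain.
from typing import List, Optional

class PipelineNode:
    """One chunk server in the replication chain."""
    def __init__(self, name: str, downstream: Optional["PipelineNode"] = None) -> None:
        self.name = name
        self.downstream = downstream
        self.storage: List[bytes] = []     # stands in for the local chunk file

    def write_packet(self, packet: bytes) -> bool:
        """Persist locally, forward along the chain, and succeed only if every
        downstream replica also succeeded (the ack flows back upstream)."""
        self.storage.append(packet)        # real servers write and checksum to disk
        if self.downstream is None:
            return True
        return self.downstream.write_packet(packet)

def build_pipeline(names: List[str]) -> PipelineNode:
    """Chain the replicas so data flows names[0] -> names[1] -> names[2]."""
    head: Optional[PipelineNode] = None
    for name in reversed(names):
        head = PipelineNode(name, downstream=head)
    return head

# A client streams a chunk as a sequence of packets through the chain.
pipeline = build_pipeline(["cs1", "cs2", "cs3"])
for packet in (b"packet-0", b"packet-1", b"packet-2"):
    assert pipeline.write_packet(packet)
```

In GFS the data push along this chain is decoupled from the control flow: once the data sits in replica buffers, the client asks the primary to commit it, which is where an append like the earlier sketch comes in.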
💡 Interview Tips
- 💡 Start with the GFS paper concepts - single master, many chunk servers, large chunks
- 💡 Emphasize the separation of control plane (master) and data plane (chunk servers)
- 💡 Discuss rack-aware placement early - it shows understanding of real-world constraints
- 💡 Be prepared to discuss the CAP theorem tradeoffs in GFS/HDFS
- 💡 Know the numbers: 64MB chunks, 3x replication, 30s heartbeat interval (failure handling sketched below)
- 💡 Understand why append is the primary write pattern and how atomic append works
- 💡 Discuss master high availability - this is often a follow-up question
- 💡 Be ready to compare with modern systems like Ceph, MinIO, or cloud object storage
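To make the heartbeat tip concrete, here is a sketch of master-side failure detection and re-replication scheduling; the timeout (a few missed 30-second heartbeats) and all names are assumptions of this sketch rather than real GFS/HDFS defaults.

```python
# Sketch of heartbeat-based failure detection and re-replication scheduling on the master.
import time
from typing import Dict, List, Optional, Set

HEARTBEAT_INTERVAL = 30.0                  # seconds between chunk server heartbeats
FAILURE_TIMEOUT = 3 * HEARTBEAT_INTERVAL   # declare a server dead after ~3 missed beats
REPLICATION_FACTOR = 3

class FailureDetector:
    def __init__(self) -> None:
        self.last_heartbeat: Dict[str, float] = {}      # chunk server -> last-seen time
        self.chunk_replicas: Dict[str, Set[str]] = {}   # chunk handle -> live replica set
        self.rereplication_queue: List[str] = []        # chunks below the target copy count

    def on_heartbeat(self, server: str, now: Optional[float] = None) -> None:
        self.last_heartbeat[server] = time.time() if now is None else now

    def check_failures(self, now: Optional[float] = None) -> None:
        """Drop servers with stale heartbeats from every replica set and queue any
        chunk that fell below REPLICATION_FACTOR for background re-replication."""
        now = time.time() if now is None else now
        dead = {s for s, t in self.last_heartbeat.items() if now - t > FAILURE_TIMEOUT}
        for server in dead:
            del self.last_heartbeat[server]
        for handle, replicas in self.chunk_replicas.items():
            replicas -= dead
            if len(replicas) < REPLICATION_FACTOR and handle not in self.rereplication_queue:
                self.rereplication_queue.append(handle)
        # A background task would then copy each queued chunk from a surviving
        # replica to a new server, prioritizing the most under-replicated chunks.
```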