Hard · ~55 min · Rate Limiting

Design a Distributed API Rate Limiter

Stripe · CloudFlare · Google · Amazon · Netflix · GitHub · Uber

📝 Problem Description

Design a distributed rate limiting system that can throttle API requests based on various criteria (user, IP, API key). The system should protect backend services from abuse and ensure fair usage while maintaining low latency and high availability.

👤 Use Cases

1. An API client making requests within its limit wants them processed normally
2. An API client exceeding its rate limit should receive a 429 Too Many Requests response
3. An API provider wants to set different limits per tier, so that free users get 100 req/min and paid users get 1000 req/min
4. The system wants to detect abuse patterns, so that malicious traffic is blocked before it reaches the backend
5. DevOps wants to adjust rate limits dynamically, so that they can react to load without a deployment

✅ Functional Requirements

  • Limit requests per client (user, IP, API key)
  • Support multiple rate limiting strategies (fixed window, sliding window, token bucket)
  • Support different limits per endpoint
  • Support different limits per pricing tier
  • Return remaining quota in response headers
  • Allow dynamic configuration changes
  • Support burst traffic within limits
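
The quota-in-headers requirement above is usually met with the conventional `X-RateLimit-*` response fields. A minimal sketch (the header names follow common industry convention rather than a formal standard, and the function name is illustrative):

```python
def rate_limit_headers(limit: int, remaining: int, reset_epoch: int) -> dict:
    """Build the conventional X-RateLimit-* response headers."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),  # epoch seconds when the quota resets
    }

# A rejected request would pair these headers with HTTP 429,
# typically adding Retry-After as well.
print(rate_limit_headers(100, 0, 1_700_000_060))
```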

⚡ Non-Functional Requirements

  • Add minimal latency (< 1ms p99)
  • Work across distributed servers (consistent limiting)
  • High availability (rate limiter failure should not block requests)
  • Handle 1M+ requests per second globally
  • Accurate counting (no significant over/under limiting)
  • Scale horizontally with no single point of failure

⚠️ Constraints & Assumptions

  • Rate check must complete in < 1ms
  • No false rejections of valid requests (high accuracy)
  • Soft limits OK: 1-5% overage acceptable for performance
  • Must work with stateless API servers
  • Rules can be per user, IP, API key, endpoint, or combination

📊 Capacity Estimation

💾 Storage
~10GB for rate limit counters (100M users × 100 bytes)
⚡ QPS
1M requests/second across all endpoints
📐 Assumptions
  • 100 million registered users/API keys
  • 1 million concurrent active users
  • 1 million requests per second peak
  • Average rate limit window: 1 minute
  • Each counter: 100 bytes (key + count + timestamp)
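
The ~10GB storage figure follows directly from the assumptions above; a quick back-of-envelope check:

```python
users = 100_000_000        # registered users / API keys
bytes_per_counter = 100    # key + count + timestamp
total_bytes = users * bytes_per_counter
print(total_bytes / 10**9)  # → 10.0 (GB)
```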

💡 Key Concepts

CRITICAL
Token Bucket
Tokens refill at constant rate. Request consumes token. Allows bursts up to bucket size while maintaining average rate.
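
A minimal single-node sketch of the token bucket (in the distributed design this state would live in a shared store like Redis; class and parameter names are illustrative):

```python
import time

class TokenBucket:
    """Token bucket: refill at `rate` tokens/sec, burst up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=10, capacity=5)  # 10 req/s average, bursts of 5
print([bucket.allow() for _ in range(6)])  # first 5 pass, 6th is throttled
```
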
HIGH
Sliding Window Log
Store timestamp of each request. Count requests in sliding window. Most accurate but memory-intensive.
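
A sketch of the log variant; the per-request timestamps are exactly what makes it accurate but memory-intensive (`now` is passed explicitly here for determinism):

```python
from collections import deque

class SlidingWindowLog:
    """Exact sliding-window limiter: one stored timestamp per request."""
    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()

    def allow(self, now: float) -> bool:
        # Evict timestamps that have fallen out of the sliding window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

rl = SlidingWindowLog(limit=2, window=60)
print(rl.allow(0.0), rl.allow(1.0), rl.allow(2.0))  # → True True False
print(rl.allow(61.0))  # → True (the earliest timestamps have expired)
```
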
CRITICAL
Sliding Window Counter
Approximate sliding window using weighted average of current and previous fixed windows. Good balance of accuracy and performance.
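
The weighted average above reduces to one formula; a sketch (function name is illustrative):

```python
def sliding_window_count(prev_count, curr_count, window, elapsed_in_curr):
    """Approximate requests in the sliding window ending now.

    The previous fixed window is weighted by the fraction of it still
    covered by the sliding window."""
    prev_weight = (window - elapsed_in_curr) / window
    return curr_count + prev_count * prev_weight

# 60s windows, 15s into the current one: 75% of the previous window
# still overlaps the sliding window, so 84 prev + 36 curr ≈ 99.
print(sliding_window_count(84, 36, 60, 15))  # → 99.0
```
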
MEDIUM
Fixed Window Counter
Simple counter reset at window boundary. Fast but can allow 2x burst at window edges.
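
The 2x edge burst is easy to demonstrate in a sketch (class name illustrative): a client can exhaust the limit just before the boundary and again just after it.

```python
class FixedWindowCounter:
    """Counter that resets at each fixed window boundary."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.window_start, self.count = 0.0, 0

    def allow(self, now: float) -> bool:
        if now - self.window_start >= self.window:
            self.window_start = now - (now % self.window)  # snap to boundary
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

rl = FixedWindowCounter(limit=100, window=60)
# 100 requests at t=59.9 and 100 more at t=60.1 all pass:
burst = sum(rl.allow(59.9) for _ in range(100)) + sum(rl.allow(60.1) for _ in range(100))
print(burst)  # → 200, i.e. 2x the limit within 0.2 seconds
```
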
MEDIUM
Leaky Bucket
Requests enter bucket, processed at fixed rate. Smooths bursts but adds latency.
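
A sketch of the queueing variant, where each admitted request gets a scheduled processing time; the growing gap between arrival and processing is the added latency mentioned above (names and queue-depth bound are illustrative):

```python
class LeakyBucket:
    """Admit requests into a bounded queue drained at a fixed rate."""
    def __init__(self, rate: float, capacity: int):
        self.interval = 1.0 / rate  # seconds between processed requests
        self.capacity = capacity    # max queued requests
        self.next_slot = 0.0

    def enqueue(self, now: float):
        """Return the time the request will be processed, or None if full."""
        slot = max(now, self.next_slot)
        if slot - now > self.capacity * self.interval:
            return None             # queue full: shed the request
        self.next_slot = slot + self.interval
        return slot

lb = LeakyBucket(rate=2, capacity=3)  # 2 req/s drain, queue of 3
# A burst of 5 at t=0 is smoothed out; the last request is shed:
print([lb.enqueue(0.0) for _ in range(5)])  # → [0.0, 0.5, 1.0, 1.5, None]
```
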
HIGH
Distributed Counter Consistency
Redis provides atomic increment. Lua scripts for complex logic. Accept slight inconsistency for performance.
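
The check-and-increment must be atomic across API servers; with Redis this is typically done server-side in a single Lua script (via EVAL). A sketch of the equivalent fixed-window logic, simulated here against an in-process dict for testability; the key layout and function name are illustrative:

```python
def check_and_increment(store: dict, key: str, limit: int,
                        window: float, now: float) -> bool:
    """Atomic fixed-window check, as a Redis Lua script would perform it:
    increment the counter, start a new window on first hit or after
    expiry, then compare against the limit."""
    count, expires = store.get(key, (0, now + window))
    if now >= expires:                    # window elapsed: counter resets
        count, expires = 0, now + window
    count += 1
    store[key] = (count, expires)
    return count <= limit

store = {}
print([check_and_increment(store, "user:42", 3, 60, now=0.0) for _ in range(4)])
# → [True, True, True, False]
```

Running these steps as one Lua script makes the read-modify-write atomic; plain GET-then-INCR from multiple API servers would race and over-admit.
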

💡 Interview Tips

  • Start by clarifying requirements: what to limit by and how much accuracy is needed
  • Explain the different algorithms and their trade-offs
  • Draw the request flow, showing where rate limiting happens
  • Discuss distributed consistency challenges
  • Mention graceful degradation (what happens if the rate limiter fails)
  • Cover the response format (headers, 429 status)