Hard · ~55 min · Rate Limiting

Design a Distributed API Rate Limiter

Stripe · CloudFlare · Google · Amazon · Netflix · GitHub · Uber

📝 Problem Description

Design a distributed rate limiting system that can throttle API requests based on various criteria (user, IP, API key). The system should protect backend services from abuse and ensure fair usage while maintaining low latency and high availability.

👤 Use Cases

1. An API client making requests within its limit wants them processed normally
2. An API client exceeding its rate limit should receive a 429 Too Many Requests response
3. An API provider wants to set different limits per tier, so that free users get 100 req/min and paid users get 1000 req/min
4. The system wants to detect abuse patterns, so that malicious traffic is blocked before it reaches the backend
5. DevOps wants to adjust rate limits dynamically, so that they can react to load without a deployment

✅ Functional Requirements

  • Limit requests per client (user, IP, API key)
  • Support multiple rate limiting strategies (fixed window, sliding window, token bucket)
  • Support different limits per endpoint
  • Support different limits per pricing tier
  • Return remaining quota in response headers
  • Allow dynamic configuration changes
  • Support burst traffic within limits
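
The quota-in-headers requirement above is usually met with the conventional `X-RateLimit-*` response fields. A minimal sketch (the header names follow common industry convention rather than a formal standard, and the function name is illustrative):

```python
def rate_limit_headers(limit: int, remaining: int, reset_epoch: int) -> dict:
    """Build the conventional X-RateLimit-* response headers."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),  # epoch seconds when the quota resets
    }

# A rejected request would pair these headers with HTTP 429,
# typically adding Retry-After as well.
print(rate_limit_headers(100, 0, 1_700_000_060))
```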

⚡ Non-Functional Requirements

  • Add minimal latency (< 1ms p99)
  • Work across distributed servers (consistent limiting)
  • High availability (rate limiter failure should not block requests)
  • Handle 1M+ requests per second globally
  • Accurate counting (no significant over/under limiting)
  • Scale horizontally with no single point of failure

⚠️ Constraints & Assumptions

  • Rate check must complete in < 1ms
  • No false rejections of valid requests (high accuracy)
  • Soft limits OK: 1-5% overage acceptable for performance
  • Must work with stateless API servers
  • Rules can be per user, IP, API key, endpoint, or combination

📊 Capacity Estimation

💾 Storage
~10GB for rate limit counters (100M users × 100 bytes)
⚡ QPS
1M requests/second across all endpoints
📐 Assumptions
  • 100 million registered users/API keys
  • 1 million concurrent active users
  • 1 million requests per second peak
  • Average rate limit window: 1 minute
  • Each counter: 100 bytes (key + count + timestamp)
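
The ~10GB storage figure follows directly from the assumptions above; a quick back-of-envelope check:

```python
users = 100_000_000        # registered users / API keys
bytes_per_counter = 100    # key + count + timestamp
total_bytes = users * bytes_per_counter
print(total_bytes / 10**9)  # → 10.0 (GB)
```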

💡 Key Concepts

CRITICAL
Token Bucket
Tokens refill at constant rate. Request consumes token. Allows bursts up to bucket size while maintaining average rate.
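
A minimal single-node sketch of the token bucket (in the distributed design this state would live in a shared store like Redis; class and parameter names are illustrative):

```python
import time

class TokenBucket:
    """Token bucket: refill at `rate` tokens/sec, burst up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=10, capacity=5)  # 10 req/s average, bursts of 5
print([bucket.allow() for _ in range(6)])  # first 5 pass, 6th is throttled
```
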
HIGH
Sliding Window Log
Store timestamp of each request. Count requests in sliding window. Most accurate but memory-intensive.
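
A sketch of the log variant; the per-request timestamps are exactly what makes it accurate but memory-intensive (`now` is passed explicitly here for determinism):

```python
from collections import deque

class SlidingWindowLog:
    """Exact sliding-window limiter: one stored timestamp per request."""
    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()

    def allow(self, now: float) -> bool:
        # Evict timestamps that have fallen out of the sliding window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

rl = SlidingWindowLog(limit=2, window=60)
print(rl.allow(0.0), rl.allow(1.0), rl.allow(2.0))  # → True True False
print(rl.allow(61.0))  # → True (the earliest timestamps have expired)
```
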
CRITICAL
Sliding Window Counter
Approximate sliding window using weighted average of current and previous fixed windows. Good balance of accuracy and performance.
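
The weighted average above reduces to one formula; a sketch (function name is illustrative):

```python
def sliding_window_count(prev_count, curr_count, window, elapsed_in_curr):
    """Approximate requests in the sliding window ending now.

    The previous fixed window is weighted by the fraction of it still
    covered by the sliding window."""
    prev_weight = (window - elapsed_in_curr) / window
    return curr_count + prev_count * prev_weight

# 60s windows, 15s into the current one: 75% of the previous window
# still overlaps the sliding window, so 84 prev + 36 curr ≈ 99.
print(sliding_window_count(84, 36, 60, 15))  # → 99.0
```
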
MEDIUM
Fixed Window Counter
Simple counter reset at window boundary. Fast but can allow 2x burst at window edges.
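
The 2x edge burst is easy to demonstrate in a sketch (class name illustrative): a client can exhaust the limit just before the boundary and again just after it.

```python
class FixedWindowCounter:
    """Counter that resets at each fixed window boundary."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.window_start, self.count = 0.0, 0

    def allow(self, now: float) -> bool:
        if now - self.window_start >= self.window:
            self.window_start = now - (now % self.window)  # snap to boundary
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

rl = FixedWindowCounter(limit=100, window=60)
# 100 requests at t=59.9 and 100 more at t=60.1 all pass:
burst = sum(rl.allow(59.9) for _ in range(100)) + sum(rl.allow(60.1) for _ in range(100))
print(burst)  # → 200, i.e. 2x the limit within 0.2 seconds
```
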
MEDIUM
Leaky Bucket
Requests enter bucket, processed at fixed rate. Smooths bursts but adds latency.
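
A sketch of the queueing variant, where each admitted request gets a scheduled processing time; the growing gap between arrival and processing is the added latency mentioned above (names and queue-depth bound are illustrative):

```python
class LeakyBucket:
    """Admit requests into a bounded queue drained at a fixed rate."""
    def __init__(self, rate: float, capacity: int):
        self.interval = 1.0 / rate  # seconds between processed requests
        self.capacity = capacity    # max queued requests
        self.next_slot = 0.0

    def enqueue(self, now: float):
        """Return the time the request will be processed, or None if full."""
        slot = max(now, self.next_slot)
        if slot - now > self.capacity * self.interval:
            return None             # queue full: shed the request
        self.next_slot = slot + self.interval
        return slot

lb = LeakyBucket(rate=2, capacity=3)  # 2 req/s drain, queue of 3
# A burst of 5 at t=0 is smoothed out; the last request is shed:
print([lb.enqueue(0.0) for _ in range(5)])  # → [0.0, 0.5, 1.0, 1.5, None]
```
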
HIGH
Distributed Counter Consistency
Redis provides atomic increment. Lua scripts for complex logic. Accept slight inconsistency for performance.
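
The check-and-increment must be atomic across API servers; with Redis this is typically done server-side in a single Lua script (via EVAL). A sketch of the equivalent fixed-window logic, simulated here against an in-process dict for testability; the key layout and function name are illustrative:

```python
def check_and_increment(store: dict, key: str, limit: int,
                        window: float, now: float) -> bool:
    """Atomic fixed-window check, as a Redis Lua script would perform it:
    increment the counter, start a new window on first hit or after
    expiry, then compare against the limit."""
    count, expires = store.get(key, (0, now + window))
    if now >= expires:                    # window elapsed: counter resets
        count, expires = 0, now + window
    count += 1
    store[key] = (count, expires)
    return count <= limit

store = {}
print([check_and_increment(store, "user:42", 3, 60, now=0.0) for _ in range(4)])
# → [True, True, True, False]
```

Running these steps as one Lua script makes the read-modify-write atomic; plain GET-then-INCR from multiple API servers would race and over-admit.
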

💡 Interview Tips

  • Start by clarifying requirements: what to limit by and how much accuracy is needed
  • Explain the different algorithms and their trade-offs
  • Draw the request flow, showing where rate limiting happens
  • Discuss distributed consistency challenges
  • Mention graceful degradation (what happens if the rate limiter fails)
  • Cover the response format (headers, 429 status)