Rate limiting is essential for any public or heavily used API. Without it, a single bad actor, or a single buggy client, can take down your service. But rate limiting is nuanced: too aggressive and you frustrate legitimate users; too lax and you get no protection.
Here are rate limiting strategies that work.
Why Rate Limit
Protection Goals
rate_limiting_goals:
  availability:
    - Prevent resource exhaustion
    - Protect against DDoS
    - Maintain service for all users
  fairness:
    - Equal access for users
    - Prevent a single user monopolizing capacity
    - Metered access for business tiers
  cost_control:
    - Limit expensive operations
    - Prevent runaway costs
    - Enforce usage quotas
  security:
    - Prevent brute-force attacks
    - Limit credential stuffing
    - Detect suspicious patterns
Rate Limiting Algorithms
Fixed Window
fixed_window:
  approach: Count requests in fixed time windows (e.g., per minute)
  pros:
    - Simple to implement
    - Easy to understand
  cons:
    - Bursts at window boundaries
    - User can do 2x the limit across a boundary
  example:
    limit: 100 requests per minute
    windows: 0:00-0:59, 1:00-1:59, etc.
    problem: 100 requests at 0:59 + 100 at 1:00 = 200 in ~2 seconds
# Fixed window implementation (assumes a shared module-level Redis client)
import time
import redis as redis_lib

redis = redis_lib.Redis()

def is_rate_limited(user_id, limit=100, window_seconds=60):
    # One counter per user per window; the window number is baked into the key
    key = f"rate_limit:{user_id}:{int(time.time() / window_seconds)}"
    current = redis.incr(key)
    if current == 1:
        # First request in this window: expire the key when the window ends
        redis.expire(key, window_seconds)
    return current > limit
Sliding Window Log
sliding_window_log:
approach: Track timestamp of each request, count in rolling window
pros:
- Most accurate
- No boundary issues
cons:
- Memory intensive
- Expensive at high volume
# Sliding window log implementation
def is_rate_limited(user_id, limit=100, window_seconds=60):
    key = f"rate_limit:{user_id}"
    now = time.time()
    cutoff = now - window_seconds
    pipe = redis.pipeline()
    # Remove entries older than the rolling window
    pipe.zremrangebyscore(key, 0, cutoff)
    # Add the current request, scored by its timestamp
    pipe.zadd(key, {str(now): now})
    # Count requests still in the window (including this one)
    pipe.zcard(key)
    # Expire the whole set if the user goes quiet
    pipe.expire(key, window_seconds)
    results = pipe.execute()
    return results[2] > limit
Sliding Window Counter
sliding_window_counter:
approach: Combine current and previous window with weights
pros:
- Good accuracy
- Memory efficient
- Smooth rate limiting
cons:
- Slightly more complex
calculation:
previous_window_count * (1 - elapsed_ratio) + current_window_count
# Sliding window counter
def is_rate_limited(user_id, limit=100, window_seconds=60):
    now = time.time()
    current_window = int(now / window_seconds)
    previous_window = current_window - 1
    current_key = f"rate_limit:{user_id}:{current_window}"
    previous_key = f"rate_limit:{user_id}:{previous_window}"
    current_count = int(redis.get(current_key) or 0)
    previous_count = int(redis.get(previous_key) or 0)
    # Weight the previous window by how much of it still overlaps
    # the rolling window, based on position in the current window
    elapsed = now % window_seconds
    weight = (window_seconds - elapsed) / window_seconds
    estimated_count = previous_count * weight + current_count
    if estimated_count >= limit:
        return True
    # Increment the current window; keep the key alive long enough
    # to serve as the "previous window" of the next window
    pipe = redis.pipeline()
    pipe.incr(current_key)
    pipe.expire(current_key, window_seconds * 2)
    pipe.execute()
    return False
Token Bucket
token_bucket:
approach: Bucket fills with tokens at fixed rate, request consumes token
pros:
- Allows controlled bursting
- Smooth average rate
- Intuitive model
cons:
- Slightly more complex
parameters:
bucket_size: Maximum burst capacity
refill_rate: Tokens added per second
# Token bucket implementation
class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second

    def allow_request(self, user_id, tokens=1):
        key = f"token_bucket:{user_id}"
        now = time.time()
        # Get current state (redis-py returns bytes keys/values by default)
        data = redis.hgetall(key)
        if data:
            last_refill = float(data[b'last_refill'])
            current_tokens = float(data[b'tokens'])
        else:
            # New bucket starts full
            last_refill = now
            current_tokens = self.capacity
        # Refill tokens for the time elapsed since the last request
        elapsed = now - last_refill
        current_tokens = min(
            self.capacity,
            current_tokens + elapsed * self.refill_rate
        )
        # Check if enough tokens remain
        if current_tokens >= tokens:
            current_tokens -= tokens
            redis.hset(key, mapping={
                'tokens': current_tokens,
                'last_refill': now
            })
            # TTL: long enough for an idle bucket to refill fully, plus slack
            redis.expire(key, int(self.capacity / self.refill_rate) + 60)
            return True
        return False
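The two parameters map directly to behavior: with capacity=100 and refill_rate=10, a client can burst 100 requests at once, then sustain about 10 requests per second. A minimal usage sketch with illustrative values:

# Usage sketch: burst of 100, then ~10 req/s sustained
bucket = TokenBucket(capacity=100, refill_rate=10)
if bucket.allow_request(user_id="user-42"):
    ...  # serve the request
else:
    ...  # reject with 429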
Implementation Patterns
HTTP Headers
rate_limit_headers:
  de_facto_convention:
    X-RateLimit-Limit: Maximum requests allowed
    X-RateLimit-Remaining: Requests remaining in window
    X-RateLimit-Reset: Unix timestamp when the limit resets
  ietf_draft_standard:
    RateLimit-Limit: "100"
    RateLimit-Remaining: "45"
    RateLimit-Reset: "30"  # Seconds until reset, not a timestamp
# Adding rate limit headers
def rate_limit_middleware(request, response):
    user_id = get_user_id(request)
    limit_info = get_rate_limit_info(user_id)
    if limit_info.remaining <= 0:
        # Rejected requests carry the limit headers too, plus Retry-After
        return Response(
            status=429,
            body={'error': 'Rate limit exceeded'},
            headers={
                'X-RateLimit-Limit': str(limit_info.limit),
                'X-RateLimit-Remaining': '0',
                'X-RateLimit-Reset': str(limit_info.reset_at),
                'Retry-After': str(limit_info.retry_after),
            },
        )
    response.headers['X-RateLimit-Limit'] = str(limit_info.limit)
    response.headers['X-RateLimit-Remaining'] = str(limit_info.remaining)
    response.headers['X-RateLimit-Reset'] = str(limit_info.reset_at)
    return response
Tiered Limits
tiered_limits:
  free:
    requests_per_minute: 60
    requests_per_day: 1000
  pro:
    requests_per_minute: 600
    requests_per_day: 50000
  enterprise:
    requests_per_minute: 6000
    requests_per_day: unlimited
# Tiered rate limiting
TIER_LIMITS = {
    'free': {'minute': 60, 'day': 1000},
    'pro': {'minute': 600, 'day': 50000},
    'enterprise': {'minute': 6000, 'day': None},  # None = unlimited
}

def get_user_limits(user):
    # Unknown tiers fall back to the most restrictive limits
    return TIER_LIMITS.get(user.tier, TIER_LIMITS['free'])

def is_rate_limited(user):
    limits = get_user_limits(user)
    # Check the per-minute window, then the per-day window
    if limits['minute']:
        if check_limit(user.id, limits['minute'], 60):
            return True
    if limits['day']:
        if check_limit(user.id, limits['day'], 86400):
            return True
    return False
Endpoint-Specific Limits
endpoint_limits:
  "/api/login":
    limit: 5 per minute
    reason: Prevent brute force
  "/api/search":
    limit: 30 per minute
    reason: Expensive operation
  "/api/users":
    limit: 100 per minute
    reason: Standard CRUD
  "/api/webhooks":
    limit: 10 per minute
    reason: Triggers external calls
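A minimal sketch of wiring per-endpoint limits into the request path. It reuses the check_limit helper referenced in the tiered example; the ENDPOINT_LIMITS table and default value here are illustrative:

# Endpoint-specific limits, keyed by user and path
ENDPOINT_LIMITS = {
    '/api/login': 5,
    '/api/search': 30,
    '/api/users': 100,
    '/api/webhooks': 10,
}
DEFAULT_LIMIT = 100  # fallback for unlisted endpoints

def is_endpoint_limited(user_id, path):
    limit = ENDPOINT_LIMITS.get(path, DEFAULT_LIMIT)
    # Separate counter per endpoint so a search burst can't starve logins
    return check_limit(f"{user_id}:{path}", limit, 60)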
Client vs. Server Keys
rate_limit_keys:
  by_ip:
    use: Unauthenticated endpoints
    key: IP address
    challenge: NAT, shared IPs
  by_user:
    use: Authenticated endpoints
    key: User ID
    challenge: Account sharing
  by_api_key:
    use: API integrations
    key: API key
    challenge: Key sharing
  combination:
    approach: Multiple limits (IP + user)
    benefit: Defense in depth
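A sketch of the combination approach: check an IP-level limit and a user-level limit independently, and reject if either trips. The limits and the Flask-style request.remote_addr are assumptions for illustration; check_limit is the same helper used above.

# Defense in depth: layer a broad IP limit under the per-user limit
def is_request_limited(request):
    # Coarse per-IP ceiling catches unauthenticated abuse and
    # credential stuffing spread across many accounts
    if check_limit(f"ip:{request.remote_addr}", 1000, 60):
        return True
    # Tighter per-user limit enforces fairness among authenticated users
    user_id = get_user_id(request)
    if user_id and check_limit(f"user:{user_id}", 100, 60):
        return True
    return False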
Distributed Rate Limiting
Redis-Based
# Distributed rate limiting with Redis
import time
import redis

class DistributedRateLimiter:
    def __init__(self, redis_client, prefix='rate_limit'):
        self.redis = redis_client
        self.prefix = prefix

    def is_limited(self, key, limit, window):
        # Returns (limited, remaining) for the current fixed window
        full_key = f"{self.prefix}:{key}"
        current = int(time.time() / window)
        redis_key = f"{full_key}:{current}"
        # Atomic increment and check
        count = self.redis.incr(redis_key)
        if count == 1:
            self.redis.expire(redis_key, window + 1)
        return count > limit, limit - min(count, limit)
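Because every application server shares the same Redis instance, the counters stay consistent across the fleet. A usage sketch (the hostname is illustrative):

# Every app server pointing at the same Redis sees the same counters
limiter = DistributedRateLimiter(redis.Redis(host='redis.internal'))
limited, remaining = limiter.is_limited("user:42", limit=100, window=60)
if limited:
    ...  # respond with 429 and a Retry-After header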
Handling Failures
graceful_degradation:
  redis_down:
    option_1: Fail open (allow requests)
    option_2: Fail closed (deny requests)
    option_3: Local fallback rate limiting
  recommendation:
    production: Fail open for user-facing endpoints
    security: Fail closed for auth endpoints
# Graceful degradation
import logging

logger = logging.getLogger(__name__)

def is_rate_limited(user_id, limit, window):
    try:
        return check_redis_rate_limit(user_id, limit, window)
    except redis.exceptions.ConnectionError:
        logger.warning("Rate limiter unavailable, failing open")
        return False  # Fail open
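Option 3, a local in-process fallback, is a middle ground: when Redis is unreachable, each server enforces a coarse per-process limit, so you keep some protection without hard-failing. A minimal sketch (the fallback limit is illustrative, and per-process counters undercount across a fleet):

# Local in-process fallback when Redis is down (coarse, per-server)
import threading
import time
from collections import defaultdict

_local_counts = defaultdict(int)
_local_window = None
_local_lock = threading.Lock()

def local_fallback_limited(user_id, limit=100, window_seconds=60):
    global _local_window
    with _local_lock:
        window = int(time.time() / window_seconds)
        if window != _local_window:
            # New window: reset all in-memory counters
            _local_window = window
            _local_counts.clear()
        _local_counts[user_id] += 1
        return _local_counts[user_id] > limit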
Key Takeaways
- Rate limiting protects availability, ensures fairness, and controls costs
- Fixed window is simple but allows boundary bursting
- Sliding window counter provides good accuracy with low memory
- Token bucket allows controlled bursting
- Return rate limit headers so clients can adapt
- Use tiered limits for different customer segments
- Apply stricter limits on expensive or sensitive endpoints
- Combine keys (IP + user) for defense in depth
- Use Redis for distributed rate limiting
- Decide fail-open vs. fail-closed based on endpoint criticality
Rate limiting is a balance. Too strict frustrates users; too lenient provides no protection.