Rate Limiting Strategies for APIs

June 27, 2022

Rate limiting is essential for any public or heavily used API. Without it, a single bad actor, or even a bug in a well-meaning client, can take down your service. But rate limiting is nuanced: too aggressive and you frustrate users; too lax and you have no protection.

Here are the rate limiting algorithms and implementation patterns that work in practice.

Why Rate Limit

Protection Goals

rate_limiting_goals:
  availability:
    - Prevent resource exhaustion
    - Protect against DDoS
    - Maintain service for all users

  fairness:
    - Equal access for users
    - Prevent single user monopolizing
    - Metered access for business tiers

  cost_control:
    - Limit expensive operations
    - Prevent runaway costs
    - Enforce usage quotas

  security:
    - Prevent brute force attacks
    - Limit credential stuffing
    - Detect suspicious patterns

Rate Limiting Algorithms

Fixed Window

fixed_window:
  approach: Count requests in fixed time windows (e.g., per minute)
  pros:
    - Simple to implement
    - Easy to understand
  cons:
    - Burst at window boundaries
    - User can do 2x limit across boundary

  example:
    limit: 100 requests per minute
    window: 0:00-0:59, 1:00-1:59, etc.
    problem: 100 at 0:59 + 100 at 1:00 = 200 in 2 seconds
# Fixed window implementation (assumes `redis` is a connected client)
import time

def is_rate_limited(user_id, limit=100, window_seconds=60):
    # The key changes each window, so the counter resets automatically
    key = f"rate_limit:{user_id}:{int(time.time() / window_seconds)}"
    current = redis.incr(key)
    if current == 1:
        # First request in this window: set the TTL
        redis.expire(key, window_seconds)
    return current > limit
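The boundary problem described above is easy to demonstrate with an in-memory stand-in for Redis (purely illustrative; the class name and faked clock are assumptions for the sake of a self-contained example):

```python
from collections import defaultdict

class InMemoryFixedWindow:
    """In-memory stand-in for the Redis fixed-window counter above."""
    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)

    def is_rate_limited(self, user_id, now):
        # Same keying scheme as the Redis version: one counter per window
        key = (user_id, int(now / self.window_seconds))
        self.counts[key] += 1
        return self.counts[key] > self.limit

limiter = InMemoryFixedWindow(limit=100, window_seconds=60)

# 100 requests at t=59.5 (end of window 0): all allowed
end_of_window = sum(not limiter.is_rate_limited("u1", 59.5) for _ in range(100))

# 100 more at t=60.5 (start of window 1): also all allowed
start_of_next = sum(not limiter.is_rate_limited("u1", 60.5) for _ in range(100))

print(end_of_window + start_of_next)  # 200 requests accepted in one second
```

With a nominal limit of 100 per minute, the client squeezed 200 accepted requests into one second by straddling the window boundary.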

Sliding Window Log

sliding_window_log:
  approach: Track timestamp of each request, count in rolling window
  pros:
    - Most accurate
    - No boundary issues
  cons:
    - Memory intensive
    - Expensive at high volume
# Sliding window log implementation
def is_rate_limited(user_id, limit=100, window_seconds=60):
    key = f"rate_limit:{user_id}"
    now = time.time()
    cutoff = now - window_seconds

    pipe = redis.pipeline()
    # Remove entries older than the window
    pipe.zremrangebyscore(key, 0, cutoff)
    # Record the current request (timestamp as both member and score)
    pipe.zadd(key, {str(now): now})
    # Count requests in the window, including this one
    pipe.zcard(key)
    # Expire the whole set if the user goes idle
    pipe.expire(key, window_seconds)
    results = pipe.execute()

    # Note: rejected requests are recorded too, so a client that keeps
    # hammering the API stays limited until it actually backs off
    return results[2] > limit

Sliding Window Counter

sliding_window_counter:
  approach: Combine current and previous window with weights
  pros:
    - Good accuracy
    - Memory efficient
    - Smooth rate limiting
  cons:
    - Slightly more complex

  calculation:
    previous_window_count * (1 - elapsed_ratio) + current_window_count
# Sliding window counter
def is_rate_limited(user_id, limit=100, window_seconds=60):
    now = time.time()
    current_window = int(now / window_seconds)
    previous_window = current_window - 1

    current_key = f"rate_limit:{user_id}:{current_window}"
    previous_key = f"rate_limit:{user_id}:{previous_window}"

    current_count = int(redis.get(current_key) or 0)
    previous_count = int(redis.get(previous_key) or 0)

    # Weight the previous window by the fraction of it that still falls
    # inside the rolling window (large early in the current window)
    elapsed = now % window_seconds
    weight = (window_seconds - elapsed) / window_seconds

    estimated_count = previous_count * weight + current_count

    if estimated_count >= limit:
        return True

    # Increment current window
    pipe = redis.pipeline()
    pipe.incr(current_key)
    pipe.expire(current_key, window_seconds * 2)
    pipe.execute()

    return False

Token Bucket

token_bucket:
  approach: Bucket fills with tokens at fixed rate, request consumes token
  pros:
    - Allows controlled bursting
    - Smooth average rate
    - Intuitive model
  cons:
    - Slightly more complex

  parameters:
    bucket_size: Maximum burst capacity
    refill_rate: Tokens added per second
# Token bucket implementation (assumes a redis client created with
# decode_responses=False, hence the byte-string field names below)
class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second

    def allow_request(self, user_id, tokens=1):
        key = f"token_bucket:{user_id}"
        now = time.time()

        # Get current state
        data = redis.hgetall(key)
        if data:
            last_refill = float(data[b'last_refill'])
            current_tokens = float(data[b'tokens'])
        else:
            last_refill = now
            current_tokens = self.capacity

        # Refill tokens
        elapsed = now - last_refill
        current_tokens = min(
            self.capacity,
            current_tokens + elapsed * self.refill_rate
        )

        # Check if enough tokens
        if current_tokens >= tokens:
            current_tokens -= tokens
            redis.hset(key, mapping={
                'tokens': current_tokens,
                'last_refill': now
            })
            # Keep the key alive long enough for a full refill, plus slack
            redis.expire(key, int(self.capacity / self.refill_rate) + 60)
            return True

        # Denied requests don't write state: the next call recomputes
        # the refill from the same last_refill timestamp
        return False

Implementation Patterns

HTTP Headers

rate_limit_headers:
  standard:
    X-RateLimit-Limit: Maximum requests allowed
    X-RateLimit-Remaining: Requests remaining in window
    X-RateLimit-Reset: Unix timestamp when limit resets

  draft_standard:  # IETF "RateLimit Header Fields for HTTP" draft
    RateLimit-Limit: "100"
    RateLimit-Remaining: "45"
    RateLimit-Reset: "30"  # Seconds until reset
# Adding rate limit headers (429 responses carry them too)
def rate_limit_middleware(request, response):
    user_id = get_user_id(request)
    limit_info = get_rate_limit_info(user_id)

    headers = {
        'X-RateLimit-Limit': str(limit_info.limit),
        'X-RateLimit-Remaining': str(limit_info.remaining),
        'X-RateLimit-Reset': str(limit_info.reset_at),
    }

    if limit_info.remaining <= 0:
        headers['Retry-After'] = str(limit_info.retry_after)
        return Response(
            status=429,
            body={'error': 'Rate limit exceeded'},
            headers=headers,
        )

    response.headers.update(headers)
    return response

Tiered Limits

tiered_limits:
  free:
    requests_per_minute: 60
    requests_per_day: 1000

  pro:
    requests_per_minute: 600
    requests_per_day: 50000

  enterprise:
    requests_per_minute: 6000
    requests_per_day: unlimited
# Tiered rate limiting
TIER_LIMITS = {
    'free': {'minute': 60, 'day': 1000},
    'pro': {'minute': 600, 'day': 50000},
    'enterprise': {'minute': 6000, 'day': None},
}

def get_user_limits(user):
    return TIER_LIMITS.get(user.tier, TIER_LIMITS['free'])

def is_rate_limited(user):
    limits = get_user_limits(user)

    if limits['minute']:
        if check_limit(user.id, limits['minute'], 60):
            return True

    if limits['day']:
        if check_limit(user.id, limits['day'], 86400):
            return True

    return False

Endpoint-Specific Limits

endpoint_limits:
  "/api/login":
    limit: 5 per minute
    reason: Prevent brute force

  "/api/search":
    limit: 30 per minute
    reason: Expensive operation

  "/api/users":
    limit: 100 per minute
    reason: Standard CRUD

  "/api/webhooks":
    limit: 10 per minute
    reason: Triggers external calls
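A minimal sketch of how the table above might be wired up. The `ENDPOINT_LIMITS` dict, the `(requests, window_seconds)` tuple shape, and the fallback default are illustrative assumptions, not a fixed API:

```python
# Per-endpoint limits: (requests, window_seconds); values mirror the table above
ENDPOINT_LIMITS = {
    '/api/login':    (5, 60),
    '/api/search':   (30, 60),
    '/api/users':    (100, 60),
    '/api/webhooks': (10, 60),
}

DEFAULT_LIMIT = (100, 60)  # assumed fallback for unlisted endpoints

def limit_for(path):
    # Exact match first; a real router might also match path prefixes
    return ENDPOINT_LIMITS.get(path, DEFAULT_LIMIT)

print(limit_for('/api/login'))    # (5, 60)
print(limit_for('/api/unknown'))  # (100, 60)
```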

Client vs. Server Keys

rate_limit_keys:
  by_ip:
    use: Unauthenticated endpoints
    key: IP address
    challenge: NAT, shared IPs

  by_user:
    use: Authenticated endpoints
    key: User ID
    challenge: Account sharing

  by_api_key:
    use: API integrations
    key: API key
    challenge: Key sharing

  combination:
    approach: Multiple limits (IP + user)
    benefit: Defense in depth
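One way to sketch the combination approach: check every applicable key and reject if any limit trips. `check_limit` is a stand-in for any of the algorithms above, faked here as an in-memory counter so the example is self-contained; the per-IP and per-user limits are arbitrary:

```python
from collections import defaultdict

_counters = defaultdict(int)

def check_limit(key, limit):
    """Stand-in counter; in production this would be one of the
    Redis-backed algorithms above. Returns True when over the limit."""
    _counters[key] += 1
    return _counters[key] > limit

def is_rate_limited(ip, user_id=None):
    # Defense in depth: a per-IP limit always applies...
    checks = [(f"ip:{ip}", 300)]
    # ...and authenticated traffic gets a per-user limit too
    if user_id is not None:
        checks.append((f"user:{user_id}", 100))
    return any(check_limit(key, limit) for key, limit in checks)
```

Note that `any()` short-circuits: once the IP limit trips, later counters stop advancing for that request. Whether rejected requests should still count against the other limits is a design choice.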

Distributed Rate Limiting

Redis-Based

# Distributed rate limiting with Redis: fixed-window counters shared
# across all application servers
import time
import redis

class DistributedRateLimiter:
    def __init__(self, redis_client, prefix='rate_limit'):
        self.redis = redis_client
        self.prefix = prefix

    def is_limited(self, key, limit, window):
        """Returns (limited, remaining)."""
        full_key = f"{self.prefix}:{key}"
        current = int(time.time() / window)
        redis_key = f"{full_key}:{current}"

        # INCR is atomic, so concurrent servers can't double-count
        count = self.redis.incr(redis_key)
        if count == 1:
            self.redis.expire(redis_key, window + 1)

        return count > limit, max(0, limit - count)

Handling Failures

graceful_degradation:
  redis_down:
    option_1: Fail open (allow requests)
    option_2: Fail closed (deny requests)
    option_3: Local fallback rate limiting

  recommendation:
    production: Fail open for user-facing
    security: Fail closed for auth endpoints
# Graceful degradation: fail open when Redis is unreachable
def is_rate_limited(user_id, limit, window):
    try:
        return check_redis_rate_limit(user_id, limit, window)
    except redis.exceptions.ConnectionError:
        logger.warning("Rate limiter unavailable, failing open")
        return False  # Fail open
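Option 3 above, a local fallback, can be sketched as a small in-process fixed-window counter that takes over when Redis is unreachable. It is per-process, so with N app servers the effective limit is roughly N times higher; that is usually acceptable as a degraded mode. Class and function names are illustrative, and Redis is simulated with a stub that always raises so the example is self-contained:

```python
import time
from collections import defaultdict

class LocalFallbackLimiter:
    """Per-process fixed-window counter, used only while Redis is down."""
    def __init__(self):
        self.counts = defaultdict(int)

    def is_limited(self, user_id, limit, window_seconds, now=None):
        now = time.time() if now is None else now
        key = (user_id, int(now / window_seconds))
        self.counts[key] += 1
        return self.counts[key] > limit

fallback = LocalFallbackLimiter()

def check_redis_rate_limit(user_id, limit, window_seconds):
    # Stub: pretend Redis is unreachable so the fallback kicks in
    raise ConnectionError("redis down")

def is_rate_limited(user_id, limit, window_seconds):
    try:
        return check_redis_rate_limit(user_id, limit, window_seconds)
    except ConnectionError:
        # Degraded mode: each process enforces the limit independently
        return fallback.is_limited(user_id, limit, window_seconds)
```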

Key Takeaways

Rate limiting is a balance. Too strict frustrates users; too lenient provides no protection. Match the algorithm to the need: fixed windows for simplicity, sliding windows for accuracy, token buckets for controlled bursts. Whatever you choose, expose limits in response headers and decide deliberately how the system behaves when the limiter itself fails.