Serverless has moved from experimentation to production workloads. The promise of automatic scaling and zero infrastructure management is real, but so are the challenges that emerge at scale.
Here’s what I’ve learned running serverless workloads in production.
Where Serverless Shines
Event Processing
Natural fit for event-driven workloads:
# Process S3 uploads (SAM event source)
Events:
  S3Event:
    Type: S3
    Properties:
      Bucket: !Ref UploadBucket
      Events: s3:ObjectCreated:*

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        process_upload(bucket, key)
Why it works:
- Sporadic, unpredictable traffic
- Scales to zero when idle
- Scales massively during spikes
- Per-invocation billing
API Backends
HTTP APIs with variable traffic:
# API Gateway + Lambda (SAM template)
Resources:
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: api.handler
      Events:
        Api:
          Type: HttpApi
          Properties:
            Path: /users/{id}
            Method: GET
Works well for:
- CRUD operations
- Low to medium request rates
- Bursty traffic patterns
Scheduled Tasks
Cron-style jobs:
Events:
  ScheduledEvent:
    Type: Schedule
    Properties:
      Schedule: rate(1 hour)
Better than:
- Running a cron server 24/7
- Container scheduled tasks for simple jobs
Challenges at Scale
Cold Starts
The first invocation after an idle period pays an initialization penalty:
Cold start breakdown:
├── Download code (10-50ms)
├── Start runtime (100-300ms)
├── Initialize dependencies (50-500ms+)
└── Your code init (varies)
Total: 200ms - 2000ms+ depending on language/dependencies
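You can see cold starts in your own logs with a module-level flag, since module scope runs only once per instance (a minimal sketch):

import time

_init_time = time.time()
_cold_start = True

def handler(event, context):
    global _cold_start
    if _cold_start:
        # Only the first invocation on each new instance takes this branch
        print(f'COLD_START age_since_init={time.time() - _init_time:.3f}s')
        _cold_start = False
    # ... normal handling ...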
Mitigation strategies:
# Bad: import inside the handler
def handler(event, context):
    import heavy_library  # the first invocation on every new instance pays for this
    return heavy_library.process(event)

# Good: import at module level
import heavy_library  # runs once, during instance initialization

def handler(event, context):
    return heavy_library.process(event)

# Provisioned concurrency keeps initialized instances warm
ProvisionedConcurrencyConfig:
  ProvisionedConcurrentExecutions: 5
Connection Management
Traditional connection pools don’t work:
# Problem: each concurrent instance opens its own connections
# 1000 concurrent Lambdas = 1000 database connections
# Solution: RDS Proxy pools on your behalf; connect to the proxy
# endpoint with a normal driver (boto3 can mint an IAM auth token)
import boto3
import pymysql

PROXY_HOST = 'my-proxy.proxy-abc.us-east-1.rds.amazonaws.com'  # placeholder
token = boto3.client('rds').generate_db_auth_token(
    DBHostname=PROXY_HOST, Port=3306, DBUsername='app_user')
connection = pymysql.connect(host=PROXY_HOST, user='app_user', password=token,
                             ssl={'ca': '/opt/rds-ca.pem'})  # IAM auth requires TLS
State Management
Lambdas are stateless between invocations:
# Bad: relying on in-memory state between calls
cache = {}  # lost whenever the instance is recycled

def handler(event, context):
    key = event['key']
    if key in cache:  # unreliable: each instance has its own copy
        return cache[key]

# Good: external state store
import redis

cache = redis.Redis(host='elasticache-endpoint')

def handler(event, context):
    key = event['key']
    return cache.get(key) or compute_and_store(key)
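The compute_and_store helper above is left undefined; a minimal sketch, with expensive_compute standing in for the real work and a TTL so entries expire:

def compute_and_store(key, ttl_seconds=300):
    value = expensive_compute(key)        # hypothetical: whatever the handler computes
    cache.setex(key, ttl_seconds, value)  # write-through with expiry
    return value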
Timeout Limits
Lambda caps a single invocation at 15 minutes; for longer workflows, use Step Functions:
# Step Functions state machine for long-running workflows
States:
  ProcessChunk:
    Type: Task
    Resource: !GetAtt ProcessFunction.Arn
    Next: MoreChunks?
  MoreChunks?:
    Type: Choice
    Choices:
      - Variable: $.hasMore
        BooleanEquals: true
        Next: ProcessChunk
    Default: Done
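On the Lambda side, ProcessChunk only needs to report whether work remains; a sketch, with fetch_batch as a hypothetical pagination helper:

def handler(event, context):
    # Resume from the cursor the state machine carries between iterations
    batch, next_cursor = fetch_batch(event.get('cursor'))  # hypothetical helper
    for item in batch:
        process_item(item)
    # The Choice state branches on $.hasMore in this output
    return {'hasMore': next_cursor is not None, 'cursor': next_cursor}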
Observability Challenges
Distributed tracing is essential:
from aws_xray_sdk.core import xray_recorder, patch_all

patch_all()  # instruments AWS SDK calls (and other supported libraries)

@xray_recorder.capture('process_order')
def process_order(order_id):
    # Traced automatically as a subsegment
    pass
Cost Optimization
Understand Pricing
Lambda cost = (Invocations × $0.20/1M) + (GB-seconds × $0.0000166667)
Example:
- 10M invocations/month
- 500ms average duration
- 1GB memory
Cost = ($2) + (10M × 0.5s × 1GB × $0.0000166667)
= $2 + $83.33
= $85.33/month
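The same arithmetic as a small helper, using the rates quoted above (check your region and architecture; these are the classic x86 prices):

def lambda_monthly_cost(invocations, avg_duration_s, memory_gb,
                        per_million_requests=0.20, per_gb_second=0.0000166667):
    request_cost = invocations / 1_000_000 * per_million_requests
    compute_cost = invocations * avg_duration_s * memory_gb * per_gb_second
    return request_cost + compute_cost

print(lambda_monthly_cost(10_000_000, 0.5, 1.0))  # ~85.33, matching the example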
Right-Size Memory
Memory affects CPU allocation:
# Test different memory configurations
# More memory = more CPU = potentially faster = fewer GB-seconds
128MB:  duration 2000ms → 0.128GB × 2.0s = 0.256 GB-s per invocation
512MB:  duration  600ms → 0.512GB × 0.6s = 0.307 GB-s per invocation
1024MB: duration  400ms → 1.024GB × 0.4s = 0.410 GB-s per invocation
# Here 512MB is 3.3x faster for ~20% more compute cost; when the
# speedup is steep enough, more memory is outright cheaper.
Tools like AWS Lambda Power Tuning automate this.
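A crude manual version of that sweep, assuming a deployed function named my-function (one timing per setting, including client overhead, so treat results as indicative; the real tool averages many invocations):

import time
import boto3

lam = boto3.client('lambda')

for mb in (128, 256, 512, 1024):
    lam.update_function_configuration(FunctionName='my-function', MemorySize=mb)
    lam.get_waiter('function_updated').wait(FunctionName='my-function')
    start = time.time()
    lam.invoke(FunctionName='my-function', Payload=b'{}')  # config change forces a cold start
    print(f'{mb}MB: {time.time() - start:.3f}s')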
Avoid Serverless for Steady Load
Break-even analysis:
Lambda (1M requests, 200ms, 256MB):
= $0.20 + (1M × 0.2s × 0.256GB × $0.0000166667)
= $0.20 + $0.85 = $1.05/month
Fargate (always-on small container):
= ~$13/month
At ~12M requests/month, Fargate becomes cheaper
For steady 24/7 load, containers win
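Reusing the lambda_monthly_cost helper from the pricing section, the break-even point falls out directly:

container_monthly = 13.0  # the always-on Fargate estimate above
per_million = lambda_monthly_cost(1_000_000, 0.2, 0.256)  # ~$1.05
print(f'break-even at ~{container_monthly / per_million:.1f}M requests/month')  # ~12.3M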
Anti-Patterns
Lambda as Everything
❌ Long-running processes (> 15 min)
❌ Steady high-throughput workloads
❌ WebSocket servers
❌ Applications needing local state
❌ GPU/specialized hardware needs
✓ Event handlers
✓ API backends with variable traffic
✓ Scheduled tasks
✓ Data transformation pipelines
✓ Glue between services
Synchronous Chains
❌ Bad: Lambda → Lambda → Lambda (synchronous)
- Coupled scaling
- Timeout multiplication
- Complex error handling
✓ Good: Lambda → Queue → Lambda → Queue → Lambda
- Independent scaling
- Natural retry/DLQ
- Loose coupling
Ignoring VPC Costs
Lambda in VPC adds cold start latency:
Outside VPC: ~100ms cold start
Inside VPC: ~500ms-1s cold start (ENI attachment)
Only use VPC when necessary:
- Accessing RDS/ElastiCache
- Internal services
- Compliance requirements
Massive Deployment Packages
❌ 500MB deployment package
- Slow cold starts
- Slow deployments
- Often includes unused dependencies
✓ Optimize packages:
- Use layers for shared dependencies
- Tree-shake unused code
- Consider separate functions
Patterns That Work
Fan-Out Processing
import json
import boto3

sqs = boto3.client('sqs')

# Trigger function
def trigger(event, context):
    items = get_items_to_process()
    for item in items:
        sqs.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps(item)
        )

# Worker function (scaled by SQS)
def worker(event, context):
    for record in event['Records']:
        item = json.loads(record['body'])
        process_item(item)
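One refinement worth adding: if ReportBatchItemFailures is enabled on the SQS event source mapping, the worker can return only the failed records instead of failing (and retrying) the whole batch:

def worker(event, context):
    failures = []
    for record in event['Records']:
        try:
            process_item(json.loads(record['body']))
        except Exception:
            # Only these messages become visible on the queue again
            failures.append({'itemIdentifier': record['messageId']})
    return {'batchItemFailures': failures}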
API with Caching
# API Gateway caching
MethodSettings:
  - ResourcePath: /users
    HttpMethod: GET
    CachingEnabled: true
    CacheTtlInSeconds: 300
Event Sourcing
# EventBridge for event distribution
import json
import boto3

eventbridge = boto3.client('events')

def handler(event, context):
    # Process the order, then emit a domain event
    result = process_order(event)
    eventbridge.put_events(
        Entries=[{
            'Source': 'orders.service',
            'DetailType': 'OrderProcessed',
            'Detail': json.dumps(result)
        }]
    )
Strangler Pattern Migration
# Gradually migrate endpoints to Lambda
/api/v1/orders → Lambda
/api/v1/users → Legacy (for now)
/api/v1/products → Lambda
# Route at API Gateway level
Monitoring and Debugging
Key Metrics
Critical:
- Invocation count
- Error rate
- Duration (p50, p95, p99)
- Concurrent executions
- Throttles
- Cold starts percentage
Cost-related:
- GB-seconds consumed
- Provisioned concurrency utilization
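Most of these live in the AWS/Lambda CloudWatch namespace; for example, pulling duration percentiles for one function (my-function is a placeholder):

from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client('cloudwatch')
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/Lambda',
    MetricName='Duration',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'my-function'}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    ExtendedStatistics=['p50', 'p95', 'p99'],
)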
Structured Logging
import json
import logging

logger = logging.getLogger()

def handler(event, context):
    logger.info(json.dumps({
        'level': 'INFO',
        'message': 'Processing request',
        'request_id': context.aws_request_id,
        'order_id': event.get('order_id'),
        'function_version': context.function_version
    }))
Distributed Tracing
X-Ray for request tracing across services:
import requests
from aws_xray_sdk.core import xray_recorder

@xray_recorder.capture('external_api_call')
def call_external_api(data):
    # Traced as its own subsegment
    return requests.post(api_url, json=data)
Key Takeaways
- Serverless excels at event processing, variable traffic APIs, and scheduled tasks
- Cold starts matter; use provisioned concurrency for latency-sensitive workloads
- Connection pooling needs special handling (RDS Proxy, external pools)
- Right-size memory; more memory can be cheaper due to faster execution
- Avoid synchronous Lambda chains; use queues for loose coupling
- VPC adds cold start latency; only use when necessary
- For steady high-throughput workloads, containers may be more cost-effective
- Observability is harder; invest in tracing and structured logging
- Serverless is a tool, not a religion; use it where it fits
Serverless at scale works when you understand its characteristics and design accordingly. The patterns that succeed are event-driven, loosely coupled, and stateless.