Serverless has moved from experimentation to production workloads. The promise of automatic scaling and zero infrastructure management is real, but so are the challenges that emerge at scale.
Here’s what I’ve learned running serverless workloads in production.
Where Serverless Shines
Event Processing
Natural fit for event-driven workloads:
# Process S3 uploads (SAM event source)
Events:
  S3Event:
    Type: S3
    Properties:
      Bucket: !Ref UploadBucket
      Events: s3:ObjectCreated:*

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        process_upload(bucket, key)
Why it works:
- Sporadic, unpredictable traffic
- Scales to zero when idle
- Scales massively during spikes
- Per-invocation billing
API Backends
HTTP APIs with variable traffic:
# API Gateway + Lambda (SAM template)
Resources:
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: api.handler
      Events:
        Api:
          Type: HttpApi
          Properties:
            Path: /users/{id}
            Method: GET
Works well for:
- CRUD operations
- Low to medium request rates
- Bursty traffic patterns
Scheduled Tasks
Cron-style jobs:
Events:
  ScheduledEvent:
    Type: Schedule
    Properties:
      Schedule: rate(1 hour)
Better than:
- Running a cron server 24/7
- Container scheduled tasks for simple jobs
Challenges at Scale
Cold Starts
The first invocation after an idle period pays an initialization penalty:
Cold start breakdown:
├── Download code (10-50ms)
├── Start runtime (100-300ms)
├── Initialize dependencies (50-500ms+)
└── Your code init (varies)
Total: 200ms - 2000ms+ depending on language/dependencies
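You can see cold starts in your own logs with a module-level flag, since module scope runs only once per instance (a minimal sketch):

import time

_init_time = time.time()
_cold_start = True

def handler(event, context):
    global _cold_start
    if _cold_start:
        # Only the first invocation on each new instance takes this branch
        print(f'COLD_START age_since_init={time.time() - _init_time:.3f}s')
        _cold_start = False
    # ... normal handling ...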
Mitigation strategies:
# Bad: import inside the handler
def handler(event, context):
    import heavy_library  # the first invocation on every new instance pays for this
    return heavy_library.process(event)

# Good: import at module level
import heavy_library  # runs once, during instance initialization

def handler(event, context):
    return heavy_library.process(event)

# Provisioned concurrency keeps initialized instances warm
ProvisionedConcurrencyConfig:
  ProvisionedConcurrentExecutions: 5
Connection Management
Traditional connection pools don’t work:
# Problem: each concurrent instance opens its own connections
# 1000 concurrent Lambdas = 1000 database connections
# Solution: RDS Proxy pools on your behalf; connect to the proxy
# endpoint with a normal driver (boto3 can mint an IAM auth token)
import boto3
import pymysql

PROXY_HOST = 'my-proxy.proxy-abc.us-east-1.rds.amazonaws.com'  # placeholder
token = boto3.client('rds').generate_db_auth_token(
    DBHostname=PROXY_HOST, Port=3306, DBUsername='app_user')
connection = pymysql.connect(host=PROXY_HOST, user='app_user', password=token,
                             ssl={'ca': '/opt/rds-ca.pem'})  # IAM auth requires TLS
State Management
Lambdas are stateless between invocations:
# Bad: relying on in-memory state between calls
cache = {}  # lost whenever the instance is recycled

def handler(event, context):
    key = event['key']
    if key in cache:  # unreliable: each instance has its own copy
        return cache[key]

# Good: external state store
import redis

cache = redis.Redis(host='elasticache-endpoint')

def handler(event, context):
    key = event['key']
    return cache.get(key) or compute_and_store(key)
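The compute_and_store helper above is left undefined; a minimal sketch, with expensive_compute standing in for the real work and a TTL so entries expire:

def compute_and_store(key, ttl_seconds=300):
    value = expensive_compute(key)        # hypothetical: whatever the handler computes
    cache.setex(key, ttl_seconds, value)  # write-through with expiry
    return value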
Timeout Limits
Lambda caps a single invocation at 15 minutes; for longer workflows, use Step Functions:
# Step Functions state machine for long-running workflows
States:
  ProcessChunk:
    Type: Task
    Resource: !GetAtt ProcessFunction.Arn
    Next: MoreChunks?
  MoreChunks?:
    Type: Choice
    Choices:
      - Variable: $.hasMore
        BooleanEquals: true
        Next: ProcessChunk
    Default: Done
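On the Lambda side, ProcessChunk only needs to report whether work remains; a sketch, with fetch_batch as a hypothetical pagination helper:

def handler(event, context):
    # Resume from the cursor the state machine carries between iterations
    batch, next_cursor = fetch_batch(event.get('cursor'))  # hypothetical helper
    for item in batch:
        process_item(item)
    # The Choice state branches on $.hasMore in this output
    return {'hasMore': next_cursor is not None, 'cursor': next_cursor}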
Observability Challenges
Distributed tracing is essential:
from aws_xray_sdk.core import xray_recorder, patch_all

patch_all()  # instruments AWS SDK calls (and other supported libraries)

@xray_recorder.capture('process_order')
def process_order(order_id):
    # Traced automatically as a subsegment
    pass
Cost Optimization
Understand Pricing
Lambda cost = (Invocations × $0.20/1M) + (GB-seconds × $0.0000166667)
Example:
- 10M invocations/month
- 500ms average duration
- 1GB memory
Cost = ($2) + (10M × 0.5s × 1GB × $0.0000166667)
= $2 + $83.33
= $85.33/month
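The same arithmetic as a small helper, using the rates quoted above (check your region and architecture; these are the classic x86 prices):

def lambda_monthly_cost(invocations, avg_duration_s, memory_gb,
                        per_million_requests=0.20, per_gb_second=0.0000166667):
    request_cost = invocations / 1_000_000 * per_million_requests
    compute_cost = invocations * avg_duration_s * memory_gb * per_gb_second
    return request_cost + compute_cost

print(lambda_monthly_cost(10_000_000, 0.5, 1.0))  # ~85.33, matching the example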
Right-Size Memory
Memory affects CPU allocation:
# Test different memory configurations
# More memory = more CPU = potentially faster = fewer GB-seconds
128MB:  duration 2000ms → 0.128GB × 2.0s = 0.256 GB-s per invocation
512MB:  duration  600ms → 0.512GB × 0.6s = 0.307 GB-s per invocation
1024MB: duration  400ms → 1.024GB × 0.4s = 0.410 GB-s per invocation
# Here 512MB is 3.3x faster for ~20% more compute cost; when the
# speedup is steep enough, more memory is outright cheaper.
Tools like AWS Lambda Power Tuning automate this.
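A crude manual version of that sweep, assuming a deployed function named my-function (one timing per setting, including client overhead, so treat results as indicative; the real tool averages many invocations):

import time
import boto3

lam = boto3.client('lambda')

for mb in (128, 256, 512, 1024):
    lam.update_function_configuration(FunctionName='my-function', MemorySize=mb)
    lam.get_waiter('function_updated').wait(FunctionName='my-function')
    start = time.time()
    lam.invoke(FunctionName='my-function', Payload=b'{}')  # config change forces a cold start
    print(f'{mb}MB: {time.time() - start:.3f}s')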
Avoid Serverless for Steady Load
Break-even analysis:
Lambda (1M requests, 200ms, 256MB):
= $0.20 + (1M × 0.2s × 0.256GB × $0.0000166667)
= $0.20 + $0.85 = $1.05/month
Fargate (always-on small container):
= ~$13/month
At ~12M requests/month, Fargate becomes cheaper
For steady 24/7 load, containers win
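Reusing the lambda_monthly_cost helper from the pricing section, the break-even point falls out directly:

container_monthly = 13.0  # the always-on Fargate estimate above
per_million = lambda_monthly_cost(1_000_000, 0.2, 0.256)  # ~$1.05
print(f'break-even at ~{container_monthly / per_million:.1f}M requests/month')  # ~12.3M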
Anti-Patterns
Lambda as Everything
❌ Long-running processes (> 15 min)
❌ Steady high-throughput workloads
❌ WebSocket servers
❌ Applications needing local state
❌ GPU/specialized hardware needs
✓ Event handlers
✓ API backends with variable traffic
✓ Scheduled tasks
✓ Data transformation pipelines
✓ Glue between services
Synchronous Chains
❌ Bad: Lambda → Lambda → Lambda (synchronous)
- Coupled scaling
- Timeout multiplication
- Complex error handling
✓ Good: Lambda → Queue → Lambda → Queue → Lambda
- Independent scaling
- Natural retry/DLQ
- Loose coupling
Ignoring VPC Costs
Lambda in VPC adds cold start latency:
Outside VPC: ~100ms cold start
Inside VPC: ~500ms-1s cold start (ENI attachment)
Only use VPC when necessary:
- Accessing RDS/ElastiCache
- Internal services
- Compliance requirements
Massive Deployment Packages
❌ 500MB deployment package
- Slow cold starts
- Slow deployments
- Often includes unused dependencies
✓ Optimize packages:
- Use layers for shared dependencies
- Tree-shake unused code
- Consider separate functions
Patterns That Work
Fan-Out Processing
import json
import boto3

sqs = boto3.client('sqs')

# Trigger function
def trigger(event, context):
    items = get_items_to_process()
    for item in items:
        sqs.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps(item)
        )

# Worker function (scaled by SQS)
def worker(event, context):
    for record in event['Records']:
        item = json.loads(record['body'])
        process_item(item)
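One refinement worth adding: if ReportBatchItemFailures is enabled on the SQS event source mapping, the worker can return only the failed records instead of failing (and retrying) the whole batch:

def worker(event, context):
    failures = []
    for record in event['Records']:
        try:
            process_item(json.loads(record['body']))
        except Exception:
            # Only these messages become visible on the queue again
            failures.append({'itemIdentifier': record['messageId']})
    return {'batchItemFailures': failures}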
API with Caching
# API Gateway caching
MethodSettings:
  - ResourcePath: /users
    HttpMethod: GET
    CachingEnabled: true
    CacheTtlInSeconds: 300
Event Sourcing
# EventBridge for event distribution
import json
import boto3

eventbridge = boto3.client('events')

def handler(event, context):
    # Process the order, then emit a domain event
    result = process_order(event)
    eventbridge.put_events(
        Entries=[{
            'Source': 'orders.service',
            'DetailType': 'OrderProcessed',
            'Detail': json.dumps(result)
        }]
    )
Strangler Pattern Migration
# Gradually migrate endpoints to Lambda
/api/v1/orders → Lambda
/api/v1/users → Legacy (for now)
/api/v1/products → Lambda
# Route at API Gateway level
Monitoring and Debugging
Key Metrics
Critical:
- Invocation count
- Error rate
- Duration (p50, p95, p99)
- Concurrent executions
- Throttles
- Cold starts percentage
Cost-related:
- GB-seconds consumed
- Provisioned concurrency utilization
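Most of these live in the AWS/Lambda CloudWatch namespace; for example, pulling duration percentiles for one function (my-function is a placeholder):

from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client('cloudwatch')
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/Lambda',
    MetricName='Duration',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'my-function'}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    ExtendedStatistics=['p50', 'p95', 'p99'],
)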
Structured Logging
import json
import logging

logger = logging.getLogger()

def handler(event, context):
    logger.info(json.dumps({
        'level': 'INFO',
        'message': 'Processing request',
        'request_id': context.aws_request_id,
        'order_id': event.get('order_id'),
        'function_version': context.function_version
    }))
Distributed Tracing
X-Ray for request tracing across services:
import requests
from aws_xray_sdk.core import xray_recorder

@xray_recorder.capture('external_api_call')
def call_external_api(data):
    # Traced as its own subsegment
    return requests.post(api_url, json=data)
Key Takeaways
- Serverless excels at event processing, variable traffic APIs, and scheduled tasks
- Cold starts matter; use provisioned concurrency for latency-sensitive workloads
- Connection pooling needs special handling (RDS Proxy, external pools)
- Right-size memory; more memory can be cheaper due to faster execution
- Avoid synchronous Lambda chains; use queues for loose coupling
- VPC adds cold start latency; only use when necessary
- For steady high-throughput workloads, containers may be more cost-effective
- Observability is harder; invest in tracing and structured logging
- Serverless is a tool, not a religion; use it where it fits
Serverless at scale works when you understand its characteristics and design accordingly. The patterns that succeed are event-driven, loosely coupled, and stateless.