Cloud costs have a way of surprising organizations. The flexibility that makes cloud attractive also makes it easy to overspend. What starts as affordable experimentation becomes significant expense as services scale.
Here’s how to optimize cloud spending systematically.
Understanding Your Bill
Cost Breakdown
Before optimizing, understand where money goes:
Typical breakdown:
├── Compute (EC2, GCE): 40-60%
├── Storage (S3, EBS): 15-25%
├── Data Transfer: 10-20%
├── Database Services: 10-20%
└── Other Services: 5-15%
Use Cost Explorer (AWS), Billing Reports (GCP), or Cost Analysis (Azure).
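On AWS, the same breakdown is available from the CLI; a minimal sketch (the dates are placeholders):
# Last month's spend grouped by service
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE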
Tagging for Attribution
Without tags, cost attribution is guesswork:
# Required tags
Environment: production | staging | development
Team: platform | orders | payments
Service: api | worker | web
CostCenter: CC-12345
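Once these tags exist (and are activated as cost allocation tags in the billing console), spend can be attributed directly; a sketch using Cost Explorer's API (dates are placeholders):
# Monthly cost per team, via the Team cost-allocation tag
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=TAG,Key=Team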
Enforce tagging:
- AWS Organizations SCP
- GCP Organization Policies
- Azure Policy
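On AWS, for example, an SCP can refuse instance launches that omit a required tag; a minimal sketch of such a policy document (covers EC2 launches only; names are illustrative):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyRunInstancesWithoutTeamTag",
    "Effect": "Deny",
    "Action": "ec2:RunInstances",
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {
      "Null": {"aws:RequestTag/Team": "true"}
    }
  }]
}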
Cost Anomaly Detection
Set up alerts for unusual spending:
# AWS Cost Anomaly Detection: one monitor across all services
aws ce create-anomaly-monitor \
  --anomaly-monitor \
    MonitorName=SpendAnomaly,MonitorType=DIMENSIONAL,MonitorDimension=SERVICE
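The monitor alone doesn't notify anyone; pair it with a subscription. A sketch (the monitor ARN, threshold, and address are placeholders):
# Daily email digest for anomalies with more than $100 of impact
aws ce create-anomaly-subscription \
  --anomaly-subscription '{
    "SubscriptionName": "SpendAnomalyAlerts",
    "MonitorArnList": ["arn:aws:ce::123456789012:anomalymonitor/example"],
    "Subscribers": [{"Type": "EMAIL", "Address": "finops@example.com"}],
    "Threshold": 100,
    "Frequency": "DAILY"
  }'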
Compute Optimization
Right-Sizing
Most instances are oversized:
Before: m5.2xlarge (8 vCPU, 32 GB) - $0.384/hour
Actual usage: 15% CPU, 8 GB memory
After: m5.large (2 vCPU, 8 GB) - $0.096/hour
Savings: 75%
Tools:
- AWS Compute Optimizer
- GCP Recommender
- Azure Advisor
Process:
- Monitor actual utilization
- Compare against instance specs
- Right-size during maintenance windows
- Monitor for performance impact
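The first two steps can be as simple as pulling recent CPU data before deciding; a sketch (instance ID and dates are placeholders):
# Average and peak CPU for one instance over two weeks
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-15T00:00:00Z \
  --period 3600 \
  --statistics Average Maximum
Memory isn't reported by default; it needs the CloudWatch agent. The recommender tools listed above automate this comparison across the whole fleet.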
Spot/Preemptible Instances
60-90% savings for interruptible workloads:
# Good for spot:
- CI/CD workers
- Batch processing
- Dev/test environments
- Stateless services with multiple replicas
# Bad for spot:
- Single-instance databases
- Stateful services
- Latency-sensitive applications
Strategies:
- Diversify instance types
- Use spot fleets
- Implement graceful shutdown handling
- Maintain on-demand capacity for critical paths
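On AWS, an Auto Scaling group's mixed instances policy covers diversification plus an on-demand floor in one place; a rough sketch (launch template, subnets, and instance types are placeholders):
# Diversified spot capacity with one on-demand instance as a base
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name ci-workers \
  --min-size 0 --max-size 20 --desired-capacity 4 \
  --vpc-zone-identifier "subnet-aaa,subnet-bbb" \
  --mixed-instances-policy '{
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {"LaunchTemplateName": "ci-worker", "Version": "$Latest"},
      "Overrides": [{"InstanceType": "m5.large"}, {"InstanceType": "m5a.large"}, {"InstanceType": "m4.large"}]
    },
    "InstancesDistribution": {
      "OnDemandBaseCapacity": 1,
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "capacity-optimized"
    }
  }'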
Reserved Instances / Savings Plans
Commit for predictable workloads:
| Option | Discount | Flexibility |
|---|---|---|
| On-Demand | 0% | Maximum |
| Savings Plans (1yr) | 25-40% | Instance family |
| Reserved (1yr) | 30-40% | Specific instance |
| Reserved (3yr) | 50-60% | Specific instance |
When to commit:
- Baseline capacity (running 24/7)
- Predictable growth
- After right-sizing
Calculate carefully:
Break-even: ~7-9 months for a 1-year commitment
If commitment utilization drops below ~65%, it can cost more than on-demand
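AWS will do this math against your actual usage; Cost Explorer's recommendation API is a reasonable starting point:
# Recommended 1-year, no-upfront Compute Savings Plan based on the last 60 days
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS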
Scheduled Scaling
Not everything runs 24/7:
# Scale down dev/staging at night
schedule:
- cron: "0 20 * * MON-FRI"
action: scale-to-zero
- cron: "0 8 * * MON-FRI"
action: scale-to-normal
Running 08:00-20:00 on weekdays is 60 hours/week instead of 168, roughly 64% fewer instance-hours.
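On AWS, that schedule maps to two scheduled actions on an Auto Scaling group; a sketch (group name and sizes are placeholders; the recurrence is UTC cron, where 1-5 means Monday-Friday):
# Scale staging to zero at 20:00 on weekdays
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name staging-web \
  --scheduled-action-name nightly-stop \
  --recurrence "0 20 * * 1-5" \
  --min-size 0 --max-size 0 --desired-capacity 0
# Bring it back at 08:00 on weekdays
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name staging-web \
  --scheduled-action-name morning-start \
  --recurrence "0 8 * * 1-5" \
  --min-size 2 --max-size 4 --desired-capacity 2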
Storage Optimization
Storage Classes
Use appropriate tiers:
| Class | Use Case | Price (AWS, per GB-month) |
|---|---|---|
| Standard | Frequent access | $0.023/GB |
| Infrequent Access | Monthly access | $0.0125/GB |
| Glacier | Archival | $0.004/GB |
| Glacier Deep Archive | Rare access | $0.00099/GB |
Lifecycle Policies
Automate tier transitions:
{
  "Rules": [{
    "ID": "Archive old data",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"},
      {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
    ],
    "Expiration": {"Days": 2555}
  }]
}
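Applied to a bucket with the CLI (assuming the rules above are saved as lifecycle.json; the bucket name is a placeholder):
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json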
Delete Unused Resources
Common orphaned resources:
- Unattached EBS volumes
- Old snapshots
- Unused Elastic IPs
- Outdated AMIs
Cleanup script:
# Find unattached volumes
aws ec2 describe-volumes \
--filters "Name=status,Values=available" \
--query 'Volumes[*].VolumeId'
# Find old snapshots (replace the date with your cutoff, e.g. 90 days ago)
aws ec2 describe-snapshots \
--owner-ids self \
--query 'Snapshots[?StartTime<`2019-01-01`].SnapshotId'
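Unassociated Elastic IPs can be found the same way; they are billed hourly while idle (the allocation ID is a placeholder):
# Find Elastic IPs not attached to anything
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]'
# Release one once confirmed unused
aws ec2 release-address --allocation-id eipalloc-0123456789abcdef0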
S3 Intelligent-Tiering
Automatic tier optimization:
# Enable for buckets with unknown access patterns
aws s3api put-bucket-intelligent-tiering-configuration \
--bucket my-bucket \
--id tier-config \
--intelligent-tiering-configuration \
'{"Status": "Enabled", "Tierings": [...]}'
Data Transfer Optimization
Understand Transfer Costs
Free:
- Inbound data
- Same-AZ within AWS
Costs money:
- Cross-AZ: $0.01/GB (charged in each direction)
- Cross-region: $0.02/GB
- Outbound to internet: $0.09/GB
- NAT Gateway: $0.045/GB
VPC Endpoints
Avoid NAT Gateway for AWS services:
# S3 Gateway Endpoint (free)
aws ec2 create-vpc-endpoint \
--vpc-id vpc-123 \
--service-name com.amazonaws.us-east-1.s3 \
--route-table-ids rtb-456
Saves NAT Gateway data processing fees.
CloudFront for Egress
CDN egress is cheaper than direct:
Direct S3 egress: $0.09/GB
CloudFront egress: $0.085/GB (first 10TB)
Plus caching reduces origin requests.
Keep Data in Region
Cross-region transfer adds up:
10 TB/month cross-region = $200/month
Keep data local when possible
Database Optimization
Right-Size RDS
Database instances are often oversized:
- Check CPU and memory utilization
- Consider burstable instances (t3) for dev/test
- Use Reserved Instances for production
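Once CloudWatch confirms the headroom, the change itself is one call; a sketch (identifier and class are placeholders; drop --apply-immediately to wait for the next maintenance window):
# Downsize a dev database to a burstable class
aws rds modify-db-instance \
  --db-instance-identifier orders-dev \
  --db-instance-class db.t3.medium \
  --apply-immediately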
Aurora Serverless for Variable Workloads
Pay per ACU-second:
# Good for:
- Development databases
- Infrequently used applications
- Variable workloads with idle periods
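A minimal Aurora Serverless v2 setup looks roughly like this (identifiers, engine, and capacity range are placeholders; Serverless v1 used a different scaling flag):
# Cluster that scales between 0.5 and 4 ACUs
aws rds create-db-cluster \
  --db-cluster-identifier dev-cluster \
  --engine aurora-postgresql \
  --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=4 \
  --master-username dbadmin \
  --manage-master-user-password
# Instances in the cluster use the db.serverless class
aws rds create-db-instance \
  --db-instance-identifier dev-cluster-1 \
  --db-cluster-identifier dev-cluster \
  --db-instance-class db.serverless \
  --engine aurora-postgresql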
Read Replicas vs. Scaling Up
Reads scale out cheaper than scaling up:
Option A: db.r5.4xlarge ($2.30/hr)
Option B: db.r5.large primary + 3 db.r5.large read replicas (4 × ~$0.29/hr ≈ $1.15/hr)
Application must support read/write splitting.
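Creating the replica is one call; routing reads to it is the application-side work (identifiers and class are placeholders):
# Add a read replica to an existing primary
aws rds create-db-instance-read-replica \
  --db-instance-identifier orders-replica-1 \
  --source-db-instance-identifier orders-prod \
  --db-instance-class db.r5.large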
FinOps Practices
Monthly Cost Reviews
Regular review cadence:
Weekly: Anomaly review
Monthly: Detailed cost analysis
Quarterly: Architecture review for optimization
Cost Ownership
Teams own their costs:
- Dashboards per team
- Budget alerts
- Cost in sprint planning
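Budget alerts are easy to script per team; a sketch using AWS Budgets (account ID, amount, and address are placeholders; add a cost filter on the Team tag to scope it):
# $6,000/month budget alerting at 80% of actual spend
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "team-platform-monthly",
    "BudgetLimit": {"Amount": "6000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "platform-team@example.com"}]
  }]'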
Showback/Chargeback
Make costs visible:
Team A:
- Compute: $5,000
- Storage: $800
- Data Transfer: $400
Total: $6,200
Trend: +12% from last month
Key Takeaways
- Tag everything; without tags, attribution is impossible
- Right-size first; most instances are oversized
- Use spot for interruptible workloads; 60-90% savings
- Commit (Reserved/Savings Plans) only for baseline, predictable capacity
- Implement storage lifecycle policies; automatic tier transitions
- Delete orphaned resources; unattached volumes, old snapshots
- Use VPC endpoints to avoid NAT Gateway fees
- CDN egress is cheaper than direct; use CloudFront
- Make costs visible to teams; ownership drives optimization
Cloud cost optimization is ongoing. Measure, optimize, measure again.