Cloud costs have a way of surprising organizations. The flexibility that makes cloud attractive also makes it easy to overspend. What starts as affordable experimentation becomes significant expense as services scale.
Here’s how to optimize cloud spending systematically.
Understanding Your Bill
Cost Breakdown
Before optimizing, understand where money goes:
Typical breakdown:
├── Compute (EC2, GCE): 40-60%
├── Storage (S3, EBS): 15-25%
├── Data Transfer: 10-20%
├── Database Services: 10-20%
└── Other Services: 5-15%
Use Cost Explorer (AWS), Billing Reports (GCP), or Cost Analysis (Azure).
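On AWS, the same breakdown is available from the CLI; a minimal sketch (the dates are placeholders):
# Last month's spend grouped by service
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE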
Tagging for Attribution
Without tags, cost attribution is guesswork:
# Required tags
Environment: production | staging | development
Team: platform | orders | payments
Service: api | worker | web
CostCenter: CC-12345
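Once these tags exist (and are activated as cost allocation tags in the billing console), spend can be attributed directly; a sketch using Cost Explorer's API (dates are placeholders):
# Monthly cost per team, via the Team cost-allocation tag
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=TAG,Key=Team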
Enforce tagging:
- AWS Organizations SCP
- GCP Organization Policies
- Azure Policy
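On AWS, for example, an SCP can refuse instance launches that omit a required tag; a minimal sketch of such a policy document (covers EC2 launches only; names are illustrative):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyRunInstancesWithoutTeamTag",
    "Effect": "Deny",
    "Action": "ec2:RunInstances",
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {
      "Null": {"aws:RequestTag/Team": "true"}
    }
  }]
}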
Cost Anomaly Detection
Set up alerts for unusual spending:
# AWS Cost Anomaly Detection: one monitor across all services
aws ce create-anomaly-monitor \
  --anomaly-monitor \
    MonitorName=SpendAnomaly,MonitorType=DIMENSIONAL,MonitorDimension=SERVICE
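The monitor alone doesn't notify anyone; pair it with a subscription. A sketch (the monitor ARN, threshold, and address are placeholders):
# Daily email digest for anomalies with more than $100 of impact
aws ce create-anomaly-subscription \
  --anomaly-subscription '{
    "SubscriptionName": "SpendAnomalyAlerts",
    "MonitorArnList": ["arn:aws:ce::123456789012:anomalymonitor/example"],
    "Subscribers": [{"Type": "EMAIL", "Address": "finops@example.com"}],
    "Threshold": 100,
    "Frequency": "DAILY"
  }'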
Compute Optimization
Right-Sizing
Most instances are oversized:
Before: m5.2xlarge (8 vCPU, 32 GB) - $0.384/hour
Actual usage: 15% CPU, 8 GB memory
After: m5.large (2 vCPU, 8 GB) - $0.096/hour
Savings: 75%
Tools:
- AWS Compute Optimizer
- GCP Recommender
- Azure Advisor
Process:
- Monitor actual utilization
- Compare against instance specs
- Right-size during maintenance windows
- Monitor for performance impact
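The first two steps can be as simple as pulling recent CPU data before deciding; a sketch (instance ID and dates are placeholders):
# Average and peak CPU for one instance over two weeks
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-15T00:00:00Z \
  --period 3600 \
  --statistics Average Maximum
Memory isn't reported by default; it needs the CloudWatch agent. The recommender tools listed above automate this comparison across the whole fleet.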
Spot/Preemptible Instances
60-90% savings for interruptible workloads:
# Good for spot:
- CI/CD workers
- Batch processing
- Dev/test environments
- Stateless services with multiple replicas
# Bad for spot:
- Single-instance databases
- Stateful services
- Latency-sensitive applications
Strategies:
- Diversify instance types
- Use spot fleets
- Implement graceful shutdown handling
- Maintain on-demand capacity for critical paths
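On AWS, an Auto Scaling group's mixed instances policy covers diversification plus an on-demand floor in one place; a rough sketch (launch template, subnets, and instance types are placeholders):
# Diversified spot capacity with one on-demand instance as a base
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name ci-workers \
  --min-size 0 --max-size 20 --desired-capacity 4 \
  --vpc-zone-identifier "subnet-aaa,subnet-bbb" \
  --mixed-instances-policy '{
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {"LaunchTemplateName": "ci-worker", "Version": "$Latest"},
      "Overrides": [{"InstanceType": "m5.large"}, {"InstanceType": "m5a.large"}, {"InstanceType": "m4.large"}]
    },
    "InstancesDistribution": {
      "OnDemandBaseCapacity": 1,
      "OnDemandPercentageAboveBaseCapacity": 0,
      "SpotAllocationStrategy": "capacity-optimized"
    }
  }'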
Reserved Instances / Savings Plans
Commit for predictable workloads:
| Option | Discount | Flexibility |
|---|---|---|
| On-Demand | 0% | Maximum |
| Savings Plans (1yr) | 25-40% | Instance family |
| Reserved (1yr) | 30-40% | Specific instance |
| Reserved (3yr) | 50-60% | Specific instance |
When to commit:
- Baseline capacity (running 24/7)
- Predictable growth
- After right-sizing
Calculate carefully:
Break-even: ~7-9 months for a 1-year commitment
If commitment utilization drops below ~65%, it can cost more than on-demand
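AWS will do this math against your actual usage; Cost Explorer's recommendation API is a reasonable starting point:
# Recommended 1-year, no-upfront Compute Savings Plan based on the last 60 days
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS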
Scheduled Scaling
Not everything runs 24/7:
# Scale down dev/staging at night
schedule:
- cron: "0 20 * * MON-FRI"
action: scale-to-zero
- cron: "0 8 * * MON-FRI"
action: scale-to-normal
Running 08:00-20:00 on weekdays is 60 hours/week instead of 168, roughly 64% fewer instance-hours.
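On AWS, that schedule maps to two scheduled actions on an Auto Scaling group; a sketch (group name and sizes are placeholders; the recurrence is UTC cron, where 1-5 means Monday-Friday):
# Scale staging to zero at 20:00 on weekdays
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name staging-web \
  --scheduled-action-name nightly-stop \
  --recurrence "0 20 * * 1-5" \
  --min-size 0 --max-size 0 --desired-capacity 0
# Bring it back at 08:00 on weekdays
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name staging-web \
  --scheduled-action-name morning-start \
  --recurrence "0 8 * * 1-5" \
  --min-size 2 --max-size 4 --desired-capacity 2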
Storage Optimization
Storage Classes
Use appropriate tiers:
| Class | Use Case | Price (AWS, per GB-month) |
|---|---|---|
| Standard | Frequent access | $0.023/GB |
| Infrequent Access | Monthly access | $0.0125/GB |
| Glacier | Archival | $0.004/GB |
| Glacier Deep Archive | Rare access | $0.00099/GB |
Lifecycle Policies
Automate tier transitions:
{
  "Rules": [{
    "ID": "Archive old data",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"},
      {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
    ],
    "Expiration": {"Days": 2555}
  }]
}
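Applied to a bucket with the CLI (assuming the rules above are saved as lifecycle.json; the bucket name is a placeholder):
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json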
Delete Unused Resources
Common orphaned resources:
- Unattached EBS volumes
- Old snapshots
- Unused Elastic IPs
- Outdated AMIs
Cleanup script:
# Find unattached volumes
aws ec2 describe-volumes \
--filters "Name=status,Values=available" \
--query 'Volumes[*].VolumeId'
# Find old snapshots (replace the date with your cutoff, e.g. 90 days ago)
aws ec2 describe-snapshots \
--owner-ids self \
--query 'Snapshots[?StartTime<`2019-01-01`].SnapshotId'
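Unassociated Elastic IPs can be found the same way; they are billed hourly while idle (the allocation ID is a placeholder):
# Find Elastic IPs not attached to anything
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]'
# Release one once confirmed unused
aws ec2 release-address --allocation-id eipalloc-0123456789abcdef0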
S3 Intelligent-Tiering
Automatic tier optimization:
# Enable for buckets with unknown access patterns
aws s3api put-bucket-intelligent-tiering-configuration \
--bucket my-bucket \
--id tier-config \
--intelligent-tiering-configuration \
'{"Status": "Enabled", "Tierings": [...]}'
Data Transfer Optimization
Understand Transfer Costs
Free:
- Inbound data
- Same-AZ within AWS
Costs money:
- Cross-AZ: $0.01/GB (charged in each direction)
- Cross-region: $0.02/GB
- Outbound to internet: $0.09/GB
- NAT Gateway: $0.045/GB
VPC Endpoints
Avoid NAT Gateway for AWS services:
# S3 Gateway Endpoint (free)
aws ec2 create-vpc-endpoint \
--vpc-id vpc-123 \
--service-name com.amazonaws.us-east-1.s3 \
--route-table-ids rtb-456
Saves NAT Gateway data processing fees.
CloudFront for Egress
CDN egress is cheaper than direct:
Direct S3 egress: $0.09/GB
CloudFront egress: $0.085/GB (first 10TB)
Plus caching reduces origin requests.
Keep Data in Region
Cross-region transfer adds up:
10 TB/month cross-region = $200/month
Keep data local when possible
Database Optimization
Right-Size RDS
Database instances are often oversized:
- Check CPU and memory utilization
- Consider burstable instances (t3) for dev/test
- Use Reserved Instances for production
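Once CloudWatch confirms the headroom, the change itself is one call; a sketch (identifier and class are placeholders; drop --apply-immediately to wait for the next maintenance window):
# Downsize a dev database to a burstable class
aws rds modify-db-instance \
  --db-instance-identifier orders-dev \
  --db-instance-class db.t3.medium \
  --apply-immediately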
Aurora Serverless for Variable Workloads
Pay per ACU-second:
# Good for:
- Development databases
- Infrequently used applications
- Variable workloads with idle periods
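A minimal Aurora Serverless v2 setup looks roughly like this (identifiers, engine, and capacity range are placeholders; Serverless v1 used a different scaling flag):
# Cluster that scales between 0.5 and 4 ACUs
aws rds create-db-cluster \
  --db-cluster-identifier dev-cluster \
  --engine aurora-postgresql \
  --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=4 \
  --master-username dbadmin \
  --manage-master-user-password
# Instances in the cluster use the db.serverless class
aws rds create-db-instance \
  --db-instance-identifier dev-cluster-1 \
  --db-cluster-identifier dev-cluster \
  --db-instance-class db.serverless \
  --engine aurora-postgresql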
Read Replicas vs. Scaling Up
Reads scale out cheaper than scaling up:
Option A: db.r5.4xlarge ($2.30/hr)
Option B: db.r5.large primary + 3 db.r5.large read replicas (4 × ~$0.29/hr ≈ $1.15/hr)
Application must support read/write splitting.
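Creating the replica is one call; routing reads to it is the application-side work (identifiers and class are placeholders):
# Add a read replica to an existing primary
aws rds create-db-instance-read-replica \
  --db-instance-identifier orders-replica-1 \
  --source-db-instance-identifier orders-prod \
  --db-instance-class db.r5.large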
FinOps Practices
Monthly Cost Reviews
Regular review cadence:
Weekly: Anomaly review
Monthly: Detailed cost analysis
Quarterly: Architecture review for optimization
Cost Ownership
Teams own their costs:
- Dashboards per team
- Budget alerts
- Cost in sprint planning
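Budget alerts are easy to script per team; a sketch using AWS Budgets (account ID, amount, and address are placeholders; add a cost filter on the Team tag to scope it):
# $6,000/month budget alerting at 80% of actual spend
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "team-platform-monthly",
    "BudgetLimit": {"Amount": "6000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "platform-team@example.com"}]
  }]'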
Showback/Chargeback
Make costs visible:
Team A:
- Compute: $5,000
- Storage: $800
- Data Transfer: $400
Total: $6,200
Trend: +12% from last month
Key Takeaways
- Tag everything; without tags, attribution is impossible
- Right-size first; most instances are oversized
- Use spot for interruptible workloads; 60-90% savings
- Commit (Reserved/Savings Plans) only for baseline, predictable capacity
- Implement storage lifecycle policies; automatic tier transitions
- Delete orphaned resources; unattached volumes, old snapshots
- Use VPC endpoints to avoid NAT Gateway fees
- CDN egress is cheaper than direct; use CloudFront
- Make costs visible to teams; ownership drives optimization
Cloud cost optimization is ongoing. Measure, optimize, measure again.