Multi-cloud is a popular talking point. The pitch: avoid vendor lock-in, leverage best-of-breed services, increase resilience. The reality is more nuanced: every additional cloud brings another IAM system, networking model, and billing relationship to operate, and the promised benefits often don’t materialize.
Here’s when multi-cloud makes sense and how to approach it realistically.
What Multi-Cloud Actually Means
Types of Multi-Cloud
accidental_multi_cloud:
  description: Different teams chose different clouds
  example: Marketing uses GCP, Engineering uses AWS
  benefit: None (just complexity)
  common: Very
best_of_breed:
  description: Specific services from specific clouds
  example: AWS for compute, GCP for ML
  benefit: Access to best tools
  complexity: Medium
redundancy_multi_cloud:
  description: Same workload on multiple clouds
  example: Run in AWS and Azure for resilience
  benefit: No single cloud dependency
  complexity: Very high
arbitrage_multi_cloud:
  description: Move workloads for cost/capacity
  example: Burst to cheapest available cloud
  benefit: Cost optimization
  complexity: High
The Complexity Reality
Single cloud:
- One IAM system
- One networking model
- One set of services
- One billing system
- One support relationship
Multi-cloud:
- Multiple IAM systems to secure
- Different networking paradigms
- Similar but different services
- Multiple billing relationships
- Multiple support contracts
- Cross-cloud networking complexity
- Skill requirements multiply
When Multi-Cloud Makes Sense
Valid Use Cases
regulatory_requirements:
  scenario: Data must stay in specific region/provider
  example: EU data on EU-based cloud
  approach: Policy-based placement
acquisition:
  scenario: Acquired company uses different cloud
  example: AWS company acquires Azure company
  approach: Gradual consolidation or maintain both
specific_capabilities:
  scenario: Service only available on one cloud
  example: BigQuery for analytics, AWS for main workload
  approach: Well-defined integration points
customer_requirements:
  scenario: Customers require specific cloud
  example: SaaS must be available on customer's cloud
  approach: Multi-cloud deployment for customer choice
risk_management:
  scenario: Cannot tolerate cloud-wide outage
  example: Financial services requiring extreme resilience
  approach: Active-active or active-passive across clouds
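The policy-based placement approach listed for regulatory requirements can be sketched as a small lookup that fails closed. This is a minimal illustration, not a compliance tool: the classification labels, the `eu-cloud` provider, and the policy table itself are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    provider: str
    region: str


# Hypothetical policy table: data classification -> placements that are allowed.
PLACEMENT_POLICY = {
    "eu_personal": [Placement("eu-cloud", "eu-central")],
    "general": [Placement("aws", "us-east-1"), Placement("gcp", "europe-west1")],
}


def allowed_placements(classification: str) -> list[Placement]:
    """Return the allowed placements, failing closed on unknown classifications."""
    try:
        return PLACEMENT_POLICY[classification]
    except KeyError:
        raise ValueError(f"no placement policy for {classification!r}") from None


def is_allowed(classification: str, provider: str, region: str) -> bool:
    """Check a proposed placement against the policy table."""
    return Placement(provider, region) in allowed_placements(classification)
```

Raising on an unknown classification is a deliberate choice here: data without a policy should not be placed anywhere by default.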
When It Doesn’t Make Sense
avoid_multicloud_when:
  lock_in_fear:
    reality: You're already locked into many things
    better: Design for portability within one cloud
  theoretical_cost_savings:
    reality: Operational costs often exceed savings
    better: Optimize within one cloud first
  resume_driven:
    reality: Engineers want multi-cloud experience
    better: Focus on business value
  vague_resilience:
    reality: Multi-region in one cloud is usually sufficient
    better: Multi-AZ, multi-region first
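One way to make a vague resilience goal concrete is to put numbers on it before adding a second cloud. The sketch below is back-of-the-envelope arithmetic under a strong assumption: the replicas fail independently, which is rarely fully true for regions of the same provider. The function names and the 99.9% input are illustrative, not figures from any provider.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` replicas, assuming they fail independently."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The system is down only when every replica is down at the same time.
    return 1.0 - (1.0 - single) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


one_region = combined_availability(0.999, 1)
two_regions = combined_availability(0.999, 2)
print(f"one region:  {downtime_hours_per_year(one_region):.2f} h/year")
print(f"two regions: {downtime_hours_per_year(two_regions):.4f} h/year")
```

The point of running the numbers is to compare them against an explicit availability target, not to prove either option.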
Practical Multi-Cloud Patterns
Abstraction Layers
Portable where it matters:
abstract:
  compute:
    - Kubernetes (runs anywhere)
    - Containers (portable)
  data:
    - PostgreSQL (managed or self-managed)
    - Standard protocols (S3-compatible)
  messaging:
    - Kafka (portable)
    - Standard protocols (AMQP)
cloud_specific:
  managed_services:
    - Use native services when better
    - Accept some lock-in for value
  integration:
    - Define clear boundaries
    - Minimize cross-cloud dependencies
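A clear boundary can be as small as one interface that application code depends on, with cloud-specific code kept behind it. The sketch below assumes an object-storage boundary. `ObjectStore`, `InMemoryStore`, and `archive_report` are hypothetical names, and the in-memory backend stands in for a real one that would wrap a cloud SDK.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """The only storage surface application code is allowed to see."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for tests; a real backend would wrap a cloud SDK."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def archive_report(store: ObjectStore, report_id: str, body: bytes) -> str:
    """Application logic: knows the key layout, not the cloud behind the store."""
    key = f"reports/{report_id}.json"
    store.put(key, body)
    return key
```

Keeping the interface this narrow is the trade-off: anything a provider offers beyond `put` and `get` is unavailable to callers unless the boundary is widened on purpose.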
Kubernetes as Common Layer
┌─────────────────────────────────────────────────────────────────┐
│                        Application Layer                        │
│                     (Kubernetes workloads)                      │
├─────────────────────────────────────────────────────────────────┤
│                           Kubernetes                            │
│                (EKS, GKE, AKS, or self-managed)                 │
├───────────────────┬───────────────────┬─────────────────────────┤
│        AWS        │        GCP        │          Azure          │
└───────────────────┴───────────────────┴─────────────────────────┘
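# Portable workload definition
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  # Required in apps/v1: the selector must match the pod template labels
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: myregistry.io/my-app:v1
# Same definition works on any conformant Kubernetes cluster that can pull the image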
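The workload definition is portable, but the resources around it usually are not. Storage is a common example: each cluster exposes different storage classes. The sketch below generates a PersistentVolumeClaim per cluster from one base definition. The class names in `STORAGE_CLASS` are assumptions about what each cluster defines, and `my-app-data` is a hypothetical claim name, so check both against your own clusters.

```python
import copy

# Cloud-neutral part of the claim, shared by every cluster.
BASE_PVC = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "my-app-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# Assumed storage class names; what exists depends on how each cluster is set up.
STORAGE_CLASS = {"eks": "gp3", "gke": "standard-rwo", "aks": "managed-csi"}


def pvc_for(cluster: str) -> dict:
    """Return a copy of the base claim with the cluster-specific storage class."""
    manifest = copy.deepcopy(BASE_PVC)
    manifest["spec"]["storageClassName"] = STORAGE_CLASS[cluster]
    return manifest
```

The same split applies to load balancers, ingress, and identity: keep the shared definition in one place and make each per-cloud difference explicit.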
Data Strategy
portable_data:
  option_1_open_source:
    database: PostgreSQL, MySQL
    cache: Redis
    message_queue: Kafka
    advantage: Can run anywhere
    disadvantage: More operational burden
  option_2_compatible_apis:
    storage: S3-compatible (MinIO, GCS interop)
    advantage: Familiar APIs
    disadvantage: Not fully portable
  option_3_managed_with_export:
    use: Cloud-native managed services
    plan: Export capability for migration
    advantage: Best experience now
    disadvantage: Migration effort later
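With the compatible-API option, switching backends is mostly a matter of pointing the same client at a different endpoint. The sketch below keeps that choice in configuration. The bucket name and the MinIO host are hypothetical. The GCS entry uses the `storage.googleapis.com` endpoint of its S3-interoperable XML API, which expects HMAC credentials.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectStoreConfig:
    endpoint_url: Optional[str]  # None means "use the SDK's default endpoint"
    bucket: str


# Hypothetical targets; the MinIO host and the bucket name are made up.
CONFIGS = {
    "aws": ObjectStoreConfig(None, "my-app-data"),
    "gcs_interop": ObjectStoreConfig("https://storage.googleapis.com", "my-app-data"),
    "minio": ObjectStoreConfig("https://minio.internal.example:9000", "my-app-data"),
}


def client_kwargs(target: str) -> dict:
    """Keyword arguments for an S3-style client constructor."""
    cfg = CONFIGS[target]
    return {} if cfg.endpoint_url is None else {"endpoint_url": cfg.endpoint_url}
```

These kwargs could be passed to an S3 client constructor such as `boto3.client("s3", **client_kwargs("minio"))`. Only the basic object operations should be expected to behave the same everywhere, which is why the outline calls this option not fully portable.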
Networking
Cross-cloud connectivity:
networking_options:
  vpn:
    description: Encrypted tunnel over internet
    pros: Simple, cheap
    cons: Variable latency and bandwidth
  dedicated_interconnect:
    description: Private connection between clouds, joined at a colocation facility or through a network partner
    example: AWS Direct Connect linked to Azure ExpressRoute
    pros: Low latency, consistent
    cons: Expensive, complex setup
  software_defined:
    description: Overlay network across clouds
    tools: HashiCorp Consul, Cilium Cluster Mesh
    pros: Flexibility
    cons: Complexity, overhead
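Whichever option you pick, routing between clouds only works if the address ranges on each side do not overlap. Checking this before building a tunnel or interconnect is cheap. The sketch below uses Python's standard `ipaddress` module; the network names and CIDR blocks are made-up examples.

```python
from ipaddress import ip_network
from itertools import combinations


def find_overlaps(networks: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named networks whose CIDR ranges overlap."""
    parsed = {name: ip_network(cidr) for name, cidr in networks.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(parsed.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical address plan across three clouds.
plan = {
    "aws-prod": "10.0.0.0/16",
    "azure-prod": "10.0.128.0/20",
    "gcp-prod": "10.1.0.0/16",
}
print(find_overlaps(plan))  # → [('aws-prod', 'azure-prod')]
```

Accidental multi-cloud setups are especially prone to this, because each team picked its ranges without knowing about the others.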
Operational Considerations
Team Skills
skill_requirements:
  single_cloud:
    - Deep expertise in one platform
    - Efficient operations
  multi_cloud:
    - Broad knowledge across platforms
    - Or separate teams per cloud
    - Higher training/hiring cost
Tooling
cross_cloud_tools:
  infrastructure:
    - Terraform (multi-cloud)
    - Pulumi (multi-cloud)
  monitoring:
    - Datadog, New Relic (works across clouds)
    - Prometheus + Grafana (self-managed)
  security:
    - HashiCorp Vault (secrets)
    - Snyk, Prisma Cloud (security scanning)
  deployment:
    - Kubernetes + Argo CD
    - Spinnaker (multi-cloud CD)
Cost Management
cost_complexity:
  challenges:
    - Different pricing models
    - Different discounting mechanisms
    - Cross-cloud data transfer costs
    - Multiple billing systems
  approach:
    - Unified cost visibility (Cloudability, Kubecost)
    - Tag consistently across clouds
    - Monitor cross-cloud data transfer
    - Regular cost review
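Consistent tagging is what makes unified cost visibility possible, since each cloud's billing export arrives with its own casing and naming habits. The sketch below shows the idea on a simplified record shape. The `cost_usd` and `tags` fields and the `team` tag are assumptions for illustration, not any provider's actual billing format.

```python
from collections import defaultdict


def normalize_tags(tags: dict[str, str]) -> dict[str, str]:
    """Lowercase keys and values, and use underscores in keys."""
    return {
        key.strip().lower().replace("-", "_"): value.strip().lower()
        for key, value in tags.items()
    }


def cost_by_team(records: list[dict]) -> dict[str, float]:
    """Sum cost per team across clouds; untagged spend gets its own bucket."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        team = normalize_tags(record["tags"]).get("team", "untagged")
        totals[team] += record["cost_usd"]
    return dict(totals)


# Hypothetical billing records from three clouds.
records = [
    {"cloud": "aws", "cost_usd": 120.0, "tags": {"Team": "Payments"}},
    {"cloud": "gcp", "cost_usd": 80.0, "tags": {"team": "payments"}},
    {"cloud": "azure", "cost_usd": 15.5, "tags": {}},
]
print(cost_by_team(records))  # → {'payments': 200.0, 'untagged': 15.5}
```

Reporting untagged spend as its own bucket keeps the tagging gap visible in every cost review.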
Migration Considerations
Gradual Approach
migration_pattern:
  phase_1:
    - New workloads on target cloud
    - Existing workloads unchanged
  phase_2:
    - Identify candidates for migration
    - Prioritize by value/complexity
  phase_3:
    - Migrate workloads incrementally
    - Maintain both during transition
  phase_4:
    - Decommission source
    - Or maintain multi-cloud intentionally
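The "prioritize by value/complexity" step in phase 2 can start as a simple ranking. The sketch below assumes each workload has been scored from 1 to 5 on both axes. The scoring scale, the `Workload` type, and the example workloads are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    value: int       # business value of migrating, 1 (low) to 5 (high)
    complexity: int  # migration effort, 1 (low) to 5 (high)

    def __post_init__(self) -> None:
        if not (1 <= self.value <= 5 and 1 <= self.complexity <= 5):
            raise ValueError("value and complexity must be between 1 and 5")


def migration_order(workloads: list[Workload]) -> list[Workload]:
    """Highest value-to-complexity ratio first; simpler workloads break ties."""
    return sorted(
        workloads,
        key=lambda w: (-(w.value / w.complexity), w.complexity, w.name),
    )


candidates = [
    Workload("billing-core", value=5, complexity=5),
    Workload("batch-reports", value=4, complexity=1),
    Workload("web-frontend", value=3, complexity=2),
]
print([w.name for w in migration_order(candidates)])
# → ['batch-reports', 'web-frontend', 'billing-core']
```

A ratio is a crude score, but it makes the ordering discussable, and the early, simple migrations build experience before the hard ones.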
Exit Planning
Even within multi-cloud, plan for changes:
exit_considerations:
  data_export:
    - Can you get your data out?
    - What format?
    - How long does it take?
  application_portability:
    - What cloud-specific dependencies?
    - How hard to re-platform?
  contracts:
    - Committed spend obligations
    - Notice periods
    - Data retention requirements
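"How long does it take?" has a rough answer you can compute ahead of time. The sketch below estimates transfer time from data volume and link speed. The `efficiency` factor, which stands for the share of nominal bandwidth you actually sustain, is an assumption to replace with a measured value, and the function name is illustrative.

```python
def export_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimated hours to move `data_tb` terabytes over a `link_gbps` link."""
    if data_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits = data_tb * 1e12 * 8                        # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # sustained throughput
    return seconds / 3600


# 100 TB over a 10 Gbps link at full nominal speed
print(round(export_hours(100, 10, efficiency=1.0), 1))  # → 22.2
```

This covers transfer time only. Export jobs, format conversion, and validation on the target side add to it, and egress charges apply on top.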
Key Takeaways
- Multi-cloud is often accidental or unnecessary; don’t do it without clear reasons
- Valid reasons: regulatory requirements, acquisitions, specific capabilities, customer needs, a genuine inability to tolerate a cloud-wide outage
- Invalid reasons: vague lock-in fears, theoretical cost savings, resume building, vague resilience goals
- Kubernetes provides a useful abstraction layer but doesn’t eliminate differences
- Operational complexity multiplies: skills, tools, networking, cost management
- If multi-cloud, abstract where it matters (compute, data), use native where valuable
- Multi-region in one cloud provides most resilience benefits with less complexity
- Start single-cloud, add multi-cloud only when genuinely needed
- Always have an exit plan, regardless of cloud strategy
Multi-cloud is a tool, not a goal. Use it when the business requires it, not because it sounds sophisticated.