Infrastructure as Code (IaC) has moved from nice-to-have to essential. But scaling IaC from a few resources to enterprise-grade infrastructure introduces complexity. How you structure, test, and manage IaC determines whether it helps or creates new problems.
Here are patterns that work at scale.
Structuring IaC
Module Design
module_principles:
  single_responsibility:
    - One module, one purpose
    - Example: VPC module, EKS module, RDS module
    - Avoid monolithic "kitchen sink" modules
  composability:
    - Modules can be combined
    - Clear inputs and outputs
    - Minimal interdependencies
  versioning:
    - Semantic versioning
    - Pin versions in consumers
    - Changelog for breaking changes
# Well-structured module
# modules/eks-cluster/main.tf
# (aws_iam_role.cluster and its policy attachment are defined elsewhere in the module, not shown)
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.cluster.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_private_access = var.endpoint_private_access
    endpoint_public_access  = var.endpoint_public_access
  }

  depends_on = [
    aws_iam_role_policy_attachment.cluster_policy,
  ]
}

# modules/eks-cluster/variables.tf
variable "cluster_name" {
  type        = string
  description = "Name of the EKS cluster"
}

variable "kubernetes_version" {
  type        = string
  default     = "1.24" # Example value; use a version EKS currently supports
  description = "Kubernetes version"
}

variable "subnet_ids" {
  type        = list(string)
  description = "Subnet IDs for the cluster"
}

variable "endpoint_private_access" {
  type        = bool
  default     = false # Same as the AWS provider default
  description = "Enable the private API server endpoint"
}

variable "endpoint_public_access" {
  type        = bool
  default     = true # Same as the AWS provider default
  description = "Enable the public API server endpoint"
}

# modules/eks-cluster/outputs.tf
output "cluster_endpoint" {
  value       = aws_eks_cluster.main.endpoint
  description = "EKS cluster endpoint"
}

output "cluster_ca_certificate" {
  value       = aws_eks_cluster.main.certificate_authority[0].data
  description = "Cluster CA certificate"
}
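The versioning and composability principles are easiest to see from the consumer's side. The following is a minimal sketch, not a definitive layout: the repository URL, the tags `v2.1.0` and `v1.4.0`, and the `vpc` module with its `cidr_block` input and `private_subnet_ids` output are all hypothetical and only illustrate the wiring.

```hcl
# environments/production/main.tf
# Illustrative only: repository URL, tags, and the vpc module are assumptions.
module "vpc" {
  source = "git::https://example.com/org/terraform-modules.git//vpc?ref=v2.1.0"

  cidr_block = "10.0.0.0/16" # assumed input name
}

module "eks_cluster" {
  # Pinning to an exact tag makes upgrades a deliberate, reviewable change
  source = "git::https://example.com/org/terraform-modules.git//eks-cluster?ref=v1.4.0"

  cluster_name = "production-cluster"

  # Composition: one module's output feeds another module's input
  subnet_ids = module.vpc.private_subnet_ids # assumed output name
}
```

For modules published to a registry, the `version` argument with a constraint such as `~> 1.4` plays the same role. `version` is not accepted for Git sources, which is why the pin goes in `?ref=` here.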
Repository Structure
repository_patterns:
  monorepo:
    structure:
      - modules/            # Reusable modules
      - environments/       # Environment configs
        - production/
        - staging/
        - development/
      - policies/           # Compliance policies
    benefits:
      - Everything in one place
      - Easy cross-cutting changes
      - Single CI/CD pipeline
  polyrepo:
    structure:
      - terraform-modules/  # Shared modules repo
      - infra-production/   # Production configs
      - infra-staging/      # Staging configs
    benefits:
      - Team ownership
      - Independent lifecycles
      - Access control
State Management
state_management:
  backends:
    s3_dynamodb:
      state: S3 bucket
      locking: DynamoDB table
      encryption: SSE-S3 or KMS
    terraform_cloud:
      state: Terraform Cloud
      locking: Built-in
      collaboration: Built-in
  state_structure:
    per_environment:
      - production.tfstate
      - staging.tfstate
      - development.tfstate
    per_component:
      - production/networking.tfstate
      - production/compute.tfstate
      - production/database.tfstate
# Backend configuration
terraform {
  backend "s3" {
    bucket  = "company-terraform-state"
    key     = "production/networking.tfstate"
    region  = "us-east-1"
    encrypt = true
    # State locking; Terraform 1.10+ can use `use_lockfile = true` instead
    dynamodb_table = "terraform-locks"
  }
}
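Splitting state per component means one component has to read another's outputs. One common way to do that is the `terraform_remote_state` data source. The sketch below assumes the networking state exposes an output named `private_subnet_ids` (a hypothetical name) and reuses the bucket and key from the backend example.

```hcl
# production/compute/main.tf
# Assumes the networking component declares `output "private_subnet_ids"`.
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "company-terraform-state"
    key    = "production/networking.tfstate"
    region = "us-east-1"
  }
}

module "eks_cluster" {
  source = "../../modules/eks-cluster"

  cluster_name = "production-cluster"
  subnet_ids   = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```

Reading remote state requires read access to the whole state object, so teams that want tighter boundaries sometimes publish the needed values elsewhere (for example, a parameter store) and look them up from there.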
Environment Patterns
DRY Environments
dry_environments:
  problem:
    - Copy-paste between environments
    - Drift over time
    - Maintenance burden
  solution:
    - Shared modules
    - Environment-specific variables
    - Terragrunt or workspaces
# Using Terragrunt for DRY configs
# environments/production/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../modules//eks-cluster"
}

# The node_* inputs assume the module declares matching variables (not shown in the module excerpt)
inputs = {
  cluster_name       = "production-cluster"
  kubernetes_version = "1.24"
  node_instance_type = "m5.xlarge"
  node_min_count     = 3
  node_max_count     = 10
}

# environments/staging/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../modules//eks-cluster"
}

inputs = {
  cluster_name       = "staging-cluster"
  kubernetes_version = "1.24"
  node_instance_type = "m5.large" # Smaller
  node_min_count     = 1          # Fewer nodes
  node_max_count     = 3
}
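Both environment files include a root configuration through `find_in_parent_folders()`. A minimal sketch of what that root file might contain is shown here, assuming it sits at `environments/terragrunt.hcl` and reuses the S3 bucket and lock table from the backend example. The file location and the key layout are assumptions, not something the setup above prescribes.

```hcl
# environments/terragrunt.hcl (root; location is an assumption)
# Generates a backend.tf in every child so the backend is declared only once.
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket  = "company-terraform-state"
    region  = "us-east-1"
    encrypt = true

    # path_relative_to_include() yields "production", "staging", ... so each
    # environment gets its own state key without repeating it by hand
    key            = "${path_relative_to_include()}/terraform.tfstate"
    dynamodb_table = "terraform-locks"
  }
}
```

With this in place, the per-environment files carry only what differs between environments: the `inputs` block.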
Environment Promotion
promotion_pattern:
  workflow:
    1. Change merged to main
    2. Auto-deploy to development
    3. Manual promotion to staging
    4. Approval and deploy to production
  implementation:
    - Git branches per environment (anti-pattern)
    - Same code, different variables (better)
    - Promotion workflow in CI/CD
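The "same code, different variables" approach can also be expressed in plain Terraform. The sketch below is one possible shape, not a prescribed one: a single validated `environment` variable selects sizing from a map. The staging and production values mirror the Terragrunt inputs above; the development values and all names (`node_sizing`, `node`) are assumptions.

```hcl
variable "environment" {
  type        = string
  description = "Target environment for this run"

  validation {
    condition     = contains(["development", "staging", "production"], var.environment)
    error_message = "Environment must be development, staging, or production."
  }
}

locals {
  # One table of differences; everything else is shared code
  node_sizing = {
    development = { instance_type = "t3.medium", min_count = 1, max_count = 2 } # assumed values
    staging     = { instance_type = "m5.large", min_count = 1, max_count = 3 }
    production  = { instance_type = "m5.xlarge", min_count = 3, max_count = 10 }
  }

  node = local.node_sizing[var.environment]
}

output "node_instance_type" {
  value = local.node.instance_type
}
```

A promotion pipeline would then run the same commit with `-var="environment=staging"` and later `-var="environment=production"`, each against its own state.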
Testing IaC
Validation Layers
testing_pyramid:
  static_analysis:
    - Syntax validation (terraform validate)
    - Linting (tflint)
    - Security scanning (checkov, tfsec)
    - Policy checks (OPA, Sentinel)
  unit_tests:
    - Module logic tests
    - Terraform test framework
    - Mock providers
  integration_tests:
    - Deploy to test environment
    - Verify resources created
    - Destroy after test
  end_to_end:
    - Full environment deployment
    - Application functionality
    - Rarely automated
# Example CI pipeline
name: Terraform CI
on:
  pull_request:
    paths:
      - 'terraform/**'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      - name: Format Check
        run: terraform fmt -check -recursive
      - name: Validate
        run: |
          cd terraform
          terraform init -backend=false
          terraform validate
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: tfsec
        uses: aquasecurity/tfsec-action@v1
      - name: checkov
        uses: bridgecrewio/checkov-action@v12
  plan:
    runs-on: ubuntu-latest
    needs: [validate, security]
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      # Backend and provider credentials must be available to this job (not shown)
      - name: Terraform Plan
        run: |
          cd terraform
          terraform init
          terraform plan -out=plan.out
      - name: Upload Plan
        uses: actions/upload-artifact@v4 # v3 of this action has been retired
        with:
          name: terraform-plan
          path: terraform/plan.out
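The pipeline above covers the static-analysis layer. For the unit-test layer, a test file for the `eks-cluster` module might look like the following sketch. It assumes Terraform 1.7 or later (for `mock_provider`), that the file lives under the module's `tests/` directory, and that every module variable other than the two set here has a default. The file name and subnet IDs are made up.

```hcl
# modules/eks-cluster/tests/cluster_name.tftest.hcl (hypothetical file name)

# Replace the real AWS provider with a mock so no credentials or resources are needed
mock_provider "aws" {}

variables {
  cluster_name = "test-cluster"
  subnet_ids   = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholder IDs
}

run "cluster_name_comes_from_variable" {
  # Plan only: nothing is created, so the test is fast and free
  command = plan

  assert {
    condition     = aws_eks_cluster.main.name == "test-cluster"
    error_message = "Cluster name should be taken from var.cluster_name."
  }
}
```

Running `terraform init` followed by `terraform test` from the module directory executes every `run` block. Only values known at plan time, such as the name set from a variable, can be asserted this way; computed attributes need `command = apply` or mocked values.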
Policy as Code
# OPA policy: Require encryption
# Written in pre-1.0 Rego syntax; OPA 1.0+ expects `deny contains msg if { ... }`
package terraform.aws.s3

deny[msg] {
  resource := input.resource.aws_s3_bucket[name]
  not resource.server_side_encryption_configuration
  msg := sprintf("S3 bucket '%s' must have encryption enabled", [name])
}

deny[msg] {
  resource := input.resource.aws_s3_bucket[name]
  resource.acl == "public-read"
  msg := sprintf("S3 bucket '%s' must not be public", [name])
}
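To make the two rules concrete, here is a sketch of Terraform input they would reject and accept. It assumes the policy is evaluated against parsed `.tf` source (for example with a tool such as conftest) and that the buckets use the older inline `acl` and `server_side_encryption_configuration` arguments, which newer AWS provider versions deprecate. Bucket names are placeholders.

```hcl
# Denied by both rules: no encryption block, and a public-read ACL
resource "aws_s3_bucket" "public_assets" {
  bucket = "example-public-assets"
  acl    = "public-read"
}

# Passes both rules: private ACL and an inline encryption configuration
resource "aws_s3_bucket" "audit_logs" {
  bucket = "example-audit-logs"
  acl    = "private"

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "aws:kms"
      }
    }
  }
}
```

Configurations that declare encryption and ACLs through the separate `aws_s3_bucket_server_side_encryption_configuration` and `aws_s3_bucket_acl` resources would not be seen by these rules as written, so the policy would need to be extended to cover them.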
Handling Drift
Drift Detection
drift_management:
  detection:
    - Scheduled terraform plan
    - Compare plan output to expected
    - Alert on unexpected changes
  causes:
    - Manual changes (console, CLI)
    - External automation
    - Resource auto-scaling
    - Provider updates
  prevention:
    - Lock down console access
    - All changes through IaC
    - Education and culture
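# GitHub Action for drift detection
name: Drift Detection
on:
  schedule:
    - cron: '0 8 * * *' # Daily at 08:00 UTC
jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The setup-terraform wrapper is what exposes steps.plan.outputs.exitcode
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      - name: Terraform Plan
        id: plan
        run: |
          terraform init
          terraform plan -detailed-exitcode -out=plan.out
        continue-on-error: true
      # Exit code 1 means the plan itself failed; do not mistake that for "no drift"
      - name: Fail on Plan Error
        if: steps.plan.outputs.exitcode == 1
        run: exit 1
      # Exit code 2 means the plan succeeded and found changes
      - name: Check for Drift
        if: steps.plan.outputs.exitcode == 2
        run: |
          echo "Drift detected!"
          terraform show plan.out
          # Send Slack notification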
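Not all drift is a problem. When auto-scaling is the cause, the scheduled plan will report a change every time the scaler adjusts capacity. One way to keep that expected drift out of the alerts is `ignore_changes`, sketched below. The resource names, sizes, and the `ami_id` variable are hypothetical.

```hcl
variable "subnet_ids" {
  type = list(string)
}

variable "ami_id" {
  type = string # hypothetical: AMI for the worker nodes
}

resource "aws_launch_template" "workers" {
  name_prefix   = "workers-"
  image_id      = var.ami_id
  instance_type = "m5.large"
}

resource "aws_autoscaling_group" "workers" {
  name_prefix         = "workers-"
  min_size            = 1
  max_size            = 10
  desired_capacity    = 3 # Initial value only; the autoscaler owns it afterwards
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.workers.id
    version = "$Latest"
  }

  lifecycle {
    # Scaling activity changes desired_capacity outside Terraform;
    # ignoring it keeps scheduled plans quiet unless something else drifts
    ignore_changes = [desired_capacity]
  }
}
```

The trade-off is that Terraform will no longer correct that attribute, so the list should stay as short as possible.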
Secrets Management
secrets_patterns:
  avoid:
    - Secrets in terraform files
    - Secrets in state file
    - Hardcoded values
  approaches:
    external_secrets:
      - Reference from Vault/AWS Secrets Manager
      - Data source lookup
      - Injected at runtime
    sensitive_variables:
      - Mark as sensitive
      - Pass via environment
      - CI/CD secrets management
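# Reference secrets from AWS Secrets Manager
# Note: the resolved value is still recorded in Terraform state, so the state
# backend must be encrypted and access to it restricted.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/database/password"
}

resource "aws_db_instance" "main" {
  identifier     = "production-db"
  engine         = "postgres"
  instance_class = "db.r5.large"
  password       = data.aws_secretsmanager_secret_version.db_password.secret_string
  # Other required arguments (allocated_storage, username, ...) omitted for brevity

  lifecycle {
    ignore_changes = [password] # Managed externally
  }
}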
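The other approach listed above, sensitive variables passed through the environment, could look like this minimal sketch. The variable name, the `reporting` database, and its settings are hypothetical; the CI system is assumed to export `TF_VAR_db_password` from its secret store before Terraform runs.

```hcl
variable "db_password" {
  type        = string
  sensitive   = true # Redacted in plan and apply output
  description = "Supplied through the TF_VAR_db_password environment variable"
}

resource "aws_db_instance" "reporting" {
  identifier          = "reporting-db"
  engine              = "postgres"
  instance_class      = "db.t3.medium"
  allocated_storage   = 20
  username            = "app"
  password            = var.db_password
  skip_final_snapshot = true # Keeps the sketch short; reconsider for real databases
}
```

Marking a variable `sensitive` only hides it from CLI output and logs. The value is still written to state, so the state-protection measures from the backend section apply here as well.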
Key Takeaways
- Structure modules with single responsibility and composability
- Version modules semantically and pin versions in consumers
- Manage state per environment or component, and never commit state files to version control
- Use DRY patterns (Terragrunt, shared modules) across environments
- Test at multiple layers: static analysis, unit, integration
- Implement policy as code for security and compliance
- Detect and prevent drift with scheduled plans
- Never store secrets in IaC files; use external secret managers
- CI/CD should validate, plan, and apply with appropriate approvals
- IaC is code: apply software engineering practices
Infrastructure as Code enables scale. Patterns enable maintainability.