Infrastructure as Code Patterns and Practices

October 29, 2018

Infrastructure as Code (IaC) has become essential. Managing infrastructure manually doesn’t scale, isn’t reproducible, and is error-prone. But IaC done poorly creates its own problems: sprawling codebases, drift, and deployment nightmares.

Here’s how to do IaC well.

Foundational Principles

Declarative Over Imperative

Describe what you want, not how to get there:

# Declarative (Terraform)
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name = "web-server"
  }
}

# Imperative (scripts) - avoid
aws ec2 run-instances --image-id ami-0c55b159... --instance-type t3.micro

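Declarative IaC lets the tool work out the steps: Terraform compares the desired state with what exists and changes only the difference, so applying the same configuration twice is a no-op. The script above launches a new instance every time it runs.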

Immutable Infrastructure

Replace, don’t modify:

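# Instance replacement on AMI change
resource "aws_launch_template" "web" {
  image_id      = var.ami_id
  instance_type = "t3.micro"
  # A new AMI creates a new template version; running instances are never patched
}

resource "aws_autoscaling_group" "web" {
  # Sizes and subnets are illustrative
  min_size            = 2
  max_size            = 4
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.web.id
    version = aws_launch_template.web.latest_version
  }

  # Roll instances onto the new template version
  instance_refresh {
    strategy = "Rolling"
  }
}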

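Benefits: every instance is built from a known image, so servers do not accumulate manual changes, and rolling back means redeploying the previous image.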
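For a single instance outside an autoscaling group, the same replace-don't-modify idea can be sketched with a lifecycle rule. This is a minimal sketch, assuming the AMI ID comes from a variable named ami_id as in the launch template above; the resource name is illustrative.

```hcl
variable "ami_id" {
  type        = string
  description = "AMI to launch (assumed to be built by an image pipeline)"
}

resource "aws_instance" "web" {
  # Changing the AMI forces Terraform to replace the instance
  ami           = var.ami_id
  instance_type = "t3.micro"

  lifecycle {
    # Bring up the replacement before destroying the old instance
    create_before_destroy = true
  }
}
```

With create_before_destroy, the old instance stays up until its replacement has been created, which shortens the gap during a replacement.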

Version Control Everything

All infrastructure code lives in Git:

infrastructure/
├── terraform/
│   ├── modules/
│   ├── environments/
│   └── global/
├── kubernetes/
│   ├── base/
│   └── overlays/
└── scripts/

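Benefits: every change has an author, a review, and a history, and a bad change can be reverted like any other commit.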

Repository Organization

Monorepo vs. Multi-Repo

Monorepo:

infrastructure/
├── modules/
│   ├── vpc/
│   ├── eks/
│   └── rds/
└── environments/
    ├── dev/
    ├── staging/
    └── production/

Pros: Easier refactoring, atomic changes across modules

Cons: Larger blast radius, complex CI/CD

Multi-Repo:

terraform-vpc/
terraform-eks/
terraform-rds/
terraform-env-dev/
terraform-env-prod/

Pros: Independent deployments, smaller blast radius

Cons: Harder to coordinate changes, version management

Recommendation: Start with monorepo, split when pain exceeds benefit.

State Organization

Split state to limit blast radius:

terraform/
├── global/           # Account-wide resources (IAM, Route53)
│   └── main.tf
├── network/          # VPC, subnets
│   └── main.tf
├── data/             # Databases, caches
│   └── main.tf
└── services/
    ├── api/          # Per-service state
    └── web/

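Each directory has its own state file, so a bad apply or a stuck lock in one stack cannot touch the others, and plans stay fast because each one covers fewer resources.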
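One way to wire this up is a distinct backend key per directory, with downstream stacks reading upstream outputs through the terraform_remote_state data source. The sketch below assumes an S3 backend with the bucket name used later in this post, and assumes the network stack declares an output named private_subnet_ids; both are illustrative.

```hcl
# services/api/main.tf
terraform {
  backend "s3" {
    bucket = "terraform-state"
    key    = "services/api/terraform.tfstate" # unique key per directory
    region = "us-east-1"
  }
}

# Read-only view of the network stack's outputs
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  # Assumes network/ declares: output "private_subnet_ids"
  private_subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

The service stack can read the network stack's outputs but cannot modify its resources, which is the point of the split.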

Module Structure

modules/
├── vpc/
│   ├── main.tf       # Resources
│   ├── variables.tf  # Input variables
│   ├── outputs.tf    # Output values
│   ├── versions.tf   # Provider versions
│   └── README.md     # Documentation

Standard structure makes modules predictable.
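As a rough illustration of what goes in each file, here is a stripped-down vpc module. The version constraints and the single cidr input are assumptions for the sketch, not requirements.

```hcl
# versions.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0"
    }
  }
}

# variables.tf
variable "cidr" {
  type        = string
  description = "CIDR block for the VPC"
}

# main.tf
resource "aws_vpc" "this" {
  cidr_block = var.cidr
}

# outputs.tf
output "vpc_id" {
  value       = aws_vpc.this.id
  description = "ID of the created VPC"
}
```

Callers only ever see variables.tf and outputs.tf, so those two files are the module's public interface.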

Module Design

Single Responsibility

Modules do one thing:

# Good - focused module
module "vpc" {
  source = "./modules/vpc"
  cidr   = "10.0.0.0/16"
  azs    = ["us-east-1a", "us-east-1b"]
}

module "eks" {
  source     = "./modules/eks"
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids
}

# Bad - kitchen sink module
module "infrastructure" {
  source = "./modules/everything"
  # Creates VPC, EKS, RDS, Redis, S3...
}

Sensible Defaults

Provide good defaults, allow override:

variable "instance_type" {
  type        = string
  default     = "t3.micro"
  description = "EC2 instance type"
}

variable "enable_monitoring" {
  type        = bool
  default     = true
  description = "Enable detailed monitoring"
}

Most uses need no customization. Power users can override.
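Defaults pair well with validation, so an override that the module cannot support fails at plan time instead of at apply time. A minimal sketch, assuming a hypothetical policy that this module only supports the t3 family:

```hcl
variable "instance_type" {
  type        = string
  default     = "t3.micro"
  description = "EC2 instance type"

  validation {
    # Hypothetical rule: accept only t3.* instance types
    condition     = can(regex("^t3\\.", var.instance_type))
    error_message = "Only t3 instance types are allowed by this module."
  }
}
```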

Composition Over Inheritance

Build complex infrastructure from simple modules:

# Root module composes simple modules
module "vpc" {
  source = "./modules/vpc"
}

module "security_groups" {
  source = "./modules/security-groups"
  vpc_id = module.vpc.vpc_id
}

module "rds" {
  source            = "./modules/rds"
  subnet_ids        = module.vpc.database_subnet_ids
  security_group_id = module.security_groups.rds_sg_id
}

Version Your Modules

# Pin module versions
module "vpc" {
  source = "git::https://github.com/org/terraform-modules.git//vpc?ref=v1.2.0"
}

# Or with registry (use one or the other; module names must be unique)
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.0.0"
}

Unpinned modules pick up upstream changes the next time they are downloaded, so a build can break without any change to your own code.
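If an exact pin is too strict, a pessimistic constraint is a common middle ground for registry modules. This sketch assumes the community VPC module on the public registry; the constraint itself is only an example.

```hcl
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  # "~> 3.0" accepts any 3.x release (>= 3.0, < 4.0)
  # "~> 3.0.0" would accept patch releases only (>= 3.0.0, < 3.1.0)
  version = "~> 3.0"
}
```

The version argument only applies to registry sources. Git sources are pinned through the ref in the URL.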

Environment Management

Directory Per Environment

environments/
├── dev/
│   ├── main.tf
│   └── terraform.tfvars
├── staging/
│   ├── main.tf
│   └── terraform.tfvars
└── production/
    ├── main.tf
    └── terraform.tfvars

Each environment has its own state.

Workspaces for Simple Cases

terraform workspace new staging
terraform workspace new production

terraform workspace select staging
terraform apply

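Workspaces work for simple cases, but every workspace shares the same configuration and backend, and it is easy to apply to the wrong one. Prefer a directory per environment once environments start to differ.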
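Inside the configuration, the selected workspace is available as terraform.workspace. A small sketch of per-workspace settings, where the map values are hypothetical:

```hcl
locals {
  # Hypothetical per-workspace instance counts
  instance_counts = {
    default    = 1
    staging    = 2
    production = 3
  }

  # Fall back to 1 for any workspace not listed above
  instance_count = lookup(local.instance_counts, terraform.workspace, 1)
}
```

Every difference between environments has to be routed through lookups like this one, which is part of why workspaces get unwieldy as environments diverge.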

Environment-Specific Configuration

# terraform.tfvars - environment-specific
environment    = "production"
instance_count = 3
instance_type  = "t3.large"

# main.tf - common infrastructure
module "app" {
  source         = "../../modules/app"  # relative to environments/<env>/
  environment    = var.environment
  instance_count = var.instance_count
  instance_type  = var.instance_type
}
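The variables referenced above also need declarations. One possible variables.tf, assuming the three environment names used in this post, leaves out defaults so each environment has to set every value explicitly:

```hcl
# variables.tf - same file in every environment directory
variable "environment" {
  type        = string
  description = "Environment name"

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "The environment must be one of: dev, staging, production."
  }
}

variable "instance_count" {
  type        = number
  description = "Number of application instances"
}

variable "instance_type" {
  type        = string
  description = "EC2 instance type"
}
```

Terraform prompts for, or fails on, any value missing from terraform.tfvars, so an environment cannot silently inherit another environment's sizing.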

Promoting Between Environments

# Dev → Staging → Production
# Same code, different variables
# Each step runs in a subshell so the working directory resets afterwards

(cd environments/dev && terraform apply)
# Test...

(cd environments/staging && terraform apply)
# Test...

(cd environments/production && terraform apply)

CI/CD for Infrastructure

Plan on PR

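# .github/workflows/terraform.yml
on:
  pull_request:
    paths:
      - 'terraform/**'

permissions:
  contents: read
  pull-requests: write  # Needed to post the plan comment

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color
        continue-on-error: true  # Let the comment step run even if the plan fails

      - name: Comment Plan
        uses: actions/github-script@v6
        env:
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            // Post plan output as PR comment
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: 'Terraform plan:\n\n' + process.env.PLAN
            });

      - name: Fail if the plan failed
        if: steps.plan.outcome == 'failure'
        run: exit 1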

Every PR shows infrastructure changes.

Apply on Merge

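on:
  push:
    branches: [main]

jobs:
  apply:
    runs-on: ubuntu-latest
    environment: production  # Approval gate, if the environment has required reviewers
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init

      - name: Terraform Apply
        run: terraform apply -auto-approve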

Drift Detection

Schedule regular plans to detect drift:

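on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_wrapper: false  # Expose Terraform's real exit code to the shell

      - name: Terraform Init
        run: terraform init

      - name: Check for drift
        run: |
          # run steps use `bash -e`, so capture the exit code without aborting
          status=0
          terraform plan -detailed-exitcode || status=$?
          if [ "$status" -eq 2 ]; then
            echo "Drift detected!"
            # Alert
          fi
          # Exit code 1 means the plan itself failed
          if [ "$status" -eq 1 ]; then exit 1; fi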

Security Practices

Secrets Management

Never commit secrets:

# Bad
resource "aws_db_instance" "main" {
  password = "hardcoded-password"  # Never!
}

# Good - use variables
resource "aws_db_instance" "main" {
  password = var.db_password
}

# Better - use secrets manager
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "db-password"
}

resource "aws_db_instance" "main" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
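If you do pass the password as a variable, marking it sensitive keeps it out of plan and apply output. A minimal sketch, assuming the value is supplied at run time through the TF_VAR_db_password environment variable and never through a committed tfvars file:

```hcl
variable "db_password" {
  type        = string
  sensitive   = true # Redacted in plan/apply output
  description = "Database master password, supplied at apply time"
}
```

Neither approach keeps the value out of the state file: sensitive only affects what is displayed, and a secret read through a data source is stored in state as well.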

State Security

State files contain sensitive data:

# Remote state with encryption
terraform {
  backend "s3" {
    bucket         = "terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
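The bucket and lock table behind that backend need to exist first, and are usually managed from a separate bootstrap configuration. A sketch of what that might look like, assuming AWS provider v4 or later and the names from the backend block above:

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "terraform-state"
}

# Keep old state versions so a bad write can be recovered
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt objects at rest by default
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State must never be publicly readable
resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table; the S3 backend expects a string hash key named LockID
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```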

Least Privilege

IaC execution should have minimal permissions:

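# CI/CD role with limited permissions
resource "aws_iam_role" "terraform" {
  name = "terraform-ci"

  # Required trust policy; var.ci_trust_policy_json is a placeholder for
  # whichever principal your CI system assumes the role as
  assume_role_policy = var.ci_trust_policy_json

  # Attach only the permissions needed for managed resources
}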
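As one example of scoping, the permissions the role needs just to read and write its own state can be sketched as below. It assumes the bucket, key, and lock table names from the backend block above and attaches to the aws_iam_role.terraform role. Permissions for the managed resources themselves would be added separately.

```hcl
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "state_access" {
  # List the state bucket
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::terraform-state"]
  }

  # Read and write this stack's state object only
  statement {
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::terraform-state/prod/terraform.tfstate"]
  }

  # Acquire and release the state lock
  statement {
    actions   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
    resources = ["arn:aws:dynamodb:us-east-1:${data.aws_caller_identity.current.account_id}:table/terraform-locks"]
  }
}

resource "aws_iam_role_policy" "state_access" {
  name   = "terraform-state-access"
  role   = aws_iam_role.terraform.id
  policy = data.aws_iam_policy_document.state_access.json
}
```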

Key Takeaways

Infrastructure as Code is essential but requires discipline. Invest in good practices early, because they are much harder to adopt later.