Why I Moved Our Infrastructure to Terraform

June 20, 2016

For years, our infrastructure existed in a state I’ll charitably call “organic.” Servers were provisioned through the AWS console. Configuration lived in a mix of shell scripts, wiki pages, and institutional memory. When someone asked how our production environment was configured, the honest answer was “check the console and hope the wiki is current.”

This worked—until it didn’t. A production incident required rebuilding infrastructure quickly, and we discovered our documentation was dangerously incomplete. What should have been a straightforward recovery became hours of archaeology, reconstructing configuration by examining running systems.

That incident convinced me: our infrastructure needed to be code. After evaluating options, we chose Terraform. Six months later, every piece of our infrastructure is defined in version-controlled configuration files. Here’s what we learned.

The Case for Infrastructure as Code

Infrastructure as code (IaC) treats infrastructure configuration the same way we treat application code: version-controlled, reviewed, tested, and automated.

The benefits are substantial:

Reproducibility. Infrastructure defined in code can be recreated exactly. Disaster recovery becomes running a command, not following a checklist. New environments mirror production precisely.

Visibility. Code review for infrastructure changes. Git history for audit trails. Diff commands for understanding what changed and when.

Consistency. The same configuration deploys everywhere. No manual deviations, no forgotten settings, no configuration drift.

Safety. Changes can be previewed before applying. Automated testing catches errors. Rollback means reverting to a previous commit and applying it.

Why Terraform

We evaluated several tools: CloudFormation, Ansible, Chef, and Terraform. Each has strengths, but Terraform aligned best with our needs.

Provider Agnostic

Terraform supports multiple cloud providers through its provider model. We’re primarily AWS, but we use some GCP services and manage DNS through Cloudflare. Terraform handles all three with consistent syntax.

CloudFormation is AWS-only. If we ever need multi-cloud (or even multi-service), Terraform is ready.
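
As a rough sketch of what "all three with consistent syntax" can look like, the providers sit side by side in one configuration. The regions and the variable names gcp_project and cloudflare_api_token are assumptions for illustration, not our actual setup, and the required_providers syntax assumes a reasonably recent Terraform release.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    google = {
      source = "hashicorp/google"
    }
    cloudflare = {
      source = "cloudflare/cloudflare"
    }
  }
}

# Hypothetical inputs; the names are illustrative only.
variable "gcp_project" {
  type = string
}

variable "cloudflare_api_token" {
  type      = string
  sensitive = true
}

provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = var.gcp_project
  region  = "us-central1"
}

provider "cloudflare" {
  api_token = var.cloudflare_api_token
}
```

Each provider has its own resource types, but variables, modules, and the plan/apply workflow are shared across all of them.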

Declarative Model

Terraform configuration describes desired state. You declare what should exist; Terraform figures out how to achieve it. This is easier to reason about than imperative scripts that describe a sequence of actions.

resource "aws_instance" "web" {
  ami           = "ami-abc123"
  instance_type = "t2.micro"

  tags = {
    Name = "web-server"
  }
}

This declares an EC2 instance should exist with these properties. Terraform handles creation, updates, and deletion.

State Management

Terraform maintains state: a record of what it has created and the mapping between configuration and real resources. This enables:

State management has sharp edges (more on this later), but the capability is essential for managing real infrastructure.

Plan Before Apply

Terraform’s plan command shows exactly what changes will occur:

$ terraform plan
+ aws_instance.web
    ami: "ami-abc123"
    instance_type: "t2.micro"
    ...

Plan: 1 to add, 0 to change, 0 to destroy.

This preview removes most surprises: you see what will be created, modified, or destroyed before it happens. For infrastructure changes that could cause downtime, that is invaluable.

HCL Language

Terraform’s HashiCorp Configuration Language (HCL) is designed for infrastructure definition. It’s more readable than JSON or YAML, supports variables and modules, and has just enough programming capability without becoming a full programming language.

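variable "environment" {
  description = "Environment name"
  default     = "production"
}

# Declared here so the references below resolve.
variable "ami_id" {
  description = "AMI to launch"
}

variable "instance_type" {
  description = "EC2 instance type"
  default     = "t2.micro"
}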

resource "aws_instance" "web" {
  count = var.environment == "production" ? 3 : 1

  ami           = var.ami_id
  instance_type = var.instance_type

  tags = {
    Name        = "web-${var.environment}-${count.index}"
    Environment = var.environment
  }
}

The Migration

Migrating existing infrastructure to Terraform requires care. You can’t just delete everything and recreate it.

Import Existing Resources

Terraform’s import command associates existing resources with Terraform configuration:

$ terraform import aws_instance.web i-1234567890abcdef0

This tells Terraform that the EC2 instance i-1234567890abcdef0 corresponds to the aws_instance.web configuration. Terraform then manages it going forward.

Importing is tedious—each resource requires a separate command—but it’s safer than recreating infrastructure.
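
One detail worth spelling out: the terraform import command only records the resource in state. It does not write configuration for you. A minimal sketch of the block that would need to exist before running the command above follows; the attribute values are placeholders, and the prevent_destroy guard is an addition of this sketch rather than something the import requires.

```hcl
# Must exist in configuration before running
# `terraform import aws_instance.web i-1234567890abcdef0`.
# The attribute values are placeholders: copy the real ones from the
# running instance so the next `terraform plan` shows no changes.
resource "aws_instance" "web" {
  ami           = "ami-abc123"
  instance_type = "t2.micro"

  tags = {
    Name = "web-server"
  }

  lifecycle {
    # Optional guard: reject any plan that would destroy the
    # imported instance.
    prevent_destroy = true
  }
}
```

A plan that reports no changes right after the import is a reasonable sign that the hand-written configuration matches the real resource.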

Start with New Resources

We took a hybrid approach. New infrastructure was Terraform from the start. Existing infrastructure was imported gradually, prioritizing:

  1. Resources we modified frequently
  2. Resources that were complex or poorly documented
  3. Resources critical to production stability

Low-change, well-understood resources were imported last.

Module Everything

Terraform modules encapsulate related resources:

module "vpc" {
  source = "./modules/vpc"

  cidr_block  = "10.0.0.0/16"
  environment = var.environment
}

module "web_cluster" {
  source = "./modules/web-cluster"

  vpc_id         = module.vpc.vpc_id
  subnet_ids     = module.vpc.private_subnet_ids
  instance_count = 3
}

Modules provide reusability and abstraction. Our VPC module creates a consistent network topology across environments. Our web cluster module creates auto-scaling groups with consistent configuration.

Build modules for your common patterns. The initial investment pays off in consistency and reduced duplication.
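
For the calls above to work, the module has to declare the inputs it receives and the outputs other modules read. The following is a hypothetical sketch of what ./modules/vpc could contain, not our real module: the two-subnet layout, the resource names, and the use of cidrsubnet to carve out /24 ranges are all assumptions.

```hcl
# Inputs passed in by the caller.
variable "cidr_block" {
  type = string
}

variable "environment" {
  type = string
}

resource "aws_vpc" "this" {
  cidr_block = var.cidr_block

  tags = {
    Name        = "vpc-${var.environment}"
    Environment = var.environment
  }
}

# Two private subnets, each a /24 carved from a /16 VPC range.
resource "aws_subnet" "private" {
  count = 2

  vpc_id     = aws_vpc.this.id
  cidr_block = cidrsubnet(var.cidr_block, 8, count.index)

  tags = {
    Name        = "private-${var.environment}-${count.index}"
    Environment = var.environment
  }
}

# Outputs consumed as module.vpc.vpc_id and module.vpc.private_subnet_ids.
output "vpc_id" {
  value = aws_vpc.this.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}
```

The variables and outputs are the module's interface; everything else can change without touching the callers.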

State Management Realities

Terraform state is powerful and dangerous. Understanding state is essential for Terraform success.

Remote State

By default, Terraform stores state in a local file. This is fine for learning but terrible for teams. Local state can’t be shared, can’t be locked across machines, and gets lost with your laptop.

Use remote state backends. S3 with DynamoDB locking is the common AWS choice:

terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Remote state enables team collaboration and provides durability.

State Locking

Concurrent Terraform runs can corrupt state. State locking ensures only one operation runs at a time. DynamoDB provides locking for S3 backends; other backends have their own locking mechanisms.

Always enable locking in team environments.
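
The bucket and lock table named in the backend block have to exist before Terraform can use them. A minimal sketch of how they might be defined, typically in a small separate bootstrap configuration, is shown here. It assumes AWS provider version 4 or later (for the standalone versioning resource), and enabling versioning is a suggestion of this sketch rather than a backend requirement.

```hcl
# Bucket that holds the state file; the name matches the backend block.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"
}

# Versioning keeps earlier state revisions recoverable.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table: the S3 backend expects a string hash key named "LockID".
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```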

State Secrets

Terraform state contains sensitive information: database passwords, API keys, and other secrets that appear in configuration. State files should be encrypted at rest and access-controlled strictly.

Never commit state files to version control. Never share state files casually. Treat state with the same sensitivity as production credentials.
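
As one possible way to apply this to an S3 state bucket, the sketch below turns on default encryption and blocks public access. It assumes AWS provider version 4 or later and reuses the bucket name from the backend example; the db_password variable is hypothetical and only illustrates where secrets enter a configuration.

```hcl
# Encrypt state objects at rest by default.
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = "mycompany-terraform-state"

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block every form of public access to the state bucket.
resource "aws_s3_bucket_public_access_block" "state" {
  bucket = "mycompany-terraform-state"

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Hypothetical secret input. `sensitive` hides the value in plan output,
# but any resource attribute that uses it is still stored unredacted
# in the state file.
variable "db_password" {
  type      = string
  sensitive = true
}
```

Marking a variable sensitive protects terminal output and logs, not the state itself, which is why the bucket-level controls matter.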

State Manipulation

Sometimes you need to modify state directly: renaming resources, moving resources between modules, or removing resources Terraform shouldn’t manage anymore.

The terraform state commands provide this capability:

$ terraform state mv aws_instance.old aws_instance.new
$ terraform state rm aws_instance.manual

State manipulation is risky. Always back up state before manipulation. Prefer configuration refactoring over state manipulation when possible.
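
Newer Terraform releases let you express the same two operations in configuration, where they go through plan and code review like any other change. This is a sketch assuming Terraform 1.1 or later for moved blocks and 1.7 or later for removed blocks; the resource attributes are placeholders.

```hcl
# The resource under its new name.
resource "aws_instance" "new" {
  ami           = "ami-abc123"
  instance_type = "t2.micro"
}

# Equivalent of `terraform state mv aws_instance.old aws_instance.new`,
# recorded in configuration.
moved {
  from = aws_instance.old
  to   = aws_instance.new
}

# Equivalent of `terraform state rm aws_instance.manual`: stop managing
# the resource without destroying it. The resource block for
# aws_instance.manual must be deleted from configuration as well.
removed {
  from = aws_instance.manual

  lifecycle {
    destroy = false
  }
}
```

On older versions without these blocks, the state commands above remain the only option.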

Lessons Learned

Start with Conventions

Establish conventions before writing much Terraform:

Conventions are easier to establish early than to retrofit later.

One Environment at a Time

Don’t try to Terraform everything simultaneously. Start with one environment (staging is less risky than production), learn the patterns, then expand.

Our path: development → staging → production. By the time we reached production, we’d made our mistakes in lower environments.
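
One way to keep the same configuration usable across that path is to make the environment an explicit, validated input and derive per-environment settings from it. This is a sketch, not our actual code: the sizing map is hypothetical, and validation blocks assume Terraform 0.13 or later.

```hcl
variable "environment" {
  type        = string
  description = "Environment name"

  validation {
    condition     = contains(["development", "staging", "production"], var.environment)
    error_message = "The environment must be development, staging, or production."
  }
}

locals {
  # Hypothetical per-environment sizing.
  instance_count = {
    development = 1
    staging     = 1
    production  = 3
  }
}

output "web_instance_count" {
  value = local.instance_count[var.environment]
}
```

A typo in the environment name then fails at plan time instead of quietly creating a fourth environment.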

Embrace Modules Early

Resist the temptation to define everything in one big file. Create modules for:

Modules add overhead but pay back in maintainability and reusability.

Plan Review Is Essential

Every Terraform change should be reviewed:

  1. Author runs terraform plan
  2. Plan output is included in code review
  3. Reviewers verify plan matches expectations
  4. Only after approval does terraform apply run

Plan review catches mistakes before they affect infrastructure. We’ve caught numerous issues—wrong regions, missing dependencies, unintended deletions—through plan review.

Automate Application

Manual terraform apply from laptops doesn’t scale. Implement CI/CD for Terraform:

Automation ensures consistency and provides audit trails.

Where We Are Now

Six months in, all our infrastructure is Terraform-managed. We can:

The investment was significant—weeks of migration effort and ongoing learning. But the payoff is substantial: infrastructure that’s visible, reproducible, and safe to change.

If you’re still managing infrastructure through consoles and scripts, the transition is worth it. Your future self, recovering from a disaster at 2 AM, will thank you.

Key Takeaways