Terraform Without Tears: Our Practical Team Playbook

How we keep Terraform boring, predictable, and reviewable.

Why We Treat Terraform Like Shared Infrastructure, Not Magic

We’ve all seen it: one heroic engineer writes a handful of Terraform files, applies on a Friday afternoon, and suddenly the entire team is living in a suspense thriller. The fix isn’t more heroics—it’s treating Terraform like what it really is: shared, long-lived infrastructure code that needs the same care as application code.

Our baseline mindset is simple: Terraform should be predictable. That means changes are small, plans are reviewed, state is protected, and the blast radius is obvious before we type apply. We also aim for boring repeatability: if two people run the same workflow, they should get the same result. When Terraform is “surprising,” it’s almost always because of weak structure (everything in one place), weak process (no plan review), or weak guardrails (state stored who-knows-where).

We also try to keep the learning curve kind. Terraform has enough sharp edges without us adding home-grown weirdness. So we standardise: consistent module patterns, consistent naming, consistent backend config, and a consistent pipeline. That consistency is what lets new folks ship infra changes without needing to decode tribal lore.

Finally, we accept that Terraform is not a general-purpose orchestration tool. When we force it to do runbooks, data migrations, or app deploys, it gets cranky—and so do we. We let Terraform handle provisioning and configuration that’s truly infrastructure, and we hand off the rest to tools better suited for it.

If you want official grounding, HashiCorp’s own docs are worth bookmarking: Terraform Documentation. We’ll build on those basics with the patterns that keep teams sane.

Repository Layout That Doesn’t Turn Into a Junk Drawer

A Terraform repo can either feel like a tidy workshop or like the kitchen drawer where batteries, chopsticks, and mystery keys go to retire. Our goal is clarity: where do modules live, where do environment stacks live, and where do shared bits go?

A layout we’ve had consistent success with is:

  • modules/ for reusable components (VPC, S3 bucket, IAM role sets, etc.)
  • envs/ (or stacks/) for live compositions per environment/region
  • globals/ only if you truly have account-wide resources (and even then, keep it minimal)
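Concretely, that layout looks roughly like this (directory and file names are illustrative, not a requirement):

```text
.
├── modules/
│   ├── vpc/
│   ├── s3_bucket/
│   └── iam_roles/
├── envs/
│   ├── dev/
│   │   └── eu-west-1/
│   │       ├── main.tf
│   │       ├── backend.tf
│   │       └── terraform.tfvars
│   └── prod/
│       └── eu-west-1/
│           ├── main.tf
│           ├── backend.tf
│           └── terraform.tfvars
└── globals/
    └── account/
```

Each leaf directory under envs/ is one stack with its own state, so plan and apply always run against exactly one environment.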

Why separate modules from envs? Because modules are “products” with inputs/outputs and versions, while env stacks are “deployments” with provider config, backend config, and wiring. Mixing them creates a mess where every change feels risky because it’s hard to tell what’s reusable versus what’s one-off.

We also keep one environment per directory to avoid accidental cross-environment changes. That makes terraform plan and state boundaries match real-world boundaries. If prod is a separate directory with a separate state, “oops” moments become rarer.

Naming matters too. We don’t get poetic with resource names. We encode purpose and scope (service, env, region) and we make it easy to search. It’s not glamorous, but it saves hours.
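One low-tech way to enforce that is to compute names once in a locals block and reuse them everywhere (the variable names here are our convention, not anything Terraform requires):

```hcl
locals {
  # e.g. "payments-prod-euw1" — purpose, env, and region in every name
  name_prefix = "${var.service_name}-${var.environment}-${var.region_short}"
}

resource "aws_s3_bucket" "audit_logs" {
  bucket = "${local.name_prefix}-audit-logs"
}
```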

One more rule: avoid “utility” modules that do 30 things. If a module has too many toggles and count tricks, it’s probably hiding multiple responsibilities. Smaller modules are easier to test, version, and reason about. For module guidance, we often refer folks to HashiCorp’s write-up on Module Development.

Remote State, Locking, and the “Please Don’t Brick Prod” Kit

Local state is fine for learning and terrible for teams. The moment more than one person touches Terraform, remote state and locking stop being optional. Our rule: no remote backend, no merge. It’s that simple.

Remote state gives us three things:
1. A single source of truth for what’s deployed
2. Locking to prevent concurrent applies
3. A place to store state securely with versioning

On AWS, we commonly use S3 + DynamoDB for locking (or Terraform Cloud if we want managed workflows). Here’s a minimal, sane backend example:

terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "network/prod/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}

A couple of non-negotiables we stick to:
– Turn on bucket versioning and block public access.
– Restrict who can read state. State can contain secrets (or secret-adjacent data).
– Keep keys unique per stack. If two stacks write to the same key, the universe will punish you.
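If you manage the state bucket itself with Terraform (usually from a small, separate bootstrap stack), the versioning, public-access, and locking pieces look roughly like this; the bucket and table names are placeholders matching the backend example above:

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "company-terraform-state"
}

# Non-negotiable: versioning on, so bad applies can be rolled back.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Non-negotiable: no public access to state, ever.
resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table for the S3 backend; the hash key must be "LockID".
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```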

We also make state moves intentional. When refactoring, we plan moved blocks or terraform state mv carefully, and we do it in small steps. Terraform’s own notes on state are worth a read: State and Backends.

Locking is your seatbelt. It’s mildly annoying when you’re in a hurry, and priceless when two people are “just applying a tiny change.” We prefer mild annoyance.

Modules We Can Reuse Without Summoning Future Regret

Reusable Terraform modules are great until they become a dumping ground for every edge case anyone ever had. Our module philosophy is: opinionated defaults, minimal knobs, and crisp outputs.

A module should answer three questions cleanly:
– What does it create?
– What inputs does it require?
– What does it output for others to use?

Here’s a small module skeleton that shows how we keep things structured:

# modules/s3_bucket/main.tf
resource "aws_s3_bucket" "this" {
  bucket = var.name
}

resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id
  versioning_configuration {
    status = var.versioning ? "Enabled" : "Suspended"
  }
}

output "bucket_name" {
  value = aws_s3_bucket.this.bucket
}

# modules/s3_bucket/variables.tf
variable "name" {
  type        = string
  description = "Bucket name"
}

variable "versioning" {
  type        = bool
  description = "Enable versioning"
  default     = true
}

The key is resisting the urge to add 25 optional flags “just in case.” If a new requirement is truly common, we add it. If it’s niche, we either fork a specialised module or handle it at the stack level.

Versioning modules helps too. Even if you’re not publishing to the Terraform Registry, using git tags and pinning module refs avoids surprise changes. If you are publishing internally or publicly, follow the Terraform Registry module guidelines.
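At the stack level, consuming a pinned module looks like this (the repository URL and tag are illustrative):

```hcl
module "audit_logs" {
  # Pin to a git tag so the module only changes when we bump the ref.
  source = "git::https://github.com/company/terraform-modules.git//s3_bucket?ref=v1.2.0"

  name       = "company-prod-audit-logs"
  versioning = true
}
```

The //s3_bucket part selects a subdirectory within the repo, and ?ref= pins the tag—both standard Terraform module source syntax.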

Finally, we don’t let modules hide security choices. Encryption, logging, least-privilege IAM—these defaults belong in modules, not as optional afterthoughts.

CI That Plans First and Makes Applies Boring

The best Terraform workflow is one where apply is the least dramatic part of your week. We get there by putting “plan and review” at the centre of the pipeline.

Our baseline pipeline steps per stack:
1. terraform fmt -check
2. terraform init (with the right backend)
3. terraform validate
4. terraform plan and store the plan output
5. Require approval for production
6. terraform apply using the saved plan
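As a sketch, the same steps in a generic GitLab-CI-style config (job names, stages, and the manual gate are our conventions, not anything standard):

```yaml
stages: [validate, plan, apply]

validate:
  stage: validate
  script:
    - terraform fmt -check -recursive
    - terraform init -input=false
    - terraform validate

plan:
  stage: plan
  script:
    - terraform plan -input=false -out=tfplan
  artifacts:
    paths: [tfplan]

apply:
  stage: apply
  script:
    # Apply exactly the reviewed plan file, not a fresh plan
    - terraform apply -input=false tfplan
  when: manual        # human approval gate for production
  dependencies: [plan]
```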

Saving the plan is a small detail with big payoff: you apply exactly what was reviewed, not whatever happens to exist at apply time.

We also like policy checks before merging. Sometimes that’s as simple as “no public S3 buckets” or “no security groups with 0.0.0.0/0 on SSH.” Tools like Checkov can help catch common misconfigurations early, without us having to invent a whole internal compliance bureaucracy.

We also isolate credentials per environment. CI assumes a role with scoped permissions for the specific stack. If the pipeline for dev can mutate prod, we’ve basically built a trap door.

And yes, we make it easy to run the same checks locally. If developers can’t reproduce CI results, they’ll either ignore CI or play “guess the linter rules.” Neither is fun.

The win here is psychological as much as technical: when everyone trusts the plan, Terraform becomes routine.

Guardrails: Variables, Naming, and “No Pets” Infrastructure

Terraform gives you freedom. Teams need guardrails. Not the “you must file a ticket to breathe” kind—more like the “bowling bumpers” kind that keep us out of the gutter.

We standardise inputs with a small set of required variables across stacks: environment, region, service_name, sometimes owner or cost_center. These feed into naming so resources are traceable and consistent.
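A variable block with validation keeps those required inputs honest; the allowed values here are our convention, not a fixed list:

```hcl
variable "environment" {
  type        = string
  description = "Deployment environment"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```

With this in place, a typo like environment = "pord" fails at plan time instead of producing oddly named resources.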

We also avoid hardcoding values inside modules unless they’re truly universal defaults. Instead, we keep environment-specific values in env directories and supply them via .tfvars or pipeline variables. That gives us a clean separation: modules define what, stacks define where and for whom.

A big guardrail is resisting “pet” infrastructure. If a server needs manual tweaks to survive, it’s a pet. Pets don’t scale and they don’t recover well. We try to encode everything in Terraform (or in configuration management) so rebuilds are routine.

Another guardrail: limit -target. It can be useful in emergencies, but it’s also a great way to create partial state and weird dependencies. We treat it like a fire extinguisher: break glass, document what happened, and follow up with a proper plan.

We also keep provider versions pinned to avoid “it worked yesterday” mysteries. Provider releases can change behaviour, and we’d rather upgrade intentionally than by accident. The Terraform provider version constraints doc is short and worth it.
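Pinning lives in the terraform block; the exact constraints below are examples, not recommendations for your setup:

```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      # "~> 5.0" allows 5.x patch/minor updates but never 6.0.
      version = "~> 5.0"
    }
  }
}
```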

Refactors, Imports, and Deleting Things Without Panic

Real Terraform life isn’t greenfield. It’s refactors, imports, and that awkward moment when you realise a resource was created by hand in 2019 and nobody remembers why. Our approach is to be methodical and keep changes small.

For refactors, we love Terraform’s moved blocks (when available and appropriate) because they encode state moves in code, making them reviewable. For older setups, we’ll use terraform state mv, but we document it and usually pair on it. State operations are not a solo sport.
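A moved block that records a rename in code (resource names are illustrative):

```hcl
# Renamed aws_s3_bucket.logs to aws_s3_bucket.audit_logs. This tells
# Terraform to move the existing state entry instead of destroying the
# old resource and creating a new one.
moved {
  from = aws_s3_bucket.logs
  to   = aws_s3_bucket.audit_logs
}
```

Because the move ships in the same PR as the rename, the reviewer sees both, and the plan shows a move rather than a destroy/create pair.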

For importing existing resources, we avoid “import everything into one massive PR.” Instead:
– import one logical component,
– run a plan,
– reconcile drift,
– then move to the next.
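On Terraform 1.5+, each of those imports can itself be reviewable code via an import block (names and IDs here are placeholders):

```hcl
# Adopt a hand-created bucket into state on the next apply.
import {
  to = aws_s3_bucket.legacy_assets
  id = "company-legacy-assets"
}

resource "aws_s3_bucket" "legacy_assets" {
  bucket = "company-legacy-assets"
}
```

Running terraform plan then shows the pending import alongside any drift, which fits the one-component-at-a-time loop above.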

We also accept that not everything belongs in Terraform. If a managed service is better controlled elsewhere, we don’t force it. But if we do manage it with Terraform, we commit to managing it consistently, including deletion policies and lifecycle rules.

Deletion is where people get nervous, so we make it explicit. We’ll use prevent_destroy sparingly for truly critical resources, but we don’t blanket-apply it everywhere because it can block legitimate change. A better habit is reviewing the plan carefully and making sure the pipeline requires approval for destructive operations in production.
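Where we do use prevent_destroy, it is a one-line lifecycle rule (the state bucket is a typical candidate):

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "company-terraform-state"

  lifecycle {
    # Any plan that would destroy this resource errors out instead.
    prevent_destroy = true
  }
}
```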

And when we do need to delete, we do it in daylight hours. Yes, that’s a technical guideline. We’re a DevOps blog; we’re allowed.
