Terraform Without Tears: Practical Patterns We Actually Use
Keep infra readable, reviewable, and a little less terrifying.
Start With A Thin, Boring Terraform Layer
If we had to pick one guiding principle for terraform, it’d be this: keep the root module thin and boring. The root is where we wire things together (providers, remote state, a handful of module calls), not where we build a mini framework. Every time we let “just one more resource” sneak into the root, it grows barnacles—conditional logic, hand-rolled naming, copy-pasted tags, and eventually that one weird resource nobody dares touch on a Friday.
We like to split responsibilities early: a root module per environment (or per account/subscription) and reusable modules for building blocks. The root module should read like a table of contents: networking, identity, compute, data stores—each a module call. That keeps reviews easy: “Oh, we’re changing the VPC module version and adding a subnet.” Not: “We’re adjusting 40 lines of spaghetti and hoping it converges.”
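To make "table of contents" concrete, here's a sketch of what a root module in this spirit might look like. Module sources, versions, and output names below are illustrative, not real registry paths:

```hcl
# Root module as a table of contents: wiring, not logic.
# Sources/versions are hypothetical placeholders.

module "network" {
  source  = "app.terraform.io/acme/network/aws"
  version = "2.3.1"

  env  = "prod"
  cidr = "10.20.0.0/16"
}

module "identity" {
  source  = "app.terraform.io/acme/identity/aws"
  version = "1.4.0"

  env = "prod"
}

module "compute" {
  source  = "app.terraform.io/acme/compute/aws"
  version = "5.0.2"

  env        = "prod"
  subnet_ids = module.network.private_subnet_ids  # assumed output of the network module
}
```

Each block is one line in the table of contents; the interesting logic lives inside the modules, where it can be reviewed and versioned on its own.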
We also keep module inputs boring. Simple types, few required values, sensible defaults. The more “smart” a module tries to be, the more it surprises you later. If a module needs ten booleans to control behaviour, it’s probably trying to be two or three modules wearing a trench coat.
A good check: if a new teammate can’t find where a thing is declared in under two minutes, we’ve overcomplicated it. Terraform is excellent at many things, but “mystery novel” isn’t one of them.
Useful references we keep bookmarked: Terraform Language docs and Module composition guidance.
Remote State: Make It Safe, Then Forget It Exists
Remote state is one of those terraform topics where everyone nods wisely, then someone runs terraform apply from a laptop on café Wi‑Fi and we all learn something new. Our goal is simple: state should be stored centrally, locked during changes, encrypted at rest, and access-controlled. Once that’s true, we want to stop thinking about it.
We usually standardise on a backend per org/platform. On AWS that’s commonly S3 + DynamoDB locking; on Azure it’s an Azure Storage Account with blob leases; on GCP it’s a GCS bucket (which handles state locking natively). The implementation details differ, but the outcome should be consistent: one state per stack, predictable naming, and no ad-hoc local state files drifting around.
Here’s a straightforward AWS example we’ve used (trim to taste):
```hcl
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "prod/network/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "acme-terraform-locks"
    encrypt        = true
  }

  required_version = ">= 1.5.0"
}
```
We’ll also say the quiet part out loud: state contains secrets more often than we’d like. Even when we use secret stores, resource attributes can still leak sensitive values into state. So we treat state like production data: least-privilege access, auditing, and lifecycle policies.
If you’re still on the fence, HashiCorp’s own docs on Backends and State are worth a read. Also: don’t put the backend config behind lots of indirection. We want it obvious and consistent, not clever.
Modules That Don’t Hate Their Future Maintainers
Our favourite terraform modules have three qualities: clear intent, stable interfaces, and minimal side effects. They’re also unapologetically boring. That’s a compliment.
We start module design by writing the README first: what does this module create, what does it not create, and what are the non-goals? Non-goals matter because modules tend to “grow” until they become unreviewable. If the README says “this module only creates a VPC and subnets,” then someone adding an EKS cluster to it is clearly committing a crime.
Next, we standardise inputs and outputs. Inputs should be typed, documented, and validated. Outputs should be the small set of values other stacks genuinely need. If everything is output “just in case,” you’ll end up with tight coupling and downstream breakage.
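A typed, documented, validated input might look like the following sketch (the variable name and rule are illustrative):

```hcl
variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC, e.g. 10.20.0.0/16"

  validation {
    # can() turns a failed cidrhost() evaluation into false instead of an error,
    # so invalid CIDRs are rejected at plan time with a readable message.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid IPv4 CIDR block."
  }
}
```

Validation like this fails fast at plan time, which is much cheaper than a half-applied stack failing at the provider level.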
A quick pattern we use often is consistent naming/tagging via locals, while keeping the rest explicit:
```hcl
variable "env" {
  type        = string
  description = "Environment name, e.g. dev/stage/prod"
}

variable "app" {
  type        = string
  description = "Application or service identifier"
}

locals {
  name_prefix = "${var.app}-${var.env}"

  tags = {
    app = var.app
    env = var.env
  }
}

output "name_prefix" {
  value = local.name_prefix
}
```
We also version modules like real software. Pin them. Change them deliberately. If you source from Git, use tags/SHAs, not “main”. And don’t be afraid to publish an internal module registry if your org is big enough—HashiCorp has guidance on Module registries.
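What pinning looks like in practice (module names and versions here are illustrative):

```hcl
# Registry module: pin an exact version, bump it deliberately via PR.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.8.1"
  # ... inputs omitted
}

# Git-sourced module: pin a tag or commit SHA via ?ref=, never a branch.
module "billing" {
  source = "git::https://github.com/acme/terraform-billing.git?ref=v1.2.0"  # hypothetical repo
  # ... inputs omitted
}
```

Pinning `main` means every plan is a lottery ticket; pinning a tag means upgrades show up as a one-line diff someone actually reviews.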
Most importantly: modules should be easy to delete. The day we can’t remove a module without archaeology is the day we know we’ve built a dependency trap.
Plan Reviews: Treat Terraform Like Code, Not Magic
Terraform shines when we treat it like code: peer-reviewed, tested, and merged through a pipeline. It gets… exciting… when we treat it like a wizard’s spell book. We’ve learned (often the hard way) that “I ran apply and it seemed fine” is not a change management process.
Our baseline workflow: developers open a PR, the CI pipeline runs terraform fmt, terraform validate, and a terraform plan against the target workspace/environment. The plan output is posted back to the PR for humans to review. Only after approval do we allow an apply—ideally automated, and ideally only from CI.
We also try to make plans readable. That means stable addresses (don’t unnecessarily rename resources), minimal use of count when for_each provides clearer identity, and not forcing replacements unless it’s truly required. When a plan shows “forces replacement” on a production database, we want it to be a very intentional moment, not a surprise.
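The count-vs-for_each point is about identity. With count, removing an item from the middle of a list shifts every later index, and the plan proposes destroying and recreating resources that didn’t actually change. With for_each, each resource keeps a stable key. A minimal sketch (bucket names are illustrative):

```hcl
variable "buckets" {
  type    = set(string)
  default = ["logs", "assets", "backups"]
}

resource "aws_s3_bucket" "this" {
  for_each = var.buckets

  # Stable address: aws_s3_bucket.this["assets"].
  # Removing "logs" from the set touches only that one resource.
  bucket = "acme-${each.key}"
}
```

The same resources under `count` would be addressed by position (`aws_s3_bucket.this[0]`, `[1]`, …), which is exactly the instability that makes plans hard to read.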
One helpful habit: run terraform plan with -out in CI and store the plan file as an artifact, then apply that exact plan after approval. It reduces “plan drift” where the world changes between plan and apply. Yes, reality still moves, but we’ve cut down on the “wait, why is it different now?” conversations.
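As a pipeline sketch, the plan/apply split looks roughly like this (stage wiring and artifact handling depend on your CI system):

```sh
# Plan stage: produce one plan file and a readable rendering for the PR.
terraform fmt -check -recursive
terraform validate
terraform plan -input=false -out=tfplan.bin
terraform show -no-color tfplan.bin > tfplan.txt   # post this to the PR

# Apply stage (after approval): apply that exact saved plan, not a fresh one.
terraform apply -input=false tfplan.bin
```

If the world has drifted since the plan was saved, the apply of the stale plan fails rather than silently doing something nobody reviewed, which is the behaviour we want.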
If you need a good checklist for review discipline, Terraform CLI workflow docs are solid. And for PR integration, there’s no shortage of options—from simple scripts to full systems like Atlantis. We’re not religious about tools; we’re religious about making changes visible.
Drift, Imports, And The Awkward Teen Years Of Infra
At some point, every team inherits cloud resources that terraform didn’t create. Or terraform created them once, and then someone changed them in the console “just for a second” three months ago. Welcome to the awkward teen years of infra: it looks grown up, but it’s making questionable choices.
We deal with this in two tracks: drift detection and adoption. Drift detection is simply running regular plans (read-only) and alerting when the plan shows changes that weren’t merged. Some teams run nightly plans per stack. Others do it on every merge. The important part is noticing drift before it becomes the new normal.
Adoption is trickier. Importing existing resources into terraform state can be painstaking, but it’s usually worth it. Today, terraform import plus careful configuration is still common, but newer terraform versions also support configuration-driven import blocks (which can make the process less error-prone, depending on provider maturity). Either way, we recommend small bites: import one resource (or one small set), plan, verify, commit, repeat. Don’t try to import the universe before lunch.
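An import block (Terraform 1.5+) keeps the adoption reviewable, because the mapping lives in the PR and the plan shows exactly what would be adopted before anything changes. A sketch, with a hypothetical bucket name:

```hcl
# Adopt an existing bucket into state on the next apply.
import {
  to = aws_s3_bucket.logs
  id = "acme-prod-logs"  # hypothetical existing bucket
}

resource "aws_s3_bucket" "logs" {
  bucket = "acme-prod-logs"
}
```

Run `terraform plan` first: it reports the pending import, and any attribute mismatches show up as ordinary diffs you can fix before committing.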
Also: don’t blindly “fix” drift by applying if you don’t understand the source. Sometimes drift is intentional (someone patched a security rule during an incident). Terraform shouldn’t become the tool that automatically undoes incident response.
When we must tolerate manual changes, we document it and use lifecycle carefully. Which brings us to the next point: lifecycle rules are powerful, and power tools require all your fingers to remain attached.
Lifecycle, Dependencies, And Other Sharp Edges
Terraform’s lifecycle meta-arguments can save the day—or quietly create future chaos. We use them sparingly, and we write down why they exist. If we see ignore_changes without a comment, we assume it’s hiding a problem.
A classic example is ignoring changes to a field that’s managed by an autoscaler or an external system. That can be legitimate, but we want it localised and obvious:
```hcl
resource "aws_autoscaling_group" "app" {
  name             = "${local.name_prefix}-asg"
  min_size         = 1
  max_size         = 6
  desired_capacity = 3
  # (subnets, launch template, etc. omitted for brevity)

  lifecycle {
    ignore_changes = [desired_capacity]
  }
}
```
This says: terraform defines the group, but an autoscaler owns desired capacity. Without this, every apply becomes a tug-of-war.
Dependencies are another sharp edge. Terraform usually infers them from references, which is great—until you’re dealing with side effects, eventual consistency, or resources that don’t reference each other directly. That’s when depends_on becomes useful, but again, we use it as a last resort. If we’re leaning on depends_on all over the place, the module boundaries might be wrong or the provider behaviour might need workarounds.
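When we do reach for depends_on, we keep it next to a comment explaining the invisible dependency. A sketch of the classic case, an instance whose boot scripts need an IAM policy that nothing in its configuration references (inputs marked as hypothetical):

```hcl
resource "aws_iam_role_policy" "app" {
  name   = "app-s3-access"
  role   = aws_iam_role.app.id   # role defined elsewhere in the module
  policy = var.app_policy_json   # hypothetical input
}

resource "aws_instance" "app" {
  ami           = var.ami_id     # hypothetical input
  instance_type = "t3.micro"

  # Boot scripts on this instance call S3, but nothing above references the
  # policy, so Terraform can't infer the ordering. Make it explicit:
  depends_on = [aws_iam_role_policy.app]
}
```

The comment matters as much as the meta-argument: a bare depends_on six months later is just a mystery with extra steps.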
We also avoid overusing create_before_destroy unless we truly need it. It can prevent downtime, but it can also create surprise costs or hit quotas. It’s a tool, not a default.
The TL;DR: terraform is deterministic, but cloud APIs aren’t always. We try to encode intent clearly, add guardrails, and leave breadcrumbs for whoever debugs it later (often us, with coffee).
Secrets And Sensitive Data: Don’t Let State Snitch
We can’t talk about terraform in production without talking about secrets. Terraform is not a secret manager. It can integrate with secret managers, but it will still happily store values in state if resource attributes contain them. So our approach is: minimise secret material flowing through terraform, and assume state must be protected like it contains secrets (because it might).
Practical things we do:
- Prefer referencing secrets from systems like AWS Secrets Manager, SSM Parameter Store, Azure Key Vault, or Vault, rather than passing raw secret values in variables.
- Mark variables as sensitive = true to reduce accidental exposure in CLI output (though it doesn’t magically remove data from state).
- Avoid outputs that leak credentials. If an output is sensitive, mark it sensitive—or better, don’t output it.
- Use IAM/service principals with least privilege for terraform runs. The pipeline doesn’t need god-mode.
We also keep tfvars handling strict. No committing secrets to git, no emailing tfvars around, no “temporary” copies living on desktops. If we need per-environment values, we store them in a secure system and inject them at runtime.
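What the "reference, don’t pass" pattern looks like on AWS, as a sketch (parameter and resource names are illustrative, and note the caveat in the comments):

```hcl
# Read the secret at plan time instead of passing it in as a variable.
data "aws_ssm_parameter" "db_password" {
  name            = "/prod/app/db_password"  # hypothetical parameter path
  with_decryption = true
}

resource "aws_db_instance" "app" {
  # ... other arguments omitted
  password = data.aws_ssm_parameter.db_password.value
}

# For values that must arrive as variables, mark them sensitive.
variable "api_token" {
  type      = string
  sensitive = true  # redacts plan/CLI output; the value still lands in state
}
```

Note the trade-off: the data source keeps the secret out of tfvars and git, but its value still ends up in state, which is exactly why we treat state as sensitive regardless.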
A helpful reminder from the official docs: Sensitive data in Terraform. It’s not glamorous reading, but it’s cheaper than a security incident post-mortem.
In short: let terraform build the infrastructure, and let a dedicated system manage the secrets that run on top of it. Terraform’s great, but we shouldn’t ask it to be our password diary.