Gitops That Works: From Commits To Clusters
How we ship safer changes with less drama and fewer midnight pings.
What We Actually Mean By gitops (No Mysticism)
When we say gitops, we mean a simple deal: Git is the source of truth for what should be running, and automation makes reality match what’s in the repo. Not “Git as a suggestion.” Not “Kubernetes as an art project.” Just a clear contract between desired state (Git) and actual state (cluster). If they drift apart, something reconciles them back.
The best part is that this isn’t a new religion; it’s just good hygiene with better tooling. We get auditability (who changed what), repeatability (same inputs, same outputs), and a rollback story that doesn’t start with “Okay, who remembers what we did last Tuesday?” It also tends to reduce the number of bespoke shell scripts living under someone’s home directory, which is always a win.
gitops shines when we’re managing Kubernetes, but the mindset works anywhere you can declare state. Most teams land on a pattern: developers open pull requests, CI checks the change, and a GitOps controller (like Argo CD or Flux) applies it. The controller continuously watches Git, so “deploy” becomes “merge.”
If you want the official-ish framing, the CNCF GitOps Working Group does a solid job defining the principles without the marketing fog. And yes, once you’ve lived with reconciliation, it’s hard to go back to clicking around in dashboards like it’s 2013.
The Core Loop: PRs In, Reconciliation Out
Let’s map the loop we aim for:
- A change is proposed via a pull request: new image tag, config tweak, Helm value, a whole new service.
- CI validates it: linting, policy checks, unit tests, maybe a “render manifests” step to catch obvious breakage.
- Merge equals intent: merging to main (or an environment branch) is the explicit approval that this should run.
- Controller reconciles: Argo CD/Flux notices the change and applies it to the cluster.
- Drift is corrected: if someone hot-fixes the cluster, the controller either reverts it or flags it.
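The CI validation step in this loop can be sketched as a pull-request workflow. This is a minimal sketch, not a description of any particular pipeline: it assumes GitHub Actions as the CI system, a runner image that ships kubectl, and an environment repo with one Kustomize directory per environment under environments/. The workflow, job, and file names are hypothetical.

```yaml
# .github/workflows/validate.yaml (hypothetical path and names)
name: validate-manifests
on:
  pull_request:
    branches: [main]
jobs:
  render:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Render every environment so a broken overlay fails the PR before merge.
      # Nothing here talks to a cluster, so CI needs no cluster credentials.
      - name: Render manifests
        run: |
          set -euo pipefail
          for env in dev staging prod; do
            echo "Rendering environments/${env}"
            kubectl kustomize "environments/${env}" > /dev/null
          done
```

Rendering only proves the manifests build. Linting and policy checks would be additional steps in the same job.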
This is where teams feel the difference. Traditional pipelines often treat deployment like a one-time push: “we ran the job, therefore it’s deployed.” gitops treats deployment like a continuous truth-maintenance problem: “if it’s not matching Git, it’s wrong (or at least suspicious).”
We also get cleaner separation of duties. CI can build and test artifacts; the controller deploys them. That’s great for security reviews because cluster credentials don’t need to sit in CI systems where they’re one misconfigured secret away from becoming an incident.
If you’re comparing tools, Argo CD and Flux are the common starting points. Both support Helm and Kustomize. Both can be operated sensibly. The key is choosing one and committing to the workflow—not running both because we love collecting controllers like Pokémon.
Repo Layouts That Don’t Make Everyone Grumpy
Repo structure is where a lot of gitops efforts quietly succeed or die. If the repo is confusing, people stop trusting it, and then they start “just doing a quick kubectl change,” which is how drift becomes a lifestyle.
Two common patterns work well:
- Monorepo: apps + environment configs in one place. Easier cross-service changes, one PR can update everything. Needs discipline around ownership.
- Multirepo: each app has its own repo, and a separate “env repo” composes releases. Cleaner boundaries, more moving parts.
A practical, low-drama layout we’ve used in an environment repo:
environments/
  dev/
    kustomization.yaml
    apps/
      payments.yaml
      catalog.yaml
  staging/
    kustomization.yaml
    apps/
      payments.yaml
  prod/
    kustomization.yaml
    apps/
      payments.yaml
clusters/
  us-east-1/
    argo-apps/
      root-app.yaml
policies/
  kyverno/
  opa/
The idea: each environment composes “what runs here” via Kustomize or Helm values. Then we have a “root” Argo CD application (or Flux Kustomization) per cluster that points at environments/<env>.
If we’re doing Helm, we keep values-dev.yaml, values-prod.yaml near the environment. If we’re doing Kustomize, we keep patches near the overlays. Either way, we keep environment intent separate from app source code. That separation makes promotions (dev → staging → prod) easier to reason about and audit.
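To make "each environment composes what runs here" concrete, here is a sketch of what environments/dev/kustomization.yaml might contain in the Kustomize case. The file names come from the layout above; the assumption is that each file under apps/ is a self-contained manifest (for example an Argo CD Application or a Flux resource).

```yaml
# environments/dev/kustomization.yaml (illustrative contents)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# Each entry is one app that should run in dev.
# Adding or removing a line here is the change that gets reviewed in a PR.
resources:
  - apps/payments.yaml
  - apps/catalog.yaml
```

Under this sketch, the staging and prod files would list only apps/payments.yaml, so a diff between environment directories shows exactly what has and has not been promoted.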
And yes, naming matters. If we call something final-final-prod2.yaml, we deserve what happens next.
A Minimal Argo CD Setup (That We Can Explain)
Argo CD is a popular pick because it’s easy to visualize and has a strong app model. Here’s a minimal “App of Apps” style root application that points at a folder in Git:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/acme/env-configs.git
    targetRevision: main
    path: environments/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
A few notes we always explain to newcomers:
- automated.prune: true means Argo will delete resources removed from Git. This is good, but only if we trust the repo and have guardrails.
- selfHeal: true means it will revert manual changes (drift). Also good, assuming we’ve agreed Git is boss.
- targetRevision can be a branch, tag, or commit. For prod, we sometimes pin to tags for a more explicit release trail.
From here, the environments/prod directory usually contains more Argo Application objects, Helm releases, or Kustomize overlays.
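As an illustration of one such child object, here is a sketch of what environments/prod/apps/payments.yaml could look like when the app ships as a Helm chart. The app repository URL, chart path, release tag, namespace, and the image.tag parameter are all hypothetical. Only the Application fields themselves are standard Argo CD.

```yaml
# environments/prod/apps/payments.yaml (illustrative; names are assumptions)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/acme/payments.git  # hypothetical app repo
    targetRevision: v1.4.2                          # hypothetical release tag, pinned for prod
    path: deploy/chart                              # hypothetical chart location
    helm:
      parameters:
        # Prod-specific intent lives here, in the environment repo.
        - name: image.tag                           # assumes the chart exposes this value
          value: "1.4.2"
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

With this shape, a release to prod is a PR that changes targetRevision and the image tag, and a rollback is a revert of that PR.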
If you want deeper reading on how Argo thinks, their docs are refreshingly direct: Argo CD Documentation.
A Flux Example With Kustomize (Small, Boring, Effective)
Flux feels “Git-native” and pairs nicely with Kustomize. A common approach is: define a GitRepository source, then a Kustomization that applies manifests from a path.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: env-configs
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/acme/env-configs.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: prod
  namespace: flux-system
spec:
  interval: 5m
  path: ./environments/prod
  prune: true
  sourceRef:
    kind: GitRepository
    name: env-configs
  wait: true
  timeout: 2m
This is the essence of gitops: Flux polls Git, renders what’s in ./environments/prod, applies it, and prunes what no longer exists. If we want image automation, Flux can also update image tags in Git automatically—handy for fast-moving services, mildly terrifying if we don’t set boundaries.
We also like how Flux encourages us to treat the cluster as “just another consumer of Git.” The cluster doesn’t need someone to press a deploy button; it simply reconciles.
If you’re evaluating Flux, start with their docs and keep the scope tight: Flux Documentation. Install it, sync one namespace, and get the feel for how reconciliation behaves during failures. That learning is worth more than any slide deck.
Guardrails: Policies, Secrets, And “Please Don’t Nuke Prod”
gitops without guardrails is like giving everyone a forklift and hoping they only move pallets. We want safe defaults that let teams move quickly without turning the cluster into a demolition derby.
Three guardrails we recommend early:
- Policy enforcement: Use admission controls to prevent risky resources (privileged pods, hostPath mounts, load balancers in the wrong place). Tools like Kyverno or OPA Gatekeeper can block bad manifests before they land. We prefer policies that teach, not punish: good error messages, examples, and a path to compliance.
- Secret management: Don’t commit raw secrets to Git (we shouldn’t have to say it, but here we are). Use Sealed Secrets, External Secrets, SOPS, or a vault integration. The winning move is making secrets boring: repeatable patterns, not bespoke wizardry.
- Scoped permissions: Controllers should have the minimum access they need. If Argo/Flux can mutate everything, it eventually will (by accident). Use namespaces, projects, and RBAC boundaries so “this team’s stuff” doesn’t become “everyone’s stuff.”
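For the policy-enforcement guardrail, a Kyverno policy that blocks privileged containers might look like the sketch below. It follows the shape of the commonly published "disallow privileged containers" policy and assumes Kyverno is installed in the cluster. The policy name and message wording are our own, and newer Kyverno releases may prefer a per-rule failure action over the spec-level field used here.

```yaml
# policies/kyverno/disallow-privileged.yaml (illustrative file name)
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: Enforce   # reject at admission rather than only report
  background: true
  rules:
    - name: privileged-containers
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        # A message that teaches: say what failed and how to fix it.
        message: >-
          Privileged containers are not allowed. Remove
          securityContext.privileged or set it to false.
        pattern:
          spec:
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
```

The =( ) anchors make each check conditional: a pod that never sets privileged passes, while one that sets it to true is rejected with the message above.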
Also: protect your main branches, require PR reviews, and run policy checks in CI. gitops is only as reliable as the inputs we merge.
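For the scoped-permissions guardrail in Argo CD specifically, one option is an AppProject per team. The sketch below is an assumption-heavy illustration: the project name, namespace, and description are hypothetical, and a team using it would set project: payments on its Application objects instead of default.

```yaml
# Illustrative AppProject; names and namespaces are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payments
  namespace: argocd
spec:
  description: Payments team workloads
  # Only manifests from this repo may be deployed through the project.
  sourceRepos:
    - https://github.com/acme/env-configs.git
  # Applications in this project may only target the payments namespace.
  destinations:
    - server: https://kubernetes.default.svc
      namespace: payments
  # Cluster-scoped resources are limited to Namespace objects.
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
```

An Application in this project that tries to deploy into another namespace, or from another repo, is rejected by Argo CD instead of silently mutating someone else's resources.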
Operating gitops Day-To-Day (Alerts, Drift, Rollbacks)
Once it’s running, the daily questions become practical:
- How do we know it’s healthy? We alert on reconciliation failures: sync errors, health degradation, and prolonged drift. We don’t page on every transient blip; we page on sustained inability to converge.
- What’s our rollback? Rollback should be “revert the commit” or “revert the PR.” If rolling back requires manual cluster changes, we’ve broken the contract. We keep rollback paths tested and fast.
- How do we promote changes? We like promotions that are explicit. For example: dev merges update image tag; staging is a cherry-pick or a PR that bumps the same tag; prod is a PR with approvals. That gives us a clean audit trail and avoids “prod drifted ahead because someone was in a hurry.”
- What about manual hotfixes? We try not to. When we must (it happens), we treat it as an incident follow-up: capture the change back into Git immediately. Otherwise, the controller will revert it or the cluster will stay snowflakey.
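The "page on sustained inability to converge" rule can be sketched as a Prometheus alert. This assumes Argo CD's metrics endpoint is being scraped and that the Prometheus Operator is installed (for the PrometheusRule resource). The alert names, the 15-minute and 10-minute windows, and the severity label are hypothetical choices, not recommendations from any tool's documentation.

```yaml
# Illustrative alerting rules; thresholds and names are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-reconciliation
  namespace: argocd
spec:
  groups:
    - name: argocd-reconciliation
      rules:
        # "for" suppresses transient blips: fire only after sustained drift.
        - alert: ArgoAppOutOfSync
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          for: 15m
          labels:
            severity: page
          annotations:
            summary: "Argo CD application {{ $labels.name }} has been OutOfSync for 15 minutes"
        - alert: ArgoAppDegraded
          expr: argocd_app_info{health_status="Degraded"} == 1
          for: 10m
          labels:
            severity: page
          annotations:
            summary: "Argo CD application {{ $labels.name }} has been Degraded for 10 minutes"
```

A Flux setup would alert on its own reconciliation signals instead; the design point is the same, which is to alert on duration, not on single failed syncs.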
If we’re honest, the best “gitops metric” is how rarely we need to touch the cluster directly. When the repo is trusted and the controller is stable, kubectl becomes a diagnostic tool, not a deployment mechanism.


