Ship Faster With Docker: 47% Fewer Surprises
Practical patterns to speed builds and tame production drift
The Image Diet: Slim Containers Without Starving Features
We’ve all shipped a 1.2 GB image and pretended it was fine. It isn’t. Big images slow CI, clog registries, and increase your attack surface. The fix isn’t heroic; it’s boring, repeatable habits. Start with base images that match your runtime needs. Alpine can be great, but musl vs. glibc can bite you at runtime. Distroless is lovely for static binaries; it removes the shells and package managers that attackers adore. We also set an explicit USER and trim layers with multi-stage builds. Keep your .dockerignore viciously strict; node_modules and vendor directories will happily inflate your build context.
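A starting point for that .dockerignore might look like this (the entries are illustrative; tune them to your repo):
# Hypothetical .dockerignore: keep the build context lean
.git
node_modules
vendor
dist
coverage
*.log
.env
Dockerfile*
.dockerignore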
Here’s a tidy multi-stage for a Go service:
# syntax=docker/dockerfile:1.7
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -ldflags="-s -w" -o /out/app ./cmd/app
FROM gcr.io/distroless/static:nonroot
USER nonroot:nonroot
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
If you’re building on Linux/Intel but running on ARM, use docker buildx with a builder that supports QEMU emulation and cache; there’s a sketch of that setup below. For Dockerfile guidance, the official Dockerfile best practices are still excellent. Pro tip: avoid copying the whole repo at the start; stage your COPY commands so dependency resolution can cache. We’ve cut 60–80% from image weights just by swapping the base and splitting build/install steps. It’s not fancy, but your CI minutes, and your future selves, will thank you.
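Here’s a hedged sketch of that cross-architecture setup (the builder name, platforms, and image tag are assumptions; the binfmt step assumes a Linux host):
# One-time: register QEMU handlers on the host, then create a buildx builder
docker run --privileged --rm tonistiigi/binfmt --install arm64
docker buildx create --name crossbuilder --use
# Build and push a multi-arch image from an Intel machine
docker buildx build --platform linux/amd64,linux/arm64 \
  -t ghcr.io/acme/app:1.0.0 --push .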
Build Like We Mean It: Caching, BuildKit, and Reproducibility
BuildKit is like finding a hidden turbo button we’re allowed to use. It unlocks parallelism, deterministic layers, and powerful mounts for cache and secrets. Enable it by default (DOCKER_BUILDKIT=1) or, better, use docker buildx with a named builder. Then lean into cache hints and mounts. For example:
# syntax=docker/dockerfile:1.7
FROM node:20 AS deps
WORKDIR /app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci
FROM node:20 AS build
WORKDIR /app
COPY --from=deps /app/node_modules node_modules
COPY . .
RUN --mount=type=cache,target=/root/.npm \
    npm run build
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=build /app/dist dist
USER 10001:10001
CMD ["dist/server.js"]
Notice we don’t bake secrets into layers. When needed, use --mount=type=secret for ephemeral access during build. To persist and share cache across CI jobs and agents:
docker buildx build \
--builder mybuilder \
--cache-from=type=registry,ref=ghcr.io/acme/web:buildcache \
--cache-to=type=registry,mode=max,ref=ghcr.io/acme/web:buildcache \
--tag ghcr.io/acme/web:1.12.3 \
--push .
We typically see 30–70% build-time improvements after adopting registry-backed cache. Read the BuildKit README for deeper magic like inline cache metadata and remote frontends. While we’re here, make builds reproducible: pin versions, avoid an apt-get update without a matching apt-get install in a single RUN, and don’t rely on latest. If a rebuild tomorrow produces a different hash, that’s tech debt with interest.
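For instance, the apt pattern looks like this (the package names are placeholders; add exact package=version pins where you need strict reproducibility):
# One RUN: refresh the index, install, and clean up, so the index never goes stale in a cached layer
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl \
 && rm -rf /var/lib/apt/lists/*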
Registries We Can Trust: Tags, Digests, and Policies
Tags are friendly; digests are faithful. We deploy by digest in production so we know exactly what’s running. A tag can drift; a digest can’t. This aligns nicely with the OCI Distribution Spec and makes rollbacks reliable. Our convention: CI pushes a semantic tag, resolves the resulting immutable digest, and records that digest in the deployment metadata.
Docker Compose example with digests:
services:
  api:
    image: ghcr.io/acme/api@sha256:2f5e3d9d8b6b8a2d... # truncated
    environment:
      - NODE_ENV=production
Kubernetes example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: ghcr.io/acme/api@sha256:2f5e3d9d8b6b8a2d...
We keep tags immutable in production registries via policies or automation. If your registry allows overwrite (many do), treat that as a sharp knife and store provenance elsewhere. Also consider pushing an SBOM and attestation alongside the image. Buildx can attach provenance; your pipeline can promote by digest between environments. Bonus: digests make caching saner across clusters and cut “worked on my machine” debates by about 47%.
One more thing: don’t let your registry become a junk drawer. Lifecycle policies reclaim space, but we still keep the last N digests per release and any that appear in deployment history, so rollbacks remain fast and offline-friendly.
Running Containers Safely: Users, Capabilities, and Rootless
Root inside a container isn’t the same as root on the host, but it’s still more power than most apps need. We set a non-root user in the image and drop capabilities at runtime. Also helpful: read-only filesystems, no-new-privileges, and tight resource limits. A small blast radius keeps a small incident small.
Bare minimum runtime hardening looks like this:
docker run --name api \
--read-only \
--tmpfs /tmp:rw,noexec,nodev,nosuid,size=64m \
--pids-limit=256 \
--memory=256m --memory-swap=256m \
--cpus=1.0 \
--cap-drop=ALL --cap-add=NET_BIND_SERVICE \
--security-opt no-new-privileges \
--user 65532:65532 \
ghcr.io/acme/api@sha256:...
Compose snippet:
services:
  api:
    image: ghcr.io/acme/api@sha256:...
    read_only: true
    tmpfs:
      - /tmp:rw,noexec,nodev,nosuid,size=64m
    deploy:
      resources:
        limits:
          cpus: "1"
          memory: 256M
    security_opt:
      - no-new-privileges:true
    user: "65532:65532"
    cap_drop: ["ALL"]
    cap_add: ["NET_BIND_SERVICE"]
We like rootless Docker on developer laptops to reduce footguns, and we audit runtime defaults against the CIS Docker Benchmark. One note: don’t block yourself from your own logs. Read-only is great until your app expects to write a PID file. Provide tmpfs and set the correct paths. We also avoid mapping the Docker socket into containers; if we must, we isolate that workload and monitor it like a hawk with a suspicious past.
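If you want to try rootless on a laptop, the setup is roughly this (assuming the rootless extras package is installed; see the Docker rootless docs for prerequisites):
# Run the setup tool as your normal user, then point the CLI at the per-user socket
dockerd-rootless-setuptool.sh install
export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock
docker info    # should now report the rootless daemon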
Networking Without Sadness: Compose, Namespaces, and Debugging
Docker’s user-defined bridges give us a nice middle ground: built-in DNS, sane isolation, and no hairpin NAT surprises. We usually create one network per app stack and let Compose wire up service-to-service calls by name. It’s predictable, and it avoids the all-services-on-the-default-network party. For example, service-a can reach service-b at http://service-b:8080. No IPs, no hand-crafted /etc/hosts, no tears.
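A minimal Compose sketch of that layout (the service images, names, and port are assumptions):
networks:
  myapp:

services:
  service-a:
    image: ghcr.io/acme/service-a:1.0.0
    networks: [myapp]
  service-b:
    image: ghcr.io/acme/service-b:1.0.0
    networks: [myapp]
    expose:
      - "8080"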
Debugging connectivity stays simple if we standardize on the same network and use a known-good toolbox container. nicolaka/netshoot is our Swiss Army knife for DNS, TLS, and TCP poking. When something’s weird, we run:
docker run --rm -it --network myapp_default nicolaka/netshoot \
sh -lc 'dig service-b +short && curl -v service-b:8080/health'
Packet captures are fair game too, but we try not to go full detective unless we have to. If you do, run netshoot with --cap-add NET_RAW just for the debug session, and remove it afterward; a sample capture command follows below. For multi-host setups or Swarm-era carryovers, know your drivers: bridge for local, overlay for multi-host, macvlan when you truly need L2 presence. The official Docker networking docs spell out capabilities and trade-offs clearly.
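A short-lived capture session might look like this (the network name and port are assumptions):
# Temporary debug container; NET_RAW is granted only for this session
docker run --rm -it --network myapp_default --cap-add NET_RAW \
  nicolaka/netshoot tcpdump -i eth0 -nn port 8080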
Two footguns we avoid: publishing the same port from multiple services (port conflicts are a classic), and baking service endpoints into images. Keep addresses in config, not baked into your containers, so a network tweak doesn’t require a rebuild.
Observability That Doesn’t Nag: Logs, Metrics, and Traces
We’ve all been paged by a container that went silent because the filesystem filled with logs. Our default: leave container logs on the JSON-file driver for local dev, and use a remote driver or sidecar in prod. Remote drivers (fluentd, syslog) reduce disk pressure and simplify log shipping. If you do stick with JSON-file, cap it: rotate early and often. Compose supports driver options; it’s an easy win.
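For example, capping the json-file driver in Compose is a one-stanza change (the sizes here are illustrative):
services:
  api:
    image: ghcr.io/acme/api@sha256:...
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"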
Metrics are just as important. cgroup stats tell you about CPU throttling and memory pressure; export them into your metrics platform alongside app metrics. We like process-level health checks instead of guessing from container status alone. If your app supports it, expose /health and let your orchestrator restart it cleanly when things go sideways. We’re not shy about traces either; even a minimal trace pipeline helps untangle chatty microservices. OpenTelemetry is a reasonable default, but keep the sample rate realistic unless you want a surprise bill.
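A Compose-level health check against that endpoint could look like this (the path, port, and timings are assumptions, and it presumes the image ships curl; otherwise use a tiny probe binary):
services:
  api:
    image: ghcr.io/acme/api@sha256:...
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 10s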
Parking logs, metrics, and traces together lets you correlate “CPU spiked” with “latency blew up” with “retry storm started.” That’s how we cut mean time to say “aha” by half. When you build images, add consistent labels (service name, version, commit) so your log pipeline can enrich events. A tiny bit of metadata planning goes a long way when you’re holding a pager at 3 a.m., wondering which digest you’re staring at.
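A sketch of that label plumbing, declared after the final FROM in your Dockerfile (the build args and values are placeholders):
# Accept version/commit at build time and surface them as OCI labels
ARG VERSION=dev
ARG COMMIT=unknown
LABEL org.opencontainers.image.title="api" \
      org.opencontainers.image.version="${VERSION}" \
      org.opencontainers.image.revision="${COMMIT}"
Pass the real values from CI, for example --build-arg VERSION=1.12.3 --build-arg COMMIT=$(git rev-parse --short HEAD), and anything that inspects the container can read them back later.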
Pipelines That Stick the Landing: SBOMs, Scans, and Rollbacks
Shipping an image is easy. Shipping one we can trust in six months is the real trick. Our pipeline builds once, then promotes by digest across environments. Along the way, we attach SBOMs, run scanners, and sign images. That sounds like a lot; in practice it’s a few extra steps that take minutes and save days later.
Example CI snippets:
# Build and push with provenance and SBOM attestations
docker buildx build \
--builder ci \
--tag ghcr.io/acme/api:1.12.3 \
--provenance=mode=max \
--sbom=true \
--push .
# Record the digest for promotion
DIGEST=$(docker buildx imagetools inspect ghcr.io/acme/api:1.12.3 \
| awk '/Digest:/ {print $2; exit}')
echo "$DIGEST" > artifact.digest
We run SBOM generation (Syft is solid), scan the image (Trivy, Grype), and fail builds on high-severity issues that have known fixes. Time-based suppressions expire automatically—we’ve all forgotten a “temporary” allowlist that lived forever. For rollbacks, we keep a simple manifest that maps release tags to digests. Deployments reference the digest; rollbacks change one line.
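A hedged sketch of those steps in CI, reusing the digest recorded above (the severity gate is an assumption; tune it to your policy):
# Generate an SBOM with Syft, then gate on fixable high/critical findings with Trivy
syft ghcr.io/acme/api@"$DIGEST" -o spdx-json > sbom.spdx.json
trivy image --severity HIGH,CRITICAL --ignore-unfixed --exit-code 1 \
  ghcr.io/acme/api@"$DIGEST"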
Before prod, we also check runtime posture against a baseline: no root, locked-down capabilities, predictable resource limits, and no bind mounts to the host. Container hardening isn’t flashy, but it lowers risk dramatically. For a structured checklist, the Dockerfile best practices we linked earlier pair well with the CIS Docker Benchmark. Tie it all together with deploy-by-digest and you’ll sleep better—and your future incident reports will be shorter.
Production Hygiene: Secrets, Volumes, and Day-2 Upgrades
Let’s be honest: we’ve all accidentally baked a secret into an image once. We learn, we rotate, we vow never again. Store secrets outside images and pass them in at runtime via environment variables or mounted files. For build time, use BuildKit secrets so they never land in a layer. Then verify with a quick grep against the final image filesystem before pushing.
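A minimal sketch of the build-time side (the secret id, the file path, and the assumption that npm auth lives in an .npmrc are all illustrative):
# Dockerfile: mount the .npmrc only for this RUN; it never lands in a layer
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci
And on the CLI side, supply the secret from a file outside the build context:
docker buildx build --secret id=npmrc,src=$HOME/.npmrc -t ghcr.io/acme/web:1.12.3 .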
Volumes deserve equal care. Named volumes beat ad-hoc bind mounts for durability and portability. If you must bind mount, keep it read-only where possible and point your app’s write paths at a named volume or tmpfs. Resist the urge to mount the Docker socket unless you’re building a tool for Docker itself; it’s the skeleton key to the kingdom.
Upgrades matter. Track base image updates proactively; don’t wait for CVEs to prod you. Subscribe to release feeds and bump regularly in a small, testable way. When migrating between major base versions (say, Debian 11 to 12), run a canary container with your smoke tests before you touch production. Also, test your restore before you need it: can you pull the last known-good digest if the registry is under stress? If your answer is “probably,” let’s turn that into “yes” with a quick dry run.
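That dry run can be a three-liner (the release file layout and the smoke command are assumptions; substitute whatever cheap self-check your app exposes):
# Confirm the last known-good digest is still pullable and starts cleanly
GOOD=$(cat releases/1.12.2.digest)
docker pull ghcr.io/acme/api@"$GOOD"
docker run --rm ghcr.io/acme/api@"$GOOD" --smoke-test   # hypothetical flag; any quick self-check works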
Finally, document the boring defaults: memory caps, CPU caps, restart policies, health checks, log rotation. Boring is good. Boring is stable. And stable is how we actually get our weekends back.
Governance Without Drag: Policies, Exceptions, and Guardrails
We like teams moving fast as long as we don’t collect surprises in prod. Lightweight policy checks keep things sane without turning us into gatekeepers. For images, we require: a non-root user, no :latest tags, a digest recorded at deploy time, and a minimum set of labels (org.opencontainers.image.* fields). Build pipelines enforce these automatically. Exceptions exist, but they’re time-bound with clear owners, so we don’t inherit permanent messes.
For runtime, we codify a small baseline: no-new-privileges, cap-drop=ALL with explicit adds, a read-only FS unless there’s a reason, and resource limits that match typical load. In Kubernetes, the same ideas translate into security contexts and PodSecurity admission. The official Kubernetes security context docs are worth bookmarking even if you’re “just on Docker” today; habits carry over.
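In Kubernetes terms, that baseline is a Pod spec excerpt along these lines (the user ID and limits are assumptions):
# Container entry inside a Pod spec: the runtime baseline as a securityContext
containers:
  - name: api
    image: ghcr.io/acme/api@sha256:...
    securityContext:
      runAsNonRoot: true
      runAsUser: 65532
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    resources:
      limits:
        cpu: "1"
        memory: 256Mi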
We also wrap guardrails around registries and tagging. Immutable prod repos, lifecycle policies, and promotion by digest keep us honest. And we surface these rules in pre-commit hooks and CI checks, not in post-merge scoldings. Nothing kills developer buy-in like a policy you only learn about when your deploy is already on fire.
When we get this balance right, people move faster with fewer rollbacks. The trick isn’t more policy; it’s the right small set, applied consistently, with a paper trail that’s shorter than a coffee break. We’ll take less ceremony and more reliability any day.