We Audited Our Own Docker Setup. Here's What We Found.

We sell infrastructure audits, so before selling another one we ran the full 8-point checklist against our own 57 production containers. The results were humbling: 17 containers running :latest tags, 10 with no memory limits, and 9 running as root. Here's the full audit, what we fixed tonight, and what's still on the roadmap.

57 production containers audited
17 running :latest (0 pinned to a digest)
10 with no memory limits (4 fixed)
9 running as root
"If you're going to sell infrastructure audits, you'd better have run one on yourself first. We did. We found gaps. We're fixing them. This is what we learned."

Why We Did This

Tomorrow morning, we're publishing a post about the five Docker production mistakes that cost teams real money. Before we ask anyone to trust us with their infrastructure, we needed to know: does our own setup pass the same checklist we're about to sell? Short answer: partially. We caught some things. We missed others. Here's the honest report.

The Audit: 24 June 2026, 22:00 UTC

We ran three checks against every running container on the Sovael VPS. The methodology is simple — three shell commands, 30 seconds each. This is the same audit we'll run on your infrastructure.

Check 1: Pinned Digests

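Every container should reference a specific image digest, not the :latest tag. :latest is a moving target: it points at whatever the maintainer published most recently, so the next pull can bring in a different version without warning. A minor nginx update can change default cipher suites and break TLS for some clients. A major Redis update can change the on-disk data format, leaving you with data files the previous version can't read when you try to roll back.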

Our result: 0 of 57 containers use pinned digests. 17 explicitly use :latest. The remaining 40 use version tags (like :3.6 or :v2.0.2), which are better than :latest but still mutable: a maintainer can re-push the same tag with different contents, and a broad tag like :3.6 moves with every patch release. The gold standard is a SHA256 digest, written in Compose as image: nginx@sha256:abc123.... A digest identifies the image by its content, so you run exactly the code you tested.
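The three categories above (digest, :latest, version tag) can be told apart from the image reference alone. The following is a minimal sketch, not the audit tooling itself; the function name classify_image_ref is hypothetical, and it assumes the reference is a name rather than a bare image ID.

```shell
#!/bin/sh
# classify_image_ref (hypothetical helper): print "digest", "latest", or "tag"
# for a single image reference such as the ones `docker ps` reports.
classify_image_ref() {
    ref=$1
    case "$ref" in
        *@sha256:*) echo digest; return ;;
    esac
    # Only the last path component can carry a tag. This keeps a registry
    # port such as localhost:5000/app from being mistaken for one.
    last=${ref##*/}
    case "$last" in
        *:latest) echo latest ;;
        *:*)      echo tag ;;
        *)        echo latest ;;   # no tag at all resolves to :latest
    esac
}

# Possible use against running containers (requires Docker):
#   docker ps --format '{{.Image}}' | sort -u | while read -r ref; do
#       printf '%s\t%s\n' "$(classify_image_ref "$ref")" "$ref"
#   done
```

To find the digest to pin for an image that has already been pulled, docker image inspect --format '{{index .RepoDigests 0}}' followed by the image name prints it in name@sha256:... form.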

The 17 :latest offenders include critical services: Redis, Evolution API, Firecrawl, and FlareSolverr. Every one of these is a potential 3am outage waiting to happen. We're fixing them — but we haven't fixed them yet. This is the honest state of our infrastructure right now.

Check 2: Memory Limits

Without a memory limit (deploy.resources.limits.memory in Compose), a single runaway container can consume all available RAM. The Linux OOM killer then chooses a victim largely by memory footprint, not by which process misbehaved, and that victim can be your database. We've seen this happen. It's ugly.

Our result: 10 containers had no memory limits. These included orchestrator-api (our core webhook handler), sovael-ai (our main website), litellm (LLM proxy), and orchestrator-worker (background job processor). If any of these had a memory leak or a spike in traffic, they could take down the entire server.

✓ Fixed (24 June 2026, 22:15 UTC)

We applied memory limits to 4 critical containers immediately:

| Container | Limit | Why |
| --- | --- | --- |
| orchestrator-api | 1 GB | Core webhook handler: if this dies, WhatsApp and customer integrations die |
| sovael-ai | 256 MB | Static website: doesn't need more, but must survive traffic spikes |
| litellm | 512 MB | LLM proxy: handles all AI inference routing |
| orchestrator-worker | 512 MB | Background job processor: queue depth can spike unexpectedly |

The remaining 6 containers without limits are non-critical or low-memory services. They're scheduled for the next maintenance window.
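The post doesn't say how the limits were applied, so the following is only one possible approach: docker update can set a limit on a running container without recreating it. This sketch is a dry run that prints the commands rather than executing them. The function name is hypothetical, and the 1g/256m/512m values assume the limits listed above map directly onto Docker's size suffixes.

```shell
#!/bin/sh
# Dry run: print one `docker update` command per container. Nothing is executed.
# Container names and limits are taken from the list of fixed containers;
# the g and m suffixes are Docker's size units (an assumed mapping from GB/MB).
print_limit_commands() {
    while read -r name limit; do
        [ -n "$name" ] || continue
        # Setting --memory-swap equal to --memory keeps the container from
        # using swap to grow past the limit.
        printf 'docker update --memory %s --memory-swap %s %s\n' \
            "$limit" "$limit" "$name"
    done <<'EOF'
orchestrator-api 1g
sovael-ai 256m
litellm 512m
orchestrator-worker 512m
EOF
}

print_limit_commands
```

A limit set this way is lost when the container is recreated, so the durable fix is still the memory limit in the Compose file.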

Check 3: Non-Root Users

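Containers run as root by default. If an attacker compromises your application, they get root inside the container, and that makes escaping to the host far easier: any kernel or runtime vulnerability, or a carelessly mounted host path, becomes a route to root on the host. Even without escaping, they can read environment variables containing API keys. With root they can also modify filesystem permissions and install persistence mechanisms.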

Our result: 9 containers run as root. Docker can override the user at runtime (--user, or user: in Compose), but the application's files usually have to be readable or owned by that user, so in practice the fix means Dockerfile changes: adding USER 1000 and making sure file permissions work for a non-root user. Every new container we build from now on will use a non-root user by default. The existing 9 are on the roadmap.
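When checking for root, note that the user reported by docker inspect can be empty (the image default, which is root unless the image sets USER), or an explicit 0 or root, optionally followed by a group. A small sketch of that test follows; the function name runs_as_root is hypothetical.

```shell
#!/bin/sh
# runs_as_root (hypothetical helper): exit 0 when a .Config.User value
# resolves to root, exit 1 otherwise.
runs_as_root() {
    user=${1%%:*}                # drop an optional ":group" suffix
    case "$user" in
        ''|0|root) return 0 ;;   # empty means no USER was set anywhere
        *)         return 1 ;;
    esac
}

# Possible use (requires Docker):
#   u=$(docker inspect --format '{{.Config.User}}' some-container)
#   runs_as_root "$u" && echo "some-container: root"
```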

⚠ Roadmap — Still Needs Work

| Issue | Count | Plan |
| --- | --- | --- |
| :latest tags → pinned digests | 17 containers | Audit each service for stable version, pin digest, test in staging, deploy. Estimated: 4-6 hours across 2 maintenance windows. |
| No memory limits | 6 remaining | Non-critical services; apply during next scheduled maintenance. Estimated: 30 minutes. |
| Root user | 9 containers | Requires Dockerfile changes per service. Long-term project. New containers use non-root by default. |

What We Learned

Running this audit on ourselves before publishing the Docker mistakes post was humbling and necessary. We found exactly the kinds of gaps we tell our clients to fix. The difference is that we're documenting ours publicly. Here are the takeaways:

1. Infrastructure debt is invisible until you look. We thought our setup was solid. It wasn't. The audit took 30 seconds per check. We'd never run it before tonight.

2. Fix the critical stuff immediately. Memory limits on core services took 2 minutes to apply. There's no excuse for leaving those unfixed once you know about them.

3. Some fixes are easy. Some require planning. Pinning 17 containers to digests requires testing each one. Root user fixes require Dockerfile changes. You don't need to fix everything tonight — but you need to know what needs fixing.

4. Eat your own dog food. If you sell a service, use it yourself first. We're publishing this audit because transparency is the only credible position. We found gaps. We're fixing them. We'll update this post when the roadmap items are complete.

Audit Commands (Run These Yourself)

  1. Check for :latest tags (an image with no tag also resolves to :latest): docker ps --format '{{.Image}}' | sort -u | grep -E ':latest$|^[^:@]+$'
  2. Check memory limits: docker ps --format '{{.Names}}' | xargs -I{} sh -c 'm=$(docker inspect {} --format "{{.HostConfig.Memory}}"); [ "$m" = "0" ] && echo "{}: NO LIMIT"'
  3. Check root users (an empty user means root, as does an explicit 0 or root): docker ps --format '{{.Names}}' | xargs -I{} sh -c 'u=$(docker inspect {} --format "{{.Config.User}}"); case "$u" in ""|0|root|0:*|root:*) echo "{}: root";; esac'
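The three checks can also be folded into one pass over a single docker inspect call. This is a sketch under assumptions rather than the audit tooling itself: the function name audit_report and the pipe-delimited name|image|memory|user line format are conventions invented for this example.

```shell
#!/bin/sh
# audit_report (hypothetical helper): read "name|image|memory_bytes|user"
# lines on stdin and print one finding per problem found.
audit_report() {
    awk -F'|' '
        {
            name = $1; image = $2; mem = $3; user = $4
            sub(/^\//, "", name)        # inspect reports names with a leading "/"
            sub(/:.*/, "", user)        # drop an optional ":group" suffix
            last = image
            sub(/.*\//, "", last)       # keep only the last path component
            if (image !~ /@sha256:/ && (last !~ /:/ || last ~ /:latest$/))
                print name ": latest tag"
            if (mem == "0")
                print name ": no memory limit"
            if (user == "" || user == "0" || user == "root")
                print name ": runs as root"
        }
    '
}

# Possible use (requires Docker and at least one running container):
#   docker ps -q | xargs docker inspect --format \
#     '{{.Name}}|{{.Config.Image}}|{{.HostConfig.Memory}}|{{.Config.User}}' \
#     | audit_report
```

A container with no findings produces no output, so an empty report means all three checks passed.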