We Audited Our Own Docker Setup. Here's What We Found.
We sell infrastructure audits. So before selling another one, we ran the full 8-point checklist against our own 57 production containers. The results were humbling: 17 containers running :latest tags, 10 with no memory limits, and 9 running as root. Here's the full audit, what we fixed tonight, and what's still on the roadmap.
Why We Did This
Tomorrow morning, we're publishing a post about the five Docker production mistakes that cost teams real money. Before we ask anyone to trust us with their infrastructure, we needed to know: does our own setup pass the same checklist we're about to sell? Short answer: partially. We caught some things. We missed others. Here's the honest report.
The Audit: 24 June 2026, 22:00 UTC
We ran three checks against every running container on the Sovael VPS. The methodology is simple — three shell commands, 30 seconds each. This is the same audit we'll run on your infrastructure.
Check 1: Pinned Digests
Every container should use a specific image digest, not the :latest tag. :latest is a moving target — it changes without warning when the maintainer publishes a new version. A minor nginx update can change cipher suites and break TLS. A major Redis update can change data formats and corrupt your database.
Our result: 0 of 57 containers use pinned digests. 17 explicitly use :latest. The remaining 40 use version tags (like :3.6 or :v2.0.2), which are better than :latest but still mutable: a maintainer can re-push the same tag with different contents, and a floating tag like :3.6 moves with every patch release. The gold standard is a SHA256 digest, for example image: nginx@sha256:abc123... in a Compose file. That guarantees you're running exactly the image you tested.
The 17 :latest offenders include critical services: Redis, Evolution API, Firecrawl, and FlareSolverr. Every one of these is a potential 3am outage waiting to happen. We're fixing them — but we haven't fixed them yet. This is the honest state of our infrastructure right now.
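The digest / :latest / version-tag breakdown above can be reproduced with a small classifier. This is a minimal sketch, not the exact tooling behind the numbers: the function names classify_image_ref and summarize_image_refs are made up for illustration, and it assumes docker ps reports images by name rather than by image ID.

```shell
#!/bin/sh
# Classify one image reference as "digest", "latest", or "tag".
classify_image_ref() {
  ref=$1
  case "$ref" in
    *@sha256:*) echo "digest"; return 0 ;;
  esac
  # Drop registry and namespace so a registry port (host:5000/app)
  # is not mistaken for a tag.
  name=${ref##*/}
  case "$name" in
    *:latest) echo "latest" ;;
    *:*)      echo "tag" ;;
    *)        echo "latest" ;;  # no tag: Docker resolves this to :latest
  esac
}

# Read image references on stdin, one per line, and print a count per class.
summarize_image_refs() {
  while IFS= read -r ref; do
    if [ -n "$ref" ]; then
      classify_image_ref "$ref"
    fi
  done | sort | uniq -c
}

# Usage (needs a running Docker daemon):
#   docker ps --format '{{.Image}}' | summarize_image_refs
```

Unlike a plain grep for "latest", this also counts images started with no tag at all, since Docker treats those as :latest.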
Check 2: Memory Limits
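Without a memory limit (deploy.resources.limits.memory in a Compose file), a single runaway container can consume all available RAM. The Linux OOM killer then picks a victim based largely on memory footprint, not on which process is at fault, and that victim can be your database. We've seen this happen. It's ugly.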
Our result: 10 containers had no memory limits. These included orchestrator-api (our core webhook handler), sovael-ai (our main website), litellm (LLM proxy), and orchestrator-worker (background job processor). If any of these had a memory leak or a spike in traffic, they could take down the entire server.
✓ Fixed (24 June 2026, 22:15 UTC)
We applied memory limits to 4 critical containers immediately:
| Container | Limit | Why |
|---|---|---|
| orchestrator-api | 1 GB | Core webhook handler — if this dies, WhatsApp and customer integrations die |
| sovael-ai | 256 MB | Static website — doesn't need more, but must survive traffic spikes |
| litellm | 512 MB | LLM proxy — handles all AI inference routing |
| orchestrator-worker | 512 MB | Background job processor — queue depth can spike unexpectedly |
The remaining 6 containers without limits are non-critical or low-memory services. They're on the roadmap for the next maintenance window.
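One way to put limits like these on running containers without recreating them is docker update. The sketch below is a dry run under stated assumptions: it only prints the commands for the four containers in the table, the helper names memory_limit_cmd and print_limit_plan are hypothetical, and setting --memory-swap equal to --memory (no extra swap) is a choice made here, not something the table specifies.

```shell
#!/bin/sh
# Print (do not run) a `docker update` command for one container.
# $1 = container name, $2 = limit in Docker's unit syntax (e.g. 256m, 1g).
memory_limit_cmd() {
  # --memory-swap equal to --memory means the container gets no extra swap.
  printf 'docker update --memory %s --memory-swap %s %s\n' "$2" "$2" "$1"
}

# Container names and limits copied from the table above.
print_limit_plan() {
  while read -r name limit; do
    memory_limit_cmd "$name" "$limit"
  done <<'EOF'
orchestrator-api 1g
sovael-ai 256m
litellm 512m
orchestrator-worker 512m
EOF
}

# Dry run: show what would be executed.
print_limit_plan

# To apply after reviewing the output:
#   print_limit_plan | sh
```

Limits set with docker update do not survive a container being recreated, so for Compose-managed services the same values also belong in the Compose file.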
Check 3: Non-Root Users
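Docker's default is root. If an attacker compromises your application, they are root inside the container, and without user-namespace remapping that is the same UID 0 as root on the host. A container escape or a writable host mount then hands them the host itself: they can modify filesystem permissions and install persistence mechanisms, on top of reading the environment variables that hold your API keys.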
Our result: 9 containers run as root. Fixing this properly requires Dockerfile changes: adding USER 1000 and ensuring file permissions work for non-root users. Overriding the user at runtime is rarely enough on its own, because files inside the image are typically owned by root. Every new container we build from now on will use non-root users by default. The existing 9 are on the roadmap.
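Counting root containers depends on how the configured user is interpreted. An empty value means root, but so do 0, root, and forms like 0:0. A minimal sketch of that decision, with the hypothetical function name runs_as_root, assuming the input is the Config.User string reported by docker inspect:

```shell
#!/bin/sh
# Return 0 (true) if a Config.User value means the container runs as root.
runs_as_root() {
  user=${1%%:*}   # keep the user part of "user:group"
  case "$user" in
    ""|0|root) return 0 ;;   # empty means no USER was set, so root
    *)         return 1 ;;
  esac
}

# Usage (needs a running Docker daemon):
#   for c in $(docker ps --format '{{.Names}}'); do
#     u=$(docker inspect --format '{{.Config.User}}' "$c")
#     if runs_as_root "$u"; then echo "$c: root"; fi
#   done
```

This only reads configuration. A process that starts as root and drops privileges in its entrypoint will still be reported as root.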
⚠ Roadmap — Still Needs Work
| Issue | Count | Plan |
|---|---|---|
| :latest tags → pinned digests | 17 containers | Audit each service for stable version, pin digest, test in staging, deploy. Estimated: 4-6 hours across 2 maintenance windows. |
| No memory limits | 6 remaining | Non-critical services — apply during next scheduled maintenance. Estimated: 30 minutes. |
| Root user | 9 containers | Requires Dockerfile changes per service. Long-term project. New containers use non-root by default. |
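The "pin digest" step in the first row can be sketched as a small helper that combines the reference a service is deployed with and the digest Docker recorded when the image was pulled. The function name pin_ref and the example tag redis:7.2 are hypothetical. Keeping the tag next to the digest is a readability choice: Docker resolves the image by digest when both are present.

```shell
#!/bin/sh
# Build a pinned reference from a deployed reference and a RepoDigests entry.
# $1 = reference as deployed (e.g. redis:7.2)
# $2 = repo digest as printed by docker inspect (e.g. redis@sha256:...)
pin_ref() {
  ref=${1%%@*}      # drop any digest already present on the reference
  digest=${2##*@}   # keep only the sha256:... part
  case "$digest" in
    sha256:*) printf '%s@%s\n' "$ref" "$digest" ;;
    *) echo "pin_ref: no sha256 digest in '$2'" >&2; return 1 ;;
  esac
}

# Usage (needs a running Docker daemon and a pulled image):
#   pin_ref redis:7.2 "$(docker inspect --format '{{index .RepoDigests 0}}' redis:7.2)"
```

Locally built images that were never pushed have no RepoDigests entry, so the docker inspect call in the usage line fails for them. Those need to be pushed to a registry before they can be pinned.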
What We Learned
Running this audit on ourselves before publishing the Docker mistakes post was humbling and necessary. We found exactly the kinds of gaps we tell our clients to fix. The difference is: we're documenting ours publicly. Here's the takeaway:
1. Infrastructure debt is invisible until you look. We thought our setup was solid. It wasn't. The audit took 30 seconds per check. We'd never run it before tonight.
2. Fix the critical stuff immediately. Memory limits on core services took 2 minutes to apply. There's no excuse for leaving those unfixed once you know about them.
3. Some fixes are easy. Some require planning. Pinning 17 containers to digests requires testing each one. Root user fixes require Dockerfile changes. You don't need to fix everything tonight — but you need to know what needs fixing.
4. Eat your own dog food. If you sell a service, use it yourself first. We're publishing this audit because transparency is the only credible position. We found gaps. We're fixing them. We'll update this post when the roadmap items are complete.
Audit Commands (Run These Yourself)
- Check for :latest tags:
docker ps --format '{{.Image}}' | sort -u | grep ':latest$'
- Check memory limits:
docker ps --format '{{.Names}}' | xargs -I{} sh -c 'm=$(docker inspect {} --format "{{.HostConfig.Memory}}"); [ "$m" = "0" ] && echo "{}: NO LIMIT"'
- Check root users (an empty user, 0, or root all mean root):
docker ps --format '{{.Names}}' | xargs -I{} sh -c 'u=$(docker inspect {} --format "{{.Config.User}}"); case "$u" in ""|0|root|0:*|root:*) echo "{}: root";; esac'