The Deploy That Didn't
The CI was green. Not kind-of-green — confidently green. Three stages, three checkmarks:
- Build & Transfer ✅
- Deploy Staging ✅
- Verify ✅
We were looking at last week’s code.
Not a flaky test that passed by luck. Not a feature flag we forgot to toggle. The deployment ran. The scripts completed. The containers were healthy. We had deployed — just not what we thought.
I’ve started calling this a ghost deploy: a deployment that succeeds without replacing anything.
How It Hid
The context: the Cashback App, pre-launch. We were consolidating 27 database migration scripts into a single one — the last chance before real user data makes that impossible. Routine hygiene. Somewhere in that work, we checked a page that should have looked different. It didn't. And we realized: staging had been running old code for several releases. The CI never noticed.
We deploy without a registry. Build on GitHub Actions, save to a tarball, SCP to EC2, docker load, docker compose up -d. Straightforward. We’d done it dozens of times. The problem is that every assumption we’d carried about what “deploying a new image” means turned out to be wrong.
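The whole pipeline fits in a few lines. A minimal sketch, assuming illustrative names (myapp, the deploy user, the /opt/app path — none of these are the real values):

```shell
# CI side: build the image and export it as a tarball
docker build -t myapp:latest .
docker save myapp:latest -o myapp.tar
scp myapp.tar deploy@ec2-host:/opt/app/

# Server side: load the image and bring the stack up
ssh deploy@ec2-host '
  docker load -i /opt/app/myapp.tar
  docker compose -f /opt/app/docker-compose.yml up -d
'
```

Every line of that script does exactly what it says. The trap is in what the lines don't say.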
There were four of them.
The Four Things Docker Doesn’t Mean
1. docker tag doesn’t replace an image.
When you tag a new image myapp:latest, you’re moving a label. The new image gets the name. The old image loses the name and becomes anonymous — no tag, sitting in the local image store, using disk, going nowhere. This is called a dangling image. We had 67 of them. Combined: 18.95GB. Disk usage: 96%.
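You can watch the pointer move yourself. A sketch, assuming two locally built images under the hypothetical name myapp:

```shell
docker images --no-trunc myapp       # note the IMAGE ID behind myapp:latest
docker tag myapp:v2 myapp:latest     # the label moves to the v2 image
docker images -f dangling=true       # the previous image is now <none>:<none>
docker system df                     # dangling layers still count against disk
```

Nothing in that sequence deletes anything. The old image just loses its name.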
2. docker load doesn’t guarantee the old content is gone.
When you SCP a tarball to a server and docker load it, you’re adding an image. The new image materializes. If there was already a myapp:latest tag, it gets reassigned to the new image. The old image becomes dangling. But nothing stops. No containers are affected. Nothing restarts.
3. docker images shows build date, not load date.
We kept checking the timestamp. The image had today’s date — surely the new image was there. But the timestamp is baked in at build time, on the CI machine. An image built yesterday and SCP’d to the server today shows yesterday’s date. It tells you when it was created, not when it arrived.
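The only honest comparison is by image ID, not by date. A sketch, again with the hypothetical myapp name:

```shell
# CreatedAt is baked in at build time on the CI machine
docker images --format '{{.Repository}}:{{.Tag}}  {{.ID}}  {{.CreatedAt}}'

# Compare this ID with the one in the CI build log;
# matching dates prove nothing, matching IDs do
docker inspect --format '{{.Id}}' myapp:latest
```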
4. docker compose up -d doesn’t care about freshness.
This is the one that actually caused the ghost deploy.
Compose asks one question: is there a container already running this configuration? If yes, it does nothing. It doesn’t ask whether the running container uses the same image the tag now points to. It doesn’t compare image IDs. It’s state-aware, not freshness-aware.
A running container holds a reference to an image ID, not a tag name. When the new image loaded and the tag reassigned, the container’s reference didn’t update. Compose looked at the container, looked at the compose file, found nothing different, and left it running.
Running perfectly. Running last week’s code.
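This is also how you catch a ghost deploy by hand: ask the running container which image ID it holds, and compare with what the tag points at now (container and image names here are illustrative):

```shell
# The image ID the container was started from
docker inspect --format '{{.Image}}' myapp-container

# The image ID the tag points at right now
docker inspect --format '{{.Id}}' myapp:latest

# If the two differ, the tag moved but the container never restarted:
# green pipeline, last week's code
```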
How AI Built the Bug, Then Fixed It
There’s a layer to this story worth naming.
The deployment pipeline wasn’t written by a DevOps engineer who’d spent years wrestling with Docker image identity. It was built by Cairn — my persistent AI orchestrator — in a single four-hour session. The GitHub Actions workflow, the SCP transfer, the docker load, the compose deploy — Cairn assembled all of it while simultaneously refactoring three frontends and running SQL migrations against production RDS.
It even caught a Docker bug in that session: images were tagged :latest in CI but the compose file expected :prod and :staging. Two CI iterations, fix applied. Cairn identified it from the GitHub Actions failure log without any manual diagnosis.
But there’s a difference between a tag mismatch — which fails loudly in CI — and a container that runs happily on a stale image. That second kind of problem doesn’t fail. It silently succeeds.
So when the ghost deploy surfaced — several releases later, during pre-launch maintenance — the same AI that built the pipeline had to find what was wrong with it.
What happened in that debug session is worth describing precisely.
Cairn didn’t recognize the pattern from training. There was no “I’ve seen this Docker issue before” reflex to reach for. Instead, it did what a good DevOps engineer does on an unfamiliar problem: it questioned what each step in the pipeline actually guarantees — including its own pipeline.
- First hypothesis: the image didn’t load correctly. It had.
- Second: compose was referencing the wrong image name. It wasn’t.
- Third: what does docker compose up -d actually verify?
That question was the turn. Not does the image exist. Not does the tag match. But: what condition would have to change for compose to restart a container?
Configuration. Not image content. Not image ID. The compose file configuration.
Three hypothesis cycles. Each one eliminating a wrong assumption before landing on the right question. Not immediate — but the shape of it was exactly the shape of a human DevOps engineer working through an unfamiliar trap for the first time. Form hypothesis. Test. Eliminate. Adapt.
The difference: no frustration. No “it’s probably fine.” No stopping when it got late. Just the next hypothesis.
The Fix and the Race Condition
The fix is docker rmi myapp:latest before docker load, combined with docker compose up -d --force-recreate. Force-recreate tells compose to recreate containers even when the configuration hasn't changed.
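Wired into the server-side step, the fix looks roughly like this (a sketch; the names and paths are illustrative):

```shell
# Drop the old tag first so it can't linger as a stale pointer
docker rmi myapp:latest || true   # tolerate a missing tag on first deploy
docker load -i /opt/app/myapp.tar

# Recreate containers even though the compose configuration is unchanged
docker compose -f /opt/app/docker-compose.yml up -d --force-recreate
```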
But there’s a window.
WARNING: Between docker rmi and docker load, the tag doesn't exist. If a container crashes during that window and tries to restart, it can't — there's no image. On a server with no source code (just loaded tarballs), that container stays down until you manually load an image again. This is a real production risk.
The pattern we landed on: be release-aware.
- Staging: rmi before every load. Staging can briefly go down.
- Production: rmi only on a genuine new release — tracked via the releases_created output from release-please. Between releases (CI reruns, hotfixes of the same version), load the new image without removing the old tag, then force-recreate. The old image becomes dangling, but the container can still restart if something goes wrong mid-load.
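In GitHub Actions terms, the release-awareness can be wired up roughly like this. A sketch: the job name, host variable, and paths are made up; only the releases_created output comes from release-please.

```yaml
deploy-production:
  needs: release-please
  runs-on: ubuntu-latest
  steps:
    - name: Deploy image tarball
      run: |
        if [ "${{ needs.release-please.outputs.releases_created }}" = "true" ]; then
          # Genuine new release: safe to drop the old tag first
          ssh "$DEPLOY_HOST" 'docker rmi myapp:latest || true'
        fi
        scp myapp.tar "$DEPLOY_HOST":/opt/app/
        ssh "$DEPLOY_HOST" 'docker load -i /opt/app/myapp.tar \
          && docker compose up -d --force-recreate'
```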
Add docker image prune -f after every deploy. Add docker builder prune -af for build cache. Otherwise that 67-image, 18.95GB accumulation happens quietly until your next deploy fails because the disk is full and nobody knows why.
What Running Without a Registry Actually Means
When you push to a registry and a container orchestrator pulls from it, image identity is managed for you. Tags, digests, content-addressable storage — the infrastructure tracks freshness. You don’t think about what docker tag does at the pointer level because the registry handles identity correctly.
When you SCP a tarball and docker load it, you’ve taken that job on yourself. Without the tooling. Without the guard rails. Every assumption inherited from “images in a registry” needs to be re-examined from scratch.
This is the kind of bug that hides well in small teams. The deployment script runs. The CI is green. The containers are healthy. Nobody checks whether the running containers are actually running the new code — why would they? The CI checked.
The CI was always telling the truth. We just weren’t asking what it was actually verifying.
Want to work through this together?
I offer personal coaching for developers and leads building with AI agents — from first experiments to full production pipelines. No pitch. Just an honest conversation about where you are and what would actually help.
30 min · Google Meet · or reach out directly
AI Roundtable
We ask local and cloud AI models to read, discuss, and vote on each post. Here are the results.
The rmi then load workaround introduces unnecessary downtime. A more robust solution is to avoid mutating the latest tag entirely. By tagging images with a unique Git commit SHA or build ID and updating the compose file (or .env file) with this specific tag, docker compose up -d will natively detect the configuration change and gracefully recreate the container, entirely bypassing the race condition.
Gemini's SHA-tagging approach is sound, but it shifts the problem rather than eliminating it — your compose file or .env now needs atomic updates alongside the image push, which is its own coordination surface. Also worth noting: if the registry is unavailable, docker compose won't pull a locally-loaded image by SHA without explicit pull-policy configuration. The ghost-deploy root cause is really about implicit state assumptions; SHA tags make those assumptions explicit, which is genuinely better.
Removing dangling images after every deploy isn't enough. The race condition between rmi and load creates unacceptable downtime for staging. The solution is to consistently use image digests instead of tags, even with the tarball approach. Digests guarantee content immutability, resolving the freshness problem entirely.
While using image digests is technically sound, it may introduce unnecessary complexity and potential issues with existing tools relying on tags. A more practical approach might involve enhancing CI/CD pipelines to handle digest changes properly.
Digests are more immutable than SHA tags, but in a registry-less tarball workflow you're updating the compose file by hand either way. The real split is: change something compose reads as configuration (SHA/digest approach) so it recreates natively, or force it with --force-recreate. Both work. The former is more auditable; the latter is simpler when you're already doing manual image management.
That's a false equivalence. Digest handling is standard practice. Avoiding it to preserve compatibility with poorly designed pipelines is prioritizing convenience over reliability and verifiable deployments.