Docker build failures are notoriously opaque. The default output hides most of what's happening, collapses layer output into a single spinner, and leaves you staring at a red "exited with code 1" with no useful context. The fix is usually simple once you can see what's actually happening.
This article walks through the diagnostic flags and techniques that expose what Docker is actually doing during the build — from verbose output and cache busting, to isolating individual stages in a multi-stage Dockerfile.
Make the output readable first
The first thing to do with any failed build is run it again with --progress=plain. By default, BuildKit renders a TUI that overwrites output in place and collapses each step to a few lines. With plain progress, every step prints its full output, which means you can actually read the error.
docker build --progress=plain -t myapp:debug .
You'll see output like this for each step:
#8 [build 3/5] RUN npm ci
#8 0.312 npm warn old lockfile
#8 1.847 npm error code ENOENT
#8 1.848 npm error syscall open
#8 1.848 npm error path /app/package-lock.json
#8 1.849 npm error errno -2
#8 ERROR: process "/bin/sh -c npm ci" did not complete successfully: exit code: 1
In TUI mode, that ENOENT scrolls by in a collapsed step and is easy to miss. With plain output, the path that doesn't exist is right there.
Disable the cache when the problem is intermittent or stale
If your build was passing before and suddenly fails, or you've made changes but Docker seems to be pulling from an old layer, add --no-cache.
docker build --no-cache --progress=plain -t myapp:debug .
BuildKit caches aggressively. A RUN apt-get update from last week may still be cached even though the package index has drifted. --no-cache forces every layer to re-execute from scratch. This is slow, but it's the only way to rule out a stale cache as the cause.
A lighter alternative is --cache-from with an explicit image to import cache from a previous build, but when debugging, just disable the cache entirely until you've isolated the problem.
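As a sketch, assuming you've previously pushed an image built with inline cache metadata (the myapp:cache tag here is illustrative):

docker build --build-arg BUILDKIT_INLINE_CACHE=1 -t myapp:cache .
docker push myapp:cache

# later, e.g. on a CI runner:
docker build --cache-from myapp:cache -t myapp:debug .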
Inspect intermediate layers by targeting a specific stage
If you have a multi-stage Dockerfile and the failure is somewhere mid-build, use --target to build only up to and including the stage you care about. This saves time and lets you shell into the intermediate image to poke around.
Suppose your Dockerfile has stages deps, build, and release, and the build stage is failing.
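A minimal sketch of that shape, with illustrative Node.js tooling matching the npm example above:

FROM node:20 AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci

FROM deps AS build
COPY . .
RUN npm run build

FROM node:20-slim AS release
WORKDIR /app
COPY --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]

To build only up to and including the failing stage: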
docker build --target build --progress=plain -t myapp:build-stage .
If that succeeds, you've confirmed the problem is in the release stage. If it fails, you've isolated the scope.
Once the target stage builds successfully, run a container from it and inspect the filesystem:
docker run --rm -it myapp:build-stage /bin/sh
From inside, you can check whether expected files exist, verify environment variables, and run the failing command manually to see its actual output.
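Sticking with the npm example from earlier, the interactive checks might look like this (the paths are illustrative):

ls -la /app                     # did the files you expected to COPY actually land?
ls -la /app/package-lock.json   # does the file npm complained about exist?
env                             # are the expected variables set?
npm ci                          # re-run the failing command and read its full output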
Read BuildKit's structured output carefully
BuildKit (the default builder since Docker Engine 23.0) groups its output by step number (#8, #9, etc.) and includes timing information. When a step fails, the relevant lines are prefixed with the step number, so you can grep for them:
docker build --progress=plain -t myapp:debug . 2>&1 | grep "^#8"
That might produce output like this:
#8 [build 3/5] RUN npm ci
#8 0.312 npm warn old lockfile
#8 1.847 npm error code ENOENT
#8 1.848 npm error path /app/package-lock.json
#8 ERROR: process "/bin/sh -c npm ci" did not complete successfully: exit code: 1
The 2>&1 redirect matters here. BuildKit writes its output to stderr, so without it you'll get nothing.
When you see a step number referenced in an error, trace it back to the corresponding RUN or COPY instruction in your Dockerfile. The step numbers correspond to the execution order, which may differ from the line order in the Dockerfile due to parallel stage resolution.
Common pitfalls worth knowing
COPY failures can look like permissions problems when the real issue is a missing file. If you have COPY ./config /app/config and the config directory doesn't exist in the build context, you'll get a "not found" error with misleading wording. Check your .dockerignore file first. It's easy to accidentally exclude something you need.
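One way to check is a throwaway Dockerfile that does nothing but copy the context and list it (Dockerfile.ctx is a hypothetical filename):

FROM busybox
COPY . /ctx
RUN find /ctx -maxdepth 2 | sort

Build it with plain output and read the listing from the RUN step:

docker build --no-cache --progress=plain -f Dockerfile.ctx .

Anything excluded by .dockerignore simply won't appear in the listing.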
--no-cache doesn't clear BuildKit's content-addressable cache. If you're chasing a ghost — a stale base image, a corrupted layer — --no-cache alone may not be enough. Clear the build cache explicitly with:
docker builder prune
Use --all to remove all of the build cache, not just the dangling entries, but be aware your next build will be cold.
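The aggressive version, spelled out:

docker builder prune --all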
Multi-platform builds fail differently. If you're building with docker buildx build --platform linux/arm64 on an x86 host, failures in RUN steps may be QEMU-related rather than build logic issues. The plain output will usually show this, but the error messages can be misleading. Test the affected commands in a matching base image first.
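A quick way to do that, using an illustrative node:20 base image, is to run the suspect command under emulation outside the build:

docker run --rm --platform linux/arm64 node:20 sh -c 'uname -m && node --version'

If this hangs or crashes on an x86 host, the problem is the QEMU emulation layer rather than your Dockerfile.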
Build args need to be re-declared per stage. If you're passing --build-arg NODE_ENV=production and the variable comes up empty in a RUN step, check that you've declared it in the Dockerfile with ARG NODE_ENV before using it. An ARG declared before the first FROM lives outside any build stage: it's usable in FROM instructions themselves, but it must be re-declared inside each stage that consumes it.
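A minimal sketch of the scoping rules (the NODE_VERSION arg and stage names are illustrative):

ARG NODE_VERSION=20
# in scope for the FROM lines below, but not inside any stage

FROM node:${NODE_VERSION} AS build
ARG NODE_ENV
# re-declared, so --build-arg NODE_ENV=production is visible here
RUN echo "building with NODE_ENV=${NODE_ENV}"

FROM node:${NODE_VERSION}-slim AS release
ARG NODE_ENV
# each stage that reads the arg must re-declare it
ENV NODE_ENV=${NODE_ENV}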
Final thoughts
Most Docker build failures become obvious once you can see the full output. Start with --progress=plain on every failed build. It's the single highest-value flag for build debugging and there's no reason not to use it. Add --no-cache when you suspect a stale layer. Use --target to narrow scope in multi-stage builds.
Build failures are rarely mysterious. They're almost always a missing file, a failed package install, or a bad assumption about what's in the build context. The flags above just give you enough visibility to see which one it is.
If you're running Docker builds in CI, failures often have infrastructure causes that aren't visible in the build log itself: a dependency registry timing out, a build agent under memory pressure, a flaky network hop to an artifact store. Dash0 is an OpenTelemetry-native observability platform that correlates signals across your environment — logs, metrics, and distributed traces in a single view — so you can tell the difference between a broken Dockerfile and a broken environment. If you want to go deeper on container observability, Dash0 also has guides on monitoring Docker container resource usage and centralizing Docker Compose logs with OpenTelemetry.
Start a free trial — no credit card required.