Your app container starts, immediately tries to connect to Postgres, and dies with Connection refused. You added depends_on, ran docker compose up again, and it worked the second time. That intermittent behavior is the tell.
depends_on controls the order in which containers start, but it does not wait for the process inside a container to be ready. Docker Compose starts the database container, sees it running, and immediately launches your app, while Postgres is still ten seconds away from accepting connections. The fix is to gate the dependent service on a health check rather than on container start.
This article shows the correct way to wait for readiness with depends_on and condition: service_healthy, how to write the health checks that make it work, and why most of the older approaches you'll still see in tutorials don't solve the actual problem.
The fix: gate on a health check
This Compose file makes the web service wait until Postgres is ready for connections before it starts:
```yaml
services:
  db:
    image: postgres:18
    environment:
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d app"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 30s

  web:
    build: .
    depends_on:
      db:
        condition: service_healthy
    environment:
      DATABASE_URL: postgres://postgres:secret@db:5432/app
```
Two things make this work, and both are easy to get wrong.
The health check on db defines what "ready" means. pg_isready ships with Postgres and returns exit code 0 only when the server is accepting connections. Compose runs that command inside the container on a loop, and the container's status moves from starting to healthy once the check passes.
The long-form depends_on on web consumes that status. Instead of the short list syntax (depends_on: [db]), each dependency gets a condition. With service_healthy, Compose blocks web from starting until db reports healthy.
When you run docker compose up, you'll see Compose pause before starting web:

```
✔ Container app-db-1   Healthy   8.4s
✔ Container app-web-1  Started   8.6s
```
The app- prefix is just the project name, which Compose takes from the directory you run it in, so yours may differ. The Healthy line is the part that matters. Without the health check and condition, web would have started at second zero.
The three conditions
The condition field takes three values, and picking the right one depends on what "ready" means for that dependency:
- service_started is the default and the equivalent of the short syntax. It waits only for the container to start. Use it when you only care about ordering.
- service_healthy waits for the dependency's health check to pass. This is what you want for databases, brokers, and caches.
- service_completed_successfully waits for the dependency container to exit with code 0. This is built for one-shot containers like database migrations or seed scripts.
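To make the equivalence between the short syntax and service_started concrete, here is a minimal sketch. The worker service is hypothetical and exists only to show the long form next to the short one; both wait for the db container to start and nothing more.

```yaml
services:
  web:
    build: .
    # Short syntax: start order only, no readiness check.
    depends_on:
      - db

  worker:              # hypothetical second service, for comparison
    build: .
    # Long syntax with the default condition; behaves like the list above.
    depends_on:
      db:
        condition: service_started

  db:
    image: postgres:18
    environment:
      POSTGRES_PASSWORD: secret
```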
A common real-world setup combines service_healthy and service_completed_successfully across three services. A migration container waits for the database to be healthy, and the app waits for the migration to finish cleanly:

```yaml
services:
  db:
    image: postgres:18
    environment:
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d app"]
      interval: 5s
      timeout: 5s
      retries: 5

  migrate:
    image: my-app:latest
    command: ["npm", "run", "migrate"]
    depends_on:
      db:
        condition: service_healthy

  web:
    image: my-app:latest
    depends_on:
      migrate:
        condition: service_completed_successfully
      db:
        condition: service_healthy
```
Tuning the health check
The healthcheck block has five fields worth knowing:
- test is the command. As a list, the first element must be CMD (run the binary directly) or CMD-SHELL (run through /bin/sh, so you can use || and other shell syntax). A plain string is treated as CMD-SHELL.
- interval is how often the check runs once the container is up.
- timeout is how long a single check may run before it counts as a failure. Keep it shorter than interval.
- retries is how many consecutive failures flip the status to unhealthy.
- start_period is a grace window during startup where failures don't count against retries. This is the field people forget, and it's the one that matters most for slow starters like a JVM app or Elasticsearch.
There's also start_interval (Compose v2.20.2 and later), which lets you poll more frequently during the start period so a dependency is detected as healthy the moment it's ready instead of waiting for the next full interval.
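As a rough sketch of how start_period and start_interval fit together for a slow starter: the service name, image, port, and /health path below are placeholders, and the timings are illustrative rather than recommended values. It also assumes curl exists in the image and that your Docker Engine is recent enough to honor start_interval, which is worth checking against your install.

```yaml
services:
  search:
    image: my-slow-service:latest   # hypothetical image
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
      interval: 30s        # steady-state polling once the service is up
      timeout: 5s
      retries: 3
      start_period: 60s    # failures in this window don't count against retries
      start_interval: 2s   # poll every 2s during the start period
```

With a 30s interval alone, a service that becomes ready at second 31 could go unnoticed for almost another full interval. The shorter start_interval closes that gap during startup without making the steady-state check noisy.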
Health check commands you'll reach for most often: pg_isready -U postgres for Postgres, redis-cli ping for Redis, mysqladmin ping -h localhost for MySQL, and curl -f http://localhost:8080/health or wget --spider for an HTTP service with a health endpoint.
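Here is how those commands might look inside healthcheck blocks. This is a sketch: the image tags, the api service, and its port and /health path are assumptions for illustration, and the curl check only works if curl is present in the api image.

```yaml
services:
  cache:
    image: redis:7
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  mysql:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: secret
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 5s
      timeout: 3s
      retries: 10

  api:
    build: .               # hypothetical HTTP service exposing /health on 8080
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 3s
      retries: 3
```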
When you can't add a health check
Sometimes you don't control the dependency's image, or the tool consuming it doesn't understand Compose conditions. That's where wait scripts come in.
wait-for-it.sh is a small pure-bash script that blocks until a TCP host and port accept a connection, then runs your command. You copy it into your image and put it in front of your start command:
```yaml
services:
  web:
    build: .
    depends_on:
      - db
    command: ["./wait-for-it.sh", "db:5432", "--timeout=30", "--", "python", "app.py"]
  db:
    image: postgres:18
    environment:
      POSTGRES_PASSWORD: secret
```
The catch is right there in how it works: it checks that the port is open, which is not the same as the service being ready. Postgres opens its port before the database is fully initialized, so a TCP check can pass while a real query still fails. wait-for-it.sh also needs bash, which Alpine-based images don't ship by default. If you're on Alpine, reach for wait-for (sh-compatible) or dockerize, which adds HTTP checks on top of TCP.
Use a wait script when the health-check route is closed to you. When you can define a health check, prefer it, because it tests actual readiness instead of an open socket.
What about links?
If you're reading an old tutorial, you may see links used to wire services together. As a side effect, it also started linked containers in order. Don't use it for new work. The links keyword is a legacy Docker feature, and Compose has not needed it since it started creating a shared network for every project automatically. Containers on that network already reach each other by service name, so db:5432 resolves with no configuration. On user-defined networks links is silently ignored, and the related legacy-link environment variables are slated for removal in a future Docker Engine release. It never waited for readiness either, so it was never a solution to this problem in the first place.
Common pitfalls
The most common mistake is assuming the short depends_on: [db] syntax waits for the database. It waits for the container and nothing more. If your stack works on the second up but not the first, this is almost always the reason.
condition: service_healthy only works if the dependency actually defines a healthcheck. The condition reads the dependency's health status, so a db service with no health check has no status to wait on, and Compose will refuse to start the dependent service. The health check belongs on the dependency, not on the service that's waiting.
Watch out for missing tools in minimal images. A health check that calls curl or wget fails every time if the binary isn't in the image, and distroless images and many Alpine-based images ship without one or both. When that happens, the container never reports healthy, so the dependent service never starts. Use a tool you know is present, or a language-native check that doesn't shell out.
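One way a language-native check might look, assuming the image already contains a Python interpreter and the app serves a /health endpoint on port 8080 (both are assumptions, as is the service name). urllib.request.urlopen raises an exception on a refused connection or an HTTP error status, so the interpreter exits non-zero and the check fails without needing curl or wget.

```yaml
services:
  api:
    build: .   # hypothetical Python-based image
    healthcheck:
      # Uses the interpreter that is already in the image instead of curl/wget.
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health', timeout=2)"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s
```

The same idea carries over to other runtimes: call whatever HTTP client the language ships with and let a failed request produce a non-zero exit code.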
Old Stack Overflow answers often claim depends_on conditions don't work in version 3. The conditions were dropped from the v3 schema years ago, then restored under the unified Compose Spec that Docker Compose v2 uses. On a current docker compose the conditions work, and the top-level version: field is obsolete, so you can delete it.
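One last thing: condition: service_healthy only governs startup. If the database crashes or restarts on its own halfway through a run, Compose won't restart your app. Adding restart: true to that dependency entry covers only restarts that Compose itself performs, such as docker compose restart db; it does not react to the container runtime restarting a container that died. Either way, the more reliable fix is retry logic in the application itself.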
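A sketch of where restart: true goes in the long-form syntax, reusing the db service and health check from the first example. The service-level restart policy on web is a separate, optional setting added here for illustration.

```yaml
services:
  web:
    build: .
    restart: unless-stopped        # service-level policy: applies to web's own container
    depends_on:
      db:
        condition: service_healthy
        restart: true              # dependency-level: tied to Compose restarting db

  db:
    image: postgres:18
    environment:
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d app"]
      interval: 5s
      timeout: 5s
      retries: 5
```

The two restart keys do different jobs: the one under depends_on follows what Compose does to db, while the service-level one is what brings web back if its own process exits. Neither replaces reconnect logic in the app.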
Final thoughts
A correct depends_on graph solves the local race condition, but the same readiness problem reappears in production as flaky deploys and connection errors that surface only under load. Catching those early means watching container health and the connection failures between services, not just whether processes are running.
Dash0's infrastructure monitoring gives you OpenTelemetry-native visibility into container health and resource usage alongside real-time logs and distributed traces, so when a service starts before its dependency is ready you can see the failed connections and the timing in one place instead of guessing from intermittent restarts.
Start a free trial to see your container health, logs, and traces in a single view.