fix: repair production Docker Compose (#400) #420

Open
Freezaa9 wants to merge 2 commits into develop from fix/issue-400-prod-compose

@Freezaa9
Contributor

Context

Closes #400. Running docker compose up (the production stack using pre-built images) was broken in several ways, making it impossible to start the platform without editing the compose file manually.

What was broken

Six bugs were identified by cross-referencing docker-compose.yml with the Manager Dockerfile, the web nginx.conf, docker-entrypoint.sh, and the engine source code:

| # | File | Bug | Impact |
|---|------|-----|--------|
| 1 | `docker-compose.yml` | Manager healthcheck used `:8080` but compose set `PORT: 8000` | Agent service depended on this healthcheck → startup hung forever |
| 2 | `docker-compose.yml` | `DATABASEAUTO_MIGRATE: true` (missing `__` separators) | Dead env var, never read by the Settings model; auto-migration is always enabled in `main.py` |
| 3 | `docker-compose.yml` | Web port mapped `3000:3000` but nginx listens on 8080 | Connection refused on host port 3000 |
| 4 | `docker-compose.yml` | `CHOKIDAR_USEPOLLING=true` in web service | Dev-only hot-reload env var; harmless but wrong |
| 5 | `docker-compose.yml` | `VITE_API_URL` set instead of `API_URL` | `docker-entrypoint.sh` reads `${API_URL:-}`, not `VITE_API_URL` → web UI always pointed at wrong/empty API URL |
| 6 | `docker-compose.build.yml` | `IDUN_MANAGER_HOST` had full `/api/v1/agents/config` path | Engine's `with_config_from_api` appends this path, resulting in a doubled URL |
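Bug 5 comes down to how shell parameter expansion works. The following is a minimal sketch, not the actual `docker-entrypoint.sh`: the function name `render_config` and the `window.API_URL` output format are hypothetical stand-ins for whatever the entrypoint writes into `config.js`. It only shows that `${API_URL:-}` never consults `VITE_API_URL`.

```shell
#!/bin/sh
# Hypothetical stand-in for the entrypoint's substitution step; the real
# docker-entrypoint.sh and the config.js format may differ.
render_config() {
  # ${API_URL:-} expands to the value of API_URL, or to an empty string
  # when API_URL is unset or empty. No other variable is looked at.
  printf 'window.API_URL = "%s";\n' "${API_URL:-}"
}

# Only VITE_API_URL is set: the rendered URL is empty.
( unset API_URL; VITE_API_URL=http://localhost:8000; render_config )
# prints: window.API_URL = "";

# API_URL is set: the expansion picks it up.
( API_URL=http://localhost:8000; render_config )
# prints: window.API_URL = "http://localhost:8000";
```

Each call runs in a subshell so the variable assignments do not leak into the rest of the script.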

Changes

docker-compose.yml — fixes 1–5 above, plus re-enables the explicit idun_network (was fully commented out) so services communicate over an explicit bridge.

docker-compose.build.yml — fix 6: trim IDUN_MANAGER_HOST to base URL only.
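A rough illustration of why the full path doubled up, assuming the engine simply concatenates `/api/v1/agents/config` onto `IDUN_MANAGER_HOST` as described above. The `config_url` helper and the `http://manager:8000` host are hypothetical; the real logic lives in the engine's `with_config_from_api`.

```shell
#!/bin/sh
# Hypothetical model of the engine's behaviour: append the config path to
# whatever IDUN_MANAGER_HOST contains. Not the engine's actual code.
config_url() {
  printf '%s/api/v1/agents/config\n' "$1"
}

# Old value (full path included): the path appears twice.
config_url "http://manager:8000/api/v1/agents/config"
# prints: http://manager:8000/api/v1/agents/config/api/v1/agents/config

# New value (base URL only): the path appears once.
config_url "http://manager:8000"
# prints: http://manager:8000/api/v1/agents/config
```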

.github/workflows/smoke-test-compose.yml (new) — CI smoke test that:

  • Triggers on any PR/push touching docker-compose.yml, Dockerfiles, or nginx config
  • Starts db + manager + web using the production compose
  • Waits for the manager healthcheck to pass
  • Asserts GET /api/v1/healthz returns {"status": ...}
  • Asserts the web UI returns HTML
  • Dumps all container logs on failure
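The steps above could be sketched in shell roughly as follows. This is an illustration under stated assumptions, not the contents of the workflow file: the helper names (`wait_until`, `has_status_field`, `smoke_test`), the retry counts, and the `<html` match are hypothetical, and the ports are taken from the manual test instructions in this PR.

```shell
#!/bin/sh
# wait_until ATTEMPTS DELAY CMD...: re-run CMD until it succeeds or the
# attempts run out. Returns 0 on success, 1 on timeout.
wait_until() {
  attempts=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Succeeds when the response body contains a "status" key.
has_status_field() {
  printf '%s' "$1" | grep -q '"status"'
}

# Hypothetical end-to-end check; requires Docker and curl on the runner.
smoke_test() {
  docker compose up -d db manager web
  # Poll the health endpoint; dump all container logs if it never comes up.
  if ! wait_until 30 2 curl -fsS http://localhost:8000/api/v1/healthz >/dev/null; then
    docker compose logs
    return 1
  fi
  body=$(curl -fsS http://localhost:8000/api/v1/healthz) || return 1
  has_status_field "$body" || return 1
  # Assumes the served page contains an <html tag.
  curl -fsS http://localhost:3000 | grep -qi '<html'
}
```

Calling `smoke_test` runs the whole sequence; the two helpers take no network dependency themselves, so they can be exercised on their own.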

docs/deployment/overview.mdx — rewrites the "Managed (full stack)" tab to document the production docker compose up path (pre-built images). Adds a "Development" tab for the docker-compose.dev.yml build-from-source path. Updates the embedded YAML example to match the real file.

docs/quickstart.mdx — adds a nested tabs block in the "Start the platform" step showing both the prod and dev compose commands.

How to test manually

git checkout fix/issue-400-prod-compose
docker compose up -d db manager web
# Wait ~30s for the manager healthcheck to pass
curl http://localhost:8000/api/v1/healthz     # → {"status":"ok",...}
open http://localhost:3000                    # → Idun login page

Tests

  • 124 manager unit tests pass (uv run pytest services/idun_agent_manager/tests)
  • Both compose files validate cleanly (docker compose config --quiet)
  • Pre-commit hooks (ruff, gitleaks) pass on all changed files
  • New CI workflow will verify the prod stack on every PR that touches compose or Dockerfile paths

🤖 Generated with Claude Code

Freezaa9 and others added 2 commits March 22, 2026 23:15
Five bugs prevented `docker compose up` (production mode) from working:

1. **Manager healthcheck wrong port** — compose set `PORT: 8000` but
   the healthcheck URL used `:8080`. The agent service depended on this
   healthcheck, so startup would hang indefinitely.

2. **`DATABASEAUTO_MIGRATE` typo** — missing double-underscore separator.
   The env var was never read (auto-migration is always enabled in
   `main.py`). Removed the dead variable to avoid confusion.

3. **Web port mapping** — nginx inside the web container listens on 8080
   (see `nginx.conf`). The compose file mapped `3000:3000`, so the host
   got a "connection refused" on port 3000. Fixed to `3000:8080`.

4. **`CHOKIDAR_USEPOLLING=true` in prod** — a Vite/Webpack hot-reload
   env var that serves no purpose in the production nginx image. Removed.

5. **`VITE_API_URL` vs `API_URL`** — `docker-entrypoint.sh` injects the
   API base URL via `${API_URL:-}` into `config.js` at container start.
   The compose file set `VITE_API_URL`, which the entrypoint never reads.
   Renamed to `API_URL`.

Also fixed in `docker-compose.build.yml`:

6. **`IDUN_MANAGER_HOST` with path** — the engine's `with_config_from_api`
   already appends `/api/v1/agents/config` to the base host. The previous
   value included the full path, resulting in a doubled URL.
   Trimmed to the base URL.

Additionally:
- Re-enabled the `idun_network` Docker network (was commented out) so
  all services communicate over an explicit bridge rather than the
  implicit default network.
- Added CI workflow `.github/workflows/smoke-test-compose.yml` that
  starts the production stack on every PR touching compose/Dockerfile
  paths and asserts the Manager health endpoint and web UI respond.
- Updated `docs/deployment/overview.mdx` and `docs/quickstart.mdx` to
  document both the production (pre-built images) and development (build
  from source) compose variants, replacing the outdated dev-only example.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The smoke test was trying to pull `freezaa9/idun-ai:0.5.1` from Docker Hub,
which doesn't exist until a release is published. This caused the CI job to
fail immediately with "manifest unknown".

Fix: build the manager and web images directly from their Dockerfiles in the
smoke test, then tag them to match the names expected by `docker-compose.yml`.
This lets CI test the exact same compose configuration end-users run without
depending on published images.

Also added a separate `validate-compose` job that validates the YAML syntax of
both `docker-compose.yml` and `docker-compose.build.yml` using `docker compose
config --quiet`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>