feat(ci): build nightly distribution images from source #6068

Draft
cdoern wants to merge 1 commit into ogx-ai:main from cdoern:feat/nightly-distro-images
@cdoern cdoern commented Jun 9, 2026

Collaborator

What

Adds a maintained nightly distribution-image build so users can pull a trustworthy "latest main" container, and retires the flaky test.pypi-based nightly docker build.

nightly-distro.yml (new)

  • Builds distribution images from the current main source tree (INSTALL_MODE=editable) every night at 02:00 UTC. Building from source removes the test.pypi propagation wait that was the main flake source in the old nightly.
  • Each architecture builds on its own native runner (ubuntu-24.04 and ubuntu-24.04-arm), with no QEMU emulation, so each image is boot-smoke-tested on its real architecture before it's trusted.
  • A push only happens if the image actually booted. Per-arch tags are then merged into a single multi-arch manifest via docker buildx imagetools create.
  • Tags: :nightly, :<YYYYMMDD>, :<short-sha>. :latest is intentionally not touched — it stays owned by the release pipeline and means "latest stable release". Pull :nightly to track main.
  • Per-PR gate: on PRs touching build-relevant paths, it builds + boot-smoke-tests the starter distro on amd64 only, no push — so a change that breaks server startup is caught before merge instead of at the next nightly. (providers-build.yml builds venv-only on PRs, so nothing booted a distro container per-PR before this.)
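
For reviewers who want to see the tagging and manifest-merge steps concretely, here is a minimal sketch. It is not the workflow's actual code: the helper names `nightly_tags` and `merge_command`, the 7-character short SHA, and the `-amd64` / `-arm64` per-arch tag suffixes are all assumptions made for illustration. `merge_command` only prints the `docker buildx imagetools create` invocation, it does not run it.

```bash
# Hypothetical helper: print the three published tags for one image repository.
# The date and commit SHA are passed in so the result is reproducible.
nightly_tags() {
  local repo="$1"
  local date_utc="$2"
  local sha="$3"
  local short_sha
  # Assumption: the short SHA is the first 7 characters of the commit SHA.
  short_sha="$(printf '%s' "$sha" | cut -c1-7)"
  printf '%s:nightly\n' "$repo"
  printf '%s:%s\n' "$repo" "$date_utc"
  printf '%s:%s\n' "$repo" "$short_sha"
}

# Hypothetical helper: print (not run) the manifest-merge command for one tag.
# Assumption: per-arch images were pushed as <tag>-amd64 and <tag>-arm64.
merge_command() {
  local tag="$1"
  printf 'docker buildx imagetools create --tag %s %s-amd64 %s-arm64\n' \
    "$tag" "$tag" "$tag"
}
```

Something like `nightly_tags ogxai/distribution-starter "$(date -u +%Y%m%d)" "$(git rev-parse HEAD)"` would list the tags for tonight's build, and each one can be fed to `merge_command` to see the merge that would follow a successful smoke test on both architectures.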

pypi.yml (changed)

  • Removes the schedule trigger from the publish-docker-images job. Nightly images are now built from source by nightly-distro.yml. Release and workflow_dispatch image builds are unchanged, and the nightly test.pypi package publish is untouched.

scripts/smoke-test-distro.sh (new)

  • Boots a built image with stubbed provider keys, polls /v1/health for OK, asserts /v1/models returns valid JSON, and dumps container logs on failure. Reusable locally and in CI.
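
The polling step can be sketched as a small retry loop. This is an illustration of the idea, not the contents of scripts/smoke-test-distro.sh: `wait_until` is a hypothetical helper, the probe is passed in as a command so the loop can be exercised without a running container, and the elapsed time is approximate because it counts only the sleep intervals. The interval must be at least 1 second, otherwise a failing probe would never time out.

```bash
# Hypothetical helper: retry a probe command until it succeeds or time runs out.
# usage: wait_until <timeout_seconds> <interval_seconds> <command...>
wait_until() {
  local timeout="$1"
  local interval="$2"
  shift 2
  local elapsed=0
  while [ "$elapsed" -le "$timeout" ]; do
    # Run the probe quietly; a zero exit status means "healthy".
    if "$@" >/dev/null 2>&1; then
      echo "healthy after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "not healthy after ${timeout}s" >&2
  return 1
}
```

Against a booted container this might be called as `wait_until 150 5 curl -fsS http://localhost:8321/v1/health`, where the 5-second interval is an assumption. A follow-up check such as `curl -fsS http://localhost:8321/v1/models | python3 -m json.tool >/dev/null` is one way to assert that the models endpoint returns valid JSON.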

Test plan

Static checks:

  • actionlint on both workflows — clean
  • shellcheck scripts/smoke-test-distro.sh — clean
  • Matrix resolution simulated for pull_request / schedule / workflow_dispatch:
    • PR → [starter/amd64], push=false
    • schedule → [starter/amd64] [starter/arm64] [postgres-demo/amd64] [postgres-demo/arm64], push=true
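
The two resolutions listed above can be reproduced with a small shell sketch. `resolve_matrix` is a hypothetical function written for this description, not part of the workflow, and it covers only the pull_request and schedule cases whose results are listed here. Any other event is reported as unhandled rather than guessed.

```bash
# Hypothetical helper: print the push flag and the matrix entries for an event.
resolve_matrix() {
  local event="$1"
  local distro arch
  case "$event" in
    pull_request)
      # PR gate: starter on amd64 only, never pushed.
      echo "push=false"
      echo "starter/amd64"
      ;;
    schedule)
      # Nightly: the curated set on both architectures, pushed after the smoke test.
      echo "push=true"
      for distro in starter postgres-demo; do
        for arch in amd64 arm64; do
          echo "$distro/$arch"
        done
      done
      ;;
    *)
      echo "unhandled event: $event" >&2
      return 2
      ;;
  esac
}
```

Running `resolve_matrix schedule` prints `push=true` followed by the four distro/arch pairs, which matches the schedule row in the list.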

End-to-end, locally (native arm64):

docker buildx build --load -f containers/Containerfile \
  --build-arg INSTALL_MODE=editable --build-arg DISTRO_NAME=starter \
  --tag ogxai/distribution-starter:smoke-local .
bash scripts/smoke-test-distro.sh ogxai/distribution-starter:smoke-local

Output:

Waiting for http://localhost:8321/v1/health (up to 150s)...
Server is healthy after 15s
Checking /v1/models endpoint...
Smoke test passed for ogxai/distribution-starter:smoke-local
EXIT CODE = 0

(The 401 provider model-refresh errors from the dummy keys are expected and non-fatal — the server still starts and serves.)

Notes / follow-ups

  • Requires ubuntu-24.04-arm GitHub-hosted runners enabled for the org.
  • Reuses existing DOCKERHUB_USERNAME / DOCKERHUB_TOKEN secrets — nothing new to provision.
  • Curated nightly set is starter,postgres-demo; expand later as needed.

🤖 Generated with Claude Code

Add a nightly workflow that builds distribution container images from the
current main source tree (INSTALL_MODE=editable), boot-smoke-tests each image
on a native per-arch runner, and publishes a multi-arch manifest to DockerHub.
Building from source removes the test.pypi propagation wait that made the
previous nightly docker builds flaky.

Images are published as :nightly, :<date>, and :<short-sha>. The :latest tag
is deliberately left to the release pipeline so it continues to mean "latest
stable release"; pull :nightly to track main. On pull requests touching
build-relevant paths the workflow builds and boot-smoke-tests the starter
distro on amd64 without pushing, so startup regressions are caught before
merge.

Remove the schedule trigger from the pypi.yml docker job, since nightly images
are now built from source here. Release and workflow_dispatch image builds in
pypi.yml are unchanged, and the nightly test.pypi package publish is untouched.

Add scripts/smoke-test-distro.sh, which boots a built image, waits for
/v1/health, and verifies /v1/models responds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>