Add Docker Compose deployment workflow#65
Conversation
|
Hi @wangwanjie! 🎉 Thanks for this comprehensive Docker deployment contribution — great to see a production-ready Alpine runtime with multi-arch support, health checks, and memory limits built in! I will run a deep review across code correctness + deployment design and get back to you within 24h. Join our community while you wait:
Thanks for making open-design better! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cc7d4ee173
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| image_sources=() | ||
|
|
||
| for platform in "${platform_list[@]}"; do | ||
| ensure_base_images_preloaded "$platform" |
There was a problem hiding this comment.
Gate base-image preloading by push strategy
The main loop always calls ensure_base_images_preloaded before each build, and that helper always invokes skopeo copy when PRELOAD_BASE_IMAGES=1. This makes --push_strategy buildx fail on hosts without skopeo, even though the script advertises buildx as an alternative push path. In practice, users selecting buildx to avoid skopeo still hit a hard failure before any build starts.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This is already handled in the current script: ensure_base_images_preloaded calls should_use_preloaded_base_images first, and that helper only returns true when PRELOAD_BASE_IMAGES=1 and PUSH_STRATEGY=skopeo. With --push_strategy buildx, the preload path returns before requiring or invoking skopeo.
| [[ "$INSPECT_AFTER_PUSH" == "1" ]] || return 0 | ||
|
|
||
| if [[ "$PUSH_STRATEGY" == "skopeo" ]]; then | ||
| skopeo inspect --raw "docker://${image}" >/dev/null |
There was a problem hiding this comment.
Reuse authfile for post-push skopeo inspection
After a successful skopeo push, the script validates the remote image with skopeo inspect --raw but does not pass --authfile. When credentials come from credsStore (and are materialized into EFFECTIVE_SKOPEO_AUTHFILE earlier), private repositories can push successfully and then fail at inspection with an auth error, causing false-negative publish failures.
Useful? React with 👍 / 👎.
| os="${platform%/*}" | ||
| arch="${platform#*/}" | ||
| if ! jq -e --arg os "$os" --arg arch "$arch" ' | ||
| .mediaType == "application/vnd.docker.distribution.manifest.list.v2+json" and |
There was a problem hiding this comment.
Accept OCI index manifests in platform verifier
The manifest verifier hard-requires Docker manifest-list media type before checking platform entries. Valid multi-arch images published as OCI image indexes are therefore reported as missing platforms even when the manifests[] entries are correct, which can block release verification for compliant OCI images.
Useful? React with 👍 / 👎.
lefarcen
left a comment
There was a problem hiding this comment.
Hey @wangwanjie, thanks for the PR! 🎉 This is a solid foundation for Docker deployment — Alpine multi-stage build, multi-arch support, memory limits, health checks, and thorough verification scripts. The structure is clean and the test plan is comprehensive.
Found 5 P1 security/correctness issues that need fixes before merge, plus several P2/P3 improvements. The core concerns:
- Shell injection risk in publish-images.sh (unquoted user-supplied variables)
- Docker credential exposure via credential helpers (cleartext secrets in temp file)
- TOCTOU race in temp file handling
- Missing input validation on critical paths
- Container escape risk via --add-host in untrusted environments
P1 Issues (must fix)
deploy/scripts/publish-images.sh:218 - Credential exposure via cleartext temp file
When Docker uses a credential helper, this code extracts the secret and writes it to a temp file in cleartext. If the process crashes before cleanup, the temp file persists with credentials readable by any user.
Fix: Use mktemp --tmpdir=/dev/shm/... (memory-backed) + chmod 600 immediately.
deploy/scripts/publish-images.sh:99 - Shell injection via unquoted variable
proxy_url (from HTTP_PROXY env) is passed unquoted into Perl. Add input validation: [[ ! "$proxy_url" =~ ^https?://[a-zA-Z0-9._-]+(:[0-9]+)?$ ]]
deploy/scripts/publish-images.sh:240 - TOCTOU race in temp directory
Use rm -rfP to avoid symlink attacks, or verify directory before deletion.
deploy/scripts/publish-images.sh:322 - Missing input validation on --image_namespace/--image_repository
Validate: [[ ! "$IMAGE_NAMESPACE" =~ ^[a-zA-Z0-9_-]+$ ]] to prevent path traversal in image refs.
deploy/scripts/publish-images.sh:319 - Container escape risk via --add-host
--add-host host.docker.internal=host-gateway exposes host network. Make opt-in or document security assumption.
P2 Issues (recommended)
- Dockerfile:34 - Native module stripping may break better-sqlite3. Either test after stripping or skip .node files.
- docker-compose.yml:20 - Add
security_opt: no-new-privileges+read_only: truefor defense in depth. - .env.example:9 - Document tested memory workloads (384m may OOM on large exports).
P3 Improvements
- server.js:1087 - IPv6 :: bind shows misleading 'localhost' URL
- README.md:23 - Add guidance on agent CLI installation (volume mount vs derived image)
Most P1s can be fixed with input validation + proper quoting. Let me know if you need clarification on any of these!
Harden publishing inputs and temporary credential handling, and tighten Docker runtime defaults requested by the PR review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the detailed review. I've pushed fixes for the reported security/correctness issues:
Validation run:
I could not run |
lefarcen
left a comment
There was a problem hiding this comment.
Excellent work, @wangwanjie! 🎉 All P1 security issues have been fixed comprehensively:
✅ Shell injection - Proxy URL validation with regex checks
✅ Credential exposure - Memory-backed temp roots ( or 700-perms fallback) + on authfiles
✅ TOCTOU race - Centralized temp dir management with cleanup validation
✅ Input validation - Namespace/repository validation with Docker name component regex
✅ Container escape - now conditional (only when proxy rewrite needed)
✅ P2/P3 fixes:
- Native .node stripping removed from Dockerfile
- IPv6 listen host formatting ( for IPv6 any)
- Compose hardening:
read_only: true,tmpfs,no-new-privileges - Skopeo auth reused for post-push inspection
- Base-image preloading gated to skopeo strategy
The implementation is solid — temp file handling is now defense-in-depth (private roots + restrictive perms + centralized cleanup), input validation is thorough, and the Compose runtime is properly hardened.
Remaining optional improvements (P3, non-blocking):
-
deploy/.env.example memory docs (P2 → P3 after your validation work) — Consider adding a workload note like:
-
deploy/README.md agent CLI installation — Add the options subsection I suggested (volume-mount vs derived image)
But these are polish — the core security/correctness is now excellent. Approving! 🚀
lefarcen
left a comment
There was a problem hiding this comment.
Excellent work, @wangwanjie! 🎉 All P1 security issues have been fixed comprehensively:
✅ Shell injection - Proxy URL validation with regex checks
✅ Credential exposure - Memory-backed temp roots (/dev/shm or 700-perms fallback) + chmod 600 on authfiles
✅ TOCTOU race - Centralized temp dir management with cleanup validation
✅ Input validation - Namespace/repository validation with Docker name component regex
✅ Container escape - --add-host now conditional (only when proxy rewrite needed)
✅ P2/P3 fixes:
- Native .node stripping removed from Dockerfile
- IPv6 listen host formatting for :: bind
- Compose hardening: read_only, tmpfs, no-new-privileges
- Skopeo auth reused for post-push inspection
- Base-image preloading gated to skopeo strategy
The implementation is solid — temp file handling is now defense-in-depth (private roots + restrictive perms + centralized cleanup), input validation is thorough, and the Compose runtime is properly hardened.
Remaining optional improvements (P3, non-blocking):
- deploy/.env.example memory docs — Consider adding a workload note (idle ~20 MB; single 10 MB export peaks ~180 MB; concurrent agent +50 MB each → 384m supports ~3 concurrent + moderate exports).
- deploy/README.md agent CLI installation — Add the options subsection I suggested (volume-mount vs derived image).
But these are polish — the core security/correctness is now excellent. Approving! 🚀
lefarcen
left a comment
There was a problem hiding this comment.
All security issues resolved! This is ready to merge from a security/correctness perspective. The Docker deployment is production-ready. 🎉
|
Hi @wangwanjie! 👋 The PR has a merge conflict that needs to be resolved before we can merge. Could you please: git checkout main
git pull origin main
git checkout docker-deploy
git rebase main
# Resolve any conflicts
git push -f origin docker-deployOnce the conflicts are resolved, we'll be ready to merge — the code review is already approved! ✅ |
|
Hi @wangwanjie, your PR has a merge conflict with main. Could you rebase and resolve it? Once the conflict is fixed, we can merge immediately — all security issues have been addressed! 🚀 |
Resolve the daemon monorepo migration conflict and adapt Docker deployment files to the new workspace layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Set CI=true during the image build so pnpm prune can run non-interactively inside Docker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
||
| FROM ${RUNTIME_IMAGE} | ||
|
|
||
| RUN apk add --no-cache nodejs tini && \ |
There was a problem hiding this comment.
@wangwanjie Thanks for putting this Docker workflow together — the overall direction is really useful. One thing I’d tighten before merge: the build stage uses Node 24, but the runtime stage installs nodejs from Alpine 3.22, which may not be Node 24. Since better-sqlite3 is native and built during install, running it under a different Node major/ABI can cause startup failures. Could we use a Node 24 runtime image too, or otherwise pin/install exactly Node 24 in the final stage? 🙏
| for required_path in \ | ||
| "app/apps/daemon/server.js" \ | ||
| "app/apps/web/out/index.html" \ | ||
| "app/node_modules/better-sqlite3" \ |
There was a problem hiding this comment.
Small verification follow-up: in this pnpm workspace, better-sqlite3 is a daemon package dependency, so the direct workspace symlink should be under app/apps/daemon/node_modules/better-sqlite3 rather than app/node_modules/better-sqlite3. Updating this check would make the verifier match the actual runtime layout and avoid false failures. 😊
Use pnpm deploy for the daemon package so the runtime image includes production dependencies where Node resolves them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allow pnpm v10 deploy to package the daemon workspace without requiring injected workspace packages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Use Node 24 for both build and runtime stages and update image verification for the workspace daemon dependency layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Hi @wangwanjie! The bot review on this PR is positive overall, but there are now merge conflicts with main (about 44h since the last update). Could you rebase against latest main? If you'd like a hand walking through the rebase, just reply here. Once conflicts are resolved, a maintainer can take a final look and merge. If you've moved on, no worries — feel free to close. Thanks for the contribution! |
Adapt Docker packaging to the latest workspace layout and prevent single-arch publishes from overwriting the multi-arch latest tag.
|
Updated this PR with the latest Changes included:
Validation completed:
Current published The deployed Docker Compose path has also been manually verified behind a reverse proxy using |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6a232527b0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ARCHIVE_CONTAINER_ID="" | ||
|
|
||
| for required_path in \ | ||
| "app/apps/daemon/server.js" \ |
There was a problem hiding this comment.
Verify daemon artifact under dist/ instead of repo root
The required path check expects app/apps/daemon/server.js, but this image runs from apps/daemon/dist/cli.js and the daemon package only ships dist outputs, so this assertion can fail even for a correct runtime image. That creates a false-negative in the release gate and blocks publish verification despite a bootable image.
Useful? React with 👍 / 👎.
| fi | ||
| done | ||
|
|
||
| runtime_tools="$(docker run --rm --entrypoint sh "$IMAGE_REF" -lc 'for tool in python3 g++ make npm pnpm; do if command -v "$tool" >/dev/null 2>&1; then echo "$tool"; fi; done')" |
There was a problem hiding this comment.
Remove npm from forbidden runtime tool list
The runtime tool check treats npm as disallowed, but the chosen runtime base (node:24-alpine) includes npm by default, so this condition will fail for normal images and turn verification into a permanent false alarm. As written, valid builds are rejected before health checks complete.
Useful? React with 👍 / 👎.
Use OD_BIND_HOST as the single daemon bind-host setting for Docker deployment and origin validation.
|
Follow-up update: removed the legacy
Validation rerun:
|
lefarcen
left a comment
There was a problem hiding this comment.
Hey @wangwanjie, thanks for the thorough rework! 👏
All the P1 security issues from the previous review are now properly addressed:
- ✅ Proxy URL validation with strict regex (scheme://host:port only)
- ✅ Credential temp file handling hardened (
chmod 600+ private temp roots in/dev/shmor700-mode fallback) - ✅ Image namespace/repository validation (
validate_image_name_componentwith lowercase Docker naming rules) - ✅ Centralized temp cleanup with whitelist patterns
- ✅
OD_BIND_HOST/OD_ALLOWED_ORIGINSsupport for reverse-proxy deployments added to daemon
The Docker deployment workflow is production-ready from a security standpoint. The architecture changes (workspace layout adaptation, multi-arch manifest handling, origin validation) align well with the current main branch.
Ship when ready from my side — deferring final approval to a maintainer.
Check the packaged daemon dist entrypoint and allow npm from the Node 24 runtime image while still rejecting build-only tools.
|
Addressed the latest non-outdated verifier feedback:
Validation:
The older base-image preloading thread is already handled by |
lefarcen
left a comment
There was a problem hiding this comment.
Verifier refinements look good — daemon entrypoint path and npm runtime check both match the current workspace layout.
All prior security findings remain fixed; this commit only improves the verification script itself. Ship-ready from my side — deferring final approval to a maintainer.
lefarcen
left a comment
There was a problem hiding this comment.
Hey @wangwanjie, reviewed the private LAN origin support — the core security logic looks solid! 🔒
What's good:
- IP range checks are correct per RFC 1918 + link-local (10.x, 172.16-31.x, 192.168.x, 169.254.x)
- The
isAllowedBrowserOriginhostname matching (parsedOrigin.hostname === requestHost.hostname) correctly prevents DNS rebinding attacks when the browser is on a private LAN IP - The new test cases cover the critical security boundary (Origin≠Host rejection)
Suggestions (non-blocking):
- Test coverage gaps (P2): Only 192.168.x.x is tested. Consider adding test cases for:
- 10.x.x.x range
- 172.16-31.x.x range (and boundary rejection: 172.15.x.x / 172.32.x.x)
- 169.254.x.x link-local
- Malformed IPs (e.g., "192.168.1.256" should be rejected)
- Code duplication (P3): The validation helpers are duplicated in both
server.tsand the test file. Future maintenance risk if production logic changes but test helpers don't. Consider extracting to a shared module or importing from server.ts in tests (if test isolation allows).
Verdict: Ship-ready from security/correctness perspective. The suggestions above are quality-of-life improvements for future maintainability.
Thanks for the clean refactor and thorough test coverage! 🙌
Move browser origin validation into a shared daemon module so tests exercise the production logic and cover the remaining private LAN edge cases.
|
Addressed the latest Suggestions in
Validation:
The PR branch has been pushed. Docker image publishing is running now; I will follow up with the new digest once it completes. |
|
Docker publish follow-up: Published and verified Current Validation:
|
mrcfps
left a comment
There was a problem hiding this comment.
@wangwanjie thanks for the continued Docker deployment work and for addressing the earlier review feedback 🙏. I found two blockers in the Compose deployment path: the default published port exposes the unauthenticated daemon beyond localhost, and the documented host-port override currently breaks same-origin browser API calls. Details are inline.
Generated by Looper 0.6.1 · runner=reviewer · agent=opencode| OD_ALLOWED_ORIGINS: ${OPEN_DESIGN_ALLOWED_ORIGINS:-} | ||
| OD_PORT: 7456 | ||
| ports: | ||
| - "${OPEN_DESIGN_PORT:-7456}:7456" |
There was a problem hiding this comment.
This default port mapping publishes the daemon on every host interface. The new docs tell users to set OPEN_DESIGN_ALLOWED_ORIGINS for public-IP/domain/reverse-proxy deployments, but that is only a browser Origin filter: the daemon still allows requests with no Origin header through the /api middleware, so any curl/server-side client that can reach this port can call the unauthenticated API. That makes the default Compose deployment unsafe on machines with a public or shared LAN interface. Please bind localhost by default, e.g. 127.0.0.1:${OPEN_DESIGN_PORT:-7456}:7456, and document putting an authenticated reverse proxy/VPN in front before exposing it remotely (or add real daemon auth for deployed mode).
| NODE_OPTIONS: ${NODE_OPTIONS:---max-old-space-size=192} | ||
| OD_BIND_HOST: 0.0.0.0 | ||
| OD_ALLOWED_ORIGINS: ${OPEN_DESIGN_ALLOWED_ORIGINS:-} | ||
| OD_PORT: 7456 |
There was a problem hiding this comment.
OPEN_DESIGN_PORT can publish the service on a different external port, but the container only advertises OD_PORT=7456 to the daemon. The origin checks only accept OD_PORT (plus OD_WEB_PORT), so the documented OPEN_DESIGN_PORT=8080 override makes browser requests arrive with Host/Origin on :8080 and get rejected as cross-origin even though the page is same-origin from the browser's perspective. Please pass the published port into the allowed-port set, for example by adding OD_WEB_PORT: ${OPEN_DESIGN_PORT:-7456} in this Compose environment (or by otherwise deriving the external origin/port for the daemon).
Bind the Compose deployment to localhost by default and pass the published port through to the daemon origin checks so host-port overrides remain same-origin.
|
Addressed the latest Compose deployment blockers and pushed the branch. Changes:
Commits:
Validation:
|
mrcfps
left a comment
There was a problem hiding this comment.
@wangwanjie I reviewed the current Docker Compose deployment changes, including the localhost-only port binding, OD_WEB_PORT propagation for custom published ports, shared daemon origin-validation helpers and tests, the Docker runtime image layout, and the publish/verify scripts. The previously blocking Compose exposure and host-port origin issues are addressed, and I did not find any remaining actionable correctness, safety, or maintainability concerns in the changed ranges. Thanks for the careful follow-through and validation work 🙏
|
Follow-up after merging latest
Validation:
Pushed merge commit |
mrcfps
left a comment
There was a problem hiding this comment.
@wangwanjie I reviewed the current Docker Compose deployment head, including the localhost-only Compose binding, OD_WEB_PORT custom-port propagation, shared daemon origin-validation helper and coverage, the Node 24 Docker runtime layout, and the publish/verify scripts. The previously raised Compose exposure and custom-port origin issues remain addressed after the latest main merge, and I did not find any remaining actionable correctness, safety, or maintainability concerns in the changed ranges. Thanks for the careful follow-through and validation work 🙏
Summary
/app/.odstorage, health checks, memory limits, and.env.exampledefaults.Test Plan
bash -n deploy/scripts/publish-images.sh && bash -n deploy/scripts/verify-image.sh && bash -n deploy/scripts/verify-image-manifest.shdocker compose --env-file deploy/.env.example -f deploy/docker-compose.yml configdeploy/scripts/publish-images.sh --dry_run --arch arm64 --image_tag pr-checkgit diff --checkdocker buildx build --platform linux/arm64 -t open-design:pr-check -f deploy/Dockerfile --load .deploy/scripts/verify-image.sh open-design:pr-checkPublished Image
docker.io/vanjayak/open-design:latestlinux/amd64andlinux/arm64.