
Codex/topology separation research#1655

Open
haowu1234 wants to merge 12 commits into vllm-project:main from haowu1234:codex/topology-separation-research

Conversation

Collaborator

@haowu1234 haowu1234 commented Mar 25, 2026

Summary

This PR adds an opt-in split local runtime topology for vllm-sr serve while preserving the existing local default path.

Instead of replacing the current monolithic vllm-sr-container flow, the split topology is now gated behind explicit local-dev controls:

  • vllm-sr serve --topology split
  • VLLM_SR_TOPOLOGY=split

This keeps the blast radius small for existing local, Helm, Kubernetes, and operator workflows while allowing us to iterate on the split topology safely.
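The gating described above can be sketched as a small resolution helper (the precedence order, CLI flag over environment variable over legacy default, and the function name are assumptions for illustration, not the actual CLI code):

```python
import os

LEGACY, SPLIT = "legacy", "split"

def resolve_topology(cli_flag=None, env=None):
    """Pick the local runtime topology.

    Assumed precedence: explicit --topology flag, then the
    VLLM_SR_TOPOLOGY environment variable, then the legacy default.
    """
    env = os.environ if env is None else env
    value = cli_flag or env.get("VLLM_SR_TOPOLOGY") or LEGACY
    if value not in (LEGACY, SPLIT):
        raise ValueError(f"unknown topology: {value!r}")
    return value
```

With no flag and no environment variable, this resolves to `legacy`, which preserves the existing single-container path as the default.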

What changed

1. Add opt-in split local topology

  • Keep legacy single-container local runtime as the default
  • Add explicit split topology selection in the CLI
  • Route local startup through either:
    • legacy vllm-sr-container
    • split router/envoy/dashboard containers

2. Add role-specific local runtime images

  • Introduce dedicated local runtime images for:
    • router
    • envoy
    • dashboard
  • Keep compatibility image support for legacy local runtime
  • Wire image selection through the local CLI and Make-based workflows
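Using the image names visible in the workflow matrix later in this PR (vllm-sr-router, vllm-sr-router-rocm, vllm-sr-envoy, dashboard), the per-role selection might look roughly like this; the function name, tag handling, and exact mapping are illustrative assumptions:

```python
def local_images(topology, platform="cpu", tag="latest"):
    """Map runtime roles to local image names (illustrative sketch).

    The legacy topology keeps the single compatibility image; the
    split topology uses one image per role, with a ROCm variant of
    the router image for the AMD platform.
    """
    suffix = "-rocm" if platform == "amd" else ""
    if topology == "legacy":
        return {"all-in-one": f"vllm-sr{suffix}:{tag}"}
    return {
        "router": f"vllm-sr-router{suffix}:{tag}",
        "envoy": f"vllm-sr-envoy:{tag}",
        "dashboard": f"dashboard:{tag}",
    }
```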

3. Preserve setup/config/deploy behavior under split topology

  • Support dashboard-first setup mode with router/envoy on standby
  • Switch split setup activation to container lifecycle operations
  • Preserve config apply / runtime config sync semantics
  • Preserve router hot reload behavior after config updates

4. Fix split-topology local runtime regressions discovered during rollout

  • Restore dashboard auth SQLite bootstrap
  • Mount the dashboard Docker socket correctly before image args
  • Include huggingface-cli in router runtime images
  • Build dashboard WASM assets in both dev and runtime image paths
  • Harden Envoy readiness/status detection in split topology

5. Make split local dev faster and safer

  • VLLM_SR_TOPOLOGY=split now defaults local dev to skipping compatibility-image rebuilds
  • explicit override remains available with SKIP_COMPAT_IMAGE=0
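The default-flip with explicit override can be sketched as follows (the helper name and the Make-to-CLI plumbing are assumptions; only the variable names come from this PR):

```python
import os

def skip_compat_image(env=None):
    """Decide whether to skip rebuilding the compatibility image.

    SKIP_COMPAT_IMAGE, when set, always wins ("0" forces a rebuild,
    anything else skips). Otherwise the split topology defaults to
    skipping and the legacy topology defaults to rebuilding.
    """
    env = os.environ if env is None else env
    override = env.get("SKIP_COMPAT_IMAGE")
    if override is not None:
        return override != "0"
    return env.get("VLLM_SR_TOPOLOGY") == "split"
```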

Why

The goal is to make local topology separation production-like enough for deep development and debugging without breaking the current developer path.

This PR intentionally optimizes for:

  • minimal disruption to existing users
  • explicit rollout control
  • easier regression isolation
  • safer future extension into other deployment surfaces

Validation

Validated through the repo-native harness and local smoke paths, including:

  • make agent-validate
  • make vllm-sr-test
  • focused pytest coverage for CLI topology/image selection
  • local smoke on legacy topology
  • local smoke on explicit split topology
  • dashboard / setup / runtime handler tests for split container lifecycle

Usage

Legacy local runtime remains unchanged:

make vllm-sr-dev
vllm-sr serve --image-pull-policy never

AMD legacy local runtime:

make vllm-sr-dev VLLM_SR_PLATFORM=amd
vllm-sr serve --image-pull-policy never --platform amd

Opt into split local topology explicitly:

make vllm-sr-dev VLLM_SR_TOPOLOGY=split
vllm-sr serve --image-pull-policy never --topology split

Opt into split local topology on AMD:

make vllm-sr-dev VLLM_SR_PLATFORM=amd VLLM_SR_TOPOLOGY=split
vllm-sr serve --image-pull-policy never --platform amd --topology split

If a split-topology debug loop still needs the compatibility image rebuilt:

make vllm-sr-dev VLLM_SR_TOPOLOGY=split SKIP_COMPAT_IMAGE=0
make vllm-sr-dev VLLM_SR_PLATFORM=amd VLLM_SR_TOPOLOGY=split SKIP_COMPAT_IMAGE=0

Useful local runtime commands:

vllm-sr status all
vllm-sr logs router -f
vllm-sr logs envoy -f
vllm-sr logs dashboard -f
vllm-sr stop

Notes

This PR is intentionally scoped to local runtime behavior.

It does not migrate Helm / Kubernetes / operator deployment topologies to the split model. The change is designed so those surfaces are not directly restructured by this rollout.

Related to #1508


netlify bot commented Mar 25, 2026

Deploy Preview for vllm-semantic-router ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 50df862 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/69c4cdacd0130b0008cedfe5 |
| 😎 Deploy Preview | https://deploy-preview-1655--vllm-semantic-router.netlify.app |

Contributor

github-actions bot commented Mar 25, 2026

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/docker-publish.yml
  • .github/workflows/integration-test-memory.yml
  • .github/workflows/integration-test-vllm-sr-cli.yml
  • docs/agent/environments.md
  • docs/agent/plans/README.md
  • docs/agent/plans/pl-0015-local-runtime-topology-separation.md
  • docs/agent/plans/pl-0016-local-runtime-three-image-rollout.md

📁 dashboard

Owners: @JaredforReal, @Xunzhuo
Files changed:

  • dashboard/backend/Dockerfile
  • dashboard/backend/handlers/logs.go
  • dashboard/backend/handlers/logs_test.go
  • dashboard/backend/handlers/openclaw.go
  • dashboard/backend/handlers/openclaw_helpers.go
  • dashboard/backend/handlers/openclaw_image_test.go
  • dashboard/backend/handlers/openclaw_provision.go
  • dashboard/backend/handlers/openclaw_test.go
  • dashboard/backend/handlers/runtime_config_apply.go
  • dashboard/backend/handlers/runtime_config_apply_test.go
  • dashboard/backend/handlers/runtime_config_sync.go
  • dashboard/backend/handlers/runtime_config_sync_test.go
  • dashboard/backend/handlers/runtime_managed_container_lifecycle.go
  • dashboard/backend/handlers/runtime_managed_container_lifecycle_test.go
  • dashboard/backend/handlers/runtime_managed_services.go
  • dashboard/backend/handlers/runtime_managed_services_test.go
  • dashboard/backend/handlers/setup.go
  • dashboard/backend/handlers/setup_test.go
  • dashboard/backend/handlers/status_collectors.go
  • dashboard/backend/handlers/status_collectors_test.go
  • dashboard/backend/handlers/status_runtime.go

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/config/config.agent-smoke.amd.yaml
  • e2e/config/config.agent-smoke.cpu.yaml
  • e2e/testing/run_memory_integration.sh
  • e2e/testing/vllm-sr-cli/README.md
  • e2e/testing/vllm-sr-cli/cli_test_base.py
  • e2e/testing/vllm-sr-cli/run_cli_tests.py
  • e2e/testing/vllm-sr-cli/test_integration.py
  • e2e/testing/vllm-sr-cli/test_unit_runtime_topology.py
  • e2e/testing/vllm-sr-cli/test_unit_serve.py

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/modeldownload/config_parser_test.go
  • src/vllm-sr/Dockerfile
  • src/vllm-sr/Dockerfile.envoy
  • src/vllm-sr/Dockerfile.rocm
  • src/vllm-sr/Dockerfile.router
  • src/vllm-sr/Dockerfile.router.rocm
  • src/vllm-sr/Makefile
  • src/vllm-sr/cli/commands/runtime.py
  • src/vllm-sr/cli/commands/runtime_support.py
  • src/vllm-sr/cli/config_generator.py
  • src/vllm-sr/cli/consts.py
  • src/vllm-sr/cli/core.py
  • src/vllm-sr/cli/docker_backend.py
  • src/vllm-sr/cli/docker_images.py
  • src/vllm-sr/cli/docker_run_command.py
  • src/vllm-sr/cli/docker_runtime.py
  • src/vllm-sr/cli/docker_services.py
  • src/vllm-sr/cli/docker_start.py
  • src/vllm-sr/cli/runtime_lifecycle.py
  • src/vllm-sr/cli/runtime_stack.py
  • src/vllm-sr/cli/runtime_topology.py
  • src/vllm-sr/cli/templates/envoy.template.yaml
  • src/vllm-sr/rebuild-and-test.sh
  • src/vllm-sr/start-dashboard.sh
  • src/vllm-sr/start-envoy.sh
  • src/vllm-sr/start-router.sh
  • src/vllm-sr/tests/test_cli_main.py
  • src/vllm-sr/tests/test_config_generator.py
  • src/vllm-sr/tests/test_core_status.py
  • src/vllm-sr/tests/test_deployment_backend.py
  • src/vllm-sr/tests/test_docker_images.py
  • src/vllm-sr/tests/test_docker_runtime.py
  • src/vllm-sr/tests/test_local_dev_make_surface.py
  • src/vllm-sr/tests/test_makefile_surface.py
  • src/vllm-sr/tests/test_openclaw_shared_network.py
  • src/vllm-sr/tests/test_router_dockerfile_surface.py
  • src/vllm-sr/tests/test_runtime_topology.py
  • src/vllm-sr/tests/test_split_runtime_stack.py

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/agent/repo-manifest.yaml
  • tools/make/agent.mk
  • tools/make/dashboard.mk
  • tools/make/docker.mk


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@haowu1234 haowu1234 requested review from Xunzhuo and removed request for Xunzhuo March 25, 2026 13:06
Contributor

github-actions bot commented Mar 25, 2026

✅ Supply Chain Security Report — All Clear

| Scanner | Findings |
| --- | --- |
| AST Codebase Scan (Py, Go, JS/TS, Rust) | 28 finding(s) — MEDIUM: 21 · LOW: 7 |
| AST PR Diff Scan | 1 finding(s) — LOW: 1 |
| Regex Fallback Scan | No issues detected |

Findings in this PR's diff

1 finding(s)

| Severity | File | Line | Description |
| --- | --- | --- | --- |
| 🔵 LOW | dashboard/backend/handlers/openclaw_test.go | 529 | Function 'TestResolveOpenClawModelBaseURL_TargetEnvoyWinsOverRouterConfig' has high source entropy (5.51 bits/byte) |

Scanned at 2026-03-26T04:34:30.537Z · View full workflow logs

@haowu1234 haowu1234 force-pushed the codex/topology-separation-research branch from 9b36500 to 19773e6 Compare March 25, 2026 13:32
Signed-off-by: haowu1234 <13258260125@163.com>
strategy:
  matrix:
-   image: [dashboard, extproc, extproc-rocm, llm-katan, vllm-sr, vllm-sr-rocm, vllm-sr-sim]
+   image: [dashboard, extproc, extproc-rocm, llm-katan, vllm-sr, vllm-sr-envoy, vllm-sr-rocm, vllm-sr-router, vllm-sr-router-rocm, vllm-sr-sim]
Member
I don't think we should add more images. We already have vllm-sr (now router-only) and the dashboard image, and Envoy ships its own upstream images that we shouldn't maintain ourselves.

Collaborator Author

> we shouldnt increase the images i think, we already have vllm-sr(now: router only), and dashboard image, and envoy has its own images and we shouldnt maintain

ok

@haowu1234 haowu1234 force-pushed the codex/topology-separation-research branch from 1e27e46 to b874bc4 Compare March 26, 2026 02:38
qingyangwu and others added 4 commits March 26, 2026 11:21
Signed-off-by: qingyangwu <qingyangwu@tencent.com>
Signed-off-by: haowu1234 <13258260125@163.com>
Signed-off-by: qingyangwu <qingyangwu@tencent.com>
Signed-off-by: qingyangwu <qingyangwu@tencent.com>