[Doc] Refresh homepage architecture and research content #1659
Xunzhuo merged 5 commits into vllm-project:main
Conversation
Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
✅ Deploy Preview for vllm-semantic-router ready!
👥 vLLM Semantic Team Notification: The following members have been identified for the changed files in this PR and have been automatically assigned.
✅ Supply Chain Security Report — All Clear
Pull request overview
Refreshes the website homepage + docs architecture narrative to reflect the updated “signal → projection → decision → plugin” mental model, and updates the publications dataset used across the site.
Changes:
- Update homepage stats/capability copy and add a new “projection” capability layer (including a new glyph).
- Expand PaperFigureShowcase and Chinese i18n strings to describe the four-layer architecture and 14-signal taxonomy.
- Add a new research paper entry and update overview/tutorial docs to reference the 14 maintained signal families.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| website/src/pages/index.tsx | Updates homepage stats (signals/papers) and capability copy to match the refreshed architecture. |
| website/src/data/researchContent.js | Adds a new research paper entry used by homepage/publications components. |
| website/src/components/site/CapabilityGlyph.tsx | Adds a new projection glyph kind and renderer. |
| website/src/components/PaperFigureShowcase/index.tsx | Updates figure copy and the interactive architecture panel to a 4-layer flow and 14-signal taxonomy. |
| website/src/components/PaperFigureShowcase/index.module.css | Adjusts the figure layer grid to support 4 layers (extra connector/column). |
| website/i18n/zh-Hans/code.json | Updates zh-Hans translations for the revised figure copy and adds new figure keys. |
| website/docs/tutorials/signal/overview.md | Updates signal catalog summary and table formatting for 14 families (5 heuristic, 9 learned). |
| website/docs/overview/semantic-router-overview.md | Updates the overview architecture diagram/sections to include projection coordination and plugin dispatch. |
| website/docs/overview/goals.md | Refreshes goals copy to reference 14 families and projection coordination. |
| website/docs/overview/collective-intelligence.md | Updates diagrams and examples to include projection coordination and updated signal counts. |
| website/docs/intro.md | Refreshes the “Core System” framing to include projections and updated signal families list. |
| website/blog/2026-03-25-vllm-sr-on-amd-developer-cloud.md | Updates blog copy to match the refreshed routing architecture language and linking. |
In `website/blog/2026-03-25-vllm-sr-on-amd-developer-cloud.md`, under "### 3. Personal AI and Local Personal Agents", the PR changes:

```diff
- The third opportunity is personal AI. Once routing, privacy, and reasoning are expressed as policy, an AMD-hosted stack can support assistants that feel more personal and more controlled. A personal AI system can keep ordinary tasks, memory-aware follow-ups, and private context on a local lane, while only escalating special cases when explicitly permitted.
+ The third opportunity is personal AI like deploying a personal model on AMD AI MAX+ and connecting to external Models as needed. Once routing, privacy, and reasoning are expressed as policy, an AMD-hosted stack can support assistants that feel more personal and more controlled. A personal AI system can keep ordinary tasks, memory-aware follow-ups, and private context on a local lane, while only escalating special cases when explicitly permitted.
```

Copilot comment: Minor style/grammar: "connecting to external Models" capitalizes "models" mid-sentence, and the sentence reads as a run-on. Consider lowercasing "models" and splitting it into two sentences for clarity.

Suggested change:

```diff
- The third opportunity is personal AI like deploying a personal model on AMD AI MAX+ and connecting to external Models as needed. Once routing, privacy, and reasoning are expressed as policy, an AMD-hosted stack can support assistants that feel more personal and more controlled. A personal AI system can keep ordinary tasks, memory-aware follow-ups, and private context on a local lane, while only escalating special cases when explicitly permitted.
+ The third opportunity is personal AI. You might deploy a personal model on AMD AI MAX+ and connect to external models as needed. Once routing, privacy, and reasoning are expressed as policy, an AMD-hosted stack can support assistants that feel more personal and more controlled. A personal AI system can keep ordinary tasks, memory-aware follow-ups, and private context on a local lane, while only escalating special cases when explicitly permitted.
```
In the intelligent-routing section, the PR changes:

```diff
  The most immediate opportunity is intelligent routing. A single ROCm backend on AMD Developer Cloud can serve as the physical execution layer for multiple logical lanes. That means teams can prototype a Mixture-of-Models experience, cost-aware routing, replay-driven debugging, and tiered product behavior without first standing up a large multi-backend fleet.
- In the AMD reference profile, the cheapest, medium, complex, reasoning, and premium lanes all resolve onto one self-hosted Qwen backend. The router still gives you differentiated behavior because the policy lives in signals, projections, and decisions, not only in the number of containers you run.
+ In the AMD reference profile, the cheapest, medium, complex, reasoning, and premium lanes all resolve onto different models. The router still gives you differentiated behavior because the policy lives in signals, projections, and decisions, not only in the number of containers you run.
```

Copilot comment: The post says the SIMPLE/MEDIUM/COMPLEX/REASONING/PREMIUM lanes "resolve onto different models", but the referenced deploy/recipes/balance.yaml maps each of those providers.models[*] entries to the same backend_refs endpoint (vllm_primary at vllm:8000). If the intent is the single-backend alias demo, please update this sentence to say the lanes resolve onto one backend via multiple served model names/aliases; if the intent is truly different models, the earlier "single ROCm backend" section and recipe reference should be updated to match.

Suggested change:

```diff
- In the AMD reference profile, the cheapest, medium, complex, reasoning, and premium lanes all resolve onto different models. The router still gives you differentiated behavior because the policy lives in signals, projections, and decisions, not only in the number of containers you run.
+ In the AMD reference profile, the cheapest, medium, complex, reasoning, and premium lanes each resolve onto distinct served model names/aliases that all point to the same ROCm-backed vLLM endpoint. The router still gives you differentiated behavior because the policy lives in signals, projections, and decisions, not only in the number of containers you run.
```
Under "### 2. Privacy Routing and Local-First Governance", the PR changes:

```diff
- The second opportunity is privacy routing. This repository already includes a maintained privacy recipe that keeps PII, private code, internal documents, and suspicious prompts on a local lane while only escalating clearly non-sensitive reasoning work when policy allows it. That pattern is especially meaningful on AMD because it supports a local-first deployment story: keep sensitive traffic on infrastructure you control, audit every decision, and make cloud escalation a governed exception instead of the default.
+ The second opportunity is privacy routing, that keeps PII, private code, internal documents, and suspicious prompts on a local lane while only escalating clearly non-sensitive reasoning work when policy allows it. That pattern is especially meaningful on AMD because it supports a local-first deployment story: keep sensitive traffic on infrastructure you control, audit every decision, and make cloud escalation a governed exception instead of the default.
```

Copilot comment: Grammar/readability: "privacy routing, that keeps …" reads like an incorrect relative clause. Consider removing the comma ("privacy routing that keeps …") or rewriting the sentence to avoid the comma splice.

Suggested change:

```diff
- The second opportunity is privacy routing, that keeps PII, private code, internal documents, and suspicious prompts on a local lane while only escalating clearly non-sensitive reasoning work when policy allows it. That pattern is especially meaningful on AMD because it supports a local-first deployment story: keep sensitive traffic on infrastructure you control, audit every decision, and make cloud escalation a governed exception instead of the default.
+ The second opportunity is privacy routing that keeps PII, private code, internal documents, and suspicious prompts on a local lane while only escalating clearly non-sensitive reasoning work when policy allows it. That pattern is especially meaningful on AMD because it supports a local-first deployment story: keep sensitive traffic on infrastructure you control, audit every decision, and make cloud escalation a governed exception instead of the default.
```
In the docs, the PR compresses two pseudocode examples onto single lines:

```diff
 # Traditional: Simple keyword matching
-if "math" in query:
-    route_to_math_model()
+if "math" in query: route_to_math_model()
```

Signal-driven routing uses multiple signals:

```diff
 # Signal-driven: Multiple signals combined
-if (has_math_keywords AND is_math_domain) OR has_high_math_embedding:
-    route_to_math_model()
+if (has_math_keywords AND is_math_domain) OR has_high_math_embedding: route_to_math_model()
```

Copilot comment: This section uses fenced code blocks labeled as yaml, but the contents are pseudo-Python (`if ...: route_to_math_model()`) and not valid YAML. Please change the fence language to something appropriate (e.g., `python` or `text`) or rewrite the examples into valid YAML so syntax highlighting and copy/paste behavior aren't misleading.
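If the docs go the `python` route, the snippets could also be made genuinely runnable. A minimal sketch of the signal-combination idea follows; the signal names and the 0.8 embedding threshold are illustrative assumptions, not part of the semantic-router API.

```python
def route(query: str, is_math_domain: bool, math_embedding_score: float) -> str:
    """Toy signal-driven router: combine several signals instead of one keyword check."""
    # Heuristic signal: keyword match (the "traditional" approach on its own)
    has_math_keywords = "math" in query.lower()
    # Learned signal: embedding similarity to the math domain (assumed threshold)
    has_high_math_embedding = math_embedding_score > 0.8

    # Signal-driven: multiple signals combined, mirroring the docs' pseudocode
    if (has_math_keywords and is_math_domain) or has_high_math_embedding:
        return "math_model"
    return "general_model"


print(route("help with my math homework", True, 0.3))   # keywords + domain agree
print(route("integrate x^2 from 0 to 1", False, 0.92))  # embedding alone triggers
```

Either signal path is sufficient on its own, which is the point the docs' OR-clause is making: keyword misses (no "math" in the second query) no longer cause misroutes.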
Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>

Summary
Validation
- `cd website && npm run lint`
- `make markdown-lint`
- `make agent-report` (classified this change as `documentation-only` and reported validation commands: none)

Checklist

- `[Doc]` title prefix
- `git commit -s`