Commit ab2aa16

[Doc] Refresh homepage architecture and research content (#1659)
* docs: update amd blog details
  Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
* docs: refresh homepage architecture and research content
  Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
* docs: localize research and paper page chrome
  Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
* docs: update amd blog details
  Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
* docs: update amd blog details
  Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>

---------

Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
1 parent 96600b3 commit ab2aa16

23 files changed

+787
-299
lines changed


website/blog/2026-03-25-vllm-sr-on-amd-developer-cloud.md

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -11,7 +11,7 @@ tags: [amd, rocm, deployment, hardware, vllm, semantic-router]
1111

1212
</div>
1313

14-
Running vLLM Semantic Router on AMD Developer Cloud is not just about bringing up one more inference endpoint. It is about turning it into a routed multi-tier system that can classify requests, choose a semantic lane, and make replay and Insights immediately useful.
14+
Running [vLLM Semantic Router](https://vllm-semantic-router.com) on AMD Developer Cloud is not just about bringing up one more inference endpoint. It is about turning it into a routed multi-tier system that can classify requests, choose a semantic lane, and make replay and Insights immediately useful.
1515

1616
This post walks through the practical path: start the ROCm backend on an AMD Developer Cloud instance, install vLLM-SR, import the reference profile, and validate the deployment end to end.
1717

@@ -73,17 +73,17 @@ This architecture opens up a particularly interesting opportunity for AMD, becau
7373

7474
The most immediate opportunity is intelligent routing. A single ROCm backend on AMD Developer Cloud can serve as the physical execution layer for multiple logical lanes. That means teams can prototype a Mixture-of-Models experience, cost-aware routing, replay-driven debugging, and tiered product behavior without first standing up a large multi-backend fleet.
7575

76-
In the AMD reference profile, the cheapest, medium, complex, reasoning, and premium lanes all resolve onto one self-hosted Qwen backend. The router still gives you differentiated behavior because the policy lives in signals, projections, and decisions, not only in the number of containers you run.
76+
In the AMD reference profile, the cheapest, medium, complex, reasoning, and premium lanes all resolve onto different models. The router still gives you differentiated behavior because the policy lives in signals, projections, and decisions, not only in the number of containers you run.
7777

7878
### 2. Privacy Routing and Local-First Governance
7979

80-
The second opportunity is privacy routing. This repository already includes a maintained privacy recipe that keeps PII, private code, internal documents, and suspicious prompts on a local lane while only escalating clearly non-sensitive reasoning work when policy allows it. That pattern is especially meaningful on AMD because it supports a local-first deployment story: keep sensitive traffic on infrastructure you control, audit every decision, and make cloud escalation a governed exception instead of the default.
80+
The second opportunity is privacy routing, which keeps PII, private code, internal documents, and suspicious prompts on a local lane and escalates only clearly non-sensitive reasoning work when policy allows it. That pattern is especially meaningful on AMD because it supports a local-first deployment story: keep sensitive traffic on infrastructure you control, audit every decision, and make cloud escalation a governed exception instead of the default.
8181

8282
For enterprises, that means AMD-backed deployments can become the trusted default lane for internal copilots, regulated workloads, or hybrid private AI systems. For developers, it means privacy is not just a hosting choice; it becomes a routing policy.
8383

8484
### 3. Personal AI and Local Personal Agents
8585

86-
The third opportunity is personal AI. Once routing, privacy, and reasoning are expressed as policy, an AMD-hosted stack can support assistants that feel more personal and more controlled. A personal AI system can keep ordinary tasks, memory-aware follow-ups, and private context on a local lane, while only escalating special cases when explicitly permitted.
86+
The third opportunity is personal AI, such as deploying a personal model on AMD AI MAX+ and connecting to external models as needed. Once routing, privacy, and reasoning are expressed as policy, an AMD-hosted stack can support assistants that feel more personal and more controlled. A personal AI system can keep ordinary tasks, memory-aware follow-ups, and private context on a local lane, while only escalating special cases when explicitly permitted.
8787

8888
That makes AMD interesting not only for enterprise infrastructure, but also for self-hosted assistants, home-lab AI, and local-first personal workflows. The important point is that Semantic Router lets the system distinguish between “keep this local,” “this is cheap and routine,” and “this needs deeper reasoning,” instead of treating all personal AI traffic as one undifferentiated workload.
8989

website/docs/intro.md

Lines changed: 19 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -26,36 +26,33 @@ We use the project to answer a small set of hard systems questions:
2626

2727
## Core System
2828

29-
### Signal-Driven Decision Engine
29+
### Signal and Projection Routing
3030

31-
Captures and combines **9 types of request signals** to make intelligent routing decisions:
31+
Captures **14 maintained signal families** and coordinates them with reusable
32+
projections before route selection:
3233

33-
| Signal Type | Description | Use Case |
34-
|------------|-------------|----------|
35-
| **keyword** | Pattern matching with AND/OR operators | Fast rule-based routing for specific terms |
36-
| **embedding** | Semantic similarity using embeddings | Intent detection and semantic understanding |
37-
| **domain** | MMLU domain classification (14 categories) | Academic and professional domain routing |
38-
| **fact_check** | ML-based fact-checking requirement detection | Identify queries needing fact verification |
39-
| **user_feedback** | User satisfaction and feedback classification | Handle follow-up messages and corrections |
40-
| **preference** | LLM-based route preference matching | Complex intent analysis via external LLM |
41-
| **language** | Multi-language detection (100+ languages) | Route queries to language-specific models |
42-
| **context** | Token-count based context classification | Route short/long context requests to suitable models |
43-
| **complexity** | Query difficulty classification (easy/medium/hard) | Match model capability to task difficulty |
34+
| Layer | Components | Role |
35+
| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------- |
36+
| **Signals** | `authz`, `context`, `keyword`, `language`, `structure`, `complexity`, `domain`, `embedding`, `modality`, `fact-check`, `jailbreak`, `pii`, `preference`, `user-feedback` | Extract reusable request, safety, and preference facts |
37+
| **Projections** | `partitions`, `scores`, `mappings` | Coordinate competing matches and emit named routing bands |
38+
| **Decisions** | AND/OR policy rules over signals and projections | Select the active route and model candidates |
4439

45-
**How it works**: Signals are extracted from requests, combined using AND/OR operators in decision rules, and used to select the best model and configuration.
40+
**How it works**: Signals are extracted from requests, projections coordinate
41+
matched evidence, decision rules evaluate the resulting facts, and the chosen
42+
route drives plugins plus model dispatch.
4643

4744
### Plugin Chain Architecture
4845

4946
Extensible plugin system for request/response processing:
5047

51-
| Plugin Type | Description | Use Case |
52-
|------------|-------------|----------|
53-
| **semantic-cache** | Semantic similarity-based caching | Reduce latency and costs for similar queries |
54-
| **jailbreak** | Adversarial prompt detection | Block prompt injection and jailbreak attempts |
55-
| **pii** | Personally identifiable information detection | Protect sensitive data and ensure compliance |
56-
| **system_prompt** | Dynamic system prompt injection | Add context-aware instructions per route |
57-
| **header_mutation** | HTTP header manipulation | Control routing and backend behavior |
58-
| **hallucination** | Token-level hallucination detection | Real-time fact verification during generation |
48+
| Plugin Type | Description | Use Case |
49+
| ------------------- | --------------------------------------------- | --------------------------------------------- |
50+
| **semantic-cache** | Semantic similarity-based caching | Reduce latency and costs for similar queries |
51+
| **jailbreak** | Adversarial prompt detection | Block prompt injection and jailbreak attempts |
52+
| **pii** | Personally identifiable information detection | Protect sensitive data and ensure compliance |
53+
| **system_prompt** | Dynamic system prompt injection | Add context-aware instructions per route |
54+
| **header_mutation** | HTTP header manipulation | Control routing and backend behavior |
55+
| **hallucination** | Token-level hallucination detection | Real-time fact verification during generation |
5956

6057
**How it works**: Plugins form a processing chain. Each plugin can inspect and modify requests and responses, and can be enabled or disabled per decision.
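
The chain described above can be sketched in a few lines of Python. This is a minimal illustration, not the router's implementation: the `Request` class, the `REGISTRY` mapping, the `run_chain` helper, the `x-route` header, and the keyword-based `jailbreak` check are all hypothetical stand-ins. The real jailbreak plugin is described only as adversarial prompt detection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Request:
    """Hypothetical request shape; the router's real types are not shown here."""
    prompt: str
    headers: Dict[str, str] = field(default_factory=dict)
    blocked: bool = False


Plugin = Callable[[Request], Request]


def jailbreak(req: Request) -> Request:
    # Toy stand-in: flag one known phrase instead of running a classifier.
    if "ignore previous instructions" in req.prompt.lower():
        req.blocked = True
    return req


def system_prompt(req: Request) -> Request:
    # Prepend a route-specific instruction to the prompt.
    req.prompt = "You are a careful math tutor.\n" + req.prompt
    return req


def header_mutation(req: Request) -> Request:
    # Record the chosen route in a header (the header name is made up).
    req.headers["x-route"] = "advanced_math"
    return req


REGISTRY: Dict[str, Plugin] = {
    "jailbreak": jailbreak,
    "system_prompt": system_prompt,
    "header_mutation": header_mutation,
}


def run_chain(req: Request, enabled: List[str]) -> Request:
    """Apply the enabled plugins in order, stopping once a plugin blocks."""
    for name in enabled:
        req = REGISTRY[name](req)
        if req.blocked:
            break
    return req
```

Passing a different `enabled` list per decision models the per-decision enable/disable switch: a plugin that is not listed never sees the request.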
6158

website/docs/overview/collective-intelligence.md

Lines changed: 68 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -26,11 +26,9 @@ User Query → Single LLM → Response
2626
### Collective Intelligence Approach: System of Models
2727

2828
```
29-
User Query → Signal Extraction → Decision Engine → Best Model → Response
30-
↓ ↓ ↓
31-
8 Signal Types AND/OR Rules Specialized Models
32-
↓ ↓ ↓
33-
Context Analysis Smart Selection Plugin Chain
29+
User Query → Signal Extraction → Projection Coordination → Decision Engine → Plugins + Model Dispatch → Response
30+
↓ ↓ ↓ ↓
31+
14 Signal Families Partitions / Scores / Mappings Boolean Policies Specialized Models
3432
```
3533

3634
**Benefits**:
@@ -46,19 +44,48 @@ User Query → Signal Extraction → Decision Engine → Best Model → Response
4644

4745
Different signals capture different aspects of intelligence:
4846

49-
| Signal Type | Intelligence Aspect |
50-
|------------|-------------------|
51-
| **keyword** | Pattern recognition |
52-
| **embedding** | Semantic understanding |
53-
| **domain** | Knowledge classification |
54-
| **fact_check** | Truth verification needs |
55-
| **user_feedback** | User satisfaction |
56-
| **preference** | Intent matching |
57-
| **language** | Multi-language detection |
47+
| Signal family group | Intelligence aspect |
48+
| ------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------- |
49+
| **Heuristic** (`authz`, `context`, `keyword`, `language`, `structure`) | Fast request-shape, locale, and policy gating |
50+
| **Learned** (`complexity`, `domain`, `embedding`, `modality`, `fact-check`, `jailbreak`, `pii`, `preference`, `user-feedback`) | Semantic, safety, modality, and preference understanding |
5851

5952
**Collective benefit**: The combination of signals provides a richer understanding than any single signal.
6053

61-
### 2. Decision Fusion
54+
### 2. Projection Coordination
55+
56+
Signals become more useful when the router coordinates them into reusable
57+
intermediate facts:
58+
59+
```yaml
60+
projections:
61+
partitions:
62+
- name: balance_domain_partition
63+
semantics: exclusive
64+
members: [mathematics, coding, creative]
65+
default: creative
66+
scores:
67+
- name: reasoning_pressure
68+
method: weighted_sum
69+
inputs:
70+
- type: complexity
71+
name: hard
72+
weight: 0.6
73+
- type: embedding
74+
name: math_intent
75+
weight: 0.4
76+
mappings:
77+
- name: reasoning_band
78+
source: reasoning_pressure
79+
method: threshold_bands
80+
outputs:
81+
- name: balance_reasoning
82+
gte: 0.5
83+
```
84+
85+
**Collective benefit**: Projections turn many weak or competing signals into
86+
named routing facts that multiple decisions can reuse.
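
To make the `weighted_sum` and `threshold_bands` steps concrete, here is a small Python sketch using the weights and threshold from the YAML above. It assumes that a matched signal contributes 1.0 and an unmatched one 0.0, and that the highest satisfied band wins. This diff does not specify either behavior, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SignalKey = Tuple[str, str]  # (signal type, signal name)

# Inputs and bands copied from the reasoning_pressure / reasoning_band example.
REASONING_INPUTS: List[Tuple[str, str, float]] = [
    ("complexity", "hard", 0.6),
    ("embedding", "math_intent", 0.4),
]
BANDS: List[Tuple[str, float]] = [("balance_reasoning", 0.5)]


def weighted_sum(
    inputs: List[Tuple[str, str, float]], matched: Dict[SignalKey, float]
) -> float:
    """Sum weight * strength; signals absent from `matched` count as 0.0 (assumption)."""
    return sum(weight * matched.get((typ, name), 0.0) for typ, name, weight in inputs)


def threshold_band(score: float, bands: List[Tuple[str, float]]) -> Optional[str]:
    """Return the band with the highest `gte` bound the score satisfies, if any."""
    eligible = [(gte, name) for name, gte in bands if score >= gte]
    return max(eligible)[1] if eligible else None


both = weighted_sum(
    REASONING_INPUTS,
    {("complexity", "hard"): 1.0, ("embedding", "math_intent"): 1.0},
)
only_embedding = weighted_sum(REASONING_INPUTS, {("embedding", "math_intent"): 1.0})
```

Under these assumptions, a request matching both signals scores 0.6 + 0.4 = 1.0 and lands in `balance_reasoning`, while a request matching only `math_intent` scores 0.4 and emits no band.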
87+
88+
### 3. Decision Fusion
6289
6390
Signals are combined using logical operators:
6491
@@ -69,23 +96,21 @@ decisions:
6996
rules:
7097
operator: "AND"
7198
conditions:
72-
- type: "keyword"
73-
name: "math_keywords"
7499
- type: "domain"
75100
name: "mathematics"
76-
- type: "embedding"
77-
name: "math_intent"
101+
- type: "projection"
102+
name: "balance_reasoning"
78103
```
79104
80105
**Collective benefit**: Multiple signals voting together make more accurate decisions than any single signal.
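
The rule above can be read as a boolean check over a set of matched facts. The sketch below is illustrative only: `evaluate` and the `(type, name)` fact representation are hypothetical, and only the two operators named in these docs (AND, OR) are handled.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str]  # (condition type, condition name)


def evaluate(rule: Dict, facts: Set[Fact]) -> bool:
    """Return True when the rule's conditions hold for the matched facts."""
    hits: List[bool] = [(c["type"], c["name"]) in facts for c in rule["conditions"]]
    if rule["operator"] == "AND":
        return all(hits)
    if rule["operator"] == "OR":
        return any(hits)
    raise ValueError(f"unsupported operator: {rule['operator']}")


# Mirrors the conditions kept by this change: a domain signal plus a projection output.
ADVANCED_MATH_RULE = {
    "operator": "AND",
    "conditions": [
        {"type": "domain", "name": "mathematics"},
        {"type": "projection", "name": "balance_reasoning"},
    ],
}
```

With AND, a request that matches the `mathematics` domain but never reaches the `balance_reasoning` band does not trigger the decision. Switching the operator to OR would accept either fact on its own.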
81106
82-
### 3. Model Specialization
107+
### 4. Model Specialization
83108
84109
Different models contribute their strengths:
85110
86111
```yaml
87112
modelRefs:
88-
- model: qwen-math # Best at mathematical reasoning
113+
- model: qwen-math # Best at mathematical reasoning
89114
weight: 1.0
90115
- model: deepseek-coder # Best at code generation
91116
weight: 1.0
@@ -95,7 +120,7 @@ modelRefs:
95120
96121
**Collective benefit**: System-level intelligence emerges from routing to the right specialist.
97122
98-
### 4. Plugin Collaboration
123+
### 5. Plugin Collaboration
99124
100125
Plugins work together to enhance responses:
101126
@@ -104,11 +129,11 @@ routing:
104129
decisions:
105130
- name: "protected-route"
106131
plugins:
107-
- type: "semantic-cache" # Speed optimization
108-
- type: "jailbreak" # Security layer
109-
- type: "pii" # Privacy protection
110-
- type: "system_prompt" # Context injection
111-
- type: "hallucination" # Quality assurance
132+
- type: "semantic-cache" # Speed optimization
133+
- type: "jailbreak" # Security layer
134+
- type: "pii" # Privacy protection
135+
- type: "system_prompt" # Context injection
136+
- type: "hallucination" # Quality assurance
112137
```
113138
114139
**Collective benefit**: Multiple layers of processing create a more robust and secure system.
@@ -127,17 +152,25 @@ Let's see collective intelligence in action:
127152

128153
```yaml
129154
signals_detected:
130-
keyword: ["prove", "square root", "irrational"] # Math keywords detected
131-
embedding: 0.89 # High similarity to math queries
132-
domain: "mathematics" # MMLU classification
133-
fact_check: true # Proof requires verification
155+
keyword: ["prove", "square root", "irrational"] # Math keywords detected
156+
embedding: 0.89 # High similarity to math queries
157+
domain: "mathematics" # MMLU classification
158+
fact_check: true # Proof requires verification
159+
```
160+
161+
### Projection Coordination
162+
163+
```yaml
164+
projection_outputs:
165+
balance_domain_partition: "mathematics"
166+
balance_reasoning: true
134167
```
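
One way to picture how an `exclusive` partition could arrive at a single value such as `"mathematics"` is sketched below. The tie-break rule (highest confidence wins), the confidence numbers, and the `resolve_exclusive` name are assumptions made for illustration. The diff shows only the member list and the default.

```python
from typing import Dict, List

# Members and default taken from the balance_domain_partition example.
MEMBERS: List[str] = ["mathematics", "coding", "creative"]
DEFAULT = "creative"


def resolve_exclusive(
    members: List[str], default: str, confidences: Dict[str, float]
) -> str:
    """Pick exactly one member: the most confident match, else the default."""
    candidates = {m: confidences[m] for m in members if m in confidences}
    if not candidates:
        return default
    return max(candidates, key=candidates.get)


# Hypothetical confidences for the sample proof query.
winner = resolve_exclusive(MEMBERS, DEFAULT, {"mathematics": 0.82, "coding": 0.41})
fallback = resolve_exclusive(MEMBERS, DEFAULT, {})
```

Here `winner` is `"mathematics"` and `fallback` is `"creative"`, so a decision that reads the partition sees one value even when several domain signals fire.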
135168
136169
### Decision Process
137170
138171
```yaml
139172
decision_made: "advanced_math"
140-
reason: "All math signals agree (keyword + embedding + domain)"
173+
reason: "Math domain plus projection-driven reasoning pressure"
141174
confidence: 0.95
142175
```
143176
@@ -165,7 +198,9 @@ plugins_applied:
165198
- **Safe**: Verified no jailbreak attempts
166199
- **High-quality**: Hallucination detection enabled
167200
168-
**This is collective intelligence**: No single component made the decision. The intelligence emerged from the collaboration of signals, rules, models, and plugins.
201+
**This is collective intelligence**: No single component made the decision.
202+
The intelligence emerged from the collaboration of signals, projections, rules,
203+
models, and plugins.
169204
170205
## Benefits of Collective Intelligence
171206

website/docs/overview/goals.md

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,9 @@ In traditional LLM routing, we only look at the user's query text. But there's s
1818
- **Quality signals**: Does this query need fact-checking? Is the user giving feedback?
1919
- **User signals**: What are the user's preferences? What's their satisfaction level?
2020

21-
**Our solution**: A comprehensive signal extraction system that captures 9 types of request signals from requests, responses, and context.
21+
**Our solution**: A comprehensive signal extraction system that captures 14
22+
maintained signal families from requests, responses, users, and runtime
23+
context.
2224

2325
### 2. How to combine the signals?
2426

@@ -27,7 +29,8 @@ Having multiple signals is great, but how do we use them together to make better
2729
- Should we route to the math model if we detect **both** math keywords **and** math domain?
2830
- Should we enable fact-checking if we detect **either** a factual question **or** a sensitive domain?
2931

30-
**Our solution**: A flexible decision engine with AND/OR operators that lets you combine signals in powerful ways.
32+
**Our solution**: A reusable signal catalog plus projection coordination and
33+
AND/OR decision logic that lets you combine signals without duplicating policy.
3134

3235
### 3. How to collaborate more efficiently?
3336

@@ -58,7 +61,8 @@ The system should learn and improve over time:
5861
- Collect user feedback to improve signal detection
5962
- Build a self-learning system that gets smarter with use
6063

61-
**Our solution**: Comprehensive observability and feedback collection that feeds back into the signal extraction and decision engine.
64+
**Our solution**: Comprehensive observability and feedback collection that
65+
feeds back into signal extraction, projection tuning, and decision policy.
6266

6367
## The Vision
6468

@@ -68,7 +72,8 @@ We envision a future where:
6872
- **Multiple models collaborate seamlessly**, each contributing their strengths
6973
- **Security is built-in**, not bolted on
7074
- **Systems learn and improve** from every interaction
71-
- **Collective intelligence emerges** from the combination of signals, decisions, and feedback
75+
- **Collective intelligence emerges** from the combination of signals,
76+
projections, decisions, and feedback
7277

7378
## Why This Matters
7479
