Running [vLLM Semantic Router](https://vllm-semantic-router.com) on AMD Developer Cloud is not just about bringing up one more inference endpoint. It is about turning it into a routed multi-tier system that can classify requests, choose a semantic lane, and make replay and Insights immediately useful.
This post walks through the practical path: start the ROCm backend on an AMD Developer Cloud instance, install vLLM-SR, import the reference profile, and validate the deployment end to end.
The most immediate opportunity is intelligent routing. A single ROCm backend on AMD Developer Cloud can serve as the physical execution layer for multiple logical lanes. That means teams can prototype a Mixture-of-Models experience, cost-aware routing, replay-driven debugging, and tiered product behavior without first standing up a large multi-backend fleet.
In the AMD reference profile, the cheapest, medium, complex, reasoning, and premium lanes all resolve onto different models. The router still gives you differentiated behavior because the policy lives in signals, projections, and decisions, not only in the number of containers you run.
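To make the lane-to-model relationship concrete, here is a minimal sketch of how logical lanes can resolve onto distinct backends. The lane names come from the profile described above; the model names and the selection helper are illustrative assumptions, not the actual profile contents.

```python
# Hypothetical lane-to-model mapping. The lane names match the reference
# profile; the model identifiers below are illustrative placeholders.
LANE_TO_MODEL = {
    "cheapest":  "qwen2.5-7b-instruct",
    "medium":    "qwen2.5-14b-instruct",
    "complex":   "qwen2.5-32b-instruct",
    "reasoning": "qwq-32b",
    "premium":   "deepseek-r1",
}

def resolve_lane(lane: str) -> str:
    """Return the backend model for a logical lane, defaulting to the cheapest."""
    return LANE_TO_MODEL.get(lane, LANE_TO_MODEL["cheapest"])
```

The point of the indirection is that the mapping is policy, not topology: swapping a lane to a different backend is a one-line change, and several lanes can even share a backend while keeping distinct routing behavior.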
### 2. Privacy Routing and Local-First Governance
The second opportunity is privacy routing, which keeps PII, private code, internal documents, and suspicious prompts on a local lane while escalating only clearly non-sensitive reasoning work when policy allows it. That pattern is especially meaningful on AMD because it supports a local-first deployment story: keep sensitive traffic on infrastructure you control, audit every decision, and make cloud escalation a governed exception instead of the default.
For enterprises, that means AMD-backed deployments can become the trusted default lane for internal copilots, regulated workloads, or hybrid private AI systems. For developers, it means privacy is not just a hosting choice; it becomes a routing policy.
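The "cloud escalation as a governed exception" policy can be sketched in a few lines. Everything here is an illustrative assumption (the PII patterns, lane names, and function shape are not the router's actual recipe); the point is only that sensitive traffic defaults to local and escalation requires an explicit policy allowance.

```python
import re

# Minimal privacy-routing sketch: anything that looks sensitive stays on the
# local lane; only clearly non-sensitive work may escalate, and only when
# policy allows it. Patterns and lane names are illustrative assumptions.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # US SSN-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email address
]

def route(prompt: str, allow_cloud: bool) -> str:
    if any(p.search(prompt) for p in PII_PATTERNS):
        return "local"  # sensitive content never escalates
    return "cloud-reasoning" if allow_cloud else "local"
```

Note the asymmetry: a PII match forces the local lane regardless of policy, while the cloud lane is reachable only through the explicit `allow_cloud` gate.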
### 3. Personal AI and Local Personal Agents
The third opportunity is personal AI: for example, deploying a personal model on AMD AI MAX+ and connecting to external models as needed. Once routing, privacy, and reasoning are expressed as policy, an AMD-hosted stack can support assistants that feel more personal and more controlled. A personal AI system can keep ordinary tasks, memory-aware follow-ups, and private context on a local lane, while only escalating special cases when explicitly permitted.
That makes AMD interesting not only for enterprise infrastructure, but also for self-hosted assistants, home-lab AI, and local-first personal workflows. The important point is that Semantic Router lets the system distinguish between “keep this local,” “this is cheap and routine,” and “this needs deeper reasoning,” instead of treating all personal AI traffic as one undifferentiated workload.
|**Projections**|`partitions`, `scores`, `mappings`| Coordinate competing matches and emit named routing bands |
|**Decisions**| AND/OR policy rules over signals and projections | Select the active route and model candidates |
**How it works**: Signals are extracted from requests, projections coordinate matched evidence, decision rules evaluate the resulting facts, and the chosen route drives plugins plus model dispatch.
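The AND/OR decision rules mentioned above can be sketched as a tiny recursive evaluator over extracted signal facts. The signal names and rule shape here are illustrative assumptions, not the router's actual schema.

```python
# Sketch of AND/OR decision rules evaluated over extracted signal facts.
# Leaf rules test whether a named signal was extracted; composite rules
# combine children with AND/OR. Names are illustrative, not the real schema.
def evaluate(rule: dict, facts: set[str]) -> bool:
    op = rule.get("op")
    if op == "AND":
        return all(evaluate(r, facts) for r in rule["rules"])
    if op == "OR":
        return any(evaluate(r, facts) for r in rule["rules"])
    return rule["signal"] in facts  # leaf: does this signal hold?

# "code domain AND (high complexity OR debug keyword)"
rule = {"op": "AND", "rules": [
    {"signal": "domain:code"},
    {"op": "OR", "rules": [{"signal": "complexity:high"},
                           {"signal": "keyword:debug"}]},
]}
```

A request whose extracted facts are `{"domain:code", "keyword:debug"}` satisfies this rule; one with only `{"domain:code"}` does not, so it would fall through to the next decision.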
### Plugin Chain Architecture
Extensible plugin system for request/response processing:
| Plugin Type | Description | Use Case |
|------------|-------------|----------|
|**semantic-cache**| Semantic similarity-based caching | Reduce latency and costs for similar queries |
|**pii**| Personally identifiable information detection | Protect sensitive data and ensure compliance|
|**system_prompt**| Dynamic system prompt injection | Add context-aware instructions per route|
|**header_mutation**| HTTP header manipulation | Control routing and backend behavior|
|**hallucination**| Token-level hallucination detection| Real-time fact verification during generation |
**How it works**: Plugins form a processing chain; each plugin can inspect and modify requests and responses, and each can be enabled or disabled per decision.
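That chain structure can be sketched in a few lines. The two plugins below mirror entries from the table above, but the hook shape (a function taking and returning a request dict) is an illustrative assumption, not the router's actual plugin API.

```python
# Sketch of a plugin chain: each enabled plugin may inspect or rewrite the
# request before dispatch. Plugin names mirror the table above; the function
# signature is an illustrative assumption, not the real API.
def system_prompt_plugin(req: dict) -> dict:
    req["system"] = "You are a routed assistant."  # context-aware injection
    return req

def header_mutation_plugin(req: dict) -> dict:
    req.setdefault("headers", {})["x-route"] = req.get("route", "default")
    return req

CHAIN = [("system_prompt", system_prompt_plugin),
         ("header_mutation", header_mutation_plugin)]

def process(req: dict, enabled: set[str]) -> dict:
    for name, plugin in CHAIN:
        if name in enabled:  # per-decision enable/disable
            req = plugin(req)
    return req
```

Because each decision carries its own enabled set, the same chain definition can behave differently per route: a privacy lane might enable `pii` and skip caching, while a cheap lane enables `semantic-cache` first.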
**This is collective intelligence**: No single component made the decision. The intelligence emerged from the collaboration of signals, projections, rules, models, and plugins.