Add MongoDB Assistant Tools #472

nlarew · 2025-08-22T19:27:22Z

Proposed changes

(EAI-1266) Knowledge Search on MCP

Adds a new class of Assistant tools for interacting with the MongoDB Assistant (AI Chatbot + Search).

This PR scaffolds the generic AssistantToolBase class as well as two initial tools:

list_knowledge_sources - pulls a list of all available data sources. Useful for knowing what data is available and for filtering the search_knowledge
search_knowledge - runs a natural language search against the MongoDB Assistant knowledge base. Returns content chunks for the most relevant results.

Checklist

I have signed the MongoDB CLA

Copilot

Pull Request Overview

This PR introduces MongoDB Assistant tools for the MCP server, enabling natural language search and knowledge source discovery. It adds integration with the MongoDB Assistant AI chatbot and search service.

Scaffolds a new AssistantToolBase class for MongoDB Assistant tool interactions
Implements two initial tools: list_knowledge_sources and search_knowledge
Adds comprehensive test coverage with mocked API responses

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/common/config.ts	Adds `assistantBaseUrl` configuration option
src/tools/tool.ts	Extends `ToolCategory` type to include "assistant"
src/tools/assistant/assistantTool.ts	Base class for Assistant tools with common functionality
src/tools/assistant/list_knowledge_sources.ts	Tool for listing available knowledge sources
src/tools/assistant/search_knowledge.ts	Tool for searching the knowledge base
src/tools/assistant/tools.ts	Exports array of Assistant tools
src/server.ts	Registers Assistant tools with the server
tests/integration/helpers.ts	Updates response element types to support metadata
tests/integration/tools/assistant/assistantHelpers.ts	Test utilities for Assistant tool testing
tests/integration/tools/assistant/listKnowledgeSources.test.ts	Integration tests for list knowledge sources tool
tests/integration/tools/assistant/searchKnowledge.test.ts	Integration tests for search knowledge tool

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

src/tools/assistant/searchKnowledge.ts

src/tools/assistant/assistantTool.ts

src/tools/assistant/listKnowledgeSources.ts

src/tools/assistant/list_knowledge_sources.ts

src/tools/assistant/searchKnowledge.ts

src/tools/tool.ts

src/tools/assistant/search_knowledge.ts

…assistant-tool

src/tools/assistant/assistantTool.ts

src/tools/assistant/listKnowledgeSources.ts

…assistant-tool

kmruiz · 2025-10-02T09:00:32Z

src/tools/assistant/assistantTool.ts

+        if (args.method === "POST") {
+            headers.set("Content-Type", "application/json");
+        }
+        return await fetch(endpoint, {


We shouldn't use Node.js fetch, but customFetch configured like in the Atlas Client:

mongodb-mcp-server/src/common/atlas/apiClient.ts

Lines 41 to 47 in 04a5ddc

// createFetch assumes that the first parameter of fetch is always a string

// with the URL. However, fetch can also receive a Request object. While

// the typechecking complains, createFetch does passthrough the parameters

// so it works fine.

private static customFetch: typeof fetch = createFetch({

useEnvironmentVariableProxies: true,

}) as unknown as typeof fetch;

The reason is that the customFetch supports enterprise proxies, which are relevant for customers with specific security requirements.

should we move this into apiClient? we also have telemetry events logic there, it would be good to keep all the api related logic there for now

Ok no problem - I will create a new customFetch instance in the assistant tool base because we don't need all of the extra baggage from the Atlas ApiClient class when working with this non-Atlas API.

kmruiz · 2025-10-02T09:03:40Z

src/tools/assistant/listKnowledgeSources.ts

+        return {
+            content: dataSources.map(({ id, type, versions }) => ({
+                type: "text",
+                text: id,
+                _meta: {
+                    type,
+                    versions,
+                },
+            })),
+        };


We should convert this into a string, similar to what we do in:

mongodb-mcp-server/src/tools/mongodb/read/find.ts

Lines 93 to 103 in 04a5ddc

return {

content: formatUntrustedData(

this.generateMessage({

collection,

queryResultsCount,

documents: cursorResults.documents,

appliedLimits: [limitOnFindCursor.cappedBy, cursorResults.cappedBy].filter((limit) => !!limit),

}),

cursorResults.documents.length > 0 ? EJSON.stringify(cursorResults.documents) : undefined

),

};

And also we should consider the information in the knowledge list untrusted.

Looks like formatUntrustedData maps to a single CallToolResult["content"] object so I will need to convert the array of results I get from each API call into a single formatted string.

Do we want to wrap API error text in formatUntrustedData as well? Not sure if that's already mitigated by the isError: true or not.

i.e. instead of

if (!response.ok) { return { content: [ { type: "text", text: `Failed to list knowledge sources: ${response.statusText}`, }, ], isError: true, }; }

we would have

if (!response.ok) { return { content: formatUntrustedData("Failed to list knowledge sources", response.statusText), isError: true, }; }

I don't believe the LLM will employ any injection protection measures regardless of whether the response is marked as error-ed or not. That being said, do we expect that statusText would contain any instructions - this should be just the textual representation of the status code, shouldn't it? I wouldn't expect there to be a way to inject anything malicious in it.

src/tools/assistant/searchKnowledge.ts

blva · 2025-10-03T13:00:40Z

src/tools/assistant/searchKnowledge.ts

+import { LogId } from "../../common/logger.js";
+
+export const SearchKnowledgeToolArgs = {
+    query: z.string().describe("A natural language query to search for in the knowledge base"),


nit: can we add a bit more detail here? this is what the LLM will use to understand that you want it to make a generic natural question that the user haave.

blva · 2025-10-03T13:02:51Z

src/tools/assistant/assistantTool.ts

+        this.requiredHeaders = new Headers({
+            "x-request-origin": "mongodb-mcp-server",
+            "user-agent": serverVersion ? `mongodb-mcp-server/v${serverVersion}` : "mongodb-mcp-server",
+        });


[q] why does this need to be initialized here at the tool level? can we also move this logic into the api layer?

also, we should probably use the userAgent from the apiClient and make sure we use the same name everywhere

why does this need to be initialized here at the tool level? can we also move this logic into the api layer?

The "API layer" as it exists seems to be tailored to Atlas - this is a totally separate server/API. If we prefer I could create an equivalent assistant/apiClient.ts file but personally that seems unnecessary at this point.

we should probably use the userAgent from the apiClient

Our server firewall allowlists specific user agents and right now that's configured to be mongodb-mcp-server

nit: The apiClient uses User-Agent: AtlasMCP which again seems to refer to Atlas specifically

blva · 2025-10-03T13:08:47Z

tests/unit/assistant/assistantHelpers.ts

+/**
+ * Mocks fetch for assistant API calls
+ */
+interface MockedAssistantAPI {


nit: if we're mocking the api calls i'd rather call these unit tests, could you please move the tests to the unit folder and the helpers?

if you have any staging / cloud-dev environment, we could configure integration tests as well, let me know on slack if you do as I can help setting up a GH variable for the staging environment, otherwise we could keep it with unit tests only which is fine

Gotcha that's totally reasonable. I'll move these mocked tests to tests/unit. It's a bit awkward because I think I still want the integration test flow (i.e. mcpClient.callTool({ ... })) but I'd like to use the mocks to control the output to confirm that everything is formatted/handled appropriately.

For integration tests, we could try hitting https://knowledge-dev.mongodb.com. What GH variables do we need to support this?

I created an env variable called TEST_ASSISTANT_BASE_URL in GH actions, you can make it so your integration tests get configured to that if the env is set.

here's an atlas example. you'd need to add the logic for using it as well

- name: Run tests env: MDB_MCP_API_CLIENT_ID: ${{ secrets.TEST_ATLAS_CLIENT_ID }} MDB_MCP_API_CLIENT_SECRET: ${{ secrets.TEST_ATLAS_CLIENT_SECRET }} MDB_MCP_API_BASE_URL: ${{ vars.TEST_ATLAS_BASE_URL }} run: npm test -- tests/integration/tools/atlas

kmruiz · 2025-10-06T09:07:42Z

The only thing missing from my side in this PR is to add some accuracy tests, to make sure that the models we test with do know when to call this tool. In the tests/accuracy folder there are a few examples that can be used as inspiration.

The idea is to write a few prompts that we expect to trigger our tool and ensure that the tool is called with the specific parameters.

nlarew added 4 commits August 18, 2025 12:20

(EAI-1266) Knowledge Search on MCP

2338504

Add tests for assistant tools

881c4ee

Include server version in assistant user-agent header

3bdd3df

get server version from packageInfo

59b3c63

Copilot AI review requested due to automatic review settings August 22, 2025 19:27

nlarew requested a review from a team as a code owner August 22, 2025 19:27

Copilot AI reviewed Aug 22, 2025

View reviewed changes

src/tools/assistant/searchKnowledge.ts Show resolved Hide resolved

src/tools/assistant/searchKnowledge.ts Show resolved Hide resolved

Merge branch 'main' into assistant-tool

34d50d8

mongodben reviewed Aug 22, 2025

View reviewed changes

src/tools/assistant/searchKnowledge.ts Show resolved Hide resolved

nlarew commented Aug 22, 2025

View reviewed changes

src/tools/assistant/assistantTool.ts Show resolved Hide resolved