-
Notifications
You must be signed in to change notification settings - Fork 126
Add MongoDB Assistant Tools #472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces MongoDB Assistant tools for the MCP server, enabling natural language search and knowledge source discovery. It adds integration with the MongoDB Assistant AI chatbot and search service.
- Scaffolds a new
AssistantToolBase
class for MongoDB Assistant tool interactions - Implements two initial tools:
list_knowledge_sources
andsearch_knowledge
- Adds comprehensive test coverage with mocked API responses
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/common/config.ts | Adds assistantBaseUrl configuration option |
src/tools/tool.ts | Extends ToolCategory type to include "assistant" |
src/tools/assistant/assistantTool.ts | Base class for Assistant tools with common functionality |
src/tools/assistant/list_knowledge_sources.ts | Tool for listing available knowledge sources |
src/tools/assistant/search_knowledge.ts | Tool for searching the knowledge base |
src/tools/assistant/tools.ts | Exports array of Assistant tools |
src/server.ts | Registers Assistant tools with the server |
tests/integration/helpers.ts | Updates response element types to support metadata |
tests/integration/tools/assistant/assistantHelpers.ts | Test utilities for Assistant tool testing |
tests/integration/tools/assistant/listKnowledgeSources.test.ts | Integration tests for list knowledge sources tool |
tests/integration/tools/assistant/searchKnowledge.test.ts | Integration tests for search knowledge tool |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/tools/assistant/assistantTool.ts
Outdated
if (args.method === "POST") { | ||
headers.set("Content-Type", "application/json"); | ||
} | ||
return await fetch(endpoint, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't use Node.js fetch, but customFetch configured like in the Atlas Client:
mongodb-mcp-server/src/common/atlas/apiClient.ts
Lines 41 to 47 in 04a5ddc
// createFetch assumes that the first parameter of fetch is always a string | |
// with the URL. However, fetch can also receive a Request object. While | |
// the typechecking complains, createFetch does passthrough the parameters | |
// so it works fine. | |
private static customFetch: typeof fetch = createFetch({ | |
useEnvironmentVariableProxies: true, | |
}) as unknown as typeof fetch; |
The reason is that the customFetch supports enterprise proxies, which are relevant for customers with specific security requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move this into apiClient? we also have telemetry events logic there, it would be good to keep all the api related logic there for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok no problem - I will create a new customFetch instance in the assistant tool base because we don't need all of the extra baggage from the Atlas ApiClient
class when working with this non-Atlas API.
return { | ||
content: dataSources.map(({ id, type, versions }) => ({ | ||
type: "text", | ||
text: id, | ||
_meta: { | ||
type, | ||
versions, | ||
}, | ||
})), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should convert this into a string, similar to what we do in:
mongodb-mcp-server/src/tools/mongodb/read/find.ts
Lines 93 to 103 in 04a5ddc
return { | |
content: formatUntrustedData( | |
this.generateMessage({ | |
collection, | |
queryResultsCount, | |
documents: cursorResults.documents, | |
appliedLimits: [limitOnFindCursor.cappedBy, cursorResults.cappedBy].filter((limit) => !!limit), | |
}), | |
cursorResults.documents.length > 0 ? EJSON.stringify(cursorResults.documents) : undefined | |
), | |
}; |
And also we should consider the information in the knowledge list untrusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like formatUntrustedData
maps to a single CallToolResult["content"]
object so I will need to convert the array of results I get from each API call into a single formatted string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to wrap API error text in formatUntrustedData
as well? Not sure if that's already mitigated by the isError: true
or not.
i.e. instead of
if (!response.ok) {
return {
content: [
{
type: "text",
text: `Failed to list knowledge sources: ${response.statusText}`,
},
],
isError: true,
};
}
we would have
if (!response.ok) {
return {
content: formatUntrustedData("Failed to list knowledge sources", response.statusText),
isError: true,
};
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe the LLM will employ any injection protection measures regardless of whether the response is marked as error-ed or not. That being said, do we expect that statusText
would contain any instructions - this should be just the textual representation of the status code, shouldn't it? I wouldn't expect there to be a way to inject anything malicious in it.
import { LogId } from "../../common/logger.js"; | ||
|
||
export const SearchKnowledgeToolArgs = { | ||
query: z.string().describe("A natural language query to search for in the knowledge base"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we add a bit more detail here? this is what the LLM will use to understand that you want it to make a generic natural question that the user haave.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated!
this.requiredHeaders = new Headers({ | ||
"x-request-origin": "mongodb-mcp-server", | ||
"user-agent": serverVersion ? `mongodb-mcp-server/v${serverVersion}` : "mongodb-mcp-server", | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[q] why does this need to be initialized here at the tool level? can we also move this logic into the api layer?
also, we should probably use the userAgent
from the apiClient and make sure we use the same name everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need to be initialized here at the tool level? can we also move this logic into the api layer?
The "API layer" as it exists seems to be tailored to Atlas - this is a totally separate server/API. If we prefer I could create an equivalent assistant/apiClient.ts
file but personally that seems unnecessary at this point.
we should probably use the userAgent from the apiClient
- Our server firewall allowlists specific user agents and right now that's configured to be
mongodb-mcp-server
- nit: The
apiClient
usesUser-Agent: AtlasMCP
which again seems to refer to Atlas specifically
/** | ||
* Mocks fetch for assistant API calls | ||
*/ | ||
interface MockedAssistantAPI { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if we're mocking the api calls i'd rather call these unit tests, could you please move the tests to the unit folder and the helpers?
if you have any staging / cloud-dev environment, we could configure integration tests as well, let me know on slack if you do as I can help setting up a GH variable for the staging environment, otherwise we could keep it with unit tests only which is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha that's totally reasonable. I'll move these mocked tests to tests/unit
. It's a bit awkward because I think I still want the integration test flow (i.e. mcpClient.callTool({ ... })
) but I'd like to use the mocks to control the output to confirm that everything is formatted/handled appropriately.
For integration tests, we could try hitting https://knowledge-dev.mongodb.com
. What GH variables do we need to support this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created an env variable called TEST_ASSISTANT_BASE_URL
in GH actions, you can make it so your integration tests get configured to that if the env is set.
here's an atlas example. you'd need to add the logic for using it as well
- name: Run tests
env:
MDB_MCP_API_CLIENT_ID: ${{ secrets.TEST_ATLAS_CLIENT_ID }}
MDB_MCP_API_CLIENT_SECRET: ${{ secrets.TEST_ATLAS_CLIENT_SECRET }}
MDB_MCP_API_BASE_URL: ${{ vars.TEST_ATLAS_BASE_URL }}
run: npm test -- tests/integration/tools/atlas
The only thing missing from my side in this PR is to add some accuracy tests, to make sure that the models we test with do know when to call this tool. In the tests/accuracy folder there are a few examples that can be used as inspiration. The idea is to write a few prompts that we expect to trigger our tool and ensure that the tool is called with the specific parameters. |
Proposed changes
(EAI-1266) Knowledge Search on MCP
Adds a new class of
Assistant
tools for interacting with the MongoDB Assistant (AI Chatbot + Search).This PR scaffolds the generic
AssistantToolBase
class as well as two initial tools:list_knowledge_sources
- pulls a list of all available data sources. Useful for knowing what data is available and for filtering thesearch_knowledge
search_knowledge
- runs a natural language search against the MongoDB Assistant knowledge base. Returns content chunks for the most relevant results.Checklist