feat(importer): add behavioral scanner#1043

Merged
adamtagscherer merged 6 commits into main from feat/add-behavioral-scanner
Mar 16, 2026

@adamtagscherer adamtagscherer commented Mar 12, 2026

Implement behavioral security scanner

Implements the behavioral scanner. For each imported record, the scanner:

  • Extracts the source_code locator URL and optional mcp_data.repository.subfolder
  • Clones the repo (git clone --depth=1) to a temp directory
  • Runs mcp-scanner behavioral --raw on the source
  • Parses the JSON output and maps findings to ScanResult (CRITICAL/HIGH → error, MEDIUM → warning, LOW → info)
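The parse-and-map step in the last bullet could be sketched roughly as follows. This is only an illustration: the `finding` struct and its JSON field names are hypothetical (the real mcp-scanner output schema is not shown in this PR), and plain strings stand in for whatever level type ScanResult actually uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// finding is a hypothetical shape for one scanner finding; the real
// mcp-scanner JSON schema may differ.
type finding struct {
	Severity string `json:"severity"`
	Summary  string `json:"summary"`
}

// levelFor maps a scanner severity to the result level described in the PR:
// CRITICAL/HIGH -> error, MEDIUM -> warning, LOW -> info.
// Unknown severities yield "" so the caller can decide how to treat them
// (that fallback is an assumption, not something the PR specifies).
func levelFor(severity string) string {
	switch strings.ToUpper(strings.TrimSpace(severity)) {
	case "CRITICAL", "HIGH":
		return "error"
	case "MEDIUM":
		return "warning"
	case "LOW":
		return "info"
	default:
		return ""
	}
}

// parseFindings decodes a JSON array of findings (assumed format).
func parseFindings(raw []byte) ([]finding, error) {
	var out []finding
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, fmt.Errorf("decode scanner output: %w", err)
	}
	return out, nil
}

func main() {
	raw := []byte(`[{"severity":"HIGH","summary":"reads env secrets"},` +
		`{"severity":"low","summary":"verbose logging"}]`)
	findings, err := parseFindings(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range findings {
		fmt.Printf("%s: %s\n", levelFor(f.Severity), f.Summary)
	}
}
```

Keeping the mapping in one small pure function makes it easy to unit-test independently of the clone and subprocess steps.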

Records are gracefully skipped when:

  • No source_code locator exists
  • URL is a placeholder (example.com)
  • Repository is not cloneable (private/deleted)

The scanner is also wired into the CI workflow (import-records.yaml).

This PR also fixes a Go version error that is currently causing the importer to fail:
https://github.com/agntcy/dir/actions/runs/22980792321

Successful CI run logs:
https://github.com/agntcy/dir/actions/runs/22997376095

@github-actions github-actions bot added the size/M Denotes a PR that changes 200-999 lines label Mar 12, 2026
@adamtagscherer adamtagscherer changed the title feat: behavioral scanner feat(importer): behavioral scanner Mar 12, 2026
@adamtagscherer adamtagscherer changed the title feat(importer): behavioral scanner feat(importer): add behavioral scanner Mar 12, 2026
@adamtagscherer adamtagscherer force-pushed the feat/add-behavioral-scanner branch from 7219c8c to 5b2cbdf Compare March 12, 2026 09:34
github-actions bot commented Mar 12, 2026

The latest Buf updates on your PR. Results from workflow Buf CI / verify-proto (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ⏩ skipped | ✅ passed | Mar 16, 2026, 10:18 AM |

@adamtagscherer adamtagscherer marked this pull request as ready for review March 12, 2026 09:35
@adamtagscherer adamtagscherer requested a review from a team as a code owner March 12, 2026 09:35
@adamtagscherer adamtagscherer self-assigned this Mar 12, 2026
@adamtagscherer adamtagscherer added go Pull requests that update go code area/importer labels Mar 12, 2026
@adamtagscherer adamtagscherer added this to the DIR v1.1.0 milestone Mar 12, 2026
@adamtagscherer adamtagscherer linked an issue Mar 12, 2026 that may be closed by this pull request
@paralta paralta left a comment

Minor comments, looks good overall 👍

Comment on lines +152 to +155
MCP_SCANNER_LLM_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
MCP_SCANNER_LLM_BASE_URL: ${{ secrets.AZURE_OPENAI_ENDPOINT }}
MCP_SCANNER_LLM_MODEL: "azure/gpt-4o"
MCP_SCANNER_LLM_API_VERSION: "2024-10-21"
Member

Would be neat to reuse the more generic AZURE_* vars above. Those vars are already used for the importer enricher, and we are using the same values here.

Member Author

These env vars are defined in the mcp scanner repository - https://github.com/cisco-ai-defense/mcp-scanner - I don't think we can rewrite it based on our needs.
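If the scanner is launched as a subprocess from Go, one way to accommodate both points would be to keep the scanner's own variable names while sourcing their values from the generic AZURE_* variables. The sketch below is hypothetical: `scannerEnv` is not part of the PR, and only the variable names quoted in this thread are used.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// scannerEnv returns NAME=value entries for the scanner subprocess.
// The MCP_SCANNER_LLM_* names are fixed by the scanner; their values are
// copied from the generic AZURE_* variables when those are set.
// getenv is injected so the function can be tested without touching
// the real process environment.
func scannerEnv(getenv func(string) string) []string {
	mapping := [][2]string{
		{"MCP_SCANNER_LLM_API_KEY", "AZURE_OPENAI_API_KEY"},
		{"MCP_SCANNER_LLM_BASE_URL", "AZURE_OPENAI_ENDPOINT"},
	}
	var env []string
	for _, m := range mapping {
		if v := getenv(m[1]); v != "" {
			env = append(env, m[0]+"="+v)
		}
	}
	return env
}

func main() {
	// A real caller would append these entries to exec.Cmd.Env.
	// Only the variable names are printed, never the secret values.
	for _, kv := range scannerEnv(os.Getenv) {
		fmt.Println("set:", strings.SplitN(kv, "=", 2)[0])
	}
}
```

This keeps a single set of secrets in the workflow while leaving the scanner's configuration contract untouched.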

}

// isPlaceholderURL returns true for URLs that are not real repositories
// (e.g. example.com placeholders injected by the transformer).
Member

Where is this injection made? Maybe we should refrain from injecting these placeholders in the transformer stage.

Member Author

It's defined here: https://github.com/agntcy/oasf-sdk/blob/main/pkg/translator/mcp.go#L691

@akijakya most likely knows more about why we need that.

Member

Ah yes, I think this is a leftover from when locator was a required field in the record (up until OASF 0.7.0), so we had to put something there, but it should be removed now. I'll create an issue about it in the SDK's repo, but I think this placeholder check can be removed either way, because a repo that doesn't exist or is private is also handled gracefully, right?

Member Author

Yes, we can remove it. Without the check we will just unnecessarily try to clone a repo at a URL that doesn't exist, and most of the records have this placeholder URL. I'll remove it, and once the placeholder is removed from the OASF SDK the problem will cease to exist.

Comment on lines +117 to +124
func extractSourceCodeURL(fields map[string]*structpb.Value) string {
	locatorsVal, ok := fields["locators"]
	if !ok || locatorsVal == nil {
		return ""
	}

	listVal := locatorsVal.GetListValue()
	if listVal == nil {
Member

Why are we still doing data extraction like this? We have our corev1 package that can take in a protobuf and return a record, so let's use that to access the data.

@adamtagscherer adamtagscherer force-pushed the feat/add-behavioral-scanner branch 2 times, most recently from 006eb63 to 86a4da0 Compare March 13, 2026 12:14
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
@adamtagscherer adamtagscherer force-pushed the feat/add-behavioral-scanner branch from 86a4da0 to d50d50d Compare March 16, 2026 08:21
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
@adamtagscherer adamtagscherer merged commit 81eafca into main Mar 16, 2026
56 of 57 checks passed
@adamtagscherer adamtagscherer deleted the feat/add-behavioral-scanner branch March 16, 2026 11:38
Labels

  • area/importer
  • go (Pull requests that update go code)
  • size/M (Denotes a PR that changes 200-999 lines)

Development

Successfully merging this pull request may close these issues.

Add behavioral scanner

4 participants