feat(importer): add behavioral scanner#1043
Conversation
7219c8c to
5b2cbdf
Compare
|
The latest Buf updates on your PR. Results from workflow Buf CI / verify-proto (pull_request).
|
paralta
left a comment
There was a problem hiding this comment.
Minor comments, looks good overall 👍
| MCP_SCANNER_LLM_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }} | ||
| MCP_SCANNER_LLM_BASE_URL: ${{ secrets.AZURE_OPENAI_ENDPOINT }} | ||
| MCP_SCANNER_LLM_MODEL: "azure/gpt-4o" | ||
| MCP_SCANNER_LLM_API_VERSION: "2024-10-21" |
There was a problem hiding this comment.
would be neat to reuse the more generic vars above AZURE_*. the vars are already used for the importer enricher and we are using the same here
There was a problem hiding this comment.
These env vars are defined in the mcp scanner repository - https://github.com/cisco-ai-defense/mcp-scanner - I don't think we can rewrite it based on our needs.
There was a problem hiding this comment.
you can pass the existing vars here https://github.com/agntcy/dir/blob/feat/add-behavioral-scanner/importer/scanner/behavioral/behavioral.go#L220C14-L220C28
| } | ||
|
|
||
| // isPlaceholderURL returns true for URLs that are not real repositories | ||
| // (e.g. example.com placeholders injected by the transformer). |
There was a problem hiding this comment.
where is this injection made? maybe we need to refrain from injecting these placeholders in the transformer stage
There was a problem hiding this comment.
It's defined here: https://github.com/agntcy/oasf-sdk/blob/main/pkg/translator/mcp.go#L691
@akijakya should know more likely why we need that.
There was a problem hiding this comment.
Ah yes, I think this is a leftover from when locator was a required field in the record (up until OASF 0.7.0), so we had to put something there, but it should be removed now. I'll create an issue about it in the SDK's repo, but I think this placeholder check can be removed either way, because it is also gracefully handled if the repo doesn't exist/private, right?
There was a problem hiding this comment.
Yes we can remove, we are just unnecessarily trying to clone a repo at an URL that doesn't exist, and most of the records have this placeholder URL. I'll remove it and when we removed it from the OASF-SDK the problem will cease to exist.
| func extractSourceCodeURL(fields map[string]*structpb.Value) string { | ||
| locatorsVal, ok := fields["locators"] | ||
| if !ok || locatorsVal == nil { | ||
| return "" | ||
| } | ||
|
|
||
| listVal := locatorsVal.GetListValue() | ||
| if listVal == nil { |
There was a problem hiding this comment.
why are we still doing data extraction like this? we have our corev1 package that can take in a protobuf and return a record, lets use that to access data
006eb63 to
86a4da0
Compare
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
86a4da0 to
d50d50d
Compare
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
Signed-off-by: Tagscherer Ádám <adam.tagscherer@gmail.com>
Implement behavioral security scanner
Implements the behavioral scanner. For each imported record, the scanner:
Records are gracefully skipped when:
This PR also fixes a go version error because the importer currently is failing:
https://github.com/agntcy/dir/actions/runs/22980792321
Successful CI run logs:
https://github.com/agntcy/dir/actions/runs/22997376095