Safe Executor v1 + MCP utilization standard (internal rollout) #356
fvn946zh9w-crypto wants to merge 1 commit into wonderwhy-er:main
Conversation
CodeAnt AI is reviewing your PR. Thanks for using CodeAnt! 🎉
Nitpicks 🔍
Flagged code (src/handlers/skills-handlers.ts):

```ts
const run = await skillRunner.approveRun(parsed.data.runId);
```
Suggestion: The concurrency limit configured via skillMaxConcurrentRuns is only enforced for run_skill in execute mode, but not when a run transitions to execution via approve_skill_run, so a caller can bypass the limit by creating many planned runs and then approving them, leading to more concurrent executions than configured; you should apply the same getPendingOrActiveCount vs maxConcurrentRuns check in the approve handler before calling into the runner. [logic error]
Severity Level: Major ⚠️
- ❌ Concurrency limit ignored for approve_skill_run execution path.
- ⚠️ Safe Executor may spawn more runs than configured.
- ⚠️ Resource usage spikes despite skillMaxConcurrentRuns protection.
- ⚠️ Governance expectations on execution caps silently violated.

Suggested change:

```ts
if (skillRunner.getPendingOrActiveCount() >= settings.maxConcurrentRuns) {
  return errorResponse(`Max concurrent skill runs reached (${settings.maxConcurrentRuns}).`, 'concurrency_limit_reached');
}
const run = await skillRunner.approveRun(parsed.data.runId);
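The suggested guard can be sketched end to end. Everything below is a stand-in: `RunnerLike`, `approveWithLimit`, and the in-memory runner are hypothetical substitutes for the project's actual `skillRunner` and settings objects, whose real signatures may differ.

```typescript
// Hypothetical stand-in for the project's skill runner interface.
interface RunnerLike {
  getPendingOrActiveCount(): number;
  approveRun(runId: string): Promise<{ runId: string; status: string }>;
}

// Mirrors the errorResponse shape used in the review suggestion (assumed).
function errorResponse(message: string, code: string) {
  return { isError: true as const, message, code };
}

// Apply the same check run_skill performs in execute mode before approving.
async function approveWithLimit(
  runner: RunnerLike,
  settings: { maxConcurrentRuns: number },
  runId: string,
) {
  if (runner.getPendingOrActiveCount() >= settings.maxConcurrentRuns) {
    return errorResponse(
      `Max concurrent skill runs reached (${settings.maxConcurrentRuns}).`,
      'concurrency_limit_reached',
    );
  }
  return runner.approveRun(runId);
}

// In-memory runner with one run already executing, limit of 1.
const runner: RunnerLike & { active: number } = {
  active: 1,
  getPendingOrActiveCount() { return this.active; },
  async approveRun(runId: string) {
    this.active += 1;
    return { runId, status: 'executing' };
  },
};

approveWithLimit(runner, { maxConcurrentRuns: 1 }, 'run-2').then((res) => {
  console.log(JSON.stringify(res));
});
```

One caveat worth validating against the real runner: if `getPendingOrActiveCount()` also counts the `waiting_approval` run being approved, this guard would block approving the only pending run when nothing is executing and the limit is 1, so the check may need to exclude the run under approval depending on the runner's actual counting semantics.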
Steps of Reproduction ✅
1. Start the MCP server that wires the `approve_skill_run` tool to `handleApproveSkillRun`
in `src/handlers/skills-handlers.ts:157-193` (this module is imported by the tools layer
via `../tools/schemas.js`).
2. Configure the server so that `configManager.getConfig()` (used by `getSkillConfig` at
`src/handlers/skills-handlers.ts:34-37`) returns a `ServerConfig` with
`skillsEnabled=true`, `commandValidationMode='strict'`, `skillMaxConcurrentRuns=1`, and
`skillExecuteEvalGateEnabled=true`, so `normalizeSkillRuntimeConfig` at
`src/skills/runtime-config.ts:79-147` yields `maxConcurrentRuns = 1` and `evalGateEnabled
= true`.
3. From a client (DesktopCommander / MCP client), call the `run_skill` tool so it routes
to `handleRunSkill` at `src/handlers/skills-handlers.ts:107-155` with arguments that
create multiple runs in a non-`execute` mode (for example a planning/confirm flow defined
by `RunSkillArgsSchema`), so the concurrency check `if (parsed.data.mode === 'execute' &&
skillRunner.getPendingOrActiveCount() >= settings.maxConcurrentRuns)` at lines 134-137 is
not executed, producing several `waiting_approval` runs.
4. While at least one other run is already executing (started earlier via `run_skill` with
`mode='execute'` and allowed by the check at `skills-handlers.ts:134-137`, bringing real
concurrent executions to the configured limit of 1), approve each queued run by repeatedly
invoking the `approve_skill_run` tool, which reaches `handleApproveSkillRun` at
`src/handlers/skills-handlers.ts:157-193`; observe that no concurrency check is performed
before `skillRunner.approveRun(...)` is called, so additional runs start executing even
though `skillRunner.getPendingOrActiveCount()` is already ≥ `settings.maxConcurrentRuns`,
allowing more concurrent executions than the configured limit.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/handlers/skills-handlers.ts
**Line:** 183:183
**Comment:**
*Logic Error: The concurrency limit configured via `skillMaxConcurrentRuns` is only enforced for `run_skill` in `execute` mode, but not when a run transitions to execution via `approve_skill_run`, so a caller can bypass the limit by creating many planned runs and then approving them, leading to more concurrent executions than configured; you should apply the same `getPendingOrActiveCount` vs `maxConcurrentRuns` check in the approve handler before calling into the runner.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

Flagged code (src/skills/runner.ts):

```ts
  text.includes('Started') || text.includes('No'),
  ['search_session_started_or_no_results'],
  [text.substring(0, 200)],
  'Search response did not include expected markers'
);
```
Suggestion: In the search step, the verification object always includes a failureReason string even when the step passes, which leads to confusing and inconsistent state where successful steps appear to have a failure reason in downstream views; adjust the logic so failureReason is only populated when the verification actually fails. [logic error]
Severity Level: Major ⚠️
- ⚠️ Successful search steps carry misleading failureReason metadata.
- ⚠️ Consumers of run summaries may misinterpret search step status.

Suggested change:

```ts
const searchSessionOk = text.includes('Started') || text.includes('No');
const verification = createVerification(
  searchSessionOk,
  ['search_session_started_or_no_results'],
  [text.substring(0, 200)],
  searchSessionOk ? undefined : 'Search response did not include expected markers'
);
```
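A runnable sketch of the conditional-failureReason pattern. The `Verification` shape and `createVerification` signature here are assumptions modeled on the call site in the suggestion, not the project's actual definitions in src/skills/runner.ts.

```typescript
// Assumed shape of a step verification record.
interface Verification {
  passed: boolean;
  checks: string[];
  evidence: string[];
  failureReason?: string;
}

// Only attach failureReason when the verification actually failed, so
// successful steps never carry failure metadata downstream.
function createVerification(
  passed: boolean,
  checks: string[],
  evidence: string[],
  failureReason?: string,
): Verification {
  const v: Verification = { passed, checks, evidence };
  if (!passed && failureReason) v.failureReason = failureReason;
  return v;
}

const text = 'Started search session #1';
const searchSessionOk = text.includes('Started') || text.includes('No');
const verification = createVerification(
  searchSessionOk,
  ['search_session_started_or_no_results'],
  [text.substring(0, 200)],
  searchSessionOk ? undefined : 'Search response did not include expected markers',
);
console.log(verification.passed, verification.failureReason);
```

With this shape, a passing step's `failureReason` is absent entirely rather than set to the error string, which keeps run summaries unambiguous.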
Steps of Reproduction ✅
1. In `src/skills/runner.ts:333-383`, invoke `skillRunner.runSkill()` with `mode:
'execute'` and `executionMode: 'auto_safe'` on a valid `SkillDescriptor` so that a plan is
built including the `search` step (`buildDeterministicPlan()` at
`src/skills/runner.ts:114-155` always adds a `type: 'search'` step as `step-2`).
2. Ensure the run is executed (i.e., not `plan` or `confirm` mode) so `runSkill()` calls
`this.executePlanSteps(runId)` at `src/skills/runner.ts:382`, which in turn calls
`this.executeSingleStep()` for each step at `src/skills/runner.ts:418`.
3. When the `search` step is processed in `executeSingleStep()`
(`src/skills/runner.ts:510-536`), the call to `handleStartSearch()` returns a non-error
result where `result.content[0].text` includes either `"Started"` or `"No"` (the expected
success markers implied by the verification checks), making `verification.passed === true`
in the block at `src/skills/runner.ts:528-533`.
4. Observe the returned `SkillRun` from `runSkill()` (or via `getRun()` at
`src/skills/runner.ts:659-662`): in `run.executionSummary.stepOutcomes`, the `search` step
has `verification.passed === true` but `verification.failureReason === 'Search response
did not include expected markers'`, demonstrating a successful step carrying a non-empty
failureReason.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/skills/runner.ts
**Line:** 529:533
**Comment:**
*Logic Error: In the search step, the verification object always includes a failureReason string even when the step passes, which leads to confusing and inconsistent state where successful steps appear to have a failure reason in downstream views; adjust the logic so failureReason is only populated when the verification actually fails.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

Flagged code (test/test-combined-tool-filtering.js):

```js
if (prevSkillsEnabled !== undefined) {
  try {
    await configManager.setValue('skillsEnabled', prevSkillsEnabled);
  } catch {
    // ignore
  }
}
```
Suggestion: The test only restores skillsEnabled if it was previously defined, so if the config key did not exist before the test, the test will persistently add skillsEnabled: false to the user's config and violate the "should not leave config mutated" expectation. [logic error]
Severity Level: Major ⚠️
- ❌ User skills tools remain hidden across MCP sessions.
- ⚠️ Test mutates shared config despite 'no mutation' comment.

Suggested change (restore unconditionally):

```js
try {
  await configManager.setValue('skillsEnabled', prevSkillsEnabled);
} catch {
  // ignore
}
```
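The unconditional restore only fully cleans up if setting the key back to `undefined` actually removes it. Whether the project's `configManager.setValue` does that is unknown, so the sketch below uses a hypothetical `FakeConfig` store with unset-on-undefined semantics to show the intended capture/restore pattern.

```typescript
// Hypothetical in-memory config store; the real configManager may need an
// explicit delete/unset method rather than treating undefined as removal.
class FakeConfig {
  private store = new Map<string, unknown>();
  async getValue(key: string) { return this.store.get(key); }
  async setValue(key: string, value: unknown) {
    if (value === undefined) this.store.delete(key); // assumed unset-on-undefined
    else this.store.set(key, value);
  }
  has(key: string) { return this.store.has(key); }
}

async function demo() {
  const configManager = new FakeConfig();
  // Key absent before the test, so prevSkillsEnabled is undefined.
  const prevSkillsEnabled = await configManager.getValue('skillsEnabled');
  await configManager.setValue('skillsEnabled', false);
  try {
    // ... test body would run here ...
  } finally {
    // Restore unconditionally: undefined removes the key, any other
    // previous value is written back.
    await configManager.setValue('skillsEnabled', prevSkillsEnabled);
  }
  console.log(configManager.has('skillsEnabled'));
}
demo();
```

If the real `setValue` persists `undefined` as a literal value instead of deleting the key, the test would instead need a dedicated removal API to honor the "should not leave config mutated" expectation.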
Steps of Reproduction ✅
1. Ensure the configuration managed by `src/config-manager.ts` has no `skillsEnabled` key
so that `configManager.getValue('skillsEnabled')` (implemented at lines 215-218) will
return `undefined`.
2. Run `test/test-combined-tool-filtering.js`, which imports `{ configManager }` from
`../dist/config-manager.js` and calls `prevSkillsEnabled = await
configManager.getValue('skillsEnabled');` followed by `await
configManager.setValue('skillsEnabled', false);` inside `run()` (lines 32-34).
3. In the `finally` block at `test/test-combined-tool-filtering.js:63-79`, the code checks
`if (prevSkillsEnabled !== undefined)` before restoring; since `prevSkillsEnabled` is
`undefined` in this scenario, the call to `configManager.setValue('skillsEnabled',
prevSkillsEnabled)` is skipped entirely.
4. As a result, `src/config-manager.ts:setValue` (lines 223-248) has already persisted
`skillsEnabled: false` to the backing config file, and this value is not reverted after
the test, leaving `skillsEnabled` permanently set to `false` for subsequent server runs
initiated via `dist/index.js`, which drives skill tool filtering validated by this test.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** test/test-combined-tool-filtering.js
**Line:** 65:70
**Comment:**
*Logic Error: The test only restores `skillsEnabled` if it was previously defined, so if the config key did not exist before the test, the test will persistently add `skillsEnabled: false` to the user's config and violate the "should not leave config mutated" expectation.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

Flagged code (test/test-skill-eval-gate.js):

```js
try {
  skillRunner.resetExecuteEvalStats();
  await configManager.setValue('skillsEnabled', true);
  await configManager.setValue('skillsDirectories', ['/Users/test1/.codex/skills']);
```
Suggestion: The test hardcodes skillsDirectories to /Users/test1/.codex/skills, which will not exist on most machines or CI environments; this can cause handleListSkills to return an empty skills array, leading to a runtime error when accessing listPayload.skills[0].id and making the test environment-dependent and brittle. Reusing the previously configured skillsDirectories avoids the hard-coded, user-specific path while still allowing the test to operate on whatever skill directories are actually configured. [logic error]
Severity Level: Major ⚠️
- ⚠️ Eval gate test fails on non-`test1` developer machines.
- ⚠️ CI on shared runners may intermittently fail tests.
- ⚠️ Hard-coded user path reduces portability of the test suite.

Suggested change:

```js
if (prevDirs !== undefined) {
  await configManager.setValue('skillsDirectories', prevDirs);
}
```
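The resolution logic can be factored into a small helper. This is a sketch: `resolveSkillsDirs` is a hypothetical name, and the `DESKTOP_COMMANDER_SKILLS_DIR` environment variable comes from the reviewer's suggested diff below, not from an existing project feature.

```typescript
// Prefer whatever skills directories were already configured; fall back to
// an env-var override (hypothetical), and finally to an empty list so the
// test can skip gracefully instead of pointing at a user-specific path.
function resolveSkillsDirs(prevDirs: unknown): string[] {
  if (Array.isArray(prevDirs) && prevDirs.length > 0) {
    return prevDirs as string[];
  }
  const fromEnv = process.env.DESKTOP_COMMANDER_SKILLS_DIR;
  return fromEnv ? [fromEnv] : [];
}

console.log(resolveSkillsDirs(['/opt/skills']));
console.log(resolveSkillsDirs(undefined));
```

A test using this helper would treat an empty result as a reason to skip the skill-discovery assertions rather than fail, keeping the suite green on machines with no skills configured.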
Steps of Reproduction ✅
1. Clone the repository to a machine where the directory `/Users/test1/.codex/skills` does
not exist or does not contain valid skills for DesktopCommanderMCP (this is a
user-specific absolute path).
2. Run the eval gate test directly with `node test/test-skill-eval-gate.js`, which invokes
the `run()` function defined in `test/test-skill-eval-gate.js:11-68`.
3. Inside `run()`, the test overwrites the skills directory config with `await
configManager.setValue('skillsDirectories', ['/Users/test1/.codex/skills']);` at
`test/test-skill-eval-gate.js:22`, and then calls `handleListSkills({ limit: 1 })` at
`test/test-skill-eval-gate.js:28`.
4. `handleListSkills` (in `src/handlers/skills-handlers.ts:47-79`) uses `getSkillConfig()`
and then `skillRegistry.scanSkills(settings.skillDirs)` with the now hard-coded
`/Users/test1/.codex/skills`. Because that path has no usable skills on this machine,
`handleListSkills` returns a result with no skills (or a config error). Back in the test,
`parseTextPayload(listed)` at `test/test-skill-eval-gate.js:30` yields
`listPayload.skills` as an empty array, so `listPayload.skills[0].id` at
`test/test-skill-eval-gate.js:31` is `undefined.id`, causing a TypeError or failing the
earlier assertion `assert.ok(!listed.isError, ...)`, and the test fails in otherwise valid
environments.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** test/test-skill-eval-gate.js
**Line:** 22:22
**Comment:**
*Logic Error: The test hardcodes `skillsDirectories` to `/Users/test1/.codex/skills`, which will not exist on most machines or CI environments; this can cause `handleListSkills` to return an empty skills array, leading to a runtime error when accessing `listPayload.skills[0].id` and making the test environment-dependent and brittle. Reusing the previously configured `skillsDirectories` avoids the hard-coded, user-specific path while still allowing the test to operate on whatever skill directories are actually configured.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

Flagged code (test/test-skills-workflow.js):

```js
try {
  skillRunner.resetExecuteEvalStats();
  await configManager.setValue('skillsEnabled', true);
  await configManager.setValue('skillsDirectories', ['/Users/test1/.codex/skills']);
```
Suggestion: The test hardcodes skillsDirectories to /Users/test1/.codex/skills, which is a user-specific absolute path; on any machine where that path does not exist or contain skills, handleListSkills will return no skills and the assertions expecting discovered skills will fail, so the test should instead reuse the existing configuration or a configurable directory. [logic error]
Severity Level: Major ⚠️
- ❌ Skills workflow test fails on non-author environments.
- ⚠️ CI/npm test pipeline may be red by default.
- ⚠️ Contributors cannot reliably run the full test suite locally.

Suggested change:

```js
const skillsDirs = Array.isArray(prevDirs) && prevDirs.length > 0
  ? prevDirs
  : (process.env.DESKTOP_COMMANDER_SKILLS_DIR ? [process.env.DESKTOP_COMMANDER_SKILLS_DIR] : []);
await configManager.setValue('skillsDirectories', skillsDirs);
```
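An alternative to reusing existing config is to make the test fully self-contained with a throwaway fixture directory. The sketch below assumes a `SKILL.md`-per-directory layout for skills; the actual format `skillRegistry.scanSkills` expects is not confirmed by this PR thread, so treat the fixture contents as a guess.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Create a temp skills directory containing one dummy skill so the workflow
// test never depends on any particular user's machine. The SKILL.md layout
// is an assumption about what the skill registry scans for.
function makeTempSkillsDir(): string {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'dc-skills-'));
  const skillDir = path.join(dir, 'hello-world');
  fs.mkdirSync(skillDir);
  fs.writeFileSync(
    path.join(skillDir, 'SKILL.md'),
    '---\nname: hello-world\ndescription: fixture skill\n---\nSay hello.\n',
  );
  return dir;
}

const dir = makeTempSkillsDir();
console.log(fs.existsSync(path.join(dir, 'hello-world', 'SKILL.md')));
// In the test: await configManager.setValue('skillsDirectories', [dir]);
// and remove the fixture again in the finally block:
fs.rmSync(dir, { recursive: true, force: true });
```

Creating and deleting the fixture inside the test's try/finally also sidesteps the restore-previous-config concerns raised in the other comments, since the directory never outlives the run.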
Steps of Reproduction ✅
1. On any machine where `/Users/test1/.codex/skills` does not exist or contains no skill
definitions, build the project so that `dist/handlers/skills-handlers.js` is present.
2. Run the test entrypoint `node test/test-skills-workflow.js`, which calls the `run()`
function defined at `test/test-skills-workflow.js:18`.
3. Inside `run()`, at `test/test-skills-workflow.js:26-33`, the test sets configuration
via `configManager.setValue`, including `skillsDirectories` to
`['/Users/test1/.codex/skills']` at line 29, then calls `handleListSkills({ limit: 5 })`
at line 35.
4. `handleListSkills` (implemented in `src/handlers/skills-handlers.ts:47-79`) calls
`skillRegistry.scanSkills(settings.skillDirs)` using that directory; with no skills found,
it returns an empty `skills` array, causing the assertion
`assert.ok(listPayload.skills.length > 0, 'at least one skill should be discovered');` at
`test/test-skills-workflow.js:39` to fail and the test process to exit with a non-zero
status.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** test/test-skills-workflow.js
**Line:** 29:29
**Comment:**
*Logic Error: The test hardcodes `skillsDirectories` to `/Users/test1/.codex/skills`, which is a user-specific absolute path; on any machine where that path does not exist or contain skills, `handleListSkills` will return no skills and the assertions expecting discovered skills will fail, so the test should instead reuse the existing configuration or a configurable directory.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
CodeAnt AI finished reviewing your PR. |
|
Ouh wow that is a big one
User description
Title
Safe Executor v1 + MCP utilization standard (internal rollout)
Summary
- Local working tree: /Users/test1/DesktopCommanderMCP.

Included

- Skill run tools (run_skill, approve_skill_run, get_skill_run, cancel_skill_run).
- Skill resources (dc://skills/catalog, dc://skills/eval-gate, dc://skills/runs/{runId}).
- Rollout artifacts under operations/rollout/ and operations/rollout/2026-02-23/.

Validation

- Test output captured in operations/rollout/2026-02-23/npm_test.log and summary file.
- Pilot run summary: operations/rollout/2026-02-23/pilot_run_summary.md.
- Eval gate snapshot: operations/rollout/2026-02-23/skills_eval_gate_snapshot.json.

Security Defaults Confirmed

- commandValidationMode = strict
- skillExecutionMode = confirm
- toolCallLoggingMode = redacted
- skillExecuteEvalGateEnabled = true (enforced post-sampling)

Known Environment-specific Notes

- listen EPERM in restricted sandbox; test includes environment-safe skip path.

Rollout Scope
CodeAnt-AI Description
Add Safe Executor skills runtime with guarded skill discovery and execution
What Changed
Impact
- ✅ Fewer accidental destructive commands
- ✅ Clearer eval-gate blocks for skill execution
- ✅ Fewer raw secrets in tool call logs

💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.