Mercury CLI is a beta repair tool for Rust failures reported by direct `cargo` verifier commands, built for teams using Inception Labs models.
The current branch is aligned to the Mercury CLI 1.0.0-beta.1 pre-release runtime surface. The core product today is Rust-first repair: local `watch --repair` handles supported direct Rust verifier commands, while `fix` and the GitHub repair workflow are productized around direct allowlisted Rust verifier commands and the checked-in Tier 1 benchmark lane at `evals/v0/tier1-manifest.json`. In practice, `cargo test`, `cargo check`, and `cargo clippy` are the supported verifier classes for the current beta.

TypeScript remains a frozen experimental lane for selected direct verifier commands. It currently relies on token-aware repository scanning and failure parsing rather than a real TypeScript parser, so it should not be read as parity with Rust repair quality.

Artifact bundles, phased candidate fanout via `fix --max-agents N`, `status --live` runtime events, verifier allowlisting, audit logs, and output redaction are implemented. Candidate verification isolation is repo-copy/worktree based under `.mercury/worktrees/` (not a stronger process/container sandbox claim). `.github/workflows/repair.yml` only opens or updates draft PRs for verified repairs when `dry_run=false` and same-repo write permissions are available.
Use the installer script when you want a tagged binary instead of a source build:
```shell
# Current prerelease branch surface
curl -fsSL https://raw.githubusercontent.com/denster32/mercury-cli/main/scripts/install.sh | bash -s -- --version v1.0.0-beta.1
# Later stable GA tags use the same path
curl -fsSL https://raw.githubusercontent.com/denster32/mercury-cli/main/scripts/install.sh | bash -s -- --version v1.0.0
```

If you omit `--version`, the installer resolves the latest non-prerelease GitHub release. Use an explicit version whenever you want the current beta surface instead of the latest stable tag.
```shell
git clone https://github.com/denster32/mercury-cli
cd mercury-cli
cargo build --release
```

Run commands with either:

- `./target/release/mercury-cli ...` (no global install)
- `cargo install --path .`, then `mercury-cli ...`
Official release archives are currently published only for macOS arm64 and Linux x86_64. Hyphenated versions such as v1.0.0-beta.1 publish as GitHub prereleases for that exact branch surface and should not be read as a broader platform-support contract; plain v1.0.0 remains reserved for the stable GA release line. Windows is covered by the CI test matrix, but there is no official Windows release archive in the current repo, so Windows remains a source-build path for now. Tagged release archives now include both mercury-cli and a mercury compatibility alias, and tagged releases also attach mercury-benchmarks-<version>.tar.gz so the checked-in public benchmark publication set is downloadable outside the repo.
For upgrade notes and prerelease deltas, use CHANGELOG.md.
After install, start with docs/operator-quickstart.md. If you want a disposable first repair attempt before touching a real repo, use the checked-in starter repos in starter-repos/README.md.
INCEPTION_API_KEY is the preferred environment variable. MERCURY_API_KEY still works as a backward-compatible fallback.
```shell
export INCEPTION_API_KEY="your-api-key"
# Optional fallback for older configs or older CLI help text:
export MERCURY_API_KEY="$INCEPTION_API_KEY"
```

The fastest real path today is local Rust repair: reproduce a failing verifier command, run `watch --repair`, inspect the artifact bundle, and keep only verified changes.
```shell
git clone https://github.com/denster32/mercury-cli
cd mercury-cli
cargo build --release
export INCEPTION_API_KEY="your-api-key"
export MERCURY_API_KEY="$INCEPTION_API_KEY"
./target/release/mercury-cli init
./target/release/mercury-cli watch "cargo test -p your-crate" --repair
```

What a good current run should leave behind:
- a final watch decision printed to the terminal
- an artifact bundle path under `.mercury/runs/` containing:
  - `summary-index.json` as the top-level local run entrypoint
  - `watch.json` plus the watched command's stdout and stderr
  - copied repair artifacts when Mercury applies a fix attempt
- no partial writes from rejected candidates
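To jump straight to the newest bundle after a run, a small shell helper can locate it. This is a hypothetical sketch: `latest_run_summary` is an illustrative name, not a Mercury command, and it assumes each run gets its own directory directly under `.mercury/runs/` and that the most recently modified directory is the latest run.

```shell
# Hypothetical helper: print the summary-index.json path of the most recently
# modified run directory under .mercury/runs/ (or under a directory passed as $1).
latest_run_summary() {
  runs_dir="${1:-.mercury/runs}"
  # ls -td lists the matched directories newest-first by modification time.
  newest=$(ls -td "$runs_dir"/*/ 2>/dev/null | head -n 1)
  [ -n "$newest" ] || return 1
  summary="${newest%/}/summary-index.json"
  # Fail if the newest run did not write its top-level entrypoint.
  [ -f "$summary" ] || return 1
  printf '%s\n' "$summary"
}
```

For example, `cat "$(latest_run_summary)"` prints the top-level summary of the newest local run.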
Important limits:
- local `watch --repair` auto-repair is currently Rust-only and targeted at direct `cargo` verifier commands: `cargo test`, `cargo check`, and `cargo clippy`
- optional env-prefix forms are supported when they still resolve directly to those commands (for example `RUST_BACKTRACE=1 cargo test --quiet`)
- non-allowlisted watch commands are rejected before execution and before any watch-cycle artifacts are created
- `watch` without `--repair` is report-only for allowlisted verifier commands
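The env-prefix rule can be approximated in POSIX shell to pre-check a command before handing it to `watch --repair`. This is a hypothetical sketch (`verifier_class` is not part of Mercury), and Mercury's own allowlist remains authoritative. It only classifies the command; rejecting pipelines and shell chaining is a separate check.

```shell
# Hypothetical sketch: resolve a watch command to a Rust verifier class after
# skipping leading NAME=value env-prefix tokens. Prints test, check, or clippy;
# returns non-zero for anything else. Does NOT detect shell composition.
verifier_class() {
  set -f            # disable glob expansion while splitting the command into words
  set -- $1
  set +f
  while [ $# -gt 0 ]; do
    case "$1" in
      [A-Za-z_]*=*) shift ;;   # env-prefix token such as RUST_BACKTRACE=1
      *) break ;;
    esac
  done
  [ "${1:-}" = cargo ] || return 1
  case "${2:-}" in
    test|check|clippy) printf '%s\n' "$2" ;;
    *) return 1 ;;
  esac
}
```

For example, `verifier_class 'RUST_BACKTRACE=1 cargo test --quiet'` prints `test`, while `cargo build` and `npm test` are rejected.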
Start here after install:
Reproducible repo-backed walkthroughs:
- Local red -> green watch-repair flow
- CI-oriented repair to draft PR flow
- Local Rust watch-repair starter repo
- CI draft-PR repair starter repo
Examples below assume you built from source in this repo, so command invocations use ./target/release/mercury-cli. If you installed via cargo install --path ., replace with mercury-cli.
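If you switch between a source build and an installed binary, a tiny shell helper can pick whichever is available. This is a hypothetical convenience sketch (`mercury_bin` is an illustrative name): it prefers `./target/release/mercury-cli` in the given checkout and falls back to `mercury-cli` on `PATH`.

```shell
# Hypothetical helper: prefer the source-built binary in a checkout, then fall
# back to an installed mercury-cli on PATH (e.g. from cargo install --path .).
mercury_bin() {
  repo_root="${1:-.}"
  if [ -x "$repo_root/target/release/mercury-cli" ]; then
    printf '%s\n' "$repo_root/target/release/mercury-cli"
  elif command -v mercury-cli >/dev/null 2>&1; then
    command -v mercury-cli
  else
    echo "mercury-cli not found: run cargo build --release or cargo install --path ." >&2
    return 1
  fi
}
```

Usage: `"$(mercury_bin)" status --heatmap`.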
| Surface | Status | Reality in the current repo |
|---|---|---|
| `./target/release/mercury-cli init` | Available | Creates `.mercury/` config and thermal database. |
| `./target/release/mercury-cli plan <goal>` | Available | Produces a structured repair plan and thermal assessments. |
| `./target/release/mercury-cli ask <query>` | Available | Repo-aware Mercury 2 Q&A. |
| `./target/release/mercury-cli status [--heatmap] [--agents] [--budget]` | Available | Reports thermal state and scheduler metadata. |
| `./target/release/mercury-cli status --live [--interval-ms N]` | Available | Streams candidate, phase, and runtime events in a TTY dashboard and emits JSONL event records when piped, including winner/loss/suppression explanations from persisted candidate metadata. `--interval-ms` must be at least 250. |
| `./target/release/mercury-cli edit apply` | Available | Concrete Mercury Edit apply surface for replacement snippets or patch content. It is not an instruction-driven repair endpoint, and `--dry-run` shows a unified diff without writing. |
| `./target/release/mercury-cli edit complete` | Available | Completion-style Mercury Edit request for a file or cursor location. |
| `./target/release/mercury-cli edit next` | Available | Next-edit prediction using current file state plus focused cursor and recent-snippet context. |
| `./target/release/mercury-cli fix <description>` | Available | Repair flow with planning, candidate generation, isolated repo-copy/worktree verification, and artifacts for direct allowlisted Rust verifier commands aligned with the Tier 1 Rust beta lane in `docs/benchmarks/`. It also includes a frozen experimental TypeScript lane for selected direct verifier commands, but that lane is not parser-backed parity with Rust. |
| `./target/release/mercury-cli fix <description> --noninteractive` | Available | CI-safe output mode for log parsing and deterministic summary lines. |
| `./target/release/mercury-cli watch <direct allowlisted verifier command>` | Available | Re-runs an allowlisted direct verifier command when repo contents change and records a watch artifact bundle per cycle. |
| `./target/release/mercury-cli watch <direct Rust verifier command> --repair` | Available with limits | End-to-end local repair loop for direct `cargo test`, `cargo check`, or `cargo clippy` commands (including env-prefix variants), with targeted verifier reuse and post-repair confirmation. |
| `./target/release/mercury-cli watch <command> --noninteractive` | Available | CI-safe watch output mode with compact cycle decisions. |
| `./target/release/mercury-cli watch <composed shell command> --repair` | Not supported | Commands with pipelines or shell chaining are rejected by the watch command allowlist before cycle execution. |
| `./target/release/mercury-cli config get` / `validate` | Available | Reads or validates config values. |
| `./target/release/mercury-cli config set` | Available with limits | Safely updates the documented scalar keys in `.mercury/config.toml` and validates the full config before writing. Unsupported keys still require direct TOML editing. |
| Manual CI-to-draft-PR handoff | Documented workflow | The repo includes a case study for publishing artifacts and opening a draft PR after a verified local or CI reproduction. |
| GitHub Action repair workflow | Available with limits | The Mercury CI Auto-Repair Draft PR workflow in `.github/workflows/repair.yml` reproduces a failure in isolation, runs Mercury repair for direct allowlisted Rust verifier commands and the frozen experimental TypeScript verifier lane when baseline is red and an API key is present, uploads artifacts for every terminal status, and opens or updates a draft PR only when repair is verified, `dry_run=false`, and the workflow can push to the same repository. Verified reruns targeting the same base ref and failure command reuse the same repair branch/PR head instead of minting a new branch name per run. Use `dry_run` when you want the evidence bundle without branch or PR mutation. |
| Eval corpus | Available | `evals/v0/manifest.json` is the 50-case Rust baseline corpus, `evals/v0/tier0-manifest.json` and `evals/v0/tier2-manifest.json` are diagnostic slices derived from that corpus, `evals/v0/tier1-manifest.json` is the 35-case Tier 1 Rust repair beta lane, and `evals/v1_typescript/manifest.json` is the 50-case frozen experimental TypeScript baseline harness. The TypeScript corpus is baseline coverage, not parser-backed repair-parity evidence. |
| Published repair benchmark report | Available with scoped evidence | `docs/benchmarks/rust-v0-repair-benchmark.md`, `docs/benchmarks/rust-v0-quality.report.json`, and `docs/benchmarks/rust-v0-agent-sweep.report.json` are generated by `evals/repair_benchmark/publish.py` from aggregate runner outputs. Those checked-in numbers are the product truth for the Tier 1 Rust beta lane at `evals/v0/tier1-manifest.json`; `evals/v0/tier0-manifest.json` and `evals/v0/tier2-manifest.json` exist as diagnostic slices for tiered analysis and release artifacts, not as a broader support claim. The published surface includes repair outcome distribution, tier breakdowns, separate `cargo test`/`cargo check`/`cargo clippy` verifier-class tables, candidate lineage breakdowns, failure attribution, and execution diagnostics for misses. |
| `./target/release/mercury-cli fix --max-agents N` | Available with scoped benchmark evidence | Materially changes phased runtime dispatch with real parallel candidate execution and isolated candidate fanout. `docs/benchmarks/` publishes representative runtime and cost curves for `--max-agents 1,2,4,8` on the Tier 1 Rust beta lane, but those exact runs should not be treated as a broad convergence or repair-quality claim beyond the checked-in corpus and run ids. |
| Generic workflow DSL / agent run | Out of scope for tagged 1.0.0 GA | Intentionally deferred until the repair workflow is stronger. |
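To try the same agent counts used for the published curves on your own failure, a loop can issue one `fix` request per count. This is a sketch under assumptions: `max_agents_sweep` and its `--execute` argument are hypothetical helper names, and it assumes `--max-agents` and `--noninteractive` can be combined on a single `fix` invocation. A local sweep like this should not be read as comparable to the published Tier 1 curves, which cover a specific corpus and run ids.

```shell
# Hypothetical helper: print (default) or execute one fix request per agent count.
# "--execute" is this helper's own argument, not a mercury-cli flag.
max_agents_sweep() {
  description="$1"
  mode="${2:-}"
  for n in 1 2 4 8; do
    if [ "$mode" = "--execute" ]; then
      ./target/release/mercury-cli fix "$description" --max-agents "$n" --noninteractive
    else
      printf '%s\n' "./target/release/mercury-cli fix \"$description\" --max-agents $n --noninteractive"
    fi
  done
}
```

Run it without `--execute` first to review the four commands, then rerun with `--execute` to dispatch them sequentially.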
Mercury CLI should be evaluated like a repair system, not a chatbot shell.
Repair candidates are generated and verified in disposable repo-copy/worktree isolation under .mercury/worktrees/ instead of mutating the user worktree directly. This is filesystem/worktree isolation, not a process/container sandbox guarantee.
Rejected candidates are discarded with the workspace. Accepted candidates are copied back only after local verification succeeds.
Parse, test, and lint commands are local gates. Model output does not become an accepted repository change until those gates pass.
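The verify-then-accept rule can be illustrated with a small shell sketch. It assumes a candidate is a full replacement for one file and uses a plain `mktemp -d` directory copy as a stand-in for Mercury's repo-copy/worktree isolation under `.mercury/worktrees/`; `verify_candidate` is a hypothetical name, and like Mercury's isolation it is filesystem isolation only, not a sandbox.

```shell
# Hypothetical sketch of verify-then-accept using a disposable directory copy.
verify_candidate() {
  repo="$1"        # repository root, left untouched unless the gate passes
  target="$2"      # repo-relative path the candidate rewrites
  candidate="$3"   # file holding the candidate's full new contents
  verifier="$4"    # local gate command, run from the scratch copy's root
  scratch=$(mktemp -d) || return 1
  cp -R "$repo/." "$scratch/"
  cp "$candidate" "$scratch/$target"
  if (cd "$scratch" && sh -c "$verifier") >/dev/null 2>&1; then
    cp "$candidate" "$repo/$target"   # accept: copy back only after the gate passes
    rc=0
  else
    rc=1                              # reject: nothing is written to $repo
  fi
  rm -rf "$scratch"                   # the scratch copy is always discarded
  return "$rc"
}
```

A rejected candidate leaves the original file byte-for-byte unchanged, which is the "no partial writes" property the bundle checklist describes.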
By default, repair verifier commands must resolve to direct allowlisted Rust or selected direct TypeScript verifier invocations (including supported env-prefix variants) without shell composition. End-to-end fix and CI repair flows support those allowlisted commands, but the TypeScript path remains a frozen experimental lane and narrower than the Rust lane; local watch --repair targeting remains Rust-only today. Shell composition is rejected unless MERCURY_ALLOW_UNSAFE_VERIFIER_COMMANDS=1 is explicitly set.
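The opt-in override can be pictured as a gate in front of verifier execution. This hypothetical sketch (`has_shell_composition` and `verifier_gate` are illustrative names) treats a fixed set of shell metacharacters as composition; that character list is an assumption, not Mercury's parser, and the gate does not check the Rust/TypeScript allowlist itself.

```shell
# Hypothetical sketch: treat these metacharacters as shell composition.
has_shell_composition() {
  case "$1" in
    *'|'*|*'&'*|*';'*|*'>'*|*'<'*|*'`'*|*'$('*) return 0 ;;
    *) return 1 ;;
  esac
}

# Reject composed commands unless MERCURY_ALLOW_UNSAFE_VERIFIER_COMMANDS=1.
verifier_gate() {
  if has_shell_composition "$1" && [ "${MERCURY_ALLOW_UNSAFE_VERIFIER_COMMANDS:-0}" != 1 ]; then
    echo reject
    return 1
  fi
  echo allow
}
```

With the variable unset, `verifier_gate 'cargo test && cargo clippy'` prints `reject`; setting it to `1` flips the result to `allow`, which is why the override should stay an explicit, per-environment choice.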
Planner and eval artifacts use strict JSON schemas where implemented in the runtime and harness. Critique output is still best-effort prose from Mercury 2, so it should be treated as advisory context rather than a schema-validated contract.
Run output is redacted for known API key marker lines and configured API-key env names before writing artifacts or replaying command logs. Every fix run and every watch cycle writes append-only audit events to audit.log in the run bundle.
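As a rough illustration of value-based redaction, the filter below replaces literal occurrences of the listed API-key variables' values with `[REDACTED]`. It is a hypothetical sketch (`redact_api_keys` is illustrative): it does not reproduce Mercury's marker-line rules, and the variables must be exported so `awk` can read them through `ENVIRON`.

```shell
# Hypothetical filter: stdin -> stdout with API-key values replaced.
# Arguments name the env vars to redact; defaults to the two documented names.
redact_api_keys() {
  [ $# -gt 0 ] || set -- INCEPTION_API_KEY MERCURY_API_KEY
  awk -v names="$*" '
    BEGIN { n = split(names, vars, " ") }
    {
      line = $0
      for (i = 1; i <= n; i++) {
        key = ENVIRON[vars[i]]
        if (key == "") continue
        out = ""
        # index() is a literal substring search, so key characters need no escaping.
        while ((p = index(line, key)) > 0) {
          out = out substr(line, 1, p - 1) "[REDACTED]"
          line = substr(line, p + length(key))
        }
        line = out line
      }
      print line
    }'
}
```

For example, piping your own wrapper logs through `redact_api_keys` before archiving them applies the same idea outside the run bundle.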
Repair runs are expected to leave behind an artifact bundle under .mercury/runs/ with plan/candidate/verifier/timing/cost evidence for the path executed. watch --repair adds a watch-level record for the watched command and confirmation rerun when repair executes.
A successful watch-repair cycle in the current 1.0.0-beta.1 pre-release runtime should leave enough evidence to replay the decision:
- `summary-index.json` with the top-level decision, headline, failure reason rollup, candidate lineage counts, winning candidates when a repair was accepted, and pointers to the most important bundle files
- `watch.json` with the watched command, decision, timestamps, and repair record
- `initial.stdout.txt` and `initial.stderr.txt` from the failing command
- `initial.failure.json` when a structured failure parse is available for the initial command result
- `confirmation.stdout.txt` and `confirmation.stderr.txt` when Mercury reruns the verifier after repair
- `confirmation.failure.json` when a structured failure parse is available for the confirmation rerun
- `audit.log` with JSONL audit events for run start/plan/execution/decision
- mirrored nested repair artifacts when the fix flow ran: `repair/diff.patch`, `repair/execution-summary.json`, `repair/final-verification.json`, `repair/metadata.json`, plus `repair/plan.json` and `repair/grounded-context.json` when present in the source `fix` run bundle
- the source `fix` artifact root recorded in `watch.json` when you need the full nested repair bundle
For direct ./target/release/mercury-cli fix runs, the run bundle also includes:
- `summary-index.json` as the operator-first summary for the fix bundle
- `plan.json` and `assessments.json`
- `execution-summary.json` and `final-verification.json` when final verification ran
- `agent-logs.json` and `thermal-aggregates.json`
- `metadata.json`
- `audit.log`
- `diff.patch` when an accepted candidate produced a final patch
- `swarm-state.json` when runtime state was captured
For the Mercury CI Auto-Repair Draft PR workflow, the uploaded evidence bundle includes:
- `artifact-index.json` as the stable top-level CI artifact index and entrypoint into the bundle
- `summary.md`, `decision.json`, `environment.json`, and `pr-body.md`
- `summary.md` now includes the nested Mercury run headline, failure reason rollup, candidate lineage, and winning candidate summary when a nested repair bundle was captured
- `decision.json` mirrors those nested Mercury run highlights under `mercury_run`
- `repair.diff`, `repair.diffstat.txt`, and `git-status.txt`
- `logs/baseline.stdout.log` and `logs/baseline.stderr.log`
- `logs/repair.stdout.log`, `logs/repair.stderr.log`, `logs/post-repair.stdout.log`, and `logs/post-repair.stderr.log` when a repair attempt ran
- `logs/setup.stdout.log` and `logs/setup.stderr.log` when `setup_command` was used
- `logs/mercury-init.stdout.log` and `logs/mercury-init.stderr.log` when workflow init was run
- copied `mercury-run/` artifacts when the workflow captured a nested `fix` run
- `internal-error.txt` when orchestration hits an unexpected internal failure
Minimum required by workflow contract before summary publishing:
- `artifact-index.json`
- `summary.md`
- `decision.json`
- `environment.json`
- `pr-body.md`
- `repair.diff`
- `repair.diffstat.txt`
- `logs/baseline.stdout.log`
- `logs/baseline.stderr.log`
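When you download a CI evidence bundle, a quick shell check can confirm the minimum contract is present before reading it. This is a hypothetical sketch (`check_ci_bundle` is an illustrative name) that assumes the bundle has been unpacked into a single directory with the documented relative paths.

```shell
# Hypothetical check: report each missing minimum-contract file and fail if any are absent.
check_ci_bundle() {
  bundle="$1"
  missing=0
  for f in artifact-index.json summary.md decision.json environment.json pr-body.md \
           repair.diff repair.diffstat.txt logs/baseline.stdout.log logs/baseline.stderr.log; do
    if [ ! -e "$bundle/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```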
Workflow status behavior:
- `verified_patch_ready` and `repair_not_verified` are non-blocking terminal statuses
- `baseline_not_reproduced`, `missing_api_key`, `setup_failed`, and `internal_error` still upload the evidence bundle but end the workflow as failed
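If you wrap the workflow in your own CI step, the status rules can be mirrored as an exit-code mapping. This is a hypothetical sketch: `workflow_exit_code` is an illustrative name, it assumes you already have the terminal status string in hand, and returning `2` for an unrecognized status is this sketch's own choice rather than documented behavior.

```shell
# Hypothetical mapping from a documented terminal status to a step exit code.
workflow_exit_code() {
  case "$1" in
    verified_patch_ready|repair_not_verified) echo 0 ;;   # non-blocking
    baseline_not_reproduced|missing_api_key|setup_failed|internal_error) echo 1 ;;
    *) echo "unknown workflow status: $1" >&2; echo 2 ;;
  esac
}
```

Usage: `exit "$(workflow_exit_code "$status")"`.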
The repo includes five manifest-driven eval assets:

- `evals/v0/manifest.json`: 50-case Rust baseline corpus (`rust-v0.3-seeded`)
- `evals/v0/tier0-manifest.json`: 20-case Tier 0 Rust diagnostic slice (`rust-v0.3-tier0`)
- `evals/v0/tier1-manifest.json`: 35-case Tier 1 Rust repair beta lane (`rust-v0.3-tier1`)
- `evals/v0/tier2-manifest.json`: 15-case Tier 2 Rust diagnostic slice (`rust-v0.3-tier2`)
- `evals/v1_typescript/manifest.json`: 50-case frozen experimental TypeScript corpus (`typescript-v1.0-seeded`)
What that means today:
- the Rust baseline and TypeScript harnesses exercise reproducible red-state checks (`evals/v0/run.py`, `evals/v1_typescript/run.py`)
- the Tier 0 and Tier 2 manifests provide diagnostic slices so benchmark reporting can break out trivial repairs versus harder or unsupported classes without changing the public beta claim
- the Tier 1 manifest narrows public Rust repair claims to solvable compile, test, and lint failures
- the repo has the raw ingredients for a benchmark loop that can explain misses, not just total outcomes
What it does not mean yet:
- the checked-in Rust benchmark reports under `docs/benchmarks/` are intentionally narrow: they cover the Tier 1 Rust beta manifest, the exact run ids listed there, and the published repair outcome, tier, verifier-class, candidate-lineage, failure-attribution, and execution-diagnostics slices for that lane, not a universal repair-quality claim
- TypeScript harness pass/fail proves baseline fixture contract only; it is supportive evidence for a frozen experimental lane built on token-aware scanning and failure parsing, not a standalone end-to-end TypeScript repair quality benchmark
- you should treat these corpora as evaluation scaffolding, not finished market-grade benchmark reporting
- The current branch is aligned to `1.0.0-beta.1`. Matching hyphenated tags publish GitHub prerelease binaries for that exact runtime surface and should not be read as a broader support commitment; plain `v1.0.0` remains reserved for the stable GA release line.
- Official release archives are currently limited to macOS arm64 and Linux x86_64. Prefer matching release assets when a tag exists for the runtime you want; use source installs for unreleased branch-head behavior or platforms outside that matrix.
- `INCEPTION_API_KEY` is provider-preferred; `MERCURY_API_KEY` remains a backward-compatible fallback.
- TypeScript support currently includes corpus coverage, token-aware repo mapping and symbol extraction, failure classification, and selected direct verifier-command support in `fix` and CI repair paths. It remains a frozen experimental scoped lane, not a real-parser-backed peer to Rust repair quality, and `watch --repair` remains Rust-only.
- Rust-first repair workflows, with the most operator-ready path being local Rust `watch --repair` and Rust-first `fix`/CI verifier flows
- local `watch --repair` for supported direct Rust verifier commands
- phased execution with isolated candidate workspaces and `--max-agents`-driven fanout
- artifact bundles for watch cycles and fix runs
- local verification before acceptance
- Mercury 2 for planning and critique
- Mercury Edit for focused edits
- `status --live` candidate, phase, and runtime observability via TTY pane or JSONL stream
- verifier allowlisting, output redaction, and append-only audit logs
- frozen experimental TypeScript support for token-aware repo mapping/symbol extraction, failure parsing, and selected direct verifier commands in `fix` and CI repair paths
- official release archives for macOS arm64 and Linux x86_64
- manual or workflow-driven promotion of a verified run into a draft PR
- checked-in Rust benchmark evidence under `docs/benchmarks/` with scrubbed machine-readable aggregates, repair outcome distribution, tier and verifier-class breakdowns, candidate lineage slices, execution diagnostics, and published `--max-agents` curves for the current Tier 1 corpus
- documented limits for incomplete surfaces in docs/known-limitations.md
- broader watch auto-repair coverage outside direct Rust verifier commands
- broader CI repair automation beyond the current workflow-dispatch draft-PR path
- broader conflict arbitration for overlapping edits beyond the current narrow runtime suppression path
- TypeScript expansion beyond the current frozen experimental selected direct verifier-command support stays gated behind stronger Rust benchmark outcomes
- benchmark expansion beyond the current Tier 1 Rust beta corpus, run ids, and methodology envelope published under `docs/benchmarks/`
- richer live observability around conflict alerts, winner selection, and phase-routing telemetry beyond the current event stream
TypeScript note: token-aware repository mapping/symbol extraction, failure parsing, and selected direct verifier-command support are implemented in the current branch for fix and CI repair flows as a frozen experimental scoped lane. The repo does not currently ship a real TypeScript parser, so this should not be read as parity with Rust repair quality. Local watch --repair remains intentionally Rust-only.
- benchmark-backed `--max-agents` results beyond the exact Rust corpus and run ids published under `docs/benchmarks/`
- broad overlapping-edit convergence across arbitrarily many concurrent candidates
- broad language support beyond the current Rust-first repair surface and frozen experimental TypeScript support
- official release binaries beyond macOS arm64 and Linux x86_64
- zero-touch autonomous repair for every failing repo
Example .mercury/config.toml:
```toml
[api]
mercury2_endpoint = "https://api.inceptionlabs.ai/v1/chat/completions"
mercury_edit_endpoint = "https://api.inceptionlabs.ai/v1"
api_key_env = "INCEPTION_API_KEY"

[scheduler]
max_concurrency = 20
max_cost_per_command = 0.50
max_agents_per_command = 100
retry_limit = 3
backoff_base_ms = 500

[verification]
parse_before_write = true
test_after_write = true
lint_after_write = true
mercury2_critique_on_failure = true
test_command = "cargo test"
lint_command = "cargo clippy"
```

Compatibility note: older configs may still reference `MERCURY_API_KEY`. Current docs prefer `INCEPTION_API_KEY` and treat the older name as a fallback.
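The preference order between the two variable names can be expressed as a small shell check, useful in wrapper scripts that should fail early when no key is configured. This is a hypothetical sketch: `resolve_api_key_var` is illustrative, it prints the variable *name* rather than its value so the key never reaches logs, and it ignores any non-default `api_key_env` setting in `.mercury/config.toml`.

```shell
# Hypothetical sketch of the documented preference: INCEPTION_API_KEY first,
# MERCURY_API_KEY as the backward-compatible fallback. Prints the variable name.
resolve_api_key_var() {
  if [ -n "${INCEPTION_API_KEY:-}" ]; then
    echo INCEPTION_API_KEY
  elif [ -n "${MERCURY_API_KEY:-}" ]; then
    echo MERCURY_API_KEY
  else
    echo "set INCEPTION_API_KEY (or MERCURY_API_KEY) before running mercury-cli" >&2
    return 1
  fi
}
```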
Mercury CLI has four practical layers:
- **Planner**: Mercury 2 turns a goal plus repository context into a bounded repair plan.
- **Edit engine**: Mercury Edit produces focused mutations and next-edit suggestions.
- **Verifier**: local parse, test, and lint commands decide whether a candidate is acceptable.
- **Runtime**: disposable workspaces, artifacts, cost tracking, and acceptance rules keep the workflow reproducible.
Implementation detail, trust boundaries, and roadmap notes live in docs/ARCHITECTURE.md.
```shell
cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-features --verbose
```

See CONTRIBUTING.md for development and release guidance.
See SECURITY.md for vulnerability reporting guidance.
Mercury CLI is source-available under a custom non-commercial license.
- You can use, modify, and share it for personal, educational, research, evaluation, and other non-commercial purposes.
- You cannot profit from it, use it in commercial operations, or deploy derivative works commercially without Dennis Palucki's prior written permission.
- Commercial terms are handled separately and may include revenue sharing or other negotiated terms.
See LICENSE for the binding terms and COMMERCIAL_LICENSE.md for the plain-English summary and contact path.