One clear decision for every AI change.
1. Download the binary for your OS (no repo clone needed):
# Linux
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval
# macOS (Apple Silicon)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval
# Windows (PowerShell)
Invoke-WebRequest -Uri https://github.com/geval-labs/geval/releases/latest/download/geval-windows-x86_64.exe -OutFile geval.exe

2. Run the built-in demo (no files needed):
./geval demo

You’ll see a decision report and get PASS, REQUIRE_APPROVAL, or BLOCK. The same binary works in CI, with no npm or pip required. → Use in CI
Using your own files (e.g. after cloning the repo for examples):
./geval check --signals path/to/signals.json --policy path/to/policy.yaml --env prod

If the download fails (e.g. no binaries for your OS yet), build from source (requires Rust):
git clone https://github.com/geval-labs/geval.git && cd geval
cargo build --release --manifest-path geval/Cargo.toml
./geval/target/release/geval demo
Your team runs evals (evaluations that measure AI quality — accuracy, relevance, safety, hallucinations, latency). You also have A/B tests, human review, and business metrics. When you change a model or a prompt, you get a flood of signals:
- Evals say: “Accuracy improved.”
- A/B says: “Engagement dropped a bit.”
- Review says: “Edge case flagged.”
So: do you ship or not? Today that call often happens in Slack or a meeting — inconsistent, hard to audit, and easy to forget. Geval gives you one place to write the rules and one clear answer every time: ship, get approval first, or block.
Geval is a decision engine for AI releases. You feed it the outcomes of your evals and other signals (as simple data files). You define your policy in a single file: “If engagement drops, block. If hallucination rate is above X, block. If retrieval quality is below Y, require human approval.” Geval applies those rules in a fixed order and returns:
| Outcome | Meaning |
|---|---|
| PASS | Good to ship. No rule blocked it. |
| REQUIRE_APPROVAL | A rule says a human must approve before shipping. |
| BLOCK | A rule says do not ship until the issue is fixed. |
Every run is recorded (which policy and signals were used, which rule fired, and when), so product managers, engineers, and auditors can always answer: “Why did we ship this?” and “Who approved it?”
In short: evals and other tools answer “What happened?” Geval answers “Given what happened, are we allowed to ship?”
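To make the "fixed order" idea concrete, here is a minimal shell sketch of ordered rule evaluation. It is not Geval's implementation and does not use Geval's `policy.yaml` schema: the one-rule-per-line format, the signal names, the thresholds (0.05, 0, 0.8), and the `decide` function are all hypothetical. It also assumes that the first matching rule decides the outcome, which the text above does not confirm.

```shell
# Hypothetical rule format (NOT Geval's policy schema), one rule per line:
#   <signal> <op: gt|lt> <threshold> <action>
rules='hallucination_rate gt 0.05 BLOCK
engagement_delta lt 0 BLOCK
retrieval_quality lt 0.8 REQUIRE_APPROVAL'

# Hypothetical signals, one "name=value" per line.
signals='hallucination_rate=0.02
engagement_delta=0.01
retrieval_quality=0.75'

# Walk the rules top to bottom; the first rule whose condition holds decides.
# Assumption: first match wins. If no rule matches, the outcome is PASS.
decide() (
  rules_text=$1
  signals_text=$2
  result=$(
    printf '%s\n' "$rules_text" | while read -r name op threshold action; do
      [ -n "$name" ] || continue
      # Look up the signal value; skip rules whose signal is missing.
      value=$(printf '%s\n' "$signals_text" | sed -n "s/^${name}=//p" | head -n 1)
      [ -n "$value" ] || continue
      # awk does the floating-point comparison; exit status 0 means "rule fired".
      if awk -v v="$value" -v t="$threshold" -v op="$op" 'BEGIN {
            if (op == "gt") exit ((v + 0 > t + 0) ? 0 : 1)
            if (op == "lt") exit ((v + 0 < t + 0) ? 0 : 1)
            exit 1
          }' </dev/null; then
        echo "$action (rule: $name $op $threshold)"
        break
      fi
    done
  )
  echo "${result:-PASS}"
)

decide "$rules" "$signals"
```

With the sample data, the first two rules do not fire and the third does, so the sketch reports that approval is required and names the rule responsible. A real policy should be written in the format shown in the Examples guide.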
| Command | What it does |
|---|---|
| `geval demo` | Run built-in example (no files). Use this first after downloading. |
| `geval check` | Run your signals + policy → PASS / REQUIRE_APPROVAL / BLOCK (exit 0 / 1 / 2) |
| `geval explain` | Show why (which rule, which signals) |
| `geval approve` / `geval reject` | Record human approval or rejection |
| `geval validate-policy` | Validate policy file |
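The exit codes of `geval check` listed in the table above make it easy to gate a script or CI step. The sketch below assumes the mapping 0 = PASS, 1 = REQUIRE_APPROVAL, 2 = BLOCK, read from that table; the helper name `geval_outcome` is hypothetical, and treating any other code as an error is an assumption rather than documented behavior.

```shell
# Translate an exit code from `geval check` into the outcome name.
# Assumption: 0 = PASS, 1 = REQUIRE_APPROVAL, 2 = BLOCK (per the CLI table);
# any other code is reported as ERROR (for example, a bad path or invalid policy).
geval_outcome() {
  case "$1" in
    0) echo "PASS" ;;
    1) echo "REQUIRE_APPROVAL" ;;
    2) echo "BLOCK" ;;
    *) echo "ERROR" ;;
  esac
}

# Example gate (commented out so the snippet runs without geval installed):
# status=0
# ./geval check --signals path/to/signals.json --policy path/to/policy.yaml --env prod || status=$?
# echo "Geval decision: $(geval_outcome "$status")"
# [ "$status" -eq 0 ] || exit "$status"

echo "Exit code 1 maps to: $(geval_outcome 1)"
```

Capturing the status with `|| status=$?` keeps the script alive long enough to report the decision even when it runs under `set -e`.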
| Guide | Description |
|---|---|
| GitHub Actions | Run Geval in CI (workflow YAML, exit codes) |
| Examples | Sample signals.json and policy.yaml |
| Installation | PATH, CI, and build-from-source (for contributors) |
| Developer workflow | PR → check → approve/reject |
| Auditing | Decision artifacts, hashes |
We welcome contributions. See CONTRIBUTING.md. Building from source (e.g. to contribute): see Installation → Build from source.
MIT © Geval Contributors