overseek944/geval

Geval

One clear decision for every AI change.


Install & try (under a minute)

1. Download the binary for your OS (no repo clone needed):

# Linux
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval

# macOS (Apple Silicon)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval

# Windows (PowerShell)
Invoke-WebRequest -Uri https://github.com/geval-labs/geval/releases/latest/download/geval-windows-x86_64.exe -OutFile geval.exe

2. Run the built-in demo (no files needed):

# Linux / macOS
./geval demo

# Windows (PowerShell)
.\geval.exe demo

You’ll see a decision report ending in one of three outcomes: PASS, REQUIRE_APPROVAL, or BLOCK. The same binary works in CI, with no npm or pip install needed; see the GitHub Actions guide under Documentation.

To use your own files (or the repo's sample files, after cloning it):

./geval check --signals path/to/signals.json --policy path/to/policy.yaml --env prod

If the download fails (e.g. no binaries for your OS yet), build from source (requires Rust):

git clone https://github.com/geval-labs/geval.git && cd geval
cargo build --release --manifest-path geval/Cargo.toml
./geval/target/release/geval demo


The problem

Your team runs evals (evaluations that measure AI quality: accuracy, relevance, safety, hallucinations, latency). You also have A/B tests, human review, and business metrics. When you change a model or a prompt, you get a flood of signals:

  • Evals say: “Accuracy improved.”
  • A/B says: “Engagement dropped a bit.”
  • Review says: “Edge case flagged.”

So: do you ship or not? Today that call often happens in Slack or in a meeting, where it is inconsistent, hard to audit, and easy to forget. Geval gives you one place to write the rules and one clear answer every time: ship, get approval first, or block.


What Geval does

Geval is a decision engine for AI releases. You feed it the outcomes of your evals and other signals as simple data files (for example, signals.json). You define your policy in a single file (for example, policy.yaml): “If engagement drops, block. If hallucination rate is above X, block. If retrieval quality is below Y, require human approval.” Geval applies those rules in a fixed order and returns one of three outcomes:

Outcome            Meaning
PASS               Good to ship. No rule blocked it.
REQUIRE_APPROVAL   A rule says a human must approve before shipping.
BLOCK              A rule says do not ship until the issue is fixed.
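To make “rules applied in a fixed order” concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Geval's implementation or its policy.yaml schema: the `Rule` shape, the signal names (`engagement_delta`, `hallucination_rate`, `retrieval_quality`) and the thresholds are all hypothetical, and it assumes first-match-wins semantics, where rules are checked top to bottom and the first one that fires decides the outcome.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape (not Geval's real schema)."""
    name: str
    when: Callable[[Mapping[str, float]], bool]  # predicate over the signals
    outcome: str  # "BLOCK" or "REQUIRE_APPROVAL"


def decide(signals: Mapping[str, float], rules: Sequence[Rule]) -> Tuple[str, Optional[str]]:
    """Check rules in their listed order; the first rule that fires decides.

    Returns (outcome, name of the rule that fired), or ("PASS", None)
    when no rule fires.
    """
    for rule in rules:
        if rule.when(signals):
            return rule.outcome, rule.name
    return "PASS", None


# Hypothetical policy mirroring the prose above; names and thresholds are made up.
POLICY = [
    Rule("engagement_drop", lambda s: s["engagement_delta"] < 0, "BLOCK"),
    Rule("hallucination_high", lambda s: s["hallucination_rate"] > 0.05, "BLOCK"),
    Rule("retrieval_low", lambda s: s["retrieval_quality"] < 0.8, "REQUIRE_APPROVAL"),
]

if __name__ == "__main__":
    signals = {"engagement_delta": 0.01, "hallucination_rate": 0.02, "retrieval_quality": 0.75}
    print(decide(signals, POLICY))  # ('REQUIRE_APPROVAL', 'retrieval_low')
```

Returning the name of the rule that fired alongside the outcome is what makes a “why?” answer possible later. Under first-match semantics the order of rules matters, which is why the BLOCK rules are listed first in this sketch.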

Every run is recorded: which policy and signals were used, which rule fired, and when. Product managers, engineers, and auditors can therefore always answer “Why did we ship this?” and “Who approved it?”

In short: evals and other tools answer “What happened?” Geval answers “Given what happened, are we allowed to ship?”
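One common way to make a run record like this checkable is to store a content hash of the exact policy and signals next to the outcome. The sketch below assumes a record layout of our own invention (`policy_sha256`, `signals_sha256`, `outcome`, `rule`, `timestamp`); it is not Geval's decision-artifact format, which the Auditing guide covers.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw bytes, e.g. a policy or signals file's contents."""
    return hashlib.sha256(data).hexdigest()


def make_record(policy_bytes: bytes, signals_bytes: bytes, outcome: str,
                rule: Optional[str], when: Optional[datetime] = None) -> dict:
    """Build a hypothetical audit record tying one outcome to its exact inputs."""
    if when is None:
        when = datetime.now(timezone.utc)
    return {
        "policy_sha256": sha256_hex(policy_bytes),
        "signals_sha256": sha256_hex(signals_bytes),
        "outcome": outcome,
        "rule": rule,  # name of the rule that fired, or None for PASS
        "timestamp": when.isoformat(),
    }


if __name__ == "__main__":
    # Illustrative inputs only; a real run would read the bytes of the actual files.
    record = make_record(b"rules: []\n", b'{"engagement_delta": -0.02}', "BLOCK", "engagement_drop")
    print(json.dumps(record, indent=2))
```

Re-hashing the files later and comparing digests shows whether the inputs behind a past decision have changed since it was made.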


CLI

Command                        What it does
geval demo                     Run the built-in example (no files needed). Use this first after downloading.
geval check                    Evaluate your signals against your policy → PASS / REQUIRE_APPROVAL / BLOCK (exit code 0 / 1 / 2)
geval explain                  Show why: which rule fired and which signals were involved
geval approve / geval reject   Record a human approval or rejection
geval validate-policy          Validate a policy file
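In CI, the exit code of `geval check` is what a pipeline step acts on. Below is a minimal Python wrapper, assuming only the exit-code mapping from the table above (0 = PASS, 1 = REQUIRE_APPROVAL, 2 = BLOCK) and the `check` flags shown in the install section. The helper names are hypothetical, and treating any other exit code as an error rather than a decision is our own choice.

```python
import subprocess

# Exit-code mapping as listed in the CLI table.
OUTCOMES = {0: "PASS", 1: "REQUIRE_APPROVAL", 2: "BLOCK"}


def outcome_from_exit_code(code: int) -> str:
    """Translate a `geval check` exit code into its outcome name."""
    try:
        return OUTCOMES[code]
    except KeyError:
        # Any other code (bad arguments, missing file, crash) is not a decision.
        raise RuntimeError(f"geval check exited with unexpected code {code}") from None


def run_check(signals: str, policy: str, env: str = "prod", geval: str = "./geval") -> str:
    """Run `geval check` with the README's flags and return the outcome name."""
    proc = subprocess.run(
        [geval, "check", "--signals", signals, "--policy", policy, "--env", env]
    )
    return outcome_from_exit_code(proc.returncode)
```

A CI step might call `run_check("path/to/signals.json", "path/to/policy.yaml")`, fail the job on BLOCK, and route REQUIRE_APPROVAL to a human reviewer. The paths are placeholders for your own files.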

Documentation

Guide                Description
GitHub Actions       Run Geval in CI (workflow YAML, exit codes)
Examples             Sample signals.json and policy.yaml
Installation         PATH, CI, and build-from-source (for contributors)
Developer workflow   PR → check → approve/reject
Auditing             Decision artifacts, hashes

Contributing

We welcome contributions; see CONTRIBUTING.md. To build from source (for example, to contribute), see Installation → Build from source.


License

MIT © Geval Contributors


About

Eval-driven release gates for AI applications

Languages

  • TypeScript 82.2%
  • Rust 17.5%
  • Python 0.3%