RAGProof is built to run as a quality gate. The gate exits non-zero when a metric breaches a threshold, so a pull request that degrades your pipeline fails the build.
| Code | Meaning |
|---|---|
| 0 | Gate passed |
| 1 | Gate failed: a quality threshold was breached |
| 2 | Execution error: the pipeline, judge, or store failed |
| 3 | Configuration error |

CI can tell a real regression (1) apart from an outage (2). The JUnit output
marks the two differently: threshold breaches are `<failure>`, execution
problems are `<error>`.
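
A CI wrapper could branch on these codes along the following lines. This is a minimal sketch, not part of RAGProof: the function names and the action labels (`pass`, `regression`, `retry`, `fix-config`) are hypothetical, and only the code-to-meaning mapping comes from the table above.

```python
import subprocess
import sys
from typing import Sequence

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "gate passed",
    1: "gate failed: a quality threshold was breached",
    2: "execution error: the pipeline, judge, or store failed",
    3: "configuration error",
}


def classify_exit(code: int) -> str:
    """Map an exit code to a coarse CI action (hypothetical labels)."""
    if code == 0:
        return "pass"
    if code == 1:
        return "regression"  # a real quality drop: block the merge
    if code == 2:
        return "retry"  # likely an outage, not a quality problem
    if code == 3:
        return "fix-config"
    return "unknown"  # anything outside the documented table


def run_gate(command: Sequence[str]) -> str:
    """Run the gate command and return the action for its exit code."""
    result = subprocess.run(command, check=False)
    code = result.returncode
    meaning = EXIT_MEANINGS.get(code, "undocumented exit code")
    print(f"exit {code}: {meaning}", file=sys.stderr)
    return classify_exit(code)
```

For example, `run_gate(["ragproof", "run", "--config", "ragproof.yaml"])` would wrap the CLI, assuming it is installed on `PATH` under that name; the source only shows those arguments being passed to the container image.
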
Configure thresholds under `gate` in `ragproof.yaml`:

    gate:
      baseline: latest        # optional, enables relative checks
      on_missing: fail        # fail | skip when a gated metric is absent
      min_samples: 30         # warn below this many scored cases
      thresholds:
        generation.groundedness:
          min: 0.85           # absolute floor
          max_drop: 0.03      # max allowed drop vs the baseline
          noise_floor: 0.02   # drops at or below this are treated as noise
        retrieval.mrr:
          min: 0.7

A metric fails the absolute check when its mean is below `min`. It fails the
relative check only when all three of these hold: the mean drops more than
`max_drop` below the baseline, the drop is larger than `noise_floor`, and
RAGProof is statistically confident in the drop. For each relative check
RAGProof computes a bootstrap 95% confidence interval of the delta; if the
interval still admits an acceptable drop, the metric warns instead of failing.
This keeps judge noise from failing builds falsely.

Put tight thresholds on deterministic metrics, and looser ones with a
`noise_floor` on judge-backed metrics.
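
The decision rule described above can be sketched as follows. This is an illustration under stated assumptions, not RAGProof's implementation: it uses an unpaired percentile bootstrap over per-case scores, reads "acceptable drop" as a drop no larger than `max_drop`, and ignores `min_samples` and `on_missing`. The function names, the resample count, and the seed are all hypothetical.

```python
import random
from statistics import mean
from typing import Optional, Sequence, Tuple


def bootstrap_ci(
    baseline: Sequence[float],
    current: Sequence[float],
    n_resamples: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """95% percentile bootstrap interval for mean(current) - mean(baseline)."""
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_resamples):
        # Resample each run independently, with replacement (assumption:
        # cases are not paired between baseline and current).
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(current, k=len(current))
        deltas.append(mean(c) - mean(b))
    deltas.sort()
    lo = deltas[int(0.025 * n_resamples)]
    hi = deltas[int(0.975 * n_resamples) - 1]
    return lo, hi


def check_metric(
    current: Sequence[float],
    baseline: Optional[Sequence[float]] = None,
    min_value: Optional[float] = None,
    max_drop: Optional[float] = None,
    noise_floor: float = 0.0,
) -> str:
    """Return 'pass', 'warn', or 'fail' for one gated metric."""
    cur_mean = mean(current)

    # Absolute check: the mean must not fall below the floor.
    if min_value is not None and cur_mean < min_value:
        return "fail"

    # Relative check applies only with a baseline and a max_drop.
    if baseline is None or max_drop is None:
        return "pass"

    drop = mean(baseline) - cur_mean
    if drop <= max_drop or drop <= noise_floor:
        return "pass"

    # delta = current - baseline, so the upper end of the delta interval
    # corresponds to the smallest drop the data still supports.
    _, hi = bootstrap_ci(baseline, current)
    smallest_plausible_drop = -hi
    if smallest_plausible_drop <= max_drop:
        return "warn"  # the interval still admits an acceptable drop
    return "fail"
```

With ten noisy binary scores per run, a 0.1 drop in the mean yields a wide interval and only warns, while the same drop across fifty identical scores fails outright, which is the behavior the `noise_floor` and confidence interval are meant to produce for judge-backed metrics.
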
The repo ships a reusable action. A minimal workflow:

    name: ragproof
    on: [pull_request]
    jobs:
      gate:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          pull-requests: write
        steps:
          - uses: actions/checkout@v4
          - uses: sanmaxdev/ragproof@v1
            with:
              config: ragproof.yaml

The action runs the evaluation, gates it, uploads the HTML report as an artifact, and posts the Markdown summary as a sticky pull request comment. Provide your judge and target credentials as repository secrets and reference them in the environment of the job.
A container image is built from the repo Dockerfile for pipelines that prefer running the CLI in a container:

    docker build -t ragproof .
    docker run --rm -v "$PWD:/work" ragproof run --config ragproof.yaml