Running in CI

RAGProof is built to run as a quality gate. The gate exits non-zero when a metric breaches a threshold, so a pull request that degrades your pipeline fails the build.

Exit codes

| Code | Meaning |
|------|---------|
| 0 | Gate passed |
| 1 | Gate failed: a quality threshold was breached |
| 2 | Execution error: the pipeline, judge, or store failed |
| 3 | Configuration error |

CI can tell a real regression (1) apart from an outage (2). The JUnit output marks the two differently: threshold breaches are <failure>, execution problems are <error>.
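
As a rough illustration of how a CI wrapper might act on these codes, the sketch below maps each exit code to a label and counts `<failure>` and `<error>` elements in a JUnit report. The function names, the labels, and the retry policy are hypothetical, and the report is assumed to follow the conventional `testsuite`/`testcase` layout; RAGProof's actual report may carry more detail.

```python
import xml.etree.ElementTree as ET

# Labels are illustrative; only the numeric codes come from the table above.
EXIT_MEANINGS = {
    0: "passed",
    1: "regression",       # a quality threshold was breached
    2: "execution_error",  # pipeline, judge, or store failed
    3: "config_error",
}


def classify_exit(code: int) -> str:
    """Translate a gate exit code into a label, or 'unknown' if unlisted."""
    return EXIT_MEANINGS.get(code, "unknown")


def should_retry(code: int) -> bool:
    """Hypothetical policy: retry outages (2), never real regressions (1)."""
    return code == 2


def count_junit_outcomes(xml_text: str) -> tuple[int, int]:
    """Return (failures, errors) found anywhere in a JUnit XML document."""
    root = ET.fromstring(xml_text)
    failures = sum(1 for _ in root.iter("failure"))
    errors = sum(1 for _ in root.iter("error"))
    return failures, errors
```

A wrapper could call `should_retry` on the process exit code before marking the build red, so that a judge outage does not get reported as a quality regression.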

Thresholds

Configure thresholds under gate in ragproof.yaml:

gate:
  baseline: latest        # optional, enables relative checks
  on_missing: fail        # fail | skip when a gated metric is absent
  min_samples: 30         # warn below this many scored cases
  thresholds:
    generation.groundedness:
      min: 0.85           # absolute floor
      max_drop: 0.03      # max allowed drop vs the baseline
      noise_floor: 0.02   # drops at or below this are treated as noise
    retrieval.mrr:
      min: 0.7

A metric fails the absolute check when its mean is below min. It fails the relative check only when all three conditions hold: it drops more than max_drop below the baseline, the drop is larger than noise_floor, and the drop is statistically confident. To judge confidence, RAGProof computes a bootstrap 95% confidence interval of the delta for each relative check; if the interval still admits an acceptable drop, the metric warns instead of failing. This keeps judge noise from failing builds spuriously.

Put tight thresholds on deterministic metrics, and looser ones with a noise_floor on judge-backed metrics.
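
The absolute and relative checks described above can be sketched in a few lines of Python. This is one reading of the rules, not RAGProof's implementation: the function names are hypothetical, the bootstrap resamples the baseline and current scores independently, and "admits an acceptable drop" is interpreted as the lower bound of the drop's confidence interval being at or below max_drop.

```python
import random
from statistics import mean


def bootstrap_drop_ci(baseline, current, n_resamples=2000, seed=0):
    """Percentile 95% CI for (mean(baseline) - mean(current))."""
    rng = random.Random(seed)  # seeded so repeated runs agree
    drops = []
    for _ in range(n_resamples):
        b = rng.choices(baseline, k=len(baseline))  # resample with replacement
        c = rng.choices(current, k=len(current))
        drops.append(mean(b) - mean(c))
    drops.sort()
    lo = drops[int(0.025 * n_resamples)]
    hi = drops[int(0.975 * n_resamples) - 1]
    return lo, hi


def check_metric(baseline, current, min_score, max_drop, noise_floor, seed=0):
    """Return 'pass', 'warn', or 'fail' for one gated metric."""
    current_mean = mean(current)
    if current_mean < min_score:
        return "fail"  # absolute floor breached
    drop = mean(baseline) - current_mean
    if drop <= max_drop or drop <= noise_floor:
        return "pass"  # within the allowed drop, or treated as noise
    lo, _ = bootstrap_drop_ci(baseline, current, seed=seed)
    # Assumption: fail only if even the smallest plausible drop exceeds max_drop.
    return "fail" if lo > max_drop else "warn"
```

With the groundedness thresholds from the example config, `check_metric(baseline, current, 0.85, 0.03, 0.02)` fails when every plausible drop exceeds 0.03, and only warns when the scores are noisy enough that the interval still covers a drop within the limit.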

GitHub Action

The repo ships a reusable action. A minimal workflow:

name: ragproof
on: [pull_request]
jobs:
  gate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: sanmaxdev/ragproof@v1
        with:
          config: ragproof.yaml

The action runs the evaluation, applies the gate, uploads the HTML report as an artifact, and posts the Markdown summary as a sticky pull request comment. Store your judge and target credentials as repository secrets and expose them to the job through its env block.

Docker

A container image is built from the repo Dockerfile for pipelines that prefer running the CLI in a container:

docker build -t ragproof .
docker run --rm -v "$PWD:/work" ragproof run --config ragproof.yaml