Dashboard

RAGProof ships a local web dashboard for analyzing runs: triaging failed cases, comparing runs, and watching quality trends. The CLI stays the write path and the CI surface; the dashboard is a read-only viewer over the same store.

Run it

Install the ui extra, then start the dashboard:

pip install 'ragproof[ui]'
ragproof ui --config ragproof.yaml

It binds to 127.0.0.1:8484 and opens your browser. The dashboard reads the SQLite store named in the config, so it shows exactly the runs the CLI recorded.

Options:

  • --port: change the port
  • --host: bind elsewhere (a warning prints, since the dashboard has no auth)
  • --no-browser: do not open the browser
  • --dev: serve the API only, for use behind the Vite dev server
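
Because the dashboard only reads the SQLite store named in the config, the same file can be inspected by hand without risk of changing it. The following is a minimal sketch that assumes nothing about RAGProof's schema: it opens a SQLite file in read-only mode and lists whatever tables it finds. The function name `list_tables` is hypothetical and not part of RAGProof.

```python
import sqlite3
from pathlib import Path


def list_tables(store_path: str) -> list[str]:
    """Open a SQLite file read-only and return its table names, sorted."""
    # mode=ro makes SQLite refuse writes, so this cannot modify the store.
    uri = Path(store_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Call it with the store path from your own ragproof.yaml, for example `list_tables("path/to/store.db")` (a placeholder path). Opening a file that does not exist raises `sqlite3.OperationalError`, since read-only mode never creates a database.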

Screens

  • Runs: the home table, with one row per run showing status, pinned metric scores with micro-bars, and the delta against the previous run. Runs still in progress poll and update live.
  • Run detail: four tabs. Overview shows a card per metric with mean, percentiles, a distribution histogram, and the worst cases. Cases is the triage grid with filters and worst-first sort. Gate renders the same verdict the CLI produces, including confidence intervals. Metadata shows hashes, judge model, prompt versions, and cost.
  • Case panel: opens from any case and shows the question, answer, retrieved chunks with the cited ones highlighted, the per-claim groundedness checklist, and the raw judge output. Press Esc to close.
  • Compare: two runs side by side with per-metric deltas, plus a warning when the runs use different datasets.
  • Trends: one chart per metric over time on a fixed 0 to 1 scale; click a point to open that run.
  • Datasets: every dataset with its kind breakdown, frozen hash, and the runs over it.
  • Calibration: the judge prompts with fixtures, and the command to measure agreement.
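
To make the numbers on the Overview cards and the Runs and Compare deltas concrete, here is a rough sketch of how such summaries could be computed from per-case scores on the 0 to 1 scale. It is an illustration only, not RAGProof's implementation (which lives in reports/data.py): the function names, the nearest-rank percentile method, the bin count, and the metric name "relevance" are all assumptions.

```python
import math
import statistics


def percentile(ordered: list[float], q: float) -> float:
    """Nearest-rank percentile of an already sorted list (q in 0..100)."""
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(scores: list[float], bins: int = 10, worst: int = 3) -> dict:
    """Summarize one metric's per-case scores, each assumed to lie in [0, 1]."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    # Fixed-width bins over [0, 1]; a score of exactly 1.0 goes in the last bin.
    counts = [0] * bins
    for s in ordered:
        counts[min(int(s * bins), bins - 1)] += 1
    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "histogram": counts,
        "worst": ordered[:worst],  # lowest scores first
    }


def metric_deltas(previous: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Change in mean score per metric, for metrics present in both runs."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous
    }


if __name__ == "__main__":
    print(summarize([0.05, 0.15, 0.55, 0.95, 1.0], worst=2))
    print(metric_deltas({"groundedness": 0.5}, {"groundedness": 0.75, "relevance": 0.9}))
```

Metrics that appear in only one of the two runs are left out of the delta rather than reported against an implied zero, which is one reasonable way to avoid misleading comparisons.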

Press Ctrl/Cmd-K anywhere for the command palette to jump to a screen or run, or to start an action.

Control panel

The dashboard can also start operations, not just display results. From the top bar you can launch the following, each of which runs as a background job on the server:

  • New run evaluates the configured pipeline, with an optional label and a bypass-cache toggle.
  • Generate dataset builds a dataset from a corpus folder (needs RAGPROOF_GEN_MODEL).
  • Calibrate judge and Check setup run with one click.
  • On any run, Re-run repeats it and Report builds the HTML, Markdown, and JUnit artifacts.

Every action opens on the Jobs screen, which streams the job's log lines live and, when finished, links to the resulting run or offers the report files to download. Actions map one to one onto the CLI commands and run with the same config, so the dashboard and the CLI stay interchangeable.
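
The pattern described above, a command running in the background while its log lines are streamed as they appear, can be sketched with the standard library. This is an assumption about the general shape, not RAGProof's job runner: `stream_job` is a hypothetical name, and the usage example runs a stand-in Python child process rather than a real ragproof command.

```python
import subprocess
import sys
from typing import Iterator


def stream_job(argv: list[str]) -> Iterator[str]:
    """Run a command and yield its output one line at a time as it is produced."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error output into the same log
        text=True,
        bufsize=1,  # line-buffered reads on the parent side
    )
    assert proc.stdout is not None
    with proc.stdout:
        for line in proc.stdout:
            yield line.rstrip("\n")
    code = proc.wait()
    if code != 0:
        # Surface a failed job to the caller after all lines were delivered.
        raise subprocess.CalledProcessError(code, argv)


if __name__ == "__main__":
    # Stand-in job: a child Python process that prints two log lines.
    for log_line in stream_job([sys.executable, "-c", "print('start'); print('done')"]):
        print("job:", log_line)
```

A server would typically run a generator like this in a worker thread and forward each line to the browser; how RAGProof transports the lines is not specified here.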

The Config screen shows the active configuration. It is read-only: editing configuration from a browser is intentionally not supported, so the config on disk stays the single source of truth. Edit ragproof.yaml and restart to change it.

Guarantees

Every number in the dashboard comes from the same code paths as the CLI (reports/data.py, gate.py), so the UI and ragproof gate never disagree on the same store. The dashboard makes no external network requests: fonts and assets are bundled and the Content-Security-Policy is locked to self.
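
As an illustration of what a Content-Security-Policy locked to self means, the sketch below parses a policy string and checks that every directive permits only `'self'` or `'none'`. The exact header RAGProof sends is not reproduced here; the example policies and the helper names `parse_csp` and `is_self_only` are assumptions. The check is deliberately strict and simplified: it rejects directives that carry no source list.

```python
def parse_csp(policy: str) -> dict[str, list[str]]:
    """Split a CSP header value into {directive: [sources...]}."""
    directives: dict[str, list[str]] = {}
    for part in policy.split(";"):
        tokens = part.split()
        if tokens:
            directives[tokens[0].lower()] = tokens[1:]
    return directives


def is_self_only(policy: str) -> bool:
    """True if default-src is set and every directive lists only 'self' or 'none'."""
    directives = parse_csp(policy)
    if "default-src" not in directives:
        return False
    allowed = {"'self'", "'none'"}
    return all(
        sources and set(sources) <= allowed
        for sources in directives.values()
    )


if __name__ == "__main__":
    print(is_self_only("default-src 'self'"))
    print(is_self_only("default-src 'self'; img-src https://cdn.example.com"))
```

You could run a check like this against the header value shown in your browser's network tab to confirm that no third-party origin is allowed.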

The server binds to 127.0.0.1 by default and has no authentication, so its actions have the same reach as running the CLI on that machine and no more. Bind to a non-local host only on a network you trust.
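
The distinction between a local and a non-local bind address can be checked with the standard library. The sketch below is a hypothetical helper, not RAGProof's own logic, showing one way a launcher might decide whether a host value stays on the loopback interface.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """True for 'localhost' and loopback IP literals such as 127.0.0.1 or ::1."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname): treat as non-local to be safe.
        return False


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.10"):
        print(candidate, is_loopback_host(candidate))
```

Note that 0.0.0.0 is not a loopback address: binding to it listens on every interface, which exposes the unauthenticated dashboard to the whole network.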

Building from source

The compiled bundle ships in release wheels, so pip install 'ragproof[ui]' needs no Node. To build it from a checkout:

cd frontend
npm ci
npm run build

That writes the bundle into ragproof/ui/static/, which the server serves.