feat(evaluations): auto-pop schema + openspec change (chunk A)#3299
Conversation
Chunk A of openspec change `auto-populate-eval-datasets`. Lays the schema foundation for rule-driven dataset auto-population and delta evaluation runs. No runtime behaviour change yet — views, ingestion task, and auto-trigger logic land in subsequent chunks (B–E).

- Add `DatasetAutoPopulationRule` with high-water-mark + status fields.
- Add `DatasetIngestionEntry` with partial unique constraints providing per-rule, per-source idempotency for both message and session modes.
- Add `EvaluationConfig.auto_run_on_append` opt-in flag.
- Add `EvaluationRunType.DELTA` choice and `EvaluationRun.scoped_messages` M2M to support runs scoped to a subset of dataset messages.
- Register `flag_auto_populate_eval_datasets` Waffle flag (gated under `flag_evaluations`).
- Wire up admin entries for the two new models.
- Include the openspec change definition (proposal/design/specs/tasks).

Migration is additive only — defaults preserve existing-row behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
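The partial unique constraints on `DatasetIngestionEntry` are what make re-ingestion idempotent: each rule can record a given source message or session at most once, while NULLs in the unused column keep the two modes from colliding. A minimal sketch of that index shape, using stdlib SQLite as a stand-in for the real migration (table and column names here are hypothetical, not taken from the PR's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset_ingestion_entry (
    id INTEGER PRIMARY KEY,
    rule_id INTEGER NOT NULL,
    source_message_id INTEGER,   -- set in message mode
    source_session_id INTEGER    -- set in session mode
);
-- Partial unique indexes: at most one row per (rule, message) and per
-- (rule, session); rows where the column is NULL are exempt, so the two
-- ingestion modes never collide with each other.
CREATE UNIQUE INDEX uniq_rule_message
    ON dataset_ingestion_entry (rule_id, source_message_id)
    WHERE source_message_id IS NOT NULL;
CREATE UNIQUE INDEX uniq_rule_session
    ON dataset_ingestion_entry (rule_id, source_session_id)
    WHERE source_session_id IS NOT NULL;
""")

conn.execute("INSERT INTO dataset_ingestion_entry (rule_id, source_message_id) VALUES (1, 10)")
try:
    # Re-ingesting the same message for the same rule violates the partial index.
    conn.execute("INSERT INTO dataset_ingestion_entry (rule_id, source_message_id) VALUES (1, 10)")
except sqlite3.IntegrityError:
    pass  # duplicate rejected: per-rule, per-source idempotency
# A different rule, or session mode, inserts fine.
conn.execute("INSERT INTO dataset_ingestion_entry (rule_id, source_message_id) VALUES (2, 10)")
conn.execute("INSERT INTO dataset_ingestion_entry (rule_id, source_session_id) VALUES (1, 99)")
```

In Django terms this corresponds to `UniqueConstraint(fields=..., condition=Q(...))`, which migrates to exactly this kind of partial index on backends that support it.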
❌ 1 test failed in CI.
Product Description
No user-facing change in this PR. Lays the schema foundation for the upcoming auto-populate evaluation datasets feature, where teams can configure rules that automatically append matching sessions/messages from a source experiment to an evaluation dataset, and optionally trigger a delta evaluation run on each append. UI, ingestion task, and auto-trigger logic land in subsequent chunks.
Technical Description
This is Chunk A of the openspec change `auto-populate-eval-datasets`. The full proposal, design, specs, and task list are committed under `openspec/changes/auto-populate-eval-datasets/`.
Schema additions (additive only, no backfills):
Migration: `0015_evaluationconfig_auto_run_on_append_and_more.py`. New nullable / defaulted columns and two new tables; no existing data is rewritten.
Out of scope for this PR (chunks B–E): rule create/edit forms and views, the periodic ingestion task, the dataset-append → delta-run hook, the run-history UI for delta runs, integration tests, docs.
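Although the ingestion task itself is deferred to Chunk C, the high-water-mark field added here is the cursor that task will advance so each periodic run processes only new source rows. A minimal pure-Python sketch of that cursor pattern (all names hypothetical; `ingested` stands in for the `DatasetIngestionEntry` table):

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    # High-water mark: id of the newest source row already ingested.
    high_water_mark: int = 0
    ingested: set = field(default_factory=set)  # stand-in for DatasetIngestionEntry rows

def ingest(rule, source_rows):
    """Append rows newer than the high-water mark, skipping duplicates."""
    appended = []
    for row_id in sorted(source_rows):
        if row_id <= rule.high_water_mark or row_id in rule.ingested:
            continue  # already processed: re-runs are idempotent
        rule.ingested.add(row_id)
        appended.append(row_id)
    if appended:
        rule.high_water_mark = appended[-1]
    return appended

rule = Rule()
ingest(rule, [1, 2, 3])     # → [1, 2, 3]
ingest(rule, [2, 3, 4, 5])  # → [4, 5]; rows at or below the mark are skipped
```

In the real task the duplicate check would be the database constraint rather than an in-memory set, but the control flow is the same.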
Validation:

- Migrations
Demo
n/a — schema-only PR, no runtime behaviour change. Demo will accompany Chunk C (ingestion task) and D (auto-trigger).
Docs and Changelog
The openspec change definition itself ships in this PR (proposal/design/specs/tasks). Feature-flag and developer docs land alongside chunks C/D when the user-facing behaviour exists.
🤖 Generated with Claude Code