@@ -65,9 +65,9 @@ The first two layers alone catch most regressions — fully offline, zero cost.
 ### The workflow
 
 ```bash
-evalview capture --agent http://localhost:8000/invoke   # 1. Record real interactions
-evalview snapshot                                       # 2. Save as baseline
-evalview check                                          # 3. Catch regressions
+evalview generate --agent http://localhost:8000         # 1. Draft a regression suite
+evalview snapshot tests/generated --approve-generated   # 2. Approve + baseline
+evalview check tests/generated                          # 3. Catch regressions
 evalview monitor                                        # 4. Watch continuously (+ Slack alerts)
 # ✅ All clean — or ❌ REGRESSION: score 85 → 71
 ```
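The `❌ REGRESSION: score 85 → 71` line in the workflow above is a baseline comparison. As a rough mental model only (hypothetical code, not evalview's actual scoring or storage), a threshold check over per-test scores looks like this:

```python
# Hypothetical sketch of what a baseline regression check does conceptually.
# evalview's real scoring, report format, and thresholds may differ.

def find_regressions(baseline, current, tolerance=5.0):
    """Flag tests whose score dropped more than `tolerance` points vs. baseline."""
    messages = []
    for test_id, base_score in baseline.items():
        new_score = current.get(test_id)
        if new_score is not None and base_score - new_score > tolerance:
            messages.append(f"REGRESSION: {test_id} score {base_score} -> {new_score}")
    return messages

baseline = {"refund-flow": 85, "faq-routing": 92}
current = {"refund-flow": 71, "faq-routing": 91}
print(find_regressions(baseline, current))
# ['REGRESSION: refund-flow score 85 -> 71']
```

A small drop (92 → 91) stays within tolerance and passes; the 14-point drop is flagged.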
@@ -76,7 +76,9 @@ evalview monitor # 4. Watch continuously
 
 Choose the shortest path for your use case:
 
-- New project: `evalview capture --agent ...` → `evalview snapshot` → `evalview check`
+- New project, no traffic yet: `evalview generate --agent ...` → `evalview snapshot --approve-generated` → `evalview check`
+- Existing traffic or staging logs: `evalview generate --from-log traffic.jsonl`
+- Production-shaped tests from real usage: `evalview capture --agent ...` → `evalview snapshot` → `evalview check`
 - Existing tests, no baselines yet: `evalview snapshot`
 - CI gate for regressions: [Golden Traces](docs/GOLDEN_TRACES.md) and [CI/CD Integration](docs/CI_CD.md)
 - Framework-specific setup: [Framework Support](docs/FRAMEWORK_SUPPORT.md)
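The "CI gate" path above can run as a pull-request check. Here is a hypothetical GitHub Actions fragment, assuming `evalview check` exits non-zero on regression; the supported recipe lives in docs/CI_CD.md:

```yaml
# Hypothetical CI sketch — consult docs/CI_CD.md for the actual integration.
name: agent-regression-check
on: [pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install evalview
      - run: evalview check   # a non-zero exit fails the PR on regression
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```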
@@ -245,7 +247,23 @@ evalview check --semantic-diff
 pip install evalview
 ```
 
-### Step 1 — Capture real interactions as tests
+### Step 1 — Generate or capture tests
+
+If you have no test suite yet, start with generation:
+
+```bash
+evalview generate --agent http://localhost:8000
+# Writes draft YAML tests to tests/generated/
+# Also writes tests/generated/generated.report.json for CI review
+```
+
+If you already have logs from staging or production:
+
+```bash
+evalview generate --from-log traffic.jsonl
+```
+
+If you want tests based on real user flows instead of planned probes:
 
 ```bash
 evalview capture --agent http://localhost:8000/invoke
@@ -254,9 +272,19 @@ evalview capture --agent http://localhost:8000/invoke
 # Tests are saved to tests/test-cases/ automatically
 ```
 
-> **Why capture first?** Tests from real usage catch real regressions. Auto-generated tests from guessed queries score poorly and give you false confidence.
+> **When to use which?**
+> `generate` is the fastest path from zero to a draft suite.
+> `capture` is the highest-signal path when you already have real usage to replay.
+
+### Step 2 — Review and save as your baseline
 
-### Step 2 — Save as your baseline
+Generated tests are draft-only until you approve them:
+
+```bash
+evalview snapshot tests/generated --approve-generated
+```
+
+Captured or hand-written tests snapshot normally:
 
 ```bash
 export OPENAI_API_KEY='your-key'   # for LLM-as-judge scoring
@@ -269,6 +297,14 @@ evalview snapshot
 evalview check   # run this after every change
 
 ```
+### Review generated suites in CI
+
+```bash
+evalview ci comment --results tests/generated/generated.report.json --dry-run
+```
+
+That review comment summarizes discovered tools, generated behavior paths, coverage gaps, and the approval workflow before baselining.
+
 ### No agent yet? Try the demo
 
 ```bash