Commit 58e49da

readme changes
1 parent 9b3da2f commit 58e49da

1 file changed: +22 -4 lines changed

README.md

Lines changed: 22 additions & 4 deletions
@@ -4,8 +4,8 @@
 <p align="center">
 <img src="assets/logo.png" alt="EvalView" width="350">
 <br>
-<strong>Regression testing for AI agents.</strong><br>
-AI agent testing for CI/CD: generate tests, snapshot behavior, detect regressions, and block broken tool-calling agents before production.
+<strong>Regression guardrails for agents.</strong><br>
+Generate tests, snapshot behavior, and catch silent regressions in CI before they hit production.
 </p>

 <p align="center">
@@ -23,10 +23,10 @@

 ---

-EvalView is an **AI agent testing** and **LLM agent regression testing** tool for teams shipping tool-calling agents. It helps you:
+EvalView is a **regression guardrail** for teams shipping agents, especially tool-calling agents. It helps you:
 - generate your first agent test suite from a URL or traffic logs
 - snapshot a golden baseline for agent behavior
-- detect tool-call, sequence, output, cost, and latency regressions
+- detect tool-call, sequence, output, cost, latency, and multi-turn regressions
 - run AI agent tests in CI/CD before shipping changes

 Use EvalView when you need:
@@ -75,6 +75,24 @@ The first two layers alone catch most regressions — fully offline, zero cost.

 **Your data stays local.** Nothing is sent to EvalView servers — all processing happens on your machine.

+### Multi-turn regressions are first-class
+
+EvalView does not stop at single prompt/output checks. It can catch regressions where an agent skips a clarification question, asks the wrong follow-up, or takes the wrong tool path on turn two.
+
+```yaml
+tests:
+  - name: refund_flow_requires_clarification
+    conversation:
+      - user: "I want a refund"
+        expected:
+          assistant_contains: ["order number"]
+      - user: "Order 4812"
+        expected:
+          tools_called: ["lookup_order", "check_policy"]
+```
+
+That matters because many real agent failures happen after the first turn, when the agent has to remember context, ask a clarifying question, or decide whether to act.
+
 ### The workflow

 ```bash
