
Commit e3a96d2

Clean up emojis - keep in tables, remove from bullet points

1 parent 2ad7ff1

File tree: 1 file changed, +38 −38 lines


README.md

Lines changed: 38 additions & 38 deletions
@@ -42,10 +42,10 @@ EvalView is a **testing framework for AI agents**.
 
 It lets you:
 
-- 🧪 **Write tests in YAML** that describe inputs, expected tools, and acceptance thresholds
-- 🔁 **Turn real conversations into regression suites** (record → generate tests → re-run on every change)
-- 🚦 **Gate deployments in CI** on behavior, tool calls, cost, and latency
-- 🧩 Plug into **LangGraph, CrewAI, OpenAI Assistants, Anthropic Claude, HTTP agents**, and more
+- **Write tests in YAML** that describe inputs, expected tools, and acceptance thresholds
+- **Turn real conversations into regression suites** (record → generate tests → re-run on every change)
+- **Gate deployments in CI** on behavior, tool calls, cost, and latency
+- Plug into **LangGraph, CrewAI, OpenAI Assistants, Anthropic Claude, HTTP agents**, and more
 
 Think: _"pytest / Playwright mindset, but for multi-step agents and tool-calling workflows."_
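The "Write tests in YAML" bullet kept in this hunk can be pictured with a minimal sketch. Every field name below (`name`, `input`, `expected_tools`, `thresholds`) is an illustrative assumption, not EvalView's documented schema:

```yaml
# Hypothetical EvalView test case – field names are assumptions for illustration.
name: refund-request-routes-to-billing-tool
input: "I was double-charged last month, can I get a refund?"
expected_tools:            # tools the agent is expected to call
  - lookup_invoice
  - create_refund
thresholds:                # acceptance gates for the run
  min_score: 0.8           # LLM-judged output quality
  max_cost_usd: 0.05
  max_latency_ms: 4000
```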

@@ -122,10 +122,10 @@ evalview quickstart
 
 You'll see a full run with:
 
-- A demo agent spinning up
-- A test case created for you
-- A config file wired up
-- 📊 A scored test: tools used, output quality, cost, latency
+- A demo agent spinning up
+- A test case created for you
+- A config file wired up
+- A scored test: tools used, output quality, cost, latency
 
 ### Run examples directly (no config needed)

@@ -259,10 +259,10 @@ Database config is optional – EvalView only uses it if you enable it in config
 
 ## Why EvalView?
 
-- 🔓 **Fully Open Source** – Apache 2.0 licensed, runs entirely on your infra, no SaaS lock-in
-- 🔌 **Framework-agnostic** – Works with LangGraph, CrewAI, OpenAI, Anthropic, or any HTTP API
-- 🚀 **Production-ready** – Parallel execution, CI/CD integration, configurable thresholds
-- 🧩 **Extensible** – Custom adapters, evaluators, and reporters for your stack
+- **Fully Open Source** – Apache 2.0 licensed, runs entirely on your infra, no SaaS lock-in
+- **Framework-agnostic** – Works with LangGraph, CrewAI, OpenAI, Anthropic, or any HTTP API
+- **Production-ready** – Parallel execution, CI/CD integration, configurable thresholds
+- **Extensible** – Custom adapters, evaluators, and reporters for your stack
 
 ---
@@ -357,7 +357,7 @@ $ evalview run
 
 ---
 
-## 🚀 Generate 1000 Tests from 1
+## Generate 1000 Tests from 1
 
 **Problem:** Writing tests manually is slow. You need volume to catch regressions.
@@ -387,9 +387,9 @@ evalview record --interactive
 ```
 
 EvalView captures:
-- Query → Tools called → Output
-- Auto-generates test YAML
-- Adds reasonable thresholds
+- Query → Tools called → Output
+- Auto-generates test YAML
+- Adds reasonable thresholds
 
 **Result:** Go from 5 manual tests → 500 comprehensive tests in minutes.
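The capture described in this hunk (query → tools called → output, plus auto-added thresholds) might produce a file along these lines; the structure is a hedged sketch, not EvalView's actual generated format:

```yaml
# Hypothetical output of `evalview record` – structure is assumed, not documented.
name: recorded-session-001
input: "Find flights from SFO to JFK next Friday"
expected_tools:            # tools observed during the recorded run
  - search_flights
  - get_airport_info
thresholds:                # "reasonable thresholds" auto-added by the recorder
  min_score: 0.7
  max_latency_ms: 8000
```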

@@ -411,40 +411,40 @@ evalview run
 ```
 
 Supports 7+ frameworks with automatic detection:
-LangGraph • CrewAI • OpenAI Assistants • Anthropic Claude • AutoGen • Dify • Custom APIs
+LangGraph • CrewAI • OpenAI Assistants • Anthropic Claude • AutoGen • Dify • Custom APIs
 
 ---
 
-## ☁️ EvalView Cloud (Coming Soon)
+## EvalView Cloud (Coming Soon)
 
 We're building a hosted version:
 
-- 📊 **Dashboard** - Visual test history, trends, and pass/fail rates
-- 👥 **Teams** - Share results and collaborate on fixes
-- 🔔 **Alerts** - Slack/Discord notifications on failures
-- 📈 **Regression detection** - Automatic alerts when performance degrades
-- **Parallel runs** - Run hundreds of tests in seconds
+- **Dashboard** - Visual test history, trends, and pass/fail rates
+- **Teams** - Share results and collaborate on fixes
+- **Alerts** - Slack/Discord notifications on failures
+- **Regression detection** - Automatic alerts when performance degrades
+- **Parallel runs** - Run hundreds of tests in seconds
 
-👉 **[Join the waitlist](https://form.typeform.com/to/EQO2uqSa)** - be first to get access
+**[Join the waitlist](https://form.typeform.com/to/EQO2uqSa)** - be first to get access
 
 ---
 
 ## Features
 
-- 🚀 **Test Expansion** - Generate 100+ test variations from a single seed test
-- 🎥 **Test Recording** - Auto-generate tests from live agent interactions
-- **YAML-based test cases** - Write readable, maintainable test definitions
-- **Parallel execution** - Run tests concurrently (8x faster by default)
-- 📊 **Multiple evaluation metrics** - Tool accuracy, sequence correctness, output quality, cost, and latency
-- 🤖 **LLM-as-judge** - Automated output quality assessment
-- 💰 **Cost tracking** - Automatic cost calculation based on token usage
-- 🔌 **Universal adapters** - Works with any HTTP or streaming API
-- 🎨 **Rich console output** - Beautiful, informative test results
-- 📁 **JSON & HTML reports** - Interactive HTML reports with Plotly charts
-- 🔄 **Retry logic** - Automatic retries with exponential backoff for flaky tests
-- 👀 **Watch mode** - Re-run tests automatically on file changes
-- ⚖️ **Configurable weights** - Customize scoring weights globally or per-test
-- 📊 **Statistical mode** - Run tests N times, get variance metrics and flakiness scores
+- **Test Expansion** - Generate 100+ test variations from a single seed test
+- **Test Recording** - Auto-generate tests from live agent interactions
+- **YAML-based test cases** - Write readable, maintainable test definitions
+- **Parallel execution** - Run tests concurrently (8x faster by default)
+- **Multiple evaluation metrics** - Tool accuracy, sequence correctness, output quality, cost, and latency
+- **LLM-as-judge** - Automated output quality assessment
+- **Cost tracking** - Automatic cost calculation based on token usage
+- **Universal adapters** - Works with any HTTP or streaming API
+- **Rich console output** - Beautiful, informative test results
+- **JSON & HTML reports** - Interactive HTML reports with Plotly charts
+- **Retry logic** - Automatic retries with exponential backoff for flaky tests
+- **Watch mode** - Re-run tests automatically on file changes
+- **Configurable weights** - Customize scoring weights globally or per-test
+- **Statistical mode** - Run tests N times, get variance metrics and flakiness scores
 
 ---
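The "Configurable weights" feature in this hunk (customize scoring weights globally or per-test) could look something like the sketch below; the `weights` key and metric names are assumptions for illustration, not EvalView's actual config schema:

```yaml
# Hypothetical global scoring-weights config – keys are assumed, not documented.
weights:
  tool_accuracy: 0.4     # did the agent call the right tools?
  sequence: 0.2          # were they called in the right order?
  output_quality: 0.3    # LLM-as-judge score
  cost: 0.05
  latency: 0.05
```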
