safety-evals

Here are 3 public repositories matching this topic...

Portable evaluation bundles for agents and agent-shaped workflows: bounded, reproducible, regression-aware proof surfaces for quality claims.

Production-path evals for AI agent behavior: persona drift, safety boundaries, and evidence discipline.

gemini wretch ai-agents evals llm-evals agent-evals openclaw safety-evals persona-evals

Production-minded LLM eval harness for safety, reliability, cost, and latency analysis.
