Preprint paper package — Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents (Zenodo DOI 10.5281/zenodo.20034550)
research open-science ai-agents preprint tool-use agent-evaluation llm-reliability workflow-evaluation artifact-paper operational-scorecards
Updated May 16, 2026 - Python