Question
I have a test set with 40+ test cases. I primarily use custom evaluators, some of which also make LLM calls to evaluate an output. Some cases take very long to execute and may even time out, which regularly crashes my whole experiment / test set evaluation.
Is there a way to limit the execution time of a test case? Or is there some other strategy for handling long-running test cases so they don't crash the whole evaluation?
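For context, one generic workaround I've considered (independent of any pydantic-ai API, so all names below are illustrative) is wrapping the task function in `asyncio.wait_for` so that a slow case produces a timeout sentinel the evaluators can score as a failure, instead of crashing the run:

```python
import asyncio

# Hypothetical workaround: wrap the async task so a slow case returns a
# sentinel value instead of hanging or raising into the evaluation loop.
def with_timeout(task, seconds: float):
    async def wrapped(inputs):
        try:
            return await asyncio.wait_for(task(inputs), timeout=seconds)
        except asyncio.TimeoutError:
            # Evaluators can treat this sentinel as a failed case.
            return "TIMED_OUT"
    return wrapped

async def slow_task(inputs):
    await asyncio.sleep(10)  # stands in for a long-running LLM call
    return "done"

async def main():
    guarded = with_timeout(slow_task, seconds=0.1)
    print(await guarded("example input"))  # prints "TIMED_OUT"

asyncio.run(main())
```

This avoids the crash but feels like a hack; I'd prefer a supported per-case timeout if one exists.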
Additional Context
- Python 3.13.3
- pydantic-ai 1.0.0