Skip to content

Gracefully handle errors in evals #2295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

dmontagu
Copy link
Contributor

@dmontagu dmontagu commented Jul 24, 2025

Also adds retry functionality. Builds on top of the work in #2282.

Still need to:

  • update the report printing to show errors in the terminal
  • make sure this change doesn't break the logfire display of pydantic-evals spans
  • probably add tests and docs

Note: I think this is technically a breaking change because it adds some fields to the ReportCase and EvaluationReport classes, so deserialization might not work on existing data. Given we are still 0.X I guess it's worth a bump to minor version(?) but I expect it won't be very disruptive in practice..

Copy link

github-actions bot commented Jul 24, 2025

Docs Preview

commit: 3ce4de1
Preview URL: https://d3ecda74-pydantic-ai-previews.pydantic.workers.dev

@dmontagu dmontagu force-pushed the dmontagu/graceful-evals-error-handling branch from 5fe2ebf to 6bb2a3f Compare July 24, 2025 04:20
Base automatically changed from dmontagu/retry-handling to main July 25, 2025 17:34
except Exception as e:
return [
EvaluatorFailure(
name=evaluator.get_default_evaluation_name(), error_msg=f'{type(e).__name__}: {e}', source=evaluator
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we attach the exception itself so it can be read programmatically to get the stack trace if so desired? It wouldn't be serialized of course.

inputs=case.inputs,
metadata=case.metadata,
expected_output=case.expected_output,
error_msg=f'{type(exc).__name__}: {exc}',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with EvaluatorFailure, I'd like to include the exception itself

@DouweM
Copy link
Collaborator

DouweM commented Jul 25, 2025

Note: I think this is technically a breaking change because it adds some fields to the ReportCase and EvaluationReport classes, so deserialization might not work on existing data. Given we are still 0.X I guess it's worth a bump to minor version(?) but I expect it won't be very disruptive in practice..

@dmontagu Can't we handle missing fields by just using the default value of []?

Copy link

github-actions bot commented Aug 2, 2025

This PR is stale, and will be closed in 3 days if no reply is received.

@github-actions github-actions bot added the Stale label Aug 2, 2025
@DouweM DouweM assigned dmontagu and unassigned DouweM Aug 4, 2025
@DouweM DouweM removed the Stale label Aug 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants