
Conversation

mechakotik
Contributor

A failed evaluation should never have a better score than a successful one.
@codelion
Owner

This is not necessary, as evaluator.py can set the score to whatever is needed for failed evaluations on a given example.

@mechakotik
Contributor Author

It is useful when evaluator.py itself fails (syntax error, timeout, unhandled exception). You can certainly write an evaluator that handles all of these cases itself, but it's a nice fallback that avoids ruining all the progress if some edge case wasn't handled properly.
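The fallback described above could be sketched as a defensive wrapper around the evaluator call. This is a minimal illustration, not the project's actual implementation; the names `safe_evaluate` and `FAILURE_SCORE` are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

FAILURE_SCORE = float("-inf")  # strictly worse than any real score

def safe_evaluate(evaluator, program, timeout=30.0):
    """Run evaluator(program) defensively: a crash, a timeout, or a
    non-numeric result falls back to FAILURE_SCORE, so a failed
    evaluation can never outrank a successful one."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        score = pool.submit(evaluator, program).result(timeout=timeout)
        # Guard against evaluators that return None or garbage.
        return float(score) if isinstance(score, (int, float)) else FAILURE_SCORE
    except Exception:
        # Covers unhandled exceptions raised inside the evaluator as
        # well as the futures TimeoutError. A real system would run the
        # evaluator in a subprocess so a hung one can actually be killed.
        return FAILURE_SCORE
    finally:
        pool.shutdown(wait=False)
```

With a wrapper like this, a syntax error or unhandled exception in evaluator.py simply yields the worst possible score instead of corrupting the run.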
