Distinguishing model failure vs task inadmissibility in evaluation #1089

finkeissen · 2026-02-26T12:06:07Z

In some evaluation workflows, a task can turn out to be ill-posed rather than simply being failed by the model.

For example:

Do you distinguish between model failure and task inadmissibility during evaluation?

I’m curious whether this is tracked explicitly in Kiln workflows or treated as a regular failure.
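
To make the distinction concrete, here is a minimal Python sketch of what I mean by tracking it explicitly. This is not Kiln's API: `Outcome`, `EvalResult`, and `summarize` are hypothetical names I made up for illustration, and the choice to exclude inadmissible tasks from the pass-rate denominator is an assumption, not an established convention.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Hypothetical three-way outcome for one evaluated task."""
    PASS = "pass"
    MODEL_FAILURE = "model_failure"          # task was well-posed, output was wrong
    TASK_INADMISSIBLE = "task_inadmissible"  # task was ill-posed, output not judged


@dataclass
class EvalResult:
    task_id: str
    outcome: Outcome


def summarize(results):
    """Count outcomes and compute a pass rate over admissible tasks only."""
    admissible = [r for r in results if r.outcome is not Outcome.TASK_INADMISSIBLE]
    passed = sum(1 for r in admissible if r.outcome is Outcome.PASS)
    return {
        "total": len(results),
        "inadmissible": len(results) - len(admissible),
        "model_failures": len(admissible) - passed,
        # None when every task was inadmissible, to avoid dividing by zero.
        "pass_rate": passed / len(admissible) if admissible else None,
    }


results = [
    EvalResult("t1", Outcome.PASS),
    EvalResult("t2", Outcome.MODEL_FAILURE),
    EvalResult("t3", Outcome.TASK_INADMISSIBLE),
    EvalResult("t4", Outcome.PASS),
]
summary = summarize(results)
```

With a plain pass/fail flag, `t3` would count as a failure and lower the model's score. Here it is reported separately, so it shows up as a problem with the task set rather than with the model.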