Open
Labels
good first issue, help wanted, new-task, science-team
Description
Evaluation short description
Google has released BBEH (BIG-Bench Extra Hard) to compensate for the saturation of BBH (BIG-Bench Hard) in the latest generation of LLMs. Overall, it looks like a good benchmark for probing reasoning capabilities.
Evaluation metadata
Provide all available
- Paper url: https://arxiv.org/pdf/2502.19187
- Github url: https://github.com/google-deepmind/bbeh
- Dataset url: