[EVAL] Big-Bench Extra Hard (BBEH) #600

@lewtun

Description

Evaluation short description

Google has released BBEH to compensate for the saturation of BBH among the latest generation of LLMs. Overall, it looks like a good benchmark for probing reasoning capabilities.

Evaluation metadata

Provide all available
