Should we go with something like this? Similar to https://github.com/google/BIG-bench:

    agml/
    agml/benchmark_tasks/
    agml/models/
    docs/