
Parallelize TPU CI tests #9071

@tengyifei

🚀 Parallelize TPU CI tests

Motivation

Our TPU CI takes over an hour, and every PR must pass it for stability and correctness. This makes PR submission slow.

Pitch

  • Sample the running time of tests based on recent logs
  • Divide the tests into 2-3 groups
  • Change the CI logic to run these groups in parallel, similar to how we already do things for CPU and GPU tests.
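The first two steps could be sketched roughly as follows. This is only an illustration, assuming per-test running times have already been extracted from recent logs into a dictionary. The function names `partition_tests` and `group_totals`, the test file names, and the timings are all hypothetical and are not taken from the actual CI scripts. The sketch uses a greedy "longest test first" heuristic: each test goes to whichever group currently has the smallest total running time.

```python
import heapq


def partition_tests(durations, num_groups):
    """Split tests into num_groups lists with roughly equal total runtime.

    durations: dict mapping test name -> measured runtime in seconds
               (assumed to come from recent CI logs).
    """
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")

    # Min-heap of (accumulated_seconds, group_index).
    heap = [(0.0, i) for i in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Place the longest tests first; ties are broken by name for determinism.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    return groups


def group_totals(durations, groups):
    """Return the summed runtime of each group, in seconds."""
    return [sum(durations[name] for name in group) for group in groups]


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    sample = {
        "test_a.py": 1200.0,
        "test_b.py": 900.0,
        "test_c.py": 800.0,
        "test_d.py": 400.0,
        "test_e.py": 300.0,
    }
    groups = partition_tests(sample, 3)
    for group, total in zip(groups, group_totals(sample, groups)):
        print(f"{total:7.0f}s  {group}")
```

The wall-clock time of the parallel run is bounded below by the slowest group, so the largest value returned by `group_totals` is the number to watch when choosing between 2 and 3 groups.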

Additional context

Currently we have 32 TPU CI runners. If the number of PRs grows, we may need to request additional TPU resources.

Labels: CI (CI related change), enhancement (New feature or request), testing (Testing and coverage related issues)
