New Backend Tester Harness & CI #13622
GregoryComer started this conversation in General
As part of the backend hardening workstream (RFC), we have a new test harness and jobs running in nightly CI, covering lowering flows for P0 backends. This harness runs a wide variety of operator and model-level tests using pybindings, and gives detailed statistics on delegated operators, output accuracy, lowering time, and more. XNNPACK, Core ML, Vulkan, and QNN are integrated, and the framework is intended to be extended for all backends.
Summary
The goal of this effort is to create test infrastructure for “black box” testing of hardware backends and lowering flows. The tests validate lowering and runtime behavior across a variety of operators and models, including running the generated PTE files via pybindings, in order to systematically find and surface bugs in backends.
Expectations for Backends
As described in the RFC, the goal of this effort is to ensure that our backends can reasonably lower and run arbitrary graphs. Performance is not in scope, nor is backend-specific behavior. Each backend is tested as a black box.
As of now, there is no expectation that backends pass all tests in the suites. Instead, it is intended as a tool to systemically surface issues, which backend authors can prioritize and resolve.
Tests do not assert that backends support specific operators or features. A backend is free to delegate nothing and will trivially pass all tests. If a backend chooses to partition a node, the test validates that the lowering process completes without error, that the resultant PTE file can be loaded and run, and that the outputs are reasonably close to the reference implementation.
Accessing Results
The test suite runs in the ExecuTorch nightly CI workflow. Test results are surfaced from the job logs, as a downloadable CSV report, and through a markdown job summary.
To see the latest run results:
The most recent run (at the time of writing) is https://github.com/pytorch/executorch/actions/runs/17115506183. It can also be found through the HUD. Searching for "backend-test" is the easiest way to filter down.
You’ll see a number of jobs, based on the matrix of { backend, quantization, suite }. Each job generates a markdown summary with a table of failing test cases, viewable in the browser. Scrolling down to the “Artifacts” section at the bottom of the page allows downloading the full test report for each job in CSV format.
Interpreting Results
For each test case, the harness collects information such as which operators were delegated, output accuracy relative to the reference implementation, lowering time, and more.
To drill down further, or to access the full run output, either check the full CI job logs or repro the test locally.
Running Locally (Reproing Failures)
To run tests, use the test driver code, which provides CLI args to filter tests by backend, test ID, and suite. There are currently two test suites: models and operators.
To run a specific test from the operator suite,
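an invocation along the following lines works. The flag names shown here are illustrative assumptions, not confirmed options; run the runner module with `--help` for the authoritative arguments.

```bash
# Illustrative example: run a single operator test for the XNNPACK backend.
# The flag names (--suite, --backend, --filter) and the test name are assumptions;
# consult `python -m executorch.backends.test.suite.runner --help` for the real CLI.
python -m executorch.backends.test.suite.runner \
    --suite operators \
    --backend xnnpack \
    --filter test_add
```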
To run all model tests for a backend,
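something like the following should work (again, option names are illustrative; confirm against the runner's `--help` output):

```bash
# Illustrative example: run every model test for a single backend.
python -m executorch.backends.test.suite.runner \
    --suite models \
    --backend xnnpack
```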
Each test in the harness is also a normal Python unittest test and can be run standalone using standard Python unit test commands. The Test ID column corresponds to the name of the Python test function.
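For example, because each case is a plain unittest test, a pytest name filter can run one test directly. The path under backends/test/suite and the test name below are hypothetical placeholders for whatever the report lists in the Test ID column.

```bash
# Run a single test case by name with pytest. The directory and the Test ID
# (test_add_xnnpack_float32) are illustrative examples, not real identifiers.
python -m pytest backends/test/suite -k "test_add_xnnpack_float32"
```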
CLI Driver Args
The executorch.backends.test.suite.runner script is a wrapper / driver for the underlying unittest test cases. It provides CLI arguments to control which tests are run.
Usage:
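a sketch of the expected shape of the command line is shown below; the option names are assumptions, and the module's `--help` output is the source of truth.

```text
# Hypothetical usage synopsis for the test runner; actual options may differ.
python -m executorch.backends.test.suite.runner \
    [--suite {operators,models}] \
    [--backend BACKEND] \
    [--filter PATTERN]
```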
Backends and Flows
Each backend exposes one or more test flows, each of which includes the quantization scheme, lowering configuration, and any other relevant parts of the lowering recipe.
For example, XNNPACK registers the following flows: xnnpack (unquantized), xnnpack_static_int8_per_channel, xnnpack_static_int8_per_tensor, and xnnpack_dynamic_int8_per_channel.
See backends/test/suite/flows for a full reference of the registered flows for each backend.
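To make the idea concrete, the sketch below shows roughly what a flow amounts to: a named lowering recipe pairing a partitioner with an optional quantization step. Every name in it (TestFlow, MyBackendPartitioner, MyBackendQuantizer, my_backend_flows) is a placeholder rather than the actual API; the real definitions live under backends/test/suite/flows.

```python
# Hypothetical sketch of a flow definition. TestFlow, MyBackendPartitioner,
# MyBackendQuantizer, and my_backend_flows are placeholders, not the real API;
# see backends/test/suite/flows for the actual definitions.
from dataclasses import dataclass
from typing import Callable, List, Optional


class MyBackendPartitioner:
    """Placeholder standing in for a real backend partitioner."""


class MyBackendQuantizer:
    """Placeholder standing in for a real backend quantizer."""

    def __init__(self, per_channel: bool = True) -> None:
        self.per_channel = per_channel


@dataclass
class TestFlow:
    """Placeholder for the suite's flow type: a named lowering recipe."""

    name: str                               # appears in test IDs and the CSV report
    backend: str                            # which backend the flow exercises
    make_partitioner: Callable[[], object]  # builds the partitioner used for lowering
    quantize: bool = False                  # whether to apply a quantization recipe first
    make_quantizer: Optional[Callable[[], object]] = None


def my_backend_flows() -> List[TestFlow]:
    """Register an unquantized flow and a static int8 per-channel flow."""
    return [
        TestFlow("my_backend", "my_backend", MyBackendPartitioner),
        TestFlow(
            "my_backend_static_int8_per_channel",
            "my_backend",
            MyBackendPartitioner,
            quantize=True,
            make_quantizer=lambda: MyBackendQuantizer(per_channel=True),
        ),
    ]
```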
Integrating a New Backend
What follows is a brief guide on integrating a backend with the test infrastructure. More details are landing in the README and documentation.
a. This can be via native execution or via a simulator, as QNN does.
b. Ensure that the backend libraries are linked into the pybind library (see "Link QNN backend to pybinding lib when built" #13467 for an example).
a. See XNNPACK's or QNN's implementations.
a. See XNNPACK's or other implementations under backends/test/suite/flows.
Note that this is subject to change as the framework evolves.