New Backend Tester Harness & CI #13622
GregoryComer started this conversation in General
As part of the backend hardening workstream (RFC), we have a new test harness and jobs running in nightly CI, covering lowering flows for P0 backends. This harness runs a wide variety of operator and model-level tests using pybindings, and gives detailed statistics on delegated operators, output accuracy, lowering time, and more. XNNPACK, Core ML, Vulkan, and QNN are integrated, and the framework is intended to be extended for all backends.
Summary
The goal of this effort is to create test infrastructure for “black box” testing of hardware backends and lowering flows. The tests validate lowering and runtime behavior across a variety of operators and models, including running the generated PTE files via pybindings, in order to systematically find and surface bugs in backends.
Expectations for Backends
As described in the RFC, the goal of this effort is to ensure that our backends can reasonably lower and run arbitrary graphs. Performance is not in scope, nor is backend-specific behavior. Each backend is tested as a black box.
As of now, there is no expectation that backends pass all tests in the suites. Instead, it is intended as a tool to systemically surface issues, which backend authors can prioritize and resolve.
Tests do not assert that backends support specific operators or features. A backend is free to delegate nothing and will trivially pass all tests. If a backend chooses to partition a node, the test validates that the lowering process completes without error, that the resultant PTE file can be loaded and run, and that the outputs are reasonably close to the reference implementation.
Accessing Results
The test suite runs in the ExecuTorch nightly CI workflow. Test results are surfaced from the job logs, as a downloadable CSV report, and through a markdown job summary.
To see the latest run results:
The most recent run (at the time of writing) is https://github.com/pytorch/executorch/actions/runs/17115506183. It can also be found through the HUD. Searching for "backend-test" is the easiest way to filter down.
You’ll see a number of jobs, based on the matrix of { backend, quantization, suite }. Each job generates a markdown summary with a table of failing test cases, viewable in the browser. Scrolling down to the “Artifacts” section at the bottom of the page allows downloading the full test report for each job in CSV format.
Interpreting Results
For each test case, the harness collects information such as which operators were delegated, output accuracy relative to the reference implementation, lowering time, and more.
To drill down further, or to access the full run output, either check the full CI job logs or repro the test locally.
Running Locally (Reproing Failures)
To run tests, use the test driver code, which provides CLI args to filter tests by backend, test ID, and suite. There are currently two test suites: models and operators.
To run a specific test from the operator suite,
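an invocation along the following lines works. The flag names shown here are illustrative assumptions, not confirmed options; run the runner module with `--help` for the authoritative arguments.

```bash
# Illustrative example: run a single operator test for the XNNPACK backend.
# The flag names (--suite, --backend, --filter) and the test name are assumptions;
# consult `python -m executorch.backends.test.suite.runner --help` for the real CLI.
python -m executorch.backends.test.suite.runner \
    --suite operators \
    --backend xnnpack \
    --filter test_add
```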
To run all model tests for a backend,
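something like the following should work (again, option names are illustrative; confirm against the runner's `--help` output):

```bash
# Illustrative example: run every model test for a single backend.
python -m executorch.backends.test.suite.runner \
    --suite models \
    --backend xnnpack
```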
Each test in the harness is also a normal Python unittest test and can be run standalone using standard Python unit test commands. The Test ID column corresponds to the name of the Python test function.
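For example, because each case is a plain unittest test, a pytest name filter can run one test directly. The path under backends/test/suite and the test name below are hypothetical placeholders for whatever the report lists in the Test ID column.

```bash
# Run a single test case by name with pytest. The directory and the Test ID
# (test_add_xnnpack_float32) are illustrative examples, not real identifiers.
python -m pytest backends/test/suite -k "test_add_xnnpack_float32"
```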
CLI Driver Args
The executorch.backends.test.suite.runner script is a wrapper / driver for the underlying unittest test cases. It provides CLI arguments to control which tests are run.
Usage:
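a sketch of the expected shape of the command line is shown below; the option names are assumptions, and the module's `--help` output is the source of truth.

```text
# Hypothetical usage synopsis for the test runner; actual options may differ.
python -m executorch.backends.test.suite.runner \
    [--suite {operators,models}] \
    [--backend BACKEND] \
    [--filter PATTERN]
```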
Backends and Flows
Each backend exposes one or more test flows, each of which includes the quantization scheme, lowering configuration, and any other relevant parts of the lowering recipe.
For example, XNNPACK registers the following flows: xnnpack (unquantized), xnnpack_static_int8_per_channel, xnnpack_static_int8_per_tensor, and xnnpack_dynamic_int8_per_channel.
See backends/test/suite/flows for a full reference of the registered flows for each backend.
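To make the idea concrete, the sketch below shows roughly what a flow amounts to: a named lowering recipe pairing a partitioner with an optional quantization step. Every name in it (TestFlow, MyBackendPartitioner, MyBackendQuantizer, my_backend_flows) is a placeholder rather than the actual API; the real definitions live under backends/test/suite/flows.

```python
# Hypothetical sketch of a flow definition. TestFlow, MyBackendPartitioner,
# MyBackendQuantizer, and my_backend_flows are placeholders, not the real API;
# see backends/test/suite/flows for the actual definitions.
from dataclasses import dataclass
from typing import Callable, List, Optional


class MyBackendPartitioner:
    """Placeholder standing in for a real backend partitioner."""


class MyBackendQuantizer:
    """Placeholder standing in for a real backend quantizer."""

    def __init__(self, per_channel: bool = True) -> None:
        self.per_channel = per_channel


@dataclass
class TestFlow:
    """Placeholder for the suite's flow type: a named lowering recipe."""

    name: str                               # appears in test IDs and the CSV report
    backend: str                            # which backend the flow exercises
    make_partitioner: Callable[[], object]  # builds the partitioner used for lowering
    quantize: bool = False                  # whether to apply a quantization recipe first
    make_quantizer: Optional[Callable[[], object]] = None


def my_backend_flows() -> List[TestFlow]:
    """Register an unquantized flow and a static int8 per-channel flow."""
    return [
        TestFlow("my_backend", "my_backend", MyBackendPartitioner),
        TestFlow(
            "my_backend_static_int8_per_channel",
            "my_backend",
            MyBackendPartitioner,
            quantize=True,
            make_quantizer=lambda: MyBackendQuantizer(per_channel=True),
        ),
    ]
```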
Integrating a New Backend
What follows is a brief guide on integrating a backend with the test infrastructure. More details are landing in the README and documentation.
a. This can be via native execution or via a simulator, as QNN does.
b. Ensure that the backend libraries are linked into the pybind library (see "Link QNN backend to pybinding lib when built" #13467 for an example).
a. See XNNPACK's or QNN's implementations.
a. See XNNPACK's or other implementations under backends/test/suite/flows.
Note that this is subject to change as the framework evolves.