feat: add Graph -> quickr lowering by t-kalinowski · Pull Request #71 · r-xla/anvil

t-kalinowski · 2025-12-16T19:49:00Z

Why

PJRT is great for accelerator-backed execution, but today there are two CPU-centric use cases where a {quickr} backend is especially compelling:

- CPU-only environments: when there’s no GPU/TPU available (or we’re deploying somewhere that only has CPU), we want a fast backend that keeps execution in plain R and doesn’t rely on accelerator runtimes.
- AOT / caching foundation: lowering a Graph to a concrete piece of R code is the natural seam for future ahead-of-time compilation and/or caching of compiled artifacts.

For pure CPU graphs, {quickr} should also be a strong performance baseline (and may beat PJRT-on-CPU by avoiding PJRT program/execute overhead).

In the longer term, I anticipate we'll add GPU support to quickr (likely via OpenACC, or possibly via CUDA directly), at which point quickr might provide a simple interface for writing custom, fast anvil kernels.

What’s included

- New exported API: graph_to_quickr_function()
- Lowers a supported subset of anvil::Graph to a plain R function and eagerly compiles it with quickr::quick()
- Handles graph constants and structured (nested list) outputs by packing/unpacking values for {quickr}
- {quickr} is optional (Suggests); feature/tests skip gracefully when it isn’t installed
- Correctness coverage compares {quickr} execution vs PJRT execution
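
To illustrate the packing/unpacking idea mentioned above, here is a minimal base-R sketch. It is not the code in this PR: `pack_leaves()`, `unpack_leaves()`, and the `template` argument are hypothetical names, and the sketch assumes every leaf is a numeric vector or array whose shape is known ahead of time.

```r
# Hypothetical helpers (not anvil's internals): flatten a nested list of
# numeric leaves into one double vector, then rebuild the nested structure.
pack_leaves <- function(tree) {
  if (is.list(tree)) {
    unlist(lapply(tree, pack_leaves), use.names = FALSE)
  } else {
    as.double(tree)
  }
}

# `template` is a nested list with the same structure and leaf shapes as the
# desired output; it plays the role of an output-tree description.
unpack_leaves <- function(flat, template) {
  pos <- 0L
  rebuild <- function(node) {
    if (is.list(node)) {
      return(lapply(node, rebuild))
    }
    n <- length(node)
    vals <- flat[pos + seq_len(n)]
    pos <<- pos + n                      # advance the read cursor
    if (!is.null(dim(node))) dim(vals) <- dim(node)
    vals
  }
  rebuild(template)
}

template <- list(loss = 0, grads = list(w = matrix(0, 2, 2), b = numeric(2)))
result   <- list(loss = 1.5, grads = list(w = matrix(1:4 / 10, 2, 2), b = c(-1, 1)))

flat     <- pack_leaves(result)          # length-7 double vector
restored <- unpack_leaves(flat, template)
```

The compiled function only ever sees and returns the flat vector, while the wrapper around it restores names and shapes.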

Minimal example (reprex)

library(anvil)

graph <- trace_fn(
  function(x, y) x * y + x,
  list(
    x = nv_scalar(0, dtype = "f64"),
    y = nv_scalar(0, dtype = "f64")
  )
)

f_quick <- graph_to_quickr_function(graph)
f_quick(2, 3)
#> [1] 8

Integration tests

- MNIST-shaped rank-5 batch training loop: quickr-compiled loss + grad matches PJRT outputs, then a few SGD steps reduce the loss:
  https://github.com/t-kalinowski/anvil/blob/quickr/tests/testthat/test-graph-to-quickr-integration.R#L1
- TFP/greta-like MAP workflow: quickr-compiled log_prob + grad matches PJRT, and parameter updates remain identical over multiple gradient-ascent steps:
  https://github.com/t-kalinowski/anvil/blob/quickr/tests/testthat/test-graph-to-quickr-integration.R#L70
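
The shape of these checks can be sketched in plain base R. This is only a stand-in for the real tests: a hand-written least-squares loss replaces the traced graph, a finite-difference gradient plays the role of the second backend, and all names (`loss`, `grad`, `num_grad`) are made up for illustration.

```r
set.seed(1)
x <- matrix(rnorm(40), nrow = 20)        # 20 observations, 2 features
y <- drop(x %*% c(2, -1))                # noiseless targets

loss <- function(w) mean((drop(x %*% w) - y)^2)
grad <- function(w) drop(2 * crossprod(x, drop(x %*% w) - y)) / nrow(x)

# Independent reference gradient via central differences.
num_grad <- function(f, w, eps = 1e-6) {
  vapply(seq_along(w), function(i) {
    e <- replace(numeric(length(w)), i, eps)
    (f(w + e) - f(w - e)) / (2 * eps)
  }, numeric(1))
}

w <- c(0, 0)

# Step 1: the two gradient implementations agree.
grads_match <- isTRUE(all.equal(grad(w), num_grad(loss, w), tolerance = 1e-6))

# Step 2: a few SGD steps reduce the loss.
losses <- loss(w)
for (step in 1:5) {
  w <- w - 0.1 * grad(w)
  losses <- c(losses, loss(w))
}
loss_decreases <- all(diff(losses) < 0)
```

In the actual integration tests the two sides being compared are the quickr-compiled function and PJRT execution rather than an analytic and a numerical gradient.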

Testing

devtools::test()

Notes / limitations

- Inputs must be flat (non-nested) argument lists.
- Supported primitives: constant, add, sub, mul, divide, negate, broadcast_in_dim, dot_general, transpose, reshape, sum.
- Current implementation supports tensors up to rank 5; transpose is currently rank-2 only; sum supports rank 0–2 reductions and full reductions for rank > 2.

- Add `graph_to_r_function()` to lower supported `Graph` computations to plain base R
- Add `graph_to_quickr_function()` wrapper that emits `declare(type(...))` and compiles via `quickr::quick()`
- Export new helpers; add roxygen docs and DESCRIPTION/Collate updates (incl. quickr in Suggests)
- Add tests for R conversion and quickr parity vs PJRT across core ops/reductions/select
- Add anvil.Rproj and fix minor Rd whitespace

- Replace large switch() in .emit_prim() with a registry of per-primitive lowerers and add a preflight check that errors with a list of unsupported primitives.
- Add mul_broadcast_axis lowering and an op-level fusion pass that rewrites broadcast_in_dim + mul into a loop-based broadcasted multiply for higher-rank tensors.
- Add base emitters for any/all reductions when include_declare = FALSE.
- Test improvements: move eval_graph_pjrt() into tests/testthat/helper-eval-graph.R, add graph_to_r_function parity coverage vs PJRT, add broadcast-mul fusion coverage for both R and quickr backends, and extend graph_to_r_function primitive support tests (max/min/transpose/reshape/convert) plus an explicit error for unsupported higher-order primitives.
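
A rough sketch of what a registry of per-primitive lowerers with a preflight check could look like, in base R only. The op representation (`prim`, `inputs`, `output`) and the names `lowerers`, `check_supported()`, and `lower_ops()` are assumptions made for this example; the PR's actual data structures may differ.

```r
# Hypothetical registry: primitive name -> function that builds an R call
# from a list of argument symbols.
lowerers <- list(
  add    = function(args) call("+", args[[1]], args[[2]]),
  mul    = function(args) call("*", args[[1]], args[[2]]),
  negate = function(args) call("-", args[[1]])
)

# Preflight: report every unsupported primitive at once, before any codegen.
check_supported <- function(ops) {
  prims <- unique(vapply(ops, function(op) op$prim, character(1)))
  unsupported <- setdiff(prims, names(lowerers))
  if (length(unsupported) > 0) {
    stop("Unsupported primitives: ", paste(unsupported, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Lower each op to an assignment statement `output <- prim(inputs...)`.
lower_ops <- function(ops) {
  check_supported(ops)
  lapply(ops, function(op) {
    rhs <- lowerers[[op$prim]](lapply(op$inputs, as.name))
    call("<-", as.name(op$output), rhs)
  })
}

# x * y + x, written as two ops.
ops <- list(
  list(prim = "mul", inputs = c("x", "y"),  output = "t1"),
  list(prim = "add", inputs = c("t1", "x"), output = "t2")
)

stmts <- lower_ops(ops)
env <- list2env(list(x = 2, y = 3))
for (s in stmts) eval(s, env)
env$t2
#> [1] 8

# An unsupported primitive is rejected up front with a readable message.
bad <- c(ops, list(list(prim = "while", inputs = "t2", output = "t3")))
msg <- tryCatch(lower_ops(bad), error = conditionMessage)
```

Compared with one large `switch()`, a registry keeps each rule independently testable and makes the supported set inspectable via `names(lowerers)`.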

- Add if primitive lowerer to graph_to_r_function(), including scalar predicate validation and support for scalar/tensor outputs (including nested if).
- Refactor graph lowering to share counters and extract helpers for constant inlining and op extraction (.inline_constants_for_graph(), .ops_from_graph()), reused by if branches.
- Extend broadcast-mul fusion coverage (axis=1 and broadcast-as-LHS) and ensure fusion doesn’t trigger when the broadcasted value is reused.
- Expand R/quickr parity tests against PJRT for if (scalar/tensor outputs, input-dependent branches, nested if, and fusion inside branches) and update the “unsupported higher-order primitives” test to while.
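
The broadcast-mul fusion described in these commits can be illustrated with a small base-R comparison. This is a sketch of the idea, not the generated code: the unfused path materializes the broadcasted operand and then multiplies, while the fused path loops over the broadcast axis and never builds the intermediate array. The axis choice and variable names are arbitrary.

```r
a <- array(seq_len(24), dim = c(2, 3, 4))
v <- c(10, 20, 30)                       # to be broadcast along axis 2 of `a`

# Unfused: materialize the broadcast, then do an elementwise multiply.
# array() fills the first dimension with v, aperm() moves it to axis 2.
v_b <- aperm(array(v, dim = c(3, 2, 4)), c(2, 1, 3))
unfused <- a * v_b

# Fused: loop over the broadcast axis and scale each slice directly.
fused <- array(0, dim = dim(a))
for (j in seq_along(v)) {
  fused[, j, ] <- a[, j, ] * v[j]
}
```

The rewrite is only safe when the broadcasted intermediate has no other consumer, which is why the commit above checks that fusion does not trigger when that value is reused.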

…tion tests
- Ignore local mnist.rds in git/R builds (.gitignore, .Rbuildignore)
- Fix graph_to_r_function() dot_general lowering to respect contracting_dims/batching_dims (R/graph-to-r.R)
- Add missing backward rules for exp, log, maximum/minimum, and reduce_max (tie-splitting) (R/rules-backward.R)
- Add MNIST MLP training script + opt-in training test using mnist.rds (inst/extra-tests/train-mnist-mlp.R, tests/testthat/test-mnist-training.R, tests/testthat/helper.R)
- Add quickr/PJRT integration tests for larger “real use” graphs + extra dot_general coverage (tests/testthat/test-graph-to-quickr-integration.R, tests/testthat/test-graph-to-r.R, tests/testthat/test-primitives-backward.R)
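
For readers unfamiliar with tie-splitting in a `reduce_max` backward rule, here is a minimal base-R sketch for a full reduction over a vector. It assumes the incoming gradient is split equally among all elements that attain the maximum, which is one common convention; the commit message does not spell out the exact rule, and `reduce_max_grad()` is a hypothetical name.

```r
# Hypothetical backward rule for y = max(x), given upstream gradient g:
# every element equal to the maximum receives an equal share of g.
reduce_max_grad <- function(x, g = 1) {
  is_max <- x == max(x)
  g * is_max / sum(is_max)
}

reduce_max_grad(c(1, 3, 3, 2))
#> [1] 0.0 0.5 0.5 0.0

reduce_max_grad(c(4, 1, 2), g = 2)
#> [1] 2 0 0
```

Under this convention the gradient entries always sum to `g`, regardless of how many ties there are.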

- Load MNIST from Sys.getenv("ANVIL_MNIST_RDS", "mnist.rds") (no path searching) (tests/testthat/helper.R, inst/extra-tests/train-mnist-mlp.R)
- Update MNIST training defaults to full dataset sizes (train_n=60000, test_n=10000) and set the training test default ANVIL_MNIST_TRAIN_N=60000 (inst/extra-tests/train-mnist-mlp.R, tests/testthat/test-mnist-training.R)
- Add greta-like probabilistic model integration tests (log-joint / gradient) validated via quickr vs PJRT (tests/testthat/test-greta-like-models.R)

- Add R/quickr.R helpers for optional {quickr} integration and eager compilation
- Refactor graph_to_quickr_function() to use assert_quickr_installed() and quickr_eager_compile()
- Make graph_to_r_function() default include_declare = TRUE, with declare(type(...)) treated as a no-op in plain R
- Rename dtype mapping helper to .dtype_to_r_ctor and clean up related codegen paths
- Avoid relying on %||% in backward pass required-env lookup
- Wire new file into Collate and add a small optional-dependency test
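
A guard for an optional (Suggests) dependency is typically a thin wrapper around `requireNamespace()`. The sketch below shows that general pattern; `require_optional()` is a made-up name and is not claimed to match the body of `assert_quickr_installed()` in this PR.

```r
# Hypothetical helper: fail with a clear message if an optional package
# is missing, without attaching it to the search path.
require_optional <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf("Package '%s' is required for this feature. Please install it.", pkg),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Succeeds for a package that ships with R.
ok <- require_optional("stats")

# Fails cleanly for a package that is not installed (name chosen to not exist).
msg <- tryCatch(
  require_optional("surelyNotAnInstalledPackage123"),
  error = conditionMessage
)
```

Calling the guard at the top of the public entry point keeps the rest of the package loadable when {quickr} is absent.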

- Reorder Collate so quickr/conversion helpers load before dependent code
- Add as_r_function() and as_quickr_function() convenience wrappers
- Drop the constants argument from graph_to_r_function() and graph_to_quickr_function(); constants are always inlined so only graph inputs become function args
- Update docs (.Rd) and remove tests that covered the removed “constants as args” mode
- Export new helpers in NAMESPACE

- Compile the leaf-argument function with quickr and, when graph@in_tree is nested, return an outer R wrapper that accepts the original top-level inputs, flattens them, and forwards to the compiled inner function.
- Make eval_graph_pjrt() flatten ... so PJRT evaluation matches nested input calling conventions.
- Add a PJRT-vs-quickr parity test covering nested params input (MLP loss-style).
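
The wrapper pattern described here can be sketched in base R as follows. An ordinary R function stands in for the quickr-compiled inner function, and `flatten_args()` and `wrapper()` are illustrative names only; the sketch assumes leaves are passed positionally in depth-first order.

```r
# Stand-in for the compiled function: takes only flat leaf arguments.
inner <- function(w, b, x) sum(w * x) + b

# Collect the leaves of a nested list in depth-first order.
flatten_args <- function(args) {
  out <- list()
  walk <- function(node) {
    if (is.list(node)) {
      for (child in node) walk(child)
    } else {
      out[[length(out) + 1L]] <<- node
    }
  }
  walk(args)
  out
}

# Outer wrapper: accepts the original nested inputs, flattens them,
# and forwards the leaves positionally to the inner function.
wrapper <- function(params, x) {
  do.call(inner, flatten_args(list(params, x)))
}

wrapper(list(w = c(1, 2), b = 0.5), x = c(3, 4))
#> [1] 11.5
```

Keeping the inner signature flat means the compiled artifact does not depend on how callers choose to nest their parameters.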

- Allow graph_to_quickr_function() to handle non-leaf outputs by packing to a flat vector for quickr and unflattening to out_tree
- Add graph_to_r_function(pack_output=) plus quickr-safe lowering for sign() and atan2(), and implement while graph lowering
- Tighten eval_graph_pjrt() arg checking and preserve nested output structures
- Extend tests for sign/atan2, packed list outputs, and while

- Extend lowering coverage (reshape, reductions incl. max/min/prod, boolean reduce_any/reduce_all, plus additional primitives/rules)
- Improve generated-function wrappers (constants/static args, packing/unpacking structured outputs, and edge cases like empty dims/slices)
- Add/reshape test suite to compare quickr vs PJRT across primitives and integration workloads; update docs/pkgdown + dependency metadata

t-kalinowski · 2026-02-09T18:28:30Z

Hi @sebffischer, thanks again for the careful review, and sorry for the long delay here.

I’ve updated the PR to address the main points you raised, and I also merged the latest changes from main into this branch.

I’m also trying to keep this PR tightly scoped (avoid touching unrelated files / large cross-cutting refactors). In that spirit, I kept the quickr lowering rules together in one place for now; if you’d prefer rules colocated with primitives, I’m happy to do that as a follow-up PR after this lands.

Two integration tests/examples that exercise end-to-end use cases and show how this might fit into a real workflow:

- MNIST-shaped rank-5 training loop: quickr-compiled loss + grad matches PJRT, and a few SGD steps reduce the loss (tests/testthat/test-graph-to-quickr-integration.R)
- TFP/greta-like MAP workflow: quickr-compiled log_prob + grad matches PJRT and stays in lockstep over multiple update steps (tests/testthat/test-graph-to-quickr-integration.R)

One additional note: working on this PR also drove some feature development in {quickr} to enable this overall approach. This currently depends on the dev version of {quickr}; I’ll aim to cut a release to CRAN soon.

Longer-term, I’d like to move towards making {quickr} capable of being used for writing CUDA kernels, and also more generally compiling functions that take non-atomic R objects (e.g. external pointers, AnvilTensors, etc.). That will require some additional design and implementation work in {quickr} before it’s something we can rely on here, but I think this PR is a good first step.

Replies to your numbered questions:

1. Separate package vs in {anvil}: I agree this could eventually be split out (it’s the same general shape as {stablehlo}: a lowering pass plus a set of rules). For now I’d prefer keeping it in {anvil} so it can evolve alongside Graph/tracing without extra cross-repo coordination. Once the backend surface stabilizes, a split would make sense.
2. quickr kernels inside XLA / PJRT custom functions: I didn’t pursue “calling quickr inside XLA” here. I think I’m aligned with @dfalbel that the execution models are quite different. The direction that seems most useful is letting users register backend-specific implementations for higher-level ops/primitives (e.g. a relu primitive with a fast quickr kernel), but I think that’s out of scope for this PR.
3. Automated tests comparing quickr vs PJRT/XLA: agreed. I added an automated parity suite comparing quickr execution vs PJRT execution for a larger set of primitives (tests/testthat/test-primitives-quickr-pjrt.R), plus the integration tests above. These skip cleanly when {quickr} isn’t installed.
4. Literals / GraphLiteral: this should now work. The lowering handles GraphLiteral in quickr_expr_of_node() (R/graph-to-quickr-r.R:678), and there’s a direct test for the function(x) x + 1 case (tests/testthat/test-graph-to-quickr.R:40).
5. Multiple backends in jit(): I’m fine either way. Right now I kept it as a separate API (graph_to_quickr_function() / graph_to_quickr_r_function()) to avoid committing to jit() semantics prematurely. If you’d rather expose it via jit(..., backend = "quickr") (or an option default), I’m happy to adjust once we agree on the desired interface.

A few other highlights (non-exhaustive):

- API docs: man/graph_to_quickr_function.Rd
- Since our last round of review I also added a number of additional primitives to the lowering; the supported set is now substantially broader than the initial draft (see the “Currently supported primitives” section in R/graph-to-quickr.R).
- Nested inputs/outputs are supported at the boundary via flattening/unflattening while keeping a stable flat signature for the compiled function (R/graph-to-quickr.R:46, tests: tests/testthat/test-graph-to-quickr.R:19 and tests/testthat/test-graph-to-quickr.R:53)

Things I’d like your take on:

- Top-level interface: keep this as graph_to_quickr_*() for now, or fold into jit() via an option / backend= argument?
- Rule organization: keep the current registry approach (R/graph-to-quickr-r.R:1284) for now to keep this PR focused, or do you feel strongly that rules should be moved next to primitives before merging?

Suggested review entry points: R/graph-to-quickr.R (API + wrapper boundary), R/graph-to-quickr-r.R (actual lowering), and tests/testthat/test-primitives-quickr-pjrt.R (parity story).

sebffischer · 2026-02-10T05:46:59Z

@t-kalinowski I am on vacation for the next 3 weeks, so it will take some time for me to respond.

sebffischer · 2026-02-10T05:47:46Z

Also a TODO for me:

add quickr to the benchmark here: https://github.com/r-xla/benchmarks/tree/main/benchmarks/mlp (results are displayed here).

t-kalinowski added 30 commits December 12, 2025 16:10

docs: regenerate Rd; ignore anvil.Rproj

c8ac5b8

graph_to_r_function: add target for quickr compatibility

40903c6

graph_to_r: extract broadcast-mul fusion pass

275fee0

graph_to_r: factor reduction codegen

ceda029

graph_to_r: consolidate subgraph codegen

ee61007

mnist: add internal helpers + unit tests

ab587be

mnist: strengthen internal helper tests

6040924

mnist: prefer mnist package for dataset loading

b08ba0c

mnist: reuse internal helpers in training code

5a68b6b

mnist: simplify loader to use mnist::mnist

a4e60f8

quickr: tfp-like workflow test

c33a160

quickr: graph_to_quickr_function

730163f

quickr: integration tests

8e9c303

Merge branch 'main' of https://github.com/r-xla/anvil into quickr

8f8d666

# Conflicts: # DESCRIPTION

quickr: rank-5 graph lowering

a59da60

quickr: batched dot_general up to rank 5

5af2be7

quickr: refactor quickr lowering helpers

48282cf

update stablehlo

d47c0ee

quickr: use rlang::call2 for codegen

9851cfa

quickr: use f64 + tighten integration equivalence

c6658bb

quickr: tighten greta integration equivalence

d93a463

t-kalinowski added 17 commits February 5, 2026 20:41

Merge branch 'main' into quickr

8e227a4

# Conflicts: # DESCRIPTION # R/reexports.R # man/platform.Rd # man/reexports.Rd

chore: drop unrelated changes from quickr branch

15a6d08

refactor: simplify graph_to_quickr lowering

0613ac2

test: drop quickr helper coverage tests

39449f1

fix: quickr lowering fill/convert and 1d outputs

36ae4a7

test: increase quickr lowering coverage via public API

fe8b609

refactor: simplify quickr lowering invariants

db096f4

test: cover more quickr lowering paths and limits

db75ddf

test: tidy quickr reduce_sum no-op comment

e72e5a6

style: satisfy seq_linter in quickr integration ranks test

ef72675

test: remove redundant quickr tests

323a8c9

test: avoid anvil:: and anvil::: qualifiers

862edc1

quickr: lower select/compare/bool primitives

d96fc00

quickr: lower unary math primitives

36c5622

quickr: lower maximum/minimum/power

2cc8f78

quickr: lower reduce_prod/max/min

b60cace

docs: update graph_to_quickr_function supported primitives

8ab4f41

t-kalinowski mentioned this pull request Feb 6, 2026

reduce_max / reduce_min give wrong results on PJRT CPU for f64 #182

Closed

t-kalinowski added 7 commits February 9, 2026 10:48

style: air format quickr lowering

3dde509

Merge upstream/main into quickr

ceb6b24

tests: revert helper.R namespace cleanup

e57a676

tests: fix lintr in quickr helpers

11e5f4e

- Qualify testthat helpers (skip_if_not_installed/expect_*) for object_usage_linter - Avoid cross-helper binding lint for eval_graph_pjrt via get(...) - Silence object_length lint for helper name without breaking line-length rules

air format .

5ec9b76

tests: rename quickr PJRT parity helpers

59c17a9

Rename long helper functions to satisfy lintr object_length_linter after formatting, and update call sites.

t-kalinowski marked this pull request as ready for review February 9, 2026 18:25

t-kalinowski requested a review from sebffischer February 9, 2026 18:33

feat: add Graph -> quickr lowering #71
t-kalinowski wants to merge 88 commits into r-xla:main from t-kalinowski:quickr

3 participants
