feat: add Graph -> quickr lowering#71

Open
t-kalinowski wants to merge 88 commits into r-xla:main from t-kalinowski:quickr
@t-kalinowski t-kalinowski commented Dec 16, 2025

Why

PJRT is great for accelerator-backed execution, but today there are two CPU-centric use cases where a {quickr} backend is especially compelling:

  1. CPU-only environments: when there’s no GPU/TPU available (or we’re deploying somewhere that only has CPU), we want a fast backend that keeps execution in plain R and doesn’t rely on accelerator runtimes.
  2. AOT / caching foundation: lowering a Graph to a concrete piece of R code is the natural seam for future ahead-of-time compilation and/or caching of compiled artifacts.

For pure CPU graphs, {quickr} should also be a strong performance baseline (and may beat PJRT-on-CPU by avoiding PJRT program/execute overhead).

In the longer term, I anticipate we'll add GPU support to quickr (likely via OpenACC, or possibly via CUDA directly), at which point quickr might provide a simple interface for writing custom, fast anvil kernels.

What’s included

  • New exported API: graph_to_quickr_function()
  • Lowers a supported subset of anvil::Graph to a plain R function and eagerly compiles it with quickr::quick()
  • Handles graph constants and structured (nested list) outputs by packing/unpacking values for {quickr}
  • {quickr} is optional (Suggests); feature/tests skip gracefully when it isn’t installed
  • Correctness coverage compares {quickr} execution vs PJRT execution

Minimal example (reprex)

library(anvil)

graph <- trace_fn(
  function(x, y) x * y + x,
  list(
    x = nv_scalar(0, dtype = "f64"),
    y = nv_scalar(0, dtype = "f64")
  )
)

f_quick <- graph_to_quickr_function(graph)
f_quick(2, 3)
#> [1] 8
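
For intuition, here is a rough sketch of the kind of plain R function such a lowering might produce for x * y + x. It is hand-written, not actual output of graph_to_quickr_function(); the name f_plain and the intermediate tmp are hypothetical. It assumes the declare(type(...)) annotations are ignored when the function runs as plain R (a fallback is defined in case declare() is not available in the running R version).

```r
# Fallback for R versions without a declare() function; the arguments are
# never evaluated, so this acts as a no-op.
if (!exists("declare")) declare <- function(...) invisible(NULL)

# Hypothetical hand-written equivalent of the lowered graph for x * y + x.
f_plain <- function(x, y) {
  declare(type(x = double(1)), type(y = double(1)))
  tmp <- x * y   # mul
  out <- tmp + x # add
  out
}

f_plain(2, 3)
#> [1] 8
```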

Integration tests

Testing

  • devtools::test()

Notes / limitations

  • Inputs must be flat (non-nested) argument lists.
  • Supported primitives: constant, add, sub, mul, divide, negate, broadcast_in_dim, dot_general, transpose, reshape, sum.
  • Current implementation supports tensors up to rank 5; transpose is currently rank-2 only; sum supports rank 0–2 reductions and full reductions for rank > 2.

- Add `graph_to_r_function()` to lower supported `Graph` computations to plain base R
- Add `graph_to_quickr_function()` wrapper that emits `declare(type(...))` and compiles via
  `quickr::quick()`
- Export new helpers; add roxygen docs and DESCRIPTION/Collate updates (incl. quickr in Suggests)
- Add tests for R conversion and quickr parity vs PJRT across core ops/reductions/select
- Add anvil.Rproj and fix minor Rd whitespace
- Replace large switch() in .emit_prim() with a registry of per-primitive lowerers and add a
  preflight check that errors with a list of unsupported primitives.
- Add mul_broadcast_axis lowering and an op-level fusion pass that rewrites broadcast_in_dim +
  mul into a loop-based broadcasted multiply for higher-rank tensors.
- Add base emitters for any/all reductions when include_declare = FALSE.
- Test improvements: move eval_graph_pjrt() into tests/testthat/helper-eval-graph.R, add
  graph_to_r_function parity coverage vs PJRT, add broadcast-mul fusion coverage for both R and
  quickr backends, and extend graph_to_r_function primitive support tests (max/min/transpose/
  reshape/convert) plus an explicit error for unsupported higher-order primitives.
- Add if primitive lowerer to graph_to_r_function(), including scalar predicate validation and
  support for scalar/tensor outputs (including nested if).
- Refactor graph lowering to share counters and extract helpers for constant inlining and op
  extraction (.inline_constants_for_graph(), .ops_from_graph()), reused by if branches.
- Extend broadcast-mul fusion coverage (axis=1 and broadcast-as-LHS) and ensure fusion doesn’t
  trigger when the broadcasted value is reused.
- Expand R/quickr parity tests against PJRT for if (scalar/tensor outputs, input-dependent
  branches, nested if, and fusion inside branches) and update the “unsupported higher-order
  primitives” test to while.
…tion tests

- Ignore local mnist.rds in git/R builds (.gitignore, .Rbuildignore)
- Fix graph_to_r_function() dot_general lowering to respect contracting_dims/batching_dims (R/
  graph-to-r.R)
- Add missing backward rules for exp, log, maximum/minimum, and reduce_max (tie-splitting) (R/
  rules-backward.R)
- Add MNIST MLP training script + opt-in training test using mnist.rds (inst/extra-tests/train-
  mnist-mlp.R, tests/testthat/test-mnist-training.R, tests/testthat/helper.R)
- Add quickr/PJRT integration tests for larger “real use” graphs + extra dot_general coverage
  (tests/testthat/test-graph-to-quickr-integration.R, tests/testthat/test-graph-to-r.R, tests/
  testthat/test-primitives-backward.R)
- Load MNIST from Sys.getenv("ANVIL_MNIST_RDS", "mnist.rds") (no path searching) (tests/testthat/
  helper.R, inst/extra-tests/train-mnist-mlp.R)
- Update MNIST training defaults to full dataset sizes (train_n=60000, test_n=10000) and set the
  training test default ANVIL_MNIST_TRAIN_N=60000 (inst/extra-tests/train-mnist-mlp.R, tests/
  testthat/test-mnist-training.R)
- Add greta-like probabilistic model integration tests (log-joint / gradient) validated via
  quickr vs PJRT (tests/testthat/test-greta-like-models.R)
- Add R/quickr.R helpers for optional {quickr} integration and eager compilation
- Refactor graph_to_quickr_function() to use assert_quickr_installed() and quickr_eager_compile()
- Make graph_to_r_function() default include_declare = TRUE, with declare(type(...)) treated as a
  no-op in plain R
- Rename dtype mapping helper to .dtype_to_r_ctor and clean up related codegen paths
- Avoid relying on %||% in backward pass required-env lookup
- Wire new file into Collate and add a small optional-dependency test
- Reorder Collate so quickr/conversion helpers load before dependent code
- Add as_r_function() and as_quickr_function() convenience wrappers
- Drop the constants argument from graph_to_r_function() and graph_to_quickr_function();
  constants are always inlined so only graph inputs become function args
- Update docs (.Rd) and remove tests that covered the removed “constants as args” mode
- Export new helpers in NAMESPACE
- Compile the leaf-argument function with quickr and, when graph@in_tree is nested, return an
  outer R wrapper that accepts the original top-level inputs, flattens them, and forwards to the
  compiled inner function.
- Make eval_graph_pjrt() flatten ... so PJRT evaluation matches nested input calling conventions.
- Add a PJRT-vs-quickr parity test covering nested params input (MLP loss-style).
- Allow graph_to_quickr_function() to handle non-leaf outputs by packing to a flat vector for quickr and unflattening to out_tree
- Add graph_to_r_function(pack_output=) plus quickr-safe lowering for sign() and atan2(), and implement while graph lowering
- Tighten eval_graph_pjrt() arg checking and preserve nested output structures
- Extend tests for sign/atan2, packed list outputs, and while
- Extend lowering coverage (reshape, reductions incl. max/min/prod, boolean reduce_any/reduce_all, plus additional primitives/rules)

- Improve generated-function wrappers (constants/static args, packing/unpacking structured outputs, and edge cases like empty dims/slices)

- Add/reshape test suite to compare quickr vs PJRT across primitives and integration workloads; update docs/pkgdown + dependency metadata
- Qualify testthat helpers (skip_if_not_installed/expect_*) for object_usage_linter
- Avoid cross-helper binding lint for eval_graph_pjrt via get(...)
- Silence object_length lint for helper name without breaking line-length rules
- Rename long helper functions to satisfy lintr object_length_linter after formatting, and update call sites.
@t-kalinowski t-kalinowski marked this pull request as ready for review February 9, 2026 18:25
@t-kalinowski
Author

Hi @sebffischer, thanks again for the careful review, and sorry for the long delay here.

I’ve updated the PR to address the main points you raised, and I also merged the latest changes from main into this branch.

I’m also trying to keep this PR tightly scoped (avoid touching unrelated files / large cross-cutting refactors). In that spirit, I kept the quickr lowering rules together in one place for now; if you’d prefer rules colocated with primitives, I’m happy to do that as a follow-up PR after this lands.

Two integration tests/examples that exercise end-to-end use cases and show how this might fit into a real workflow:

  • MNIST-shaped rank-5 training loop: quickr-compiled loss + grad matches PJRT, and a few SGD steps reduce the loss (tests/testthat/test-graph-to-quickr-integration.R)
  • TFP/greta-like MAP workflow: quickr-compiled log_prob + grad matches PJRT and stays in lockstep over multiple update steps (tests/testthat/test-graph-to-quickr-integration.R)

One additional note: working on this PR also drove some feature development in {quickr} to enable this overall approach. This PR currently depends on the dev version of {quickr}; I’ll aim to cut a release to CRAN soon.

Longer-term, I’d like to move towards making {quickr} capable of being used for writing CUDA kernels, and also more generally compiling functions that take non-atomic R objects (e.g. external pointers, AnvilTensors, etc.). That will require some additional design and implementation work in {quickr} before it’s something we can rely on here, but I think this PR is a good first step.

Replies to your numbered questions:

  1. Separate package vs in {anvil}: I agree this could eventually be split out (it’s the same general shape as {stablehlo}: a lowering pass plus a set of rules). For now I’d prefer keeping it in {anvil} so it can evolve alongside Graph/tracing without extra cross-repo coordination. Once the backend surface stabilizes, a split would make sense.

  2. quickr kernels inside XLA / PJRT custom functions: I didn’t pursue “calling quickr inside XLA” here. I think I’m aligned with @dfalbel that the execution models are quite different. The direction that seems most useful is letting users register backend-specific implementations for higher-level ops/primitives (e.g. a relu primitive with a fast quickr kernel), but I think that’s out of scope for this PR.

  3. Automated tests comparing quickr vs PJRT/XLA: agreed. I added an automated parity suite comparing quickr execution vs PJRT execution for a larger set of primitives (tests/testthat/test-primitives-quickr-pjrt.R), plus the integration tests above. These skip cleanly when {quickr} isn’t installed.

  4. Literals / GraphLiteral: this should now work. The lowering handles GraphLiteral in quickr_expr_of_node() (R/graph-to-quickr-r.R:678), and there’s a direct test for the function(x) x + 1 case (tests/testthat/test-graph-to-quickr.R:40).

  5. Multiple backends in jit(): I’m fine either way. Right now I kept it as a separate API (graph_to_quickr_function() / graph_to_quickr_r_function()) to avoid committing to jit() semantics prematurely. If you’d rather expose it via jit(..., backend = "quickr") (or an option default), I’m happy to adjust once we agree on the desired interface.
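
As an illustration of the parity idea in point 3, here is a minimal base R sketch. It is not the helper used in the test suite: run_parity_case, its arguments, and its return values are hypothetical. It skips when an optional package is missing, and otherwise compares a candidate function against a reference within a tolerance.

```r
# Hypothetical parity helper: returns "skipped", "pass", or "fail".
run_parity_case <- function(candidate, reference, args,
                            needs = "quickr", tolerance = 1e-6) {
  # Skip cleanly when the optional dependency is not installed.
  if (!requireNamespace(needs, quietly = TRUE)) {
    return("skipped")
  }
  got <- do.call(candidate, args)
  want <- do.call(reference, args)
  if (isTRUE(all.equal(got, want, tolerance = tolerance))) "pass" else "fail"
}

# Usage with stand-in functions; "stats" is used so the example runs anywhere.
run_parity_case(
  candidate = function(x, y) x * y + x,
  reference = function(x, y) x * (y + 1),
  args = list(2, 3),
  needs = "stats"
)
```

In a real test file the same shape would more likely be expressed with testthat::skip_if_not_installed() and testthat::expect_equal().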

A few other highlights (non-exhaustive):

  • API docs: man/graph_to_quickr_function.Rd
  • Since our last round of review I also added a number of additional primitives to the lowering; the supported set is now substantially broader than the initial draft (see the “Currently supported primitives” section in R/graph-to-quickr.R).
  • Nested inputs/outputs are supported at the boundary via flattening/unflattening while keeping a stable flat signature for the compiled function (R/graph-to-quickr.R:46, tests: tests/testthat/test-graph-to-quickr.R:19 and tests/testthat/test-graph-to-quickr.R:53)
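
The boundary handling described in the last bullet can be sketched in base R roughly as follows. This is only an illustration of the flatten/unflatten pattern, not the code in R/graph-to-quickr.R: inner, flatten_leaves, outer_fn, and unpack_output are hypothetical names, and the sketch assumes every output leaf has at least one element.

```r
# Hypothetical compiled inner function with a flat, leaf-only signature.
inner <- function(w, b, x) sum(w * x) + b

# Flatten a nested list into its leaves, in depth-first order.
flatten_leaves <- function(x) {
  if (!is.list(x)) return(list(x))
  do.call(c, lapply(unname(x), flatten_leaves))
}

# Outer wrapper: accepts the original nested inputs and forwards the leaves.
outer_fn <- function(params, x) {
  leaves <- flatten_leaves(list(params, x))
  do.call(inner, leaves)
}

# Rebuild a named list of outputs from one flat vector, given leaf sizes.
unpack_output <- function(flat, sizes) {
  n <- unlist(sizes)
  ends <- cumsum(n)
  starts <- ends - n + 1
  out <- Map(function(s, e) flat[s:e], starts, ends)
  names(out) <- names(sizes)
  out
}

outer_fn(list(w = c(1, 2), b = 0.5), c(3, 4))
#> [1] 11.5
str(unpack_output(c(0.5, 1, 2), list(loss = 1, grad = 2)))
```

Keeping the compiled function's signature flat means only the thin outer wrapper needs to know about the tree structure.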

Things I’d like your take on:

  • Top-level interface: keep this as graph_to_quickr_*() for now, or fold into jit() via an option / backend= argument?
  • Rule organization: keep the current registry approach (R/graph-to-quickr-r.R:1284) for now to keep this PR focused, or do you feel strongly that rules should be moved next to primitives before merging?
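
For readers unfamiliar with the registry pattern under discussion, here is a toy base R sketch: one lowerer per primitive, plus a preflight check that reports every unsupported primitive at once. It does not reproduce the registry in R/graph-to-quickr-r.R. All names (lowerers, register_lowerer, check_supported, lower_op) and the op representation as a list with prim/inputs/output fields are assumptions made for illustration.

```r
# Hypothetical registry: primitive name -> function that builds an R call.
lowerers <- new.env(parent = emptyenv())
register_lowerer <- function(name, fn) assign(name, fn, envir = lowerers)

register_lowerer("add", function(args) call("+", args[[1]], args[[2]]))
register_lowerer("mul", function(args) call("*", args[[1]], args[[2]]))
register_lowerer("negate", function(args) call("-", args[[1]]))

# Preflight: fail once, listing every primitive without a registered lowerer.
check_supported <- function(ops) {
  prims <- unique(vapply(ops, function(op) op$prim, character(1)))
  unsupported <- setdiff(prims, ls(lowerers))
  if (length(unsupported) > 0) {
    stop("Unsupported primitives: ", paste(unsupported, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Lower one op to an assignment such as `t1 <- x * y`.
lower_op <- function(op) {
  rhs <- get(op$prim, envir = lowerers)(lapply(op$inputs, as.name))
  call("<-", as.name(op$output), rhs)
}

# Toy op list for x * y + x.
ops <- list(
  list(prim = "mul", inputs = c("x", "y"), output = "t1"),
  list(prim = "add", inputs = c("t1", "x"), output = "out")
)
check_supported(ops)
body <- as.call(c(list(as.name("{")), lapply(ops, lower_op), list(as.name("out"))))
f <- as.function(c(alist(x = , y = ), list(body)))
f(2, 3)
#> [1] 8
```

Whether the rules live in one file or next to each primitive, the registry lookup itself would look the same; only the place where register_lowerer() is called would change.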

Suggested review entry points: R/graph-to-quickr.R (API + wrapper boundary), R/graph-to-quickr-r.R (actual lowering), and tests/testthat/test-primitives-quickr-pjrt.R (parity story).

@sebffischer
Contributor

@t-kalinowski I am on vacation for the next 3 weeks, so it will take some time for me to respond.

@sebffischer
Contributor

sebffischer commented Feb 10, 2026

Also a TODO for me:
