Reintroduce ONNX #486

Merged

nreinicke merged 10 commits into main from ndr/reintroduce-onnx on Apr 1, 2026
Conversation

nreinicke (Collaborator) commented Mar 24, 2026

Reintroduces the ONNX prediction model since the ort package has improved significantly since we last attempted to add this in.

An important note: the ort::Session::run method requires a mutable self reference, because it wraps the ONNX Runtime, which mutates its own internal state. To accommodate this, we create a pool of sessions. For large parallel batch runs, we can set the pool size explicitly; if unspecified, it defaults to the total available threads. This adds memory overhead, since the models are duplicated, but our models are currently small, and our default behavior is to use the interpolation model, which sidesteps this problem entirely.
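The pooling scheme described above can be sketched with std primitives alone. This is a sketch, not the PR's code: `Session` here is a stand-in for `ort::Session` (whose `run` likewise takes `&mut self`), and the names `SessionPool` and `predict` are illustrative assumptions.

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

// Stand-in for ort::Session: run() takes &mut self, mirroring the real API.
struct Session;
impl Session {
    fn run(&mut self, x: f64) -> f64 {
        x * 2.0 // placeholder "inference"
    }
}

// A fixed pool of sessions; callers block until one is free.
struct SessionPool {
    sessions: Mutex<Vec<Session>>,
    available: Condvar,
}

impl SessionPool {
    fn new(size: usize) -> Self {
        SessionPool {
            sessions: Mutex::new((0..size).map(|_| Session).collect()),
            available: Condvar::new(),
        }
    }

    // Check a session out of the pool, run inference, and return it,
    // keeping a &self prediction API despite run() needing &mut self.
    fn predict(&self, x: f64) -> f64 {
        let mut guard = self.sessions.lock().unwrap();
        while guard.is_empty() {
            // Wait for a session to be returned (loop guards spurious wakeups).
            guard = self.available.wait(guard).unwrap();
        }
        let mut session = guard.pop().unwrap();
        drop(guard); // release the lock while inference runs
        let y = session.run(x);
        self.sessions.lock().unwrap().push(session);
        self.available.notify_one();
        y
    }
}

fn main() {
    // Default the pool size to the available threads, as described above.
    let size = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let pool = SessionPool::new(size);
    println!("{}", pool.predict(21.0)); // prints 42
}
```

Dropping the guard before calling `run` is what lets several checked-out sessions do inference concurrently; only the checkout/return steps serialize on the lock.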

I have tested this with the attached compass configuration but I'm leaving adding in the onnx models for future work since we're currently in the process of revamping our model serialization over in the RouteE Powertrain codebase. The ultimate goal would be to allow compass to access our shared model database directly and download any necessary model metadata and binaries.

Copilot AI (Contributor) left a comment


Pull request overview

Reintroduces ONNX-backed prediction models in the routee-compass-powertrain crate using the ort crate, including configuration support for an ONNX session pool to enable parallel inference while preserving the existing PredictionModel trait shape.

Changes:

  • Added OnnxModel with a pooled set of ONNX Runtime sessions guarded by mutexes to support Session::run(&mut self) under a &self prediction API.
  • Extended ModelType to include an onnx { pool_size } variant with custom deserialization supporting both "onnx" and {"onnx": {"pool_size": N}}.
  • Updated build/config tooling: added ort dependency to the Rust workspace and refined Pixi tasks/docs; updated .gitignore handling for ONNX files.
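Per the custom deserialization noted above, both spellings should parse to the same variant. A sketch of the accepted shapes in TOML config syntax; the surrounding key placement is an illustrative assumption, not taken from the PR:

```toml
# Bare string form: pool_size is unset and defaults to the available threads.
model_type = "onnx"

# Map form with an explicit pool size (alternative to the line above):
# model_type = { onnx = { pool_size = 4 } }
```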

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 4 comments.

Summary per file:

  • rust/routee-compass-powertrain/src/model/prediction/prediction_model_record.rs: Instantiates OnnxModel and selects a default pool size based on available parallelism.
  • rust/routee-compass-powertrain/src/model/prediction/onnx/onnx_model.rs: Implements ONNX inference with a mutex-guarded session pool.
  • rust/routee-compass-powertrain/src/model/prediction/onnx/mod.rs: Exposes the ONNX module.
  • rust/routee-compass-powertrain/src/model/prediction/model_type.rs: Adds the ONNX variant, a custom Deserialize impl, and tests for backward-compatible parsing.
  • rust/routee-compass-powertrain/src/model/prediction/mod.rs: Re-exports the ONNX module from prediction.
  • rust/routee-compass-powertrain/src/model/prediction/interpolation/interpolation_model.rs: Allows interpolation grids to be built from either Smartcore or ONNX underlying models.
  • rust/routee-compass-powertrain/Cargo.toml: Adds the ort dependency to the powertrain crate.
  • rust/Cargo.toml: Adds a workspace dependency pin for ort.
  • pyproject.toml: Splits Pixi build tasks into build_py/build_rust and adds Rust check tasks.
  • pixi.lock: Updates package version metadata for the lockfile.
  • CLAUDE.md: Updates repository guidance to use Pixi tasks for builds and checks.
  • .gitignore: Changes ignore patterns to include *.onnx.


nreinicke (Collaborator, Author) commented:

@robfitzgerald - This has passed through the AI review gauntlet and should be ready for you.

robfitzgerald (Collaborator) commented:

@nreinicke After a read-through, I'm excited to see this is once again a small change set. I don't think pool size should be exposed in the ModelType configuration API, since pool size is a runtime performance parameter, not a vehicle configuration. I also think pool size is probably something we can compute dynamically. Looking forward to an IRL discussion tomorrow, as I'm actually out of office right now. I should be in, though it's another wonky week 😵

robfitzgerald (Collaborator) commented Mar 30, 2026

I've been thinking and reading more about this design choice of using Condvar + Mutex<Vec<Session>> to provide parallel Session access. I think if this Condvar "semaphore" went away and we just had a Mutex<Session>, I would feel more comfortable accepting this. My main question: did you try Mutex<Session>, observe bad runtimes, and then move to the semaphore-style access? I'm concerned about premature optimization: the overhead of managing this parallel Session access might be greater than simply waiting to acquire the Mutex<Session>'s lock (especially since the underlying model.run op is very simple). That, coupled with the fact that we don't actually intend to run this in parallel (we're just planning to run it in interpolation mode), makes it feel like it may not be worth the confusion it brings.
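The simpler alternative proposed here (a single lock, no semaphore) can be sketched as follows. Again a sketch, not the PR's code: `Session` stands in for `ort::Session`, and `OnnxModel`/`predict` are illustrative names.

```rust
use std::sync::Mutex;

// Stand-in for ort::Session: run() takes &mut self, mirroring the real API.
struct Session;
impl Session {
    fn run(&mut self, x: f64) -> f64 {
        x * 2.0 // placeholder "inference"
    }
}

// A single shared session: every caller serializes on one lock.
struct OnnxModel {
    session: Mutex<Session>,
}

impl OnnxModel {
    // The &self prediction API is preserved; the mutex supplies the
    // &mut access that Session::run requires.
    fn predict(&self, x: f64) -> f64 {
        self.session.lock().unwrap().run(x)
    }
}

fn main() {
    let model = OnnxModel { session: Mutex::new(Session) };
    println!("{}", model.predict(21.0)); // prints 42
}
```

The trade-off stated in the comment: this version is trivially easy to reason about, but parallel callers all block on the one session, which only matters if ONNX inference is ever run concurrently in the search.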

nreinicke (Collaborator, Author) commented:

> did you try Mutex<Session> and observe bad runtimes, then move to the semaphore-style access?

I did not do any explicit performance testing, so it's possible this optimization is not warranted. From past experience, ONNX inference on a single session takes a large amount of time relative to every other operation during a link traversal, so I feel fairly confident that 100 threads locking on one session would significantly increase runtimes.

But, like you said, we're not currently planning to run these sessions in parallel, so maybe the best move is to revert to the single mutex, make it clear that you might see a large performance degradation if you run these in parallel, and leave pooling as a future optimization if/when we want to start using ONNX models directly in the search.

nreinicke (Collaborator, Author) commented:

@robfitzgerald - this should be ready for you to look at again

robfitzgerald (Collaborator) left a review comment:


Looks good to me in concept.

I'm wondering in practice if this would work with a real ONNX file. Do we need to specify the opset versions for a model explicitly, or would ORT read this from the file? Do we need to expose or set any SessionBuilder parameters to ensure we get an optimized compile on load? Things to consider when we have a working model produced from RouteE Powertrain.

🎺

nreinicke merged commit 06abe99 into main on Apr 1, 2026
8 of 9 checks passed
nreinicke deleted the ndr/reintroduce-onnx branch on April 1, 2026 at 15:17

3 participants