Conversation
Pull request overview
Reintroduces ONNX-backed prediction models in the routee-compass-powertrain crate using the ort crate, including configuration support for an ONNX session pool to enable parallel inference while preserving the existing PredictionModel trait shape.
Changes:
- Added `OnnxModel` with a pooled set of ONNX Runtime sessions guarded by mutexes, supporting `Session::run(&mut self)` under a `&self` prediction API.
- Extended `ModelType` with an `onnx { pool_size }` variant whose custom deserialization accepts both `"onnx"` and `{"onnx": {"pool_size": N}}`.
- Updated build/config tooling: added the `ort` dependency to the Rust workspace, refined Pixi tasks/docs, and updated `.gitignore` handling for ONNX files.
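For illustration, the two configuration forms accepted by the custom deserializer might look like the fragment below. This is a sketch only: the `model_type` key name and its placement within the surrounding compass config are assumptions, not taken from this pull request.

```json
"model_type": "onnx"

"model_type": { "onnx": { "pool_size": 4 } }
```

The first (string) form preserves backward compatibility and falls back to a default pool size; the second (map) form sets it explicitly.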
Reviewed changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| rust/routee-compass-powertrain/src/model/prediction/prediction_model_record.rs | Instantiates OnnxModel and selects a default pool size based on available parallelism. |
| rust/routee-compass-powertrain/src/model/prediction/onnx/onnx_model.rs | Implements ONNX inference with a mutex-guarded session pool. |
| rust/routee-compass-powertrain/src/model/prediction/onnx/mod.rs | Exposes the ONNX module. |
| rust/routee-compass-powertrain/src/model/prediction/model_type.rs | Adds ONNX variant and custom Deserialize + tests for backward-compatible parsing. |
| rust/routee-compass-powertrain/src/model/prediction/mod.rs | Re-exports ONNX module from prediction. |
| rust/routee-compass-powertrain/src/model/prediction/interpolation/interpolation_model.rs | Allows interpolation grids to be built from either Smartcore or ONNX underlying models. |
| rust/routee-compass-powertrain/Cargo.toml | Adds ort dependency to the powertrain crate. |
| rust/Cargo.toml | Adds workspace dependency pin for ort. |
| pyproject.toml | Splits Pixi build tasks into build_py/build_rust and adds Rust check tasks. |
| pixi.lock | Updates package version metadata for the lockfile. |
| CLAUDE.md | Updates repository guidance to use Pixi tasks for builds and checks. |
| .gitignore | Changes ignore patterns to include *.onnx. |
Review threads (outdated, now resolved):
- rust/routee-compass-powertrain/src/model/prediction/onnx/onnx_model.rs (2 threads)
- rust/routee-compass-powertrain/src/model/prediction/interpolation/interpolation_model.rs
- rust/routee-compass-powertrain/src/model/prediction/model_type.rs
@robfitzgerald - This has passed through the AI review gauntlet and should be ready for you.
@nreinicke After a read-through, I'm excited to see this is once again a small change set. I don't think pool size should be exposed in the ModelType configuration API, since pool size is a runtime performance parameter, not a vehicle configuration. I also think pool size is probably something we can compute dynamically. Looking forward to an IRL discussion tomorrow, as I'm actually out of office right now. I should be in, though it's another wonky week 😵
I've been thinking and reading more about this design choice of using a pool of sessions.
I did not do any explicit performance testing, so it's possible this optimization is not warranted. From past experience, ONNX inference on a single session takes a large amount of time relative to every other operation during a link traversal, so I feel fairly confident that 100 threads locking on one session would significantly increase runtimes. But, like you said, we're not currently planning to run these sessions in parallel, so maybe the best move is to revert to the single mutex, make it clear that you might see a large performance degradation if you try to run these in parallel, and leave it as a future optimization if/when we want to start using ONNX models directly in the search.
@robfitzgerald - this should be ready for you to look at again |
robfitzgerald
left a comment
Looks good to me in concept.
I'm wondering, in practice, whether this would work with a real ONNX file. Do we need to specify a model's opset versions explicitly, or would ORT read them from the file? Do we need to expose or set any SessionBuilder parameters to ensure we get an optimized compile on load? Things to consider once we have a working model produced from RouteE Powertrain.
🎺
Reintroduces the ONNX prediction model since the ort package has improved significantly since we last attempted to add this in.
An important note: the ort::Session::run method requires a mutable self reference, because the session is a wrapper around the ONNX Runtime, which modifies its own state (see here). To accommodate this, we create a pool of sessions. For large parallel batch runs, the pool size can be set explicitly; otherwise it defaults to the total available threads. This adds memory overhead, since we have to duplicate the models, but our models are currently small and our default behavior is to use the interpolation model, which sidesteps this problem entirely.
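The pooling pattern described above can be sketched with standard-library types. The `Session` struct here is a stand-in placeholder for `ort::Session` (whose `run` takes `&mut self`), and the round-robin/`try_lock` selection strategy is an illustrative assumption, not necessarily what the crate implements:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Placeholder standing in for ort::Session; the real run() also needs &mut self.
struct Session {
    id: usize,
}

impl Session {
    fn run(&mut self, x: f64) -> f64 {
        // real code would build input tensors and call the ONNX Runtime here
        x * 2.0
    }
}

struct OnnxModel {
    // each session is individually mutex-guarded so predict() can take &self
    sessions: Vec<Mutex<Session>>,
    next: AtomicUsize,
}

impl OnnxModel {
    fn new(pool_size: usize) -> Self {
        let pool_size = pool_size.max(1);
        Self {
            sessions: (0..pool_size).map(|id| Mutex::new(Session { id })).collect(),
            next: AtomicUsize::new(0),
        }
    }

    /// &self prediction API over &mut sessions: start at a round-robin index,
    /// prefer an uncontended session via try_lock, and block only as a last resort.
    fn predict(&self, x: f64) -> f64 {
        let start = self.next.fetch_add(1, Ordering::Relaxed) % self.sessions.len();
        for i in 0..self.sessions.len() {
            let idx = (start + i) % self.sessions.len();
            if let Ok(mut session) = self.sessions[idx].try_lock() {
                return session.run(x);
            }
        }
        self.sessions[start].lock().unwrap().run(x)
    }
}

fn main() {
    // default the pool size to available parallelism, as the PR description suggests
    let pool_size = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let model = OnnxModel::new(pool_size);
    println!("{}", model.predict(3.0)); // prints 6
}
```

With a pool of one, this degenerates to the single-mutex design discussed in the review thread: correct, but serialized across threads.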
I have tested this with the attached compass configuration, but I'm leaving the addition of the ONNX models themselves for future work, since we're currently revamping our model serialization over in the RouteE Powertrain codebase. The ultimate goal would be to allow compass to access our shared model database directly and download any necessary model metadata and binaries.