
Conversation

yuslepukhin
Member

@yuslepukhin yuslepukhin commented Oct 17, 2025

Description

This PR converts weights early and reverts "Properly remove in-memory references" (#25652).
This reverts commit 3ca49d8 and makes appropriate adjustments for the current state of the code.

This PR is made possible by, and comes on the heels of:
#26263
#25833.

Previous history:
#23979
#25320
#25626
#25652

The first change (#26263) allows us to convert initializers to OrtValues early and save a significant amount of memory at model loading time.

Specifically, for the Phi-4-mini-instruct-INT4 model, the before and after look like this:

Before: (screenshot: "Before change DEBUG 2025-10-16 144819")

After: (screenshot: "After change DEBUG 2025-10-16 144819")

The two peaks represent memory usage at optimization time (8.1 GB before the change) and after weights memory mapping (6.5 GB).
After this change, the corresponding numbers are 3.5 GB and 4.7 GB respectively.
Most of the savings during the optimization phase come from ConstantFolding, where we are able to reuse the resulting OrtValues directly for the new initializers.

This PR concludes a series of PRs converting initializers to OrtValues.

Memory consumption before this series of conversions began was 9.3 GB during optimization and 6.7 GB at steady state. We are now saving almost 6 GB during optimization and 2 GB at steady state.


The model also loads about 12 seconds faster.

An example of ConstantFolding being one of the top contributors: memory is duplicated, producing a higher peak, until Resolve removes the no-longer-used initializers.

Snapshot 3: peak in ConstantFolding (Transpose Optimizer)

Snapshot 4: peak in AddInitializer called from ConstantFolding

Motivation and Context

Reduce memory usage.

This reverts commit 3ca49d8.

It also makes adjustments for the current source code state.
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR saves significant memory during model loading by converting weight initializers to OrtValues early in graph construction, rather than later during graph transformation. The changes revert previous logic that deferred this conversion and implement early weight conversion at graph initialization time. The PR demonstrates dramatic memory savings during the optimization phase (from 8.1 GB to 3.5 GB for the Phi-4-mini-instruct-INT4 model) by enabling reuse of OrtValues during constant folding.

Key changes:

  • Early conversion of large initializers to OrtValues during graph construction
  • Update of all graph transformation code to use AddInitializerWithExternalData instead of AddInitializer
  • Removal of deferred initializer conversion logic from session inference flow

Reviewed Changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated no comments.

Show a summary per file:

- onnxruntime/core/graph/graph.cc: Implements early initializer-to-OrtValue conversion during graph construction
- include/onnxruntime/core/graph/graph.h: Removes the ConvertInitializersIntoOrtValues method declaration
- onnxruntime/core/session/inference_session.cc: Removes the deferred initializer conversion call from the transform pipeline
- onnxruntime/core/optimizer/*.cc: Updates optimizer classes to use AddInitializerWithExternalData for new initializers
- orttraining/orttraining/core/optimizer/*.cc: Updates training optimizer classes to use AddInitializerWithExternalData
- onnxruntime/test/ir/graph_test.cc: Removes test code that called the now-removed conversion method
- onnxruntime/test/framework/cuda/fence_cuda_test.cc: Updates a test utility to use AddInitializerWithExternalData


@yuslepukhin yuslepukhin marked this pull request as ready for review October 17, 2025 21:43