
Conversation

yuslepukhin
Member

@yuslepukhin yuslepukhin commented Oct 17, 2025

Description

This PR converts weights early and reverts "Properly remove in-memory references" (#25652).
This reverts commit 3ca49d8 and makes appropriate adjustments for the current state of the code.

This PR is made possible by, and comes on the heels of:
#26263
#25833.

Previous history:
#23979
#25320
#25626
#25652

The first change (#26263) allows us to convert initializers to OrtValues early and save a significant amount of memory at model loading time.

Specifically, for the Phi-4-mini-instruct-INT4 model, the before and after look like this:

Before: (screenshot: "Before change DEBUG 2025-10-16 144819")

After: (screenshot: "After change DEBUG 2025-10-16 144819")

The two peaks represent memory usage at optimization time (8.1 GB before the change) and after weights memory mapping (6.5 GB).
After this change, the corresponding numbers are 3.5 GB and 4.7 GB respectively.
Most of the savings during the optimization phase come from ConstantFolding, where we are able to reuse the resulting OrtValues directly for the new initializers.

This PR concludes a series of PRs converting initializers to OrtValues.

Memory consumption before this series of conversions began was 9.3 GB during optimization and 6.7 GB at steady state. We are now saving almost 6 GB during optimization and 2 GB at steady state.


The model also loads about 12 seconds faster.

An example of ConstantFolding being one of the top contributors: memory is duplicated, producing a higher peak, until Resolve removes the no-longer-used initializers.

Snapshot 3: peak in ConstantFolding (Transpose Optimizer)

Snapshot 4: peak in AddInitializer called from ConstantFolding

Motivation and Context

Reduce memory usage.

This reverts commit 3ca49d8.

It also makes adjustments for the current source code state.
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR saves significant memory during model loading by converting weight initializers to OrtValues early in graph construction, rather than later during graph transformation. The changes revert previous logic that deferred this conversion and implement early weight conversion at graph initialization time. The PR demonstrates dramatic memory savings during the optimization phase (from 8.1 GB to 3.5 GB for the Phi-4-mini-instruct-INT4 model) by enabling reuse of OrtValues during constant folding.

Key changes:

  • Early conversion of large initializers to OrtValues during graph construction
  • Update of all graph transformation code to use AddInitializerWithExternalData instead of AddInitializer
  • Removal of deferred initializer conversion logic from session inference flow

Reviewed Changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated no comments.

Show a summary per file:

- onnxruntime/core/graph/graph.cc: Implements early initializer-to-OrtValue conversion during graph construction
- include/onnxruntime/core/graph/graph.h: Removes the ConvertInitializersIntoOrtValues method declaration
- onnxruntime/core/session/inference_session.cc: Removes the deferred initializer conversion call from the transform pipeline
- onnxruntime/core/optimizer/*.cc: Updates optimizer classes to use AddInitializerWithExternalData for new initializers
- orttraining/orttraining/core/optimizer/*.cc: Updates training optimizer classes to use AddInitializerWithExternalData
- onnxruntime/test/ir/graph_test.cc: Removes test code that called the now-removed conversion method
- onnxruntime/test/framework/cuda/fence_cuda_test.cc: Updates a test utility to use AddInitializerWithExternalData


@yuslepukhin yuslepukhin marked this pull request as ready for review October 17, 2025 21:43