(#2583) remove low_cpu_mem_usage from qwen init, it does not do anything anymore; use dtype instead of torch_dtype (#2586)

Merged
bghira merged 1 commit into main from bugfix/2583 on Feb 9, 2026

Conversation


bghira (Owner) commented on Feb 8, 2026

This pull request improves the loading process for text encoder models in simpletuner/helpers/models/flux2/model.py. The changes optimize device placement during model initialization and add informative logging to help debug device allocation in distributed setups.

Device placement improvements:

  • Changed the loading path for both the Qwen3 and Mistral-3 text encoders so that each model is loaded onto CPU first and only then moved to the per-rank accelerator device, preventing GPU contention during initialization; see the sketch after this list. (_load_text_encoder_qwen3, _load_text_encoder_mistral) [1] [2]
  • Replaced torch_dtype with dtype in the loading parameters and removed low_cpu_mem_usage, which no longer has any effect, for consistency and clarity. (_load_text_encoder_qwen3, _load_text_encoder_mistral) [1] [2]
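
Below is a minimal sketch of the CPU-first loading pattern, assuming a Hugging Face transformers-style from_pretrained call and an Accelerate Accelerator; the checkpoint path, model class, and dtype are illustrative and not taken from the diff.

```python
# Sketch only: the path, AutoModel class, and bfloat16 dtype are assumptions,
# not copied from simpletuner/helpers/models/flux2/model.py.
import torch
from accelerate import Accelerator
from transformers import AutoModel

accelerator = Accelerator()


def load_text_encoder(pretrained_path: str) -> torch.nn.Module:
    # Load onto CPU first so that all ranks do not try to materialize the
    # weights on the same GPU during initialization.
    text_encoder = AutoModel.from_pretrained(
        pretrained_path,
        dtype=torch.bfloat16,  # `dtype` replaces the deprecated `torch_dtype`
    )
    # Only after loading, move the encoder to this rank's device.
    return text_encoder.to(accelerator.device)
```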

Logging enhancements:

  • Added detailed logging when moving the Qwen3 and Mistral-3 text encoders to their target devices, including the device and the process rank, to aid in debugging distributed training setups; a hedged example follows below. (_load_text_encoder_qwen3, _load_text_encoder_mistral) [1] [2]
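
A hedged example of the kind of diagnostic logging described above; the message format and logger name are assumptions, while process_index and num_processes are standard Accelerate attributes.

```python
import logging

from accelerate import Accelerator

logger = logging.getLogger(__name__)
accelerator = Accelerator()

# Record the target device together with the process rank so that a text
# encoder landing on the wrong GPU is easy to spot in multi-GPU logs.
logger.info(
    "Moving text encoder to device %s (process rank %d of %d)",
    accelerator.device,
    accelerator.process_index,
    accelerator.num_processes,
)
```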

bghira linked an issue on Feb 8, 2026 that may be closed by this pull request
bghira merged commit 46b77f9 into main on Feb 9, 2026
2 checks passed
bghira deleted the bugfix/2583 branch on February 9, 2026 at 02:25

Development

Successfully merging this pull request may close these issues.

[bug] multi GPU training is loading TEs on the same GPU causing OOM
