
Conversation

@ranzhejiang (Contributor) commented Sep 1, 2025

@czhu15 @yangulei @Wei-Lin-Intel please help to review, thanks a lot.
1. Set the env var PT_HPU_MOE_CHUNK for chunked MoE; it defines the sequence of candidate chunk sizes.
2. Add the env var PT_HPU_MOE_TOKEN_BOUNDARY for chunked MoE; it determines which chunk size is selected for a given number of tokens.

For example:

  1. Assume PT_HPU_MOE_TOKEN_BOUNDARY is [64,128,1536,1736,2048,3072,4096], PT_HPU_MOE_CHUNK is [64,128,512,1024,1536,2048,4096]
  2. When the token count is 1025, we first find the matching interval in PT_HPU_MOE_TOKEN_BOUNDARY: since 128 < 1025 ≤ 1536, the interval (128, 1536] is at index 2. We then take the value at index 2 in PT_HPU_MOE_CHUNK, which is 512, so the chunk size is set to 512 (see the sketch below).
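Below is a minimal sketch of the selection logic described above. The function and helper names (select_chunk_size, _parse_int_list) and the env-var parsing are illustrative assumptions, not the actual implementation in this PR:

```python
import os
import bisect

def _parse_int_list(env_name, default):
    """Parse a comma-separated env var such as "64,128,512" into a list of ints."""
    value = os.environ.get(env_name)
    if value is None:
        return default
    return [int(x) for x in value.split(",")]

def select_chunk_size(num_tokens):
    """Pick a MoE chunk size based on which boundary interval num_tokens falls into."""
    boundaries = _parse_int_list("PT_HPU_MOE_TOKEN_BOUNDARY",
                                 [64, 128, 1536, 1736, 2048, 3072, 4096])
    chunks = _parse_int_list("PT_HPU_MOE_CHUNK",
                             [64, 128, 512, 1024, 1536, 2048, 4096])
    # bisect_left returns the first index i with boundaries[i] >= num_tokens,
    # i.e. the right-inclusive interval (boundaries[i-1], boundaries[i]] that
    # contains num_tokens.
    idx = bisect.bisect_left(boundaries, num_tokens)
    idx = min(idx, len(chunks) - 1)  # clamp for token counts above the last boundary
    return chunks[idx]

# Example from the PR description: 128 < 1025 <= 1536 -> index 2 -> chunk size 512.
assert select_chunk_size(1025) == 512
```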

Wei-Lin-Intel and others added 7 commits August 18, 2025 09:38
* enable chunk moe

* add fix

* fix wrong name
Summary
This PR fixes the scaling issue for models like Hunyuan:

- w2_scale_fp8 is provided as a scalar, but should be expanded to match the per-channel size.
- w13_scale_fp8 is given in a combined form (two values for W1/W3) and needs to be reshaped and repeated to the correct size for per-channel quantization.
- Ensures that w2_input_scale is stored as a list (one per expert) instead of a single tensor (see the sketch below).
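For illustration, a hedged sketch of the kind of scale normalization described above; the function name, argument names, and shape conventions are assumptions and not the PR's actual code:

```python
import torch

def normalize_moe_scales(w2_scale_fp8, w13_scale_fp8, w2_input_scale,
                         num_experts, intermediate_size, hidden_size):
    """Illustrative normalization of FP8 MoE scales (all names here are assumptions)."""
    if w2_scale_fp8.numel() == 1:
        # Broadcast the scalar W2 scale to one value per output channel.
        w2_scale_fp8 = w2_scale_fp8.expand(hidden_size).contiguous()

    if w13_scale_fp8.numel() == 2:
        # Two combined values (one for W1, one for W3): repeat each across its
        # intermediate_size channels so the result has 2 * intermediate_size entries.
        w13_scale_fp8 = (w13_scale_fp8.reshape(2, 1)
                         .repeat(1, intermediate_size)
                         .reshape(-1))

    if isinstance(w2_input_scale, torch.Tensor) and w2_input_scale.dim() == 0:
        # Keep one input scale per expert as a list rather than a single tensor.
        w2_input_scale = [w2_input_scale.clone() for _ in range(num_experts)]

    return w2_scale_fp8, w13_scale_fp8, w2_input_scale
```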
* add calibration and conversion for GLM-4.5 fp8 models

* set VLLM_DISABLE_MARK_SCALES_AS_CONST=true for scale_format=const

* add conversion scripts for GLM-4.5 fp8 models

* use torch.finfo for fp8 max
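Regarding the last commit above, a small hedged snippet of reading the fp8 max via torch.finfo rather than hard-coding it; the specific fp8 dtype (float8_e4m3fn) is an assumption:

```python
import torch

# float8_e4m3fn is assumed here; its finfo max is 448.0.
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max
```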
@ranzhejiang ranzhejiang changed the title [aice/v.1.22] refactor chunk size code [WIP] [aice/v.1.22] refactor chunk size code Sep 1, 2025
@ranzhejiang ranzhejiang changed the title [WIP] [aice/v.1.22] refactor chunk size code [aice/v.1.22] refactor chunk size code Sep 1, 2025
@czhu15 commented Sep 1, 2025

What is the relationship between
"PT_HPU_MOE_CHUNK", "64,128,512,1024,1536,2048,4096"
and
"PT_HPU_MOE_TOKEN_BOUNDARY", "64,64,1536,1536,2048,2048,4096"?
It would be good if you could give an example explanation.

@ranzhejiang (Contributor, Author)

> What is the relationship between "PT_HPU_MOE_CHUNK", "64,128,512,1024,1536,2048,4096" and "PT_HPU_MOE_TOKEN_BOUNDARY", "64,64,1536,1536,2048,2048,4096"? It would be good if you could give an example explanation.

Updated in the PR description.

@czhu15 commented Sep 18, 2025

@ranzhejiang, please let me know if this PR is still valid. If not, please close it, or resolve the conflicts. Thanks!
