[MoE] MoE Calibration with `calibrate_all_experts` #1760

kylesayrs · 2025-08-19T20:27:44Z

Coauthored with @dichn!

Purpose

Add support for the `calibrate_all_experts` option, which sends all tokens to all experts during calibration while still producing the same outputs as if tokens had been gated.

Changes

Modify the model definitions so that, when `calibrate_all_experts=True`, token gating occurs after tokens are passed to the experts rather than before.

# `calibrate_all_experts=True` by default
model = replace_modules_for_calibration(model, calibrate_all_experts=True)
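To illustrate the idea, here is a minimal pure-Python sketch of "gate after the expert call". It is not the llm-compressor implementation: `top_k_gates`, `moe_forward`, and `CountingExpert` are hypothetical names, scalar floats stand in for token tensors, and router scores are assumed to be positive. The call counter plays the role that a forward hook would play in a real test.

```python
from typing import Dict, List, Sequence


class CountingExpert:
    """Toy expert (hypothetical): scales its input and counts how often it is called."""

    def __init__(self, scale: float) -> None:
        self.scale = scale
        self.calls = 0

    def __call__(self, x: float) -> float:
        self.calls += 1
        return self.scale * x


def top_k_gates(scores: Sequence[float], k: int) -> Dict[int, float]:
    """Map the k highest-scoring expert indices to weights normalized to sum to 1."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def moe_forward(
    tokens: Sequence[float],
    router_scores: Sequence[Sequence[float]],
    experts: Sequence[CountingExpert],
    k: int,
    calibrate_all_experts: bool = True,
) -> List[float]:
    gates = [top_k_gates(scores, k) for scores in router_scores]
    outputs = [0.0] * len(tokens)
    for e, expert in enumerate(experts):
        if calibrate_all_experts:
            # Calibration mode: every token passes through every expert.
            expert_out = {t: expert(x) for t, x in enumerate(tokens)}
        else:
            # Normal mode: only tokens routed to this expert are computed.
            expert_out = {t: expert(x) for t, x in enumerate(tokens) if e in gates[t]}
        # Gating is applied after the expert call, so only routed tokens contribute.
        for t in range(len(tokens)):
            if e in gates[t]:
                outputs[t] += gates[t][e] * expert_out[t]
    return outputs


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0]
    # With k=2, the third expert is never selected by the router.
    scores = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]
    experts = [CountingExpert(s) for s in (1.0, 2.0, 3.0)]
    print(moe_forward(tokens, scores, experts, k=2))
    print([e.calls for e in experts])
```

In this toy setup both modes accumulate the same terms in the same order, so the outputs match, while only calibration mode guarantees that an expert the router never selects still sees tokens.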

Testing

Added correctness tests for the new model definitions, which check that outputs are exactly the same
Added hook tests to make sure all experts are being sent tokens

Change Purpose:
- Add calibrate_all_experts option to improve MoE calibration

Change Details:
- Add `calibrate_all_experts` flag to MoE layers
- Update `replace_modules_for_calibration` and `moe_calibration_context` to propagate the flag into modules
- Modify expert forward passes:
  * Normal mode (default): compute output only for tokens routed to top-k experts, and combine their weighted results in the final output
  * Calibration mode (`calibrate_all_experts=True`): compute output for all tokens on every expert, but still apply the top-k gating to decide which token outputs contribute to the final result.

Testing:
- Add unit test to verify all experts are triggered during MoE calibration

Signed-off-by: Kyle Sayers <[email protected]>

github-actions · 2025-08-19T20:27:51Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

src/llmcompressor/modeling/llama4.py

Signed-off-by: Kyle Sayers <[email protected]>

src/llmcompressor/modeling/llama4.py

fynnsu

Left a comment below. I also agree with @brian-dellabetta's point that this could maybe be simplified by patching self.top_k temporarily.
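For readers unfamiliar with the pattern, a temporary attribute patch can be sketched as a context manager. This is only an illustration of the mechanism: `temporarily_set` and `ToyMoE` are hypothetical names, not llm-compressor or Transformers APIs, and only the `top_k` attribute name comes from the comment above. Whether raising `top_k` alone would preserve the gated outputs is not established in this thread.

```python
import contextlib
from typing import Any, Iterator, List, Sequence


@contextlib.contextmanager
def temporarily_set(obj: Any, name: str, value: Any) -> Iterator[None]:
    """Set obj.<name> to value inside the with-block and restore it afterwards."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Restore even if the body raises.
        setattr(obj, name, original)


class ToyMoE:
    """Hypothetical stand-in for an MoE block that routes by a top_k attribute."""

    def __init__(self, num_experts: int, top_k: int) -> None:
        self.num_experts = num_experts
        self.top_k = top_k

    def routed_experts(self, scores: Sequence[float]) -> List[int]:
        order = sorted(range(self.num_experts), key=lambda i: scores[i], reverse=True)
        return order[: self.top_k]


if __name__ == "__main__":
    moe = ToyMoE(num_experts=4, top_k=2)
    scores = [0.1, 0.4, 0.3, 0.2]
    print(moe.routed_experts(scores))  # only the two best experts
    with temporarily_set(moe, "top_k", moe.num_experts):
        print(moe.routed_experts(scores))  # every expert is routed
    print(moe.top_k)  # original value restored
```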

src/llmcompressor/modeling/deepseek_v3.py

Signed-off-by: Kyle Sayers <[email protected]>

dsikka

I have not had a chance to run through these yet, but it would be good to run the nvfp4 examples for llama4 and qwen3 and validate performance on the b200 before landing this, if anybody has bandwidth to run them.

kylesayrs · 2025-09-09T14:19:45Z

Running those examples now

brian-dellabetta · 2025-09-09T16:20:48Z

src/llmcompressor/utils/helpers.py

@@ -974,7 +974,8 @@ def getattr_chain(obj: Any, chain_str: str, *args, **kwargs) -> Any:
    return res


-class DisableKVCache:
+@contextlib.contextmanager
+def disable_cache(module: torch.nn.Module):


Definitely agree with these changes, but they might be better in a separate PR or at least mentioned in the PR summary. They seem orthogonal to `calibrate_all_experts`.

dichn and others added 2 commits August 17, 2025 17:17

changes, qwen still doesn't work

20f1ed2

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs changed the title ~~[Calibrat] Llama4 and More tests~~ [MoE] Llama4 and More tests Aug 19, 2025

dsikka reviewed Aug 19, 2025


src/llmcompressor/modeling/llama4.py (outdated, resolved)

reduce precision expectations

53fbdb7

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs changed the title ~~[MoE] Llama4 and More tests~~ [MoE] MoE Calibration with calibrate_all_experts Aug 28, 2025

kylesayrs added 3 commits August 28, 2025 16:56

add note

c685b51

Signed-off-by: Kyle Sayers <[email protected]>

remove config

95d3402

Signed-off-by: Kyle Sayers <[email protected]>

default to true

81c2e1a

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs marked this pull request as ready for review August 28, 2025 21:00

kylesayrs added 2 commits August 28, 2025 17:04

Merge remote-tracking branch 'origin' into kylesayrs/calib

d2df4eb

remove unneeded imports

dd2d9e5

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs requested review from dsikka and shanjiaz August 28, 2025 21:05

brian-dellabetta reviewed Aug 28, 2025


src/llmcompressor/modeling/llama4.py (resolved)

fynnsu reviewed Aug 29, 2025


src/llmcompressor/modeling/deepseek_v3.py (resolved)

update test

dc62205

Signed-off-by: Kyle Sayers <[email protected]>

dsikka reviewed Sep 2, 2025


Merge branch 'main' into kylesayrs/calib

3003c83

kylesayrs marked this pull request as draft September 9, 2025 11:49

brian-dellabetta approved these changes Sep 9, 2025
