Conversation

@fmassa (Contributor) commented on Jun 12, 2025:

For now, this assumes the routing is balanced.

Note that AutoParallel doesn't work in this case: the sort operator is not implemented, and there may be other gaps as well involving the indexing.

Note: for now I'm picking a single expert per token, but that can be changed, possibly at the expense of replicating the tokens; whether that memory increase is reasonable still needs to be assessed.
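The single-expert-per-token choice described above can be sketched as plain top-1 routing: take the argmax of each token's router scores and bucket tokens by expert. This is a hypothetical, pure-Python illustration (the names `route_top1`, `num_experts`, and the example scores are mine, not from the PR), and like the PR it relies on the assignment happening to be balanced, with no capacity handling or token replication.

```python
def route_top1(scores, num_experts):
    """Top-1 expert routing sketch.

    scores: list of per-token score lists, one score per expert.
    Returns, for each expert, the indices of the tokens routed to it.
    Assumes the resulting assignment is balanced; no capacity factor.
    """
    # Argmax over experts for each token (single expert per token).
    assignment = [max(range(num_experts), key=lambda e: tok[e]) for tok in scores]
    # Bucket token indices by their chosen expert.
    buckets = [[] for _ in range(num_experts)]
    for tok_idx, expert in enumerate(assignment):
        buckets[expert].append(tok_idx)
    return buckets

# Illustrative router scores for 4 tokens and 2 experts.
scores = [
    [0.9, 0.1],  # token 0 -> expert 0
    [0.2, 0.8],  # token 1 -> expert 1
    [0.7, 0.3],  # token 2 -> expert 0
    [0.4, 0.6],  # token 3 -> expert 1
]
print(route_top1(scores, 2))  # [[0, 2], [1, 3]]
```

An unbalanced score matrix would break the balanced assumption (e.g. all tokens picking expert 0), which is exactly the case the PR defers.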

@facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Jun 12, 2025.
@wconstab (Contributor) left a comment:

Kinda curious, do you have the solution handy for this?

fmassa added a commit referencing this pull request on Sep 29, 2025:
Taken from #3 and #29. Decomposing softmax_backward leads to prims.fma, which doesn't have a sharding rule, so Replicate ends up being the only possible sharding.
fmassa added a commit referencing this pull request on Oct 1, 2025:
Taken from #3 and #29. Decomposing softmax_backward leads to prims.fma, which doesn't have a sharding rule, so Replicate ends up being the only possible sharding.
Very slow; need to try Sinkhorn-Knopp.
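For context on the Sinkhorn-Knopp suggestion: the algorithm alternately normalizes the rows and columns of a positive matrix until it is (approximately) doubly stochastic, and is a common way to push a token-to-expert score matrix toward a balanced assignment without a hard sort. The sketch below is a minimal pure-Python illustration under that framing, not the PR's implementation.

```python
def sinkhorn_knopp(mat, num_iters=50):
    """Sinkhorn-Knopp normalization sketch.

    mat: list-of-lists of strictly positive scores (tokens x experts).
    Alternates row and column normalization; at convergence each row
    sums to 1 and each column sums to n_rows / n_cols, i.e. every
    expert receives an equal share of token mass.
    """
    n_rows, n_cols = len(mat), len(mat[0])
    m = [row[:] for row in mat]
    col_target = n_rows / n_cols  # equal token mass per expert
    for _ in range(num_iters):
        # Normalize each row to sum to 1.
        for r in range(n_rows):
            s = sum(m[r])
            m[r] = [v / s for v in m[r]]
        # Rescale each column to sum to the balanced target.
        for c in range(n_cols):
            s = sum(m[r][c] for r in range(n_rows))
            for r in range(n_rows):
                m[r][c] *= col_target / s
    return m

# A skewed score matrix: both tokens prefer expert 0.
balanced = sinkhorn_knopp([[0.9, 0.1], [0.6, 0.4]])
# After normalization, each column sums to ~1.0, so neither expert
# is starved even though the raw scores were unbalanced.
```

The fixed iteration count is the simplest stopping rule; a tolerance on the row/column sums would be the usual refinement, and this iterative scheme is also what makes it a candidate replacement for the slow exact assignment noted above.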