
Conversation

@aleozlx (Collaborator) commented Oct 4, 2025

📌 Description

Add trtllm-gen BF16 MoE support via a new `trtllm_bf16_moe` interface.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
  • I have installed the hooks with `pre-commit install`.
  • I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

If you are unsure about how to set up `pre-commit`, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (`unittest`, etc.).

`pytest -x -v tests/moe/test_trtllm_gen_fused_moe.py -k All_BF16`

9 passed, 999 skipped

====

`pytest tests/moe/test_trtllm_gen_fused_moe.py`

PENDING: some IMA (illegal memory access) failures are detected in the existing tests.

Reviewer Notes

In the new `trtllm_bf16_moe` interface, I used `*` in the argument list to mark which arguments must be passed by keyword only. This is a common practice for making calls less error prone when a function has a very long argument list: the parameters before `*` are the commonly used ones, while those after it are optional / performance-tuning knobs.
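
A minimal sketch of the convention (the parameter names below are illustrative placeholders, not the actual `trtllm_bf16_moe` signature):

```python
# Illustrative only: these parameter names are hypothetical, not the
# real trtllm_bf16_moe signature.
def trtllm_bf16_moe(
    hidden_states,          # commonly used args: positional or keyword
    gemm1_weights,
    gemm2_weights,
    *,                      # everything after this is keyword-only
    tile_tokens_dim=None,   # optional / perf-tuning knobs
    routing_method_type=0,
):
    return hidden_states    # placeholder body

x, w1, w2 = 1, 2, 3                             # dummy stand-ins for tensors
trtllm_bf16_moe(x, w1, w2, tile_tokens_dim=8)   # OK: knob passed by keyword
# trtllm_bf16_moe(x, w1, w2, 8)                 # TypeError: keyword-only
```

This way a caller can never silently swap two tuning knobs by position; misuse fails loudly with a `TypeError` instead of producing wrong results.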

@aleozlx (Collaborator, Author) commented Oct 4, 2025

IMA (illegal memory access) repro:

`pytest tests/moe/test_trtllm_gen_fused_moe.py::test_moe_quantization_classes[SwiGlu-Shuffled_MajorK-DSLite-NvFP4xNvFP4-1024-1024-1]`

@yzh119 (Collaborator) commented Oct 4, 2025

Hi @aleozlx, can we confirm whether this IMA is a kernel issue or an integration issue?
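
One standard way to narrow this down: force synchronous kernel launches so the failure surfaces at the offending call rather than at a later sync point. A minimal sketch, assuming the repro is driven from a Python script:

```python
import os

# Force synchronous kernel launches so the IMA is reported at the
# offending launch instead of a later, unrelated synchronization point.
# This must be set before CUDA is initialized, i.e. before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402

# ... run the failing MoE call here; torch now raises at the faulty
# launch, which helps attribute the fault to the kernel vs. host-side
# setup (workspace sizing, strides, pointer arithmetic).
```

Running the same repro under `compute-sanitizer --tool memcheck` would additionally report the faulting kernel name and access address.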

jiahanc added a commit that referenced this pull request Nov 8, 2025

## 📌 Description

- Refactor `trtllm_fused_moe_kernel_launcher.cu` to use a class structure for code cleanliness and readability
- Add BF16 MoE, initial PR (#1859) from @aleozlx and @nekorobov
- Add BF16 MoE autotune (a usage sketch follows this commit message)

## 🔍 Related Issues


## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).



## Summary by CodeRabbit

* **New Features**
  * BF16 Mixture-of-Experts (MoE) pathway added with autotuning and public API access.

* **Improvements**
  * Unified BF16/FP8/FP4/FP16 pathways with clearer dtype compatibility checks and corrected operator return semantics.
  * Routing selection now respects token size and input packing, and diagnostics produce more descriptive error messages.

* **Tests**
  * Expanded BF16 test coverage across routing modes, weight layouts, and token sizes.

* **Chores**
  * Updated artifact metadata and checksums.

---------

Signed-off-by: jiahanc <[email protected]>
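
As referenced in the commit description above, here is a hedged sketch of how the BF16 MoE autotune path is typically exercised. The `autotune` context manager lives in FlashInfer's autotuner module; its exact import path, and the shape of the `trtllm_bf16_moe` call itself, are assumptions rather than API details confirmed by this thread:

```python
# Assumed import; FlashInfer's autotuner module exposes an `autotune`
# context manager used by its own MoE tests (path assumed here).
from flashinfer.autotuner import autotune


def run_bf16_moe_once():
    # Placeholder for the actual trtllm_bf16_moe call with real tensors;
    # the full argument list is defined by this PR and elided here.
    pass


# Calls made inside the autotune context profile candidate kernel
# tactics and cache the best configuration per problem shape.
with autotune():
    run_bf16_moe_once()

# Subsequent calls with matching shapes reuse the cached tactic.
run_bf16_moe_once()
```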
