refactor: separate SM100 and legacy TRT-LLM comm modules
Restructure the compilation of the TensorRT-LLM communication module to
improve hardware compatibility and portability.
Previously, the module was compiled with SM100-specific flags only if a
compatible GPU was detected during the build process.
This made a single build non-portable across different GPU generations.
This change introduces two distinct modules:
- `trtllm_comm`: compiled with SM100-specific flags for Blackwell-class (SM100) GPUs.
- `trtllm_comm_legacy`: a fallback version for older GPU architectures.
At runtime, `get_trtllm_comm_module` now detects the GPU's compute capability
and dynamically loads the appropriate module.
This allows a single FlashInfer build to support a wider range of NVIDIA GPUs
and gracefully handles CPU-only environments.
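The runtime selection described above could look roughly like the sketch below. It is illustrative only and not the code from this change: `select_comm_module_name` is a hypothetical helper, and both the `major >= 10` threshold and the fallback to the legacy module when no GPU is visible are assumptions.

```python
from typing import Optional, Tuple

# Module names taken from this commit message.
SM100_MODULE = "trtllm_comm"
LEGACY_MODULE = "trtllm_comm_legacy"


def select_comm_module_name(capability: Optional[Tuple[int, int]]) -> str:
    """Return the name of the comm module to load.

    `capability` is a (major, minor) compute capability, or None when no
    CUDA device is visible (for example, a CPU-only environment).
    """
    if capability is None:
        # Assumption: without a GPU, fall back to the legacy module name
        # rather than raising at import time.
        return LEGACY_MODULE
    major, _minor = capability
    # Assumption: SM100 corresponds to compute capability major version 10,
    # and anything below it uses the legacy build.
    return SM100_MODULE if major >= 10 else LEGACY_MODULE
```

With PyTorch, the argument could be obtained from `torch.cuda.get_device_capability()` when `torch.cuda.is_available()` is true, and `None` otherwise.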
Signed-off-by: Emilien Macchi <[email protected]>