ValueError: num_query_groups (2) must be a multiple of tensor_model_parallel_size (8). #8219

@gaussWu

Checklist

  • I have searched existing issues, and this is a new bug report.

Bug Description

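When running full-parameter SFT on Qwen3.5-397B-A17B with mcore, num_query_groups must be a multiple of the tensor parallel (TP) size. Is this a limitation of Megatron itself, or do I need to upgrade to a newer version?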
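For reference, the condition stated in the error message can be sketched in a few lines of Python. This is only an illustration of the divisibility rule, not Megatron's actual code: the function names `check_tp_compat` and `valid_tp_sizes` are hypothetical, and the values 2 and 8 are taken from the error message in the title.

```python
def check_tp_compat(num_query_groups: int, tp_size: int) -> None:
    """Hypothetical helper: raise if the divisibility rule from the error is violated."""
    if num_query_groups % tp_size != 0:
        raise ValueError(
            f"num_query_groups ({num_query_groups}) must be a multiple of "
            f"tensor_model_parallel_size ({tp_size})."
        )


def valid_tp_sizes(num_query_groups: int, max_tp: int = 8) -> list[int]:
    """Hypothetical helper: list the TP sizes up to max_tp that satisfy the rule."""
    return [tp for tp in range(1, max_tp + 1) if num_query_groups % tp == 0]


if __name__ == "__main__":
    # Values from the error message: 2 query groups, TP size 8.
    print(valid_tp_sizes(2))  # TP sizes that divide num_query_groups evenly
    try:
        check_tp_compat(2, 8)
    except ValueError as err:
        print(err)
```

Under this rule alone, only TP sizes that divide num_query_groups evenly would pass with 2 query groups. Whether a different Megatron-core version handles this case differently is not something the sketch can show.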

How to Reproduce

Set the following in the arguments:
--tensor_model_parallel_size 8
Megatron-core version: Megatron-LM-core_v0.15.0
Swift version: main branch

Additional Information

No response

Metadata

Labels: bug
