Commit dc960e7

[BugFix] Fix mlapo accuracy problem related with weight processing. (#3850)
This PR fixes an mlapo accuracy problem related to weight processing, and adds back the mlapo-related e2e test with a quantized DeepSeek model.

- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@83f478b

Signed-off-by: whx-sjtu <[email protected]>

1 parent: adadd50

File tree

1 file changed (+2 −2 lines)

vllm_ascend/attention/mla_v1.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -826,9 +826,9 @@ def _process_weights_for_fused_mlapo(self, act_dtype: torch.dtype):
             ..., self.q_lora_rank:].contiguous()
         q_a_proj_wt = self.fused_qkv_a_proj.weight.data[
             ..., :self.q_lora_rank].contiguous()
-        kv_a_proj_wt = kv_a_proj_wt.contiguous()
+        kv_a_proj_wt = kv_a_proj_wt.t().contiguous()
         kv_a_proj_wt = trans_rope_weight(kv_a_proj_wt, self.qk_rope_head_dim)
-        kv_a_proj_wt = kv_a_proj_wt.contiguous()
+        kv_a_proj_wt = kv_a_proj_wt.t().contiguous()
         wd_qkv = torch.cat((kv_a_proj_wt, q_a_proj_wt), dim=-1)
         wd_qkv = wd_qkv.t().contiguous()
         wd_qkv = transdata(wd_qkv,
```
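For context on the fix: the weight slice puts output features on the last axis (the split at `self.q_lora_rank` selects output columns), and the transposes added on both sides of `trans_rope_weight` indicate that the helper reorders rope weights along dim 0. Passing the untransposed slice therefore shuffled the wrong axis, corrupting the fused weights. Below is a minimal, self-contained sketch of that pattern; `trans_rope_weight` here is a hypothetical stand-in, since the real helper's body is not shown in this diff, and the exact rope permutation it applies is an assumption, as are the shapes.

```python
import torch

def trans_rope_weight(weight: torch.Tensor, rope_dim: int) -> torch.Tensor:
    # Hypothetical stand-in for vllm-ascend's helper of the same name.
    # Assumption: it expects an [out_features, in_features] layout and
    # interleaves the two halves of the last `rope_dim` output rows
    # (the real helper's permutation may differ).
    nope, rope = weight[:-rope_dim], weight[-rope_dim:]
    half = rope_dim // 2
    interleaved = torch.stack((rope[:half], rope[half:]), dim=1)
    return torch.cat((nope, interleaved.reshape(rope.shape)), dim=0)

# Illustrative DeepSeek-like sizes (assumptions, not read from the diff):
# hidden = 7168, kv_lora_rank = 512, qk_rope_head_dim = 64.
kv_a_proj_wt = torch.randn(7168, 512 + 64)       # sliced as [in, out]

# Old code: trans_rope_weight saw the [in, out] layout and reordered the
# wrong axis. Fixed code: transpose in, reorder, transpose back out.
kv_a_proj_wt = kv_a_proj_wt.t().contiguous()     # [out, in]; rope rows on dim 0
kv_a_proj_wt = trans_rope_weight(kv_a_proj_wt, 64)
kv_a_proj_wt = kv_a_proj_wt.t().contiguous()     # back to [in, out]
```

The transpose-back matters as much as the transpose-in: the subsequent `torch.cat((kv_a_proj_wt, q_a_proj_wt), dim=-1)` concatenates along output features, which only lines up if both slices are back in their original orientation.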
