@imh966 imh966 commented Jan 13, 2024

Hi, I found that the attention mask tensor is created on the CPU, which leads to inefficient operations on the mask and an extra host-to-device (H2D) copy.
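A minimal sketch of the kind of fix described above: build the causal attention mask directly on the target device so no CPU-side tensor (and no follow-up H2D copy) is needed. The helper name `build_attention_mask` and the exact mask convention (boolean, `True` = masked out, as Megatron-style code commonly uses) are assumptions for illustration, not the PR's actual diff.

```python
import torch

def build_attention_mask(seq_len: int, device: torch.device) -> torch.Tensor:
    # Hypothetical helper: allocate the mask on `device` from the start,
    # instead of building it on CPU and copying it to the GPU afterwards.
    allowed = torch.tril(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=device)
    )
    # Invert so True marks positions to mask out (upper triangle),
    # and add batch/head broadcast dims: [1, 1, seq_len, seq_len].
    return (~allowed).unsqueeze(0).unsqueeze(0)

# Falls back to CPU when no GPU is available, so the sketch stays runnable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
mask = build_attention_mask(4, device)
```

Because the mask is created with `device=device`, subsequent elementwise operations on it run on the GPU and the separate transfer disappears.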

XZQshiyu pushed a commit to XZQshiyu/Megatron-DeepSpeed that referenced this pull request Jan 15, 2025
Co-authored-by: Hyeongmin Moon <[email protected]>
Co-authored-by: Zhewei Yao <[email protected]>