Conversation


@winglian winglian commented Jul 23, 2025

What does this PR do?

#39474 causes the grad norm to become NaN during training. We don't need to revert the whole thing; reverting just this function seems to fix the regression.
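For background, the function in question builds FlashAttention varlen metadata from packed position ids. A minimal standalone sketch of the idea, assuming each packed sequence is delimited by `position_ids` resetting to 0 (hypothetical helper, not the actual transformers implementation):

```python
from itertools import accumulate

def cu_seqlens_from_position_ids(position_ids):
    """Derive cumulative sequence lengths and the max sequence length
    from packed position ids, where each new sequence restarts at 0."""
    starts = [i for i, p in enumerate(position_ids) if p == 0]
    ends = starts[1:] + [len(position_ids)]
    lengths = [e - s for s, e in zip(starts, ends)]
    # cu_seqlens is the running offset of each sequence, FA-varlen style
    cu_seqlens = [0] + list(accumulate(lengths))
    return cu_seqlens, max(lengths)

# Three packed sequences of lengths 3, 2, and 4
print(cu_seqlens_from_position_ids([0, 1, 2, 0, 1, 0, 1, 2, 3]))
# → ([0, 3, 5, 9], 4)
```

If this metadata is computed incorrectly, attention can span across packed sequence boundaries, which is one way a loss or grad-norm regression like the one above can appear.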

@ArthurZucker @SunMarc

pre-#39474 loss: (screenshot, 2025-07-23)

post-#39474 loss: (screenshot, 2025-07-23)

with this fix: (screenshot, 2025-07-23)


Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker ArthurZucker left a comment


Thanks, but this won't work for inference! Can we just have an if/else?
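The if/else being suggested might look like the following sketch (hypothetical helper and names, not the actual transformers code): during training with packed sequences the key lengths mirror the query lengths, while at inference the KV cache gives keys their own, longer lengths, which is why `cu_seqlens_k`/`max_k` need a separate branch.

```python
from itertools import accumulate

def fa2_varlen_args(q_lengths, kv_lengths=None):
    """Build (cu_seqlens_q, cu_seqlens_k, max_q, max_k) for varlen attention.
    Training (kv_lengths is None): keys mirror the queries.
    Inference: the KV cache gives keys their own lengths."""
    cu_q = [0] + list(accumulate(q_lengths))
    max_q = max(q_lengths)
    if kv_lengths is None:          # training: packed keys == packed queries
        cu_k, max_k = cu_q, max_q
    else:                           # inference: separate key metadata
        cu_k = [0] + list(accumulate(kv_lengths))
        max_k = max(kv_lengths)
    return cu_q, cu_k, max_q, max_k

print(fa2_varlen_args([3, 2]))          # training: ([0, 3, 5], [0, 3, 5], 3, 3)
print(fa2_varlen_args([1, 1], [7, 5]))  # decoding with a cache of 7 and 5 keys
```

This keeps the pre-#39474 training behavior while still returning the key-side metadata that inference needs.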

@ArthurZucker
(Thanks a lot for checking, and sorry for breaking it!)


@ArthurZucker ArthurZucker left a comment


Thanks, makes sense!

@ArthurZucker ArthurZucker merged commit 5a81d7e into huggingface:main Jul 24, 2025
23 checks passed
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* revert behavior of _prepare_from_posids

* add back cu_seqlens_k and max_k for inference