
Conversation

jialun-zhang

Summary: Refactored the previous code for applying gradient clipping across DDP and FSDP parameters. Added a new function, _compute_total_norm(), that takes in the FSDP and DDP params provided in the GradientClippingOptimizer class and computes the total gradient norm of the given parameters.

Differential Revision: D79128843
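
For anyone skimming: a minimal sketch of the idea, assuming _compute_total_norm() follows the usual pattern for clipping over mixed replicated/sharded parameters. The helper name, signature, and arguments below are illustrative assumptions, not the actual torchrec implementation: the replicated (DDP) gradient norm is identical on every rank and needs no communication, while each rank only holds a shard of the FSDP gradients, so the sharded contribution must be all-reduced before the two are combined.

```python
# Illustrative sketch only -- not the actual torchrec
# GradientClippingOptimizer._compute_total_norm().
from typing import List, Optional

import torch
import torch.distributed as dist


def _compute_total_norm_sketch(
    replicated_params: List[torch.Tensor],
    sharded_params: List[torch.Tensor],
    norm_type: float = 2.0,
    process_group: Optional[dist.ProcessGroup] = None,
) -> torch.Tensor:
    """Combine gradient norms of replicated (DDP) and sharded (FSDP)
    params into one total norm. Finite p-norms only."""
    replicated_grads = [p.grad for p in replicated_params if p.grad is not None]
    sharded_grads = [p.grad for p in sharded_params if p.grad is not None]

    # Replicated grads are identical on every rank: compute norm**p locally,
    # no communication needed.
    replicated_norm_p = sum(
        torch.linalg.vector_norm(g, norm_type) ** norm_type for g in replicated_grads
    )

    # Each rank holds only a shard of the FSDP grads: sum the per-rank
    # norm**p contributions across ranks.
    sharded_norm_p = torch.as_tensor(
        sum(torch.linalg.vector_norm(g, norm_type) ** norm_type for g in sharded_grads)
    )
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(sharded_norm_p, op=dist.ReduceOp.SUM, group=process_group)

    # total_norm = (replicated_norm**p + sharded_norm**p) ** (1/p)
    return (replicated_norm_p + sharded_norm_p) ** (1.0 / norm_type)
```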

@meta-cla meta-cla bot added the CLA Signed label Jul 29, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D79128843

jialun-zhang pushed a commit to jialun-zhang/torchrec that referenced this pull request Jul 30, 2025
…orch#3243)

Summary:

Refactored the previous code for applying gradient clipping across DDP and FSDP parameters. Added a new function, _compute_total_norm(), that takes in the FSDP and DDP params provided in the GradientClippingOptimizer class and computes the total gradient norm of the given parameters.

Differential Revision: D79128843
@jialun-zhang jialun-zhang force-pushed the export-D79128843 branch 2 times, most recently from 29f3764 to 5199ed0 Compare July 31, 2025 18:37
jialun-zhang pushed a commit to jialun-zhang/torchrec that referenced this pull request Jul 31, 2025
…orch#3243)

Summary:

Refactored the previous code for applying gradient clipping across DDP and FSDP parameters. Added a new function, _compute_total_norm(), that takes in the replicated and sharded params provided in the GradientClippingOptimizer class and computes the total gradient norm of the given parameters.

Differential Revision: D79128843
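
For context, once the total norm is known, standard gradient clipping scales every gradient (DDP and FSDP alike) by the same coefficient. A hedged usage sketch, reusing the illustrative _compute_total_norm_sketch helper from the earlier comment; max_norm and the parameter lists are assumed names:

```python
# Illustrative only: clip all grads by the same factor derived from the total norm.
max_norm = 1.0  # hypothetical clipping threshold
total_norm = _compute_total_norm_sketch(replicated_params, sharded_params, norm_type=2.0)
clip_coef = max_norm / (total_norm + 1e-6)
if clip_coef < 1.0:
    for p in list(replicated_params) + list(sharded_params):
        if p.grad is not None:
            p.grad.mul_(clip_coef)
```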
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D79128843
