
Adding torch accelerator to ddp-tutorial-series example #1376


Open · wants to merge 1 commit into main
Conversation

dggaytan (Contributor)

Adding accelerator support to the DDP tutorial examples

Support for multiple accelerators:

  • Updated the ddp_setup functions in multigpu.py, multigpu_torchrun.py, and multinode.py to use torch.accelerator for device management. Process-group initialization now selects the backend dynamically from the device type, falling back to CPU if no accelerator is available.
  • Modified the Trainer classes in multigpu_torchrun.py and multinode.py to accept a device parameter and use it for model placement and snapshot loading (see the sketch after this list).
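
For concreteness, here is a minimal sketch of what an accelerator-aware ddp_setup and a device-parameterized Trainer might look like. It is illustrative only, based on the description above rather than the PR diff; the backend mapping, the LOCAL_RANK environment variable (set by torchrun), and the "MODEL_STATE" snapshot key are assumptions.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def ddp_setup() -> torch.device:
    """Sketch: pick a device via torch.accelerator and init the process group.

    Falls back to CPU (gloo backend) when no accelerator is available.
    """
    local_rank = int(os.environ.get("LOCAL_RANK", 0))  # set by torchrun (assumption)
    if torch.accelerator.is_available():
        acc = torch.accelerator.current_accelerator()   # e.g. device(type='cuda')
        device = torch.device(acc.type, local_rank)
        torch.accelerator.set_device_index(local_rank)  # requires PyTorch >= 2.7
        # Illustrative device-type -> backend mapping; the PR may derive this differently.
        backend = {"cuda": "nccl"}.get(acc.type, "gloo")
    else:
        device = torch.device("cpu")
        backend = "gloo"
    dist.init_process_group(backend=backend)
    return device


class Trainer:
    """Sketch of a Trainer that takes an explicit device instead of assuming CUDA."""

    def __init__(self, model: torch.nn.Module, device: torch.device):
        self.device = device
        self.model = DDP(
            model.to(device),
            device_ids=[device.index] if device.type != "cpu" else None,
        )

    def _load_snapshot(self, snapshot_path: str) -> None:
        # map_location keeps checkpoint tensors on this rank's device.
        snapshot = torch.load(snapshot_path, map_location=self.device)
        # "MODEL_STATE" mirrors the key used in the DDP tutorial series (assumption).
        self.model.module.load_state_dict(snapshot["MODEL_STATE"])
```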

Improvements to example execution:

  • Added run_example.sh to simplify running tutorial examples with configurable GPU counts and node settings.
  • Updated run_distributed_examples.sh to include a new function for running all DDP tutorial series examples.

Dependency updates:

  • Raised the minimum PyTorch version requirement in requirements.txt to 2.7 to ensure compatibility with the new torch.accelerator API (a quick runtime check is sketched below).
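
As a complementary runtime guard (a sketch, not part of the PR), a script can fail fast when the installed PyTorch predates the torch.accelerator API:

```python
import torch

# The examples now require PyTorch >= 2.7; older builds lack torch.accelerator.
if not hasattr(torch, "accelerator"):
    raise RuntimeError(
        f"PyTorch >= 2.7 is required for these examples (found {torch.__version__})."
    )

print("Accelerator available:", torch.accelerator.is_available())
```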

CC: @msaroufim @malfet @dvrogozh


netlify bot commented Jul 21, 2025

Deploy Preview for pytorch-examples-preview canceled.

🔨 Latest commit: cb48338
🔍 Latest deploy log: https://app.netlify.com/projects/pytorch-examples-preview/deploys/687ea60fd2e52400086a7789

The meta-cla bot added the cla signed label on Jul 21, 2025.