# [RFC] Generalization of PyTorch framework UTs for non-CUDA device execution

**Authors:**
* @ankurneog

## **Summary**
Modify the PyTorch framework UTs so that non-CUDA devices such as Intel Gaudi and Intel XPU can harness the content and improve its quality.

## **Motivation**
The PyTorch framework UTs are a good indicator of device stack health; however, they are mostly written for CPU and CUDA devices, which restricts their use for non-CUDA devices.

We propose to modify the content wherever possible to make it available for non-CUDA device execution.

This will also ensure greater participation in content enhancement.

### **Examples**

* Execution is blocked for non-native devices through decorators such as ```onlyNativeDevices``` (see the test sketch after this list).
* Execution is restricted to CUDA only through decorators such as ```onlyNCCL``` or ```onlyCUDA```.
* A scalable mechanism is needed to select dtypes per op as described in OpInfo or ModuleInfo, instead of a separate per-device variable such as ```dtypesIfCUDA```.
* A scalable mechanism is needed to skip tests for different devices, instead of device-specific decorators such as ```skipIfCUDA```.
* The Dynamo content should be refactored to allow tweaking per platform/device, e.g. adding custom backends or skipping unsupported backends.
* The distributed content assumes most execution is done with NCCL and Gloo, with almost the entire non-CPU content hard-coded for NCCL.
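
For illustration, a minimal sketch of how such CUDA-centric decorators appear in device-generic test content today. The test class and test bodies are hypothetical; the decorators and ```instantiate_device_type_tests``` are the existing utilities from ```torch.testing._internal``` (where the native-device restriction is spelled ```onlyNativeDeviceTypes```).

```python
# Hypothetical test content (not taken from the PyTorch tree) illustrating the
# decorators referenced above; the imports are existing test utilities.
import torch
from torch.testing._internal.common_device_type import (
    dtypes,
    dtypesIfCUDA,
    instantiate_device_type_tests,
    onlyCUDA,
    onlyNativeDeviceTypes,
)
from torch.testing._internal.common_utils import TestCase, run_tests


class TestExample(TestCase):
    @onlyNativeDeviceTypes                    # blocks out-of-tree devices such as hpu
    @dtypes(torch.float32)
    @dtypesIfCUDA(torch.float32, torch.half)  # per-device dtype override exists for CUDA only
    def test_add(self, device, dtype):
        x = torch.ones(4, device=device, dtype=dtype)
        self.assertEqual(x + x, 2 * x)

    @onlyCUDA                                 # silently skipped on every non-CUDA device
    def test_cuda_only_path(self, device):
        self.assertTrue("cuda" in device)


# generates per-device classes such as TestExampleCPU and TestExampleCUDA
instantiate_device_type_tests(TestExample, globals())

if __name__ == "__main__":
    run_tests()
```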

## **Proposed Implementation**
Since the content is huge, we propose a staggered approach to the implementation.

Steps:
* Remove the restriction imposed through ```@onlyNativeDevices``` in core content and replace it with hooks so that supported devices can enable their content selectively.
These hooks should be flexible enough to support both in-tree and out-of-tree devices.
* Dtypes for a device should be loaded dynamically per op from a common dictionary, instead of using a different variable per device, e.g. ```dtypesIfCUDA``` (sketch below).
* Miscellaneous decorators such as ```@skipIfCUDA``` should be generalized to ```@skipIfDevice``` (sketch below).
* Extend the use of ```instantiate_device_type_tests``` to all content, so that developers are forced to write generalized device code rather than hard-coding "cuda" or "cpu".
* Generalize the common distributed content so that it can be extended to non-NCCL backends such as Intel's HCCL and CCL (sketch below).
* Generalize the Dynamo content for the specific backends that other devices might want to verify with the existing content; the backends should always be drawn from a list that is abstracted out and can be appended per device per test case (sketch below).
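
A minimal sketch of what the common per-op dtype dictionary could look like; ```OP_DEVICE_DTYPES``` and ```dtypes_for``` are illustrative names, not existing PyTorch APIs.

```python
# Illustrative sketch only: a single per-op dtype registry keyed by device type,
# replacing per-device fields such as dtypesIfCUDA.
import torch

OP_DEVICE_DTYPES = {
    "add": {
        "cpu":  (torch.float32, torch.float64),
        "cuda": (torch.float32, torch.float16),
        "hpu":  (torch.float32, torch.bfloat16),  # registered by the Gaudi test plugin
        "xpu":  (torch.float32, torch.float16),   # registered by the XPU test plugin
    },
}


def dtypes_for(op_name, device_type, default=(torch.float32,)):
    """Return the dtypes to run for `op_name` on `device_type`."""
    return OP_DEVICE_DTYPES.get(op_name, {}).get(device_type, default)


# Inside a device-generic test body the dtype list is looked up instead of
# being hard-coded per device:
#   for dtype in dtypes_for("add", torch.device(device).type):
#       ...
```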
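
A minimal sketch of the proposed generalized skip decorator, assuming the test receives the ```device``` string from the device-generic instantiation; ```skipIfDevice``` is a proposed name, not an existing decorator.

```python
# Hypothetical generalization of skipIfCUDA: skip a device-generic test when it
# is instantiated for any of the listed device types. Illustrative only.
import functools
import unittest

import torch


def skipIfDevice(*device_types, reason="not supported on this device"):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, device, *args, **kwargs):
            if torch.device(device).type in device_types:
                raise unittest.SkipTest(reason)
            return fn(self, device, *args, **kwargs)
        return wrapper
    return decorator


# usage on a device-generic test:
#   @skipIfDevice("cuda", "hpu", reason="op not implemented")
#   def test_foo(self, device): ...
```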
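
For the distributed content, a sketch of deriving the process-group backend from the device type rather than hard-coding ```nccl```; the mapping and ```get_backend_for_device``` are illustrative, while ```init_process_group``` is the existing ```torch.distributed``` API.

```python
# Illustrative sketch: pick the distributed backend from the device type instead
# of hard-coding "nccl" in the common test harness. The mapping and helper name
# are hypothetical; out-of-tree stacks would extend the mapping.
import torch.distributed as dist

DEVICE_TO_BACKEND = {
    "cuda": "nccl",
    "cpu": "gloo",
    "hpu": "hccl",  # Intel Gaudi
    "xpu": "ccl",   # Intel XPU (oneCCL)
}


def get_backend_for_device(device_type: str) -> str:
    return DEVICE_TO_BACKEND.get(device_type, "gloo")


# inside the common distributed test setup:
#   dist.init_process_group(backend=get_backend_for_device(device_type),
#                           rank=rank, world_size=world_size)
```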
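
For the Dynamo content, a sketch of keeping the backends under test in an abstracted list that a device can append to; ```BACKENDS_TO_TEST``` and ```register_test_backend``` are illustrative names, while ```torch.compile(backend=...)``` is the existing API.

```python
# Illustrative sketch: Dynamo tests iterate over an abstracted backend list
# instead of hard-coding "inductor"; a device-specific plugin appends its own
# backend. The list and helper names are hypothetical.
import torch

BACKENDS_TO_TEST = ["eager", "aot_eager", "inductor"]


def register_test_backend(name: str) -> None:
    """Called from a device's test plugin, e.g. register_test_backend("hpu_backend")."""
    if name not in BACKENDS_TO_TEST:
        BACKENDS_TO_TEST.append(name)


def fn(x):
    return torch.relu(x) + 1


# a test case would loop over the abstracted list, skipping unsupported entries
for backend in BACKENDS_TO_TEST:
    compiled = torch.compile(fn, backend=backend)
    out = compiled(torch.randn(4))
```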

#### Metrics
Other devices can track their pass percentage and become part of the CI once their coverage and pass percentage are good.

#### Additional Context
Towards adding support for Intel Gaudi devices, we have already made a couple of changes in this regard:
* Removing onlyNativeDevice: https://github.com/pytorch/pytorch/pull/128584
* Changing Dynamo content: https://github.com/pytorch/pytorch/pull/130714
* Generalizing distributed content: https://github.com/pytorch/pytorch/pull/131758
* Generalizing FSDP content: https://github.com/pytorch/pytorch/pull/133209

More to follow.

### Next Steps
As part of introducing support for Intel Gaudi, which is an out-of-tree device, we are already introducing changes in a manner that can be reused by other devices as well.