[STF] Fix incorrect level index in 3-depth execution policy #6089
Conversation
Fixed a typo in places.cuh line 1665 where l2_size was incorrectly getting width from level 1 instead of level 2, causing Unsatisfiable
@19970126ljl Great catch, thanks a lot for all of your efforts! I'm hijacking this PR to tell you that you are welcome to send any feature requests or feedback on these launch constructs ([email protected]). We believe the launch API is useful, and the multi-level synchronization itself should be fine, although we could reduce the launch overhead and make it more efficient with CUDA graphs. However, mechanisms like apply_partition, which iterate over data subsets, are not yet optimized enough to be practical.
/ok to test a03c77a
Thank you for sharing such valuable insights! I'm currently learning and experimenting with cudaSTF to write some parallel programs, and I find it an excellent framework as well as a great learning resource. If I come across any new findings, ideas, or potential improvements while using it, I'll be sure to reach out and discuss them with you right away.
…iguration in multi-level specs
I've opened a PR against your branch with an additional unit test, which we could add to your fix.
…ndex Add a new test to ensure proper CUDA configurations
pre-commit.ci autofix
/ok to test d5350b6
I can't push to your branch, but I'll add an additional
🥳 CI Workflow Results
🟩 Finished in 14h 30m: Pass: 100%/53 | Total: 10h 24m | Max: 47m 08s | Hits: 81%/22571
Description
Fixes a typo in `places.cuh` line 1665 where `l2_size` was incorrectly getting its width from level 1 instead of level 2.
Closes #6090
Expected: 1 device × 4320 blocks × 256 threads
Actual error: "Maximum block size 640 threads, requested 4320 (level 2)"
The fix ensures `l2_size` correctly reads from level 2 (`p.get_width(2)`), which represents threads per block, not level 1, which represents the number of blocks. See `places.cuh`.
Checklist