Skip to content

[WIP] Address feedback on ROCM-20519: stabilize DeviceSynchronize functional race#4257

Draft
Copilot wants to merge 2 commits intoROCM-20519from
copilot/sub-pr-4250-again
Draft

[WIP] Address feedback on ROCM-20519: stabilize DeviceSynchronize functional race#4257
Copilot wants to merge 2 commits intoROCM-20519from
copilot/sub-pr-4250-again

Conversation

Copy link
Contributor

Copilot AI commented Mar 20, 2026

The pre-sync assertion in Unit_hipDeviceSynchronize_Functional was unconditional, causing spurious failures on fast ASICs where all stream work completes before the host reaches the check.

Motivation

REQUIRE(NUM_ITERS != A[NUM_STREAMS - 1][0] - 1) fires before hipDeviceSynchronize() to verify work is still in-flight. On fast ASICs the GPU finishes the entire pipeline (H2D → kernel → D2H) before the host reaches that line, making the assertion inherently racy regardless of synchronization correctness.

Technical Details

Guard pre-sync assertion with hipStreamQuery — skip rather than fail when the GPU has already finished:

// Before: always asserted, fails on fast ASICs
REQUIRE(NUM_ITERS != A[NUM_STREAMS - 1][0] - 1);

// After: only asserts when stream is confirmed still in-flight
if (hipStreamQuery(stream[NUM_STREAMS - 1]) == hipErrorNotReady) {
    REQUIRE(NUM_ITERS != A[NUM_STREAMS - 1][0] - 1);
}

hipStreamQuery returning hipErrorNotReady guarantees the D2H copy hasn't landed yet (it's the last op in the stream), so the assertion is semantically sound when it fires. When the GPU has finished, skipping is correct — not a false pass.

Keep the 1 GiB default in functionalDeviceSynchronizeCopyBytes() — a smaller value (e.g. the previously suggested 64 MiB) would allow fast ASICs to finish the transfer before the host reaches hipStreamQuery, causing the guard to skip on every run and making the pre-sync check entirely inert. 1 GiB widens the in-flight window so the assertion exercises the pre-sync path on typical hardware, while still gracefully skipping on very fast ASICs instead of failing.

The post-sync REQUIRE(NUM_ITERS == A[NUM_STREAMS - 1][0] - 1) is unchanged and remains the primary correctness check.

JIRA ID

ROCM-20519

Test Plan

Manual inspection of the guard logic. The test is inherently timing-dependent and best validated by running on the fast ASIC hardware that triggered the race.

Test Result

Pre-existing CI baseline unchanged. The guard eliminates the known spurious failure path on fast ASICs.

Submission Checklist


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

…ures on fast ASICs

Co-authored-by: jaydeeppatel1111 <106300970+jaydeeppatel1111@users.noreply.github.com>
Copilot AI changed the title [WIP] [WIP] Address feedback on ROCM-20519: stabilize DeviceSynchronize functional race [WIP] Address feedback on ROCM-20519: stabilize DeviceSynchronize functional race Mar 20, 2026
Copilot AI requested a review from jaydeeppatel1111 March 20, 2026 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants