ROCM-20519 - stabilize DeviceSynchronize functional race with scoped … by jaydeeppatel1111 · Pull Request #4250 · ROCm/rocm-systems

jaydeeppatel1111 · 2026-03-20T14:13:04Z

…copy size.

Motivation

Unit_hipDeviceSynchronize_Functional's pre-sync assertion races on fast GPUs
when transfers are tiny. Use a dedicated copy size for that test only (default
1 GiB) so Nullstream stays on small _SIZE.

Technical Details

Add HIP_TEST_DEVICE_SYNCHRONIZE_FUNCTIONAL_COPY_MB to tune or shrink on
memory-limited CI. Refresh comments to describe H2D/kernel/D2H ordering.

Issue observed on MI300a 4xGPUs.

JIRA ID

ROCM-20519

Test Plan

NA

Test Result

NA

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

…copy size.

Copilot

Pull request overview

Adjusts Unit_hipDeviceSynchronize_Functional to reduce flakiness from a pre-sync host read racing with fast GPU execution by introducing a dedicated (configurable) transfer size for that test.

Changes:

Added functionalDeviceSynchronizeCopyBytes() to select a larger per-test copy size, configurable via HIP_TEST_DEVICE_SYNCHRONIZE_FUNCTIONAL_COPY_MB.
Switched the functional test’s host/device allocations and async copies from _SIZE to the dedicated copyBytes.
Updated comments to better describe the intended H2D/kernel/D2H ordering and the rationale for the larger transfer.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-20T14:18:35Z

projects/hip-tests/catch/unit/device/hipDeviceSynchronize.cc

+  // 1 GiB per buffer (four allocations in this test ≈ 4 GiB total) — a practical default
+  // for fast links; use a smaller HIP_TEST_DEVICE_SYNCHRONIZE_FUNCTIONAL_COPY_MB on
+  // memory-constrained runners if the pre-sync check is not required there.
+  return 1024ULL * 1024 * 1024;


The default functionalDeviceSynchronizeCopyBytes() return value (1 GiB) makes this test allocate ~2 GiB pinned host memory + ~2 GiB device memory and perform ~4 GiB of H2D/D2H traffic. That is likely to fail (hipErrorOutOfMemory / memlock limits) or significantly slow down/timing-flake on common CI runners. Consider choosing a smaller default and/or deriving the size from available memory (e.g., via hipMemGetInfo) and skipping the test when the requested allocation can’t be satisfied, with the env var used to increase when needed on fast GPUs.

Suggested change

// 1 GiB per buffer (four allocations in this test ≈ 4 GiB total) — a practical default

// for fast links; use a smaller HIP_TEST_DEVICE_SYNCHRONIZE_FUNCTIONAL_COPY_MB on

// memory-constrained runners if the pre-sync check is not required there.

return 1024ULL * 1024 * 1024;

// 64 MiB per buffer by default (four allocations in this test ≈ 256 MiB total).

// Use HIP_TEST_DEVICE_SYNCHRONIZE_FUNCTIONAL_COPY_MB to increase this on very fast GPUs

// if the pre-sync check still races, or to further reduce it on memory-constrained runners.

return 64ULL * 1024 * 1024;

@copilot , Below check fails on some of ASICs which makes it racy and increasing _SIZE makes the kernel and memcopies to take longer and hence check passes.
REQUIRE(NUM_ITERS != A[NUM_STREAMS - 1][0] - 1);

Copilot · 2026-03-20T14:18:36Z

projects/hip-tests/catch/unit/device/hipDeviceSynchronize.cc

+  // This check assumes per-stream work is still in flight (typically still in a large H2D
+  // or the long kernel) when the host reads A[NUM_STREAMS-1][0]. Very fast GPUs or
+  // HIP_LAUNCH_BLOCKING=true can still fail it; increase copy size via
+  // HIP_TEST_DEVICE_SYNCHRONIZE_FUNCTIONAL_COPY_MB or drop the assertion if needed.

  REQUIRE(NUM_ITERS != A[NUM_STREAMS - 1][0] - 1);


The pre-sync REQUIRE(NUM_ITERS != A[...]) is still inherently racy because it reads the host buffer without any synchronization while a D2H copy may eventually write to that same memory. Even with a larger copy size, this can still be nondeterministic across devices/settings (and the comment already notes it may fail). To make the test stable, consider removing the pre-sync assertion entirely (keeping only the post-hipDeviceSynchronize check), or replacing it with a deterministic readiness check (e.g., hipStreamQuery/event query) rather than a host read race.

@copilot open a new pull request to apply changes based on this feedback

Copilot · 2026-03-20T14:32:52Z

@jaydeeppatel1111 I've opened a new pull request, #4256, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot · 2026-03-20T14:33:08Z

@jaydeeppatel1111 I've opened a new pull request, #4257, to work on those changes. Once the pull request is ready, I'll request review from you.

ROCM-20519 - stabilize DeviceSynchronize functional race with scoped …

401a7ec

…copy size.

jaydeeppatel1111 requested a review from a team as a code owner March 20, 2026 14:13

Copilot AI review requested due to automatic review settings March 20, 2026 14:13

github-actions bot added the project: hip-tests label Mar 20, 2026

Copilot started reviewing on behalf of jaydeeppatel1111 March 20, 2026 14:15 View session

Copilot AI reviewed Mar 20, 2026

View reviewed changes

systems-assistant bot added the organization: ROCm label Mar 20, 2026

This was referenced Mar 20, 2026

[WIP] Address feedback on ROCM-20519: replace racy pre-sync host read with hipStreamQuery #4256

Draft

[WIP] Address feedback on ROCM-20519: stabilize DeviceSynchronize functional race #4257

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ROCM-20519 - stabilize DeviceSynchronize functional race with scoped …#4250

ROCM-20519 - stabilize DeviceSynchronize functional race with scoped …#4250
jaydeeppatel1111 wants to merge 1 commit intodevelopfrom
ROCM-20519

jaydeeppatel1111 commented Mar 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 20, 2026

Uh oh!

jaydeeppatel1111 Mar 20, 2026

Uh oh!

Copilot AI Mar 20, 2026

Uh oh!

jaydeeppatel1111 Mar 20, 2026

Uh oh!

Copilot AI commented Mar 20, 2026

Uh oh!

Copilot AI commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jaydeeppatel1111 commented Mar 20, 2026

Motivation

Technical Details

JIRA ID

Test Plan

Test Result

Submission Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

jaydeeppatel1111 Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

jaydeeppatel1111 Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI commented Mar 20, 2026

Uh oh!

Copilot AI commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants