Commit 53dc5ee

sergey-kozub authored and copybara-github committed
PR #28782: [XLA:GPU] Annotate cuBLAS/cuDNN outputs to avoid initcheck failures
Imported from GitHub PR openxla/xla#28782

Upgrades NVTX to v3.2.1 and marks the outputs of cuBLAS/cuDNN as initialized (as compute-sanitizer may emit false positives for kernels using TMA).

Copybara import of the project:

--
55977057d4c3bc3008649cdedc7ddb7923780958 by Sergey Kozub <[email protected]>:

[XLA:GPU] Annotate cuBLAS/cuDNN outputs to avoid initcheck failures

Merging this change closes #28782

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#28782 from openxla:skozub/nvtx_init_annotation 55977057d4c3bc3008649cdedc7ddb7923780958
PiperOrigin-RevId: 788806680
1 parent 4ed7120 commit 53dc5ee

2 files changed: +22 -0 lines changed

tsl/profiler/lib/nvtx_utils.cc

Lines changed: 17 additions & 0 deletions
@@ -119,4 +119,21 @@ StringHandle RegisterString(ProfilerDomainHandle domain,
   buffer.append(suffix);
   return impl(buffer.c_str());
 }
+
+void MarkMemoryInitialized(void const* address, size_t size,
+                           StreamHandle stream) {
+  auto domain = DefaultProfilerDomain();
+  nvtxMemVirtualRangeDesc_t range_desc{size, address};
+  nvtxMemMarkInitializedBatch_t regions_desc{
+      NVTX_EXT_COMPATID_MEM,
+      sizeof(nvtxMemMarkInitializedBatch_t),
+      NVTX_MEM_TYPE_VIRTUAL_ADDRESS,
+      /*regionDescCount=*/1,
+      sizeof(nvtxMemVirtualRangeDesc_t),
+      &range_desc};
+  nvtxMemCudaMarkInitialized(reinterpret_cast<nvtxDomainHandle_t>(domain),
+                             reinterpret_cast<cudaStream_t>(stream),
+                             /*isPerThreadStream=*/false, &regions_desc);
+}
+
 }  // namespace tsl::profiler

tsl/profiler/lib/nvtx_utils.h

Lines changed: 5 additions & 0 deletions
@@ -79,4 +79,9 @@ void RangePush(ProfilerDomainHandle domain, StringHandle title,
 // version of RangePush
 uint64_t RegisterSchema(ProfilerDomainHandle domain, const void* schemaAttr);
 }  // namespace tsl::profiler
+
+// Mark a memory region as initialized.
+// This mitigates false positives from the compute sanitizer (initcheck).
+void MarkMemoryInitialized(void const* address, size_t size,
+                           tsl::profiler::StreamHandle stream);
 #endif  // TENSORFLOW_TSL_PROFILER_LIB_NVTX_UTILS_H_
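The call sites that annotate cuBLAS/cuDNN outputs are not part of this diff, so the following is only a rough host-side sketch of how a caller might use the new function: run the library routine, then declare its output buffer initialized on the same stream. Everything except the `MarkMemoryInitialized` signature is an assumption made for illustration: `StreamHandle` is aliased to `void*`, `MarkMemoryInitialized` is replaced by a recording stub instead of the NVTX-backed implementation, and `RunGemmAndAnnotate`, `MarkedRegion`, and `MarkedRegions` are hypothetical names. A plain CPU loop stands in for the cuBLAS GEMM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for tsl::profiler::StreamHandle (assumption: an opaque handle).
using StreamHandle = void*;

// One recorded annotation (hypothetical helper type, not part of the commit).
struct MarkedRegion {
  const void* address;
  size_t size;
  StreamHandle stream;
};

// Storage for the recording stub below (hypothetical).
std::vector<MarkedRegion>& MarkedRegions() {
  static std::vector<MarkedRegion> regions;
  return regions;
}

// Stub with the same signature as the function added in this commit.
// The real implementation forwards to NVTX; this one only records the call.
void MarkMemoryInitialized(void const* address, size_t size,
                           StreamHandle stream) {
  MarkedRegions().push_back({address, size, stream});
}

// Hypothetical call site: compute an n x n row-major product, then mark the
// whole output buffer as initialized on the stream that produced it.
void RunGemmAndAnnotate(const float* a, const float* b, float* out, size_t n,
                        StreamHandle stream) {
  assert(n > 0);
  // CPU stand-in for the library GEMM whose output would be annotated.
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (size_t k = 0; k < n; ++k) {
        acc += a[i * n + k] * b[k * n + j];
      }
      out[i * n + j] = acc;
    }
  }
  // The size is given in bytes and covers the full output region.
  MarkMemoryInitialized(out, n * n * sizeof(float), stream);
}
```

In this sketch the annotation is issued once per output buffer, right after the producing call and with the same stream handle, mirroring the single-region batch that the implementation above builds.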
