CHANGELOG.md
99 additions and 5 deletions (+99 -5)
@@ -2,6 +2,40 @@

 Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs.amd.com/projects/HIP/en/latest/index.html)

+## HIP 6.5 for ROCm 6.5
+
+### Added
+
+* New support for Open Compute Project (OCP) floating-point `FP4`/`FP6`/`FP8` types, as listed below. For details, see the [low-precision floating-point document](https://rocm.docs.amd.com/projects/HIP/en/latest/reference/low_fp_types.html).
+    - Data types for `FP4`/`FP6`/`FP8`.
+    - HIP APIs for `FP4`/`FP6`/`FP8`, which are compatible with the corresponding CUDA APIs.
+    - HIP extension APIs for microscaling formats, which are supported on AMD GPUs.
+* New `wptr` and `rptr` values in `ClPrint`, for better logging in dispatch barrier methods.
+* New debug mask to print precise code object information for logging.
+
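For readers unfamiliar with the OCP formats named in the `Added` list above, here is a minimal host-side sketch of how one OCP `FP8` E4M3 byte maps to a `float`. It is plain C++ and does not use the HIP headers; `decode_fp8_e4m3` is a hypothetical helper written for illustration, not a HIP API. The layout assumed is the OCP E4M3 encoding: 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and NaN encoded as all exponent and mantissa bits set.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of HIP): decode one OCP FP8 E4M3 byte to float.
float decode_fp8_e4m3(std::uint8_t bits) {
    const int sign = (bits >> 7) & 0x1;      // 1 sign bit
    const int exponent = (bits >> 3) & 0xF;  // 4 exponent bits, bias 7
    const int mantissa = bits & 0x7;         // 3 mantissa bits

    // E4M3 has no infinities; only S.1111.111 encodes NaN.
    if (exponent == 0xF && mantissa == 0x7) {
        return std::numeric_limits<float>::quiet_NaN();
    }

    float magnitude;
    if (exponent == 0) {
        // Subnormal: no implicit leading 1, fixed exponent of 1 - bias = -6.
        magnitude = std::ldexp(static_cast<float>(mantissa) / 8.0f, -6);
    } else {
        // Normal: implicit leading 1, exponent stored with bias 7.
        magnitude = std::ldexp(1.0f + static_cast<float>(mantissa) / 8.0f, exponent - 7);
    }
    return sign ? -magnitude : magnitude;
}
```

Under these assumptions, `0x38` decodes to 1.0 and `0x7E` decodes to 448, the largest finite E4M3 value. In real HIP code the conversions would come from the low-precision types described in the linked document instead of a hand-written decoder.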
+### Changed
+
+* Some unsupported GPUs, such as gfx8 and gfx7, are deprecated on Microsoft Windows.
+
+### Optimized
+
+The HIP runtime has the following functional improvements, which improve runtime performance and user experience.
+
+* Reduced usage of the lock scope in event and kernel handling.
+    - Switches to `shared_mutex` for event validation, and uses `std::unique_lock` instead of `scopedLock` in the HIP runtime to create/destroy events.
+    - Reduces the use of `scopedLock` in the handling of kernel execution. The HIP runtime now calls `scopedLock` during kernel binary creation/initialization, and
+      doesn't call it again during kernel vector iteration before launch.
+* Unified the managed buffer and the kernel argument buffer, so the HIP runtime doesn't need to create/load a separate kernel argument buffer.
+* Refactored memory validation to create a single function that validates a variety of memory copy operations.
+* Improved kernel logging by demangling shader names.
+* Advanced support for SPIRV: kernel compilation caching is now enabled by default. This feature is controlled by the environment variable `AMD_COMGR_CACHE`. For details, see the [hip_rtc document](https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/hip_rtc.html).
+* Programmatic support for the scratch limit on the GPU device. Developers can now use the environment variable `HSA_SCRATCH_SINGLE_LIMIT` to change the default allocation size to the expected scratch limit.
+
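The locking change in the first `Optimized` bullet above can be pictured with standard C++ primitives. The sketch below is not the HIP runtime's code: `EventRegistry` and its members are hypothetical names. It shows the general reader/writer pattern that the bullet describes, where validation (frequent, read-only) takes a shared lock on a `std::shared_mutex`, and creation/destruction (rare, mutating) takes an exclusive `std::unique_lock`. It needs C++17.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <unordered_set>

// Hypothetical registry of live event handles; illustration only.
class EventRegistry {
public:
    // Create: exclusive lock, because the set of live handles is modified.
    int create() {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        const int id = next_id_++;
        live_.insert(id);
        return id;
    }

    // Destroy: exclusive lock; returns false if the handle was not live.
    bool destroy(int id) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return live_.erase(id) == 1;
    }

    // Validate: shared lock, so many threads can validate at the same time.
    bool is_valid(int id) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return live_.count(id) != 0;
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_set<int> live_;
    int next_id_ = 1;
};
```

The design point is that concurrent `is_valid` calls do not serialize against each other; they only wait while a `create` or `destroy` holds the exclusive lock.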
+### Resolved issues
+
+* "Unable to find modules" error during HIP cleanup of code object modules.
+
+
 ## HIP 6.4 (For ROCm 6.4)

 ### Added
@@ -13,15 +47,75 @@ Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs
 - `hipGraphBatchMemOpNodeGetParams` returns the pointer of parameters from the batch memory operation node.
 - `hipGraphBatchMemOpNodeSetParams` sets parameters for the batch memory operation node.
 - `hipGraphExecBatchMemOpNodeSetParams` sets the parameters for a batch memory operation node in the given executable graph.
+- `hipLinkAddData` adds SPIRV code object data to the linker instance with options.
+- `hipLinkAddFile` adds a SPIRV code object file to the linker instance with options.
+- `hipLinkCreate` creates a linker instance at runtime with options.
+- `hipLinkComplete` completes linking of the program and outputs the linker binary for use with `hipModuleLoadData`.
+- `hipLinkDestroy` deletes the linker instance.
+
+### Changed
+
+* `roc-obj*` tools are being deprecated, and will be removed in an upcoming release.
+    - Perl package dependencies are now RECOMMENDS or SUGGESTS. Users will need to install these themselves.
+    - Support for ROCm Object tooling has moved into `llvm-objdump`, provided by the package `rocm-llvm`.
+* SDMA retainer logic is removed for engine selection in runtime buffer copy operations.
+
+### Optimized
+
+* `hipGraphLaunch` parallelism is improved for complex data-parallel graphs.
+* The round-robin queue mechanism is updated for command scheduling. For multi-stream execution, the HSA queue from the null stream lock is freed and won't occupy the queue ID after the kernel in the stream is finished.
+* The HIP runtime doesn't free the bitcode object before code generation. It adds a cache, which allows compiled code objects to be reused instead of recompiled. This improves performance on multi-GPU systems.
+* The runtime uses a unified copy approach:
+    - Unpinned `H2D` copies are no longer blocking for sizes up to 1 MB.
+    - The kernel copy path is enabled for unpinned `H2D`/`D2H` methods.
+    - The default value of the environment variable `GPU_FORCE_BLIT_COPY_SIZE` is set to `16`, which limits the kernel copy to sizes less than 16 KB, while copies above that are handled by the `SDMA` engine.
+    - Blit code is refactored and ASAN instrumentation is cleaned up.

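The size rule in the `GPU_FORCE_BLIT_COPY_SIZE` bullet above can be read as a simple threshold check. The sketch below is an illustration only, not the runtime's implementation: `CopyPath`, `choose_copy_path`, and the strict less-than comparison are assumptions drawn from the wording "sizes less than 16 KB", with 1 KB taken as 1024 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for the two copy paths mentioned in the changelog.
enum class CopyPath { Kernel, Sdma };

// Hypothetical helper: pick a copy path from the copy size and a limit in KB.
// The changelog gives 16 as the default of GPU_FORCE_BLIT_COPY_SIZE.
constexpr CopyPath choose_copy_path(std::size_t bytes, std::size_t blit_limit_kb = 16) {
    // Copies strictly below the limit use the kernel (blit) path;
    // everything else is assumed to go to the SDMA engine.
    return bytes < blit_limit_kb * 1024 ? CopyPath::Kernel : CopyPath::Sdma;
}
```

Under these assumptions, a 4 KB copy takes the kernel path and a copy of exactly 16 KB goes to SDMA. How the real runtime treats the boundary value is not stated in the changelog.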
 ### Resolved issues

-* Out of memory error on Windows. When the user calls the API hipMalloc for device memory allocation specifying a size larger than the available device memory, the HIP runtime fixes the error in the API implementation, allocating the available device memory plus system memory (shared virtual memory). The fix is not available on Linux.
+* Out of memory error on Windows. When the user calls `hipMalloc` to allocate device memory with a size larger than the available device memory, the HIP runtime now allocates the available device memory plus system memory (shared virtual memory).
+* Dependency error on `libgcc-s1` during `rocm-dev` installation on Debian Buster. The HIP runtime now uses `libgcc1` for this distribution.
+* Stack corruption during kernel execution. The HIP runtime now adds a maximum stack size limit based on the GPU device feature.
+
+### Upcoming changes
+
+The following is the list of backward-incompatible changes planned for the upcoming major ROCm release.
+
+* Signature changes in the following APIs, to match the corresponding CUDA APIs:
+    - `hiprtcCreateProgram`
+    - `hiprtcCompileProgram`
+    - `hipCtxGetApiVersion`
+* The behavior of `hipPointerGetAttributes` is changed to match the corresponding CUDA API in version 11 and later releases.
+* Updated return error/value codes in the following HIP APIs, to match the corresponding CUDA APIs:
+    - `hipModuleLaunchKernel`
+    - `hipExtModuleLaunchKernel`
+    - `hipModuleLaunchCooperativeKernel`
+    - `hipGetTextureAlignmentOffset`
+    - `hipTexObjectCreate`
+    - `hipBindTexture2D`
+    - `hipBindTextureToArray`
+    - `hipModuleLoad`
+    - `hipLaunchCooperativeKernelMultiDevice`
+    - `hipExtLaunchCooperativeKernelMultiDevice`
+
+* HIPRTC implementation: the compilation of hiprtc now uses the namespace `__hip_internal` instead of the `std` namespace from the standard headers.
+* Stream capture mode update in the following HIP APIs. Streams can only be captured in relaxed mode, to match the behavior of the corresponding CUDA APIs:
+    - `hipMallocManaged`
+    - `hipMemAdvise`
+    - `hipLaunchCooperativeKernelMultiDevice`
+    - `hipDeviceSetCacheConfig`
+    - `hipDeviceSetSharedMemConfig`
+    - `hipMemPoolCreate`
+    - `hipMemPoolDestroy`
+    - `hipDeviceSetMemPool`
+    - `hipEventQuery`
+* The implementation of `hipStreamAddCallback` is updated to match the behavior of CUDA.
+* Removal of hiprtc symbols from the HIP library.
+    - hiprtc will be an independent library; all hiprtc symbols in the HIP library are removed.
+    - Any application using hiprtc APIs should link explicitly with the hiprtc library.
+    - This change makes the usage of the hiprtc library on Linux the same as on Windows, and matches the behavior of CUDA nvrtc.
+* Removal of the deprecated struct `HIP_MEMSET_NODE_PARAMS`. Developers can use the definition `hipMemsetParams` instead.

-### Changed
-- roc-obj* tools are being deprecated, and will be removed in an upcoming release.
-- Perl package dependencies are now RECOMENDS or SUGGESTS. Users will need to install these themselves.
-- Support for ROCm Object tooling has moved into llvm-objdump provided by package rocm-llvm.