Add resident device change call #1517
Conversation
src/memory_provider.c
Outdated
        hProvider->provider_priv, deviceIndex, isAdding);
    checkErrorAndSetLastProvider(res, hProvider);
    return res;
}
add empty line
applied
src/provider/provider_cuda.c
Outdated
    return UMF_RESULT_ERROR_INVALID_ARGUMENT;
}

static umf_result_t
add a "default" handler in memory_provider.c (see umfDefaultCtlHandle) and do not modify CUDA provider
Where exactly in the code do you think the "default" is missing?
"do not modify CUDA provider"
Rejected. The umf_memory_provider_ops_t type received a new field, and it should be initialized in UMF_CUDA_MEMORY_PROVIDER_OPS like all the other fields. Please explain if you still want me to remove this change.
Please initialize cuda ops .ext_resident_device_change to NULL.
Next, define the umfDefaultResidentDeviceChange in src/memory_provider.c that returns UMF_RESULT_ERROR_NOT_SUPPORTED and set this handler for each created provider if the ext_resident_device_change is set to NULL (see how this is done for other default handlers like "umfDefaultCloseIPCHandle")
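The default-handler pattern requested in this comment can be sketched roughly as follows. This is a simplified illustration, not the UMF source: result_t, provider_ops_t, default_resident_device_change and assign_default_ops are hypothetical stand-ins for the real UMF types and for the umfDefaultResidentDeviceChange handler the reviewer describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified stand-ins for the real UMF result and ops types.
typedef enum { RESULT_SUCCESS = 0, RESULT_ERROR_NOT_SUPPORTED = 1 } result_t;

typedef struct {
    // Optional operation: a provider may leave this field NULL.
    result_t (*ext_resident_device_change)(void *provider,
                                           uint32_t device_index,
                                           bool is_adding);
} provider_ops_t;

// Default handler used when a provider does not implement the operation.
static result_t default_resident_device_change(void *provider,
                                               uint32_t device_index,
                                               bool is_adding) {
    (void)provider;
    (void)device_index;
    (void)is_adding;
    return RESULT_ERROR_NOT_SUPPORTED;
}

// Called once per created provider: replace a NULL optional op with its default.
static void assign_default_ops(provider_ops_t *ops) {
    if (ops->ext_resident_device_change == NULL) {
        ops->ext_resident_device_change = default_resident_device_change;
    }
}
```

With this shape, a provider such as CUDA only needs to set the field to NULL, and callers can invoke the op unconditionally and receive a "not supported" result.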
ok, applied
src/provider/provider_tracking.c
Outdated
        p->hUpstream, memory_property_id, size);
}

static umf_result_t trackingResidentDeviceChange(void *provider,
remove
Rejected. The umf_memory_provider_ops_t type received a new field, and it should be initialized in UMF_TRACKING_MEMORY_PROVIDER_OPS like all the other fields. Please explain if you still want me to remove this change.
applied in the same way as in cuda provider
test/common/provider_null.c
Outdated
    return UMF_RESULT_SUCCESS;
}

static umf_result_t nullResidentDeviceChange(void *provider,
remove
rejected, same reason as above
src/provider/provider_level_zero.c
Outdated
    if (deviceCount && !hDevices) {
        LOG_ERR("Resident devices array is NULL, but deviceCount is not zero");
    if (residentDevicesCount && !residentDevicesIndices) {
please also check if residentDevicesCount == 0 but residentDevicesIndices != NULL
additionally, should indices be unique?
ok, added checking for uniqueness
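The checks discussed in this thread (count and pointer must agree, indices must be unique) could look roughly like the sketch below. The function name validate_resident_indices and the result_t enum are hypothetical, and the range check against device_count is an added assumption, not something the thread states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the UMF result type.
typedef enum { RESULT_SUCCESS = 0, RESULT_ERROR_INVALID_ARGUMENT = 1 } result_t;

static result_t validate_resident_indices(const uint32_t *indices,
                                          uint32_t indices_count,
                                          uint32_t device_count) {
    // The pointer and the count must agree: both empty or both non-empty.
    if ((indices == NULL) != (indices_count == 0)) {
        return RESULT_ERROR_INVALID_ARGUMENT;
    }
    for (uint32_t i = 0; i < indices_count; i++) {
        // Assumption: every index must refer to an entry of the devices array.
        if (indices[i] >= device_count) {
            return RESULT_ERROR_INVALID_ARGUMENT;
        }
        // Quadratic uniqueness check; acceptable for a handful of devices.
        for (uint32_t j = 0; j < i; j++) {
            if (indices[j] == indices[i]) {
                return RESULT_ERROR_INVALID_ARGUMENT;
            }
        }
    }
    return RESULT_SUCCESS;
}
```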
src/provider/provider_level_zero.c
Outdated
}

struct ze_memory_provider_resident_device_change_data {
    bool isAdding;
is_adding
        change_data->source_memory_provider
            ->device_handles[change_data->peer_device_index];

    // TODO: add assertions to UMF and change it to be an assertion
use ASSERT() macros from src/utils/utils_assert.h
Postponed to a future PR. I need a permanent assertion here, not one that is compiled only in the debug configuration, as ASSERT is.
You assert only things that can be an artifact of an internal UMF error; if that is the case, a debug-only assert is fine. If an error can result from user input, you just handle the issue and return an error to the user.
In this case it is an artifact of an internal UMF error.
I believe what Łukasz was saying is that, since this is an artifact of an internal UMF error, a simple debug assert is OK. Please add assert() and remove the TODO.
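The rule the reviewers describe in this thread (assert on internal invariants, return an error for anything derived from user input) might be illustrated as below. This is a sketch only: alloc_info_t, check_device_index and tracked_base are hypothetical names, and the standard assert() stands in for whichever assertion macro the project settles on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for the UMF result type and tracker record.
typedef enum { RESULT_SUCCESS = 0, RESULT_ERROR_INVALID_ARGUMENT = 1 } result_t;
typedef struct {
    void *base;
} alloc_info_t;

// device_index comes from the caller, so a bad value is a user error:
// validate it and report the problem through the return code.
static result_t check_device_index(uint32_t device_index,
                                   uint32_t device_count) {
    if (device_index >= device_count) {
        return RESULT_ERROR_INVALID_ARGUMENT;
    }
    return RESULT_SUCCESS;
}

// key and info->base are both produced by the library's own tracker, so a
// mismatch can only be an internal bug: a debug-only assert is enough.
static void *tracked_base(const alloc_info_t *info, const void *key) {
    assert(info->base == key);
    return info->base;
}
```

Note that assert() is compiled out when NDEBUG is defined, which is exactly the "debug-only" behavior discussed above.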
            info->props.base, info->props.base_size);
    } else {
        result = ZE_RESULT_SUCCESS;
        // TODO: currently not implemented call evict here
do you plan to add the missing code here?
Not in this PR.
src/provider/provider_level_zero.c
Outdated
    if (ze_provider->resident_device_count == 0 ||
        existing_peer_index == ze_provider->resident_device_count) {
        // not found
        if (!isAdding) { // impossible for UR, should be an assertion
if this should be an assertion, please use the ASSERT() macro from src/utils/utils_assert.h
Also, move the comment to the next line
To be changed in a future PR, once UMF has permanent assertions. Comment moved.
src/provider/provider_level_zero.c
Outdated
static umf_result_t
ze_memory_provider_resident_device_change(void *provider, uint32_t device_index,
                                          bool isAdding) {
is_adding
src/provider/provider_level_zero.c
Outdated
    // adding case
    if (ze_provider->device_count <=
        ze_provider
            ->resident_device_count) { // impossible for UR, should be an assertion
if this should be an assertion, please use the ASSERT() macro from src/utils/utils_assert.h
Also, move the comment to the next line
Postponed until UMF has permanent assertions; comment moved.
src/provider/provider_level_zero.c
Outdated
    }

    if (ze_provider->device_count <=
        device_index) { // impossible for UR, should be an assertion
.
applied as above
src/provider/provider_level_zero.c
Outdated
    } else {
        // found
        if (isAdding) { // impossible for UR, should be an assertion
.
applied as above
src/provider/provider_level_zero.c
Outdated
        LOG_ERR("umfMemoryTrackerIterateAll did not manage to do some change "
                "numFailed:%d, numSuccess:%d",
                privData.success_changes, privData.failed_changes);
        return UMF_RESULT_ERROR_INVALID_ARGUMENT; // probably some other result is better, best just change into assertion
you could use UMF_RESULT_ERROR_MEMORY_PROVIDER_SPECIFIC
or change to assert
d53a23f
to
9d7e12c
Compare
src/libumf.def
Outdated
    umfMemoryProviderPurgeForce
    umfMemoryProviderPurgeLazy
    umfMemoryProviderPutIPCHandle
    umfMemoryProviderResidentDeviceChange
Please move this down in the file (after the "; Added in UMF_1.0" comment).
First of all, the wrong function was moved; umfMemoryProviderResidentDeviceChange is the new one. Also, mea culpa: it belongs in the "; Added in UMF_1.1" section.
Moved down in 01d74ba
umfLevelZeroMemoryProviderResidentDeviceChange is still not in the 1.1 section (it should be right after umfLevelZeroMemoryProviderParamsSetName).
include/umf/memory_provider_ops.h
Outdated
    /// @brief Adds or removes devices on which allocations should be made
    /// resident.
    /// @param provider handle to the memory provider
    /// @param device_index identifier of device
    /// @param is_adding Boolean indicating if peer is to be removed or added
    /// @return UMF_RESULT_SUCCESS on success or appropriate error code on
    /// failure.
    umf_result_t (*ext_resident_device_change)(void *provider,
                                               uint32_t device_index,
                                               bool is_adding);
This function should not be in OPS. It is very specific to the L0 provider. Resident devices are passed to the L0 provider through provider-specific params, so they should also be controlled through a provider-specific API. This is why I think it should be implemented through CTL.
Rejected. We discussed this on Teams (4th Aug, 25) and agreed to do it as ops. Quoting the discussion (translated from Polish):
Piotr: "As for the API you are working on (along the lines of 'give me all allocated pages'), I would probably suggest making it an API rather than doing it through CTL."
Rafał: "Then a dedicated API does indeed seem to fit better than CTL."
CTL should be for statistics, not a universal, hard-to-read tool for implementing any API.
Łukasz (me): "Perhaps you will want to simplify ext_ctl so that it is not a machine with which anything can be implemented, but serves only for statistics. But that is not my area of expertise; I leave it for you to think over."
We talked about API to iterate over all allocations, not about API to modify some internal settings of the L0
I can see it both ways. My comment during that discussion was about a generic functionality implementable for all providers. This isn't that.
I also remember making a point about CTL being useful for provider/pool-specific functionality.
On the other hand, adding an API function is simpler.
"On the other hand, adding an API function is simpler."
Thank you for the comment. I chose the simpler way.
/// @brief Set the resident devices in the parameters struct.
/// @param hParams handle to the parameters of the Level Zero Memory Provider.
/// @param hDevices array of devices for which the memory should be made resident.
/// @param deviceCount number of devices for which the memory should be made resident.
/// @param hDevices array of all devices for which the memory can be made resident.
/// @param deviceCount number of devices for which the memory can be made resident.
/// @param residentDevicesIndices array of indices in all devices array to devices for which the memory should be made resident.
/// @param residentDevicesCount number of items in indices array.
/// @return UMF_RESULT_SUCCESS on success or appropriate error code on failure.
umf_result_t umfLevelZeroMemoryProviderParamsSetResidentDevices(
    umf_level_zero_memory_provider_params_handle_t hParams,
    ze_device_handle_t *hDevices, uint32_t deviceCount);
    ze_device_handle_t *hDevices, uint32_t deviceCount,
    uint32_t *residentDevicesIndices, uint32_t residentDevicesCount);
This is an API (and ABI) break. We are past the 1.0 release, so you cannot make changes like this.
Shall I bump some number? Only UR uses this API and I am changing UR right now as well.
Even if only UR uses this API, an old UR should work with a new UMF. And we will not do a 2.0 release right after 1.0.
You cannot break the API. If changes are needed, we have to keep the old function working correctly and add a new one.
We can't break backwards compatibility with SYCL / UR. Changing a major version is a significant undertaking that involves updating the components across all the different layers (UMF is nearly the lowest-most component in the stack). We've been burned on this in the past, and it's very disruptive.
In this case, my suggestion would be to simply create a new function that sets the indices.
OK, applied, I rewrote code in 53ec753 to be backward compatible
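The approach suggested in this thread (keep the released function untouched and add a new one for the indices) can be sketched as follows. The code is an illustration under assumptions, not the change made in 53ec753: params_t, device_handle_t and all three function names are hypothetical, and so is the rule that every listed device counts as resident until the new setter is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for the UMF result type and ze_device_handle_t.
typedef enum { RESULT_SUCCESS = 0, RESULT_ERROR_INVALID_ARGUMENT = 1 } result_t;
typedef void *device_handle_t;

typedef struct {
    device_handle_t *devices;
    uint32_t device_count;
    const uint32_t *resident_indices;
    uint32_t resident_count;
    bool has_indices; // set only by the new entry point
} params_t;

// Existing entry point: signature and behavior stay exactly as released.
static result_t params_set_resident_devices(params_t *params,
                                            device_handle_t *devices,
                                            uint32_t device_count) {
    if (params == NULL || (device_count > 0 && devices == NULL)) {
        return RESULT_ERROR_INVALID_ARGUMENT;
    }
    params->devices = devices;
    params->device_count = device_count;
    return RESULT_SUCCESS;
}

// New entry point, added next to the old one instead of changing it.
static result_t params_set_resident_device_indices(params_t *params,
                                                   const uint32_t *indices,
                                                   uint32_t count) {
    if (params == NULL || (count > 0 && indices == NULL)) {
        return RESULT_ERROR_INVALID_ARGUMENT;
    }
    params->resident_indices = indices;
    params->resident_count = count;
    params->has_indices = true;
    return RESULT_SUCCESS;
}

// Assumption: callers that never use the new setter keep the old meaning,
// where every device in the array is resident.
static bool params_is_resident(const params_t *params, uint32_t device_index) {
    if (device_index >= params->device_count) {
        return false;
    }
    if (!params->has_indices) {
        return true;
    }
    for (uint32_t i = 0; i < params->resident_count; i++) {
        if (params->resident_indices[i] == device_index) {
            return true;
        }
    }
    return false;
}
```

An old client that only calls the first setter keeps working unchanged, while a new client can narrow the resident set afterwards.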
14 / 16 files reviewed (to be continued)
include/umf/memory_provider_ops.h
Outdated
    /// failure.
    umf_result_t (*ext_resident_device_change)(void *provider,
                                               uint32_t device_index,
                                               bool is_adding);
I'd rather we have two functions - add/remove.
I would rather not apply this, because it would introduce too much copy-pasted code.
88f672b
to
53ec753
Compare
I am still opposed to adding provider-specific code to the ops interface. Note: remember that you will have to call umfMemoryProviderGetPriv() in this function to retrieve ze_memory_provider_t from the provider handle.
src/libumf.def
Outdated
    umfCUDAMemoryProviderParamsSetContext
    umfCUDAMemoryProviderParamsSetDevice
    umfCUDAMemoryProviderParamsSetMemoryType
    umfLevelZeroMemoryProviderParamsSetResidentDevices
please keep alphabetical order
ok, applied
src/provider/provider_level_zero.c
Outdated
static void init_ze_global_state(void) {

    char *lib_name = getenv("UMF_ZE_LOADER_LIB_NAME");
Why do we need this code? Will this code path be used by UR? If not, please remove it.
See MockedLevelZeroTestEnvironment::SetUp. This variable is needed by tests that mock L0.
src/provider/provider_level_zero.c
Outdated
static void init_ze_global_state(void) {

    char *lib_name = getenv("UMF_ZE_LOADER_LIB_NAME");
Can you do this through CTL, please?
We created CTL to limit the number of new environment variables.
This is for tests only. What is wrong with environment variables? They are simple to use in code, and it is easy to understand what they do. Anyway, I can do it through CTL. Please tell me how to set the value so that it takes effect this early, during global state initialization, and how to read it.
    hParams->resident_device_handles = hDevices;
    hParams->resident_device_count = deviceCount;
Shouldn't we copy this array? After calling this function, the user can do anything with it.
This is how it worked before. UR code is aware of it and keeps params until they are used. hParams is copied in ze_memory_provider_initialize. Decided not to fix this in this PR.
            ze_provider->context, ze_provider->resident_device_handles[i],
            *resultPtr, size);
        if (ze_result != ZE_RESULT_SUCCESS) {
            utils_read_unlock(&ze_provider->resident_device_rwlock);
If we return a failure, it means the allocation failed, but the allocation is still "done". IMHO we should free it here.
good point, fixed
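The fix agreed on here (release the memory when a later step of the allocation path fails) follows a common cleanup pattern, sketched below with plain malloc/free. Everything in the sketch is hypothetical: alloc_resident, make_resident_fn and the two stub backends merely stand in for the Level Zero allocation and make-resident calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical callback standing in for the "make memory resident" call;
// returns 0 on success and non-zero on failure.
typedef int (*make_resident_fn)(void *ptr, uint32_t device_index);

// Stub backends used only to exercise both paths of the sketch.
static int stub_always_ok(void *ptr, uint32_t device_index) {
    (void)ptr;
    (void)device_index;
    return 0;
}

static int stub_fail_on_device_1(void *ptr, uint32_t device_index) {
    (void)ptr;
    return device_index == 1 ? -1 : 0;
}

static void *alloc_resident(size_t size, const uint32_t *devices,
                            uint32_t device_count,
                            make_resident_fn make_resident) {
    void *ptr = malloc(size);
    if (ptr == NULL) {
        return NULL;
    }
    for (uint32_t i = 0; i < device_count; i++) {
        if (make_resident(ptr, devices[i]) != 0) {
            // The caller will see a failed allocation, so the memory that
            // was already obtained must not outlive this function.
            free(ptr);
            return NULL;
        }
    }
    return ptr;
}
```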
src/provider/provider_level_zero.c
Outdated
    // TODO: add assertions to UMF and change it to be an assertion
    if (info->props.base != (void *)key) {
        LOG_ERR("key:%p is different than base:%p", (void *)key,
LOG_FATAL
ok, applied
src/utils/utils_log.h
Outdated
#define LOG_DEBUG(...) utils_log(LOG_DEBUG, __func__, __VA_ARGS__);
#define LOG_INFO(...) utils_log(LOG_INFO, __func__, __VA_ARGS__);
#define LOG_WARN(...) utils_log(LOG_WARNING, __func__, __VA_ARGS__);
#define LOG_ERR(...) utils_log(LOG_ERROR, __func__, __VA_ARGS__);
#define LOG_FATAL(...) utils_log(LOG_FATAL, __func__, __VA_ARGS__);

#define LOG_PDEBUG(...) utils_plog(LOG_DEBUG, __func__, __VA_ARGS__);
#define LOG_PINFO(...) utils_plog(LOG_INFO, __func__, __VA_ARGS__);
#define LOG_PWARN(...) utils_plog(LOG_WARNING, __func__, __VA_ARGS__);
#define LOG_PERR(...) utils_plog(LOG_ERROR, __func__, __VA_ARGS__);
#define LOG_PFATAL(...) utils_plog(LOG_FATAL, __func__, __VA_ARGS__);
#ifdef UMF_DEVELOPER_MODE
#define UMF_STRINGIFY(x) #x
#define UMF_TOSTRING(x) UMF_STRINGIFY(x)
#define UMF_FUNC_DESC() __FILE__ ":" UMF_TOSTRING(__LINE__)
#else
#define UMF_FUNC_DESC() __func__
#endif

#define LOG_DEBUG(...) utils_log(LOG_DEBUG, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_INFO(...) utils_log(LOG_INFO, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_WARN(...) utils_log(LOG_WARNING, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_ERR(...) utils_log(LOG_ERROR, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_FATAL(...) utils_log(LOG_FATAL, UMF_FUNC_DESC(), __VA_ARGS__);

#define LOG_PDEBUG(...) utils_plog(LOG_DEBUG, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_PINFO(...) utils_plog(LOG_INFO, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_PWARN(...) utils_plog(LOG_WARNING, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_PERR(...) utils_plog(LOG_ERROR, UMF_FUNC_DESC(), __VA_ARGS__);
#define LOG_PFATAL(...) utils_plog(LOG_FATAL, UMF_FUNC_DESC(), __VA_ARGS__);
This should be a separate PR.
There are comments I would make, but I strongly believe this discussion should happen in a separate PR so that it does not block this one.
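For readers unfamiliar with the two-level stringification used by UMF_FUNC_DESC in the diff above, the sketch below shows why the extra macro layer is needed. The DEMO_ macro names and the three helper functions are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

// One level: the # operator stringizes the argument without expanding it.
#define DEMO_STRINGIFY(x) #x
// Two levels: the argument is macro-expanded first, then stringized.
#define DEMO_TOSTRING(x) DEMO_STRINGIFY(x)
// Adjacent string literals are concatenated, giving "file:line".
#define DEMO_LOCATION() __FILE__ ":" DEMO_TOSTRING(__LINE__)

// Yields the literal text "__LINE__", not a number.
static const char *one_level(void) { return DEMO_STRINGIFY(__LINE__); }

// Yields the line number of this function as a string.
static const char *two_level(void) { return DEMO_TOSTRING(__LINE__); }

// Yields the file name, a colon, and the line number as one string.
static const char *location(void) { return DEMO_LOCATION(); }
```

Because the result is a compile-time string literal, it can be passed wherever the logging macros previously passed __func__.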
src/provider/provider_level_zero.c
Outdated
        // found
        if (is_adding) {
            utils_write_unlock(&ze_provider->resident_device_rwlock);
            // impossible for UR, should be an assertion
IMHO we should not assume that the user will never do something and therefore turn the check into an assertion. If the value can come from any kind of user, it should be validated and a proper error returned.
ok, comment removed
    uint32_t existing_peer_index = 0;
    utils_write_lock(&ze_provider->resident_device_rwlock);
    while (existing_peer_index < ze_provider->resident_device_count &&
NIT: this code would be much easier to read if it were a for loop:
for (index = 0; index < count; index++) if (device[index] == index) break;
I need the loop index to be available outside the loop, and I don't think a for loop is more readable. Left as is.
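For comparison, a for loop can keep its index visible after the loop if the variable is declared before it. The sketch below is only an illustration of that option: find_index is a hypothetical helper, and it compares each element against the value being searched for, which is presumably what the reviewer's one-liner intended.

```c
#include <assert.h>
#include <stdint.h>

// Returns the position of wanted in items, or count when it is absent.
static uint32_t find_index(const uint32_t *items, uint32_t count,
                           uint32_t wanted) {
    uint32_t i; // declared outside the loop so it stays in scope afterwards
    for (i = 0; i < count; i++) {
        if (items[i] == wanted) {
            break;
        }
    }
    // i == count means "not found", the same convention as the while loop
    // under review.
    return i;
}
```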
        const uint32_t new_capacity =
            ze_provider->resident_device_capacity * 2 +
            1; // +1 to work also with old capacity == 0
        ze_device_handle_t *new_handles =
Is this code performance critical, or will it be called many times?
If not, I would prefer to always reallocate this array by one element.
If yes, can we abstract this vector-like structure into a separate module or, even better, just use the list structure that is already included in UMF?
ok, not critical, applied
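Growing the array by exactly one element, as agreed above, removes the separate capacity bookkeeping. A minimal sketch using the standard realloc follows; append_handle and its void * element type are hypothetical and not the code that was applied in the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Appends new_handle to a heap array that is always exactly *count long.
// Returns 0 on success and -1 if memory could not be obtained.
static int append_handle(void ***handles, uint32_t *count, void *new_handle) {
    void **grown = realloc(*handles, ((size_t)*count + 1) * sizeof(void *));
    if (grown == NULL) {
        // realloc leaves the original array untouched on failure.
        return -1;
    }
    grown[*count] = new_handle;
    *handles = grown;
    *count += 1;
    return 0;
}
```

The trade-off is one reallocation per added device, which is acceptable when the call is rare, as stated in the thread.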
        .failed_changes = 0,
    };

    umf_result_t result = umfMemoryTrackerIterateAll(
This is "hacky", and it will not work if someone uses a pool without the tracker, or uses the provider without a pool.
We already solved this issue in other providers, where the provider itself keeps track of allocations (for example, see os_provider).
You told me that I could use the tracker. Since the L0 provider is currently never used without the tracker, I am only leaving a comment here that warns about what you noticed.
Great! Using umfLevelZeroMemoryProviderResidentDeviceChange simplified the code even more than using ops. Applied with pleasure.

// defined in both libze_ops from src/utils/utils_level_zero.cpp and
// ze_ops_t operations from src/provider/provider_level_zero.c
// zeDeviceGetProperties, zeMemFree, zeMemGetAllocProperties, zeMemFree, zeMemAllocDevice, find them in the former ones
No newline at end of file
missing EOF
 *
 */

#ifndef UMF_PROVIDER_LEVEL_ZERO_MOCKS_H
UMF_PROVIDER_LEVEL_ZERO_MOCKS_H -> UMF_TEST_PROVIDER_LEVEL_ZERO_MOCKS_H
    MOCK_METHOD2(zeMemFree,
                 ze_result_t(ze_context_handle_t hContext, void *ptr));

    // helper setting all expects related to successful l0 provider creation & initialization and calling its creation & initialization
pls break a long line
// Under the Apache License v2.0 with LLVM Exceptions. See LICENSE.TXT.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

#ifndef UMF_ZE_LOOPBACK_H
UMF_ZE_LOOPBACK_H -> UMF_TEST_ZE_LOOPBACK_H
@@ -0,0 +1,96 @@
/*
 *
 * Copyright (C) 2023-2025 Intel Corporation
simply 2025, as it's a new file
// apply to all new files
    InitGoogleTest(&argc, argv);
    AddGlobalTestEnvironment(new MockedLevelZeroTestEnvironment);
    return RUN_ALL_TESTS();
}
No newline at end of file
missing EOF
static void init_ze_global_state(void) {

    const char *lib_name = getenv("UMF_ZE_LOADER_LIB_NAME");
should we document this new env var? is there a use case to set it outside of testing...?
This is the UMF part of the https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_peer_access.asciidoc feature.
The UR/llvm part is in intel/llvm#19257.