
JackAKirk
Contributor

Description

In some cases cuSolver operations return a successful status code even though the operation failed; the failure is only reported through the devInfo output. The previous implementation performed this check via SYCL and required the CPU to wait until the cuSolver function had completed. This hurts performance, because it prevents the user from issuing more work to the queue until oneMKL has returned.

This PR moves the check inside the host task.
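To illustrate why moving the check helps, here is a plain C++ analogy rather than SYCL or oneMKL code: `std::async` stands in for queue submission, and the hypothetical `fake_getrf` stands in for a cuSOLVER call that returns a success status but reports failure through its info output. Because the check runs inside the task, the caller can keep submitting work and only sees the error when it waits on the result.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a cuSOLVER routine: the status code says
// "success" (0), but a numerical failure is reported through *info,
// the way cuSOLVER's devInfo does (info > 0: zero pivot in getrf).
int fake_getrf(bool singular, int *info) {
    *info = singular ? 3 : 0;  // pretend U(3,3) is exactly zero
    return 0;                  // status code still reports success
}

// The task performs both the solver call and the info check, so the
// submitting thread never blocks waiting for the result.
std::future<void> submit_getrf(bool singular) {
    return std::async(std::launch::async, [singular] {
        int info = 0;
        if (fake_getrf(singular, &info) != 0)
            throw std::runtime_error("getrf: solver returned an error status");
        if (info > 0)
            throw std::runtime_error("getrf: singular matrix, info = " + std::to_string(info));
        if (info < 0)
            throw std::runtime_error("getrf: invalid argument " + std::to_string(-info));
    });
}
```

Both submissions return immediately; an exception thrown inside the task is stored in the future and only rethrown when the caller calls `get()`, which is roughly the behaviour this PR aims for with the host task.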

@AidanBeltonS I've updated the cusolver_batch.cpp cases too, can you check it is OK?
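In the batched routines, devInfo holds one int per matrix (batch_size entries), so the check has to scan every entry. The sketch below is a hypothetical host-side helper, not the oneMKL API; it assumes the info values have already been copied back to the host inside the task and follows the cuSOLVER convention that info < 0 flags an invalid argument and info > 0 a numerical failure.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: report the first matrix in a batch whose info
// value is nonzero. Negative info means argument number -info was
// invalid; positive info means a numerical failure (e.g. zero pivot).
void check_batch_info(const std::vector<int> &info, const std::string &func) {
    for (std::size_t i = 0; i < info.size(); ++i) {
        if (info[i] < 0)
            throw std::invalid_argument(func + ": matrix " + std::to_string(i) +
                                        ", invalid argument " + std::to_string(-info[i]));
        if (info[i] > 0)
            throw std::runtime_error(func + ": matrix " + std::to_string(i) +
                                     " failed, info = " + std::to_string(info[i]));
    }
}
```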

int *devInfo = (int *)malloc_device(sizeof(int) * batch_size, queue);

auto done = queue.submit([&](sycl::handler &cgh) {
int64_t num_events = dependencies.size();
Contributor

The dependencies loop in the batch functions can be replaced with a call to depends_on_events(cgh, dependencies);

Contributor Author

Nice one. I've made these changes.
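The refactor suggested in this thread can be sketched with minimal stand-ins for sycl::event and sycl::handler (`Event` and `Handler` are hypothetical types that model only `depends_on`). The inline loop is an assumed reconstruction based on the `num_events` line in the diff, and the helper body is a guess at what depends_on_events does, not a copy of the oneMKL source. With real SYCL 2020, `handler::depends_on` also has an overload taking `const std::vector<event>&`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins: only depends_on(event) is modelled.
struct Event { int id; };
struct Handler {
    std::vector<int> deps;
    void depends_on(const Event &e) { deps.push_back(e.id); }
};

// Before (assumed shape): each batch function repeated this loop inline.
void add_deps_inline(Handler &cgh, const std::vector<Event> &dependencies) {
    std::int64_t num_events = static_cast<std::int64_t>(dependencies.size());
    for (std::int64_t i = 0; i < num_events; i++)
        cgh.depends_on(dependencies[static_cast<std::size_t>(i)]);
}

// After: one shared helper, called as depends_on_events(cgh, dependencies);
void depends_on_events(Handler &cgh, const std::vector<Event> &dependencies = {}) {
    for (const auto &e : dependencies)
        cgh.depends_on(e);
}
```

Both versions register the same dependencies in the same order; the helper just removes the duplicated loop from every batch function.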

auto a_ = reinterpret_cast<cuDataType *>(a);
auto devInfo_ = reinterpret_cast<int *>(devInfo);
auto scratchpad_ = reinterpret_cast<cuDataType *>(scratchpad);
auto ipiv_ = reinterpret_cast<int *>(ipiv32);
Contributor

You can also remove this unnecessary reinterpret_cast, since ipiv32 is already an int *.
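Why the cast is a no-op: C++ allows reinterpret_cast from a pointer to its own type, and such a cast simply yields the operand's value. The function name `ipiv_identity` below is hypothetical; it only assumes, as the comment says, that ipiv32 is declared as int *.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper showing that the cast can be dropped.
int *ipiv_identity(int *ipiv32) {
    // Before (redundant): auto ipiv_ = reinterpret_cast<int *>(ipiv32);
    // Casting to the expression's own pointer type returns the operand
    // unchanged, so a plain copy is equivalent and clearer.
    auto ipiv_ = ipiv32;
    static_assert(std::is_same_v<decltype(ipiv_), int *>, "ipiv_ is already int *");
    return ipiv_;
}
```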

@AidanBeltonS
Contributor

LGTM!

@JackAKirk
Contributor Author

@ericlars Could we get a review of this from someone? Thanks
