Skip to content

Conversation

gtong-nv
Copy link
Contributor

@gtong-nv gtong-nv commented Sep 16, 2025

Add RHI Device Caching and Test Prefix Exclusion

Summary

This PR introduces two key improvements to the Slang test infrastructure:

  1. RHI Device Caching: Implements device caching to significantly speed up test execution by reusing graphics devices across tests, RHI Device Caching reduces slang-test execution time from ~15 minutes to ~5 minutes in Windows release builds
  2. Test Prefix Exclusion: Adds -exclude-prefix option to skip tests matching specified path prefixes

Changes

RHI Device Caching

  • New DeviceCache class (slang-test-device-cache.h/cpp): Thread-safe device cache with LRU eviction (max 10 devices)
  • Cache control option: -cache-rhi-device flag in both slang-test and render-test
    • Default: enabled in slang-test, disabled in render-test when run standalone
    • Automatically skips caching for CUDA devices (due to driver issues)
  • Performance benefit: Eliminates expensive device creation/destruction cycles, especially beneficial for Vulkan on Tegra platforms

Test Prefix Exclusion

  • New -exclude-prefix <prefix> option in slang-test
  • Allows excluding entire test directories or patterns from execution
  • Complements existing -category and individual test filtering options

Usage Examples

# Enable device caching (default)
slang-test

# Disable device caching
slang-test -cache-rhi-device false

# Exclude tests from specific directories
slang-test -exclude-prefix tests/problematic/
slang-test -exclude-prefix tests/slow/ -exclude-prefix tests/experimental/

This change should significantly improve test execution performance, particularly in CI environments with frequent device operations. This is needed for running the GPU test in aarch64, where repeated device creation/destroy is causing driver issues.

Needed by: #8346

@gtong-nv gtong-nv changed the title WIP: Cache and reuse VK Device in slang-test Cache and reuse VK Device in slang-test Sep 19, 2025
@gtong-nv gtong-nv changed the title Cache and reuse VK Device in slang-test Add RHI Device Caching and Test Prefix Exclusion Sep 19, 2025
@gtong-nv gtong-nv marked this pull request as ready for review September 19, 2025 23:24
@gtong-nv gtong-nv requested a review from a team as a code owner September 19, 2025 23:24
@gtong-nv
Copy link
Contributor Author

/format

@slangbot
Copy link
Contributor

🌈 Formatted, please merge the changes from this PR

Automated code formatting for
#8448

Co-authored-by: slangbot <[email protected]>
@gtong-nv gtong-nv added the pr: non-breaking PRs without breaking changes label Sep 20, 2025
jkwak-work
jkwak-work previously approved these changes Sep 22, 2025
Copy link
Collaborator

@jkwak-work jkwak-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few comments but it looks good enough as it is.

Comment on lines +37 to +51
std::size_t DeviceCache::DeviceCacheKeyHash::operator()(const DeviceCacheKey& key) const
{
std::size_t h1 = std::hash<int>{}(static_cast<int>(key.deviceType));
std::size_t h2 = std::hash<bool>{}(key.enableValidation);
std::size_t h3 = std::hash<bool>{}(key.enableRayTracingValidation);
std::size_t h4 = std::hash<std::string>{}(key.profileName);

std::size_t h5 = 0;
for (const auto& feature : key.requiredFeatures)
{
h5 ^= std::hash<std::string>{}(feature) + 0x9e3779b9 + (h5 << 6) + (h5 >> 2);
}

return h1 ^ (h2 << 1) ^ (h3 << 2) ^ (h4 << 3) ^ (h5 << 4);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, chatGPT suggested the following as a cleaner alternative,

    return std::hash<
        std::tuple<
            int, bool, bool, std::string, std::vector<std::string>
        >
    >{}(
        std::tuple{
            static_cast<int>(key.deviceType),
            key.enableValidation,
            key.enableRayTracingValidation,
            key.profileName,
            key.requiredFeatures
        }
    );

I haven't tried it.

Comment on lines +86 to +87
// Skip caching for CUDA devices due to crashes
if (desc.deviceType == rhi::DeviceType::CUDA)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not caching for CUDA?
What is the problem with this?
If this is a temporary WAR, we may need a new github issue for this.

Copy link
Contributor Author

@gtong-nv gtong-nv Sep 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the CUDA device destroying, it calls the debugCall func pointer.
Our debug callback lifetime is per-test, and doesn't out live the device.

I tried a few approaches and but seems there is no easy fix. That requires a careful design to be threadsafe. I will create another issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we make debug callback also have the same lifetime as the device?

Comment on lines +24 to +28
uint64_t& DeviceCache::getNextCreationOrder()
{
static uint64_t instance = 0;
return instance;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to have a function scoped static for "getMutex".
But when the type is just a primitive type, it seems unnecessary.
Feel free to ignore this comment but I would go with a simpler/traditional code like,

static uint64_t s_nextCreationOrder = 0;

};

private:
static constexpr int MAX_CACHED_DEVICES = 10;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ignore this comment, but we may want to set the value from the command-line argument for a debugging purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do that in a follow up PR, as this will require we adding another option to both render-test-tool and cmdline arg to slang-test.

" -verbose-paths Use verbose paths in output\n"
" -category <name> Only run tests in specified category\n"
" -exclude <name> Exclude tests in specified category\n"
" -exclude-prefix <prefix> Exclude tests with specified path prefix\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be a separate PR for adding "exclude-XXX"
The tricky part is "prefix" is often not enough to describe which one to exclude.
The workaround had been just delete the files locally if you want to exclude them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some tests that we need to exclude for running GPU tests in aarch64. I filed an issue to track that #8468

I'd like to get this feature to enable the GPU tests for aarch64 ASAP.

for (auto& excludePrefix : context->options.excludePrefixes)
{
if (filePath.startsWith(excludePrefix))
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can print an information message when the verbose option is enabled.
Something like,

XXX file is excluded from the test because it is found from the exclusion list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. Updated.

std::sort(key.requiredFeatures.begin(), key.requiredFeatures.end());

// Evict oldest device if we've reached the limit
evictOldestDeviceIfNeeded();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to print some message when the verbose mode is enabled for a debugging purpose.
It will be useful if we can track when certain devices were created and when certain devices were evicted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be good, but I haven't seen verbose option in render-test-tool. That would requires us adding another option.
I will do that in a follow up PR

Comment on lines +108 to +112
for (int i = 0; i < desc.requiredFeatureCount; ++i)
{
key.requiredFeatures.push_back(desc.requiredFeatures[i]);
}
std::sort(key.requiredFeatures.begin(), key.requiredFeatures.end());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to use std::set if the list has to be always sorted?

@gtong-nv
Copy link
Contributor Author

/format

@slangbot
Copy link
Contributor

🌈 Formatted, please merge the changes from this PR

Automated code formatting for
#8448

Co-authored-by: slangbot <[email protected]>
Copy link
Collaborator

@jkwak-work jkwak-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@gtong-nv gtong-nv enabled auto-merge September 22, 2025 22:28
@gtong-nv gtong-nv added this pull request to the merge queue Sep 22, 2025
Merged via the queue into master with commit ba81323 Sep 22, 2025
62 of 64 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr: non-breaking PRs without breaking changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants