Add RHI Device Caching and Test Prefix Exclusion #8448

gtong-nv · 2025-09-16T02:47:15Z

Add RHI Device Caching and Test Prefix Exclusion

Summary

This PR introduces two key improvements to the Slang test infrastructure:

RHI Device Caching: Implements device caching to speed up test execution by reusing graphics devices across tests. This reduces slang-test execution time from ~15 minutes to ~5 minutes in Windows release builds.
Test Prefix Exclusion: Adds -exclude-prefix option to skip tests matching specified path prefixes

Changes

RHI Device Caching

New DeviceCache class (slang-test-device-cache.h/cpp): Thread-safe device cache with LRU eviction (max 10 devices)
Cache control option: -cache-rhi-device flag in both slang-test and render-test
- Default: enabled in slang-test, disabled in render-test when run standalone
- Automatically skips caching for CUDA devices (due to driver issues)
Performance benefit: Eliminates expensive device creation/destruction cycles, especially beneficial for Vulkan on Tegra platforms
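
As a rough illustration of the caching idea described above, here is a minimal sketch rather than the code in this PR. `FakeDevice`, `SketchDeviceCache`, and the string key are hypothetical stand-ins; the real cache keys on device type, validation flags, profile name, and required features. The sketch evicts the entry with the smallest creation order, which is what the review excerpts suggest; a strict LRU policy would also refresh an entry's order on every hit.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an RHI device; the real type lives in slang-rhi.
struct FakeDevice
{
    std::string name;
};

// Minimal cache sketch: key -> device, evicting the oldest entry at capacity.
class SketchDeviceCache
{
public:
    explicit SketchDeviceCache(std::size_t maxDevices)
        : m_maxDevices(maxDevices)
    {
    }

    // Returns the cached device for `key`, creating one on a miss.
    std::shared_ptr<FakeDevice> acquire(const std::string& key)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_entries.find(key);
        if (it != m_entries.end())
            return it->second.device; // Cache hit: reuse the existing device.

        if (m_entries.size() >= m_maxDevices)
            evictOldest();

        Entry entry{std::make_shared<FakeDevice>(FakeDevice{key}), m_nextOrder++};
        auto device = entry.device;
        m_entries.emplace(key, std::move(entry));
        return device;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_entries.size();
    }

private:
    struct Entry
    {
        std::shared_ptr<FakeDevice> device;
        std::uint64_t creationOrder;
    };

    // Removes the entry with the smallest creation order. Caller holds m_mutex.
    void evictOldest()
    {
        auto oldest = m_entries.begin();
        for (auto it = m_entries.begin(); it != m_entries.end(); ++it)
        {
            if (it->second.creationOrder < oldest->second.creationOrder)
                oldest = it;
        }
        if (oldest != m_entries.end())
            m_entries.erase(oldest);
    }

    std::size_t m_maxDevices;
    std::uint64_t m_nextOrder = 0;
    mutable std::mutex m_mutex;
    std::unordered_map<std::string, Entry> m_entries;
};
```

With a limit of 10, as in the PR, the linear scan in `evictOldest` is cheap enough that a dedicated LRU list is probably unnecessary.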

Test Prefix Exclusion

New -exclude-prefix <prefix> option in slang-test
Allows excluding entire test directories or patterns from execution
Complements existing -category and individual test filtering options
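
The prefix check can be sketched in standalone C++ as follows. This is an illustration only: `isExcludedByPrefix` is a hypothetical helper, and it uses `std::string` where slang-test uses its own string type with `startsWith`.

```cpp
#include <string>
#include <vector>

// Returns true when `filePath` begins with any of the excluded prefixes.
bool isExcludedByPrefix(
    const std::string& filePath,
    const std::vector<std::string>& excludePrefixes)
{
    for (const auto& prefix : excludePrefixes)
    {
        // Comparing the leading prefix.size() characters acts as startsWith.
        if (filePath.compare(0, prefix.size(), prefix) == 0)
            return true;
    }
    return false;
}
```

Because this is a literal string comparison, a prefix written without a trailing slash, such as `tests/slow`, would also match `tests/slower/`, and an empty prefix would match every path.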

Usage Examples

# Enable device caching (default)
slang-test

# Disable device caching
slang-test -cache-rhi-device false

# Exclude tests from specific directories
slang-test -exclude-prefix tests/problematic/
slang-test -exclude-prefix tests/slow/ -exclude-prefix tests/experimental/

This change should significantly improve test execution performance, particularly in CI environments with frequent device operations. It is also needed for running the GPU tests on aarch64, where repeated device creation and destruction causes driver issues.

Needed by: #8346

gtong-nv · 2025-09-19T23:32:25Z

/format

slangbot · 2025-09-19T23:33:16Z

🌈 Formatted, please merge the changes from this PR

Automated code formatting for #8448 Co-authored-by: slangbot <[email protected]>

jkwak-work

I have a few comments but it looks good enough as it is.

tools/render-test/render-test-main.cpp

jkwak-work · 2025-09-22T17:11:42Z

tools/render-test/slang-test-device-cache.cpp

+std::size_t DeviceCache::DeviceCacheKeyHash::operator()(const DeviceCacheKey& key) const
+{
+    std::size_t h1 = std::hash<int>{}(static_cast<int>(key.deviceType));
+    std::size_t h2 = std::hash<bool>{}(key.enableValidation);
+    std::size_t h3 = std::hash<bool>{}(key.enableRayTracingValidation);
+    std::size_t h4 = std::hash<std::string>{}(key.profileName);
+
+    std::size_t h5 = 0;
+    for (const auto& feature : key.requiredFeatures)
+    {
+        h5 ^= std::hash<std::string>{}(feature) + 0x9e3779b9 + (h5 << 6) + (h5 >> 2);
+    }
+
+    return h1 ^ (h2 << 1) ^ (h3 << 2) ^ (h4 << 3) ^ (h5 << 4);
+}


FYI, ChatGPT suggested the following as a cleaner alternative:

return std::hash< std::tuple< int, bool, bool, std::string, std::vector<std::string> > >{}( std::tuple{ static_cast<int>(key.deviceType), key.enableValidation, key.enableRayTracingValidation, key.profileName, key.requiredFeatures } );

I haven't tried it.
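
Note that the C++ standard library provides no `std::hash` specialization for `std::tuple` or for `std::vector<std::string>`, so the one-liner above would most likely not compile without custom specializations. A helper-based alternative might look like the sketch below. `SketchKey`, `hashCombine`, and `hashKey` are hypothetical names, and `deviceType` is a plain `int` here so the example is self-contained.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical key mirroring the fields hashed in the diff above.
struct SketchKey
{
    int deviceType = 0;
    bool enableValidation = false;
    bool enableRayTracingValidation = false;
    std::string profileName;
    std::vector<std::string> requiredFeatures;
};

// Boost-style combine step: folds the hash of `value` into `seed`.
template<typename T>
void hashCombine(std::size_t& seed, const T& value)
{
    seed ^= std::hash<T>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hashes every field of the key through the same combine step.
std::size_t hashKey(const SketchKey& key)
{
    std::size_t seed = 0;
    hashCombine(seed, key.deviceType);
    hashCombine(seed, key.enableValidation);
    hashCombine(seed, key.enableRayTracingValidation);
    hashCombine(seed, key.profileName);
    for (const auto& feature : key.requiredFeatures)
        hashCombine(seed, feature);
    return seed;
}
```

Routing every field through one combine step avoids hand-picking shift amounts for each field, which is the main readability gain the suggestion was after.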

jkwak-work · 2025-09-22T17:16:18Z

tools/render-test/slang-test-device-cache.cpp

+    // Skip caching for CUDA devices due to crashes
+    if (desc.deviceType == rhi::DeviceType::CUDA)


We are not caching for CUDA?
What is the problem with this?
If this is a temporary workaround (WAR), we may need a new GitHub issue for this.

During CUDA device destruction, the device calls the debug callback function pointer.
Our debug callback's lifetime is per-test, so it does not outlive the device.

I tried a few approaches, but there seems to be no easy fix. It requires a careful design to be thread-safe. I will create another issue.

Why can't we make the debug callback have the same lifetime as the device?
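
One possible direction for that question, sketched under heavy assumptions: store the callback in the same cache entry as the device and declare it first, so that it is destroyed last. None of these types are the slang-rhi API. `SketchDebugCallback`, `SketchDevice`, and `SketchCacheEntry` are hypothetical, and the sketch does not address the thread-safety concern raised above or how messages would be routed back to the currently running test.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical callback interface; not the slang-rhi API.
struct SketchDebugCallback
{
    std::vector<std::string> messages;
    void handleMessage(const std::string& text) { messages.push_back(text); }
};

// Hypothetical device that reports through a raw callback pointer on destruction.
struct SketchDevice
{
    explicit SketchDevice(SketchDebugCallback* cb)
        : callback(cb)
    {
    }
    ~SketchDevice() { callback->handleMessage("device destroyed"); }
    SketchDebugCallback* callback;
};

// Members are destroyed in reverse declaration order, so `device` is torn
// down first while `callback` is still alive.
struct SketchCacheEntry
{
    std::shared_ptr<SketchDebugCallback> callback;
    std::unique_ptr<SketchDevice> device;
};

// Builds an entry whose device points at the callback stored alongside it.
SketchCacheEntry makeEntry()
{
    SketchCacheEntry entry;
    entry.callback = std::make_shared<SketchDebugCallback>();
    entry.device = std::make_unique<SketchDevice>(entry.callback.get());
    return entry;
}
```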

jkwak-work · 2025-09-22T17:20:54Z

tools/render-test/slang-test-device-cache.cpp

+uint64_t& DeviceCache::getNextCreationOrder()
+{
+    static uint64_t instance = 0;
+    return instance;
+}


It makes sense to have a function scoped static for "getMutex".
But when the type is just a primitive type, it seems unnecessary.
Feel free to ignore this comment but I would go with a simpler/traditional code like,

static uint64_t s_nextCreationOrder = 0;

jkwak-work · 2025-09-22T17:24:04Z

tools/render-test/slang-test-device-cache.h

+    };
+
+private:
+    static constexpr int MAX_CACHED_DEVICES = 10;


Feel free to ignore this comment, but we may want to set the value from a command-line argument for debugging purposes.

I will do that in a follow-up PR, as it requires adding another option to render-test-tool and another command-line argument to slang-test.

jkwak-work · 2025-09-22T17:30:03Z

tools/slang-test/options.cpp

        "  -verbose-paths                 Use verbose paths in output\n"
        "  -category <name>               Only run tests in specified category\n"
        "  -exclude <name>                Exclude tests in specified category\n"
+        "  -exclude-prefix <prefix>       Exclude tests with specified path prefix\n"


I think adding "exclude-XXX" could be a separate PR.
The tricky part is that a "prefix" is often not enough to describe which tests to exclude.
The workaround has been to just delete the files locally if you want to exclude them.

There are some tests that we need to exclude when running GPU tests on aarch64. I filed issue #8468 to track that.

I'd like to get this feature in so we can enable the GPU tests for aarch64 ASAP.

jkwak-work · 2025-09-22T17:31:02Z

tools/slang-test/slang-test-main.cpp

+    for (auto& excludePrefix : context->options.excludePrefixes)
+    {
+        if (filePath.startsWith(excludePrefix))
+        {


I think we can print an information message when the verbose option is enabled.
Something like,

XXX file is excluded from the test because it is found from the exclusion list

This is a good idea. Updated.

jkwak-work · 2025-09-22T17:34:04Z

tools/render-test/slang-test-device-cache.cpp

+    std::sort(key.requiredFeatures.begin(), key.requiredFeatures.end());
+
+    // Evict oldest device if we've reached the limit
+    evictOldestDeviceIfNeeded();


I think we may want to print a message when verbose mode is enabled, for debugging purposes.
It would be useful to track when certain devices were created and when they were evicted.

That would be good, but I haven't seen a verbose option in render-test-tool, so it would require adding another option.
I will do that in a follow-up PR.

tools/render-test/slang-test-device-cache.cpp

jkwak-work · 2025-09-22T17:36:54Z

tools/render-test/slang-test-device-cache.cpp

+    for (int i = 0; i < desc.requiredFeatureCount; ++i)
+    {
+        key.requiredFeatures.push_back(desc.requiredFeatures[i]);
+    }
+    std::sort(key.requiredFeatures.begin(), key.requiredFeatures.end());


Wouldn't it make sense to use std::set if the list has to be always sorted?
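
For comparison, the two normalization approaches can be sketched side by side. `normalizeWithSort` and `normalizeWithSet` are hypothetical helper names, and the `const char* const*` parameter is an assumption about how the feature list is passed in.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Sorted-vector normalization, as in the diff: copy the names, then sort.
std::vector<std::string> normalizeWithSort(const char* const* features, int count)
{
    std::vector<std::string> result(features, features + count);
    std::sort(result.begin(), result.end());
    return result;
}

// std::set alternative: always ordered, and duplicate names collapse.
std::set<std::string> normalizeWithSet(const char* const* features, int count)
{
    return std::set<std::string>(features, features + count);
}
```

Both make the key independent of the order in which features were requested. The set additionally removes duplicates, at the cost of one node allocation per element, which is unlikely to matter for lists this small.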

gtong-nv · 2025-09-22T21:07:49Z

/format

slangbot · 2025-09-22T21:08:29Z

🌈 Formatted, please merge the changes from this PR

Automated code formatting for #8448 Co-authored-by: slangbot <[email protected]>

jkwak-work

Looks good to me

gtong-nv and others added 8 commits September 15, 2025 19:46

Cache and reuse VK Device in slang-test

2d0ac96

clean the cache before slang-test exit

0ab4eae

proper clean up the device caches before program exit

2a164dc

Add an option to slang-test to skip test with certain prefix

97464c0

Test not only VK Device

2560207

debug - sync to slang-rhi with a temp fix

ae6de8b

remove manual device release

7093023

Add option to control rhi device cache

0e48251

gtong-nv changed the title ~~WIP: Cache and reuse VK Device in slang-test~~ Cache and reuse VK Device in slang-test Sep 19, 2025

gtong-nv changed the title ~~Cache and reuse VK Device in slang-test~~ Add RHI Device Caching and Test Prefix Exclusion Sep 19, 2025

gtong-nv marked this pull request as ready for review September 19, 2025 23:24

gtong-nv requested a review from a team as a code owner September 19, 2025 23:24

Merge branch 'master' into cache_vk_devices

5ac16bd

slangbot mentioned this pull request Sep 19, 2025

Format code for PR #8448 #8498

Merged

Format code for PR #8448 (#8498)

2bf2d1e

Automated code formatting for #8448 Co-authored-by: slangbot <[email protected]>

gtong-nv added the pr: non-breaking PRs without breaking changes label Sep 20, 2025

jkwak-work previously approved these changes Sep 22, 2025

gtong-nv dismissed jkwak-work’s stale review via 4dcc652 September 22, 2025 19:22

address review comment

fe4f205

gtong-nv force-pushed the cache_vk_devices branch from 4dcc652 to fe4f205 Compare September 22, 2025 20:35

slangbot mentioned this pull request Sep 22, 2025

Format code for PR #8448 #8511

Merged

Format code for PR #8448 (#8511)

dfecbd2

Automated code formatting for #8448 Co-authored-by: slangbot <[email protected]>

jkwak-work approved these changes Sep 22, 2025

gtong-nv enabled auto-merge September 22, 2025 22:28

gtong-nv added this pull request to the merge queue Sep 22, 2025

Merged via the queue into master with commit ba81323 Sep 22, 2025
62 of 64 checks passed
