Asked Claude to optimize this so it doesn't take as long.
I tested it, and it's indeed much faster now.
This is its response:
**Main issues:**
- `scipy.interpolate.RegularGridInterpolator` — processes pixels on the CPU in 10K-pixel chunks
- Constant GPU ↔ CPU transfers — converting back and forth between torch and numpy
- No parallelization — chunks are processed sequentially
- No caching — the LUT is reloaded from disk every frame
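For context, a minimal sketch of what that slow path looks like (this is an illustrative reconstruction, not the actual code from the gist — the LUT here is a hypothetical identity LUT, and `apply_lut_chunked` is a made-up name):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

LUT_SIZE = 33
grid = np.linspace(0.0, 1.0, LUT_SIZE)
# Identity 3D LUT for demonstration: output RGB equals input RGB.
lut = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)

interp = RegularGridInterpolator((grid, grid, grid), lut)

def apply_lut_chunked(pixels: np.ndarray, chunk: int = 10_000) -> np.ndarray:
    """Apply the 3D LUT to an (N, 3) float array, 10K pixels at a time on CPU."""
    out = np.empty_like(pixels)
    for start in range(0, len(pixels), chunk):
        out[start:start + chunk] = interp(pixels[start:start + chunk])
    return out

pixels = np.random.default_rng(0).random((50_000, 3))
result = apply_lut_chunked(pixels)
```

Every chunk is a sequential CPU call, and in the real pipeline each frame also pays torch→numpy→torch conversion on top of this.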
**Optimizations implemented:**
- GPU-based 3D interpolation using `torch.nn.functional.grid_sample` — processes ALL pixels in parallel
- Zero CPU transfers — everything stays on the GPU
- GPU-accelerated gamma correction — parallel torch operations
- LUT caching — loaded once, reused every frame
- GPU test patterns — generated directly on the device
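The core trick — treating the 3D LUT as a volume and sampling it with one `grid_sample` call — can be sketched like this (a hedged illustration under my own assumptions, not the gist's actual implementation; `apply_lut_gpu` and the identity LUT are hypothetical):

```python
import torch
import torch.nn.functional as F

LUT_SIZE = 33
# Identity LUT for demonstration: lut[:, r, g, b] = (r, g, b) normalized.
coords = torch.linspace(0.0, 1.0, LUT_SIZE)
r, g, b = torch.meshgrid(coords, coords, coords, indexing="ij")
lut = torch.stack([r, g, b], dim=0)  # shape (3, L, L, L)

def apply_lut_gpu(pixels: torch.Tensor, lut: torch.Tensor) -> torch.Tensor:
    """Apply a 3D LUT to (N, 3) pixels in [0, 1] with a single grid_sample call."""
    # grid_sample wants coordinates in [-1, 1], ordered (x, y, z), where x
    # indexes the LAST LUT dim (b), y the middle (g), z the first (r).
    grid = pixels[:, [2, 1, 0]] * 2.0 - 1.0
    grid = grid.view(1, -1, 1, 1, 3)  # (1, D_out, H_out, W_out, 3)
    out = F.grid_sample(
        lut.unsqueeze(0),   # (1, 3, L, L, L) volume
        grid,
        mode="bilinear",    # trilinear for 5D inputs
        align_corners=True,
    )
    # out is (1, 3, N, 1, 1) -> back to (N, 3)
    return out.view(3, -1).t()

pixels = torch.rand(1000, 3)
result = apply_lut_gpu(pixels, lut)
```

Run on CPU this already works; moving `lut` and `pixels` to CUDA with `.to("cuda")` is all it takes to keep the whole pass on the GPU, which is where the claimed speedup comes from.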
Expected speedup: 20-50x for typical images on your RTX 5090.
The optimized version is a drop-in replacement — same API, same results, just much faster. You can even run both side by side to verify correctness!