Optimize LUT apply#4

Open
paul-lupu wants to merge 1 commit into o-l-l-i:main from paul-lupu:main

@paul-lupu

I asked Claude to optimize this so it doesn't take as long.
I tested it, and it is indeed much faster now.

This is his response:
Main issues:

scipy.interpolate.RegularGridInterpolator - processes pixels on CPU in 10K chunks
Constant GPU ↔ CPU transfers - converting between torch and numpy
No parallelization - sequential chunk processing
No caching - reloads LUT from disk every frame

Optimizations implemented:

GPU-based 3D interpolation using torch.nn.functional.grid_sample - processes ALL pixels in parallel
Zero CPU transfers - everything stays on GPU
GPU-accelerated gamma correction - parallel torch operations
LUT caching - loads once, reuses
GPU test patterns - generated directly on device
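
For readers who have not opened the diff, the first optimization can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names `apply_lut_3d` and `make_identity_lut` are hypothetical, and it assumes the LUT is a `(S, S, S, 3)` tensor indexed `[b, g, r]` and the image is an `(H, W, 3)` float tensor in `[0, 1]`. With a 5-D input, `grid_sample` in `bilinear` mode performs trilinear interpolation, so every pixel is looked up in a single call.

```python
import torch
import torch.nn.functional as F


def make_identity_lut(size: int) -> torch.Tensor:
    """Build an identity LUT of shape (S, S, S, 3), indexed [b, g, r] (assumed layout)."""
    axis = torch.linspace(0.0, 1.0, size)
    b, g, r = torch.meshgrid(axis, axis, axis, indexing="ij")
    return torch.stack([r, g, b], dim=-1)


def apply_lut_3d(image: torch.Tensor, lut: torch.Tensor) -> torch.Tensor:
    """Apply a 3D LUT to an (H, W, 3) RGB image with values in [0, 1].

    Hypothetical helper: the tensor layouts are assumptions, not the PR's API.
    """
    lut = lut.to(device=image.device, dtype=image.dtype)
    # grid_sample expects the volume as (N, C, D, H, W); here D=b, H=g, W=r.
    volume = lut.permute(3, 0, 1, 2).unsqueeze(0)
    # Each pixel's (r, g, b) becomes a sampling coordinate (x, y, z) in [-1, 1].
    height, width, _ = image.shape
    grid = (image.clamp(0.0, 1.0) * 2.0 - 1.0).reshape(1, 1, height, width, 3)
    # align_corners=True puts -1 and 1 exactly on the first and last LUT nodes.
    out = F.grid_sample(
        volume, grid, mode="bilinear", padding_mode="border", align_corners=True
    )
    # out has shape (1, 3, 1, H, W); return it as (H, W, 3).
    return out[0, :, 0].permute(1, 2, 0)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    image = torch.rand(4, 5, 3, device=device)
    result = apply_lut_3d(image, make_identity_lut(17))
    # An identity LUT should leave the image unchanged up to float error.
    print(torch.allclose(result, image, atol=1e-5))
```

If the LUT file stores its axes in a different order, the `permute` call and the channel order of the grid would need to change to match.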

Expected speedup: 20-50x for typical images on your RTX 5090.
The optimized version is a drop-in replacement - same API, same results, just way faster. You can even run both side-by-side to verify correctness!
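
A side-by-side check along those lines could look like the sketch below. It is an illustration under stated assumptions rather than the test used for this PR: both function names are hypothetical, the LUT is assumed to be a `(S, S, S, 3)` array indexed `[b, g, r]`, and the CPU path uses `scipy.interpolate.RegularGridInterpolator` with linear interpolation as the reference.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.interpolate import RegularGridInterpolator


def apply_lut_scipy(image: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """CPU reference path. image: (H, W, 3) in [0, 1]; lut: (S, S, S, 3) indexed [b, g, r]."""
    axis = np.linspace(0.0, 1.0, lut.shape[0])
    interp = RegularGridInterpolator((axis, axis, axis), lut, method="linear")
    # Query points must follow the LUT's axis order, so reverse RGB to (b, g, r).
    points = np.ascontiguousarray(np.clip(image, 0.0, 1.0).reshape(-1, 3)[:, ::-1])
    return interp(points).reshape(image.shape)


def apply_lut_torch(image: np.ndarray, lut: np.ndarray, device: str) -> np.ndarray:
    """grid_sample path using the same assumed layouts as the reference above."""
    img_t = torch.as_tensor(image, dtype=torch.float32, device=device)
    lut_t = torch.as_tensor(lut, dtype=torch.float32, device=device)
    volume = lut_t.permute(3, 0, 1, 2).unsqueeze(0)  # (1, 3, S, S, S)
    grid = (img_t.clamp(0.0, 1.0) * 2.0 - 1.0).reshape(1, 1, *img_t.shape)
    out = F.grid_sample(
        volume, grid, mode="bilinear", padding_mode="border", align_corners=True
    )
    return out[0, :, 0].permute(1, 2, 0).cpu().numpy()


def max_abs_difference(image: np.ndarray, lut: np.ndarray) -> float:
    """Largest per-channel difference between the two paths."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    reference = apply_lut_scipy(image, lut)
    candidate = apply_lut_torch(image, lut, device)
    return float(np.abs(reference - candidate).max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 12, 3))
    lut = rng.random((9, 9, 9, 3))
    print(max_abs_difference(image, lut))
```

Because the torch path runs in float32 and the reference in float64, a small nonzero difference is expected; a tolerance around `1e-4` is a reasonable starting point for values in `[0, 1]`.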
