Asked Claude to optimize this so it doesn't take as long.
I tested it, and it's indeed much faster now.
This is its response:
**Main issues:**
- `scipy.interpolate.RegularGridInterpolator` — processes pixels on the CPU in 10K-pixel chunks
- Constant GPU ↔ CPU transfers — converting back and forth between torch and numpy
- No parallelization — chunks are processed sequentially
- No caching — the LUT is reloaded from disk every frame
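For context, a minimal sketch of what that slow path looks like (this is an illustrative reconstruction, not the actual code from the gist — the LUT here is a hypothetical identity LUT, and `apply_lut_chunked` is a made-up name):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

LUT_SIZE = 33
grid = np.linspace(0.0, 1.0, LUT_SIZE)
# Identity 3D LUT for demonstration: output RGB equals input RGB.
lut = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)

interp = RegularGridInterpolator((grid, grid, grid), lut)

def apply_lut_chunked(pixels: np.ndarray, chunk: int = 10_000) -> np.ndarray:
    """Apply the 3D LUT to an (N, 3) float array, 10K pixels at a time on CPU."""
    out = np.empty_like(pixels)
    for start in range(0, len(pixels), chunk):
        out[start:start + chunk] = interp(pixels[start:start + chunk])
    return out

pixels = np.random.default_rng(0).random((50_000, 3))
result = apply_lut_chunked(pixels)
```

Every chunk is a sequential CPU call, and in the real pipeline each frame also pays torch→numpy→torch conversion on top of this.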
**Optimizations implemented:**
- GPU-based 3D interpolation using `torch.nn.functional.grid_sample` — processes ALL pixels in parallel
- Zero CPU transfers — everything stays on the GPU
- GPU-accelerated gamma correction — parallel torch operations
- LUT caching — loaded once, reused every frame
- GPU test patterns — generated directly on the device
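The core trick — treating the 3D LUT as a volume and sampling it with one `grid_sample` call — can be sketched like this (a hedged illustration under my own assumptions, not the gist's actual implementation; `apply_lut_gpu` and the identity LUT are hypothetical):

```python
import torch
import torch.nn.functional as F

LUT_SIZE = 33
# Identity LUT for demonstration: lut[:, r, g, b] = (r, g, b) normalized.
coords = torch.linspace(0.0, 1.0, LUT_SIZE)
r, g, b = torch.meshgrid(coords, coords, coords, indexing="ij")
lut = torch.stack([r, g, b], dim=0)  # shape (3, L, L, L)

def apply_lut_gpu(pixels: torch.Tensor, lut: torch.Tensor) -> torch.Tensor:
    """Apply a 3D LUT to (N, 3) pixels in [0, 1] with a single grid_sample call."""
    # grid_sample wants coordinates in [-1, 1], ordered (x, y, z), where x
    # indexes the LAST LUT dim (b), y the middle (g), z the first (r).
    grid = pixels[:, [2, 1, 0]] * 2.0 - 1.0
    grid = grid.view(1, -1, 1, 1, 3)  # (1, D_out, H_out, W_out, 3)
    out = F.grid_sample(
        lut.unsqueeze(0),   # (1, 3, L, L, L) volume
        grid,
        mode="bilinear",    # trilinear for 5D inputs
        align_corners=True,
    )
    # out is (1, 3, N, 1, 1) -> back to (N, 3)
    return out.view(3, -1).t()

pixels = torch.rand(1000, 3)
result = apply_lut_gpu(pixels, lut)
```

Run on CPU this already works; moving `lut` and `pixels` to CUDA with `.to("cuda")` is all it takes to keep the whole pass on the GPU, which is where the claimed speedup comes from.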
Expected speedup: 20-50x for typical images on your RTX 5090.
The optimized version is a drop-in replacement — same API, same results, just much faster. You can even run both side by side to verify correctness!