Optimized ONNX Transform via Class Merging and Thread Pooling #546

abhishek-singh591 · 2025-08-23T09:53:23Z

Optimized ONNX Transform via Class Merging and Thread Pooling

This PR follows up on #539 – Optimized ONNX transform class via multithreading.

It merges the FP16 and Split ONNX transform classes into a single implementation to eliminate redundant tensor loading and iteration. Additionally, the transform logic has been refactored to use a thread pool, replacing the previous sequential loop to parallelize tensor operations.
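The single-pass idea described above can be sketched roughly as follows. This is not the PR's actual code: `Tensor`, `transform_tensor`, `merged_transform`, the size threshold, and the file naming are all hypothetical stand-ins, and the sketch assumes the FP16 step clamps values to the finite float16 range while the split step assigns large tensors to an external data file.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional

FP16_MAX = 65504.0  # largest finite float16 value


@dataclass
class Tensor:
    """Hypothetical stand-in for an ONNX initializer."""
    name: str
    values: List[float]
    external_file: Optional[str] = None


def transform_tensor(tensor, size_threshold, file_prefix):
    """Apply both steps to one tensor in a single visit.

    Returns True if any value had to be clamped.
    """
    # Step 1 (assumed FP16 behaviour): clamp out-of-range values.
    clipped = any(abs(v) > FP16_MAX for v in tensor.values)
    if clipped:
        tensor.values = [max(-FP16_MAX, min(FP16_MAX, v)) for v in tensor.values]
    # Step 2 (assumed split behaviour): mark large tensors as external.
    if len(tensor.values) > size_threshold:
        tensor.external_file = f"{file_prefix}_{tensor.name}.data"
    return clipped


def merged_transform(tensors, size_threshold=4, file_prefix="model", max_workers=8):
    """Visit every tensor once, dispatching the work to a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        flags = list(
            pool.map(lambda t: transform_tensor(t, size_threshold, file_prefix), tensors)
        )
    return any(flags)


if __name__ == "__main__":
    demo = [Tensor("a", [1.0, 1e6, -1e6]), Tensor("b", [0.5] * 8)]
    print(merged_transform(demo), demo[0].values, demo[1].external_file)
```

Each tensor is touched once instead of once per transform, which is where the redundant loading and iteration would be saved. The toy workload here is pure Python and CPU-bound, so it illustrates the structure only, not the speedup reported in this PR.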

Performance benchmarks:

| Model | Original Duration (s) | Optimized Duration (s) |
| --- | --- | --- |
| LLaMA 3.1 8B | 88.35 | 58.55 |
| LLaMA 3.1 70B | 1029.82 | 727.37 |
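To put the durations above in relative terms, the short calculation below derives the speedup factor and the percentage reduction from the reported numbers. The `summarize` helper is only an illustration and is not part of the PR.

```python
# Durations in seconds, copied from the benchmark table: (original, optimized).
benchmarks = {
    "LLaMA 3.1 8B": (88.35, 58.55),
    "LLaMA 3.1 70B": (1029.82, 727.37),
}


def summarize(original_s, optimized_s):
    """Return (speedup factor, percent reduction in duration)."""
    speedup = original_s / optimized_s
    reduction_pct = (original_s - optimized_s) / original_s * 100
    return round(speedup, 2), round(reduction_pct, 1)


for model, (original_s, optimized_s) in benchmarks.items():
    speedup, reduction = summarize(original_s, optimized_s)
    print(f"{model}: {speedup}x faster, {reduction}% less time")
```

By this arithmetic the 8B model runs about 1.51x faster (33.7% less time) and the 70B model about 1.42x faster (29.4% less time).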

Note: Thread count is set to os.cpu_count() * 4 to better handle I/O-bound workloads. Performance may vary depending on system hardware and threading capabilities.
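A minimal sketch of how such a thread count might be derived and used is shown below. The helper names are hypothetical. One detail worth handling is that `os.cpu_count()` can return `None` when the CPU count cannot be determined, so the sketch falls back to 1 in that case.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def io_bound_worker_count(multiplier=4):
    """Oversubscribe CPUs by `multiplier`, since I/O-bound threads mostly wait."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return cpus * multiplier


def run_io_tasks(tasks):
    """Run zero-argument callables on a pool sized for I/O-bound work."""
    with ThreadPoolExecutor(max_workers=io_bound_worker_count()) as pool:
        # pool.map preserves the input order of the results.
        return list(pool.map(lambda fn: fn(), tasks))


if __name__ == "__main__":
    print(io_bound_worker_count())
    print(run_io_tasks([lambda: "a", lambda: "b"]))
```

Using more threads than cores tends to help only while the tasks spend most of their time blocked on I/O. For CPU-heavy work a smaller pool is usually the safer choice.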

Signed-off-by: abhishek-singh591 <[email protected]>

merged fp16 and split in onnx transform

4d8d878

Signed-off-by: abhishek-singh591 <[email protected]>

abhishek-singh591 requested review from quic-rishinr, ochougul, quic-hemagnih and quic-amitraj as code owners August 23, 2025 09:53
