⚡️ Speed up function construct_simd_step_input by 37% in PR #1504 (feature/try-to-beat-the-limitation-of-ee-in-terms-of-singular-elements-pushed-into-batch-inputs)
#1509
⚡️ This pull request contains optimizations for PR #1504
If you approve this dependent PR, these changes will be merged into the original PR branch `feature/try-to-beat-the-limitation-of-ee-in-terms-of-singular-elements-pushed-into-batch-inputs`.
📄 37% (0.37x) speedup for `construct_simd_step_input` in `inference/core/workflows/execution_engine/v1/executor/execution_data_manager/step_input_assembler.py`
⏱️ Runtime: 1.99 milliseconds → 1.46 milliseconds (best of 40 runs)
📝 Explanation and details
The optimized code achieves a 36% speedup through a single but impactful conditional check in the `prepare_parameters` function.
Key Optimization:
The main performance improvement comes from adding an `if empty_indices:` check before executing the expensive list comprehension and data removal operations.
Why this optimization works:
- In the common case, `empty_indices` is an empty set, making the filtering operations unnecessary
- `[e for e in indices if e not in empty_indices]` visits every element of `indices` even when nothing is filtered out; its cost is O(n*m), where n=len(indices) and m=len(empty_indices), only if membership tests are linear, and O(n) on average when `empty_indices` is a set
- `remove_indices()` recursively processes nested data structures, which is expensive even for empty removal sets
- When `empty_indices` is empty, skipping both steps eliminates significant computational overhead
Performance impact by test case type:
This optimization is particularly effective because most workflow executions don't have empty batch elements that need filtering, making the conditional check a highly beneficial guard against unnecessary work.
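The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from `step_input_assembler.py`: the function `filter_empty_elements`, the `Index` alias, and the body of `remove_indices` are hypothetical stand-ins, and only the `if empty_indices:` check and the list comprehension are taken from the description.

```python
from typing import Any, List, Set, Tuple

# Hypothetical index type: a path into nested batch data, e.g. (0,) or (1, 2).
Index = Tuple[int, ...]


def remove_indices(data: Any, to_remove: Set[Index], prefix: Index = ()) -> Any:
    """Stand-in for the real helper: recursively drop list elements
    whose index path appears in `to_remove`."""
    if not isinstance(data, list):
        return data
    return [
        remove_indices(item, to_remove, prefix + (i,))
        for i, item in enumerate(data)
        if prefix + (i,) not in to_remove
    ]


def filter_empty_elements(
    indices: List[Index], data: Any, empty_indices: Set[Index]
) -> Tuple[List[Index], Any]:
    """Hypothetical wrapper showing the guard in isolation."""
    # Guard: when nothing is marked empty, skip both the comprehension
    # and the recursive walk, and hand back the inputs untouched.
    if empty_indices:
        indices = [e for e in indices if e not in empty_indices]
        data = remove_indices(data, empty_indices)
    return indices, data
```

With an empty `empty_indices`, the function returns the very same `indices` and `data` objects it received, so no new lists are allocated on the common path.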
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-pr1504-2025-08-25T10.24.18` and push.