Contributor

@olia110 olia110 commented Aug 11, 2025

This Pull request:

This PR improves the performance of ROperator_Tile by replacing its code generation logic.
The previous implementation generated an iterative method built from multiple loops and std::copy operations.

The new implementation uses a faster direct-mapping algorithm. It precomputes the memory strides, then runs a single loop over the output elements and computes the source index for each destination element.
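To illustrate the direct-mapping idea described above, here is a minimal standalone sketch. It is not the code emitted by ROperator_Tile: the function name `TileDirect`, the `float` element type, and the row-major layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SOFIE): tiles a row-major tensor by
// mapping every output element directly to its source element.
std::vector<float> TileDirect(const std::vector<float>& input,
                              const std::vector<std::size_t>& shape,
                              const std::vector<std::size_t>& repeats) {
    assert(shape.size() == repeats.size());
    const std::size_t rank = shape.size();

    // Output shape: each axis is the input axis times its repeat count.
    std::vector<std::size_t> out_shape(rank);
    std::size_t total = 1;
    for (std::size_t i = 0; i < rank; ++i) {
        out_shape[i] = shape[i] * repeats[i];
        total *= out_shape[i];
    }

    // Precompute row-major strides for the input and the output.
    std::vector<std::size_t> in_strides(rank, 1), out_strides(rank, 1);
    for (std::size_t i = rank; i-- > 1;) {
        in_strides[i - 1] = in_strides[i] * shape[i];
        out_strides[i - 1] = out_strides[i] * out_shape[i];
    }

    std::vector<float> output(total);
    for (std::size_t idx = 0; idx < total; ++idx) {
        std::size_t remaining = idx;
        std::size_t src = 0;
        for (std::size_t i = 0; i < rank; ++i) {
            // Coordinate of this output element along axis i.
            const std::size_t out_coord = remaining / out_strides[i];
            remaining %= out_strides[i];
            // Tiling wraps the output coordinate back into the input range.
            src += (out_coord % shape[i]) * in_strides[i];
        }
        output[idx] = input[src];
    }
    return output;
}
```

For a 2x2 input `{1, 2, 3, 4}` with repeats `{1, 2}`, this sketch yields the 2x4 result `{1, 2, 1, 2, 3, 4, 3, 4}`.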

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

@olia110 olia110 requested a review from lmoneta as a code owner August 11, 2025 11:19
Contributor

@sanjibansg sanjibansg left a comment

A good approach to optimizing the Tile operator, thanks. Some comments:


const int rank = fShapeInput.size();

out << SP << "const int input_shape[" << rank << "] = " << ConvertShapeToString(fShapeInput) << ";\n";
Contributor

Maybe better to use size_t here instead of just int.

cc: @lmoneta


// For each output element, calculating the corresponding input element's index.
out << SP << SP << "for (int i = 0; i < " << rank << "; ++i) {\n";
out << SP << SP << SP << "const int out_coord = current_idx / output_strides[i];\n";
Contributor

Could we avoid these repeated division steps, since divisions are comparatively expensive?
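One possible way to address this, sketched here as an assumption rather than as what the PR ended up doing, is to keep a running multi-dimensional coordinate and update the source index incrementally, so the inner loop needs no division or modulo. The function name `TileNoDivision`, the `float` element type, and the row-major layout are hypothetical choices for the example; it also uses size_t throughout, in line with the earlier comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SOFIE): same result as a stride-based
// tile, but the source index is advanced like an odometer instead of being
// recomputed with divisions for every output element.
std::vector<float> TileNoDivision(const std::vector<float>& input,
                                  const std::vector<std::size_t>& shape,
                                  const std::vector<std::size_t>& repeats) {
    assert(shape.size() == repeats.size());
    const std::size_t rank = shape.size();

    std::vector<std::size_t> out_shape(rank), in_strides(rank, 1);
    std::size_t total = 1;
    for (std::size_t i = 0; i < rank; ++i) {
        out_shape[i] = shape[i] * repeats[i];
        total *= out_shape[i];
    }
    for (std::size_t i = rank; i-- > 1;) {
        in_strides[i - 1] = in_strides[i] * shape[i];
    }

    std::vector<std::size_t> out_coord(rank, 0); // current output coordinate
    std::vector<std::size_t> in_coord(rank, 0);  // out_coord wrapped into the input
    std::size_t src = 0;                         // flat index into the input

    std::vector<float> output(total);
    for (std::size_t idx = 0; idx < total; ++idx) {
        output[idx] = input[src];
        // Advance the coordinate, starting from the innermost axis.
        for (std::size_t i = rank; i-- > 0;) {
            ++out_coord[i];
            if (out_coord[i] < out_shape[i]) {
                if (++in_coord[i] == shape[i]) {
                    // Crossed a tile boundary: wrap back to the axis start.
                    in_coord[i] = 0;
                    src -= (shape[i] - 1) * in_strides[i];
                } else {
                    src += in_strides[i];
                }
                break;
            }
            // Axis exhausted: reset it and carry into the next outer axis.
            out_coord[i] = 0;
            src -= in_coord[i] * in_strides[i];
            in_coord[i] = 0;
        }
    }
    return output;
}
```

The trade-off is a data-dependent branch per element in place of the per-axis divisions; whether that is actually faster for the generated code would need to be measured.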


3 participants