aditya0by0
Member

@aditya0by0 aditya0by0 requested a review from sfluegel05 August 9, 2025 15:46
@aditya0by0 aditya0by0 self-assigned this Aug 9, 2025
@aditya0by0 aditya0by0 added priority: low, bug:fix, enhancement and removed bug:fix labels Aug 9, 2025
@aditya0by0 aditya0by0 marked this pull request as ready for review August 9, 2025 17:05
Comment on lines -249 to +253
- for index, row in data_frame.iterrows():
+ for row in data_frame.itertuples(index=False):
      train_data.append(
          [
-             data_frame.iloc[index].values[1],
-             data_frame.iloc[index].values[2:502].tolist(),
+             row.SMILES,
+             row.LABELS,
Member Author

Not sure about these lines of code, or whether this was the change actually intended.
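
For context, a minimal sketch of the loop styles under discussion. The column names SMILES and LABELS come from the diff; the ident column, the toy values, and the function names are made up for illustration. It also assumes LABELS is a single column holding one list per row, which the positional slice values[2:502] in the old code does not guarantee.

```python
import pandas as pd

# Toy frame: only the SMILES and LABELS column names are taken from the diff.
# The "ident" column and all values are assumptions for illustration.
data_frame = pd.DataFrame(
    {
        "ident": [1, 2, 3],
        "SMILES": ["C", "CC", "CCO"],
        "LABELS": [[True, False], [False, False], [True, True]],
    }
)


def rows_via_iterrows(df):
    """Old style: iterrows builds a Series for every row, which is slow."""
    out = []
    for _, row in df.iterrows():
        out.append([row["SMILES"], row["LABELS"]])
    return out


def rows_via_itertuples(df):
    """New style: itertuples yields lightweight namedtuples, one per row."""
    out = []
    for row in df.itertuples(index=False):
        out.append([row.SMILES, row.LABELS])
    return out


def rows_via_columns(df):
    """Column-wise alternative: zip the two columns, no per-row pandas objects."""
    return [list(pair) for pair in zip(df["SMILES"], df["LABELS"])]


train_data = rows_via_itertuples(data_frame)
```

On this toy frame all three functions produce the same rows. Whether that also holds for the real frame depends on its column layout, which is the open question in this comment.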

@aditya0by0 aditya0by0 changed the title Avoid using iterrows Avoid using iterrows + vectorization for performance Aug 9, 2025
@aditya0by0 aditya0by0 changed the title Avoid using iterrows + vectorization for performance Avoid using iterrows, use vectorization wherever possible Aug 9, 2025
@aditya0by0 aditya0by0 marked this pull request as draft August 10, 2025 10:28
Collaborator

@sfluegel05 sfluegel05 left a comment

Thanks for implementing this. I created a new dataset with these changes and it worked, although there was no major performance boost because the most time-intensive part is the split generation.
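
One way to check which stage dominates is to time each stage separately. The following is only a sketch: build_rows and generate_splits are hypothetical stand-ins with placeholder workloads, not the project's actual functions, and the timed helper is likewise invented for this example.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage):
    """Accumulate wall-clock time spent inside the with-block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


def build_rows(n):
    # Hypothetical stand-in for converting data frame rows to lists.
    return [[str(i), [i % 2 == 0]] for i in range(n)]


def generate_splits(rows):
    # Hypothetical stand-in for split generation: sort, then keep one fifth.
    return sorted(rows, key=lambda r: r[0])[: len(rows) // 5]


with timed("row conversion"):
    rows = build_rows(10_000)
with timed("split generation"):
    test_split = generate_splits(rows)

# Name of the stage that took the longest in this run.
slowest = max(timings, key=timings.get)
for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.4f} s")
```

Wrapping the real stages in the same way would show whether replacing iterrows can matter at all for the overall run time.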

@aditya0by0
Member Author

OK, I have a few minor changes that I will commit later. I will mark the PR as ready for review once that is done.

@aditya0by0 aditya0by0 marked this pull request as ready for review October 15, 2025 17:43
@aditya0by0 aditya0by0 requested a review from sfluegel05 October 15, 2025 17:43
Labels

enhancement (New feature or request), priority: low (Issue with low priority)

Development

Successfully merging this pull request may close these issues.

Replace df.iterrows() with df.itertuples() for Significant Performance Gains

2 participants