Avoid using iterrows, use vectorization wherever possible #120

aditya0by0 · 2025-08-09T15:46:41Z

Closes #86 (Replace df.iterrows() with df.itertuples() for Significant Performance Gains)
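Some background on the issue. pandas' `DataFrame.iterrows()` yields `(index, Series)` pairs and builds a new Series for every row, so a row that mixes integer and float columns is upcast to a single dtype. `DataFrame.itertuples()` yields namedtuples instead, and the pandas documentation describes it as generally faster. The sketch below shows this on a toy two-column frame. The data and column names are made up for illustration.

```python
import pandas as pd

# Toy frame with one int64 and one float64 column (hypothetical data)
df = pd.DataFrame({"id": [1, 2], "score": [0.5, 1.5]})

# iterrows: each row becomes a pandas Series, so int and float values share one dtype
_, first_row = next(df.iterrows())

# itertuples(index=False): each row is a namedtuple whose fields are the column names
first_tuple = next(df.itertuples(index=False))

print(first_row.dtype)      # float64 (the int id was upcast)
print(first_tuple._fields)  # ('id', 'score')
print(first_tuple.id, first_tuple.score)
```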

aditya0by0 · 2025-08-09T17:08:24Z

chebai/train.py

-    for index, row in data_frame.iterrows():
+    for row in data_frame.itertuples(index=False):
        train_data.append(
            [
-                data_frame.iloc[index].values[1],
-                data_frame.iloc[index].values[2:502].tolist(),
+                row.SMILES,
+                row.LABELS,


Not sure about these lines of code, or whether this was the intended change.
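One possible reason for the doubt: `row.SMILES` and `row.LABELS` only work if the DataFrame has columns named exactly `SMILES` and `LABELS`. The old code read column 1 and the 500 columns at positions 2 to 501 by position. This diff does not show the column layout. If the frame still has one column per label, a positional translation keeps the old behaviour. The sketch below does that. `build_train_data` and `n_labels` are hypothetical names, and it reproduces only the two fields visible in the diff. The rest of the appended list is not shown here.

A related detail: the old `data_frame.iloc[index]` used the `iterrows` label as a position, which is only correct for a default `RangeIndex`. `itertuples` avoids that mismatch.

```python
import pandas as pd

def build_train_data(data_frame: pd.DataFrame, n_labels: int = 500) -> list:
    """Hypothetical helper: positional equivalent of the two fields read by the old iterrows loop."""
    train_data = []
    # index=False drops the index, so tuple position k corresponds to column k,
    # matching the old data_frame.iloc[index].values[k]
    for row in data_frame.itertuples(index=False):
        train_data.append(
            [
                row[1],                       # was .values[1]
                list(row[2 : 2 + n_labels]),  # was .values[2:502].tolist() for n_labels=500
            ]
        )
    return train_data
```

Positional access works whether pandas returns namedtuples or plain tuples, and it does not depend on the column names being valid Python identifiers.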

sfluegel05

Thanks for implementing this. I created a new dataset with these changes and it worked, although there was no major performance boost, because the most time-intensive part is the split generation.

aditya0by0 · 2025-08-11T13:43:12Z

OK, I have a few minor changes that I will commit later. I will mark the PR as ready for review once they are done.

itertuples instead of iterrows

567c68d

aditya0by0 requested a review from sfluegel05 August 9, 2025 15:46

aditya0by0 self-assigned this Aug 9, 2025

aditya0by0 added the priority: low (Issue with low priority), bug:fix, and enhancement (New feature or request) labels and removed the bug:fix label Aug 9, 2025

aditya0by0 added 2 commits August 9, 2025 18:36

optimize _setup_pruned_test_set logic

30ca5f6

avoid repeated slicing in loop

3785bb5
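This view does not show the code behind the commit "avoid repeated slicing in loop". The following is a generic sketch of that idea, and of the "use vectorization wherever possible" title. It slices the SMILES column and the label block once, before any Python-level loop, instead of calling `iloc` on every row. `build_train_data_vectorized` is a hypothetical name. The positional layout (column 1 for SMILES, the next `n_labels` columns for labels) is an assumption carried over from the original `values[1]` / `values[2:502]` indexing.

```python
import pandas as pd

def build_train_data_vectorized(data_frame: pd.DataFrame, n_labels: int = 500) -> list:
    """Hypothetical helper: slice each column block once, then zip the results row-wise."""
    # One slice for the SMILES column (position 1, as in the original values[1])
    smiles = data_frame.iloc[:, 1].tolist()
    # One 2-D slice for all label columns, converted to a list of per-row lists
    labels = data_frame.iloc[:, 2 : 2 + n_labels].to_numpy().tolist()
    # Pair them up; no per-row iloc or Series construction happens here
    return [[s, row_labels] for s, row_labels in zip(smiles, labels)]
```

When all label columns are boolean, `to_numpy()` returns a single bool array. Mixed label dtypes fall back to an object array, which still works but loses some of the speed-up.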

aditya0by0 marked this pull request as ready for review August 9, 2025 17:05

aditya0by0 commented Aug 9, 2025


aditya0by0 changed the title ~~Avoid using iterrows~~ Avoid using iterrows + vectorization for performance Aug 9, 2025

aditya0by0 changed the title ~~Avoid using iterrows + vectorization for performance~~ Avoid using iterrows, use vectorization wherever possible Aug 9, 2025

aditya0by0 marked this pull request as draft August 10, 2025 10:28

sfluegel05 approved these changes Aug 11, 2025


Merge branch 'dev' into fix/avoid_iterrows

13afc28

aditya0by0 force-pushed the fix/avoid_iterrows branch from 5ca7b14 to 13afc28 October 6, 2025 10:24

aditya0by0 marked this pull request as ready for review October 15, 2025 17:43

aditya0by0 requested a review from sfluegel05 October 15, 2025 17:43


2 participants
