Conversation

@satvshr satvshr commented Aug 25, 2025

closes #141

This PR:

  • Fixes a small bug in AptaNetPipeline
  • Makes AptaNetPipeline inherit from BaseObject to prevent errors during benchmarking
  • Removes an unnecessary test (test_pfoa); the loader is already covered by test_loaders
  • Adds the benchmarking framework

@satvshr satvshr requested a review from fkiraly September 29, 2025 20:59
@satvshr satvshr changed the title from [ENH] Benchmarking framework to [ENH] Benchmarking framework and csv loader Sep 30, 2025
from pyaptamer.datasets._loaders import load_pfoa_structure


def test_pfoa_loader():
Contributor

why are we deleting this file?

Collaborator Author

This is explained in the PR description.


fkiraly commented Sep 30, 2025

could you kindly make sure you write a good PR description in the first post?

@satvshr satvshr requested a review from fkiraly September 30, 2025 10:13

satvshr commented Sep 30, 2025

could you kindly make sure you write a good PR description in the first post?

Done.


@fkiraly fkiraly left a comment


We are making changes to AptaNetPipeline - is this required for benchmarking? Would this not interact with other PRs, e.g., #153?

I would recommend to make this a separate PR.


NennoMP commented Oct 2, 2025

While investigating #144, I noticed that the benchmark class performs cross-validation on the given dataset (training data) to generate validation splits on which the model is then evaluated. However, the purpose of cross-validation is to maximize the use of training data during model selection to select the best hyperparameters. Here there is no model selection though.

Shouldn't the purpose of the benchmarking class be to train the model on training data using the best hyperparameters previously identified, and then perform a final evaluation on held-out test data (i.e., the test_li2014.csv file from AptaTrans)?


satvshr commented Oct 6, 2025

is this required for benchmarking?

If you read the PR description, the BaseObject part is required for benchmarking; the bug, on the other hand, should be patched and is small, so I thought I'd include it here too.


satvshr commented Oct 6, 2025

Shouldn't the purpose of the benchmarking class be to train the model on training data using the best hyperparameters previously identified, and then perform a final evaluation on held-out test data (i.e., the test_li2014.csv file from AptaTrans)?

We can use StratifiedSplit if we want a train-test split for evaluation, and "generic" CV to ensure that a particular split is not the reason one model performs better than another; evaluating over n iterations also gives a more accurate result.
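The idea above can be sketched with plain scikit-learn (pyaptamer's own benchmark class and StratifiedSplit are not shown here; the estimator, dataset, and split counts below are illustrative assumptions):

```python
# Sketch: repeated stratified CV evaluation, so that no single
# train/test split decides which model looks better.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Toy stand-in for a real aptamer-protein interaction dataset
X, y = make_classification(n_samples=200, random_state=0)
clf = RandomForestClassifier(n_estimators=10, random_state=0)

# 3 repeats of stratified 5-fold CV -> 15 scores in total
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(len(scores))  # 15
print(scores.mean())
```

Averaging over the 15 scores smooths out split-to-split variation, which is the "more accurate result over n iterations" point.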

@satvshr satvshr requested a review from fkiraly October 8, 2025 14:01

fkiraly commented Oct 8, 2025

is this required for benchmarking?

If you read the PR description, the BaseObject part is required for benchmarking, the bug on the other hand should be patched and is small so I thought id put it here too.

What exactly is the bug?


fkiraly commented Oct 8, 2025

While investigating #144, I noticed that the benchmark class performs cross-validation on the given dataset (training data) to generate validation splits on which the model is then evaluated. However, the purpose of cross-validation is to maximize the use of training data during model selection to select the best hyperparameters. Here there is no model selection though.

Shouldn't the purpose of the benchmarking class be to train the model on training data using the best hyperparameters previously identified, and then perform a final evaluation on held-out test data (i.e., the test_li2014.csv file from AptaTrans)?

@NennoMP, re-sampling or CV can be used both in benchmarking and in tuning - it is even possible to combine both, to benchmark a model including the tuning algorithm. For benchmarking, one has to be careful about error bars etc, the CV-based error bars are not reliable (due to sample correlation from the cv splits).
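The caveat about CV-based error bars can be illustrated with a minimal sketch (generic scikit-learn, not pyaptamer's benchmark API; the dataset and model are placeholders):

```python
# Sketch: fold scores from one CV run are correlated, because every
# pair of folds shares most of its training data, so the naive
# standard error computed from them is an optimistic error bar.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)

mean = scores.mean()
# Treats the 10 fold scores as independent draws -- they are not:
# in 10-fold CV any two folds share ~8/9 of their training data.
naive_se = scores.std(ddof=1) / np.sqrt(len(scores))
print(f"{mean:.3f} +/- {naive_se:.3f} (naive, likely too small)")
```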


satvshr commented Oct 9, 2025

What exactly is the bug?

FunctionTransformer takes a dictionary as input to kw_args; the test was only passing because I was not passing anything to kw_args. Once I did, the tests started failing.


@fkiraly fkiraly left a comment


merge conflicts with main

@satvshr satvshr changed the title from [ENH] Benchmarking framework and csv loader to [ENH] Benchmarking framework Oct 14, 2025
@satvshr satvshr requested a review from fkiraly October 14, 2025 08:12

@fkiraly fkiraly left a comment


Requests have not been addressed. Repeating them here

  • May I request to add a short notebook with a small benchmarking experiment and some reasonably chosen dataset? - first change request #114 (review)
    • as said there, this is necessary for me to understand your design
  • I asked for the bugfix to be moved to a separate PR, and for a description of the bug you claim to fix. #114 (comment)


satvshr commented Oct 15, 2025

  • as said there, this is necessary for me to understand your design

#165 is waiting for the AptaTrans notebook before adding AptaTrans to it.

I asked for the bugfix to be moved to a separate PR, and for a description of the bug you claim to fix. #114 (comment)

FunctionTransformer takes a dictionary as input to kw_args; the test was only passing because I was not passing anything to kw_args. Once I did, the tests started failing.

You asked me what the bug was, not to move it to another PR. The bug is as described above and the fix is very small. I understand it is better to move it to another PR, but it's only 2 lines, and I thought it easier to get it done here than to open an issue and a PR for a 2-line fix that is not that important right now.


fkiraly commented Oct 16, 2025

you still have not explained what you think the bug is/was.


satvshr commented Oct 17, 2025

you still have not explained what you think the bug is/was.

My bad, I realised I was not clear enough. FunctionTransformer takes a dictionary as input to kw_args, but currently it's only being given a constant k as input.
The tests pass at the moment because, in the tests for AptaNet, the k value is not being passed to AptaNetPipeline, which I also fix as part of this PR.
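For context, the shape of the bug can be reproduced with a minimal sklearn example (the function add_k and the value 3 are illustrative, not AptaNetPipeline's actual internals):

```python
# Sketch of the kw_args issue: FunctionTransformer forwards kw_args as a
# dict of keyword arguments, so passing a bare constant breaks at
# transform time instead of being applied as a parameter.
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def add_k(X, k=0):
    return X + k

X = np.array([[1.0], [2.0]])

# Wrong (the reported bug shape): a bare value instead of a dict
# FunctionTransformer(add_k, kw_args=3)  # errors when transforming

# Correct: wrap the parameter in a dict
ft = FunctionTransformer(add_k, kw_args={"k": 3})
print(ft.transform(X))  # [[4.] [5.]]
```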

@satvshr satvshr requested a review from fkiraly October 17, 2025 07:39
@fkiraly fkiraly merged commit 68777eb into main Oct 26, 2025
15 checks passed

Labels

enhancement New feature or request
