[ENH] Benchmarking framework #114
Conversation
```python
from pyaptamer.datasets._loaders import load_pfoa_structure


def test_pfoa_loader():
```
why are we deleting this file?
Added in the description of the PR
could you kindly make sure you write a good PR description in the first post?
Done.
fkiraly left a comment:
We are making changes to AptaNetPipeline - is this required for benchmarking? Would this not interact with other PRs, e.g., #153?
I would recommend making this a separate PR.
While investigating #144, I noticed that the benchmark class performs cross-validation on the given dataset (training data) to generate validation splits on which the model is then evaluated. However, the purpose of cross-validation is to maximize the use of training data during model selection, i.e., to pick the best hyperparameters, and there is no model selection here. Shouldn't the purpose of the benchmarking class be to train the model on training data using the best hyperparameters previously identified, and then perform a final evaluation on held-out test data (i.e., the test set)?
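A minimal sketch of the held-out evaluation described above, using scikit-learn as a stand-in for the actual benchmark class; the dataset is synthetic and the hyperparameter values are hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for a real aptamer dataset
X, y = make_classification(n_samples=500, random_state=0)

# split once into training data and a held-out test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# fit with previously identified best hyperparameters (values hypothetical)
model = RandomForestClassifier(n_estimators=200, max_depth=5)
model.fit(X_train, y_train)

# one-shot final evaluation on the held-out test data
print(accuracy_score(y_test, model.predict(X_test)))
```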
If you read the PR description, the …
We can use …
What exactly is the bug?
@NennoMP, re-sampling or CV can be used both in benchmarking and in tuning; it is even possible to combine both, to benchmark a model including the tuning algorithm. For benchmarking, one has to be careful about error bars etc.: CV-based error bars are not reliable, due to sample correlation from the CV splits.
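As an illustration of combining both (a sketch in scikit-learn, not pyaptamer code): benchmarking a model including its tuning algorithm amounts to nested resampling, with an inner loop selecting hyperparameters and an outer loop estimating performance of the tuned estimator as a whole:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# inner loop: hyperparameter tuning via cross-validated grid search
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)

# outer loop: benchmark "model + tuning" as a single estimator; note
# that the per-fold scores are correlated, so naive error bars derived
# from them are unreliable
scores = cross_val_score(inner, X, y, cv=5)
print(scores.mean(), scores.std())
```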
fkiraly left a comment:
merge conflicts with main
fkiraly left a comment:
Requests have not been addressed; repeating them here:
- May I request adding a short notebook with a small benchmarking experiment and a reasonably chosen dataset? - first change request, #114 (review)
  - as said there, this is necessary for me to understand your design
- I asked for the bugfix to be moved to a separate PR, and for a description of the bug you claim to fix. #114 (comment)
#165 is waiting for the AptaTrans notebook before adding AptaTrans to it.
You asked me what the bug was, not to move it to another PR. The description of the bug is as mentioned above, and it is a very small fix. I understand it is better to move it to another PR, but it's only two lines, and I thought it would be easier to get it done in this one than to open an issue and a PR for a two-line fix that is not that important right now.
you still have not explained what you think the bug is/was.
My bad, I realised I was not clear enough.
closes #141
This PR:
- makes `AptaNetPipeline` inherit from `BaseObject`, to prevent errors during benchmarking
- removes the loader test (`test_pfoa`); the loader is already being tested in `test_loaders`
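A minimal sketch of the inheritance change, assuming `BaseObject` refers to scikit-base's `skbase.base.BaseObject`; the import path and the constructor parameter are assumptions, not the actual pipeline code:

```python
from skbase.base import BaseObject


class AptaNetPipeline(BaseObject):
    """Stub illustrating the inheritance change (parameter hypothetical).

    Inheriting from BaseObject provides get_params/set_params and clone,
    which a benchmarking loop relies on to copy and configure estimators.
    """

    def __init__(self, n_epochs=10):
        self.n_epochs = n_epochs
        super().__init__()
```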