How did the authors perform hyperparameter selection for the pretrained models? #164
Replies: 1 comment 1 reply
- No early stopping; checkpoints are saved mainly for resuming if the job fails. Also no hyperparameter tuning.
1 reply
- I found that the pretraining phase does not involve any validation data and instead saves a checkpoint every epoch. How did the authors decide on early stopping and hyperparameter tuning? Did the authors run all checkpoints on the evaluation benchmarks?
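For readers unfamiliar with the pattern being discussed, the idea of per-epoch checkpoints used only for fault-tolerant resuming (no validation split, no early stopping) can be sketched roughly as follows. This is a minimal illustration, not the authors' actual training code; the `train_one_epoch` stub, the state dictionary, and the checkpoint path are all hypothetical:

```python
import os
import pickle

CKPT_PATH = "checkpoint.pkl"  # hypothetical checkpoint location

def train_one_epoch(state):
    # Stand-in for one epoch of real training work.
    state["steps"] += 100
    return state

def train(num_epochs=5):
    # Resume from the last checkpoint if a previous run was interrupted.
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"epoch": 0, "steps": 0}

    # No validation split and no early stopping: run for a fixed number
    # of epochs, saving a checkpoint after each one purely so a failed
    # job can pick up where it left off.
    for epoch in range(state["epoch"], num_epochs):
        state = train_one_epoch(state)
        state["epoch"] = epoch + 1
        with open(CKPT_PATH, "wb") as f:
            pickle.dump(state, f)
    return state
```

Under this scheme the stopping point is simply the fixed epoch budget, which matches the maintainer's description above; any model selection across checkpoints would have to happen separately on downstream benchmarks.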