How can one force a simple vanilla training process? #21278
Closed
Unanswered
hmf
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment
Problem was in the model (an incorrect scheduler).
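Since the fix turned out to be an incorrect scheduler, one pitfall worth illustrating is stepping frequency: Lightning steps a scheduler returned from `configure_optimizers` once per epoch by default, so a scheduler tuned for a different stepping cadence in a hand-written loop decays the learning rate at the wrong rate. A minimal plain-PyTorch sketch (the `StepLR` numbers here are hypothetical, not from the asker's model) showing how the same scheduler gives very different learning rates depending on how often it is stepped:

```python
import torch
from torch.optim.lr_scheduler import StepLR

def lr_after(steps):
    """LR after `steps` calls to a StepLR meant to decay 10x every 10 epochs.
    (Hypothetical numbers, for illustration only.)"""
    p = torch.nn.Parameter(torch.zeros(1))
    opt = torch.optim.SGD([p], lr=0.1)
    sched = StepLR(opt, step_size=10, gamma=0.1)
    for _ in range(steps):
        opt.step()    # dummy optimizer step so PyTorch does not warn
        sched.step()
    return sched.get_last_lr()[0]

# Stepped once per epoch (intended): after 5 epochs the LR is untouched.
print(lr_after(5))   # 0.1
# Stepped once per *batch* by mistake (say 50 batches/epoch, so 50 steps
# after one epoch): the LR has already decayed five times, to ~1e-6.
print(lr_after(50))
```

If the original loop stepped per batch, the Lightning port needs `"interval": "step"` in the scheduler dict returned from `configure_optimizers` to match it.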
I am currently (3rd attempt) trying to replicate an experiment that is coded in plain PyTorch (training, validation, and test loops). The original code seems able to overfit: the training and validation loss keep decreasing, albeit slowly, up to a maximum of 200 epochs. Basically, a very simple AE is trained to reconstruct cropped samples.
In this new attempt I have altered the code as little as possible. All I have done is replace the training loop with Trainer.fit. I then use the original test script to compare the results, using SSIM and MSE as metrics. With the Trainer version, these values are large. The training loss initially drops quickly and is approximately the same as with the original code, but towards the end (up to the 200-epoch maximum) it oscillates slightly without ever decreasing. If I activate early stopping, training stops anywhere from epoch 48 to 75. My latest changes use the following Trainer-related code:
My question is: what other parameters can I try in order to get a simple, vanilla training loop? Any diagnosis I can do? Anything else I can try? For reference, below I show the original code.
TIA,
HF
The original train loop is as follows:
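The original code itself is not reproduced here. For reference, a generic sketch of the kind of vanilla PyTorch loop described above (train pass, validation pass, up to a fixed number of epochs); the tiny model, data, and epoch count are stand-ins, not the asker's:

```python
import torch
from torch import nn

# Stand-in autoencoder and data; the real model/loader are not shown above.
model = nn.Sequential(nn.Linear(16, 4), nn.ReLU(), nn.Linear(4, 16))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x = torch.randn(64, 16)              # pretend "cropped samples"
train_x, val_x = x[:48], x[48:]

for epoch in range(5):               # the original runs up to 200 epochs
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(train_x), train_x)   # reconstruction loss
    loss.backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(val_x), val_x)
```

Porting such a loop to Lightning maps the first block to `training_step` and the `no_grad` block to `validation_step`, with `zero_grad`/`backward`/`step` handled by the Trainer under automatic optimization.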