Many forms of model training exist. One popular form is knowledge distillation, where a student model learns to match the output distributions of a teacher model. This commit introduces support for knowledge distillation in the training library.
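For reference, a minimal sketch of a typical distillation loss (KL divergence between softened teacher and student distributions); the exact formulation, function names, and temperature handling used in this library are assumptions here, not the committed implementation:

```python
# Hedged sketch of a standard knowledge-distillation loss; the library's
# actual loss may mix in hard-label cross-entropy or use other settings.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```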
This commit also exposes the `weight_decay` hyperparameter which is often used to help deep learning models generalize.
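As an illustration only (the exact plumbing in this library is not shown here), `weight_decay` is typically forwarded to the optimizer; the optimizer choice and values below are hypothetical:

```python
# Hedged sketch: passing an exposed weight_decay hyperparameter to the optimizer.
import torch

model = torch.nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
```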
Lastly, this commit changes usage of `torch.distributed` to the `dist` alias, as that is the convention used throughout the codebase.
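The alias referred to above is the conventional import; subsequent calls then use `dist.*` rather than `torch.distributed.*`:

```python
# Conventional alias; later calls become dist.get_world_size(), dist.barrier(), etc.
import torch.distributed as dist

if dist.is_available() and dist.is_initialized():
    world_size = dist.get_world_size()
```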
Signed-off-by: Oleg S <[email protected]>