Description
In the `WeightDecay` regularization class, the code replaces the parameter's gradient with the gradient of the regularization term:

```python
param.grad = self.regularize(param)
```

Should it instead add the regularization gradient to the existing parameter gradient? i.e.:

```python
param.grad.add_(self.regularize(param))
```
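For context, here is a minimal runnable sketch of the pattern under discussion. Only the class name `WeightDecay` and the `regularize` method come from the snippet quoted above; everything else (the `apply` method, the `weight_decay` constructor argument, the L2 form of the penalty) is a hypothetical reconstruction, not the library's actual code:

```python
import torch

class WeightDecay:
    """Hypothetical sketch of the regularizer discussed in this issue.

    Only `WeightDecay` and `regularize` are taken from the quoted snippet;
    the rest is an assumed reconstruction for illustration.
    """

    def __init__(self, weight_decay: float):
        self.weight_decay = weight_decay

    def regularize(self, param: torch.Tensor) -> torch.Tensor:
        # Gradient of (weight_decay / 2) * ||param||^2 with respect to param.
        return self.weight_decay * param.detach()

    def apply(self, module: torch.nn.Module) -> None:
        for param in module.parameters():
            if param.grad is None:
                continue
            # Overwriting (the behaviour reported above) would discard the
            # data-loss gradient:
            #   param.grad = self.regularize(param)
            # Accumulating keeps both contributions:
            param.grad.add_(self.regularize(param))
```

With accumulation, the final gradient is the sum of the data-loss gradient and the penalty gradient, which is what standard L2 weight decay computes:

```python
model = torch.nn.Linear(4, 2)
wd = WeightDecay(weight_decay=1e-4)

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()  # populates param.grad with the data-loss gradient
wd.apply(model)  # adds the weight-decay gradient on top
```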