The method below is defined so that gradients are taken only with respect to β and γ in batch-norm layers:

```julia
trainable(bn::BatchNorm) = (bn.β, bn.γ)
```
However, this stops us from using `params` and `loadparams!` to save and load the other two fields, μ and σ², which are also updated during training and therefore need to be saved and loaded as well.
Maybe it's fine not to define `trainable(bn::BatchNorm) = (bn.β, bn.γ)` at all, since μ and σ² don't seem to have gradients anyway?
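A minimal sketch of the conflict, assuming Flux's `trainable`/`params` behavior where `params` collects only the fields returned by `trainable` (exact behavior may vary by Flux version):

```julia
using Flux

# A BatchNorm layer carries four stateful fields:
#   β, γ   — learned affine parameters (need gradients)
#   μ, σ²  — running statistics (updated during training, but not by gradients)
bn = BatchNorm(3)

# With trainable restricted to (β, γ), optimisers correctly skip μ and σ² …
# Flux.trainable(bn::BatchNorm) = (bn.β, bn.γ)

# … but params then also returns only β and γ, so the running statistics
# are silently dropped when saving/loading via params/loadparams!:
ps = Flux.params(bn)   # contains β and γ only, not μ and σ²
```

One possible resolution, as the question suggests, is to rely on μ and σ² having no gradients rather than restricting `trainable`; another is to keep parameter collection and gradient selection as separate mechanisms so saving can see all four fields.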