Hi:
I noticed that named_parameters returns the bn parameters too. What’s the effect if I pass the bn parameters to the optimizer?
The named_parameters of BatchNorm are the weight and bias, which relate to the gamma and beta from the BatchNorm paper.
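These are the learnable affine parameters of the layer: they scale and shift the normalized output, so once the optimizer updates them they might partly undo the normalization computed from the batch (or running) statistics.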
That’s expected behavior.
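If you want to check this yourself, here is a minimal sketch. The toy model, input shape, and learning rate are my own assumptions for illustration, not anything from your setup. It shows that only weight and bias come back as parameters, while the running stats are registered as buffers and so are never handed to the optimizer:

```python
import torch
from torch import nn

bn = nn.BatchNorm2d(3)

# Learnable affine parameters (gamma and beta in the paper).
param_names = [name for name, _ in bn.named_parameters()]
# Running statistics are buffers, not parameters.
buffer_names = [name for name, _ in bn.named_buffers()]
print(param_names)   # ['weight', 'bias']
print(buffer_names)  # ['running_mean', 'running_var', 'num_batches_tracked']

# Hypothetical toy model: a 1x1 conv followed by the BatchNorm layer.
model = nn.Sequential(nn.Conv2d(3, 3, kernel_size=1), bn)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

weight_before = bn.weight.detach().clone()

x = torch.randn(8, 3, 4, 4)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()

# The optimizer updated gamma like any other parameter.
weight_changed = not torch.equal(bn.weight, weight_before)
print(weight_changed)

# The buffers are not among the tensors the optimizer holds.
optimized_ids = {id(p) for group in optimizer.param_groups for p in group["params"]}
running_mean_optimized = id(bn.running_mean) in optimized_ids
print(running_mean_optimized)
```

The running stats still change during training, but that happens in the forward pass of the layer, not through the optimizer step.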