When I train a network, I see that the BatchNorm2d weights are becoming negative. Is this possible? Any way to prevent this?
In the default setup batchnorm layers use affine parameters (`weight` and `bias`) as well as the running estimates (`running_mean` and `running_var`). None of these tensors is bounded in any way, so negative values in `weight` are possible and expected.
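For reference, here is a quick way to inspect these tensors on a freshly created layer (a minimal sketch; the 3-channel size is just for illustration):

```python
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=3)
print(bn.weight.shape)        # affine scale (gamma), initialized to ones
print(bn.bias.shape)          # affine shift (beta), initialized to zeros
print(bn.running_mean.shape)  # running estimate of the batch mean
print(bn.running_var.shape)   # running estimate of the batch variance
```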
If you want to enforce a certain range, you could e.g. clamp the weight after each update, as in the sketch below.
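A minimal example of this approach, assuming you want the weight to stay non-negative (the `min=0.0` bound and the toy model/loss are just for illustration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.randn(4, 3, 16, 16)
target = torch.randn(4, 8, 16, 16)

# One training step
loss = criterion(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Clamp the batchnorm weight in-place after the update;
# the lower bound of 0.0 is an arbitrary illustrative choice
with torch.no_grad():
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.clamp_(min=0.0)
```

Note that the clamping runs under `torch.no_grad()` so the in-place modification doesn't interfere with autograd.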