PyTorch's weight_decay and batch normalization layers

samin_hamidi · February 16, 2022, 4:48pm

I was wondering whether the parameters of batch_norm layers are included when computing the L2 norm for weight decay in PyTorch’s implementation?

ptrblck · February 17, 2022, 2:39am

The weight_decay argument is applied to every parameter in the parameter group it is set for. I.e. if you are passing the batchnorm parameters to this group (or are just using a single group and are passing all parameters), weight decay will also be applied to them.
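
If you want to exclude the batchnorm parameters, one option is to put them into their own parameter group with weight_decay set to 0. The snippet below is only a minimal sketch of that idea: the toy model, the helper name `split_params`, and the hyperparameter values are made up for illustration, and it assumes you want to treat every other parameter (including the conv biases) as "decayed".

```python
import torch
import torch.nn as nn

# Toy model, purely illustrative (not from the original question).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
)


def split_params(model):
    """Hypothetical helper: separate batchnorm parameters from the rest."""
    decay, no_decay = [], []
    for module in model.modules():
        # _BatchNorm is the shared base class of BatchNorm1d/2d/3d.
        is_norm = isinstance(module, nn.modules.batchnorm._BatchNorm)
        # recurse=False yields only the parameters owned by this module,
        # so each parameter is visited exactly once.
        for param in module.parameters(recurse=False):
            if not param.requires_grad:
                continue
            (no_decay if is_norm else decay).append(param)
    return decay, no_decay


decay, no_decay = split_params(model)

# Each dict is one parameter group; weight_decay is set per group.
optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-4},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=0.1,
    momentum=0.9,
)
```

You can check `optimizer.param_groups` afterwards to verify which weight_decay value each group ended up with. If you pass `model.parameters()` directly instead, everything lands in a single group and the batchnorm weight and bias are decayed like all other parameters.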