PyTorch's weight_decay and batch normalization layers

I was wondering if the parameters of batch norm layers are included when computing the L2 penalty for weight decay in PyTorch's optimizer implementations?

The weight_decay argument is applied to the corresponding parameter group. That is, if you pass the batchnorm parameters to that group (or are just using a single group containing all parameters), weight decay will be applied to them as well.
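If you want to skip weight decay for batchnorm (and bias) parameters, a common approach is to split the parameters into two groups. Here is a minimal sketch; the model architecture and hyperparameters are illustrative, not from your setup:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 20),
    nn.BatchNorm1d(20),
    nn.ReLU(),
    nn.Linear(20, 2),
)

decay, no_decay = [], []
for name, param in model.named_parameters():
    # BatchNorm affine parameters and biases are 1-D tensors,
    # so checking the dimensionality is a simple way to separate them.
    if param.ndim <= 1:
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-4},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=0.1,
)

# Every parameter should land in exactly one group
assert len(decay) + len(no_decay) == len(list(model.parameters()))
```

With this setup the optimizer only regularizes the 2-D weight matrices, while the batchnorm scales/shifts and the linear biases are left untouched by the decay term.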