Weight decay in SGD and batch normalization layers

shaden · September 14, 2021, 11:07pm

Does the weight decay in optim.SGD also apply the penalty to the batch normalization parameters?
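
For context, optim.SGD applies weight_decay to every parameter it is given, and it does not distinguish by layer type, so the learnable scale and shift of a batch normalization layer are decayed along with everything else. Below is a minimal sketch of how one might check this and, if desired, exclude those parameters through per-parameter-group options. The toy model, the learning rate, the decay values, and the names `norm_params` and `other_params` are assumptions for illustration only, not part of the original question.

```python
import torch
from torch import nn

# 1) Check: with a zero gradient, any change in the BN weight comes from weight decay alone.
bn = nn.BatchNorm1d(3)  # weight initialised to 1, bias to 0
opt = torch.optim.SGD(bn.parameters(), lr=0.1, weight_decay=0.5)
bn.weight.grad = torch.zeros_like(bn.weight)  # pretend the loss gradient is exactly zero
opt.step()
# Update is w <- w - lr * (grad + weight_decay * w), so each entry goes from 1.0 to 0.95.
print(bn.weight.data)

# 2) One way to exclude normalization parameters: separate parameter groups.
model = nn.Sequential(  # hypothetical toy model
    nn.Linear(4, 8),
    nn.BatchNorm1d(8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

norm_params, other_params = [], []
for module in model.modules():
    # _BatchNorm is the shared base class of BatchNorm1d/2d/3d.
    is_norm = isinstance(module, nn.modules.batchnorm._BatchNorm)
    # recurse=False yields only the parameters owned directly by this module,
    # so each parameter is visited exactly once.
    for p in module.parameters(recurse=False):
        (norm_params if is_norm else other_params).append(p)

optimizer = torch.optim.SGD(
    [
        {"params": other_params, "weight_decay": 1e-4},
        {"params": norm_params, "weight_decay": 0.0},  # no penalty on BN scale/shift
    ],
    lr=0.1,
    momentum=0.9,
)
```

Whether to exclude the batch normalization parameters is a design choice; the sketch only shows the mechanism. Options set inside a group override the defaults passed as keyword arguments to the optimizer.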