How to inspect the regularization loss if I use weight decay for some of the layers?

yxchng · October 31, 2018, 7:09am

Does the loss given by criterion(output, target) include the regularization loss? If so, is it possible to extract it? If not, how do I inspect the regularization loss?

albanD · October 31, 2018, 9:44am

It depends on which criterion and which regularization you are using.
In PyTorch, weight decay is usually (for SGD / Adam) applied by the optimizer directly and is not part of the loss. This is because the penalty term itself does not need to be computed explicitly; only its gradient is needed for the update.
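
To make this concrete, here is a minimal sketch (my own toy setup, not taken from the PyTorch source) where the gradient from the criterion is set to zero by hand, so whatever changes after `step()` comes from weight decay alone. The parameter values, `lr` and `weight_decay` are arbitrary choices for illustration.

```python
import torch

# One parameter tensor; the values are arbitrary.
w = torch.nn.Parameter(torch.tensor([1.0, -2.0, 3.0]))
opt = torch.optim.SGD([w], lr=0.1, weight_decay=0.5)

before = w.detach().clone()

# Pretend the criterion produced a zero gradient, so the only
# contribution to the update is the weight decay term.
w.grad = torch.zeros_like(w)
opt.step()

# Plain SGD (no momentum) uses grad + weight_decay * w as the effective
# gradient, so here w <- w - lr * weight_decay * w.
expected = before - 0.1 * 0.5 * before
print(torch.allclose(w.detach(), expected))
```

The value returned by the criterion never enters this calculation, which is why it does not contain the regularization term.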

yxchng · October 31, 2018, 2:29pm

Yup. For cases like SGD / Adam, how can I get the regularization term? Is there a way to do it?

albanD · October 31, 2018, 2:32pm

You cannot get it as it is never computed.
Indeed, the gradient corresponding to it is always proportional to the weights themselves before the update, so the step is implemented directly. For example, for the SGD optimizer, you can see at this line that it only adds the weights themselves, scaled by weight_decay, to the gradients.
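
If you still want a number to look at, one option is to recompute the penalty yourself from the optimizer's param groups. The sketch below assumes the L2 form 0.5 * weight_decay * ||w||^2, whose gradient is weight_decay * w, i.e. the term SGD adds. The helper name `implied_l2_penalty` and the two-layer model are hypothetical, and for optimizers that decouple the decay from the gradient the result is only a rough indicator.

```python
import torch
import torch.nn as nn

# Hypothetical model: weight decay on the first linear layer only.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "weight_decay": 1e-2},
        {"params": model[2].parameters(), "weight_decay": 0.0},
    ],
    lr=0.1,
)

def implied_l2_penalty(optimizer):
    """Sum 0.5 * weight_decay * ||p||^2 over all param groups.

    Hypothetical helper, not part of torch.optim; for inspection only.
    """
    total = torch.zeros(())
    for group in optimizer.param_groups:
        wd = group.get("weight_decay", 0.0)
        if wd == 0:
            continue  # this group is not regularized
        for p in group["params"]:
            total = total + 0.5 * wd * p.detach().pow(2).sum()
    return total

reg = implied_l2_penalty(optimizer)
print(f"implied L2 penalty: {reg.item():.6f}")
```

You can log this value next to the criterion's loss during training; it is detached, so it does not affect the gradients.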