Does the loss returned by criterion(output, target) include the regularization loss? If so, is it possible to extract it? If not, how do I inspect the regularization loss?
It depends on which criterion and which regularization you are using.
In PyTorch, weight decay (for example with SGD or Adam) is usually implemented directly in the optimizer, not in the loss. This is because the regularization term itself never needs to be computed explicitly.
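For plain L2 regularization the reason can be written out in two lines. The notation here is chosen for this sketch: λ is the optimizer's weight_decay, η the learning rate, L the criterion loss, and the factor 1/2 is the convention under which the gradient of the penalty is exactly λw:

```latex
\begin{aligned}
R(w) &= \frac{\lambda}{2}\,\lVert w \rVert_2^2, \qquad \nabla_w R(w) = \lambda w, \\
w_{t+1} &= w_t - \eta\left(\nabla_w L(w_t) + \lambda w_t\right).
\end{aligned}
```

The update only needs λw_t, which is available for free, so R(w_t) itself never has to be evaluated.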
Yup, for cases like SGD/Adam, how can I get the regularization term? Is there a way to do it?
You cannot get it as it is never computed.
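If you want to log its value anyway, you can compute it yourself. This is a minimal sketch, assuming plain (coupled) L2 weight decay as used by torch.optim.SGD and torch.optim.Adam, where weight_decay = λ matches a penalty of (λ/2)·||w||². `l2_penalty` is a hypothetical helper, not a PyTorch API. Pass it the same parameters you gave the optimizer, since the decay applies to all of them (biases included) unless you use parameter groups. It does not carry over to torch.optim.AdamW, whose decoupled decay is not equivalent to adding this term to the loss.

```python
import torch
import torch.nn as nn


def l2_penalty(params, weight_decay: float) -> torch.Tensor:
    # Hypothetical helper: (weight_decay / 2) * sum of squared parameters.
    # Its gradient w.r.t. each parameter p is weight_decay * p, which is
    # exactly what SGD/Adam add to p.grad when weight_decay > 0.
    return 0.5 * weight_decay * sum(p.pow(2).sum() for p in params)


model = nn.Linear(3, 1)
wd = 1e-2
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=wd)

# Inspect the implicit regularization term; no_grad because it is only logged.
with torch.no_grad():
    reg = l2_penalty(model.parameters(), wd)
print(f"regularization term: {reg.item():.6f}")
```

Computing it inside torch.no_grad() keeps it purely for monitoring, so it has no effect on the update the optimizer performs.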
Indeed, the gradient of this term is always proportional to the weights themselves (taken before the update), so the optimizer applies the corresponding step directly. For example, you can see in the SGD optimizer's implementation that it simply adds the weights, scaled by weight_decay, to the gradients.
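The equivalence can be checked numerically. The sketch below uses toy data and arbitrarily chosen lr and wd values. It takes one step with SGD(weight_decay=wd) on the bare criterion loss, and one step with plain SGD on the loss plus the explicit penalty (wd / 2)·||w||². The resulting parameters should match up to floating-point error.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3)
y = torch.randn(8, 1)
criterion = nn.MSELoss()
lr, wd = 0.1, 0.01

model_a = nn.Linear(3, 1)
model_b = copy.deepcopy(model_a)  # identical starting weights

# (a) Built-in decay: the loss contains no regularization term at all.
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr, weight_decay=wd)
opt_a.zero_grad()
criterion(model_a(x), y).backward()
opt_a.step()

# (b) No built-in decay: the penalty (wd / 2) * ||w||^2 is added to the loss,
# so autograd produces the extra wd * p term in each gradient.
opt_b = torch.optim.SGD(model_b.parameters(), lr=lr)
opt_b.zero_grad()
penalty = 0.5 * wd * sum(p.pow(2).sum() for p in model_b.parameters())
(criterion(model_b(x), y) + penalty).backward()
opt_b.step()

same = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```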