Regularization in optimizer

What is the difference between applying weight decay in the optimizer and adding it to the loss? Usually we add the regularization term to the loss function, but PyTorch applies it in the optimizer. Just a little curious.

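I think there is no difference. The regularization in the optimizer is effectively part of your cost function (cost = actual loss + regularization loss). An optimizer such as torch.optim.SGD adds weight_decay * w to each parameter's gradient, which is exactly the gradient of an L2 penalty (weight_decay / 2) * ||w||^2 in the loss.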
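To check this claim, here is a minimal sketch that takes one step with the penalty in the optimizer and one step with the penalty in the loss, then compares the resulting parameters. The tiny linear model, the random data, and the values of `wd` and `lr` are made up for illustration, and it assumes plain `torch.optim.SGD` without momentum. Note the factor 0.5 on the explicit penalty, and that `weight_decay` is applied to every parameter passed to the optimizer, biases included.

```python
import torch

torch.manual_seed(0)
wd, lr = 0.1, 0.5  # illustrative values, not recommendations

# Two identical copies of the same small model (hypothetical example model).
model_a = torch.nn.Linear(3, 1)
model_b = torch.nn.Linear(3, 1)
model_b.load_state_dict(model_a.state_dict())

x = torch.randn(8, 3)
y = torch.randn(8, 1)

# (a) L2 regularization handled by the optimizer via weight_decay.
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr, weight_decay=wd)
opt_a.zero_grad()
loss_a = torch.nn.functional.mse_loss(model_a(x), y)
loss_a.backward()
opt_a.step()

# (b) The same penalty written into the loss: (wd / 2) * sum of squared parameters.
opt_b = torch.optim.SGD(model_b.parameters(), lr=lr)
opt_b.zero_grad()
penalty = sum(p.pow(2).sum() for p in model_b.parameters())
loss_b = torch.nn.functional.mse_loss(model_b(x), y) + 0.5 * wd * penalty
loss_b.backward()
opt_b.step()

# After one step, both models should hold the same parameters (up to float error).
same = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

The equivalence shown here is for plain SGD. With adaptive optimizers the two approaches can behave differently, and `torch.optim.AdamW` deliberately decouples the decay from the gradient, so it is worth checking the documentation of the optimizer you actually use.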

Thanks! So I don’t have to implement my own loss function with regularization, like the following, anymore!

allWeight = Variable(torch.zeros(1)).cuda()
for param in model.parameters():
	if param.dim() == 2:
		allWeight += torch.norm(