What is the difference between weight decay in the optimizer and regularization in the loss? Usually we add the regularization term to the loss function, but PyTorch adds it in the optimizer (via the weight_decay argument). Just a little curious.
I think there is no difference. The regularization in the optimizer is effectively part of your cost function (cost = actual loss + regularization loss).
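To make the "no difference" claim concrete: for plain SGD, passing weight_decay=λ to the optimizer gives exactly the same update as adding a (λ/2)·‖w‖² penalty to the loss, since the penalty's gradient is λ·w. A minimal pure-Python sketch on a toy loss L(w) = (w − 1)² (the names and constants here are illustrative, not from the original post):

```python
# Equivalence of L2-penalty-in-loss vs weight_decay-in-optimizer for plain SGD.
# Toy problem: base loss L(w) = (w - 1)^2, so dL/dw = 2 * (w - 1).
lr, wd = 0.1, 0.01
w_a = 2.0  # version A: penalty added to the loss
w_b = 2.0  # version B: decay applied inside the optimizer step
for _ in range(100):
    # A: one gradient step on (w - 1)^2 + (wd / 2) * w^2
    grad_a = 2 * (w_a - 1) + wd * w_a
    w_a -= lr * grad_a
    # B: gradient of the plain loss; SGD folds decay in as d_p = grad + wd * p
    grad_b = 2 * (w_b - 1)
    w_b -= lr * (grad_b + wd * w_b)
print(w_a == w_b)  # True: the two trajectories are identical step by step
```

Note that this exact equivalence holds for vanilla SGD; adaptive optimizers such as Adam rescale the combined gradient, so the two formulations can behave differently there.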
Thanks! Then I don’t have to implement my own regularization term in the loss like the following:
reg_loss = torch.zeros(1, device='cuda')
for param in model.parameters():
    if param.dim() == 2:  # weight matrices only; 1-D biases are usually not regularized
        reg_loss = reg_loss + torch.norm(param) ** 2  # use param, not param.data, so gradients flow