What is the difference between weight decay in the optimizer and regularization in the loss? Usually we add the regularization term to the loss function, but PyTorch adds it in the optimizer (via the weight_decay argument). Just a little curious.
I think there is no difference. The regularization in the optimizer is effectively part of your cost function (cost = actual loss + regularization loss).
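To make the "no difference" claim concrete: for plain SGD, passing weight_decay=λ to the optimizer gives exactly the same update as adding a (λ/2)·‖w‖² penalty to the loss, since the penalty's gradient is λ·w. A minimal pure-Python sketch on a toy loss L(w) = (w − 1)² (the names and constants here are illustrative, not from the original post):

```python
# Equivalence of L2-penalty-in-loss vs weight_decay-in-optimizer for plain SGD.
# Toy problem: base loss L(w) = (w - 1)^2, so dL/dw = 2 * (w - 1).
lr, wd = 0.1, 0.01
w_a = 2.0  # version A: penalty added to the loss
w_b = 2.0  # version B: decay applied inside the optimizer step
for _ in range(100):
    # A: one gradient step on (w - 1)^2 + (wd / 2) * w^2
    grad_a = 2 * (w_a - 1) + wd * w_a
    w_a -= lr * grad_a
    # B: gradient of the plain loss; SGD folds decay in as d_p = grad + wd * p
    grad_b = 2 * (w_b - 1)
    w_b -= lr * (grad_b + wd * w_b)
print(w_a == w_b)  # True: the two trajectories are identical step by step
```

Note that this exact equivalence holds for vanilla SGD; adaptive optimizers such as Adam rescale the combined gradient, so the two formulations can behave differently there.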
Thanks! Then I don’t have to implement my own regularization term in the loss like the following:
reg_loss = torch.zeros(1, device='cuda')
for param in model.parameters():
    if param.dim() == 2:  # weight matrices only; 1-D biases are usually not regularized
        reg_loss = reg_loss + torch.norm(param) ** 2  # use param, not param.data, so gradients flow