Optimizer = torch.optim.SGD()

I use the line “optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay)” to apply L2 regularization and prevent overfitting. Generally, regularization penalizes only the model's weights W and not the biases b, but I have read online that the weight decay set through the weight_decay parameter of a torch.optim optimizer is applied to all parameters in the network, penalizing both the weights w and the biases b. Is that right?

Reference URL: https://blog.csdn.net/guyuealian/article/details/88426648


Yes, that is right. The weight_decay parameter here is applied to all of the parameters passed to the optimizer, biases included.
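
One way to check this yourself is sketched below. It is a minimal illustration, not taken from the linked post: the layer size, lr = 0.1 and weight_decay = 0.5 are arbitrary values chosen for the demonstration. Every gradient is set to zero, so any change after opt.step() can only come from weight decay.

```python
import torch

# Arbitrary small layer and hyperparameters, chosen only for illustration.
lin = torch.nn.Linear(3, 1)
opt = torch.optim.SGD(lin.parameters(), lr=0.1, weight_decay=0.5)

weight_before = lin.weight.detach().clone()
bias_before = lin.bias.detach().clone()

# Give every parameter a zero gradient so that any change made by
# step() is due to weight decay alone.
for p in lin.parameters():
    p.grad = torch.zeros_like(p)

opt.step()

weight_after = lin.weight.detach().clone()
bias_after = lin.bias.detach().clone()

# Plain SGD with weight decay updates p <- p - lr * (grad + weight_decay * p).
# With grad = 0 this scales each parameter by 1 - lr * weight_decay = 0.95.
print("bias before:", bias_before)
print("bias after: ", bias_after)
```

If the bias were exempt from weight decay, it would be unchanged after the step; instead it shrinks by the same factor as the weight.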

Hello 乃仁 梁!

If you wish to turn off weight decay for your network biases, you may
use “parameter groups” to use different optimizer hyperparameters to
optimize different sets of network parameters.

Here’s a brief example:

import torch

lin = torch.nn.Linear(3, 1)
# one parameter group per set of hyperparameters
opt = torch.optim.SGD([
    {'params': lin.weight, 'lr': 0.1, 'weight_decay': 0.5},  # decay the weight
    {'params': lin.bias,   'lr': 0.1, 'weight_decay': 0.0},  # no decay on the bias
])

opt will use a learning rate of 0.1 for all of lin's parameters – both
weight and bias – but will only use a weight decay of 0.5 for weight
and no weight decay (weight_decay = 0.0) for bias.
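
For a whole model, the same idea can be applied by sorting the parameters into two groups by name. The following is only a sketch under stated assumptions: the Sequential model is a hypothetical stand-in, the hyperparameter values are arbitrary, and biases are assumed to be identifiable by the conventional "bias" suffix in named_parameters().

```python
import torch

# Hypothetical stand-in model; any torch.nn.Module is handled the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 1),
)

decay, no_decay = [], []
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue  # frozen parameters are not optimized at all
    # Assumption: bias tensors have names ending in "bias" (e.g. "0.bias").
    if name.endswith("bias"):
        no_decay.append(param)
    else:
        decay.append(param)

# Options given outside the groups (lr, momentum) act as defaults for both
# groups; weight_decay is set per group. The values are illustrative.
opt = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 0.5},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=0.1,
    momentum=0.9,
)

for group in opt.param_groups:
    print(len(group["params"]), group["lr"], group["weight_decay"])
```

In a training script that builds its optimizer from model.parameters(), the list of two groups would take the place of that first argument. Whether other parameters (for example normalization-layer weights) should also be exempt is a separate choice that this sketch does not make.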


K. Frank

I am following this guide (https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-collect.md) and have reached the training step. How can I change the weight decay in train.py (https://github.com/dusty-nv/pytorch-classification/blob/5107de352cade326e7aacd44bf8d625b7055fd4e/train.py) so that it is not applied to the biases?