How does SGD weight_decay work?

rasbt · December 26, 2018, 7:21pm

The part that I circled doesn’t seem right to me:

In L2 regularization, you modify the cost as follows
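One common way to write this, as a sketch only: $L(w)$ stands for the unregularized loss and $\lambda$ for the regularization strength. Both symbols are assumed here for illustration and are not taken from the post itself.

$$
J(w) = L(w) + \lambda \sum_i w_i^2
$$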

The weight update should then be
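As a sketch of that update, assume the regularized cost has the form $J(w) = L(w) + \lambda \sum_i w_i^2$ and the learning rate is $\eta$ (these symbols are assumptions for illustration). Differentiating and taking a gradient step gives:

$$
\frac{\partial J}{\partial w_i} = \frac{\partial L}{\partial w_i} + 2\lambda w_i,
\qquad
w_i \leftarrow w_i - \eta \left( \frac{\partial L}{\partial w_i} + 2\lambda w_i \right)
$$

Writing $\lambda' = 2\lambda$ gives the same update without the explicit 2.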

The way PyTorch applies the weight decay seems correct to me (you can drop the factor of 2, since it can be absorbed into the regularization strength).
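To check the equivalence numerically, here is a minimal pure-Python sketch on a one-parameter toy problem. The loss `0.5 * (w*x - y)**2`, the function names, and the numbers are all made up for illustration. The `sgd_step_weight_decay` function mimics plain SGD (no momentum) where `weight_decay * w` is added to the gradient, which is how I understand `torch.optim.SGD` to apply `weight_decay`.

```python
def grad_loss(w, x, y):
    # Gradient of the toy unregularized loss 0.5 * (w*x - y)**2 with respect to w.
    return (w * x - y) * x


def sgd_step_weight_decay(w, grad, lr, weight_decay):
    # Weight-decay style step: add weight_decay * w to the plain gradient.
    return w - lr * (grad + weight_decay * w)


def sgd_step_l2_cost(w, x, y, lr, lam):
    # Step on the explicitly regularized cost 0.5 * (w*x - y)**2 + lam * w**2.
    # The penalty contributes 2 * lam * w to the gradient.
    grad = (w * x - y) * x + 2.0 * lam * w
    return w - lr * grad


# Hypothetical values chosen only for the demonstration.
w, x, y, lr, lam = 1.5, 2.0, 1.0, 0.1, 0.01

# Setting weight_decay = 2 * lam absorbs the factor of 2.
w_decay = sgd_step_weight_decay(w, grad_loss(w, x, y), lr, weight_decay=2.0 * lam)
w_l2 = sgd_step_l2_cost(w, x, y, lr, lam)

print(w_decay, w_l2)
```

Both steps land on the same weight, so the factor of 2 only changes what value of the decay coefficient corresponds to a given penalty strength.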