L2 regularization with only weight parameters

I asked here before but got no answer. Maybe that topic is from many years ago, and my question is a little different from the original one, so I'm sorry to ask again here.

My question is this:
It is said that L2 regularization should be applied only to the weight parameters, not the bias parameters. (If L2 regularization is applied to all parameters, it's very easy for the model to overfit, is that right? :flushed:)
But the L2 regularization built into most PyTorch optimizers (the weight_decay argument) is applied to all of the parameters in the model, weights and biases alike.
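For example, this is the usual way weight_decay is passed (a minimal sketch; model stands for any nn.Module you already have):

import torch.optim as optim

# weight_decay applies L2 regularization to *every* parameter here,
# biases included -- the optimizer makes no weight/bias distinction.
optimizer = optim.SGD(model.parameters(), lr=1e-2,
                      momentum=0.9, weight_decay=1e-5)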
[Image: least_squares_l2 — the least-squares objective with its L2 penalty term highlighted in a red box]
I mean the parameters in the red box (the penalty term) should be the weight parameters only. (If what I heard is right.)
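For reference, the usual least-squares objective with an L2 penalty on the weights only (my reconstruction of the formula in the image; the exact notation is an assumption) is:

$$\min_{w,\,b}\ \sum_{i=1}^{n} \bigl(w^\top x_i + b - y_i\bigr)^2 + \lambda \lVert w \rVert_2^2$$

The penalty $\lambda \lVert w \rVert_2^2$ (presumably the boxed part) covers only the weights $w$ and leaves the bias $b$ out.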

And one way to deal with it is this code:

import torch.optim as optim

# Split the model's parameters into weights and biases by name.
weight_p, bias_p = [], []
for name, p in model.named_parameters():
    if 'bias' in name:
        bias_p.append(p)
    else:
        weight_p.append(p)

# Apply weight decay (L2 regularization) to the weights only.
optimizer = optim.SGD(
    [
        {'params': weight_p, 'weight_decay': 1e-5},
        {'params': bias_p, 'weight_decay': 0},
    ],
    lr=1e-2, momentum=0.9,
)
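As a quick sanity check (assuming the optimizer object is stored in optimizer as above), you can print each param group's weight decay:

for group in optimizer.param_groups:
    print(len(group['params']), 'params, weight_decay =', group['weight_decay'])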

Is it right? :flushed::flushed:

The code looks fine.
I'm not sure if an L2 regularization of the bias terms leads to overfitting.
Do you have any references on it? It sounds quite interesting.


Thanks for checking the code for me :smiley:.

The content is in the "Update 2019-9-30" section here.
But it's written in Chinese, so you may need Google Translate.

Good post!
@ptrblck As the blog explains, an L2 regularization of the bias terms leads to underfitting, rather than overfitting. You can have a look at the plot it shows.


In the article @shirui-japina linked, the author refers to the Deep Learning book by Goodfellow et al.

And it says: "regularizing the bias leads to under-fitting", not overfitting. The book's reasoning is that each bias controls only a single variable, so leaving the biases unregularized does not induce much variance, while regularizing them can introduce significant underfitting.
