L2 regularization with only weight parameters

shirui-japina · September 26, 2019, 4:19am

I asked this here but got no answer. Perhaps that's because the topic is several years old, and my question differs a little from the original one. Sorry for asking again here.

My question is:
I have read that L2 regularization should be applied only to the weight parameters, not to the bias parameters. (If L2 regularization is applied to all parameters, the model overfits very easily; is that right?)
But the L2 regularization built into most PyTorch optimizers (the weight_decay argument) applies to all of the model's parameters, both weights and biases.
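[Image: least_squares_l2 (L2-regularized least-squares objective; the regularized parameters are marked with a red box)]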
I mean that the parameters in the red box should be the weight parameters only (if what I heard is right).
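The image itself is not reproduced here. For reference, a standard L2-regularized least-squares objective that penalizes only the weights looks like the following (the notation is assumed, not taken from the image: weights $\mathbf{w}$, bias $b$, $m$ training examples, regularization strength $\lambda$, learning rate $\eta$). Gradient descent on the penalty term gives the "weight decay" update, which touches $\mathbf{w}$ but not $b$:

```latex
J(\mathbf{w}, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)} - b\right)^2 + \frac{\lambda}{2}\lVert \mathbf{w} \rVert_2^2

\mathbf{w} \leftarrow \mathbf{w} - \eta\left(\nabla_{\mathbf{w}} L + \lambda \mathbf{w}\right), \qquad
b \leftarrow b - \eta\, \nabla_{b} L
```

Here $L$ is the unregularized data term (the first sum). Setting `weight_decay` on a PyTorch optimizer adds the $\lambda p$ term for every parameter $p$ in that parameter group, which is why biases get decayed too unless they are put in a separate group.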

My way of handling this is the following code:

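import torch.optim as optim

# Split parameters by name: anything containing 'bias' goes into bias_p,
# everything else (the weights) goes into weight_p
weight_p, bias_p = [], []
for name, p in model.named_parameters():
  if 'bias' in name:
    bias_p += [p]
  else:
    weight_p += [p]

# Per-parameter-group options: L2 regularization (weight_decay) on weights only
optimizer = optim.SGD(
  [
    {'params': weight_p, 'weight_decay': 1e-5},
    {'params': bias_p, 'weight_decay': 0}
  ],
  lr=1e-2, momentum=0.9
)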

Is this right?
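One way to check that such a split behaves as intended is to take a single optimizer step with an all-zero loss gradient, so that any change in the parameters comes from weight decay alone. The sketch below assumes a recent PyTorch. It uses a hypothetical helper, `split_decay_groups`, and swaps in a different grouping rule from the name check above: parameters with `ndim <= 1` get no decay. That heuristic is common in practice and also excludes BatchNorm/LayerNorm scale and shift parameters, which the `'bias' in name` check would still decay.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def split_decay_groups(model, weight_decay):
    # Hypothetical helper: 1-D parameters (biases, norm-layer scales/shifts)
    # get no decay; multi-dimensional weight tensors get `weight_decay`.
    decay, no_decay = [], []
    for p in model.parameters():
        if not p.requires_grad:
            continue
        (no_decay if p.ndim <= 1 else decay).append(p)
    return [
        {'params': decay, 'weight_decay': weight_decay},
        {'params': no_decay, 'weight_decay': 0.0},
    ]

torch.manual_seed(0)
model = nn.Linear(3, 2)
optimizer = optim.SGD(split_decay_groups(model, weight_decay=0.1), lr=0.5, momentum=0.9)

w0 = model.weight.detach().clone()
b0 = model.bias.detach().clone()

# Multiply the output by 0 so every gradient is exactly zero;
# any parameter change then comes from weight decay alone.
optimizer.zero_grad()
(model(torch.randn(4, 3)) * 0).sum().backward()
optimizer.step()

# SGD adds weight_decay * p to the gradient, so the weight shrinks by (1 - lr * wd)
print(torch.allclose(model.weight, w0 * (1 - 0.5 * 0.1)))
# The bias group has weight_decay = 0, so the bias is unchanged
print(torch.equal(model.bias, b0))
```

Both lines print `True`: the weight is scaled by 0.95 while the bias stays exactly the same. With the original name-based split, the same check works by passing the two lists as parameter groups instead.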

ptrblck · September 30, 2019, 7:53am

The code looks fine.
I'm not sure whether L2 regularization of the bias terms leads to overfitting.
Do you have any references on it? It sounds quite interesting.

shirui-japina · September 30, 2019, 1:22pm

Thanks for checking the code for me.

See the "Update 2019-9-30" section here.
It's written in Chinese, though, so you may need Google Translate.

fly2mars · November 29, 2019, 7:45am

Good post!
@ptrblck as the blog says, L2 regularization of the bias terms leads to underfitting rather than overfitting. You can have a look at the plot it shows.

zeakey · November 29, 2019, 9:09am

In the article @shirui-japina linked, the author refers to the Deep Learning book by Goodfellow et al.

It says that "regularizing the bias leads to under-fitting", not overfitting.
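A toy example (my own illustration, not taken from the book or the blog) shows why penalizing the bias tends to underfit. Fit y ≈ w·x + b by closed-form ridge regression on centered inputs (sum of x equal to 0, an assumption that makes the normal equations decouple). The bias then becomes sum(y) / (n + lam_b), so any penalty on b pulls the intercept toward 0 even when the data sit far from the origin. The function names `fit_ridge_1d` and `mse` are hypothetical.

```python
# Closed-form ridge fit of y ≈ w*x + b, assuming centered x (sum(x) == 0).
# With centered x, the normal equations decouple:
#   w = sum(x*y) / (sum(x*x) + lam_w)
#   b = sum(y)   / (n + lam_b)
def fit_ridge_1d(xs, ys, lam_w, lam_b):
    n = len(xs)
    assert abs(sum(xs)) < 1e-12, "this closed form assumes centered x"
    w = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam_w)
    b = sum(ys) / (n + lam_b)
    return w, b

def mse(xs, ys, w, b):
    # Mean squared training error of the fitted line
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 * x + 50.0 for x in xs]  # noiseless data: true w = 2, true b = 50

lam = 1.0
w_only = fit_ridge_1d(xs, ys, lam_w=lam, lam_b=0.0)   # penalize the weight only
w_and_b = fit_ridge_1d(xs, ys, lam_w=lam, lam_b=lam)  # penalize weight and bias

print(w_only, mse(xs, ys, *w_only))
print(w_and_b, mse(xs, ys, *w_and_b))
```

With the weight-only penalty the intercept stays at 50 and the training MSE is about 0.07 (8/121). Penalizing the bias with the same strength drags the intercept down to about 41.7 and the training MSE jumps to about 69.5: the model no longer fits even the training data, which is underfitting rather than overfitting.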