Changing the weight decay on bias using named_parameters

iAvicenna · June 3, 2018, 9:09pm

I want to set the weight decay of the bias terms to zero and I am trying to do this using the named_parameter. The way I do it is through a function of the form

def setParams(network,state):
  
    params_dict = dict(network['model'].named_parameters())
    params=[]
  
    for key, value in params_dict.items():
        if key[-4:] == 'bias':
          
            params += [{'params':value,'weight_decay':0.0}]
  
    return params

and I call it inside the main as follows:

if __name__ == '__main__':

    #some other lines#
    #network list contains the optimizer,cost function and the model
    #state contains parameters required to define the network

    state['params']=setParams(network,state)

    network['optimizer'] =  torch.optim.SGD(state['params'],network['model'].parameters(), lr=state['learning rate'], momentum=state['momentum'],weight_decay=state['weight decay'],nesterov=True) #optimizer

But when I do this in pytorch 0.3 or 0.4 I get the error:

TypeError                                 Traceback (most recent call last)
<ipython-input-8-8ab30dfcadcd> in <module>()
    501     state['params']=setParams(network,state)
    502     network['cost criterion']=  torch.nn.CrossEntropyLoss() #cost function
--> 503     network['optimizer'] =  torch.optim.SGD(state['params'],network['model'].parameters(), lr=state['learning rate'], momentum=state['momentum'],weight_decay=state['weight decay'],nesterov=True) #optimizer
    504     #[{'params': resLayer4.bias, 'weight_decay': 0}, {'params': resLayer3.bias, 'weight_decay': 0}, {'params': resLayer2.bias, 'weight_decay': 0},   {'params': resLayer1.bias, 'weight_decay': 0}, {'params': batchNorm1.bias, 'weight_decay': 0},{'params': full1.bias, 'weight_decay': 0} ,{'params': batchNorm1.bias, 'weight_decay': 0} ]
    505 

TypeError: __init__() got multiple values for argument 'lr'

iAvicenna · June 3, 2018, 10:49pm

Okay I now see that the mistake was not removing “network[‘model’].parameters()” input in the optimizer construction. But what is actually contain in “network[‘model’].parameters()?” Does it not have the information about the weights and such. Should I also construct the state[‘params’] so that it contains all the information in network[‘model’].parameters()?

iAvicenna · June 3, 2018, 11:08pm

Okay so the new setParams functions looks like

def setParams(network,state):
  
  params_dict = dict(network['model'].named_parameters())
  params=[]
  weights=[]

  
  for key, value in params_dict.items():

      if key[-4:] == 'bias':
          
          params += [{'params':value,'weight_decay':0.0}]
          
      else: 
          
          params +=  [{'params': value,'weight_decay':state['weght decay']}]
  

  return params

And I guess that is all the required information that I need to feed in? Learning rate etc are already given to the optimizer constructer as a separate variable. Finally this seems to work but if anyone spots any mistakes in the way I change these parameters please let me know thanks

henrique · April 29, 2019, 4:07pm

this seems to work:

def add_weight_decay(net, l2_value, skip_list=()):
 decay, no_decay = [], []
 for name, param in net.named_parameters():
  if not param.requires_grad: continue # frozen weights		            
  if len(param.shape) == 1 or name.endswith(".bias") or name in skip_list: no_decay.append(param)
  else: decay.append(param)
 return [{'params': no_decay, 'weight_decay': 0.}, {'params': decay, 'weight_decay': l2_value}]

params = add_weight_decay(net, 2e-5)
sgd = torch.optim.SGD(params, lr=0.05)

from https://raberrytv.wordpress.com/2017/10/29/pytorch-weight-decay-made-easy/