Initializing parameters of a multi-layer LSTM

I have an nn.Module that contains an LSTM whose number of layers is passed in at initialization. I would like to do Xavier initialization of its weights and set the bias of the forget gate to 1, to promote learning of long-term dependencies. My problem is how to iterate over all the parameters in order to initialize them.

Doing something like

for name, param in lstm.named_parameters():
    if 'bias' in name:
        nn.init.constant(param, 0.0)
    elif 'weight' in name:
        nn.init.xavier_normal(param)

does not work, because param is a copy of the parameters in lstm and not a reference to them. This kind of loop can be used, for instance, to print the values of the parameters but not to modify them (as far as I know). Thank you in advance.


Did you try? Your snippet works perfectly well for me.

I was surprised at the other thread. The reason you cannot assign to the loop variable is that doing so just makes the name point to something else. You most certainly can modify the element you are looping over, and that is exactly what sometimes surprises people when they loop over a list of lists and append to the inner lists or somesuch.
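
For example, a quick sketch of the list-of-lists case:

rows = [[1], [2], [3]]

for row in rows:
    row = []              # rebinds the name `row` only; rows is unchanged

print(rows)               # [[1], [2], [3]]

for row in rows:
    row.append(0)         # mutates the inner lists through the same reference

print(rows)               # [[1, 0], [2, 0], [3, 0]]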

In fact, looping over model.parameters() way above is also how the optimizers get the parameters they are optimizing…
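
A minimal sketch of that (the sizes here are arbitrary):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# The optimizer is handed references to the very same Parameter tensors,
# so its in-place updates show up in the module.
optimizer = torch.optim.SGD(lstm.parameters(), lr=0.1)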

Best regards

Thomas


I have just tried it now in a toy example and in fact it works! :smiley: No idea why it was not working before… Some stupid mistake, for sure. Thank you again!
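
For later readers: the original question also mentioned setting the forget-gate bias to 1. A minimal sketch of one way to combine that with the Xavier loop (sizes are arbitrary; it assumes PyTorch's documented gate ordering of input, forget, cell, output within each bias vector of size 4 * hidden_size):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=3)

for name, param in lstm.named_parameters():
    if 'bias' in name:
        nn.init.constant_(param, 0.0)
        # Each bias vector has shape (4 * hidden_size,); the forget gate is the second quarter.
        hidden_size = param.shape[0] // 4
        with torch.no_grad():
            param[hidden_size:2 * hidden_size].fill_(1.0)
    elif 'weight' in name:
        nn.init.xavier_normal_(param)

Note that bias_ih and bias_hh are added inside the cell, so setting the forget slice of both to 1 gives an effective forget bias of 2; halve or skip one of them if that matters for your use case.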

It works in my experiments. Thanks!

Hi! I have a question. Don’t you need to do ‘xavier_uniform_’ instead of ‘xavier_uniform’? Aren’t functions with underscore the ones that modify the weights in-place?

Thank you @dpernes, your code works. Yes @fallcat, we need to use xavier_uniform_ for in-place initialization of the weights.

Seems like the functions without the underscore are deprecated aliases that just dispatch to the in-place (underscore) versions (https://pytorch.org/docs/stable/_modules/torch/nn/init.html). Hence dpernes' solution worked for everyone.


Works for me as well!

I want to load pretrained RNN weights from a dict (named model_pretrained). Inside the loop, param changes, but when I print the model parameters afterwards, I find the RNN parameters have not changed.

for name, param in model.rnn.named_parameters():
    print("0", param)
    param = torch.nn.Parameter(model_pretrained["rnn." + name])
    print("1", param)

Printing id(param) before and after the assignment gives different identities:

for name, param in at.rnn.named_parameters():
    print(id(param))
    param = torch.nn.Parameter(model_pretrained_itbl["rnn."+name].to("cuda"), requires_grad = False)
    print(id(param))

140360628406128
140360701517152
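
As explained earlier in the thread, assigning to param only rebinds the loop variable, so the module never sees the new tensor. A minimal sketch of one way to copy the pretrained values into the existing parameters in place (assuming model_pretrained maps keys like "rnn.<name>" to tensors, as in the snippet above):

import torch

with torch.no_grad():
    for name, param in model.rnn.named_parameters():
        # copy_ writes into the existing Parameter instead of creating a new one
        param.copy_(model_pretrained["rnn." + name])
        param.requires_grad_(False)  # optional: freeze the loaded weights

Alternatively, building a state dict with the "rnn." prefix stripped from the keys and calling model.rnn.load_state_dict(...) achieves the same thing.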