Initializing parameters of a multi-layer LSTM

I have an nn.Module that contains an LSTM whose number of layers is passed in at initialization. I would like to do Xavier initialization of its weights and set the bias of the forget gate to 1, to promote learning of long-term dependencies. My problem is how to iterate over all the parameters in order to initialize them.

Doing something like

for name, param in lstm.named_parameters():
    if 'bias' in name:
        nn.init.constant(param, 0.0)
    elif 'weight' in name:
        nn.init.xavier_normal(param)

does not work, because param is a copy of the parameters in lstm and not a reference to them. This kind of loop can be used, for instance, to print the values of the parameters but not to modify them (as far as I know). Thank you in advance.


Did you try? Your snippet works perfectly well for me.

I was surprised at the other thread. The reason you cannot assign to the loop variable is that doing so just makes the name point to something else. You most certainly can modify the element you are looping over, and that is exactly what sometimes surprises people when they loop over a list of lists and append to the inner lists or somesuch.
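
For example, a quick sketch of the list-of-lists case:

rows = [[1], [2], [3]]

for row in rows:
    row = []              # rebinds the name `row` only; rows is unchanged

print(rows)               # [[1], [2], [3]]

for row in rows:
    row.append(0)         # mutates the inner lists through the same reference

print(rows)               # [[1, 0], [2, 0], [3, 0]]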

In fact, looping over model.parameters() way above is also how the optimizers get the parameters they are optimizing…
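
A minimal sketch of that (the sizes here are arbitrary):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# The optimizer is handed references to the very same Parameter tensors,
# so its in-place updates show up in the module.
optimizer = torch.optim.SGD(lstm.parameters(), lr=0.1)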

Best regards

Thomas


I have just tried it now in a toy example and in fact it works! :smiley: No idea why it was not working before… Some stupid mistake, for sure. Thank you again!
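
For later readers: the original question also mentioned setting the forget-gate bias to 1. A minimal sketch of one way to combine that with the Xavier loop (sizes are arbitrary; it assumes PyTorch's documented gate ordering of input, forget, cell, output within each bias vector of size 4 * hidden_size):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=3)

for name, param in lstm.named_parameters():
    if 'bias' in name:
        nn.init.constant_(param, 0.0)
        # Each bias vector has shape (4 * hidden_size,); the forget gate is the second quarter.
        hidden_size = param.shape[0] // 4
        with torch.no_grad():
            param[hidden_size:2 * hidden_size].fill_(1.0)
    elif 'weight' in name:
        nn.init.xavier_normal_(param)

Note that bias_ih and bias_hh are added inside the cell, so setting the forget slice of both to 1 gives an effective forget bias of 2; halve or skip one of them if that matters for your use case.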

It works in my experiments. Thanks!

Hi! I have a question. Don’t you need to do ‘xavier_uniform_’ instead of ‘xavier_uniform’? Aren’t functions with underscore the ones that modify the weights in-place?

Thank you @dpernes, your code works. Yes @fallcat, we need to use xavier_uniform_ for in-place initialization of the weights.

Seems like the functions without the underscore are deprecated aliases that just dispatch to the in-place (underscore) versions (https://pytorch.org/docs/stable/_modules/torch/nn/init.html). Hence dpernes' solution worked for everyone.


Works for me as well!

I want to load pretrained RNN weights from a dict (named model_pretrained). Inside the loop, param changes, but when I print the model parameters afterwards, I find the RNN parameters have not changed.

for name, param in model.rnn.named_parameters():
    print("0", param)
    param = torch.nn.Parameter(model_pretrained["rnn." + name])
    print("1", param)

Printing id(param) before and after the assignment gives different identities:

for name, param in at.rnn.named_parameters():
    print(id(param))
    param = torch.nn.Parameter(model_pretrained_itbl["rnn."+name].to("cuda"), requires_grad = False)
    print(id(param))

140360628406128
140360701517152
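
As explained earlier in the thread, assigning to param only rebinds the loop variable, so the module never sees the new tensor. A minimal sketch of one way to copy the pretrained values into the existing parameters in place (assuming model_pretrained maps keys like "rnn.<name>" to tensors, as in the snippet above):

import torch

with torch.no_grad():
    for name, param in model.rnn.named_parameters():
        # copy_ writes into the existing Parameter instead of creating a new one
        param.copy_(model_pretrained["rnn." + name])
        param.requires_grad_(False)  # optional: freeze the loaded weights

Alternatively, building a state dict with the "rnn." prefix stripped from the keys and calling model.rnn.load_state_dict(...) achieves the same thing.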