How to initialize the weights/biases of RNN, LSTM, and GRU?

I am new to PyTorch and RNNs, and I don't know how to initialize the trainable parameters of nn.RNN, nn.LSTM, and nn.GRU. I would appreciate it if someone could show an example or give some advice!

Thanks

import torch.nn as nn
net = nn.LSTM(10, 20, 1)
net.weight_hh_l0.data.fill_(0)

This makes a 1-layer LSTM with input_size = 10 and hidden_size = 20, and sets the hidden-hidden weights of the first layer (weight_hh_l0) to 0.
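A minimal sketch of the same zero fill without going through `.data`: the in-place write happens inside `torch.no_grad()`, which is the usual way to modify a parameter without autograd recording it. It also checks that only `weight_hh_l0` changes, while the other tensors keep their default values.

```python
import torch
import torch.nn as nn

net = nn.LSTM(10, 20, 1)

# Same effect as net.weight_hh_l0.data.fill_(0), but done under no_grad()
# instead of through the .data escape hatch.
with torch.no_grad():
    net.weight_hh_l0.fill_(0)

# Only the hidden-hidden weights of layer 0 are zeroed.
print(net.weight_hh_l0.abs().sum().item())      # 0.0
print(net.weight_ih_l0.abs().sum().item() > 0)  # True
```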

Thanks @SherlockLiao
There are four weight/bias tensors per LSTM layer, so do all of them need to be initialized this way? Is there a common initialization distribution for LSTMs, such as a Gaussian or uniform distribution?

weight_ih_l[k]: the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0, otherwise (4*hidden_size, num_directions*hidden_size)
weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size)
bias_ih_l[k]: the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
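To make the shapes above concrete, here is a small sketch that prints the four tensors of a 1-layer LSTM. It then re-applies, explicitly, the scheme PyTorch's `nn.LSTM` uses by default (every weight and bias drawn from U(-1/sqrt(hidden_size), 1/sqrt(hidden_size))), so it partly answers the "common distribution" question. The `chunk` call shows how to get at a single gate's block; the variable names are only for illustration.

```python
import math
import torch
import torch.nn as nn

input_size, hidden_size = 10, 20
lstm = nn.LSTM(input_size, hidden_size, num_layers=1)

# The four gate blocks are stacked along dim 0.
for name, p in lstm.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (80, 10)
# weight_hh_l0 (80, 20)
# bias_ih_l0 (80,)
# bias_hh_l0 (80,)

# Re-apply the default scheme: U(-1/sqrt(hidden_size), 1/sqrt(hidden_size))
# for every weight and bias.
stdv = 1.0 / math.sqrt(hidden_size)
for p in lstm.parameters():
    nn.init.uniform_(p, -stdv, stdv)

# Split a stacked tensor into per-gate blocks (order: input, forget, cell, output).
W_ii, W_if, W_ig, W_io = lstm.weight_ih_l0.chunk(4, dim=0)
print(W_if.shape)  # torch.Size([20, 10])
```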


I only know how to do it one tensor at a time; you could wrap this in a for loop:

import math
net.weight_hh_l0.data.normal_(0, math.sqrt(2. / net.hidden_size))  # nn.LSTM has no .weight; n = fan-in (hidden_size for weight_hh)
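The for-loop version of that line might look like the sketch below. It assumes `n` means the fan-in of each weight matrix, i.e. its number of columns `p.size(1)`, and it leaves the 1-D biases at their defaults. With that reading, std = sqrt(2 / n) is the He/Kaiming normal scheme, which `nn.init.kaiming_normal_` also computes with its default arguments.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.LSTM(10, 20, 1)

# He-style normal init, std = sqrt(2 / n), on every weight matrix.
# Assumption: n is the fan-in, i.e. the number of input columns p.size(1).
with torch.no_grad():
    for name, p in net.named_parameters():
        if name.startswith("weight"):
            n = p.size(1)
            p.normal_(0, math.sqrt(2.0 / n))
        # biases (1-D) are left at their default init
```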

How about this?

import torch.nn.init as weight_init
params = {}  # store the parameters by name for convenience (avoid shadowing the built-in dict)
for name, param in net.named_parameters():
    weight_init.normal_(param)  # in-place; the old init.normal is deprecated
    params[name] = param
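Drawing every tensor from one Gaussian works, but the loop over `named_parameters()` also lets you pick a different init per tensor type. The sketch below shows one commonly used LSTM recipe (not the only one, and not PyTorch's default): Xavier for input-hidden weights, orthogonal for hidden-hidden weights, zero biases, and a forget-gate bias of 1. Because the gate order is (i, f, g, o), the forget gate is the second block of each bias.

```python
import torch
import torch.nn as nn

hidden_size = 20
lstm = nn.LSTM(10, hidden_size, num_layers=2)

for name, p in lstm.named_parameters():
    if name.startswith("weight_ih"):
        nn.init.xavier_uniform_(p)
    elif name.startswith("weight_hh"):
        nn.init.orthogonal_(p)
    elif name.startswith("bias"):
        nn.init.zeros_(p)
        if name.startswith("bias_ih"):
            # Forget gate = second block. bias_ih and bias_hh are summed,
            # so setting one of them to 1 is enough.
            with torch.no_grad():
                p[hidden_size:2 * hidden_size].fill_(1.0)
```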

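How about the solutions below? Are they viable?

import torch.nn as nn
from torch.nn import init
gru = nn.GRU(10, 20)
# init.xavier_normal_(gru) fails: init functions take a tensor, not a module
for name, p in gru.named_parameters():  # nn.Module has parameters()/named_parameters(), not params()
    if 'weight' in name:  # Xavier needs a tensor with >= 2 dims, so skip the 1-D biases
        init.xavier_normal_(p)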
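When the RNN sits inside a larger model, a common pattern is to write one init function and pass it to `nn.Module.apply`, which calls it on every submodule recursively. The `Model` class and `init_weights` function below are hypothetical, for illustration only; the key point is that the init functions receive individual parameter tensors, never the module itself.

```python
import torch.nn as nn

class Model(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(10, 20, num_layers=2, batch_first=True)
        self.fc = nn.Linear(20, 5)

def init_weights(m):  # hypothetical helper
    # Pass each parameter tensor to the init functions, not the module.
    if isinstance(m, (nn.RNN, nn.LSTM, nn.GRU)):
        for name, p in m.named_parameters():
            if "weight" in name:
                nn.init.xavier_normal_(p)  # needs a tensor with >= 2 dims
            else:
                nn.init.zeros_(p)          # 1-D biases
    elif isinstance(m, nn.Linear):
        nn.init.xavier_normal_(m.weight)
        nn.init.zeros_(m.bias)

model = Model()
model.apply(init_weights)  # visits every submodule recursively
```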

Hi, I found the following approach (but I'm not sure whether it is correct):

import torch.nn as nn
from torch.nn import init

a = nn.GRU(500, 50, num_layers=2)

# _all_weights is a private attribute: one list of parameter names per layer (and direction)
for layer_p in a._all_weights:
    for p in layer_p:
        if 'weight' in p:
            init.normal_(getattr(a, p), 0.0, 0.02)  # in-place; the old init.normal is deprecated

This snippet initializes the weights (but not the biases) of all layers.
Hope this helps!
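Because `_all_weights` is a private attribute, it may be safer to get the same names from the public `named_parameters()`. The sketch below checks that both give the same set of names for this GRU, then writes the same weight-only normal init without touching internals.

```python
import torch.nn as nn
from torch.nn import init

a = nn.GRU(500, 50, num_layers=2)

# Names from the private attribute vs. the public API.
private_names = [name for layer in a._all_weights for name in layer]
public_names = [name for name, _ in a.named_parameters()]
print(sorted(private_names) == sorted(public_names))  # True

# Same weight-only init, using the public API.
for name, p in a.named_parameters():
    if "weight" in name:
        init.normal_(p, 0.0, 0.02)
```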


To initialize the weights of an nn.RNN, you can do the following.
In this example, I initialize the weights randomly.

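import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=6, num_layers=2, batch_first=True)
num_layers = 2
# rnn.all_weights builds a new list on every access, so assigning into it leaves rnn unchanged;
# fill the existing parameters in place instead (shapes: (6, 5) for layer 0 input-hidden, else (6, 6))
with torch.no_grad():
    for i in range(num_layers):
        rnn.all_weights[i][0].normal_()  # input-hidden weights, standard normal like torch.randn
        rnn.all_weights[i][1].normal_()  # hidden-hidden weights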

I think this will work for initializing weights and biases.

for names in rnn._all_weights:  # one list of parameter names per layer (and per direction)
    for weight in names:
        if "weight" in weight:
            nn.init.xavier_uniform_(getattr(rnn, weight))
        elif "bias" in weight:
            nn.init.uniform_(getattr(rnn, weight))  # U(0, 1) by default

If you want a different initialization, just swap in another nn.init function. The key is that getattr(rnn, weight) returns the actual parameter tensor for each name, so the in-place init call modifies the module itself.
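To make "just swap in another init" concrete, the loop can be wrapped in a small function that takes the init functions as arguments. `init_rnn` is a hypothetical helper, not a PyTorch API. Because it iterates over every entry of `_all_weights`, it also covers the `_reverse` parameters of a bidirectional RNN, which a loop over `range(num_layers)` would miss.

```python
import torch.nn as nn

def init_rnn(rnn, weight_init=nn.init.xavier_uniform_, bias_init=nn.init.zeros_):
    """Hypothetical helper: apply weight_init to every weight, bias_init to every bias."""
    # _all_weights has one list of names per layer *and* direction.
    for names in rnn._all_weights:
        for name in names:
            param = getattr(rnn, name)
            if "weight" in name:
                weight_init(param)
            elif "bias" in name:
                bias_init(param)

lstm = nn.LSTM(5, 6, num_layers=2, bidirectional=True)
init_rnn(lstm)                                   # Xavier-uniform weights, zero biases
init_rnn(lstm, weight_init=nn.init.orthogonal_)  # swap in a different weight scheme
```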