# How to initialize weights/biases of RNN, LSTM, GRU?

I am new to PyTorch and RNNs, and I don't know how to initialize the trainable parameters of nn.RNN, nn.LSTM, and nn.GRU. I would appreciate it if someone could show an example or offer some advice!

Thanks

```
net = nn.LSTM(10, 20, 1)
net.weight_hh_l0.data.fill_(0)
```

This creates a one-layer LSTM with input_size=10 and hidden_size=20, and fills the hidden-to-hidden weights of the first layer with zeros.
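The same idea extends to every parameter of the layer by iterating over `named_parameters()` — a minimal sketch, using an in-place fill under `no_grad` rather than `.data`:

```
import torch
import torch.nn as nn

net = nn.LSTM(10, 20, 1)  # input_size=10, hidden_size=20, 1 layer

with torch.no_grad():
    for name, param in net.named_parameters():
        param.fill_(0)  # zero every weight and bias tensor

# all parameter tensors (weights and biases) are now zero
print(all((p == 0).all().item() for p in net.parameters()))  # True
```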

Thanks @SherlockLiao
There are four weight/bias tensors per LSTM layer — do they all need to be initialized this way? Is there a common initialization distribution for LSTMs, like a Gaussian or uniform distribution?

`weight_ih_l[k]` – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size)
`weight_hh_l[k]` – the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size)
`bias_ih_l[k]` – the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
`bias_hh_l[k]` – the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
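Concretely, for an `nn.LSTM(10, 20, 1)` the four parameter tensors of layer 0 have the following shapes (the factor of 4 comes from the stacked input/forget/cell/output gates):

```
import torch.nn as nn

lstm = nn.LSTM(10, 20, 1)  # input_size=10, hidden_size=20

print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10])  -> (4*hidden_size, input_size)
print(lstm.weight_hh_l0.shape)  # torch.Size([80, 20])  -> (4*hidden_size, hidden_size)
print(lstm.bias_ih_l0.shape)    # torch.Size([80])      -> (4*hidden_size,)
print(lstm.bias_hh_l0.shape)    # torch.Size([80])
```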


I only know how to do it one parameter at a time; maybe you can write a for loop, like this:

```
# n: fan-in of the weight matrix (not defined in the original post)
net.weight.data.normal_(0, math.sqrt(2. / n))
```
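As posted, `net.weight` and `n` are undefined. A runnable version of the same He/Kaiming-style fill for an LSTM might look like this — treating `n` as the fan-in of each weight matrix is an assumption, since the snippet does not define it:

```
import math
import torch
import torch.nn as nn

net = nn.LSTM(10, 20, 1)

with torch.no_grad():
    for name, param in net.named_parameters():
        if 'weight' in name:
            n = param.size(1)  # fan-in: columns of the 2-D weight matrix
            param.normal_(0, math.sqrt(2. / n))
        else:
            param.zero_()      # set biases to zero
```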

```
import torch.nn.init as weight_init

params = {}  # we can store the parameters in this dict for convenience
for name, param in net.named_parameters():
    weight_init.normal_(param)
    params[name] = param
```

How about the solutions below — are they viable?

```
init.xavier_normal(GRU)
# or
for p in layer.params():
    init.xavier_normal(p)
```
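As written, neither line runs: `init.xavier_normal(...)` expects a tensor, not a module, and `layer.params()` should be `layer.parameters()`. Xavier init also requires at least 2-D tensors, so the 1-D biases must be handled separately. A working per-parameter sketch, using the current underscore-suffixed init functions:

```
import torch.nn as nn
from torch.nn import init

gru = nn.GRU(input_size=8, hidden_size=16, num_layers=2)

for name, p in gru.named_parameters():
    if p.dim() >= 2:       # weight matrices: Xavier needs >= 2 dims
        init.xavier_normal_(p)
    else:                  # 1-D biases: Xavier is undefined, zero them instead
        init.zeros_(p)
```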

Hi, I found a way as follows (but I'm not sure whether it is correct):

```
a = nn.GRU(500, 50, num_layers=2)

from torch.nn import init
for layer_p in a._all_weights:
    for p in layer_p:
        if 'weight' in p:
            # print(p, a.__getattr__(p))
            init.normal_(a.__getattr__(p), 0.0, 0.02)
            # print(p, a.__getattr__(p))
```

This snippet initializes the weights of all layers.
Hope this helps you.
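The same result can be had without touching the private `_all_weights` attribute, by iterating over `named_parameters()` instead — a sketch:

```
import torch.nn as nn
from torch.nn import init

a = nn.GRU(500, 50, num_layers=2)

for name, param in a.named_parameters():
    if 'weight' in name:
        init.normal_(param, 0.0, 0.02)  # same N(0, 0.02) fill, public API only
```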

To initialize the weights of nn.RNN, you can do the following. In this example, I initialize the weights randomly.

```
rnn = nn.RNN(input_size=5, hidden_size=6,
             num_layers=2, batch_first=True)
num_layers = 2
for i in range(num_layers):
    # all_weights[i] holds [weight_ih, weight_hh, bias_ih, bias_hh] for layer i
    rnn.all_weights[i][0].data = torch.randn(rnn.all_weights[i][0].shape)  # weights connecting input-hidden
    rnn.all_weights[i][1].data = torch.randn(rnn.all_weights[i][1].shape)  # weights connecting hidden-hidden
```
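To check that such a randomly re-initialized RNN still works, a quick forward pass with a dummy batch (with `batch_first=True`, input is shaped `(batch, seq, feature)`) — a self-contained sketch:

```
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=6, num_layers=2, batch_first=True)

with torch.no_grad():
    for name, param in rnn.named_parameters():
        if 'weight' in name:
            param.copy_(torch.randn(param.shape))  # random Gaussian weights, correct shapes

x = torch.randn(3, 7, 5)  # (batch=3, seq_len=7, input_size=5)
out, h = rnn(x)
print(out.shape)  # torch.Size([3, 7, 6])  -> (batch, seq_len, hidden_size)
print(h.shape)    # torch.Size([2, 3, 6])  -> (num_layers, batch, hidden_size)
```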