I am new to PyTorch and RNNs, and I don't know how to initialize the trainable parameters of nn.RNN, nn.LSTM and nn.GRU. I would appreciate it if someone could show an example or give some advice!

Thanks


```
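import torch.nn as nn

net = nn.LSTM(10, 20, 1)  # input_size=10, hidden_size=20, num_layers=1
nn.init.zeros_(net.weight_hh_l0)  # zero the layer-0 hidden-hidden weights in place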
```

This makes a one-layer LSTM with input_size = 10 and hidden_size = 20, and sets the hidden-hidden weights of the first layer (`weight_hh_l0`) to 0.
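The same idea extends to deeper LSTMs, because each layer k has its own `weight_hh_l{k}` attribute. A minimal sketch, assuming a 3-layer LSTM with the same sizes (the layer count is only for illustration):

```python
import torch.nn as nn

num_layers = 3  # illustrative choice
net = nn.LSTM(10, 20, num_layers)

# Zero the hidden-hidden weights of every layer, looked up by attribute name
for k in range(num_layers):
    nn.init.zeros_(getattr(net, f"weight_hh_l{k}"))
```

The input-hidden weights and the biases keep their default initialization.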

Thanks @SherlockLiao

There are four weight/bias tensors per LSTM layer, so do they all need to be initialized this way? Is there a common initialization distribution for LSTMs, such as a Gaussian or uniform distribution?

weight_ih_l[k]: the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size x input_size) for k = 0, and (4*hidden_size x hidden_size) for later layers of a unidirectional LSTM

weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size x hidden_size)

bias_ih_l[k]: the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)

bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
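To see these shapes concretely, and to initialize all four tensors of every layer in one pass, here is a minimal sketch. The sizes are arbitrary and the uniform/zero choices are only an example, not a recommended default. Because the gate blocks are stacked along the first dimension in the order i, f, g, o, a single gate can be addressed by slicing; setting the forget-gate bias to 1 is a heuristic some people use, shown here purely for illustration:

```python
import torch
import torch.nn as nn

input_size, hidden_size = 10, 20
lstm = nn.LSTM(input_size, hidden_size, num_layers=2)

for name, param in lstm.named_parameters():
    print(name, tuple(param.shape))
    if "weight" in name:
        nn.init.uniform_(param, -0.1, 0.1)
    elif "bias" in name:
        nn.init.zeros_(param)

# Rows [hidden_size, 2*hidden_size) of each bias belong to the forget gate.
# bias_ih and bias_hh are added together, so with bias_hh at 0 the total is 1.
with torch.no_grad():
    lstm.bias_ih_l0[hidden_size:2 * hidden_size].fill_(1.0)
```

For these sizes the loop prints (80, 10) for `weight_ih_l0`, (80, 20) for `weight_hh_l0` and `weight_ih_l1`, and (80,) for each bias.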


I only know how to do it one parameter at a time, like this; you could wrap it in a for loop:

```
import math
net.weight_ih_l0.data.normal_(0, math.sqrt(2. / net.weight_ih_l0.size(1)))  # n = fan-in
```
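The single-tensor version can be wrapped in a loop over all weight matrices. A minimal sketch, assuming `n` is meant to be the fan-in (the number of columns of each weight matrix); with that reading, std = sqrt(2 / n) is He (Kaiming) normal initialization:

```python
import math
import torch
import torch.nn as nn

net = nn.LSTM(10, 20, num_layers=2)

with torch.no_grad():
    for name, param in net.named_parameters():
        if "weight" in name:
            n = param.size(1)  # fan-in: number of input columns of this matrix
            param.normal_(0, math.sqrt(2.0 / n))
```

`nn.init.kaiming_normal_(param)` with its default arguments (fan-in mode, gain sqrt(2)) draws from the same distribution, so it could replace the body of the `if`.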

How about this?

```
import torch.nn.init as weight_init

params = {}  # store the parameters in this dict for convenience
for name, param in net.named_parameters():
    weight_init.normal_(param)
    params[name] = param
```
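Note that storing `param` in the dict keeps a reference to the live parameter, so the stored values change whenever the model is trained or re-initialized. If the goal is to keep a frozen copy of the initial values, a small sketch that clones them instead (the name `initial` is just for illustration):

```python
import torch.nn as nn
import torch.nn.init as weight_init

net = nn.LSTM(10, 20, 1)

initial = {}
for name, param in net.named_parameters():
    weight_init.normal_(param)
    initial[name] = param.detach().clone()  # independent copy, not a reference

# Later changes to the model no longer affect the snapshot
weight_init.zeros_(net.weight_hh_l0)
```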


How about the solutions below? Are they viable?

```
import torch.nn as nn
from torch.nn import init

gru = nn.GRU(10, 20)
init.xavier_normal_(gru)
# or
for p in gru.parameters():
    init.xavier_normal_(p)
```
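Neither proposal works as written: the first passes a module, but the `nn.init` functions expect a tensor, and the second also reaches the 1-D bias vectors, for which Xavier initialization is undefined because it needs both a fan-in and a fan-out. A minimal sketch of a working variant, assuming zero biases are acceptable (that choice is only an example):

```python
import torch.nn as nn
from torch.nn import init

gru = nn.GRU(10, 20, num_layers=2)

# Xavier init needs fan-in and fan-out, which a 1-D bias vector lacks
raised = False
try:
    init.xavier_normal_(gru.bias_ih_l0)
except ValueError as err:
    raised = True
    print("bias:", err)

# Working variant: Xavier for weight matrices, zeros for biases
for name, p in gru.named_parameters():
    if p.dim() >= 2:
        init.xavier_normal_(p)
    else:
        init.zeros_(p)
```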

Hi, I found a way as follows (but I'm not sure whether it is correct):

```
import torch.nn as nn
from torch.nn import init

a = nn.GRU(500, 50, num_layers=2)
for layer_p in a._all_weights:  # one list of parameter names per layer
    for p in layer_p:
        if 'weight' in p:
            init.normal_(getattr(a, p), 0.0, 0.02)
```

This snippet initializes the weights (but not the biases) of all layers.

Hope this helps.
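`_all_weights` is a private attribute, so it may change between PyTorch versions. A sketch of the same loop using the public `named_parameters()` method instead, with the same N(0, 0.02) choice:

```python
import torch.nn as nn
from torch.nn import init

a = nn.GRU(500, 50, num_layers=2)

# Same effect as looping over a._all_weights, but via the public API
for name, p in a.named_parameters():
    if "weight" in name:
        init.normal_(p, 0.0, 0.02)
```

Like the original snippet, this leaves the biases at their default values.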


To initialize the weights for nn.RNN, you can do the following:

In this example, I initialize the weights randomly.

```
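import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=6,
             num_layers=2, batch_first=True)
num_layers = 2
with torch.no_grad():
    for i in range(num_layers):
        # all_weights[i] holds the actual parameters of layer i, so fill them in
        # place; assigning a new tensor into the list would not change the model
        rnn.all_weights[i][0].normal_()  # input-hidden weights: (6, 5) or (6, 6)
        rnn.all_weights[i][1].normal_()  # hidden-hidden weights: (6, 6)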
```
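Before overwriting weights by hand, it helps to check the shapes PyTorch expects. For this two-layer RNN the input-hidden weight of layer 0 is (hidden_size, input_size) = (6, 5), and every other weight matrix is (6, 6). A small check, which also confirms that `all_weights` returns the module's own parameter objects (so in-place updates stick):

```python
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=6, num_layers=2, batch_first=True)

for i, layer in enumerate(rnn.all_weights):
    # Each entry is [weight_ih, weight_hh, bias_ih, bias_hh] for layer i
    print(i, [tuple(t.shape) for t in layer])

# The listed tensors are the module's parameters themselves
print(rnn.all_weights[0][0] is rnn.weight_ih_l0)
```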

I think this will work for initializing weights and biases.

```
import torch.nn as nn

# rnn and num_layers as defined above
for layer in range(num_layers):
    for weight in rnn._all_weights[layer]:  # parameter names for this layer
        if "weight" in weight:
            nn.init.xavier_uniform_(getattr(rnn, weight))
        if "bias" in weight:
            nn.init.uniform_(getattr(rnn, weight))
```

If you want a different initialization, just swap in another `nn.init` function. The key is that `getattr(rnn, weight)` looks up each of the parameters in question by name.
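The same loop can be packaged as a reusable helper that works for nn.RNN, nn.LSTM and nn.GRU alike, since all three expose their parameters through `named_parameters()`. A minimal sketch; the helper name `init_rnn_` is hypothetical, and it keeps the Xavier-uniform weights and U(0, 1) biases from the loop above:

```python
import torch.nn as nn


def init_rnn_(rnn: nn.Module) -> None:
    """Hypothetical helper: Xavier-uniform weights, U(0, 1) biases, in place."""
    for name, param in rnn.named_parameters():
        if "weight" in name:
            nn.init.xavier_uniform_(param)
        elif "bias" in name:
            nn.init.uniform_(param)  # defaults to U(0, 1), as in the loop above


models = [nn.RNN(5, 6, 2), nn.LSTM(5, 6, 2), nn.GRU(5, 6, 2)]
for m in models:
    init_rnn_(m)
```

Iterating over names avoids the private `_all_weights` attribute and the need to know `num_layers` in advance.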