I am new to PyTorch and RNNs, and I don't know how to initialize the trainable parameters of nn.RNN, nn.LSTM, and nn.GRU. I would appreciate it if someone could show an example or give some advice!
Thanks
import torch.nn as nn

net = nn.LSTM(10, 20, 1)
net.weight_hh_l0.data.fill_(0)
This creates a 1-layer LSTM with input_size = 10 and hidden_size = 20, and fills the hidden-hidden weights of the first layer with 0.
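In case it helps, you can list every trainable parameter of the layer (and its shape) with named_parameters(), so you can see exactly which tensors are there to initialize; just a quick sketch:

import torch.nn as nn

net = nn.LSTM(10, 20, 1)
for name, param in net.named_parameters():
    print(name, tuple(param.shape))
# weight_ih_l0 (80, 10)
# weight_hh_l0 (80, 20)
# bias_ih_l0 (80,)
# bias_hh_l0 (80,)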
Thanks @SherlockLiao
There are four weight/bias tensors per LSTM layer, so do they all need to be initialized this way? Is there a common initialization distribution for LSTMs, like a Gaussian or uniform distribution?
weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size x input_size)
weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size x hidden_size)
bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)
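If I recall the docs correctly, PyTorch initializes all of these by default from the uniform distribution U(-k, k) with k = 1/sqrt(hidden_size). If you prefer a Gaussian, here is a minimal sketch that loops over all four parameter groups of every layer (the sizes and the 0.02 standard deviation are just example values):

import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
for name, param in lstm.named_parameters():
    if 'weight' in name:
        nn.init.normal_(param, mean=0.0, std=0.02)  # example Gaussian init (std is arbitrary)
    elif 'bias' in name:
        nn.init.zeros_(param)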
I only know how to do it one parameter at a time, for example like this (here n would be the fan-in of that weight); you could write a for loop over the parameters to cover all of them:
import math
net.weight_hh_l0.data.normal_(0, math.sqrt(2. / n))  # He-style init for one parameter
How about this?
import torch.nn.init as weight_init

params = {}  # store the initialized parameters in this dict for convenience
for name, param in net.named_parameters():
    weight_init.normal_(param)
    params[name] = param
How about the solutions below? Are they viable?
init.xavier_normal(GRU)
# or
for p in layer.parameters():
    init.xavier_normal(p)
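Passing the whole module to init.xavier_normal_ won't work, since it expects a tensor rather than a module. Something along these lines should be viable, though Xavier init is only defined for tensors with at least 2 dimensions, so the 1-D biases need separate handling (just a sketch, sizes are arbitrary):

import torch.nn as nn

gru = nn.GRU(input_size=500, hidden_size=50, num_layers=2)
for p in gru.parameters():
    if p.dim() >= 2:
        nn.init.xavier_normal_(p)  # weight matrices
    else:
        nn.init.zeros_(p)          # biases: Xavier is undefined for 1-D tensors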
Hi, I found a way as follows (but I'm not sure whether it is correct or not):
from torch.nn import init

a = nn.GRU(500, 50, num_layers=2)
for layer_p in a._all_weights:
    for p in layer_p:
        if 'weight' in p:
            # print(p, getattr(a, p))
            init.normal_(getattr(a, p), 0.0, 0.02)
            # print(p, getattr(a, p))
This snippet initializes the weights of all layers.
Hope this helps.
To initialize the weights for nn.RNN, you can do the following.
In this example, I initialize the weights randomly.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=6, num_layers=2, batch_first=True)

num_layers = 2
for i in range(num_layers):
    in_dim = 5 if i == 0 else 6  # layers after the first take the previous layer's hidden state as input
    rnn.all_weights[i][0].data.copy_(torch.randn(6, in_dim))  # weight_ih_l{i}: (hidden_size, input_size)
    rnn.all_weights[i][1].data.copy_(torch.randn(6, 6))       # weight_hh_l{i}: (hidden_size, hidden_size)
I think this will work for initializing weights and biases.
for layer in range(num_layers):
    for weight in rnn._all_weights[layer]:
        if "weight" in weight:
            nn.init.xavier_uniform_(getattr(rnn, weight))
        if "bias" in weight:
            nn.init.uniform_(getattr(rnn, weight))
If you want to use a different initialization, just change it. The key is that getattr(rnn, weight) references each of the parameters in question by name.
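To put the pieces together, here is a minimal end-to-end sketch of the same idea using named_parameters(), which also works for nn.LSTM and nn.GRU (the sizes are just example values):

import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=6, num_layers=2, batch_first=True)
for name, param in rnn.named_parameters():
    if "weight" in name:
        nn.init.xavier_uniform_(param)
    elif "bias" in name:
        nn.init.uniform_(param)
# getattr(rnn, "weight_ih_l0") and rnn.weight_ih_l0 refer to the same Parameter
print(rnn.weight_ih_l0.shape)  # torch.Size([6, 5])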