I think I understand, but I wanted to check how the num_layers parameter works in LSTM and GRU. Am I correct in thinking that it is effectively shorthand for stacking repeated layers? I.e., for a univariate time series and GRUs with a hidden state size of 10,
self.gru = nn.GRU(input_size=1, hidden_size=10, num_layers=2)
Is equivalent to:
self.gru1 = nn.GRU(input_size=1, hidden_size=10)
self.gru2 = nn.GRU(input_size=10, hidden_size=10)
I think that is what the documentation is saying.
For clarity: the hidden size of 10 was picked at random. I think LSTM works the same way, but I only wrote up GRU as the example.
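One thing that suggests this reading (just a shape check I tried myself, not proof that the computations match): in the two-layer GRU, the second layer's input weights expect the 10-dim hidden state rather than the 1-dim input, which fits the stacked interpretation.
import torch.nn as nn
gru = nn.GRU(input_size=1, hidden_size=10, num_layers=2)
print(gru.weight_ih_l0.shape)  # torch.Size([30, 1])  -> layer 0 consumes the 1-dim input
print(gru.weight_ih_l1.shape)  # torch.Size([30, 10]) -> layer 1 consumes layer 0's 10-dim output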
I’ll even take a yes or no answer so I know whether I should keep looking.
If anyone comes looking at this: someone just asked a similar question and got an answer confirming that the two are the same:
@timdnewman Run the following code in a Jupyter notebook to see what nn.GRU() does:
You’ll see that the prints have the same values.
import torch
import torch.nn as nn
gru = nn.GRU(input_size=1, hidden_size=1, num_layers=2, bias=False)
gru_0 = nn.GRU(input_size=1, hidden_size=1, num_layers=1, bias=False)
gru_1 = nn.GRU(input_size=1, hidden_size=1, num_layers=1, bias=False)
# Save the parameters of the two-layer GRU.
params = dict(gru.named_parameters())
# Give gru_0 layer 0's weights and gru_1 layer 1's weights.
gru_0.weight_ih_l0 = params['weight_ih_l0']
gru_0.weight_hh_l0 = params['weight_hh_l0']
gru_1.weight_ih_l0 = params['weight_ih_l1']
gru_1.weight_hh_l0 = params['weight_hh_l1']
input_tensor = torch.tensor([[3.]])  # shape (seq_len=1, input_size=1); add a batch dim on older PyTorch
_, hs = gru(input_tensor)            # hidden states of both layers of the stacked GRU
out0, h0 = gru_0(input_tensor)       # first single-layer GRU takes the raw input
out1, h1 = gru_1(out0)               # second single-layer GRU takes the first one's output
print(hs)
print(torch.cat([h0, h1]))
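If you want the same check for LSTM (the snippet above only covers GRU, so this is just the same idea adapted, not something from the thread), copy the layer weights across in the same way and compare both the hidden and the cell states. It assumes a PyTorch recent enough to accept unbatched (seq_len, features) inputs.
import torch
import torch.nn as nn
lstm = nn.LSTM(input_size=1, hidden_size=1, num_layers=2, bias=False)
lstm_0 = nn.LSTM(input_size=1, hidden_size=1, num_layers=1, bias=False)
lstm_1 = nn.LSTM(input_size=1, hidden_size=1, num_layers=1, bias=False)
# Give each single-layer LSTM the corresponding layer's weights from the stacked LSTM.
lstm_0.weight_ih_l0 = lstm.weight_ih_l0
lstm_0.weight_hh_l0 = lstm.weight_hh_l0
lstm_1.weight_ih_l0 = lstm.weight_ih_l1
lstm_1.weight_hh_l0 = lstm.weight_hh_l1
x = torch.tensor([[3.]])             # shape (seq_len=1, input_size=1)
out, (h, c) = lstm(x)                # stacked: h and c hold both layers' states
out0, (h0, c0) = lstm_0(x)
out1, (h1, c1) = lstm_1(out0)        # feed the first layer's output into the second
print(h, torch.cat([h0, h1]))        # hidden states should match
print(c, torch.cat([c0, c1]))        # cell states should match
print(out, out1)                     # the stacked output is the last layer's output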
Thanks! Although you’re about 3 hours too late.
I have marked you as the solution as that is definitive.
Do you know how to recommend an edit to the documentation? Something like this would clarify it a lot, or at least I think so.
@timdnewman Thanks, a GRU check was in order for me anyway. Regarding the docs, I have no idea, but if you find out, do let me know haha.
Here is the method, but it is a sufficiently long-winded process that I’ll probably just let people search for this.