Is it possible to have a GRU with different input_size and hidden_size?
According to the documentation, x_t is the input for the first layer or the hidden state of the previous layer.
Unfortunately I didn’t read the documents carefully.
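Yes, it is possible. The input enters the network at the first layer, which maps input_size features to a hidden state of size hidden_size. Each subsequent layer then uses the output of the layer below it as its input, so only the first layer ever sees input_size.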
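To make this concrete, here is a minimal sketch assuming PyTorch's `torch.nn.GRU` in its default unidirectional, sequence-first configuration. The sizes (10, 20, two layers) are arbitrary values chosen for illustration and are not from the question.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not from the original question).
input_size, hidden_size, num_layers = 10, 20, 2
seq_len, batch = 5, 3

# input_size and hidden_size are independent constructor arguments.
gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)

# Default layout is (seq_len, batch, input_size) because batch_first=False.
x = torch.randn(seq_len, batch, input_size)
output, h_n = gru(x)

print(output.shape)  # per-step output of the last layer: (seq_len, batch, hidden_size)
print(h_n.shape)     # final hidden state of every layer: (num_layers, batch, hidden_size)

# The input-to-hidden weights show what each layer consumes.
# The leading 3 * hidden_size stacks the reset, update, and new gates.
print(gru.weight_ih_l0.shape)  # first layer reads input_size features
print(gru.weight_ih_l1.shape)  # second layer reads hidden_size features
```

Only `weight_ih_l0` depends on input_size. The second layer's input weights are sized by hidden_size, which matches the documentation's description of x_t as the previous layer's hidden state for layers after the first.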