GRU hidden size vs input size

Hi All,

When we use a GRU with multiple layers, the hidden state of the previous (lower) layer is used as the input to the upper layers of the GRU (as per this doc).

Suppose my input size is 8 and my hidden size is 64. How can the hidden state be used as the input to the next layer? With this setting, the model still learns. Does that mean PyTorch truncates the hidden state to match the input size (or uses some other mechanism to change the dimension)? I am not clear about this scenario. Looking forward to a good explanation.


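Only the first layer takes inputs of your original data's size, e.g., 8. The upper layers use the hidden size (64 in your case) as their input size, so nothing is truncated: each layer has its own input-to-hidden weights, shaped for whatever that layer actually receives.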
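You can check this yourself by looking at the parameter shapes of `nn.GRU`, and by rebuilding the same 2-layer GRU from two single-layer GRUs. The snippet below is a minimal sketch, assuming a recent PyTorch running on CPU with the default `dropout=0`. The names `layer0` and `layer1` are just illustrative. The first dimension of each input-to-hidden weight is `3 * hidden_size` because the reset, update and new gates are stacked together.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size, num_layers = 8, 64, 2
gru = nn.GRU(input_size, hidden_size, num_layers=num_layers, batch_first=True)

# Input-to-hidden weights: layer 0 maps 8 -> 3*64, while layer 1 maps 64 -> 3*64,
# because layer 1's input is the 64-dim hidden state produced by layer 0.
for name, param in gru.named_parameters():
    if name.startswith("weight_ih"):
        print(name, tuple(param.shape))
# weight_ih_l0 (192, 8)
# weight_ih_l1 (192, 64)

# The same computation written as two explicit single-layer GRUs (illustrative names).
layer0 = nn.GRU(input_size, hidden_size, batch_first=True)
layer1 = nn.GRU(hidden_size, hidden_size, batch_first=True)
with torch.no_grad():
    for prefix in ("weight_ih", "weight_hh", "bias_ih", "bias_hh"):
        getattr(layer0, f"{prefix}_l0").copy_(getattr(gru, f"{prefix}_l0"))
        getattr(layer1, f"{prefix}_l0").copy_(getattr(gru, f"{prefix}_l1"))

x = torch.randn(4, 10, input_size)  # (batch, seq_len, input_size)
out_stacked, h_stacked = gru(x)

out0, h0 = layer0(x)     # out0: (4, 10, 64), layer 0's hidden state at every step
out1, h1 = layer1(out0)  # layer 1 consumes those 64-dim hidden states as its input

# Both should match the built-in 2-layer GRU up to floating-point rounding.
print(torch.allclose(out_stacked, out1, atol=1e-5))
print(torch.allclose(h_stacked, torch.cat([h0, h1], dim=0), atol=1e-5))
```

Since `out0` already has 64 features, no truncation or projection is needed: layer 1's weights are simply sized for 64-dim inputs.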