Hey, I am trying to figure out the calculations that take place in a GRU layer.
I obtained a pre-trained model and it has a GRU layer defined as GRU(96, 96, bias=True).
I checked the dimensions of the weights and bias:
weight_ih_l0 = [288, 96]
weight_hh_l0 = [288, 96]
bias_ih_l0 = [288]
bias_hh_l0 = [288]
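
As far as I can tell, 288 = 3 × 96 because the PyTorch docs describe weight_ih_l0 as the three gate matrices (W_ir|W_iz|W_in) stacked along the first dimension, and likewise weight_hh_l0 as (W_hr|W_hz|W_hn). A minimal sketch of splitting them back apart, using a zero placeholder tensor in place of the real pre-trained weights (the variable names are only for illustration):

```python
import torch

hidden_size = 96

# Placeholder standing in for the pre-trained weight_ih_l0 (assumption: zeros, not real weights)
weight_ih_l0 = torch.zeros(3 * hidden_size, 96)

# Split the stacked [288, 96] matrix into the reset, update and new-gate blocks
W_ir, W_iz, W_in = weight_ih_l0.chunk(3, dim=0)

print(W_ir.shape, W_iz.shape, W_in.shape)
```

Each block is [96, 96], so it can multiply a single 96-dimensional input vector. The biases ([288]) would split into three [96] pieces the same way.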
The input that is fed to the layer is of size [1000, 8, 96]
batch_first is False, which would mean:
Sequence Length = 1000
Batch size = 8
Input size = 96
I tried to follow the equations in the GRU page of the PyTorch 1.9.0 documentation.
In r(t) we multiply W with X, but my W is 2-dimensional and X is 3-dimensional, which makes them incompatible for matrix multiplication.
I know that there are multiple time steps involved, but how is the input X (which is 3D) split so that it is compatible for multiplication with the weight matrix?
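
For what it's worth, here is my current understanding as a sketch, not a definitive account of the internals: with batch_first=False, X is indexed along the first axis, so x_t = X[t] has shape [8, 96], and each time step does a 2D multiplication x_t @ W.T for the whole batch at once. The code below uses a randomly initialised GRU (not the pre-trained one) and a shortened sequence length of 10 instead of 1000, both assumptions made only to keep the check quick; it compares a manual loop against nn.GRU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

gru = nn.GRU(96, 96, bias=True)   # batch_first defaults to False
X = torch.randn(10, 8, 96)        # [seq_len, batch, input_size]; seq_len shortened for the demo
h = torch.zeros(8, 96)            # initial hidden state, [batch, hidden_size]

w_ih, w_hh = gru.weight_ih_l0, gru.weight_hh_l0   # each [288, 96]
b_ih, b_hh = gru.bias_ih_l0, gru.bias_hh_l0       # each [288]

outputs = []
with torch.no_grad():
    for x_t in X:                          # x_t is 2D: [8, 96]
        gi = x_t @ w_ih.T + b_ih           # [8, 288], input contribution to all three gates
        gh = h @ w_hh.T + b_hh             # [8, 288], hidden contribution to all three gates
        i_r, i_z, i_n = gi.chunk(3, dim=1) # each [8, 96]
        h_r, h_z, h_n = gh.chunk(3, dim=1)
        r = torch.sigmoid(i_r + h_r)       # reset gate
        z = torch.sigmoid(i_z + h_z)       # update gate
        n = torch.tanh(i_n + r * h_n)      # candidate ("new") state
        h = (1 - z) * n + z * h            # next hidden state
        outputs.append(h)

    manual_out = torch.stack(outputs)      # [10, 8, 96]
    ref_out, ref_h = gru(X)                # reference from the built-in layer

max_diff = (manual_out - ref_out).abs().max().item()
print(manual_out.shape, max_diff)
```

If this reading is right, the 3D input is never multiplied by W directly: the sequence axis is looped over, and the batch axis rides along as the row dimension of each 2D product.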