Calculation of gate outputs in GRU layer

Vishvas · August 21, 2021, 6:57am

Hey, I am trying to figure out the calculations that take place in a GRU layer.

I obtained a pre-trained model, and it has a GRU layer defined as GRU(96, 96, bias=True).

I checked the dimensions of the weights and bias:
weight_ih_l0 = [288, 96]
weight_hh_l0 = [288, 96]
bias_ih_l0 = [288]
bias_hh_l0 = [288]
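
For reference, the 288 in these shapes is 3 * hidden_size: PyTorch stacks the parameters of the reset (r), update (z), and new (n) gates along the first dimension, in that order. A minimal sketch of splitting them apart, using a freshly initialised layer as a stand-in for the pre-trained one (the variable names W_ir, b_ir, etc. follow the documentation's notation and are chosen here for illustration):

```python
import torch

# Stand-in for the pre-trained layer; only the shapes matter here.
gru = torch.nn.GRU(96, 96, bias=True)

# Each stacked parameter splits into three equal blocks along dim 0,
# in the order reset (r), update (z), new (n).
W_ir, W_iz, W_in = gru.weight_ih_l0.chunk(3, dim=0)
W_hr, W_hz, W_hn = gru.weight_hh_l0.chunk(3, dim=0)
b_ir, b_iz, b_in = gru.bias_ih_l0.chunk(3, dim=0)
b_hr, b_hz, b_hn = gru.bias_hh_l0.chunk(3, dim=0)

print(tuple(gru.weight_ih_l0.shape))         # (288, 96)
print(tuple(W_ir.shape), tuple(b_ir.shape))  # (96, 96) (96,)
```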

The input that is fed to the layer is of size [1000, 8, 96]
The batch_first argument is False, which would mean:
Sequence Length = 1000
Batch size = 8
Input size = 96

I tried to follow the equations in GRU — PyTorch 1.9.0 documentation
In r(t) we multiply W by X, but my W is 2-dimensional and X is 3-dimensional, which seems to make them incompatible for matrix multiplication.

I know that there are multiple time steps involved, but how is the input X (which is 3D) split so that it is compatible for multiplication with the weight matrix?

googlebot · August 21, 2021, 12:38pm

In optimized implementations the input is not sliced. Instead, a batched matrix multiplication is done: (1000, 8, 96) @ (96, hs*3) = (1000, 8, hs*3), where the (96, hs*3) operand is the transpose of weight_ih_l0. This gives the Wx summands for all three gates and all timesteps at once (hs = hidden size, which can differ from the input size of 96). This is a “precalculation”, because the Wh summands must be computed step by step.
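
A minimal sketch of that idea in plain PyTorch, assuming a single-layer, unidirectional GRU with a zero initial hidden state and randomly initialised weights in place of the pre-trained ones. The names gi_all, gh, and manual_out are made up for this example, and the loop is meant to illustrate the arithmetic, not to reproduce how the optimized kernels are written:

```python
import torch

torch.manual_seed(0)
seq_len, batch, input_size, hs = 1000, 8, 96, 96
gru = torch.nn.GRU(input_size, hs, bias=True)  # batch_first=False by default
x = torch.randn(seq_len, batch, input_size)

with torch.no_grad():
    # Precalculation: W x + b_i for all three gates and all time steps at once.
    # (1000, 8, 96) @ (96, 3*hs) -> (1000, 8, 3*hs)
    gi_all = x @ gru.weight_ih_l0.T + gru.bias_ih_l0

    h = torch.zeros(batch, hs)  # zero initial hidden state (the default)
    outputs = []
    for t in range(seq_len):
        # W h + b_h depends on the previous step, so it stays inside the loop.
        gh = h @ gru.weight_hh_l0.T + gru.bias_hh_l0  # (8, 3*hs)
        i_r, i_z, i_n = gi_all[t].chunk(3, dim=1)
        h_r, h_z, h_n = gh.chunk(3, dim=1)
        r = torch.sigmoid(i_r + h_r)   # reset gate
        z = torch.sigmoid(i_z + h_z)   # update gate
        n = torch.tanh(i_n + r * h_n)  # candidate ("new") state
        h = (1 - z) * n + z * h
        outputs.append(h)
    manual_out = torch.stack(outputs)  # (1000, 8, hs)

    ref_out, ref_h = gru(x)

print(tuple(manual_out.shape))
print(torch.allclose(manual_out, ref_out, atol=1e-5))
```

If the two results agree up to floating-point tolerance, the last line prints True. Note that x[t] inside the loop would be the (8, 96) slice asked about in the question; the precalculation simply does that multiplication for every t in one call.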

Vishvas · August 23, 2021, 4:17am

@googlebot Thank you. I understand the calculations now.