GRU Weights Lacking?

ajrheng · August 13, 2021, 2:16am

A GRU is defined in Torch as

[Screenshot: the GRU equations from the torch.nn.GRU documentation]

As I understand, the trainable weights are W_ir, W_hr, W_iz, W_hz, W_in and W_hn along with their biases. However, if I define a GRU and print its state_dict():

import torch.nn as nn

rnn = nn.GRU(1, 1, 1)  # input_size=1, hidden_size=1, num_layers=1
print(rnn.state_dict())

I get

OrderedDict([('weight_ih_l0', tensor([[0.8447],
        [0.0636],
        [0.3000]])), ('weight_hh_l0', tensor([[-0.2396],
        [ 0.2593],
        [-0.7984]])), ('bias_ih_l0', tensor([-0.8617, -0.9971, -0.0588])), ('bias_hh_l0', tensor([0.9238, 0.1282, 0.9144]))])

According to the documentation,

GRU.weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ir|W_iz|W_in), of shape (3*hidden_size, input_size) for k = 0. Otherwise, the shape is (3*hidden_size, num_directions * hidden_size)

weight_ih_l0 seems to encompass W_ir, W_iz and W_in, but there’s only one tensor for it. The same goes for weight_hh_l0. Does this mean these weights are being shared?

googlebot · August 13, 2021, 6:22am

They’re concatenated because a single operation like [W1 W2 W3] @ x is more efficient than following the formulas literally and doing three separate products, on CUDA anyway.
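To illustrate the point about concatenation, here is a minimal sketch showing that one product with the stacked matrix gives the same numbers as three separate products. The names `W_r`, `W_z`, `W_n` and the sizes are made up for illustration and the values are random; this is not how PyTorch's internal kernels are written, only a check of the algebra.

```python
import torch

torch.manual_seed(0)
hidden_size, input_size = 4, 3  # arbitrary sizes, chosen only for this example

# Three separate gate matrices (hypothetical names, random values).
W_r = torch.randn(hidden_size, input_size)
W_z = torch.randn(hidden_size, input_size)
W_n = torch.randn(hidden_size, input_size)
x = torch.randn(input_size)

# Stack the matrices along dim 0: shape (3*hidden_size, input_size).
W_all = torch.cat([W_r, W_z, W_n], dim=0)

# One matmul, then split the result back into the three gate pre-activations.
r_pre, z_pre, n_pre = (W_all @ x).chunk(3)

# Compare against the three separate products.
stacked_matches = (
    torch.allclose(r_pre, W_r @ x, atol=1e-5)
    and torch.allclose(z_pre, W_z @ x, atol=1e-5)
    and torch.allclose(n_pre, W_n @ x, atol=1e-5)
)
print(W_all.shape, stacked_matches)
```

The stacked matrix has 3*hidden_size rows, which is where the factor of 3 in the documented shape comes from.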

ajrheng · August 13, 2021, 6:46am

Thanks for the response. According to the documentation, each weight, for example W_ir, has dimensions (3*hidden_size, hidden_size), assuming a unidirectional GRU, which is what I have. I set hidden_size=1, so the tensor you see in weight_ih_l0, which has dimensions (3, 1), is not W_ir, W_iz and W_in concatenated. It should refer to just one tensor.

googlebot · August 13, 2021, 6:55am

Why do you think W_ir should have a 3x multiplier? That factor comes from the concatenation (r, z, n); you have to slice weight_ih_l0 to extract W_ir.
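The slicing could look something like the sketch below. It assumes the (r, z, n) row order given in the documentation quoted earlier in the thread; the variable names on the left-hand side are just labels chosen to match the formulas.

```python
import torch
import torch.nn as nn

hidden_size = 1
rnn = nn.GRU(input_size=1, hidden_size=hidden_size, num_layers=1)

W_ih = rnn.weight_ih_l0  # shape (3*hidden_size, input_size)

# Explicit slicing: the first hidden_size rows belong to the reset gate r.
W_ir = W_ih[0:hidden_size]
W_iz = W_ih[hidden_size:2 * hidden_size]
W_in = W_ih[2 * hidden_size:3 * hidden_size]

# chunk(3, dim=0) does the same split in one call.
W_hr, W_hz, W_hn = rnn.weight_hh_l0.chunk(3, dim=0)
b_ir, b_iz, b_in = rnn.bias_ih_l0.chunk(3)
b_hr, b_hz, b_hn = rnn.bias_hh_l0.chunk(3)

print(W_ih.shape, W_ir.shape)
```

With hidden_size=1 each slice is a single row, which is why the (3, 1) tensor in the state_dict looks like one small weight rather than three.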

ajrheng · August 13, 2021, 7:18am

Oh, that 3x multiplier comes about because the three weights are being concatenated. I somehow misread the documentation. Now it makes sense. Thanks!
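For anyone who wants to confirm the layout, one way is to recompute a single GRU step by hand from the concatenated tensors and compare it with what nn.GRU returns. This is a rough sketch assuming the default settings (one layer, unidirectional, batch_first=False) and the GRU equations as given in the torch.nn.GRU documentation; the sizes and the names `gi`, `gh`, `h_manual` are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 3, 2  # arbitrary sizes for the check
rnn = nn.GRU(input_size, hidden_size, 1)

x = torch.randn(1, 1, input_size)    # (seq_len=1, batch=1, input_size)
h0 = torch.randn(1, 1, hidden_size)  # (num_layers=1, batch=1, hidden_size)

with torch.no_grad():
    out, h1 = rnn(x, h0)

    # One matmul per concatenated matrix, then split into the (r, z, n) parts.
    gi = x[0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0   # (batch, 3*hidden_size)
    gh = h0[0] @ rnn.weight_hh_l0.T + rnn.bias_hh_l0  # (batch, 3*hidden_size)
    i_r, i_z, i_n = gi.chunk(3, dim=1)
    h_r, h_z, h_n = gh.chunk(3, dim=1)

    r = torch.sigmoid(i_r + h_r)          # reset gate
    z = torch.sigmoid(i_z + h_z)          # update gate
    n = torch.tanh(i_n + r * h_n)         # candidate state
    h_manual = (1 - z) * n + z * h0[0]    # new hidden state

manual_matches = torch.allclose(h_manual, h1[0], atol=1e-5)
print(manual_matches)
```

If the split order were anything other than (r, z, n), the manual result would generally not agree with the module's output.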