Given some input, a BiGRU cell creates:
1. output, of shape (max_time_step, batch, hidden_size * 2)
2. hidden, of shape (n_layers * 2, batch, hidden_size)
Adding the forward and backward hidden states for the output is easy; I would just have to do something like:
output[:, :, hidden_size:] + output[:,:,:hidden_size]
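For reference, here is a minimal runnable sketch of that sum. The sizes and the names `gru`, `x`, and `summed_output` are made up for illustration, and I am assuming a standard `torch.nn.GRU` with `bidirectional=True` and the default `batch_first=False`:

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not from a real model).
max_time_step, batch, input_size, hidden_size, n_layers = 5, 3, 4, 6, 2

gru = nn.GRU(input_size, hidden_size, num_layers=n_layers, bidirectional=True)
x = torch.randn(max_time_step, batch, input_size)

# output: (max_time_step, batch, hidden_size * 2)
# hidden: (n_layers * 2, batch, hidden_size)
output, hidden = gru(x)

# The last dimension holds the forward features first, then the backward ones.
summed_output = output[:, :, :hidden_size] + output[:, :, hidden_size:]
```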
Summing the hidden states for each layer is a bit more difficult, though, and this is where I am stuck. How would I create the bidirectional hidden states for each layer?
To be more concrete I want something like:
layer_1_bi_hidden = hidden[0,:,:] + hidden[1,:,:]
layer_2_bi_hidden = hidden[2,:,:] + hidden[3,:,:]
final_bi_hidden = torch.stack([layer_1_bi_hidden, layer_2_bi_hidden], 0)  # shape (n_layers, batch, hidden_size)
Now I understand that I can achieve this through looping, but is there a way to vectorize this operation?
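One possible vectorized version, as a sketch rather than a definitive answer: it assumes `hidden` comes from `torch.nn.GRU` with `bidirectional=True`, so that the first dimension is ordered layer by layer with the forward state before the backward state (layer 1 forward, layer 1 backward, layer 2 forward, ...), and that the desired result has one summed state per layer, i.e. shape (n_layers, batch, hidden_size). The sizes and the names `bi_hidden`, `bi_hidden_sliced`, and `looped` are illustrative only.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not from a real model).
max_time_step, batch, input_size, hidden_size, n_layers = 5, 3, 4, 6, 2

gru = nn.GRU(input_size, hidden_size, num_layers=n_layers, bidirectional=True)
x = torch.randn(max_time_step, batch, input_size)
_, hidden = gru(x)  # hidden: (n_layers * 2, batch, hidden_size)

# Split the first dimension into (layer, direction), then sum over direction.
bi_hidden = hidden.reshape(n_layers, 2, batch, hidden_size).sum(dim=1)

# Equivalent alternative: add the even-indexed (forward) rows
# to the odd-indexed (backward) rows.
bi_hidden_sliced = hidden[0::2] + hidden[1::2]

# Loop-based reference, kept only to check the vectorized results.
looped = torch.stack(
    [hidden[2 * i] + hidden[2 * i + 1] for i in range(n_layers)], dim=0
)
```

Both vectorized forms should match the loop and give a tensor of shape (n_layers, batch, hidden_size).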