A better way to sum the hidden states of a BiGRU with more than one layer

Given some input, a BiGRU cell produces:

  1. output, of shape (max_time_step, batch, hidden_size * 2)
  2. hidden, of shape (n_layers * 2, batch, hidden_size)

Adding the forward and backward hidden states for the output is easy; I would just do something like:
output[:, :, hidden_size:] + output[:, :, :hidden_size]
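
For reference, here is a minimal sketch of that output sum. It assumes the BiGRU is PyTorch's nn.GRU with bidirectional=True and the default (time, batch, feature) layout; the sizes are made up for illustration and are not part of my actual model.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not from the real model).
max_time_step, batch, input_size, hidden_size, n_layers = 5, 3, 4, 6, 2

gru = nn.GRU(input_size, hidden_size, num_layers=n_layers, bidirectional=True)
x = torch.randn(max_time_step, batch, input_size)
output, hidden = gru(x)

# output: (max_time_step, batch, hidden_size * 2), forward half first.
# hidden: (n_layers * 2, batch, hidden_size).
summed_output = output[:, :, hidden_size:] + output[:, :, :hidden_size]

# Same result: split the last dimension into (direction, hidden_size)
# and sum over the direction axis.
summed_output_alt = output.reshape(max_time_step, batch, 2, hidden_size).sum(dim=2)
```

Both versions give a tensor of shape (max_time_step, batch, hidden_size).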

Summing the hidden states for each layer is more difficult, and this is where I am stuck. How would I create the combined bidirectional hidden state for each layer?

To be more concrete, I want something like:

layer_1_bi_hidden = hidden[0,:,:] + hidden[1,:,:]
layer_2_bi_hidden = hidden[2,:,:] + hidden[3,:,:]
final_bi_hidden = torch.cat([layer_1_bi_hidden, layer_2_bi_hidden], 0)

Now, I understand that I can achieve this by looping over the layers, but is there a way to vectorize this operation?
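
One possible vectorized approach, sketched under the assumption that hidden comes from PyTorch's nn.GRU, where the first dimension is ordered layer by layer with forward before backward (layer 1 forward, layer 1 backward, layer 2 forward, ...): split that dimension into (n_layers, 2) and sum over the direction axis. The sizes below are illustrative, and the loop is only there as a reference to compare against.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not from the real model).
max_time_step, batch, input_size, hidden_size, n_layers = 5, 3, 4, 6, 2

gru = nn.GRU(input_size, hidden_size, num_layers=n_layers, bidirectional=True)
_, hidden = gru(torch.randn(max_time_step, batch, input_size))

# (n_layers * 2, batch, hidden_size) -> (n_layers, 2, batch, hidden_size),
# then sum over the direction axis.
bi_hidden = hidden.reshape(n_layers, 2, batch, hidden_size).sum(dim=1)

# Alternative: strided slicing picks the forward (even) and backward (odd) rows.
bi_hidden_strided = hidden[0::2] + hidden[1::2]

# Loop-based reference, one entry per layer.
looped = torch.stack([hidden[2 * i] + hidden[2 * i + 1] for i in range(n_layers)])

# If the torch.cat(..., 0) layout is wanted, i.e. (n_layers * batch, hidden_size),
# flatten the first two dimensions.
flat = bi_hidden.reshape(n_layers * batch, hidden_size)
```

Here bi_hidden has shape (n_layers, batch, hidden_size), which keeps the layers separate. Concatenating along dimension 0 as in the snippet above instead merges the layer and batch dimensions, which is what flat reproduces.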