Are the outputs of bidirectional GRU concatenated?

In the documentation for class torch.nn.GRU(*args, **kwargs):

Outputs: output, h_n
output of shape (seq_len, batch, hidden_size * num_directions): tensor containing the output features h_t from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len

We know that a bidirectional RNN reads the sequence forward to get a sequence of forward hidden states, and reads it backward to get a sequence of backward hidden states. When doing machine translation, we need to concatenate the forward and backward hidden states. Since the size of output is (seq_len, batch, hidden_size * num_directions), does this mean the hidden states have already been concatenated?

Yes. If you want to sum them instead, you can do:

    rnn_out = (rnn_out[:, :, :self.hidden_dim] +   # forward-direction features
               rnn_out[:, :, self.hidden_dim:])    # backward-direction features
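
For context, here is a minimal self-contained sketch of that split-and-sum (hidden_dim and the other GRU hyperparameters are illustrative values, not from your code):

    import torch
    import torch.nn as nn

    seq_len, batch, input_size, hidden_dim = 7, 3, 10, 16

    # Bidirectional GRU: output feature size is hidden_dim * 2
    gru = nn.GRU(input_size, hidden_dim, num_layers=1, bidirectional=True)

    x = torch.randn(seq_len, batch, input_size)
    rnn_out, h_n = gru(x)                       # rnn_out: (seq_len, batch, hidden_dim * 2)

    forward_out  = rnn_out[:, :, :hidden_dim]   # features from the forward pass
    backward_out = rnn_out[:, :, hidden_dim:]   # features from the backward pass
    summed = forward_out + backward_out         # (seq_len, batch, hidden_dim)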

Thank you very much!

Just to make sure: are the outputs of the two directions reversed relative to each other? I mean, can we sum them as in your code, or do we need to reverse one of them before summation?
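
One way I'd check this empirically (a small sketch with arbitrary sizes): the forward direction's final state in h_n should match the last time step of the first half of output, and the backward direction's final state should match the first time step of the second half. If both hold, the two halves are already aligned to the input positions and can be summed directly.

    import torch
    import torch.nn as nn

    gru = nn.GRU(input_size=5, hidden_size=8, bidirectional=True)
    x = torch.randn(12, 4, 5)                       # (seq_len, batch, input_size)
    out, h_n = gru(x)                               # out: (12, 4, 16), h_n: (2, 4, 8)

    # Forward direction: final state is at the last time step, first half of out.
    print(torch.allclose(h_n[0], out[-1, :, :8]))
    # Backward direction: final state is at the first time step, second half of out.
    print(torch.allclose(h_n[1], out[0, :, 8:]))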

And how exactly are the hidden state outputs (h_n) laid out? Are they concatenated like {h_layer1_fw, h_layer1_bw, h_layer2_fw, h_layer2_bw} or {h_layer1_fw, h_layer2_fw, h_layer1_bw, h_layer2_bw}? And how do I sum them up?
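
For what it's worth, the GRU docs note that h_n can be separated with h_n.view(num_layers, num_directions, batch, hidden_size), which would correspond to the first layout (directions grouped within each layer). A minimal sketch assuming that layout:

    import torch
    import torch.nn as nn

    num_layers, hidden_size, batch = 2, 8, 4
    gru = nn.GRU(input_size=5, hidden_size=hidden_size,
                 num_layers=num_layers, bidirectional=True)
    x = torch.randn(12, batch, 5)
    out, h_n = gru(x)                                   # h_n: (num_layers * 2, batch, hidden_size)

    # Separate layers and directions, then sum the two directions per layer.
    h_n = h_n.view(num_layers, 2, batch, hidden_size)   # (layer, direction, batch, hidden)
    h_summed = h_n.sum(dim=1)                           # (num_layers, batch, hidden_size)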

I feel the same as the OP; the official docs should be clarified and demonstrated with more code snippet examples.

Maybe this older post of mine helps a bit, I had a similar question.