In the documentation for the class torch.nn.GRU(*args, **kwargs):
Outputs: output, h_n
output of shape (seq_len, batch, hidden_size * num_directions): tensor containing the output features h_t from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
We know that a bidirectional RNN reads the sequence forward to get a sequence of forward hidden states, and reads it backward to get a sequence of backward hidden states. When doing machine translation, we need to concatenate the forward and backward hidden states. Since the size of output is (seq_len, batch, hidden_size * num_directions), does this mean the hidden states have already been concatenated?
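One way to check this empirically is to compare slices of output against h_n. This is a small sketch (the sizes here are arbitrary): if the last dimension of output is the forward states concatenated with the backward states, then the first half of the last time step should match the last layer's forward state in h_n, and the second half of the *first* time step should match the last layer's backward state.

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 7, 3, 5, 4, 2
gru = nn.GRU(input_size, hidden_size, num_layers=num_layers, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, h_n = gru(x)

# output: (seq_len, batch, hidden_size * 2) -- forward and backward
# hidden states concatenated along the last dimension
assert output.shape == (seq_len, batch, hidden_size * 2)
# h_n: (num_layers * 2, batch, hidden_size), ordered as
# [layer0_fwd, layer0_bwd, layer1_fwd, layer1_bwd]
assert h_n.shape == (num_layers * 2, batch, hidden_size)

# Forward direction: first half of the LAST time step of output
# should equal the last layer's forward hidden state (h_n[-2]).
print(torch.allclose(output[-1, :, :hidden_size], h_n[-2]))
# Backward direction: second half of the FIRST time step of output
# should equal the last layer's backward hidden state (h_n[-1]).
print(torch.allclose(output[0, :, hidden_size:], h_n[-1]))
```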