In the documentation for the class torch.nn.GRU(*args, **kwargs):
Outputs: output, h_n
output of shape (seq_len, batch, hidden_size * num_directions): tensor containing the output features h_t from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len
We know that a bidirectional RNN reads the sequence forward to get a sequence of forward hidden states, and reads it backward to get a sequence of backward hidden states. When doing machine translation, we need to concatenate the forward and backward hidden states. Since the size of output is (seq_len, batch, hidden_size * num_directions), does this mean the hidden states have already been concatenated?
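One way to check this empirically is to compare slices of output against h_n. This is a small sketch (the sizes here are arbitrary): if the last dimension of output is the forward states concatenated with the backward states, then the first half of the last time step should match the last layer's forward state in h_n, and the second half of the *first* time step should match the last layer's backward state.

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 7, 3, 5, 4, 2
gru = nn.GRU(input_size, hidden_size, num_layers=num_layers, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, h_n = gru(x)

# output: (seq_len, batch, hidden_size * 2) -- forward and backward
# hidden states concatenated along the last dimension
assert output.shape == (seq_len, batch, hidden_size * 2)
# h_n: (num_layers * 2, batch, hidden_size), ordered as
# [layer0_fwd, layer0_bwd, layer1_fwd, layer1_bwd]
assert h_n.shape == (num_layers * 2, batch, hidden_size)

# Forward direction: first half of the LAST time step of output
# should equal the last layer's forward hidden state (h_n[-2]).
print(torch.allclose(output[-1, :, :hidden_size], h_n[-2]))
# Backward direction: second half of the FIRST time step of output
# should equal the last layer's backward hidden state (h_n[-1]).
print(torch.allclose(output[0, :, hidden_size:], h_n[-1]))
```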