Hi all,
The usage of initial states for bidirectional GRU/LSTM/RNN seems ambiguous to me in the official documentation, which says:
h_0 (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
Can I assume that h_0[:num_layers] holds the initial states for the forward GRU and h_0[num_layers:] holds the initial states for the backward GRU?
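For concreteness, here is a minimal sketch of the setup I'm asking about (the layer sizes are just made up for illustration); the question is which rows of h_0 feed which direction:

```python
import torch
import torch.nn as nn

num_layers, num_directions = 2, 2
batch, hidden_size, input_size, seq_len = 3, 5, 4, 7

gru = nn.GRU(input_size=input_size, hidden_size=hidden_size,
             num_layers=num_layers, bidirectional=True)

# h_0 must have shape (num_layers * num_directions, batch, hidden_size).
h_0 = torch.zeros(num_layers * num_directions, batch, hidden_size)

# Is h_0[:num_layers] the forward initial state and
# h_0[num_layers:] the backward one? That's the ambiguity.
x = torch.randn(seq_len, batch, input_size)  # (seq_len, batch, input_size)
out, h_n = gru(x, h_0)

print(out.shape)  # (seq_len, batch, num_directions * hidden_size)
print(h_n.shape)  # same layout question applies to h_n
```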
Thanks!