Bidirectional GRU layer Dropout behavior

To apply dropout to the outputs of both the forward and backward GRU cells, we do the following in TensorFlow:

gru_cell_forward = tf.nn.rnn_cell.GRUCell(gru_size)
gru_cell_backward = tf.nn.rnn_cell.GRUCell(gru_size)
gru_cell_forward = tf.nn.rnn_cell.DropoutWrapper(gru_cell_forward,output_keep_prob=some_dropout_val)
gru_cell_backward = tf.nn.rnn_cell.DropoutWrapper(gru_cell_backward,output_keep_prob=some_dropout_val)

While in PyTorch:

nn.GRU(embedding_size, gru_hidden_dimension, num_layers=1, bidirectional=True,

Is the dropout here applied after the outputs of both directions are concatenated? Or is it applied to each direction individually?

What’s the proper way in PyTorch to reproduce the dropout behavior of the TensorFlow code above?

The PyTorch GRU implementation (like the other RNN modules) does not apply dropout to the outputs of the last layer. In the docs for the GRU parameters you can read:

dropout – If non-zero, introduces a Dropout layer on the outputs of each GRU layer except the last layer, with dropout probability equal to dropout.

You should get a warning if you set dropout > 0 for a 1-layer network.

Therefore, just apply a separate Dropout module to the output of the bidirectional GRU, or split the output along the feature dimension and apply two Dropout modules if you want separate dropouts for each direction.
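
As a minimal sketch of the second option (not a definitive implementation), the module below splits the GRU output into its forward and backward halves and applies one nn.Dropout to each. The class name BiGRUWithDropout, the sizes, and the parameters p_forward and p_backward are hypothetical and chosen only for illustration. Note that nn.Dropout takes the probability of dropping an element, while output_keep_prob in the TensorFlow code is the probability of keeping it, so p = 1 - output_keep_prob.

```python
import torch
import torch.nn as nn

# Hypothetical sizes and dropout probability, for illustration only.
embedding_size, gru_hidden_dimension, dropout_p = 8, 16, 0.5


class BiGRUWithDropout(nn.Module):
    """1-layer bidirectional GRU with a separate dropout per direction."""

    def __init__(self, input_size, hidden_size, p_forward, p_backward):
        super().__init__()
        self.hidden_size = hidden_size
        # No dropout argument here: it would have no effect on a 1-layer GRU.
        self.gru = nn.GRU(input_size, hidden_size, num_layers=1,
                          bidirectional=True, batch_first=True)
        self.drop_forward = nn.Dropout(p_forward)
        self.drop_backward = nn.Dropout(p_backward)

    def forward(self, x):
        # out: (batch, seq_len, 2 * hidden_size); the first hidden_size
        # features come from the forward direction, the rest from the backward.
        out, h_n = self.gru(x)
        fwd, bwd = out.split(self.hidden_size, dim=-1)
        out = torch.cat([self.drop_forward(fwd), self.drop_backward(bwd)], dim=-1)
        return out, h_n


model = BiGRUWithDropout(embedding_size, gru_hidden_dimension,
                         p_forward=dropout_p, p_backward=dropout_p)
x = torch.randn(4, 10, embedding_size)  # (batch, seq_len, features)
out, h_n = model(x)                     # dropout is active in training mode
```

Because nn.Dropout zeroes each element independently, a single nn.Dropout applied to the concatenated output behaves the same way statistically when both directions use the same probability. The split is only needed when the two directions should use different probabilities. Call model.eval() at inference time to disable the dropout.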

Ah! Got it.
Many thanks @rdroste