To add dropout after both the forward and backward GRU cells, we use the following approach in TensorFlow:
...

gru_cell_forward = tf.nn.rnn_cell.GRUCell(gru_size)
gru_cell_backward = tf.nn.rnn_cell.GRUCell(gru_size)
gru_cell_forward = tf.nn.rnn_cell.DropoutWrapper(gru_cell_forward, output_keep_prob=some_dropout_val)
gru_cell_backward = tf.nn.rnn_cell.DropoutWrapper(gru_cell_backward, output_keep_prob=some_dropout_val)

...
While in PyTorch:
nn.GRU(embedding_size, gru_hidden_dimension, num_layers=1, bidirectional=True,
       dropout=some_dropout_val)
Is the dropout here applied after the outputs of both directions are concatenated, or is it applied to each direction individually?
What is the proper way in PyTorch to imitate the dropout behavior of the TensorFlow code above?
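For context, this is what I have tried so far: skipping the dropout argument of nn.GRU and applying nn.Dropout manually to the bidirectional output. This is only a minimal sketch of my attempt, not a confirmed equivalent of the TensorFlow code; the module name EncoderGRU and the placement of the dropout layer are my own assumptions.

import torch
import torch.nn as nn

class EncoderGRU(nn.Module):
    """Sketch: bidirectional GRU with dropout applied manually to its output."""

    def __init__(self, embedding_size, gru_hidden_dimension, some_dropout_val):
        super().__init__()
        self.gru = nn.GRU(embedding_size, gru_hidden_dimension,
                          num_layers=1, bidirectional=True, batch_first=True)
        # Note: nn.Dropout's p is a *drop* probability, while TensorFlow's
        # output_keep_prob is a *keep* probability, so matching the TF snippet
        # might actually require p = 1 - some_dropout_val (my assumption).
        self.dropout = nn.Dropout(p=some_dropout_val)

    def forward(self, x):
        # output has shape (batch, seq_len, 2 * gru_hidden_dimension), i.e. the
        # forward and backward outputs are already concatenated per time step.
        output, hidden = self.gru(x)
        return self.dropout(output), hidden

# Example usage (shapes are illustrative only):
# enc = EncoderGRU(embedding_size=128, gru_hidden_dimension=64, some_dropout_val=0.3)
# out, h = enc(torch.randn(4, 10, 128))

Since dropout zeroes elements independently, I assume applying it to the concatenated output is effectively the same as applying it to each direction separately, but I am not sure whether this matches what DropoutWrapper does per time step in the TensorFlow version.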