I was wondering: if you manually add a dropout layer after an LSTM, will the dropout mask be the same for all the time steps in a sequence, or will it be different for each time step?
The RNN or LSTM network recurs over every step, so at each step it behaves like an ordinary fully connected network. The dropout is therefore applied at each time step.
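As a minimal sketch of what "manually add a dropout layer after LSTM" could look like (the sizes and names below are just placeholders): nn.Dropout works element-wise on the output tensor, so it samples an independent mask for every element, which means each time step gets a different mask.

    import torch
    import torch.nn as nn

    # hypothetical sizes, for illustration only
    seq_len, batch, input_size, hidden_size = 10, 4, 8, 16

    lstm = nn.LSTM(input_size, hidden_size)   # single layer, no built-in dropout
    drop = nn.Dropout(p=0.5)

    x = torch.randn(seq_len, batch, input_size)
    output, (h_n, c_n) = lstm(x)               # output: (seq_len, batch, hidden_size)

    # a fresh Bernoulli mask is drawn for every element of `output`,
    # so the mask at time step t differs from the mask at time step t+1
    output = drop(output)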
In newer versions of PyTorch, the dropout argument has no effect for a 1-layer RNN (dropout is only applied between stacked layers), so dropout is not applied at each step unless you implement it manually (i.e. rewrite the RNN module yourself).
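If you do want dropout inside the recurrence itself, one way to "rewrite the module" is to unroll the sequence with nn.LSTMCell and apply dropout to the hidden state at every step. This is only a sketch under that assumption, not the library's implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    seq_len, batch, input_size, hidden_size, p = 10, 4, 8, 16, 0.5

    cell = nn.LSTMCell(input_size, hidden_size)
    x = torch.randn(seq_len, batch, input_size)
    h = x.new_zeros(batch, hidden_size)
    c = x.new_zeros(batch, hidden_size)

    outputs = []
    for t in range(seq_len):
        h, c = cell(x[t], (h, c))
        # dropout on h_t at every time step; a new mask is sampled each step
        h = F.dropout(h, p=p, training=True)
        outputs.append(h)
    output = torch.stack(outputs)              # (seq_len, batch, hidden_size)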
Yes, I guess your description is clearer. I was trying to explain that dropout would be applied at every time step, i.e. dropout acts on every h_t.
Dropout is placed between two stacked RNN layers.
I have looked at the source code, and I tend to think that claim 2 is right: dropout works between layers, not at every time step.
for i in range(num_layers):
    all_output = []
    for j, inner in enumerate(inners):
        l = i * num_directions + j

        hy, output = inner(input, hidden[l], weight[l], batch_sizes)
        next_hidden.append(hy)
        all_output.append(output)

    input = torch.cat(all_output, input.dim() - 1)

    # dropout is applied to a layer's concatenated output, and only between
    # layers (i < num_layers - 1), never inside a layer's time loop
    if dropout != 0 and i < num_layers - 1:
        input = F.dropout(input, p=dropout, training=train, inplace=False)
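So with the built-in module, dropout fires only on the output handed from one layer to the next, and the mask is resampled per element. If you instead wanted the same mask for all time steps (variational / locked dropout), a rough sketch would be to sample one mask and broadcast it over the time dimension; the shapes below are just illustrative:

    import torch

    seq_len, batch, hidden_size, p = 10, 4, 16, 0.5

    output = torch.randn(seq_len, batch, hidden_size)   # e.g. an LSTM's output

    # sample one mask per (batch, feature) and reuse it at every time step
    mask = output.new_empty(1, batch, hidden_size).bernoulli_(1 - p) / (1 - p)
    output = output * mask                              # broadcast over seq_len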