I have a few questions regarding RNNs and Paddings.
I have variable length inputs I want to feed into the RNN.
If I pre-pad the sequences with numpy and use a DataLoader to create mini-batches, is it the same as using `pack_padded_sequence`?
If we pad with 0s, will PyTorch use the 0 padding as input and compute gradients on it?
Answering my own question here:
- You will still need to pre-pad everything yourself first, even with pack_padded_sequence. pack_padded_sequence will not pad for you; instead, it makes sure the RNN does not compute gradients on the padded values.
A DataLoader will not work with pack_padded_sequence, since you need to keep track of all the original lengths yourself (in numpy or PyTorch) before padding.
You can create a DataLoader out of a Dataset, but the Dataset cannot return three items at once (features, label, seq_length), only (features, label), so you lose the original lengths.
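For reference, here is a minimal sketch of the pad-then-pack step described above (shapes and the GRU are made up for illustration). One thing worth noting: `torch.nn.utils.rnn.pad_sequence` pads at the *end* of each sequence, and the original lengths are tracked in a separate tensor that gets handed to `pack_padded_sequence`:

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Three variable-length sequences with feature dim 4 (made-up shapes),
# sorted longest-first, which the default enforce_sorted=True expects.
seqs = [torch.randn(5, 4), torch.randn(3, 4), torch.randn(2, 4)]

# Record the original lengths BEFORE padding; packing needs them.
lengths = torch.tensor([s.size(0) for s in seqs])

# pad_sequence right-pads with zeros -> shape (batch, max_len, feat_dim)
padded = pad_sequence(seqs, batch_first=True)

# pack_padded_sequence does not pad anything itself; it uses the lengths
# so the RNN skips the padded timesteps entirely.
packed = pack_padded_sequence(padded, lengths, batch_first=True)

rnn = torch.nn.GRU(input_size=4, hidden_size=8, batch_first=True)
out, h = rnn(packed)  # h has shape (num_layers, batch, hidden)
```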
Where did you find (in the docs?) that the padding needs to be pre- rather than post-padding?
Also, what about the DataLoader note you made? I am using a DataLoader on the padded data and then, within the forward pass for each mini-batch, I call pack_padded_sequence. That seems to work fine.
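For what it's worth, the pattern described here can be sketched roughly like this. A custom Dataset returning a (features, label, length) triple, the class and variable names, and all the sizes are my own illustrative assumptions, not anything from the docs:

```python
import torch
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pack_padded_sequence

class PaddedSeqDataset(Dataset):
    """Returns three items per example: padded features, label, original length."""
    def __init__(self, feats, labels, lengths):
        self.feats, self.labels, self.lengths = feats, labels, lengths

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return self.feats[i], self.labels[i], self.lengths[i]

class Classifier(torch.nn.Module):
    def __init__(self, feat_dim=4, hidden=8, n_classes=2):
        super().__init__()
        self.rnn = torch.nn.GRU(feat_dim, hidden, batch_first=True)
        self.fc = torch.nn.Linear(hidden, n_classes)

    def forward(self, x, lengths):
        # Pack inside the forward pass, per mini-batch; enforce_sorted=False
        # lets the DataLoader hand us batches in any length order.
        packed = pack_padded_sequence(x, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        _, h = self.rnn(packed)
        return self.fc(h[-1])  # classify from the last hidden state

# Toy data: 6 examples already right-padded to max_len=5, feature dim 4.
feats = torch.randn(6, 5, 4)
lengths = torch.tensor([5, 3, 2, 4, 5, 1])
labels = torch.randint(0, 2, (6,))

loader = DataLoader(PaddedSeqDataset(feats, labels, lengths), batch_size=3)
model = Classifier()
for x, y, lens in loader:
    logits = model(x, lens)  # shape (3, 2)
```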