I am trying to train a character-level language model with a multiplicative LSTM.
Right now I can train on individual sequences (batch_size of 1, in other words) like this:
# x - current character, y - next character
TIMESTEPS = len(x)
loss = 0
for t in range(TIMESTEPS):
    emb = embed(x[t])
    hidden, output = rnn(emb, hidden)
    loss += loss_fn(output, y[t])
My problem is how to scale this up to batch processing, given that my sequences all have different lengths.
Now I am confused about how to apply the linear decoder only to the non-padded elements and then feed them to the loss function. Is there a proper "PyTorch" way of doing this, or is masking/padding mandatory?
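For concreteness, here is a rough sketch of the kind of batched loop I have in mind (PAD_IDX, embed, rnn and decoder below are placeholders for my own setup). The part I am unsure about is whether relying on CrossEntropyLoss's ignore_index to skip the padded targets counts as the "proper" way:

import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

PAD_IDX = 0  # index I would reserve for padding in my character vocabulary

def batched_loss(sequences, embed, rnn, decoder, hidden):
    # sequences: list of 1D LongTensors of character indices, all of different lengths
    x = pad_sequence([s[:-1] for s in sequences], batch_first=True, padding_value=PAD_IDX)
    y = pad_sequence([s[1:] for s in sequences], batch_first=True, padding_value=PAD_IDX)
    loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_IDX)  # padded targets contribute nothing
    loss = 0.0
    for t in range(x.size(1)):
        emb = embed(x[:, t])               # (batch, emb_dim)
        hidden, output = rnn(emb, hidden)  # same interface as my single-sequence loop
        logits = decoder(output)           # (batch, vocab_size)
        loss = loss + loss_fn(logits, y[:, t])
    return loss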
The easiest way to make a custom RNN compatible with variable-length sequences is to do what this repo does (https://github.com/jihunchoi/recurrent-batch-normalization-pytorch), but that won't be compatible with PackedSequence, so it won't be a drop-in replacement for nn.LSTM. The PackedSequence approach is fairly specific to the cuDNN implementation.
To be clear, when you say "the easiest way to make a custom RNN compatible with variable-length sequences is to do what this repo does", do you mean the part of the code where he multiplies any output outside of the time limit (time < length) by zero? I've copied the relevant bit of code below:
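Roughly this pattern, I mean (rewritten as a simplified sketch rather than the repo's exact lines; cell here is a stand-in for an LSTM-style cell that returns new hidden and cell states):

def masked_step(cell, x_t, h_prev, c_prev, time, lengths):
    # one time step over a padded batch; lengths is a LongTensor of true sequence lengths
    h_new, c_new = cell(x_t, (h_prev, c_prev))
    # 1.0 for sequences still running at this time step, 0.0 for ones that have ended
    mask = (time < lengths).float().unsqueeze(1)
    # zero out the new state where the sequence has ended and carry the old state forward
    h_next = h_new * mask + h_prev * (1 - mask)
    c_next = c_new * mask + c_prev * (1 - mask)
    return h_next, c_next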
To make sure I'm understanding the RNN PackedSequence code correctly, is this the code you're referring to? From what I understand, this code implements the dynamic batching algorithm proposed in this post, is that right?
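For reference, this is my current mental model of how packing is meant to be used with the built-in nn.LSTM (a toy example of my own, with made-up sizes, assuming the batch is already sorted by decreasing length):

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

padded = torch.randn(3, 5, 8)          # (batch, max_len, input_size), zero-padded
lengths = torch.tensor([5, 3, 2])      # true lengths, sorted in decreasing order

# packing records how many sequences are still active at every time step,
# so the RNN only runs the steps that actually exist for each sequence
packed = pack_padded_sequence(padded, lengths, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# unpack back to a padded tensor; positions past each length come back as zeros
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)   # torch.Size([3, 5, 16])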