I am trying to train a character-level language model with a multiplicative LSTM.
Right now I can train on individual sequences (batch_size 1, in other words) like this:
# x: current characters, y: next characters
TIMESTEPS = len(x)
for t in range(TIMESTEPS):
    emb = embed(x[t])
    hidden, output = rnn(emb, hidden)
    loss += loss_fn(output, y[t])
My problem is how to scale this up to batch processing, given that my sequences all have different lengths.
Check the documentation for nn.LSTM and pack_padded_sequence() / pad_packed_sequence()
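For anyone following along, here is a minimal sketch of the pad, pack, and unpack round trip those functions provide. The toy sequences, layer sizes, and the choice of 0 as the padding id are assumptions for illustration only, and nn.LSTM stands in for the multiplicative LSTM from the question.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Three toy sequences of character ids with different lengths (hypothetical data).
seqs = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6]), torch.tensor([7, 8, 9])]

# By default pack_padded_sequence expects the batch sorted by decreasing length.
seqs = sorted(seqs, key=len, reverse=True)
lengths = torch.tensor([len(s) for s in seqs])

# Pad with id 0 up to the longest sequence: shape (batch, max_len).
padded = pad_sequence(seqs, batch_first=True, padding_value=0)

embed = nn.Embedding(num_embeddings=10, embedding_dim=8, padding_idx=0)
rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Pack so the LSTM only processes the real (non-padded) time steps.
packed = pack_padded_sequence(embed(padded), lengths, batch_first=True)
packed_out, (h_n, c_n) = rnn(packed)

# Back to a padded tensor of shape (batch, max_len, hidden); padded steps are zeros.
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```

With packing, h_n holds each sequence's state at its own last real step, not at the padded end of the batch.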
You don’t need that
Thank you for your reply.
I got PackedSequence to work:
def forward(self, input, hidden, lengths):
    embeddings = self.encoder(input)
    packed = pack_padded_sequence(embeddings, lengths, batch_first=True)
    output, hidden = self.rnn(packed, hidden)
    output, _ = pad_packed_sequence(output, batch_first=True)
Now I am confused about how to apply the linear decoder to only the non-padded elements and then feed them to the loss function. Is there a proper “pytorch” way of doing it, or is masking/padding mandatory?
You can derive a mask from the result (pad_packed_sequence also returns the sequence lengths) and use it to mask both the result and the loss (if you use the loss option that disables averaging).
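A rough sketch of that masking idea, under some assumptions: the RNN output and targets below are random stand-ins, the decoder is a plain nn.Linear, and reduction="none" is the "not averaging" option, so the padded positions can be zeroed out before taking the mean.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

batch, max_len, hidden_size, vocab_size = 3, 4, 16, 10
lengths = torch.tensor([4, 3, 2])

# Stand-ins for the padded RNN output and the padded next-character targets.
output = torch.randn(batch, max_len, hidden_size)
targets = torch.randint(1, vocab_size, (batch, max_len))

decoder = nn.Linear(hidden_size, vocab_size)
logits = decoder(output)  # (batch, max_len, vocab_size)

# mask[b, t] is True where step t is a real (non-padded) position of sequence b.
mask = torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)

# Unreduced loss: one value per (sequence, time step).
per_token = F.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1), reduction="none"
).view(batch, max_len)

# Zero out padded positions and average over the real ones only.
loss = (per_token * mask.float()).sum() / mask.sum()
```

The decoder still runs on the padded positions here; the mask only keeps them out of the loss and therefore out of the gradient.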
Thx a lot for your help! I have another question: how is it possible to make a custom RNN compatible with PackedSequence?
Custom RNNs can't be made compatible with PackedSequence without a significant amount of code. See the built-in RNN implementation for example: https://github.com/pytorch/pytorch/search?utf8=✓&q=packedsequence&type=
The easiest way to make a custom RNN compatible with variable-length sequences is to do what this repo does https://github.com/jihunchoi/recurrent-batch-normalization-pytorch – but that won’t be compatible with packedsequence so it won’t be a drop-in replacement for nn.LSTM. The packedsequence approach is fairly specific to the implementation in CUDNN.
To be clear, when you say “easiest way to make a custom RNN compatible with variable-length sequences is to do what this repo does”, do you mean this part of the code, where he multiplies any output outside of the time limit (time < length) by zero? I’ve copied the relevant bit of code below:
mask = (time < length).float().unsqueeze(1).expand_as(h_next)
h_next = h_next*mask + hx[0]*(1 - mask)
c_next = c_next*mask + hx[1]*(1 - mask)
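To see what those lines do in context, here is a small self-contained sketch. It assumes hx is an (h, c) tuple, uses nn.LSTMCell as a stand-in for a custom cell, and takes time-major input; run_masked_rnn is a hypothetical helper name, not something taken from the repo.

```python
import torch
import torch.nn as nn

def run_masked_rnn(cell, inputs, lengths, hx):
    """inputs: (max_time, batch, input_size); lengths: (batch,); hx: (h, c)."""
    outputs = []
    for time in range(inputs.size(0)):
        h_next, c_next = cell(inputs[time], hx)
        # 1.0 for sequences still running at this step, 0.0 for finished ones.
        mask = (time < lengths).float().unsqueeze(1).expand_as(h_next)
        # Finished sequences keep their previous state instead of updating.
        h_next = h_next * mask + hx[0] * (1 - mask)
        c_next = c_next * mask + hx[1] * (1 - mask)
        hx = (h_next, c_next)
        outputs.append(h_next)
    return torch.stack(outputs, 0), hx

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=8, hidden_size=16)
inputs = torch.randn(4, 3, 8)        # (max_time, batch, input_size)
lengths = torch.tensor([4, 3, 2])    # real length of each sequence
h0 = torch.zeros(3, 16)
c0 = torch.zeros(3, 16)
outputs, (h_n, c_n) = run_masked_rnn(cell, inputs, lengths, (h0, c0))
```

In this sketch the update past the end of a sequence is multiplied by zero and the previous state is carried forward, so the final hx holds each sequence's state at its own last real step. Unlike pad_packed_sequence, the outputs at padded steps repeat that last state, not zeros, so they still need to be masked out of the loss.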
To make sure I’m understanding the RNN PackedSequence code correctly, is this the code you’re referring to? From what I understand, this code implements the dynamic batching algorithm proposed in this post?
I don’t understand what the masking part does, and why don’t we use one of the built-in loss functions?
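On the built-in loss question, one possible approach (a sketch, not necessarily what the posters above had in mind) is to give padded targets a reserved id and pass it as ignore_index to nn.CrossEntropyLoss, which then leaves those positions out of the average. The PAD id of 0 and the toy tensors are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

PAD = 0          # assumed padding id; it must never occur as a real target
vocab_size = 10

# Stand-in decoder logits: (batch, max_len, vocab_size).
logits = torch.randn(3, 4, vocab_size)

# Padded next-character targets for sequences of length 4, 3 and 2.
targets = torch.tensor([[3, 5, 2, 7],
                        [4, 1, 6, PAD],
                        [8, 9, PAD, PAD]])

# Positions whose target equals PAD contribute nothing to the loss or gradient.
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)
loss = loss_fn(logits.view(-1, vocab_size), targets.view(-1))

# Same value computed by explicitly selecting the non-padded positions.
keep = targets != PAD
manual = F.cross_entropy(logits[keep], targets[keep])
```

Explicit masking does the same job when no id can be reserved for padding, or when the per-token losses are needed for something other than a plain mean.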