Batch processing with variable length sequences


I am trying to train a character-level language model with a multiplicative LSTM.
Right now I can train on individual sequences (batch_size 1, in other words) like this:

x - current character, y - next character

TIMESTEPS = len(x)
loss = 0  # accumulate the loss over all timesteps
for t in range(TIMESTEPS):
    emb = embed(x[t])
    hidden, output = rnn(emb, hidden)
    loss += loss_fn(output, y[t])

My problem is how to scale this up to batch processing, given that my sequences all have different lengths.


Check the documentation for nn.LSTM and pack_padded_sequence() / pad_packed_sequence().

You don’t need that for loop.
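For illustration, here is a minimal sketch of what that could look like. The toy sequences, layer sizes, and the choice of index 0 as the padding id are assumptions made up for this example, not something from the original question, and a plain nn.LSTM stands in for the multiplicative LSTM.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Three toy sequences of character ids with different lengths (hypothetical data).
seqs = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6]), torch.tensor([7, 8, 9])]
lengths = torch.tensor([len(s) for s in seqs])

# Pad to a common length; index 0 is reserved for padding in this sketch.
padded = pad_sequence(seqs, batch_first=True, padding_value=0)  # (batch, max_len)

embed = nn.Embedding(num_embeddings=10, embedding_dim=8, padding_idx=0)
rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# enforce_sorted=False lets the batch stay in its original order.
packed = pack_padded_sequence(embed(padded), lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = rnn(packed)  # the whole batch in one call, no Python loop

# Back to a padded tensor; positions past each sequence's length are zeros.
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```

With packed input, h_n holds the hidden state at the last real timestep of each sequence, so the padding never influences it.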


Thank you for your reply.

I got PackedSequence working:

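def forward(self, input, hidden, lengths):
    embeddings = self.encoder(input)
    # pack so the RNN skips the padded timesteps
    packed = pack_padded_sequence(embeddings, lengths, batch_first=True)
    output, hidden = self.rnn(packed, hidden)
    # unpack back to a padded (batch, max_len, hidden) tensor
    output, _ = pad_packed_sequence(output, batch_first=True)
    return output, hidden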

Now I am confused about how to apply a linear decoder to only the non-padded elements and then feed them to the loss function. Is there a proper “PyTorch” way of doing this, or is masking/padding mandatory?


You can derive a mask from the result and use it to mask both the result and the loss (provided you use the loss option that turns off averaging, so that you get one loss value per element).
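A rough sketch of that masking idea follows. The tensor sizes, the random stand-in tensors, and the names decoder and per_step are assumptions for illustration only. In current PyTorch the "not averaging" option is spelled reduction="none".

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, max_len, hidden_size, vocab = 3, 4, 16, 10
lengths = torch.tensor([4, 2, 3])

# Stand-ins for the padded RNN output and the padded next-character targets.
output = torch.randn(batch, max_len, hidden_size)
targets = torch.randint(1, vocab, (batch, max_len))

decoder = nn.Linear(hidden_size, vocab)
logits = decoder(output)  # (batch, max_len, vocab)

# mask[b, t] is True where t < lengths[b], i.e. on real (non-padded) steps.
mask = torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)

# reduction="none" keeps one loss value per position so padding can be zeroed out.
loss_fn = nn.CrossEntropyLoss(reduction="none")
per_step = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1)).reshape(batch, max_len)
loss = (per_step * mask.float()).sum() / mask.sum()

# Equivalent alternative: select only the real positions before calling the loss.
loss_selected = nn.functional.cross_entropy(logits[mask], targets[mask])
```

Both variants average over the 9 real positions only. The second one also shows that the decoder output can simply be indexed with the boolean mask if you prefer not to multiply by it.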

Thanks a lot for your help! I have another question: how can I make a custom RNN compatible with PackedSequence?

Custom RNNs can’t be made compatible with PackedSequence without a significant amount of code. See the built-in RNN implementation for an example.

The easiest way to make a custom RNN compatible with variable-length sequences is to do what this repo does, but that won’t be compatible with PackedSequence, so it won’t be a drop-in replacement for nn.LSTM. The PackedSequence approach is fairly specific to the implementation in cuDNN.


To be clear, when you say “easiest way to make a custom RNN compatible with variable-length sequences is to do what this repo does”, do you mean this part of the code, where the update is multiplied by zero for any step outside the time limit (i.e. where time < length is false)? I’ve copied the relevant bit of code below:

    mask = (time < length).float().unsqueeze(1).expand_as(h_next)
    h_next = h_next*mask + hx[0]*(1 - mask)
    c_next = c_next*mask + hx[1]*(1 - mask)
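To see what those three lines do inside a full time loop, here is a small self-contained sketch. It is not the repo's actual code: nn.LSTMCell stands in for a custom cell, and the sizes and random inputs are made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, max_len, input_size, hidden_size = 3, 4, 8, 16
lengths = torch.tensor([4, 2, 3])
inputs = torch.randn(batch, max_len, input_size)  # padded batch (hypothetical data)

cell = nn.LSTMCell(input_size, hidden_size)  # stand-in for a custom cell
h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)

outputs = []
for time in range(max_len):
    h_next, c_next = cell(inputs[:, time], (h, c))
    # 1.0 for sequences still running at this step, 0.0 for finished ones.
    mask = (time < lengths).float().unsqueeze(1).expand_as(h_next)
    # Finished sequences keep their previous state instead of taking the update.
    h = h_next * mask + h * (1 - mask)
    c = c_next * mask + c * (1 - mask)
    outputs.append(h)

outputs = torch.stack(outputs, dim=1)  # (batch, max_len, hidden_size)
```

Note that nothing is zeroed out in the final result: once a sequence has ended, its state is simply carried forward unchanged, so after the loop h holds the last real hidden state of every sequence in the batch.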

To make sure I’m understanding the RNN PackedSequence code correctly, is this the code you’re referring to? From what I understand, this code implements the dynamic batching algorithm proposed in this post?

Explained here:


I don’t understand what the masking part does. Also, why don’t we use one of the built-in loss functions?
