How to handle variable-length sequences in a custom RNN

I’m writing some code to implement neural Turing machines, which need a memory module, and I don’t know how to handle variable-length sequences. I found the RNN code, which uses _backend.rnn and passes a batch_sizes parameter.

If I use padded sequences, what should I do in the loss and optimizer?

My RNN is like the example provided in the docs, which is different from the standard RNN:

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size

        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax()

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return Variable(torch.zeros(1, self.hidden_size))
  1. Just use a for-loop to iterate over your variable-length sequences. For efficiency, though, I would recommend using padded sequences in mini-batches.
  2. You can use the output of the RNN to calculate the loss and call backward.
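
As a rough illustration of the for-loop option, here is a minimal sketch that feeds one sequence at a time through the RNN class from the question, so no padding is needed. It assumes a recent PyTorch (plain tensors instead of Variable, and an explicit dim for LogSoftmax). The toy data, the sizes, NLLLoss and the SGD optimizer are made up for the example and are not from the original post.

```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.softmax(self.i2o(combined))
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

torch.manual_seed(0)
rnn = RNN(input_size=4, hidden_size=8, output_size=3)
criterion = nn.NLLLoss()                                # assumed loss
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.1)   # assumed optimizer

# Toy data (made up): three sequences of different lengths, one label each.
sequences = [torch.randn(5, 4), torch.randn(2, 4), torch.randn(7, 4)]
labels = [torch.tensor([0]), torch.tensor([2]), torch.tensor([1])]

losses = []
for seq, label in zip(sequences, labels):
    hidden = rnn.initHidden()
    # Step through exactly seq.size(0) time steps, whatever the length is.
    for t in range(seq.size(0)):
        output, hidden = rnn(seq[t].unsqueeze(0), hidden)
    loss = criterion(output, label)   # loss on the last real output
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

This processes one sequence per optimizer step, which is simple but slow; batching with padding trades some wasted computation for much better throughput.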

Would it work to just use padded sequences?
If I understand correctly, I first pad the sequences and then use the corresponding output (e.g. if the input is [1,2,0,0] and the output is [0,1,2,2], I use the second output, “1”, to calculate the loss), and I don’t need any other operations in the RNN layer?

That’s right. You could also use the final output for every sequence, which is efficient but will probably hurt the model’s accuracy, since for shorter sequences it is computed after the padding steps.
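
To make the "use the corresponding output" idea concrete, here is a hedged sketch. It runs a cell shaped like the one in the question over a zero-padded batch, picks each sequence's output at its last real time step with gather, and computes the loss on that. The sizes, the toy data, the run helper and NLLLoss are assumptions for illustration only. Because the output at step t only depends on inputs up to step t, the selected output matches what the unpadded sequence would give.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, output_size = 4, 8, 3
i2h = nn.Linear(input_size + hidden_size, hidden_size)
i2o = nn.Linear(input_size + hidden_size, output_size)

def run(batch):
    """Run the cell over a (batch, time, features) tensor.

    Returns log-probabilities of shape (batch, time, output_size).
    """
    hidden = torch.zeros(batch.size(0), hidden_size)
    outputs = []
    for t in range(batch.size(1)):
        combined = torch.cat((batch[:, t], hidden), 1)
        hidden = i2h(combined)
        outputs.append(torch.log_softmax(i2o(combined), dim=1))
    return torch.stack(outputs, dim=1)

# Two toy sequences of lengths 2 and 4, zero-padded to length 4.
seq_a, seq_b = torch.randn(2, input_size), torch.randn(4, input_size)
padded = torch.zeros(2, 4, input_size)
padded[0, :2] = seq_a
padded[1, :4] = seq_b
lengths = torch.tensor([2, 4])

outputs = run(padded)
# Index of the last real step for each sequence, expanded to match outputs.
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, output_size)
last_valid = outputs.gather(1, idx).squeeze(1)   # (batch, output_size)

targets = torch.tensor([0, 2])                   # made-up labels
loss = nn.NLLLoss()(last_valid, targets)
loss.backward()                                  # optimizer step as usual

# Sanity check: the short sequence run on its own gives the same output.
unpadded_a = run(seq_a.unsqueeze(0))[0, -1]
```

Nothing changes in the optimizer; only the outputs that enter the loss are selected, so padded steps contribute no gradient.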

Thanks very much!
I thought I needed to write some extra code in my RNN layer, and I was stuck on this for a long time.

I am actually stuck on a similar problem. I am trying to do speech recognition using an attention mechanism. I have built the boilerplate code (the model) for that using the seq-to-seq tutorial and have preprocessed my speech data. The problem is that for each item x in my dataset, I have the following pair: <frames of x of size (anything,13)>, <transcription of x>

For each item in the dataset the number of frames is different: some are (256,13), some (134,13), you get the idea. So how do I pad them to the same length so I can train on the GPU? Also, where should I pad the sequences: in my dataloader class, or before I create the dataloader? Thanks
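
One possible way to do this, sketched under assumptions: pad per batch inside a custom collate_fn passed to DataLoader, using torch.nn.utils.rnn.pad_sequence, and return the true lengths alongside the padded tensor. The SpeechDataset class, the pad_collate function and the toy features and transcripts below are hypothetical names and data made up for the example.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

class SpeechDataset(Dataset):  # hypothetical dataset wrapper
    def __init__(self, features, transcripts):
        self.features = features        # list of (num_frames, 13) tensors
        self.transcripts = transcripts  # list of strings

    def __len__(self):
        return len(self.features)

    def __getitem__(self, i):
        return self.features[i], self.transcripts[i]

def pad_collate(batch):
    """Zero-pad the frames in one batch to that batch's longest item."""
    frames, transcripts = zip(*batch)
    lengths = torch.tensor([f.size(0) for f in frames])
    padded = pad_sequence(list(frames), batch_first=True)  # (batch, max_len, 13)
    return padded, lengths, list(transcripts)

# Toy data with the frame counts mentioned above, plus one more.
features = [torch.randn(256, 13), torch.randn(134, 13), torch.randn(80, 13)]
transcripts = ["hello", "world", "hi"]

loader = DataLoader(SpeechDataset(features, transcripts),
                    batch_size=3, collate_fn=pad_collate)
padded, lengths, texts = next(iter(loader))
```

Padding per batch rather than once up front keeps each batch only as long as its longest item. The padded tensor can then be moved to the GPU as usual, and the lengths are kept for selecting or masking outputs later.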

I read PyTorch’s RNN code and found that there are two implementations, one for CPU and one for GPU.

In pytorch/nn/_functions/, the batch_sizes parameter is used in VariableRecurrent, which runs on the CPU, and there is no batch_sizes parameter in CudnnRNN, which runs on the GPU, so maybe dynamic batching is not supported on the GPU. I’m not 100% sure about it.

I don’t really understand VariableRecurrent’s logic flow; I think it uses the corresponding output to calculate the loss.
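
For what it is worth, a batch_sizes field is also visible in the public API: torch.nn.utils.rnn.pack_padded_sequence returns a PackedSequence whose batch_sizes says how many sequences are still active at each time step. Whether this is exactly what the internal VariableRecurrent code consumes is an assumption on my part; the small sketch below only shows what the public object contains, using made-up data.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Three sequences of lengths 4, 2 and 1, zero-padded and sorted longest first.
padded = torch.tensor([[1., 2., 3., 4.],
                       [5., 6., 0., 0.],
                       [7., 0., 0., 0.]])
lengths = torch.tensor([4, 2, 1])

packed = pack_padded_sequence(padded, lengths, batch_first=True)

# packed.data holds the real elements in time-major order, padding dropped.
# packed.batch_sizes[t] is the number of sequences that still have a step t.
data = packed.data.tolist()
batch_sizes = packed.batch_sizes.tolist()
```

Here step 0 has three active sequences, step 1 has two, and steps 2 and 3 have one, so an RNN loop can shrink the batch it processes at each step instead of computing on padding.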

For now I’m going to pad the sequences to the max length and use the right output (for some shorter sequences, the right output is not the last one) to compute the loss.

Please tell me if you have any new answers! Thanks!