I’m writing some code to implement Neural Turing Machines, which need a memory module, and I don’t know how to handle variable-length sequences. I found that the RNN code uses _backend.rnn and passes a batch_sizes parameter.
If I use padded sequences, what do I need to do in the loss and the optimizer?
My RNN is similar to the example provided in the docs, which is different from the standard RNN.
Just use a for-loop to iterate over your variable-length sequences. For efficiency, though, I would recommend using padded sequences for mini-batches.
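A minimal sketch of the for-loop approach, assuming a hand-written cell in the style of the docs example (the class name TutorialRNN, the sizes and the tanh on the hidden state are my own choices, not taken from the docs). Each sequence is fed one step at a time, so no padding is needed, at the cost of handling one sequence per loop iteration.

```python
import torch
import torch.nn as nn

class TutorialRNN(nn.Module):
    """Hypothetical cell in the style of the docs example: the input and the
    previous hidden state are concatenated and fed to two linear layers."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)

    def forward(self, x, hidden):
        combined = torch.cat((x, hidden), dim=1)
        hidden = torch.tanh(self.i2h(combined))  # tanh is an assumption here
        output = self.i2o(combined)
        return output, hidden

    def init_hidden(self, batch_size=1):
        return torch.zeros(batch_size, self.hidden_size)

def run_sequence(rnn, seq):
    """Feed one (seq_len, input_size) sequence step by step; return all outputs."""
    hidden = rnn.init_hidden(1)
    outputs = []
    for t in range(seq.size(0)):
        out, hidden = rnn(seq[t].unsqueeze(0), hidden)  # step input: (1, input_size)
        outputs.append(out)
    return torch.cat(outputs, dim=0)  # (seq_len, output_size)

torch.manual_seed(0)
rnn = TutorialRNN(input_size=4, hidden_size=8, output_size=3)
# Three sequences of different lengths; the plain loop needs no padding.
sequences = [torch.randn(n, 4) for n in (5, 2, 7)]
outputs = [run_sequence(rnn, s) for s in sequences]
print([tuple(o.shape) for o in outputs])  # [(5, 3), (2, 3), (7, 3)]
```

The loss for each sequence can then be computed from its own outputs and summed or averaged before calling backward().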
You can use the output of the RNN to calculate the loss and call backward().
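A minimal sketch of the padded mini-batch route, assuming per-step targets and a hypothetical embedding + nn.RNN + linear model (all sizes and data are made up). Target positions that correspond to padding are set to -100, the default ignore_index of nn.CrossEntropyLoss, so they contribute nothing to the loss; backward() and the optimizer step are then exactly the same as without padding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PAD = -100  # CrossEntropyLoss skips targets equal to ignore_index (default -100)

vocab, hidden_size = 10, 16
embed = nn.Embedding(vocab, 8, padding_idx=0)  # token 0 is the (hypothetical) input padding
rnn = nn.RNN(8, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, vocab)
params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)
criterion = nn.CrossEntropyLoss(ignore_index=PAD)

# Two padded input sequences and their per-step targets;
# targets at padded positions are PAD so they add nothing to the loss.
inputs = torch.tensor([[1, 2, 3, 0], [4, 5, 0, 0]])
targets = torch.tensor([[2, 3, 4, PAD], [5, 6, PAD, PAD]])

output, _ = rnn(embed(inputs))  # (batch, seq_len, hidden_size)
logits = head(output)           # (batch, seq_len, vocab)
loss = criterion(logits.reshape(-1, vocab), targets.reshape(-1))

optimizer.zero_grad()
loss.backward()
optimizer.step()  # nothing padding-specific is needed in the optimizer
```

Because the RNN is unidirectional and the padding comes after the real tokens, the outputs at real positions are not affected by the padding.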
Would it work to just use padded sequences?
If I understand correctly, I first pad the sequences and then use the corresponding output (e.g. if the input is [1,2,0,0] and the output is [0,1,2,2], I use the second output, “1”, to calculate the loss). Then I don’t need any other operations in the RNN layer?
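Picking the “corresponding output” for a whole padded batch can be done with one indexing operation, assuming you keep the true length of every sequence. In this sketch the first row is the [0,1,2,2] example above; the second row and the lengths tensor are hypothetical.

```python
import torch

# Padded RNN outputs, shape (batch, max_len). A real RNN output has a trailing
# hidden dimension; the same indexing then returns (batch, hidden_size).
output = torch.tensor([[0., 1., 2., 2.],   # the example: input [1, 2, 0, 0]
                       [5., 6., 7., 8.]])  # hypothetical second sequence, no padding
lengths = torch.tensor([2, 4])             # true lengths before padding

# For every sequence, take the output at its last real time step (length - 1).
last = output[torch.arange(output.size(0)), lengths - 1]
print(last)  # tensor([1., 8.])
```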
I am actually stuck on a similar problem. I am trying to do speech recognition with an attention mechanism. I have built the boilerplate code (the model) using the seq-to-seq tutorial and have preprocessed my speech data. The problem is that for each item x in my dataset I have the following pair: <frames of x, of size (anything, 13)>, <transcription of x>.
The number of frames differs from item to item: some are (256, 13), others (134, 13), you get the idea. How do I pad them to the same length so I can train on the GPU? Also, where should the padding happen: in my DataLoader class, or before I create the DataLoader? Thanks
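One common option is to pad inside the DataLoader with a custom collate_fn. This is a minimal sketch, assuming each item is a (frames, transcript) pair with frames of shape (T, 13) and the transcript already converted to a tensor of token ids; the SpeechDataset class, the collate function and the toy data are hypothetical. Padding per batch means each batch is only padded to its own longest item rather than to the longest item in the whole dataset, and the returned lengths can be used later for masking or packing.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

class SpeechDataset(Dataset):
    """Hypothetical dataset: each item is (frames of shape (T, 13), token ids)."""
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]

def collate(batch):
    frames, transcripts = zip(*batch)
    frame_lengths = torch.tensor([f.size(0) for f in frames])
    target_lengths = torch.tensor([t.size(0) for t in transcripts])
    # Zero-pad every (T, 13) tensor up to the longest T in this batch.
    padded_frames = pad_sequence(list(frames), batch_first=True)  # (B, T_max, 13)
    # 0 is assumed to be a free padding token id for the transcripts.
    padded_targets = pad_sequence(list(transcripts), batch_first=True, padding_value=0)
    return padded_frames, frame_lengths, padded_targets, target_lengths

torch.manual_seed(0)
items = [(torch.randn(256, 13), torch.tensor([3, 1, 4])),
         (torch.randn(134, 13), torch.tensor([1, 5]))]
loader = DataLoader(SpeechDataset(items), batch_size=2, collate_fn=collate)
frames, frame_lengths, targets, target_lengths = next(iter(loader))
print(frames.shape, frame_lengths.tolist())  # torch.Size([2, 256, 13]) [256, 134]
```

In the training loop the padded batch can then be moved to the GPU with .to("cuda"). If 0 is a real token id in your vocabulary, pick a different padding_value for the transcripts.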
I read PyTorch’s RNN code and found that there are two implementations, one for the CPU and one for the GPU.
In pytorch/nn/_functions/rnn.py, the batch_sizes parameter is used in VariableRecurrent, which runs on the CPU, while CudnnRNN, which runs on the GPU, has no batch_sizes parameter, so maybe dynamic batching is not supported on the GPU. I’m not 100% sure about it.
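For reference, a small sketch of where batch_sizes comes from at the user level, assuming a reasonably recent PyTorch (the tensor sizes are arbitrary). torch.nn.utils.rnn.pack_padded_sequence turns a padded batch plus its lengths into a PackedSequence whose batch_sizes attribute counts the sequences still active at each time step. In recent releases nn.LSTM accepts a PackedSequence on the CPU and, through cuDNN, on the GPU; the snippet uses the GPU when one is available, so it is easy to check on your installed version.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

padded = torch.randn(3, 4, 5, device=device)  # (batch, max_len, features)
lengths = torch.tensor([4, 2, 1])             # sorted, longest first; stays on the CPU

packed = pack_padded_sequence(padded, lengths, batch_first=True)
# batch_sizes[t] = number of sequences still active at time step t.
print(packed.batch_sizes)  # tensor([3, 2, 1, 1])

lstm = nn.LSTM(5, 8, batch_first=True).to(device)
packed_out, (h_n, c_n) = lstm(packed)
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(output.shape, out_lengths.tolist())  # torch.Size([3, 4, 8]) [4, 2, 1]
```

With packed input, h_n already holds the hidden state at each sequence’s last real time step, so there is no need to pick the right output by hand.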
I don’t really understand VariableRecurrent’s logic flow; I think it uses the corresponding output to calculate the loss.
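My reading of the idea, written as a simplified sketch rather than a copy of PyTorch’s VariableRecurrent (the function name variable_recurrent and the use of nn.RNNCell are my own). The packed input is processed one time step at a time, only the first batch_sizes[t] rows take part in step t, and whenever the batch shrinks, the hidden states of the sequences that just ended are set aside as their final states. No loss is computed inside; it only returns the per-step outputs and the final hidden states.

```python
import torch
import torch.nn as nn

def variable_recurrent(cell, packed_data, batch_sizes, h0):
    """Simplified sketch of the batch_sizes idea (not PyTorch's actual code).

    packed_data: (sum(batch_sizes), input_size), time steps concatenated
    batch_sizes: active sequences per time step, non-increasing
    h0: (batch, hidden_size), sequences sorted by decreasing length
    """
    hidden = h0
    outputs, finished = [], []
    offset = 0
    for bs in batch_sizes:
        step_input = packed_data[offset:offset + bs]
        offset += bs
        if bs < hidden.size(0):
            # The last (hidden.size(0) - bs) sequences have ended: keep their
            # final hidden states and drop them from the running batch.
            finished.append(hidden[bs:])
            hidden = hidden[:bs]
        hidden = cell(step_input, hidden)
        outputs.append(hidden)
    finished.append(hidden)
    # Finished chunks were collected shortest-first; reverse to restore batch order.
    h_n = torch.cat(finished[::-1], dim=0)
    return torch.cat(outputs, dim=0), h_n

torch.manual_seed(0)
cell = nn.RNNCell(5, 8)
lengths = torch.tensor([4, 2, 1])
padded = torch.randn(3, 4, 5)
packed = nn.utils.rnn.pack_padded_sequence(padded, lengths, batch_first=True)
out_data, h_n = variable_recurrent(cell, packed.data, packed.batch_sizes.tolist(),
                                   torch.zeros(3, 8))
print(out_data.shape, h_n.shape)  # torch.Size([7, 8]) torch.Size([3, 8])
```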
For now I’m going to pad the sequences to the maximum length and use the right output to compute the loss (for shorter sequences, the relevant output is not the last one).
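A quick sanity check for this plan, assuming a unidirectional nn.RNN and zero padding at the end (sizes and data are made up). The output at a short sequence’s last real step in the padded batch should match what the RNN produces for that sequence on its own, because the padding only comes after that step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=3, hidden_size=6, batch_first=True)

short_seq = torch.randn(2, 3)  # hypothetical length-2 sequence
long_seq = torch.randn(5, 3)   # hypothetical length-5 sequence
lengths = torch.tensor([2, 5])

# Zero-pad both sequences to the maximum length.
padded = torch.zeros(2, int(lengths.max()), 3)
padded[0, :2] = short_seq
padded[1] = long_seq

output, _ = rnn(padded)                         # (2, max_len, 6)
last = output[torch.arange(2), lengths - 1]     # output at each last real step

# Run the short sequence alone and compare its final output.
alone, _ = rnn(short_seq.unsqueeze(0))
print(torch.allclose(last[0], alone[0, -1], atol=1e-6))  # True
```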
Please tell me if you have any new answers! Thanks!