How to use pack_padded_sequence correctly? How to compute the loss?

@marco_zaror I don’t think I can really help here, since it requires insight into the internals. I’m just an occasional user for my research work.

The problem is that you define your own RNN but use the PyTorch data structure PackedSequence, which is arguably designed to work with nn.LSTM and nn.GRU. Sure, in principle it should be possible to use it in a custom fashion, but I have no idea how. The question is also whether it’s worth the effort to reinvent the wheel. I understand, of course, that you’re (partly) doing this for education/understanding.
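For comparison, this is roughly how PackedSequence is meant to be used with the built-in nn.LSTM, including a loss that skips the padded positions. It’s only a minimal sketch: the layer sizes, the toy batch, and the choice of index 0 as padding for both tokens and targets are my assumptions, not something taken from your code.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

PAD_IDX = 0  # assumption: index 0 marks padding in both inputs and targets
vocab_size, embed_dim, hidden_dim, num_tags = 20, 8, 16, 5

embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=PAD_IDX)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
classifier = nn.Linear(hidden_dim, num_tags)
# ignore_index makes the loss skip every position whose target is PAD_IDX
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

# Toy batch: three sequences of true lengths 4, 2 and 3, padded to length 4
tokens = torch.tensor([[3, 7, 2, 9],
                       [5, 1, 0, 0],
                       [4, 6, 8, 0]])
targets = torch.tensor([[1, 2, 3, 4],
                        [2, 1, 0, 0],
                        [3, 3, 1, 0]])
lengths = torch.tensor([4, 2, 3])  # lengths must live on the CPU

# enforce_sorted=False lets PyTorch sort by length internally and undo it later
packed = pack_padded_sequence(embedding(tokens), lengths,
                              batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)

# Back to a padded tensor in the original batch order; padded steps are zeros
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True,
                                       total_length=tokens.size(1))

logits = classifier(out)  # shape: (batch, seq_len, num_tags)
loss = criterion(logits.reshape(-1, num_tags), targets.reshape(-1))
print(out.shape, out_lengths.tolist(), loss.item())
```

The packing only affects what the LSTM sees; the loss handles padding separately through ignore_index, so the two pieces can be checked independently.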

To be honest, I would ignore that issue. Just use the BucketIterator, which creates batches where all sequences within a batch have the same or at least very similar length. Even if there’s padding, it’s minimal, so it arguably won’t have any noticeable negative effect. Or enforce batches that contain only sequences of equal length; see this thread.
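To make the two options concrete, here is a rough plain-Python sketch of the idea. It is not the BucketIterator implementation; the helper names bucket_batches and equal_length_batches and the toy data are made up for illustration.

```python
import random
from collections import defaultdict


def bucket_batches(sequences, batch_size, seed=0):
    """Batches of indices whose sequences have similar lengths."""
    # Sort indices by sequence length, then cut into consecutive chunks
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # Shuffle the batches (not their contents) so training order isn't by length
    random.Random(seed).shuffle(batches)
    return batches


def equal_length_batches(sequences, batch_size):
    """Batches of indices whose sequences all have exactly the same length."""
    by_length = defaultdict(list)
    for i, seq in enumerate(sequences):
        by_length[len(seq)].append(i)
    batches = []
    for indices in by_length.values():
        for i in range(0, len(indices), batch_size):
            batches.append(indices[i:i + batch_size])
    return batches


# Hypothetical toy data: eight sequences with lengths 5, 2, 9, 3, 5, 2, 8, 3
data = [[1] * n for n in (5, 2, 9, 3, 5, 2, 8, 3)]

similar = bucket_batches(data, batch_size=2)
exact = equal_length_batches(data, batch_size=2)

for batch in similar:
    print("similar:", [len(data[i]) for i in batch])
for batch in exact:
    print("equal:  ", [len(data[i]) for i in batch])
```

With the equal-length variant no padding is needed at all, at the cost of some smaller batches when a length is rare.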