Do we have to define maskLoss when dealing with variable-length sequences?

Hello, friends of the forum. :blush:

In the PyTorch chatbot tutorial, pad_packed_sequence is applied after the variable-length sequences have been processed with pack_padded_sequence, and finally a maskLoss function is defined to compute the loss only on the real (non-padded) positions.
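The idea behind such a masked loss is roughly the following (a minimal sketch, not the tutorial's actual code; mask_loss and all shapes here are my own for illustration):

```python
import torch

def mask_loss(logits, targets, mask):
    """Average negative log-likelihood over real (non-padded) positions only.

    logits:  (seq_len, batch, vocab) raw decoder outputs
    targets: (seq_len, batch) token indices
    mask:    (seq_len, batch) bool, True for real tokens, False at padding
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    # Pick the log-probability assigned to each target token.
    picked = log_probs.gather(2, targets.unsqueeze(2)).squeeze(2)
    # Average the loss only over the positions the mask marks as real.
    return -picked[mask].mean()

# Example: batch of 2 sequences, max length 4, vocab of 5
logits = torch.randn(4, 2, 5)
targets = torch.randint(0, 5, (4, 2))
mask = torch.tensor([[1, 1], [1, 1], [1, 0], [0, 0]], dtype=torch.bool)
print(mask_loss(logits, targets, mask))
```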

But I have seen a different approach in other books: pack_padded_sequence is applied to the variable-length sequences, there is no pad_packed_sequence afterwards, and the loss is computed directly with the official CrossEntropyLoss API.
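As far as I can tell, this works because a packed sequence's .data tensor already has the padding stripped out, so the targets can be packed the same way and both flat tensors fed straight to CrossEntropyLoss. A minimal, self-contained sketch of that pattern (all sizes and names are made up for illustration):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

batch, max_len, feat, hidden, vocab = 3, 5, 8, 16, 10
inputs = torch.randn(batch, max_len, feat)           # padded inputs
targets = torch.randint(0, vocab, (batch, max_len))  # padded targets
lengths = torch.tensor([5, 3, 2])                    # real lengths, sorted descending

rnn = nn.GRU(feat, hidden, batch_first=True)
proj = nn.Linear(hidden, vocab)

packed = pack_padded_sequence(inputs, lengths, batch_first=True)
packed_out, _ = rnn(packed)  # packed_out.data: (sum(lengths), hidden)

# Pack the targets with the same lengths so they line up with packed_out.data,
# then compute the loss on the flat tensors -- the padding is already gone,
# so no mask is needed.
packed_targets = pack_padded_sequence(targets, lengths, batch_first=True)
loss = nn.CrossEntropyLoss()(proj(packed_out.data), packed_targets.data)
print(loss)
```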

I am very confused. Who can help me? ~

Hello, I am not sure myself, but from what I understand pad_packed_sequence is not needed if you only want to compute the loss, since you want to compute it on the real values and not on the padded ones.
On the other hand, if you want to look at the output, pad_packed_sequence will put the padding values back and make it easier to see what is going on.
But (and if I am correct) if you use pad_packed_sequence in the forward pass, you then need to filter out the padded values before computing the loss, which seems a bit of a waste to me (you add padding back with pad_packed_sequence and then filter it out again), as the sketch below shows. If you do not pad back, you can compute the loss directly.
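To make that concrete, here is roughly what the unpack-then-filter path looks like (again just a sketch with made-up shapes; building a boolean mask from the lengths is one way to do the filtering):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

batch, max_len, feat, hidden, vocab = 3, 5, 8, 16, 10
inputs = torch.randn(batch, max_len, feat)
targets = torch.randint(0, vocab, (batch, max_len))
lengths = torch.tensor([5, 3, 2])

rnn = nn.GRU(feat, hidden, batch_first=True)
proj = nn.Linear(hidden, vocab)

packed_out, _ = rnn(pack_padded_sequence(inputs, lengths, batch_first=True))

# Pad the output back (handy for inspecting or decoding it)...
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)  # (batch, max_len, hidden)

# ...but now the padded positions must be filtered out again before the loss.
mask = torch.arange(out.size(1)).unsqueeze(0) < out_lengths.unsqueeze(1)  # (batch, max_len) bool
logits = proj(out)
loss = nn.CrossEntropyLoss()(logits[mask], targets[mask])
print(loss)
```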
Again, this is only as far as I understood it, so please correct me if I am wrong.
Thanks a lot