In seq2seq, which loss should I choose when using pack_padded_sequence

Hi, I am confused about which loss I should use when I apply pack_padded_sequence in the encoder RNN. Do I have to define a mask and multiply it with the loss at each time step of the decoder, or does PyTorch support this without an additional mask?


Packed sequences are used so that the padded values do not influence the hidden states of a recurrent layer. If you then want to compute a loss, you have to unpack the packed sequence (pad_packed_sequence), which gives you back the padded sequences along with their actual lengths. Therefore, you should apply a masked loss to the padded sequences, since otherwise the padded positions would contribute to the loss.
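A minimal sketch of what a masked loss can look like, assuming padded targets of shape (batch, max_len) and an assumed padding index of 0: you can either pass ignore_index to nn.CrossEntropyLoss, or build the mask explicitly; both give the same average over non-padded tokens.

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # assumed padding token index

batch, max_len, vocab = 2, 5, 10
logits = torch.randn(batch, max_len, vocab)   # stand-in for decoder outputs
targets = torch.tensor([[4, 2, 7, 0, 0],      # sequences padded with PAD_IDX
                        [1, 3, 0, 0, 0]])

# Option 1: let CrossEntropyLoss skip padded positions directly.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
loss = criterion(logits.view(-1, vocab), targets.view(-1))

# Option 2: explicit mask; equivalent result, and useful when you need
# per-token losses (e.g. for reweighting).
token_loss = nn.functional.cross_entropy(
    logits.view(-1, vocab), targets.view(-1), reduction="none")
mask = (targets.view(-1) != PAD_IDX).float()
masked_loss = (token_loss * mask).sum() / mask.sum()

print(torch.allclose(loss, masked_loss))  # True
```

With ignore_index, the default 'mean' reduction averages only over the non-ignored targets, which is why the two options agree.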

Thanks for your response! For others working on seq2seq: you should use a masked loss.