In seq2seq, which loss should I choose when using pack_padded_sequence

ShilinHE · January 25, 2018, 3:59am

Hi, I am confused about which loss I should use when I apply pack_padded_sequence in the encoder RNN. Do I have to define a mask and multiply it with the loss at each time step of the decoder? Or does PyTorch handle this without an explicit mask?

Thanks!

stefanonardo · January 25, 2018, 2:42pm

Packed sequences keep the padded values from influencing the hidden states of a recurrent layer. To compute a loss, you then have to unpack them with pad_packed_sequence, which gives you back the padded sequences together with their actual lengths. So if the padded positions would otherwise contribute to the loss, you should apply a masked loss to the padded output.
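
A minimal sketch of that pack, unpack, and mask flow, in case it helps. The toy shapes, the GRU layer, and the squared-error loss are all assumptions made for illustration, not something from your model:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Toy batch (assumed shapes): 3 sequences padded to length 4,
# already sorted by decreasing length.
batch_size, max_len, input_size, hidden_size = 3, 4, 5, 6
lengths = torch.tensor([4, 2, 1])
inputs = torch.randn(batch_size, max_len, input_size)
targets = torch.randn(batch_size, max_len, hidden_size)

rnn = nn.GRU(input_size, hidden_size, batch_first=True)

# Pack so the recurrent layer never steps over padded positions.
packed = pack_padded_sequence(inputs, lengths, batch_first=True)
packed_out, _ = rnn(packed)

# Unpack: padded positions come back as zeros, together with the lengths.
outputs, out_lengths = pad_packed_sequence(
    packed_out, batch_first=True, total_length=max_len
)

# mask[b, t] is True for real time steps and False for padding.
mask = torch.arange(max_len).unsqueeze(0) < out_lengths.unsqueeze(1)

# Per-step loss, zeroed on padding and averaged over real steps only.
per_step = ((outputs - targets) ** 2).mean(dim=-1)  # (batch, max_len)
masked_loss = (per_step * mask.float()).sum() / mask.sum()
```

Without the mask, the zero-filled padded outputs would still be compared against the targets and would change the average.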

ShilinHE · January 29, 2018, 2:33pm

Thanks for your response! For others working on seq2seq: we should use a masked loss.
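
For the decoder side with a cross-entropy loss, here is a rough sketch of two ways to do the masking. PAD_IDX, the vocabulary size, and the random logits are hypothetical stand-ins chosen for the example. As far as I can tell, an explicit mask and the ignore_index argument of the cross-entropy loss give the same value when the padded targets all use the padding id:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

PAD_IDX = 0      # hypothetical padding token id
vocab_size = 10  # hypothetical vocabulary size

# Decoder targets for 2 sequences, padded with PAD_IDX to length 4.
targets = torch.tensor([[3, 7, 2, 5],
                        [4, 1, PAD_IDX, PAD_IDX]])
logits = torch.randn(2, 4, vocab_size)  # stand-in for decoder outputs

flat_logits = logits.reshape(-1, vocab_size)
flat_targets = targets.reshape(-1)

# Option 1: explicit mask over the per-token losses.
per_token = F.cross_entropy(flat_logits, flat_targets, reduction="none")
mask = (flat_targets != PAD_IDX).float()
loss_masked = (per_token * mask).sum() / mask.sum()

# Option 2: let the loss skip padded targets via ignore_index.
loss_ignore = F.cross_entropy(flat_logits, flat_targets, ignore_index=PAD_IDX)
```

Both average over the 6 real tokens only. The explicit mask is the more general option, since it also works for losses that have no ignore_index argument.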