Any suggestions, please?
I am running into a problem when feeding batches of variable-length sequences to the decoder.
I can handle the encoder side with packing, but packing requires the sequences to be sorted in non-increasing length order. Once I sort the batch by source length for the encoder, the target lengths on the decoder side are no longer in that order, so I can't pack the decoder the same way. Is there any efficient way to batch the decoder too?
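One common workaround (a sketch, not necessarily the only option): skip packing on the decoder side entirely, pad the target sequences to the batch maximum, and let the loss ignore the padded positions via `ignore_index`. The tensors and pad index below are made up for illustration; the logits stand in for whatever your decoder actually produces.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

# Hypothetical toy targets of different lengths; 0 is the pad index here.
PAD = 0
targets = [torch.tensor([5, 3, 9]), torch.tensor([7, 2])]

# Pad decoder-side sequences to the batch max length. No sorting is needed,
# because we mask the loss instead of packing the decoder.
padded = pad_sequence(targets, batch_first=True, padding_value=PAD)  # shape (2, 3)

# Fake decoder logits for illustration: (batch, seq_len, vocab_size).
vocab_size = 10
logits = torch.randn(2, 3, vocab_size)

# CrossEntropyLoss with ignore_index skips padded positions, so sequences
# of unequal length can share one batch without packing the decoder.
criterion = nn.CrossEntropyLoss(ignore_index=PAD)
loss = criterion(logits.reshape(-1, vocab_size), padded.reshape(-1))
```

The cost is a few wasted forward-pass computations on padded steps, but the gradients from those positions are excluded, which is usually an acceptable trade for keeping the batch together.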
I apply softmax in two places:
1. Before computing the context vector: the encoder outputs go through a linear layer and then a softmax. I did not mask anything here, since the model should learn to ignore padding, and the zero-padded positions come out of the encoder as zero vectors anyway (because I used packing).
2. Before computing the loss: the decoder output goes through a linear layer and then a softmax; here I do apply masking, and I use CrossEntropyLoss.
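Two details worth checking in this setup. First, `nn.CrossEntropyLoss` already applies log-softmax internally, so it expects raw logits; feeding it softmax outputs applies the normalization twice. Second, even though packed encoder outputs are zero vectors at padded positions, the attention scores for those positions are generally not zero after the linear layer (a linear layer has a bias term), so masking the scores before the softmax is usually safer than hoping the model learns it. A minimal sketch with made-up scores and lengths:

```python
import torch
import torch.nn.functional as F

# Toy attention scores: batch of 2, max source length 4.
scores = torch.tensor([[1.0, 2.0, 0.5, 0.3],
                       [0.2, 1.5, 0.0, 0.0]])
# True source lengths: the second sequence has 2 real tokens and 2 pads.
lengths = torch.tensor([4, 2])

# Build a boolean mask of valid positions and set padded positions to -inf
# before the softmax, so they receive exactly zero attention weight.
mask = torch.arange(scores.size(1)) < lengths.unsqueeze(1)  # (2, 4) bool
masked_scores = scores.masked_fill(~mask, float('-inf'))
attn = F.softmax(masked_scores, dim=1)  # rows still sum to 1
```

For the loss side, the same idea applies: pass the pre-softmax decoder outputs straight into `CrossEntropyLoss` and let `ignore_index` (or an explicit mask) handle the padded targets.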