Can we batch inputs to the decoder in a seq2seq model?

Any suggestions, please?
I am having trouble feeding batched data to the decoder when the sequences have variable lengths.
I can handle the encoder side using packing. The problem occurs in the following situation:

encoder([[4 5 6] [4 5 6] [4 5 6]])---------decoder([[7 7 7]])
encoder([[1 2 3] [1 2 3]])-----------------decoder([[3 3 3] [3 3 3] [3 3 3]])
encoder([[9 9 9]])-------------------------decoder([[2 2 2]])

Now I can batch the encoder side using packing, but then I can’t batch the decoder side, since its sequence lengths are not in decreasing order once the batch is sorted for the encoder. Is there an efficient way to do this?

Have you tried padding and masking?
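
For anyone landing here later, this is roughly what padding and masking the decoder side could look like in PyTorch. It is only a minimal sketch: it reuses the three decoder sequences from the question (lengths 1, 3, 1) and assumes a padding value of 0.0; the variable names are made up for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Decoder sequences from the question: lengths 1, 3 and 1, feature size 3.
decoder_inputs = [
    torch.tensor([[7.0, 7.0, 7.0]]),
    torch.tensor([[3.0, 3.0, 3.0], [3.0, 3.0, 3.0], [3.0, 3.0, 3.0]]),
    torch.tensor([[2.0, 2.0, 2.0]]),
]

# True number of time steps in each sequence.
lengths = torch.tensor([seq.size(0) for seq in decoder_inputs])

# Pad every sequence to the longest one; the order of the batch is untouched,
# so it can stay aligned with whatever order the encoder batch uses.
# padding_value=0.0 is an assumption made for this sketch.
padded = pad_sequence(decoder_inputs, batch_first=True, padding_value=0.0)

# Boolean mask of shape (batch, max_len): True for real steps, False for padding.
steps = torch.arange(padded.size(1)).unsqueeze(0)
mask = steps < lengths.unsqueeze(1)

print(padded.shape)
print(mask)
```

The mask can then be used to ignore padded steps when computing the loss. If you would rather pack the decoder side as well, recent PyTorch versions accept `enforce_sorted=False` in `pack_padded_sequence`, so the batch does not have to be sorted by length.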

Yeah, you’re right. I think this will work. Thanks, I will try it.

I finished it. I did it with an attention layer and it worked perfectly. I tested it on language translation.

Just out of curiosity, did you also apply masking for the softmax of your attention?
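
To make the question concrete, here is a rough sketch of what masking the attention softmax usually means: padded source positions get a score of minus infinity before the softmax, so they receive exactly zero weight. The function name `masked_attention_weights` and the tensor shapes are assumptions for illustration, not taken from anyone's actual model.

```python
import torch

def masked_attention_weights(scores, src_lengths):
    """Softmax over source positions that ignores padded positions.

    scores:      (batch, src_len) raw attention scores
    src_lengths: (batch,) true length of each source sequence (each >= 1,
                 otherwise a whole row would be -inf and produce NaN)
    """
    positions = torch.arange(scores.size(1), device=scores.device).unsqueeze(0)
    pad_mask = positions >= src_lengths.unsqueeze(1)  # True where padded
    scores = scores.masked_fill(pad_mask, float("-inf"))
    return torch.softmax(scores, dim=-1)

# Toy batch: identical scores everywhere, source lengths 3, 2 and 1.
scores = torch.zeros(3, 3)
src_lengths = torch.tensor([3, 2, 1])
weights = masked_attention_weights(scores, src_lengths)
print(weights)
```

Each row still sums to 1, but only over the real (non-padded) positions.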

I applied softmax in two places. 1: Before calculating the context vector, the encoder outputs go through a linear layer and then a softmax. I did not mask anything here, since the model will learn it; also, zero-padded words are output as zero vectors by the encoder (since I used packing). 2: Before calculating the loss, the decoder output goes through a linear layer and then a softmax. I used masking here, together with CrossEntropyLoss.
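
For reference, a minimal sketch of masking the loss with `CrossEntropyLoss`. It assumes the padding token has index 0 and uses random scores in place of real decoder outputs; all names are illustrative. Note that `nn.CrossEntropyLoss` applies log-softmax internally, so it expects the raw output of the linear layer rather than softmax probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

PAD = 0          # assumed padding index; real tokens must not use it
vocab_size = 5   # toy vocabulary size for this sketch
torch.manual_seed(0)

# Stand-in for decoder outputs after the linear layer: (batch, tgt_len, vocab).
logits = torch.randn(2, 3, vocab_size)
# Second target has length 1 and is padded to length 3.
targets = torch.tensor([[1, 2, 3], [4, PAD, PAD]])

flat_logits = logits.reshape(-1, vocab_size)
flat_targets = targets.reshape(-1)

# Option 1: let the loss skip padded positions via ignore_index.
criterion = nn.CrossEntropyLoss(ignore_index=PAD)
loss = criterion(flat_logits, flat_targets)

# Option 2: the same thing by hand, with an explicit mask.
per_token = F.cross_entropy(flat_logits, flat_targets, reduction="none")
mask = (flat_targets != PAD).float()
manual_loss = (per_token * mask).sum() / mask.sum()

print(loss.item(), manual_loss.item())
```

Both options average the loss over the real target tokens only, so padded positions contribute nothing to the gradient.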