Any suggestions, please?
I am running into a problem when feeding batches of variable-length sequences to the decoder.
I can handle the encoder side with packing, but packing requires the sequences to be sorted in non-increasing length order. Once I sort the batch by source length for the encoder, the target lengths on the decoder side are no longer in that order, so I can't pack the decoder the same way. Is there any efficient way to batch the decoder too?
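One common workaround (a sketch, not necessarily the only option): skip packing on the decoder side entirely, pad the target sequences to the batch maximum, and let the loss ignore the padded positions via `ignore_index`. The tensors and pad index below are made up for illustration; the logits stand in for whatever your decoder actually produces.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

# Hypothetical toy targets of different lengths; 0 is the pad index here.
PAD = 0
targets = [torch.tensor([5, 3, 9]), torch.tensor([7, 2])]

# Pad decoder-side sequences to the batch max length. No sorting is needed,
# because we mask the loss instead of packing the decoder.
padded = pad_sequence(targets, batch_first=True, padding_value=PAD)  # shape (2, 3)

# Fake decoder logits for illustration: (batch, seq_len, vocab_size).
vocab_size = 10
logits = torch.randn(2, 3, vocab_size)

# CrossEntropyLoss with ignore_index skips padded positions, so sequences
# of unequal length can share one batch without packing the decoder.
criterion = nn.CrossEntropyLoss(ignore_index=PAD)
loss = criterion(logits.reshape(-1, vocab_size), padded.reshape(-1))
```

The cost is a few wasted forward-pass computations on padded steps, but the gradients from those positions are excluded, which is usually an acceptable trade for keeping the batch together.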
I apply softmax in two places:
1. Before computing the context vector: the encoder outputs go through a linear layer and then a softmax. I did not mask anything here, since the model should learn to ignore padding, and the zero-padded positions come out of the encoder as zero vectors anyway (because I used packing).
2. Before computing the loss: the decoder output goes through a linear layer and then a softmax; here I do apply masking, and I use CrossEntropyLoss.
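Two details worth checking in this setup. First, `nn.CrossEntropyLoss` already applies log-softmax internally, so it expects raw logits; feeding it softmax outputs applies the normalization twice. Second, even though packed encoder outputs are zero vectors at padded positions, the attention scores for those positions are generally not zero after the linear layer (a linear layer has a bias term), so masking the scores before the softmax is usually safer than hoping the model learns it. A minimal sketch with made-up scores and lengths:

```python
import torch
import torch.nn.functional as F

# Toy attention scores: batch of 2, max source length 4.
scores = torch.tensor([[1.0, 2.0, 0.5, 0.3],
                       [0.2, 1.5, 0.0, 0.0]])
# True source lengths: the second sequence has 2 real tokens and 2 pads.
lengths = torch.tensor([4, 2])

# Build a boolean mask of valid positions and set padded positions to -inf
# before the softmax, so they receive exactly zero attention weight.
mask = torch.arange(scores.size(1)) < lengths.unsqueeze(1)  # (2, 4) bool
masked_scores = scores.masked_fill(~mask, float('-inf'))
attn = F.softmax(masked_scores, dim=1)  # rows still sum to 1
```

For the loss side, the same idea applies: pass the pre-softmax decoder outputs straight into `CrossEntropyLoss` and let `ignore_index` (or an explicit mask) handle the padded targets.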