I have built an encoder-decoder architecture for machine translation. During inference, I found that the decoder generates repeated words, producing output like:
tensor([[ 2, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6, 13,
5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4,
6, 13, 5, 4, 6, 13, 5, 12, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6,
13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5,
4, 6, 13, 5, 12, 5, 4, 0],
[ 2, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6, 13,
5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4,
6, 13, 5, 4, 6, 13, 5, 12, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6,
13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5,
4, 6, 13, 5, 4, 6, 13, 0],
[ 2, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6, 13,
5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4,
6, 13, 5, 4, 6, 13, 5, 12, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6,
13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5,
4, 6, 13, 5, 4, 9, 10, 0]], device='cuda:0')
I set the beam size to 3 in this case. The model takes a sequence of characters as input, and the decoder predicts one character at each time step. I don't know why this is happening. Any help would be great!
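To illustrate the symptom, here is a framework-free sketch of the greedy version of my decoding loop (`step_fn`, the token ids, and the cycle are placeholders, not my real model): if the per-step argmax ever enters a cycle, the output repeats in exactly this pattern until the length limit.

```python
# Minimal sketch of a character-level autoregressive decoding loop.
# SOS/EOS ids and max length are illustrative placeholders.
SOS, EOS, MAX_LEN = 2, 0, 20

def greedy_decode(step_fn, max_len=MAX_LEN):
    """Feed the last predicted token back in at each step. If the model's
    argmax transitions form a cycle (e.g. 6 -> 13 -> 5 -> 4 -> 6 -> ...),
    the output repeats until max_len, matching the tensor above."""
    tokens = [SOS]
    for _ in range(max_len):
        nxt = step_fn(tokens)   # stand-in for argmax over the decoder's logits
        tokens.append(nxt)
        if nxt == EOS:          # stop once the end-of-sequence token is emitted
            break
    return tokens

# A toy step function whose argmax cycles, reproducing the repetition:
cycle = {2: 6, 6: 13, 13: 5, 5: 4, 4: 6}
toy_step = lambda toks: cycle[toks[-1]]

print(greedy_decode(toy_step))  # [2, 6, 13, 5, 4, 6, 13, 5, 4, ...] — never reaches EOS
```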