I have built an encoder-decoder architecture for machine translation. During inference, I found that the decoder generates repeated words, producing output like:
tensor([[ 2, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6, 13,
5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4,
6, 13, 5, 4, 6, 13, 5, 12, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6,
13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5,
4, 6, 13, 5, 12, 5, 4, 0],
[ 2, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6, 13,
5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4,
6, 13, 5, 4, 6, 13, 5, 12, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6,
13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5,
4, 6, 13, 5, 4, 6, 13, 0],
[ 2, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6, 13,
5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4,
6, 13, 5, 4, 6, 13, 5, 12, 5, 4, 6, 13, 5, 4, 9, 11, 4, 6,
13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5, 4, 6, 13, 5,
4, 6, 13, 5, 4, 9, 10, 0]], device='cuda:0')
I set the beam size to 3 in this case. The model takes a sequence of characters as input, and the decoder predicts one character at each time step. I don't know why this is happening. Any help would be great!
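To illustrate the symptom, here is a framework-free sketch of the greedy version of my decoding loop (`step_fn`, the token ids, and the cycle are placeholders, not my real model): if the per-step argmax ever enters a cycle, the output repeats in exactly this pattern until the length limit.

```python
# Minimal sketch of a character-level autoregressive decoding loop.
# SOS/EOS ids and max length are illustrative placeholders.
SOS, EOS, MAX_LEN = 2, 0, 20

def greedy_decode(step_fn, max_len=MAX_LEN):
    """Feed the last predicted token back in at each step. If the model's
    argmax transitions form a cycle (e.g. 6 -> 13 -> 5 -> 4 -> 6 -> ...),
    the output repeats until max_len, matching the tensor above."""
    tokens = [SOS]
    for _ in range(max_len):
        nxt = step_fn(tokens)   # stand-in for argmax over the decoder's logits
        tokens.append(nxt)
        if nxt == EOS:          # stop once the end-of-sequence token is emitted
            break
    return tokens

# A toy step function whose argmax cycles, reproducing the repetition:
cycle = {2: 6, 6: 13, 13: 5, 5: 4, 4: 6}
toy_step = lambda toks: cycle[toks[-1]]

print(greedy_decode(toy_step))  # [2, 6, 13, 5, 4, 6, 13, 5, 4, ...] — never reaches EOS
```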