Machine translation transformer predicts repeating output without generating <EOS>

I built and tried to train my own machine translation transformer with the standard encoder-decoder architecture, keeping it as close as possible to the original “Attention Is All You Need” paper.

The problem is that my model outputs the same thing over and over without generating an `<EOS>` token until many timesteps later. The outputs look something like this for an English-to-Spanish translation:

>>> inference("i like to swim", model, dataset, DEV, 50)
'<SOS> me gusta nadar a nadar te gusta nadar a nadar a nadar les gusta nadar a me gusta nadar como me gusta nadar <EOS>'

The correct output should just be ‘me gusta nadar’, which the model does generate, but then it keeps going, repeating earlier outputs again and again.

What could cause behaviour like this? I trained the model for 20 epochs on a dataset of 130,000 translation pairs. Should I just keep training for longer?

For context, the full code can be found here (this is the definition of the inference function). I didn’t include it in the post because it would be very long.
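For clarity, my decoding follows the standard greedy pattern: feed the target prefix back in, append the argmax token, and stop at `<EOS>`. A minimal, simplified sketch (the `toy_model` stand-in and token ids are placeholders, not my actual model):

```python
# Greedy autoregressive decoding sketch (simplified; toy stand-in, not my real model).
SOS, EOS = 1, 2  # placeholder special-token ids

def greedy_decode(model_step, src_ids, max_len=50):
    """model_step(src_ids, prefix) stands in for a transformer forward pass
    plus argmax; it returns the next predicted token id."""
    out = [SOS]
    for _ in range(max_len):
        nxt = model_step(src_ids, out)
        out.append(nxt)
        if nxt == EOS:  # stop as soon as <EOS> is produced
            break
    return out

# Toy stand-in model that emits a fixed "translation" and then <EOS>.
def toy_model(src_ids, prefix, target=(10, 11, 12)):
    i = len(prefix) - 1
    return target[i] if i < len(target) else EOS

print(greedy_decode(toy_model, [5, 6, 7]))  # [1, 10, 11, 12, 2]
```

In my case the loop runs to `max_len` because the model never assigns `<EOS>` the highest probability, so the stopping condition above is never hit.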