seq2seq performance

Is this necessarily a bug in my code? Here is the test performance on the training data:

Hidden size: 100, iterations: 400: perfect performance
Hidden size: 100, iterations: >=700: gets one wrong
Hidden size: 200, iterations: >=700: perfect performance

How many training samples do you have?
One misclassified sample doesn’t sound bad, and your resubstitution error suggests that your model is perfectly able to overfit the training data.
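
In case the term is unfamiliar: resubstitution error is just the error rate measured on the same data the model was trained on. A minimal sketch of the idea, where the sentence pairs and the lookup-table "model" are made up for illustration and are not from your setup:

```python
def resubstitution_error(predict, train_pairs):
    """Fraction of training pairs whose prediction is not an exact match."""
    wrong = sum(1 for source, target in train_pairs if predict(source) != target)
    return wrong / len(train_pairs)


# Hypothetical toy training set (source sentence, reference translation).
train_pairs = [("un", "one"), ("deux", "two"), ("trois", "three"), ("quatre", "four")]

# Stand-in for a trained model: a lookup table that got one entry wrong.
memorised = {"un": "one", "deux": "two", "trois": "three", "quatre": "for"}

error = resubstitution_error(memorised.get, train_pairs)
print(f"resubstitution error: {error:.2f}")
```

With so few samples, a single wrong prediction moves this number a lot, so it is a coarse signal.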

I have 5 examples. My main concern is why increasing the number of iterations caused it to predict a translation incorrectly.

Maybe the learning rate was too high, so your model parameters were thrown out of a local minimum.
It’s common to see some noisy results, especially when using a very small number of samples.
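
To illustrate the learning-rate point, here is a toy sketch using plain gradient descent on f(x) = x**2. This is not your seq2seq model, and the function name and step sizes are chosen only for the demonstration. With a small step the parameter settles into the minimum at 0; with a step that is too large, every update overshoots and the parameter moves further away.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimise f(x) = x**2 (gradient 2*x) with plain gradient descent."""
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x  # standard update: x <- x - lr * f'(x)
    return x


small = gradient_descent(lr=0.1)  # each step multiplies x by 0.8, so x shrinks
large = gradient_descent(lr=1.1)  # each step multiplies x by -1.2, so |x| grows

print(f"lr=0.1 -> |x| = {abs(small):.6f}")
print(f"lr=1.1 -> |x| = {abs(large):.1f}")
```

If something like this is happening in your run, lowering the learning rate, or keeping the parameters from the iteration with the best training loss, would be the first things I would try.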

