How to debug when GRU decoder predicts repeated words at inference

What does an example sequence pair in your batches look like? Since all input sequences and all output sequences in a batch must have the same length, I assume an input-output pair might look like:

  • input: my name is alice PAD PAD PAD
  • output: ich heisse alice EOS PAD PAD PAD PAD

In principle, the network should learn to ignore the PAD tokens on its own.
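In practice, you usually help the network along by masking PAD positions out of the loss so the decoder is never rewarded for predicting PAD. Here is a minimal pure-Python sketch of that idea (the function name, the `PAD` marker, and the toy distributions are all made up for illustration; in a framework like PyTorch you would get the same effect with the loss function's `ignore_index` argument):

```python
import math

PAD = "<pad>"  # assumed padding token in the target vocabulary

def masked_nll(log_probs, targets, pad_token=PAD):
    """Average negative log-likelihood over non-PAD target positions only.

    log_probs: list of dicts mapping token -> log probability, one per position
    targets:   list of gold tokens (may contain PAD at the end)
    """
    total, count = 0.0, 0
    for dist, gold in zip(log_probs, targets):
        if gold == pad_token:
            continue  # PAD positions contribute nothing to the loss
        total += -dist[gold]
        count += 1
    return total / max(count, 1)

# toy example: two real target positions followed by one PAD position
dists = [
    {"ich": math.log(0.7), "du": math.log(0.3)},
    {"heisse": math.log(0.9), "bin": math.log(0.1)},
    {PAD: math.log(1.0)},
]
loss = masked_nll(dists, ["ich", "heisse", PAD])
```

The key point is that the denominator counts only real tokens, so padded positions neither add loss nor dilute the average.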

I always make my life easier here: I simply generate batches where all input sequences and all output sequences within a batch have the same length, so no padding is needed at all. My example pairs then all look like:

  • input: my name is alice
  • output: ich heisse alice EOS

This means that all input sequences in this batch have a length of 4, and all output sequences in this batch also have a length of 4 – both numbers will of course differ between batches. To me, that’s the easiest way to avoid any issues with padding, different sequence lengths, etc.
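A minimal sketch of this bucketing step (the function name and the toy pairs are mine, not from any particular library): group the pairs by their (input length, output length) combination, and treat each group as a batch.

```python
from collections import defaultdict

def bucket_by_length(pairs):
    """Group (source, target) pairs so that every batch shares one
    (input length, output length) shape -- no padding needed."""
    buckets = defaultdict(list)
    for src, tgt in pairs:
        buckets[(len(src), len(tgt))].append((src, tgt))
    return list(buckets.values())

pairs = [
    (["my", "name", "is", "alice"], ["ich", "heisse", "alice", "EOS"]),
    (["i", "like", "tea"], ["ich", "mag", "tee", "EOS"]),
    (["you", "are", "tall", "too"], ["du", "bist", "auch", "gross", "EOS"]),
    (["he", "is", "old"], ["er", "ist", "alt", "EOS"]),
]
batches = bucket_by_length(pairs)  # pairs 2 and 4 land in the same bucket
```

For large datasets you would typically shuffle within each bucket and split big buckets into fixed-size batches, but the grouping idea is the same.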

You can read up in more detail on this idea here and here.