What does an example sequence pair in your batches look like? Since all input sequences within a batch must have the same length, and likewise all output sequences, I assume an input-output pair might look like:
- input:
my name is alice PAD PAD PAD
- output:
ich heisse alice EOS PAD PAD PAD PAD
Technically, the network should learn to basically ignore PAD tokens.
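For illustration, here is a minimal sketch of how such padded batches could be built in plain Python. The helper name `pad_batch` and the literal `"PAD"` string are my own assumptions, not something from your code:

```python
PAD = "PAD"  # assumed padding symbol, matching the example above


def pad_batch(sequences, pad_token=PAD):
    """Right-pad every token list to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_token] * (max_len - len(seq)) for seq in sequences]


inputs = [
    "my name is alice".split(),
    "hello".split(),
]
padded = pad_batch(inputs)
for seq in padded:
    print(" ".join(seq))
```

After this, every sequence in the batch has the length of the longest one, and the shorter ones carry trailing PAD tokens.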
I always make my life easier here: I simply generate batches in which all input sequences have the same length and all output sequences have the same length. So my example pairs all look like:
- input:
my name is alice
- output:
ich heisse alice EOS
This implies that all input sequences in this batch have a length of 4, and all output sequences in this batch also have a length of 4. Both numbers will obviously differ from batch to batch. To me, that’s the easiest way to avoid any issues with padding, differing sequence lengths, etc.
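A rough sketch of how such batches could be generated, assuming the data is a list of (input tokens, output tokens) pairs. The function name `make_equal_length_batches`, the `seed` parameter, and the `"EOS"` string are hypothetical and only there for illustration:

```python
import random
from collections import defaultdict

EOS = "EOS"  # assumed end-of-sequence symbol


def make_equal_length_batches(pairs, batch_size, seed=0):
    """Group (input, output) token lists so that every batch shares one
    input length and one output length; no PAD tokens are needed."""
    buckets = defaultdict(list)
    for src, tgt in pairs:
        tgt = tgt + [EOS]  # append EOS before measuring the output length
        buckets[(len(src), len(tgt))].append((src, tgt))

    rng = random.Random(seed)
    batches = []
    for bucket in buckets.values():
        rng.shuffle(bucket)  # shuffle within each length bucket
        for i in range(0, len(bucket), batch_size):
            batches.append(bucket[i:i + batch_size])
    rng.shuffle(batches)  # mix batches of different lengths
    return batches


pairs = [
    ("my name is alice".split(), "ich heisse alice".split()),
    ("my name is bob".split(), "ich heisse bob".split()),
    ("hello".split(), "hallo".split()),
]
batches = make_equal_length_batches(pairs, batch_size=2)
```

One trade-off of this approach: length combinations that occur rarely in the data end up in batches smaller than `batch_size`.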