Model does not predict <pad> tokens and outputs garbage tokens instead

Hi there,
I am working on an Image2Seq model. My dataset yields (image tensor, padded sequence) pairs with a batch size of 128, so the target has shape (seq_len, 128). During training, the training loss decreases steadily, but the validation loss stops improving after a point. At first I suspected overfitting, but when I inspected the outputs I found that the model does not predict the <pad> tokens correctly: after the sequence ends, it keeps emitting garbage tokens (see the example below). If I strip these garbage tokens, I get a fairly decent BLEU score. Could you advise me on how to proceed?
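The target layout described above (padded sequences stacked into shape (seq_len, batch)) can be sketched in plain Python as follows. This is only an illustration: `pad_time_major` and `PAD` are hypothetical names, and the real pipeline presumably builds tensors rather than nested lists. The returned mask marks which positions are real tokens and which are <pad>; a loss function can be told to skip the <pad> positions (for example via the `ignore_index` argument of PyTorch's `nn.CrossEntropyLoss`), though whether that is the cause here is an assumption.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token string


def pad_time_major(
    seqs: List[List[str]], pad: str = PAD
) -> Tuple[List[List[str]], List[List[bool]]]:
    """Pad token sequences to a common length, laid out as (seq_len, batch).

    Returns the padded grid and a boolean mask of the same shape that is
    True at real tokens and False at <pad> positions.
    """
    max_len = max(len(s) for s in seqs)
    batch = len(seqs)
    grid = [[pad] * batch for _ in range(max_len)]
    mask = [[False] * batch for _ in range(max_len)]
    for b, seq in enumerate(seqs):
        for t, tok in enumerate(seq):
            grid[t][b] = tok      # time step t, batch element b
            mask[t][b] = True     # this position holds a real token
    return grid, mask


# Usage: a toy batch of two sequences of different lengths.
seqs = [["<sos>", "x", "<eos>"], ["<sos>", "y", "z", "<eos>"]]
grid, mask = pad_time_major(seqs)
```

Here `grid` has 4 rows (seq_len) and 2 columns (batch); the shorter sequence is filled with <pad> at its last time step, and `mask` is False exactly there.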

Thank you!

source (ground-truth sequence):

<sos> \rho ( x _ { 0 } ) = S p \int \frac { d ^ { \nu } p } { ( 2 \pi ) ^ { \nu } } \frac { 1 } { \gamma ^ { \mu } p _ { \mu } + M ( \hat { x } + x _ { 0 } ) } <eos> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>

target (model prediction):

<sos> \rho ( x _ { 0 } ) = S p \int \frac { d ^ { \prime } p } { ( 2 \pi ) ^ { \nu } } \frac { 1 } { \gamma ^ { \nu } p _ { \mu } + M ( q + x _ { 0 } ) } <eos> <eos> <eos> <eos> <eos> <eos> <eos> } <eos> <eos> } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } <eos> <eos> } } } <eos> <eos> } } } <eos> <eos> } } } <eos> <eos> } } } <eos> { 2 } } { { } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } } } { 2 } }
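The manual clean-up mentioned in the question (removing the garbage tokens before computing BLEU) amounts to cutting each prediction at its first <eos>. A minimal sketch, assuming predictions are whitespace-separated token strings like the example above; `strip_after_eos` is a hypothetical helper name. Note that this only hides the symptom at evaluation time; the model itself still emits tokens after <eos>.

```python
from typing import List


def strip_after_eos(
    tokens: List[str], sos: str = "<sos>", eos: str = "<eos>", pad: str = "<pad>"
) -> List[str]:
    """Keep only the tokens that come before the first <eos>.

    Everything from the first <eos> onward (repeated <eos>, stray braces,
    <pad>) is dropped, and any <sos>/<pad> tokens before it are skipped.
    """
    out: List[str] = []
    for tok in tokens:
        if tok == eos:
            break  # the sequence ends here; ignore whatever follows
        if tok in (sos, pad):
            continue
        out.append(tok)
    return out


# Usage on a shortened version of the prediction shown above.
pred = "<sos> \\rho ( x _ { 0 } ) <eos> <eos> } <eos> { 2 } }".split()
cleaned = strip_after_eos(pred)
```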

Hi G,

Which Image2Seq model are you using?

Hi @blackbirdbarber,
I am using a 6-layer CNN as the encoder and an LSTM as the decoder (a very simplified, OpenNMT-style model).
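With a CNN encoder and an LSTM decoder, inference is usually a loop that feeds each predicted token back into the decoder. The sketch below shows a greedy loop that stops as soon as <eos> is produced, so nothing is emitted after the sequence ends. The `step` callable is a hypothetical stand-in for one decoder step (it is not OpenNMT's API), the integer token ids are placeholders, and `make_scripted_step` is a toy decoder used only to exercise the loop.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical decoder interface: given the previous token id and a state,
# return the next token id (e.g. the argmax of the LSTM's output logits)
# and the updated state.
StepFn = Callable[[int, Optional[object]], Tuple[int, Optional[object]]]


def greedy_decode(
    step: StepFn, sos_id: int, eos_id: int, max_len: int, state: Optional[object] = None
) -> List[int]:
    """Generate token ids one at a time, stopping at the first <eos>."""
    out: List[int] = []
    prev = sos_id
    for _ in range(max_len):
        prev, state = step(prev, state)
        if prev == eos_id:
            break  # stop here: no tokens are generated after <eos>
        out.append(prev)
    return out


def make_scripted_step(script: List[int]) -> StepFn:
    """Toy decoder that replays a fixed list of token ids, for testing only."""
    def step(prev: int, state: Optional[object]) -> Tuple[int, Optional[object]]:
        i = 0 if state is None else state  # the state is just a position counter
        return script[i], i + 1
    return step


# Usage: id 2 plays the role of <eos>; the ids 9, 9 after it are never emitted.
ids = greedy_decode(make_scripted_step([5, 7, 2, 9, 9]), sos_id=1, eos_id=2, max_len=5)
```

If the decoder never produces <eos>, the loop simply stops after `max_len` steps.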