It might very well be that I am misunderstanding something here, but I am following the official seq2seq tutorial and I am unsure about the following section:
if use_teacher_forcing:
    # Teacher forcing: Feed the target as the next input
    for di in range(target_length):
        decoder_output, decoder_hidden, decoder_attention = decoder(
            decoder_input, decoder_hidden, encoder_outputs)
        loss += criterion(decoder_output, target_tensor[di])
        decoder_input = target_tensor[di]  # Teacher forcing
else:
    # Without teacher forcing: use its own predictions as the next input
    for di in range(target_length):
        decoder_output, decoder_hidden, decoder_attention = decoder(
            decoder_input, decoder_hidden, encoder_outputs)
        topv, topi = decoder_output.topk(1)
        decoder_input = topi.squeeze().detach()  # detach from history as input
        loss += criterion(decoder_output, target_tensor[di])
        if decoder_input.item() == EOS_token:
            break
I assume that all input and output sequences start with the SOS token and end with the EOS token, both on the source and the target side.
As you can see, the SOS token is given as the first input, and then target_length new tokens are generated and used to calculate the loss. It seems to me that this generates one token too many (and includes it in the loss), because the first token, SOS, has already been given. It also means that the loss function compares the wrong indices: I think it should be target_tensor[di + 1], because otherwise the comparison is one token late, since everything is shifted one index forward by starting with SOS.
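To make the mismatch I have in mind concrete, here is a toy sketch (not the tutorial code, and with made-up string tokens instead of tensors) of which (decoder input, loss target) pairs the teacher-forcing loop produces, assuming the target sequence includes SOS as I described above:

```python
SOS, EOS = "<SOS>", "<EOS>"
# Hypothetical target sequence that includes SOS, per my assumption above.
target_tensor = [SOS, "hello", "world", EOS]
target_length = len(target_tensor)

decoder_input = SOS
steps = []
for di in range(target_length):
    # The tutorial compares this step's output against target_tensor[di];
    # here we just record which (input, loss target) pair each step uses.
    steps.append((decoder_input, target_tensor[di]))
    decoder_input = target_tensor[di]  # teacher forcing

for inp, tgt in steps:
    print(inp, "->", tgt)
```

Under this assumption the very first step would be asked to predict SOS from SOS, and every later step's target would trail the input by one position, which is exactly why I would expect target_tensor[di + 1] instead.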
Am I wrong? I wasn’t sure whether this should be posted as an issue on GitHub or here, so I am trying here first.