for di in range(target_length):
decoder_output, decoder_context, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_context, decoder_hidden, encoder_outputs)
loss += criterion(decoder_output, target_variable[di])
decoder_input = target_variable[di]
Does it mean we don’t train the last word in target_variable?