Should the loss of NMT be divided by the time steps?

In the NMT tutorial:

if use_teacher_forcing:
    # Teacher forcing: Feed the target as the next input
    for di in range(target_length):
        decoder_output, decoder_hidden, decoder_attention = decoder(
            decoder_input, decoder_hidden, encoder_outputs)
        loss += criterion(decoder_output, target_variable[di])
        decoder_input = target_variable[di]  # Teacher forcing

the loss is accumulated over every decoding step, so a longer target sentence produces a larger loss simply because it has more terms in the sum. Should the loss therefore be divided by the number of time steps (target_length) to make it comparable across sentence lengths?
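To make the concern concrete, here is a minimal sketch in plain Python (with a made-up constant per-step loss value, not real model outputs) showing that the summed loss scales with sentence length while the per-step average does not:

```python
def summed_loss(step_losses):
    # Tutorial style: accumulate the loss over every decoding step.
    return sum(step_losses)

def averaged_loss(step_losses):
    # Normalized: divide by the number of time steps (target_length).
    return sum(step_losses) / len(step_losses)

# Two hypothetical sentences with identical per-token quality
# (0.5 loss per step) but different lengths.
short = [0.5] * 5    # 5-token target
long = [0.5] * 20    # 20-token target

print(summed_loss(short), summed_loss(long))      # 2.5 10.0 -> length-dependent
print(averaged_loss(short), averaged_loss(long))  # 0.5 0.5  -> comparable
```

Note that dividing only changes the scale of the gradients, not their direction, so whether you normalize before `loss.backward()` mostly affects the effective learning rate per sentence; normalizing is mainly important when you want reported losses to be comparable across sentences of different lengths.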