Updating the loss function over multiple prediction steps drastically slows down training


I am implementing a sequence-to-one LSTM model. At the training step, I build the loss from multiple rollout predictions: the first prediction is fed back in to produce the next one, that prediction produces the one after, and so on. Each application of the model adds one term to the loss.

loss = 0
for i in range(loss_step + 1):          # loss_step == 0 recovers the ordinary one-step loss
    backup = Train_batch[:, 1:, :]      # keep the second-to-last items of the sequence batch
    out = model(Train_batch)            # sequence-to-one model: one output per sequence
    loss = loss + criterion(out, Truth_batch[:, i, :])  # add the loss for this prediction step
    # feed the current prediction back in as the newest item of the batch
    Train_batch[:, :-1, :] = backup.clone()  # shift the window up by one item
    Train_batch[:, -1, :] = out              # append the new prediction at the end
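As a side note, the backup/shift/assign steps can be collapsed into a single concatenation, which also avoids writing in place into `Train_batch` (in PyTorch, in-place assignment into a tensor that participates in the graph can trip autograd, whereas `torch.cat` builds a fresh tensor). This is a minimal sketch of the equivalence, written with NumPy so it runs standalone; the PyTorch version would use `torch.cat` with `dim=1`:

```python
import numpy as np

# stand-in data: (batch, seq_len, features)
batch = np.arange(6, dtype=float).reshape(2, 3, 1)
out = np.array([[100.0], [200.0]])  # stand-in model output, shape (batch, features)

# two-step version from the post: back up, shift, then overwrite the last item
backup = batch[:, 1:, :].copy()
shifted = batch.copy()
shifted[:, :-1, :] = backup   # shift the window up by one item
shifted[:, -1, :] = out       # append the new prediction at the end

# single-call equivalent: drop the oldest item and append the prediction
shifted2 = np.concatenate([batch[:, 1:, :], out[:, None, :]], axis=1)

assert np.array_equal(shifted, shifted2)
```

In PyTorch this would read `Train_batch = torch.cat([Train_batch[:, 1:, :], out.unsqueeze(1)], dim=1)`.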

However, this approach turns out to be really slow. Is that expected, since each extra rollout step lengthens the graph that back-propagation has to traverse? Or is there a trick to speed it up? Thanks for any help!