I am implementing a sequence-to-one LSTM model. At the training step, I build the loss from multiple successive predictions: I use the first prediction to predict the next item, feed that new prediction back in to predict the one after, and so on. Each application of the model adds a term to the loss.
```python
loss = 0
for i in range(loss_step + 1):             # loss_step=0 means the common loss function
    backup = Train_batch[:, 1:, :]         # back up the second-to-final items of the sequence batch
    optimizer.zero_grad()
    out = model(Train_batch)               # apply the sequence-to-one model to get a single output
    loss = loss + criterion(out, Truth_batch[:, i, :])  # add the loss between the new prediction and the truth
    # use the current prediction to update the current batch
    Train_batch[:, :-1, :] = backup.clone()  # "shift" the batch upward by one item
    Train_batch[:, -1, :] = out              # put the new prediction at the bottom
loss.backward()
optimizer.step()
```
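For context, here is a minimal self-contained version of the loop I am describing. The model, sizes, and data are toy placeholders (not my real setup); the only real difference from my code is that this sketch rebuilds the window with `torch.cat` instead of writing into the batch in place:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, seq_len, feat, loss_step = 4, 8, 3, 2  # toy sizes, for illustration only

class SeqToOne(nn.Module):
    """Toy sequence-to-one model: LSTM encoder + linear head on the last step."""
    def __init__(self, feat, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, feat)

    def forward(self, x):
        out, _ = self.lstm(x)            # (batch, seq, hidden)
        return self.head(out[:, -1, :])  # last time step -> (batch, feat)

model = SeqToOne(feat)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

train_batch = torch.randn(batch, seq_len, feat)
truth_batch = torch.randn(batch, loss_step + 1, feat)  # one target per prediction step

optimizer.zero_grad()
loss = 0.0
cur = train_batch
for i in range(loss_step + 1):
    out = model(cur)                                   # single prediction for the window
    loss = loss + criterion(out, truth_batch[:, i, :])
    # shift the window: drop the oldest item, append the new prediction
    cur = torch.cat([cur[:, 1:, :], out.unsqueeze(1)], dim=1)
loss.backward()   # one backward pass through all loss_step+1 forward passes
optimizer.step()
```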
However, this approach turns out to be really slow. Is that expected, because the longer graph makes back-propagation more expensive, or is there a trick to speed it up? Thanks for any help!