Hi,

I am implementing a sequence-to-one LSTM model. At the training step, I try to build the loss function from several rolled-out predictions: the first prediction is fed back in to produce the next prediction, that one is fed back to produce the one after, and so on. Each application of the model adds one term to the loss.

```
loss = 0
optimizer.zero_grad()  # zero the gradients once, before the rollout
for i in range(loss_step + 1):  # loss_step = 0 recovers the usual one-step loss
    out = model(Train_batch)  # sequence-to-one model: one prediction per sequence
    loss = loss + criterion(out, Truth_batch[:, i, :])  # add this step's loss term
    # feed the prediction back in: drop the oldest item and append the new prediction
    # (torch.cat builds a new tensor instead of modifying Train_batch in place,
    # which would break autograd)
    Train_batch = torch.cat([Train_batch[:, 1:, :], out.unsqueeze(1)], dim=1)
loss.backward()
optimizer.step()
```
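A runnable sketch of the whole setup, with toy shapes and a stand-in model (all sizes and the `nn.Sequential` model are hypothetical, just to make the snippet self-contained). It also shows one possible speed trick I have been wondering about: calling `.detach()` on the prediction before feeding it back, so the autograd graph stays one forward pass deep instead of growing with each rollout step. This trades away gradients flowing through the feedback loop, so I am not sure it is acceptable:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes and a toy stand-in for the sequence-to-one model.
batch, seq_len, feat, loss_step = 4, 5, 3, 2
model = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * feat, feat))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Train_batch = torch.randn(batch, seq_len, feat)
Truth_batch = torch.randn(batch, loss_step + 1, feat)

loss = 0
optimizer.zero_grad()
for i in range(loss_step + 1):
    out = model(Train_batch)                       # one prediction per sequence
    loss = loss + criterion(out, Truth_batch[:, i, :])
    # Detach before feeding back: gradients no longer flow across rollout
    # steps, so each step's graph is only one forward pass deep.
    Train_batch = torch.cat(
        [Train_batch[:, 1:, :], out.detach().unsqueeze(1)], dim=1
    )
loss.backward()
optimizer.step()
```

Without the `.detach()`, the graph for step `i` contains all `i + 1` forward passes, so backward has correspondingly more work to do.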

However, this approach turns out to be really slow. Is the slowdown expected, since back-propagation has to run through the computation graph of several chained forward passes? Or is there a trick to speed it up? Thanks for all the help!