"backward through the graph a second time" runtime error in LSTM

I am trying to build an LSTM model that uses its own predicted result as the input to predict results multiple timesteps ahead, and I got the following error: RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

I checked this thread link, and it does not work for my case.

The model is built based on the tutorial TRANSLATION WITH A SEQUENCE TO SEQUENCE NETWORK AND ATTENTION link, with “Teacher forcing: Feed the target as the next input”.

Basically, the pseudocode is the following:

for i in range(epoch):
    for j in range(0, X_train.size(0), batch_size):

        x, y = getXandY(j)
        loss = 0
        output = None

        for k in range(timeframe_to_be_predicted):
            x_input = update_x(output, x, k)
            output = model(x_input)
            loss += F.mse_loss(output, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The main difference between my code and the tutorial's code is that I use minibatches and Adam as the optimizer (the tutorial updates the parameters once per epoch).

As I understand it, once optimizer.step() is called the parameters are updated and a fresh computation graph is built for the next minibatch, so I should not need to set retain_graph=True in loss.backward(). But the RuntimeError still appears, and because of memory limitations loss.backward(retain_graph=True) is not a good option either.
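For reference, one common way this error arises can be reproduced with a minimal sketch (a tiny nn.LSTMCell setup I made up for illustration, not the poster's actual model): when the hidden and cell states are carried across minibatches without being detached, the second backward() walks back into the first, already-freed graph:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=3, hidden_size=3)
opt = torch.optim.Adam(cell.parameters())
h = torch.zeros(1, 3)
c = torch.zeros(1, 3)

err = None
for step in range(2):
    x = torch.randn(1, 3)
    h, c = cell(x, (h, c))   # h, c stay attached to the previous graph
    loss = h.sum()
    opt.zero_grad()
    try:
        # on the second iteration this backprops into the first graph,
        # whose intermediate buffers were freed by the first backward()
        loss.backward()
    except RuntimeError as e:
        err = e
        break
    opt.step()

print(err is not None)  # → True
```

The first iteration succeeds; the second raises exactly the "backward through the graph a second time" error, even though optimizer.step() was called in between.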

I also tried another method that collects the loss of each timestep and backpropagates at the end. The code is the following. It does not work either.

        loss = []
        output = None

        for k in range(timeframe_to_be_predicted):
            x_input = update_x(output, x, k)
            output = model(x_input)
            loss.append(F.mse_loss(output, y))
        for k in range(len(loss)):
            if k == timeframe_to_be_predicted - 1:
                loss[k].backward()
            else:
                loss[k].backward(retain_graph=True)

Do you guys have any suggestions on this issue?

I solved this issue. The problem was in the following function: when we build the next step's input from the current step's output, we must NOT update the tensor in place. Creating a new tensor helps.
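The non-in-place update described above might look like the following sketch. The update_x name comes from the pseudocode earlier in the thread, but its body and the (batch, time, features) layout are my assumptions:

```python
import torch

def update_x(output, x, k):
    """Hypothetical update_x: build the next input as a NEW tensor
    instead of writing the prediction into x in place."""
    if output is None:
        return x
    # Dropping the oldest timestep and appending the prediction via
    # torch.cat allocates a fresh tensor, so no in-place write can
    # corrupt buffers that the old graph still needs.
    return torch.cat([x[:, 1:, :], output.unsqueeze(1)], dim=1)

x = torch.randn(2, 5, 3)    # (batch, time, features)
out = torch.randn(2, 3)     # one predicted timestep
x_next = update_x(out, x, 1)
print(x_next.shape)         # torch.Size([2, 5, 3])
```

An in-place version of the same update (e.g. x[:, -1, :] = output) would mutate a tensor that autograd saved for the backward pass, which is what triggers the error.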

Thanks for your answer. I also ran into this issue. The key is to detach the previous tensor from the graph before feeding its values into the next step. The h and c tensors have already been through autograd once, and if you don’t detach them, the next backward pass will raise an error:

input_tensor = output_tensor.detach()
output_tensor = model(input_data, input_tensor)
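Putting that detach inside the training loop, here is a minimal working sketch (again a tiny nn.LSTMCell with made-up sizes) that trains across steps without retain_graph=True:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(input_size=3, hidden_size=3)
opt = torch.optim.Adam(cell.parameters())
h = torch.zeros(1, 3)
c = torch.zeros(1, 3)

for step in range(3):
    x = torch.randn(1, 3)
    # cut the link to the previous iteration's graph before reuse
    h, c = h.detach(), c.detach()
    h, c = cell(x, (h, c))
    loss = h.sum()
    opt.zero_grad()
    loss.backward()   # no retain_graph needed now
    opt.step()

print("finished without RuntimeError")
```

Each backward pass now stops at the detached h and c, so no iteration ever tries to traverse a graph whose buffers were already freed.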