"backward through the graph a second time" runtime error in LSTM

I am trying to build an LSTM model and feed the predicted result back in as the input, so that I can predict results multiple timesteps ahead. I get the following error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

I checked this thread link, but the solution there does not work for my case.

The model is built based on the tutorial TRANSLATION WITH A SEQUENCE TO SEQUENCE NETWORK AND ATTENTION link, using the "Teacher forcing: Feed the target as the next input" approach.

Basically, the pseudocode is the following:

for i in range(epoch):
    for j in range(0, X_train.size(0), batch_size):

        x, y = getXandY(j)      # fetch the minibatch starting at index j
        optimizer.zero_grad()
        loss = 0
        output = None

        # feed each step's prediction back in as part of the next input
        for k in range(timeframe_to_be_predicted):
            x_input = update_x(output, x, k)
            output = model(x_input)
            loss += F.mse_loss(output, y)

        loss.backward()
        optimizer.step()

The main difference between my code and the tutorial's is that I am using minibatches and Adam as the optimizer (the tutorial code updates the parameters once per epoch).

My understanding is that after optimizer.step() is called, the parameters have been updated and a new computation graph is built for the next minibatch, so I should not need to set retain_graph=True in loss.backward(). But I still get the RuntimeError, and because of memory limitations, loss.backward(retain_graph=True) is not a good option either.
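
To make sure I understand what the error means, here is a generic toy sketch (not my actual code) that produces the same message: a tensor produced in one iteration is fed into the next iteration without detach(), so the second loss still reaches back into the first, already-freed graph, even though optimizer.step() ran in between.

import torch

w = torch.randn(1, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

carry = torch.zeros(1)
for step in range(2):
    opt.zero_grad()
    out = torch.tanh(w + carry)    # on step 1, carry still points into step 0's graph
    loss = (out - 1.0).pow(2).mean()
    loss.backward()                # step 1 raises "Trying to backward through the graph a second time"
    opt.step()
    carry = out                    # carry = out.detach() would avoid the error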

I also tried another approach that collects the loss of each timestep and calls backward on each one separately. The code is below; it does not work either.

        loss = []
        output = None

        for k in range(timeframe_to_be_predicted):
            x_input = update_x(output, x, k)
            output = model(x_input)
            loss.append(F.mse_loss(output, y))

        # backpropagate each timestep's loss separately,
        # keeping the graph alive until the last call
        for k in range(len(loss)):
            if k == timeframe_to_be_predicted - 1:
                loss[k].backward()
            else:
                loss[k].backward(retain_graph=True)

        optimizer.step()

Do you guys have any suggestions on this issue?

I solved this issue. The problem is in the function that updates the next step's input (update_x above): when we update the next step's input using the output of the current step, we must NOT update the tensor in-place. Creating a new tensor instead fixes it.
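
As an illustration, here is a sketch of what the non-in-place version can look like. The shapes, the sliding-window slicing, and the unused k argument are just my assumptions, since the real update_x is not shown:

import torch

def update_x(output, x, k):
    # first step: no prediction yet, use the original input window (k unused here)
    if output is None:
        return x
    # build a NEW tensor: drop the oldest timestep and append the latest prediction,
    # instead of writing it into x with indexing (x[...] = output), which is an
    # in-place operation on a tensor that autograd is tracking
    return torch.cat([x[:, 1:, :], output.unsqueeze(1)], dim=1)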


Thanks for your answer. I also came across this issue. The key is to detach the previous tensor from the graph before passing its values to the next step. The h and c tensors are already part of a graph that has been backpropagated through once, and if you don't detach them, the next backward pass will raise this error:

input_tensor = output_tensor.detach()
output_tensor = model(input_data, input_tensor)
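
The same idea applies to the LSTM hidden state when it is carried across steps or minibatches. A small self-contained sketch with toy sizes and dummy data (my own example, your model's interface may differ):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
h = c = None
for step in range(3):
    x_input = torch.randn(4, 5, 8)           # (batch, seq, features), dummy data
    if h is None:
        output, (h, c) = lstm(x_input)
    else:
        output, (h, c) = lstm(x_input, (h, c))
    loss = output.mean()
    loss.backward()
    h, c = h.detach(), c.detach()             # cut the graph so the next backward() stops here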
