RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time

I took the LSTM model from this site and found that self.hidden_cell is causing the problem.

import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_size=1, hidden_layer_size=100, output_size=1):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size

        self.lstm = nn.LSTM(input_size, hidden_layer_size)

        self.linear = nn.Linear(hidden_layer_size, output_size)

        # hidden and cell state, stored on the module and reused across forward calls
        self.hidden_cell = (torch.zeros(1, 1, self.hidden_layer_size),
                            torch.zeros(1, 1, self.hidden_layer_size))

    def forward(self, input_seq):
        lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq), 1, -1), self.hidden_cell)
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        return predictions[-1]

So I removed it with:

lstm_out, _ = self.lstm(input_seq.view(len(input_seq), 1, -1))
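
For completeness: if the hidden state should actually be carried across batches instead of being dropped, the usual fix is to detach it between iterations rather than removing it. A minimal sketch, assuming a training loop like the one on that site (train_data, the optimizer and the loss below are assumptions, not part of the original post):

model = LSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for seq, label in train_data:   # assumed to yield (sequence, target) pairs
    optimizer.zero_grad()
    # detach the state so backward() does not try to go through the previous iteration's graph
    model.hidden_cell = tuple(h.detach() for h in model.hidden_cell)
    y_pred = model(seq)
    loss = loss_fn(y_pred, label)
    loss.backward()
    optimizer.step()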

I am getting the same error as most people here. Here is my code:
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

import torch.nn as nn

class RNN(nn.Module):

    def __init__(self, input_size, output_size, hidden_size=64):
        super().__init__()

        self.input_size  = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        self.xh = nn.Linear(self.input_size, self.hidden_size, bias=False)
        self.hh = nn.Linear(self.hidden_size, self.hidden_size)
        self.hy = nn.Linear(self.hidden_size, self.output_size)

        self.tanh = nn.Tanh()
        self.softmax = nn.Softmax(dim=1)
        self.sigmoid = nn.Sigmoid()

    def rnn_cell(self, x, prev_h):
        # new hidden state from the current input and the previous hidden state
        act = self.xh(x) + self.hh(prev_h)
        h = self.tanh(act)

        # output for this step
        updated_c = self.sigmoid(self.hy(h))

        return updated_c, h

    def forward(self, inp, h):
        return self.rnn_cell(inp, h)
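
The cell definition itself is not the problem; the error usually shows up once the cell is unrolled over a sequence and the returned hidden state is reused across backward() calls without being detached. A minimal sketch of that situation (the loop, data, optimizer and loss below are assumptions, not from the original post):

import torch

rnn = RNN(input_size=10, output_size=1)
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.01)
loss_fn = nn.BCELoss()

h = torch.zeros(1, rnn.hidden_size)          # initial hidden state
for x, target in data:                       # assumed (1, 10) inputs and (1, 1) targets in [0, 1]
    out, h = rnn(x, h)
    loss = loss_fn(out, target)
    optimizer.zero_grad()
    loss.backward()                          # fails on the second iteration if h is not detached
    optimizer.step()
    h = h.detach()                           # cut the graph so the next backward() starts fresh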

@ptrblck Hi,

I got the same error, and I have viewed these solutions. However, I still don’t know how to solve my problem.
Here is the code:

    for iter, input in enumerate(train_loader):
        template = input['template']            # read input
        search = input['search']
        label_cls = input['out_label']
        reg_label = input['reg_label']
        reg_weight = input['reg_weight']

        cfg_cnn = [(2, 16, 2, 0, 3),
                   (16, 32, 2, 0, 3),
                   (32, 64, 2, 0, 3),
                   (64, 128, 1, 1, 3),
                   (128, 256, 1, 1, 3)]
        cfg_kernel = [127, 63, 31, 31, 31]
        cfg_kernel_first = [63, 31, 15, 15, 15]

        # initial states, created once per batch
        c1_m = c1_s = torch.zeros(1, cfg_cnn[0][1], cfg_kernel[0], cfg_kernel[0]).to(device)
        c2_m = c2_s = torch.zeros(1, cfg_cnn[1][1], cfg_kernel[1], cfg_kernel[1]).to(device)
        c3_m = c3_s = torch.zeros(1, cfg_cnn[2][1], cfg_kernel[2], cfg_kernel[2]).to(device)
        trans_snn = [c1_m, c1_s, c2_m, c2_s, c3_m, c3_s]          # use this list

        for i in range(search.shape[-1]):
            cls_loss_ori, cls_loss_align, reg_loss, trans_snn = model(
                template.squeeze(-1), search[:, :, :, :, i], trans_snn,
                label_cls[:, :, :, i],
                reg_target=reg_label[:, :, :, :, i], reg_weight=reg_weight[:, :, :, i])
            # ...
            loss = cls_loss_ori + cls_loss_align + reg_loss
            optimizer.zero_grad()
            loss.backward()

I think the reason this code errors out is that in the loop I keep updating the value of the variable trans_snn. However, I have no idea how to solve it, e.g. by renaming trans_snn. Looking forward to your help. Thank you very much!

If I move trans_snn = [c1_m, c1_s, c2_m, c2_s, c3_m, c3_s] into the loop, the error does not happen. However, I need the updated trans_snn.

I don’t know how you are updating trans_snn, but assuming you are assigning intermediate or output tensors from the model, they would most likely still be attached to the computation graph, and you would thus “attach” that graph to trans_snn.
If that’s the desired use case, you would have to use backward(retain_graph=True), but in most cases this is not the desired behavior, and you might want to consider detaching the tensors that are assigned to trans_snn.
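
Applied to the loop above, a minimal sketch of the detach approach (the optimizer.step() call is added here for completeness and is otherwise assumed to live in the elided part of the loop):

for i in range(search.shape[-1]):
    cls_loss_ori, cls_loss_align, reg_loss, trans_snn = model(
        template.squeeze(-1), search[:, :, :, :, i], trans_snn,
        label_cls[:, :, :, i],
        reg_target=reg_label[:, :, :, :, i], reg_weight=reg_weight[:, :, :, i])
    loss = cls_loss_ori + cls_loss_align + reg_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # keep the values of the states, but drop their history so the next
    # iteration's backward() does not reach back into this graph
    trans_snn = [t.detach() for t in trans_snn]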

Dear Ptrblck,

@ptrblck I have a question. I have successfully trained a GCN model, and then I wanted to re-train this GCN model with some constraints. But I got the same error when performing loss.backward(), i.e., “RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.”. Could you please help me solve it?

Thanks so much for your kind help.

Best,
cm

This error message is generally raised if the intermediate activations were already freed by a previous backward() operation while you are trying to calculate the gradients a second time.
This could happen if you are directly calling backward multiple times without specifying retain_graph=True:

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
x = torch.randn(1, 1)
out = model(x)

out.backward()
out.backward()  # the intermediates were already freed by the first call
> RuntimeError: Trying to backward through the graph a second time

or if the computation graph is still attached to a previous forward pass (as is often the case in RNNs when the states are not detached).
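
A minimal sketch of that RNN case (illustrative only, not taken from any of the posts above): if the hidden state is carried to the next iteration without being detached, each backward() would try to reach back into graphs that were already freed.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8)
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.01)
h = torch.zeros(1, 1, 8)

for step in range(3):
    x = torch.randn(5, 1, 4)      # (seq_len, batch, input_size)
    out, h = rnn(x, h)
    loss = out.mean()
    optimizer.zero_grad()
    loss.backward()               # without the detach below, this raises the error on the second step
    optimizer.step()
    h = h.detach()                # detach the state so the old graph can be freed safely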

I used out.backward(retain_graph=True) in the first training run, but I get this error when performing the second training run. Could you please give me some advice?

Thanks

I would not recommend simply using retain_graph=True as a workaround without checking whether it’s really the desired behavior (in the majority of the use cases I’ve seen so far it was not the wanted behavior and was only used as a workaround).

Thanks for your answer. I have carefully checked my code and added detach() to the variable from the previous graph; now I can successfully run my code.

Best,
cm

Thanks for your explanation.

Do you have any idea how to check which path is not covered the second time?

God bless you sir.

I wish I could buy you a cup of coffee

I have been battling with this for some time now.

This just ended my two-week struggle.

Thank you so much

Hi, how did you solve this, please?
I think I have a similar use case and am experiencing the same backprop error.

What are the ops that do not require buffers, please?

I’m afraid we don’t have a list of these; it depends on the exact formula for each op.
There are some places in the code where you could read about them, but you can also use tools like torchviz to plot what is saved by passing show_saved=True.
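
For reference, a small sketch of that torchviz call (torchviz is a separate package; the model below is just a placeholder):

import torch
import torch.nn as nn
from torchviz import make_dot

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
x = torch.randn(2, 4)
out = model(x)

# show_saved=True also renders the tensors each op saved for the backward pass
dot = make_dot(out, params=dict(model.named_parameters()), show_saved=True)
dot.render('graph', format='png')   # writes graph.png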
