Why is the error "Trying to backward through the graph a second time..." solved by detaching the variable?

I’m using torch version 1.0.0.

In each gradient update step (see the last loop in the code below) I perform one forward step and a loss computation. However, when I enter the loop a second time, I get the following error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

Playing around, I realized I can fix this error by adding the following line to the loop:

states = states.clone().detach()

Why does that solve the problem? I do not really understand what that line does; in an autograd tutorial they used a similar line to “detach the Variable from its history”. What does this mean? I thought the graph gets cleared upon execution of loss.backward() (which is why I would get the same error if I tried to run loss.backward() again right after the first time), but I don’t see what additional purpose .detach() serves. So, to sum up my question:

Why does the code below give an error and what does .clone().detach() do to fix it?

import torch
import torch.nn as nn

sft = nn.functional.softmax

def forward(StateVec, ConnectMatrix, L):
    # one update step of the state vector, then a 2-D position readout
    StateVec = StateVec + (2*(-1/8*StateVec - ConnectMatrix.mm(StateVec)))*0.1
    pos = L.mm(sft(StateVec, dim=0))
    return StateVec, pos

N = 6

"""Toy target"""
target = torch.randn(2,20)

"""Randomly initialise L (which ought to be inferred later)"""
L = torch.randn(2,N)
L.requires_grad_(True)


"""Produce Connectivity Matrix rho"""
rho = torch.zeros(N, N)
for i in range(N):
    for j in range(N):
        if i == j:
            rho[i, j] = 0
        elif j == i + 1:
            rho[i, j] = 1.5
        elif j == i - 1:
            rho[i, j] = 0.5
        else:
            rho[i, j] = 1

rho[-1, 0] = 1.5
rho[0, -1] = 0.5

"""Initialise state vector as states = [0.5,0,0,...]"""
states = torch.zeros(N, 1)   # zeros, so that states really is [0.5, 0, 0, ...]
states[0] = 0.5
states.requires_grad_(True)

lr = 0.1 # Learning Rate
for t in range(0, target.shape[1]):
    states, pos = forward(states,rho,L)
    loss = torch.sum((pos - target[:,t].float().view([2,1]))**2)
    loss.backward()
    L.data -= L.grad.data * lr
    L.grad.data.zero_()
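For completeness, this is the loop with the fix in place (the only change is the added last line):

for t in range(0, target.shape[1]):
    states, pos = forward(states, rho, L)
    loss = torch.sum((pos - target[:, t].float().view([2, 1]))**2)
    loss.backward()
    L.data -= L.grad.data * lr
    L.grad.data.zero_()
    states = states.clone().detach()  # adding this line makes the error go away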

I think you do not need gradients for the states variable. Commenting out the line states.requires_grad_(True) will make the code work.
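That is, a minimal sketch of this alternative fix (only the initialisation changes; the loop stays exactly as posted):

states = torch.zeros(N, 1)
states[0] = 0.5
# states.requires_grad_(True)  # commented out: gradients only need to reach L,
# so states never carries a grad_fn and no graph survives between iterations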

Why does .detach()ing work?

This is my understanding. You are using the same variable (states), which has requires_grad = True, again and again. The first time you use it there is no problem: when you back-propagate, the graph is destroyed on the go and the gradient is accumulated in the states.grad buffer.
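As a minimal, standalone sketch of that first pass (toy names x and y, not from the code above):

import torch

x = torch.ones(1, requires_grad=True)
y = (x * 2).sum()   # build a tiny graph from x to y
y.backward()        # the graph's buffers are freed on the go
print(x.grad)       # tensor([2.]): the gradient accumulated in x.grad
# y.backward()      # running this again raises the same RuntimeError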

When you use the states variable a second time (call it states2), it is derived from the original states variable, i.e., there is still a link from states2 back to the original states. When you back-propagate through states2, the gradients still have to be passed on to the original states because it has requires_grad=True. But the graph between the original states and states2 was already destroyed during the first back-propagation. Hence it causes RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
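The same pattern in miniature, with toy names (w stands in for L, s for states):

w = torch.ones(1, requires_grad=True)   # the parameter being learned, like L
s = torch.ones(1, requires_grad=True)   # the recurrent state, like states

for t in range(3):
    s = s * 2                     # the new s is linked to the old s
    loss = (w * s).sum()
    loss.backward()               # frees this iteration's graph
    w.data -= 0.1 * w.grad.data
    w.grad.data.zero_()
    s = s.clone().detach()        # cut the link: s becomes a fresh leaf with no history

Remove the last line and the second iteration's loss.backward() raises exactly the error above, because the new s is still linked to the already-freed graph of the first iteration.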



I see, that makes a lot of sense. Thank you very much!