Problem with backward pass: buffers have already been freed

Hello,

I am running into a backpropagation error that states: “Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time”.

However, I don’t see where my code is trying to backward through the graph a second time. Let me show you two examples: the first works fine, the second does not, and I do not understand why it fails.

import torch
import torch.optim as optim

x = torch.randn((5, 4), requires_grad=True)
theta = torch.randn((4, 3), requires_grad=False)

optimizer = optim.Adam([x])

xx = x  # Apparently this is the conflicting line

for k in range(10):
    m = torch.matmul(xx, theta)

    optimizer.zero_grad()
    m.sum().backward()
    optimizer.step()
    

This works fine. However, if I change the xx declaration to simply

xx = x/10

then I get the backpropagation error. Also, setting “retain_graph=True” as suggested simply makes “x” not optimise at all; it behaves as if “requires_grad” were set to “False”, which is not the behaviour I need.

I don’t understand what is going on.

Part of your computation graph is outside the loop (xx = x/10), so that part is only built once.
After the first backward() call its intermediate buffers are freed, so the second iteration cannot backpropagate through it again. With retain_graph=True the backward call succeeds, but the loss keeps using the stale xx computed from the initial x, so nothing appears to be optimised. If you move xx = x/10 into the loop, the graph is rebuilt every iteration and the code should work.
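
For reference, a minimal sketch of that fix (same setup as your example, just recomputing xx inside the loop so a fresh graph is built on every iteration):

import torch
import torch.optim as optim

x = torch.randn((5, 4), requires_grad=True)
theta = torch.randn((4, 3), requires_grad=False)

optimizer = optim.Adam([x])

for k in range(10):
    xx = x / 10  # now part of the graph that is rebuilt each iteration
    m = torch.matmul(xx, theta)

    optimizer.zero_grad()
    m.sum().backward()  # no retain_graph needed; each backward frees only this iteration's graph
    optimizer.step()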


I see, thanks a lot.

With that division I was trying to reduce the variance of the initial random sampling in x, so the natural place to do it was once, outside the loop. Ideally I would have declared xx with the reduced variance first and then passed it to the optimiser, but done that way PyTorch complains that xx is not a leaf tensor, i.e.:

x = torch.randn((5, 4), requires_grad=True)
xx = x / 10
optimizer = optim.Adam([xx])  # ValueError: can't optimize a non-leaf Tensor

Is there any other way to have a tensor in the optimiser that is the result of some previous computations?

Thanks!

Nevermind, found the solution:

x = torch.randn((5, 4))  # note: no requires_grad here
xx = x / 10
xx.requires_grad = True
optimizer = optim.Adam([xx])  # Works as expected!
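
For completeness, the reason this works: since x itself no longer requires grad, x/10 has no grad_fn and is treated as a leaf tensor, so it can have requires_grad switched on and be handed to the optimiser. A quick sanity check (just illustrative):

import torch
import torch.optim as optim

x = torch.randn((5, 4))            # plain tensor, requires_grad defaults to False
xx = (x / 10).requires_grad_()     # xx has no grad_fn, so it is a leaf that requires grad
print(xx.is_leaf, xx.grad_fn)      # True None

optimizer = optim.Adam([xx])       # accepted, since xx is a leaf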