Copying tensor and gradient computation problem

Hi, when executing the following code, I get the error “Trying to backward through the graph a second time”. But when I remove the copy (and call backward directly on y.mean()) it works. Can someone explain why the use of copy_ causes this error?

import torch
from torch import nn

x = torch.randn(5, 3)
m = nn.Linear(3, 2)
store = torch.zeros(5, 2)

y = m(x)
store.copy_(y)
store.mean().backward()

y = m(x)
store.copy_(y)
store.mean().backward()  # ==> ERROR: Trying to backward through the graph a second time

Hi,

The thing is that copy_() modifies store in place.
So the store used in the first part is actually the same Tensor as the one used in the second evaluation, and running backward on the second result also tries to backward through the first graph.

You can see that by running this in Colab:

# !pip install torchviz

import torch
from torch import nn
from torchviz import make_dot

x = torch.randn(5, 3)
m = nn.Linear(3, 2)
store = torch.zeros(5, 2)

y = m(x)
store.copy_(y)
res = store.mean()
print("First backward...")
res.backward()
make_dot(res)  # graph of the first forward pass only

y = m(x)
store.copy_(y)
res = store.mean()
print("Before second backward...")
make_dot(res)  # graph now also contains the first one

You can see that before the second backward, the graph also contains the one built for the first backward.
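
If you prefer not to install torchviz, here is a minimal sketch of the same check that just walks res.grad_fn.next_functions by hand (the walk helper below is mine, not part of any library); run it right before the second backward and the printout shows the copy node of the second copy_ still hanging onto the first graph:

import torch
from torch import nn

x = torch.randn(5, 3)
m = nn.Linear(3, 2)
store = torch.zeros(5, 2)

store.copy_(m(x))
store.mean().backward()   # first backward works fine

store.copy_(m(x))
res = store.mean()        # second forward, not yet backwarded

def walk(fn, depth=0):
    # Print every autograd node reachable from `fn`, indented by depth
    if fn is None:
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1)

# The tree printed here contains two chained copy nodes: the second one
# still has an edge back into the first graph, which is why the second
# backward tries to go through it again.
walk(res.grad_fn)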

Hi, thanks a lot. But do you understand why it behaves like this? Since I am replacing the values in the store variable at the second call, the ‘logical’ thing would be to have the first branch of the autograd graph removed, am I wrong?

If not, I have to call detach_() myself, right?

You can indeed do store = store.detach() to avoid this.
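
Applied to the original snippet, a minimal sketch of that fix (detaching between the two iterations; store.detach_() in place would work just as well):

import torch
from torch import nn

x = torch.randn(5, 3)
m = nn.Linear(3, 2)
store = torch.zeros(5, 2)

y = m(x)
store.copy_(y)
store.mean().backward()

store = store.detach()   # cut the link to the first graph

y = m(x)
store.copy_(y)
store.mean().backward()  # works: only the second graph is traversed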

the ‘logical’ thing would be to have the first branch of the autograd graph removed, am I wrong?

While this is true if you overwrite the whole Tensor, it is not if you only replace a subset of it, for example when store is a view of a bigger Tensor: the old graph is still needed for the values you did not overwrite. So it is not that simple, I’m afraid.
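
To make that concrete, here is a small hypothetical example (the bigger Tensor big is a name I made up): store is a view of big, and the copy_ only overwrites the rows covered by that view, so the remaining rows of big still need the first graph for their gradients, and autograd cannot simply drop it when copy_ runs:

import torch
from torch import nn

x = torch.randn(5, 3)
m = nn.Linear(3, 2)

big = torch.zeros(5, 2)
big.copy_(m(x))          # big is filled from a first forward pass
store = big[:2]          # store is a view of the first two rows of big
store.copy_(m(x)[:2])    # second forward pass overwrites only that subset

# big.mean() depends on both graphs: rows 0-1 come from the second forward,
# rows 2-4 still come from the first one, so dropping the first graph at the
# time of the copy_ would give wrong gradients for m's parameters.
big.mean().backward()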