Copying tensor and gradient computation problem

Hi, when executing the following code, I get the error “Trying to backward through the graph a second time”. But when I remove the copy (and call backward directly on y.mean()) it works. Can someone explain why the use of copy_ causes this error?

import torch
from torch import nn

x = torch.randn(5, 3)
m = nn.Linear(3, 2)
store = torch.zeros(5, 2)

y = m(x)
store.copy_(y)
store.mean().backward()

y = m(x)
store.copy_(y)
store.mean().backward()  # ==> ERROR: Trying to backward through the graph a second time

Hi,

The thing is that copy_() modifies store in place.
So the store used in the first part is actually the same Tensor as the one used in the second evaluation, and running backward on the second result also tries to backward through the first graph.

You can see that by running this in Colab:

# !pip install torchviz

import torch
from torch import nn
from torchviz import make_dot

x = torch.randn(5, 3)
m = nn.Linear(3, 2)
store = torch.zeros(5, 2)

y = m(x)
store.copy_(y)
res = store.mean()
print("First backward...")
res.backward()
make_dot(res)  # graph of the first forward pass only

y = m(x)
store.copy_(y)
res = store.mean()
print("Before second backward...")
make_dot(res)  # graph now also contains the first one

You can see that before the second backward, the graph also contains the one built for the first backward.
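
If you prefer not to install torchviz, here is a minimal sketch of the same check that just walks res.grad_fn.next_functions by hand (the walk helper below is mine, not part of any library); run it right before the second backward and the printout shows the copy node of the second copy_ still hanging onto the first graph:

import torch
from torch import nn

x = torch.randn(5, 3)
m = nn.Linear(3, 2)
store = torch.zeros(5, 2)

store.copy_(m(x))
store.mean().backward()   # first backward works fine

store.copy_(m(x))
res = store.mean()        # second forward, not yet backwarded

def walk(fn, depth=0):
    # Print every autograd node reachable from `fn`, indented by depth
    if fn is None:
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1)

# The tree printed here contains two chained copy nodes: the second one
# still has an edge back into the first graph, which is why the second
# backward tries to go through it again.
walk(res.grad_fn)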

Hi, thanks a lot. But do you understand why it behaves like this? Since I am replacing the values in the store variable at the second call, the ‘logical’ thing would be to have the first branch of the autograd graph removed, am I wrong?

If not, I have to call detach_() myself, right?

You can indeed do store = store.detach() to avoid this.
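
Applied to the original snippet, a minimal sketch of that fix (detaching between the two iterations; store.detach_() in place would work just as well):

import torch
from torch import nn

x = torch.randn(5, 3)
m = nn.Linear(3, 2)
store = torch.zeros(5, 2)

y = m(x)
store.copy_(y)
store.mean().backward()

store = store.detach()   # cut the link to the first graph

y = m(x)
store.copy_(y)
store.mean().backward()  # works: only the second graph is traversed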

the ‘logical’ thing would be to have the first branch of the autograd graph removed, am I wrong?

While this is true if you overwrite the whole Tensor, it is not if you only replace a subset of it, for example when store is a view of a bigger Tensor: the old graph is still needed for the values you did not overwrite. So it is not that simple, I’m afraid.
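
To make that concrete, here is a small hypothetical example (the bigger Tensor big is a name I made up): store is a view of big, and the copy_ only overwrites the rows covered by that view, so the remaining rows of big still need the first graph for their gradients, and autograd cannot simply drop it when copy_ runs:

import torch
from torch import nn

x = torch.randn(5, 3)
m = nn.Linear(3, 2)

big = torch.zeros(5, 2)
big.copy_(m(x))          # big is filled from a first forward pass
store = big[:2]          # store is a view of the first two rows of big
store.copy_(m(x)[:2])    # second forward pass overwrites only that subset

# big.mean() depends on both graphs: rows 0-1 come from the second forward,
# rows 2-4 still come from the first one, so dropping the first graph at the
# time of the copy_ would give wrong gradients for m's parameters.
big.mean().backward()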