RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time

I really run into this error everywhere, haha.

But it is really hard to understand the computation graph.

With the code

dataiter = iter(test_loader)
images, labels = next(dataiter)
print(images[0].requires_grad)
print(labels.requires_grad)
print(images[0].is_leaf)
print(labels[0].is_leaf)

We get the output

False
False
True
True

As we know, we will never compute d(loss)/d(image) or d(loss)/d(label). So do we only retain the grad of leaf nodes with requires_grad=True?

yes

no
The gradient w.r.t. the data is useful for generating adversarial examples, e.g. for attacks/defenses or domain generalization.
https://arxiv.org/abs/1412.6572
https://arxiv.org/abs/1804.10745

I’ve never heard of any application of the gradient w.r.t. the label. But its meaning is clear: the direction of the most dissimilar label for the example, as learned by the current model. Any reference on this would be appreciated.
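For illustration only, here is a rough FGSM-style sketch of using the gradient w.r.t. the input (in the spirit of the first paper linked above). All names (model, loss_fn, images, labels, epsilon) are placeholders, not objects defined in this thread:

epsilon = 0.1

images = images.clone().detach().requires_grad_(True)  # make the input a leaf that tracks gradients
loss = loss_fn(model(images), labels)
loss.backward()                                         # fills images.grad = d(loss)/d(images)

# step the input in the direction that increases the loss the most
adv_images = (images + epsilon * images.grad.sign()).clamp(0, 1).detach()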


With the code

dataiter = iter(test_loader)
images, labels = next(dataiter)
images.requires_grad_(True)

output1 = Mnist_Classifier(images)   # forward pass builds the graph
loss = loss_fn(output1, labels)
loss.backward()                      # populates images.grad, since images.requires_grad is True


print(images.requires_grad)
print(labels.requires_grad)
print(images.is_leaf) #1
print(labels[0].is_leaf)

The output is

True
False
True
True

And this is what I expect.
But if I change position 1 to print(images[0].is_leaf), the corresponding output becomes False. Why?

Indexing (slicing) is also an operation that creates a new node, so a[0] creates a new node. You can print its grad_fn; it will look like <SelectBackward at 0xffffffff>.
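A quick self-contained check (names are just for illustration; the exact grad_fn name may be SelectBackward or SelectBackward0 depending on your PyTorch version):

import torch

a = torch.rand(4, requires_grad=True)
b = a[0]            # indexing is an autograd op
print(b.is_leaf)    # False
print(b.grad_fn)    # something like <SelectBackward0 object at 0x...>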

But why

print(labels[0].is_leaf) 
print(labels[0].grad_fn)

get

True
None

The results for labels and images seem different.
Do the different dimensions of images and labels make them act differently?

Hi,

Because labels[0] creates a brand-new Tensor that happens to be a leaf. That said, this newly created Tensor has not been used in any computation yet (and never will be, since you did not save it; it gets destroyed right after the print). The second print creates another such new leaf Tensor, and this one is brand new as well, so its .grad_fn field is None, as for any newly created Tensor.


As a complement,

All Tensors that have requires_grad which is False will be leaf Tensors by convention.

https://pytorch.org/docs/stable/autograd.html#torch.Tensor.is_leaf

Thanks for your explanation,
but why do images and labels act differently? Why does labels[0] create a new one rather than doing a selection?

They both create a new one.
As @Weifeng mentioned just above, the difference comes from the fact that one requires gradients (and so after doing an op on it, it’s not a leaf anymore) and one does not require gradients (and so even after doing an op on it, it’s still a leaf).
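A minimal sketch of the two cases (illustrative names only):

import torch

x = torch.rand(4, requires_grad=True)
y = torch.rand(4)                    # requires_grad=False

print(x[0].is_leaf, x[0].grad_fn)    # False <SelectBackward0 ...>  (indexing a grad-tracking tensor)
print(y[0].is_leaf, y[0].grad_fn)    # True None                    (indexing a non-tracking tensor)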

I really don’t get this retain_graph parameter… this tutorial doesn’t use it; also, loss.backward() is used inside the mini-batch loop, so its “normal” behavior, as previously pointed out, is (or should be) to “retain_graph”!!

Hi,

I’m not sure to understand, what is your question?

the tutorial I linked to does not use retain_graph=True inside loss.backward()… loss.backward() is used repeatedly inside a for loop, so the graph is “retained” without needing retain_graph=True… it appears that there are situations where this parameter is needed, and I’m not quite sure what those situations are.

It is not specified because every iteration of the loop recreates the graph and backwards through it only once. This is what should happen when you use PyTorch (unless you are doing more complex things that require backwarding through a graph twice).
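For example, a sketch of the usual loop (placeholder names: model, loss_fn, optimizer, train_loader); the forward pass inside the loop builds a fresh graph for each batch, so every graph is backwarded through exactly once and retain_graph is never needed:

for images, labels in train_loader:
    optimizer.zero_grad()
    output = model(images)          # forward pass: builds a new graph for this batch
    loss = loss_fn(output, labels)
    loss.backward()                 # frees this batch's graph; fine, it is never reused
    optimizer.step()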


my loop (the mini-batch loop) reads previously computed representations… this is pretty much standard in metric-learning approaches. The representations are computed before entering the mini-batch loop, which only operates on representation distances… in fact, even the distances are computed outside the mini-batch loop. The representations have requires_grad=True, so I just assumed that would be sufficient… I understand what you’re saying, though; I would really appreciate a link to more info about “recreating the graph” in every iteration of the for loop… I did not realize that; I think I missed some required reading or tutorial on that.

The thing is that as soon as your model parameters change, you need to recompute the forward pass.
This is a big difference from static-graph frameworks like TensorFlow, where you define the graph once and then ask for gradients.
In PyTorch, you call forward every time you have a new input, and then backward on your newly computed loss to get gradients.
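As a rough sketch of the pattern described above (all names hypothetical), computing the representations once outside the loop means every backward goes through that single graph, which triggers the error on the second iteration; recomputing the forward pass inside the loop avoids it:

# problematic: the graph behind embeddings is built once, outside the loop
embeddings = model(all_images)
for anchor, positive, negative in batches:
    loss = triplet_loss(embeddings[anchor], embeddings[positive], embeddings[negative])
    loss.backward()     # second iteration: "Trying to backward through the graph a second time..."
    optimizer.step()

# fix: redo the forward pass each iteration so a fresh graph is built
for anchor, positive, negative in batches:
    optimizer.zero_grad()
    emb = model(all_images)          # or just the images needed for this batch
    loss = triplet_loss(emb[anchor], emb[positive], emb[negative])
    loss.backward()
    optimizer.step()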


By representations, did you mean embeddings? It seems that every mini-batch takes part of these embeddings and computes a batch loss? It’s OK to create and free the graph in every mini-batch in this case, because the input differs between iterations.

Some code or pseudo-code would help make the situation clearer.

I have read the example code from the docs and I’m confused about the is_leaf and requires_grad attributes.
If a user creates a tensor, its is_leaf is True regardless of requires_grad, as long as no other operation is applied to it (correct me if I’m wrong). But what confuses me is:

a = torch.rand(10, requires_grad=True) + 2
print(a.requires_grad)  # True
print(a.is_leaf)        # False

but

a = torch.rand(10) + 2
print(a.requires_grad)  # False
print(a.is_leaf)        # True

Why is a tensor with requires_grad=False still a leaf (is_leaf=True) after some operation on it, unlike a tensor with requires_grad=True?

Thanks in advance

So it’s just a convention.
It’s not documented why. As I understand it, the computation graph is meaningful only for nodes with requires_grad and their neighbors, which are involved in computing the gradients. So, taking into account only the involved nodes, all tensors with requires_grad=False can be considered leaf nodes.

[Figure: computation graph sketch; node a* has requires_grad=True]
It’s just for understanding; it may not correspond exactly to the term “graph” in PyTorch.
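A small self-contained illustration of that reading (names are arbitrary): the requires_grad=False tensor never becomes an intermediate node, so it stays a leaf, while everything downstream of a requires_grad=True tensor gets a grad_fn:

import torch

w = torch.rand(3, requires_grad=True)   # "parameter": part of the graph
x = torch.rand(3)                       # "data": requires_grad=False, a leaf by convention

y = (w * x).sum()
print(x.is_leaf, x.grad_fn)             # True None   -- x is never an intermediate node
print(y.is_leaf, y.grad_fn)             # False <SumBackward0 ...>

y.backward()
print(w.grad)                           # d(y)/d(w) = x
print(x.grad)                           # None: no gradient is tracked for x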

To get a better understanding, I did some experiments:

dataiter = iter(test_loader)
images, labels = next(dataiter)
images_ = images

images.requires_grad_(True)

print(images.requires_grad)
print(labels.requires_grad)
print(images.is_leaf)
print(labels.is_leaf)
print(images.grad_fn)
print(labels.grad_fn)
print(images[0].is_leaf)
print(labels[0].is_leaf)
print(images[0].grad_fn)
print(labels[0].grad_fn)

The output is:

True
False
True
True
None
None
False
True
<SelectBackward object at 0x11b96fc50>
None

labels.requires_grad is False, so they are leaf nodes. images.requires_grad is True, but their grad_fn is None, so they are not the result of any operation, and therefore they are leaf nodes as well.

But why is labels[0].grad_fn None, while that of images[0] is a selection?

As you mentioned

Because labels[0] creates a brand new Tensor that happens to be a leaf.

So labels[0] is created by us; but then why is images[0] the result of a selection operation?

labels[0] is also the result of a selection operation. It differs from images[0] in that labels[0].requires_grad == False, inherited from labels, so it is a ‘leaf’.

Maybe you can get a more thorough introduction from this article.