RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time

I really run into this error everywhere, haha.

But it is really hard to understand the computation graph.

With the code

dataiter = iter(test_loader)
images, labels = next(dataiter)
print(images[0].requires_grad)
print(labels.requires_grad)
print(images[0].is_leaf)
print(labels[0].is_leaf)

We get the output

False
False
True
True

As we know, we will never compute d(loss)/d(image) or d(loss)/d(label). So do we only retain the grad of leaf nodes with requires_grad=True?

yes

no
The gradient w.r.t. the data is useful for generating adversarial examples, e.g. for attacks/defenses or domain generalization.
https://arxiv.org/abs/1412.6572
https://arxiv.org/abs/1804.10745

I’ve never heard of any application of the gradient w.r.t. the label. But its meaning is clear: the direction of the most dissimilar label for the example, as learned by the current model. Any reference on this would be appreciated.
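For illustration only, here is a rough FGSM-style sketch of using the gradient w.r.t. the input (in the spirit of the first paper linked above). All names (model, loss_fn, images, labels, epsilon) are placeholders, not objects defined in this thread:

epsilon = 0.1

images = images.clone().detach().requires_grad_(True)  # make the input a leaf that tracks gradients
loss = loss_fn(model(images), labels)
loss.backward()                                         # fills images.grad = d(loss)/d(images)

# step the input in the direction that increases the loss the most
adv_images = (images + epsilon * images.grad.sign()).clamp(0, 1).detach()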


With the code

dataiter = iter(test_loader)
images, labels = next(dataiter)
images.requires_grad_(True)

output1 = Mnist_Classifier(images)   # forward pass builds the graph
loss = loss_fn(output1, labels)
loss.backward()                      # populates images.grad, since images.requires_grad is True


print(images.requires_grad)
print(labels.requires_grad)
print(images.is_leaf) #1
print(labels[0].is_leaf)

The output is

True
False
True
True

And this is what I expect.
But if I change position 1 to print(images[0].is_leaf), the corresponding output becomes False. Why?

Indexing (slicing) is also an operation that creates a new node, so a[0] creates a new node. You can print its grad_fn; it will look like <SelectBackward at 0xffffffff>.
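A quick self-contained check (names are just for illustration; the exact grad_fn name may be SelectBackward or SelectBackward0 depending on your PyTorch version):

import torch

a = torch.rand(4, requires_grad=True)
b = a[0]            # indexing is an autograd op
print(b.is_leaf)    # False
print(b.grad_fn)    # something like <SelectBackward0 object at 0x...>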

But why

print(labels[0].is_leaf) 
print(labels[0].grad_fn)

get

True
None

The results for labels and images seem different.
Do the different dimensions of images and labels make them act differently?

Hi,

Because labels[0] creates a brand-new Tensor that happens to be a leaf. That said, this newly created Tensor has not been used in any computation yet (and never will be, since you did not save it; it gets destroyed right after the print). The second print creates another such new leaf Tensor, and this one is brand new as well, so its .grad_fn field is None, as for any newly created Tensor.


As a complement,

All Tensors that have requires_grad which is False will be leaf Tensors by convention.

https://pytorch.org/docs/stable/autograd.html#torch.Tensor.is_leaf

Thanks for your explanation,
but why do images and labels act differently? Why does labels[0] create a new one rather than doing a selection?

They both create a new one.
As @Weifeng mentioned just above, the difference comes from the fact that one requires gradients (and so after doing an op on it, it’s not a leaf anymore) and one does not require gradients (and so even after doing an op on it, it’s still a leaf).
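A minimal sketch of the two cases (illustrative names only):

import torch

x = torch.rand(4, requires_grad=True)
y = torch.rand(4)                    # requires_grad=False

print(x[0].is_leaf, x[0].grad_fn)    # False <SelectBackward0 ...>  (indexing a grad-tracking tensor)
print(y[0].is_leaf, y[0].grad_fn)    # True None                    (indexing a non-tracking tensor)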

I really don’t get this retain_graph parameter… this tutorial doesn’t use it; also, loss.backward() is used inside the mini-batch loop, so its “normal” behavior, as previously pointed out, is (or should be) to “retain_graph”!!

Hi,

I’m not sure to understand, what is your question?

the tutorial I linked to does not use retain_graph=True inside loss.backward()… loss.backward() is used repeatedly inside a for loop, so the graph is “retained” without needing retain_graph=True… it appears that there are situations where this parameter is needed, and I’m not quite sure what those situations are.

It is not specified because every iteration of the loop recreates the graph and backwards through it only once. This is what should happen when you use PyTorch (unless you are doing more complex things that require backwarding through a graph twice).
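For example, a sketch of the usual loop (placeholder names: model, loss_fn, optimizer, train_loader); the forward pass inside the loop builds a fresh graph for each batch, so every graph is backwarded through exactly once and retain_graph is never needed:

for images, labels in train_loader:
    optimizer.zero_grad()
    output = model(images)          # forward pass: builds a new graph for this batch
    loss = loss_fn(output, labels)
    loss.backward()                 # frees this batch's graph; fine, it is never reused
    optimizer.step()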


my loop (the mini-batch loop) reads previously computed representations… this is pretty much standard in metric-learning approaches. The representations are computed before entering the mini-batch loop, which only operates on representation distances… in fact, even the distances are computed outside the mini-batch loop. The representations have requires_grad=True, so I just assumed that would be sufficient… I understand what you’re saying, though; I would really appreciate a link to more info about “recreating the graph” in every iteration of the for loop… I did not realize that; I think I missed some required reading or tutorial on that.

The thing is that as soon as your model parameters change, you need to recompute the forward pass.
This is a big difference from static-graph frameworks like TensorFlow, where you define the graph once and then ask for gradients.
In PyTorch, you call forward every time you have a new input, and then backward on your newly computed loss to get gradients.
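As a rough sketch of the pattern described above (all names hypothetical), computing the representations once outside the loop means every backward goes through that single graph, which triggers the error on the second iteration; recomputing the forward pass inside the loop avoids it:

# problematic: the graph behind embeddings is built once, outside the loop
embeddings = model(all_images)
for anchor, positive, negative in batches:
    loss = triplet_loss(embeddings[anchor], embeddings[positive], embeddings[negative])
    loss.backward()     # second iteration: "Trying to backward through the graph a second time..."
    optimizer.step()

# fix: redo the forward pass each iteration so a fresh graph is built
for anchor, positive, negative in batches:
    optimizer.zero_grad()
    emb = model(all_images)          # or just the images needed for this batch
    loss = triplet_loss(emb[anchor], emb[positive], emb[negative])
    loss.backward()
    optimizer.step()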


By representations, did you mean embeddings? It seems that every mini-batch takes part of these embeddings and computes a batch loss? It’s OK to create and free the graph in every mini-batch in this case, because the input differs between iterations.

Some code or pseudo-code would help make the situation clearer.

I have read the example code from the docs and I’m confused about the is_leaf and requires_grad attributes.
If a user creates a tensor, its is_leaf is True regardless of requires_grad, as long as no other operation is applied to it (correct me if I’m wrong). But what confuses me is:

a = torch.rand(10, requires_grad=True) + 2
print(a.requires_grad)  # True
print(a.is_leaf)        # False

but

a = torch.rand(10) + 2
print(a.requires_grad)  # False
print(a.is_leaf)        # True

Why is a tensor with requires_grad=False still a leaf (is_leaf=True) after some operation on it, unlike a tensor with requires_grad=True?

Thanks in advance

So it’s just a convention.
It’s not documented why. As I understand it, the computation graph is meaningful only for nodes with requires_grad and their neighbors, which are involved in computing the gradients. So, taking into account only the involved nodes, all tensors with requires_grad=False can be considered leaf nodes.

[Figure: computation graph sketch; node a* has requires_grad=True]
It’s just for understanding; it may not correspond exactly to the term “graph” in PyTorch.
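A small self-contained illustration of that reading (names are arbitrary): the requires_grad=False tensor never becomes an intermediate node, so it stays a leaf, while everything downstream of a requires_grad=True tensor gets a grad_fn:

import torch

w = torch.rand(3, requires_grad=True)   # "parameter": part of the graph
x = torch.rand(3)                       # "data": requires_grad=False, a leaf by convention

y = (w * x).sum()
print(x.is_leaf, x.grad_fn)             # True None   -- x is never an intermediate node
print(y.is_leaf, y.grad_fn)             # False <SumBackward0 ...>

y.backward()
print(w.grad)                           # d(y)/d(w) = x
print(x.grad)                           # None: no gradient is tracked for x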

To get a better understanding, I did some experiments:

dataiter = iter(test_loader)
images, labels = next(dataiter)
images_ = images

images.requires_grad_(True)

print(images.requires_grad)
print(labels.requires_grad)
print(images.is_leaf)
print(labels.is_leaf)
print(images.grad_fn)
print(labels.grad_fn)
print(images[0].is_leaf)
print(labels[0].is_leaf)
print(images[0].grad_fn)
print(labels[0].grad_fn)

The output is:

True
False
True
True
None
None
False
True
<SelectBackward object at 0x11b96fc50>
None

labels.requires_grad is False, so they are leaf nodes. images.requires_grad is True, but their grad_fn is None, so they are not the result of any operation, and therefore they are leaf nodes as well.

But why is labels[0].grad_fn None, while that of images[0] is a selection?

As you mentioned

Because labels[0] creates a brand new Tensor that happens to be a leaf.

So labels[0] is created by us; but then why is images[0] the result of a selection operation?

labels[0] is also the result of a selection operation. It differs from images[0] in that labels[0].requires_grad == False, inherited from labels, so it is a ‘leaf’.

Maybe you can get a more thorough introduction from this article.