What I want to do is save the features of all the data in self.V as a buffer, to avoid computing them repeatedly.
Inside the for loop, I want to use the new features to update the buffer self.V, but an error occurs.
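The error message is not quoted here, so the following is only a guess at the situation. If the features stored in the buffer still carry autograd history, a later backward() can end up walking through an earlier iteration's graph. A minimal sketch that avoids this, assuming a buffer with one row per sample, is shown below; `encoder`, `data`, `idx` and the loss are hypothetical stand-ins, not the original code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny encoder and a feature buffer V (one row per sample).
encoder = nn.Linear(8, 4)
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.1)
data = torch.randn(10, 8)
V = torch.zeros(10, 4)  # plain tensor, requires_grad=False

for step in range(3):
    idx = torch.tensor([0, 1, 2, 3])        # indices of the current mini-batch
    feats = encoder(data[idx])              # fresh features, attached to this step's graph
    loss = (feats @ V.t()).pow(2).mean()    # some loss that reads the buffer

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Update the buffer only after backward, and store a detached copy so that
    # V never holds a reference to this iteration's graph.
    V[idx] = feats.detach()
```

Updating V after backward() also sidesteps the separate "modified by an inplace operation" error that can appear when a tensor saved for backward is overwritten before the backward pass runs.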
How come the following does NOT produce an error:
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.backward(torch.ones(2, 2)) # Note I do not set retain_graph=True
y.backward(torch.ones(2, 2))
?
This is an edge case. The only op you do is an add, and add does not need to save any buffers for its backward, so no buffer is missing when you call backward a second time.
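For contrast, here is a small sketch with an op that does save buffers. Multiplication keeps its inputs for the backward pass, so a second backward() fails unless the first call passes retain_graph=True. The variable names are illustrative only.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = (x * x).sum()      # mul saves its inputs for the backward pass

y.backward()           # the saved buffers are freed here
try:
    y.backward()       # the buffers are gone, so this raises
    second_backward_failed = False
except RuntimeError:
    second_backward_failed = True

# Same computation, but the graph is kept alive for a second pass.
x2 = torch.ones(2, 2, requires_grad=True)
y2 = (x2 * x2).sum()
y2.backward(retain_graph=True)
y2.backward()          # works; gradients accumulate in x2.grad
```

After the two passes, x2.grad holds the sum of both gradients (2 * x2 from each pass).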
Hello!
I found something really interesting
This is my code
![image|683x500](upload://2wrc9BPeYFQiY4rzm21D7hgOsQv.jpeg)
Why is b.grad None?
And according to the chain rule, will a.grad be 0?
Thanks for your reply
But when we optimize the params, we need d(loss)/d(w) to update them. If we only retain the inputs’ grads and drop the intermediate results (some of which are actually the params’ grads), how can we optimize the model (change the params)?
As @smth mentioned in the link, "By default, gradients are only retained for leaf variables. non-leaf variables’ gradients are not retained to be inspected later. This was done by design, to save memory." I think the weights and biases in a network are leaf variables, so their grads are retained. (Correct me if I’m wrong.) In your example, you can check b.is_leaf to see that it is False, while a.is_leaf is True.
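A quick way to check this is to inspect a small layer after a backward pass. The sketch below uses a hypothetical nn.Linear layer and random data; it only illustrates that parameters are leaves that receive .grad, while the intermediate activation is not a leaf.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(3, 2)           # weight and bias are leaf tensors with requires_grad=True
x = torch.randn(4, 3)             # data: a leaf with requires_grad=False
hidden = layer(x)                 # intermediate result: non-leaf
loss = hidden.pow(2).mean()
loss.backward()

print(layer.weight.is_leaf, layer.weight.grad is not None)  # True True
print(hidden.is_leaf)                                       # False
print(x.is_leaf, x.grad)                                    # True None
```

The optimizer only reads the .grad of the parameters it was given, so dropping the gradients of intermediate tensors does not affect the update.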
Exactly what you say!
Thanks!
So the params are also inputs of the model, since they are stored in leaf nodes? But we get them by initialization rather than from the dataset like the data. Am I right?
I came across the same problem of fetching the gradient of a non-leaf node last week.
PyTorch does not keep the gradient of a non-leaf node unless you explicitly call retain_grad() on that tensor.
Note that a = a.to(device) creates a new node and makes a a non-leaf (when a requires grad and is actually copied to another device).
However, model = model.to(device) is usually safe. Parameters usually reside in a Module object, and calling to() on a Module takes care to keep the parameters as leaf nodes by operating on param.data.
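The points above can be sketched as follows. To keep the example runnable on a CPU-only machine, a dtype conversion is used as a stand-in for moving to another device; that substitution is an assumption of this sketch, and the tensor names are illustrative.

```python
import torch
import torch.nn as nn

a = torch.ones(3, requires_grad=True)   # leaf
b = a * 2                               # non-leaf
b.retain_grad()                         # ask autograd to keep b.grad
b.sum().backward()
# a.grad is tensor([2., 2., 2.]); b.grad is tensor([1., 1., 1.]) thanks to retain_grad()

# Tensor.to() that actually copies is a differentiable op, so the result is a non-leaf.
# (A dtype change stands in for a device change here.)
a64 = a.to(torch.float64)

# Module.to() converts the parameters but keeps them as leaves.
model = nn.Linear(3, 1).to(torch.float64)
```

Here a64.is_leaf is False, while model.weight.is_leaf stays True after the conversion.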
Btw, u r everywhere in the forum, maimeng is shameful😏
I really am everywhere, hahaha
But it is really hard to understand the graph
With the code
dataiter = iter(test_loader)
images, labels = next(dataiter)  # recent PyTorch versions no longer provide dataiter.next()
print(images[0].requires_grad)
print(labels.requires_grad)
print(images[0].is_leaf)
print(labels[0].is_leaf)
We get the output
False
False
True
True
As we know, we never compute d(loss)/d(image) or d(loss)/d(label). So do we only retain the grad of leaf nodes with requires_grad=True?
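Yes, only leaf nodes with requires_grad=True retain their grad.
But no, it is not true that d(loss)/d(image) is never computed. The gradient w.r.t. the data is useful for generating adversarial examples, for attack/defense or domain generalization.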
https://arxiv.org/abs/1412.6572
https://arxiv.org/abs/1804.10745
I’ve never heard of any application of the gradient w.r.t. the label. But its meaning is clear: the most dissimilar label direction for the example, as learned by the current model. Any reference on this would be appreciated.
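As an illustration of using the gradient w.r.t. the data, here is a minimal sketch of the fast gradient sign method from the first paper linked above. The model, the random "images", the labels and the value of epsilon are all hypothetical placeholders, not anything from this thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical classifier and random data shaped like MNIST.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()
images = torch.rand(4, 1, 28, 28)
labels = torch.tensor([3, 1, 4, 1])

images.requires_grad_(True)              # ask autograd for d(loss)/d(images)
loss = loss_fn(model(images), labels)
loss.backward()

epsilon = 0.1                            # perturbation size, chosen arbitrarily
# Step in the direction that increases the loss, then clamp back to valid pixel values.
adv_images = (images + epsilon * images.grad.sign()).clamp(0, 1).detach()
```

The perturbed batch adv_images differs from images by at most epsilon per pixel and carries no autograd history.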
With the code
dataiter = iter(test_loader)
images, labels = next(dataiter)  # recent PyTorch versions no longer provide dataiter.next()
images.requires_grad_(True)
output1 = Mnist_Classifier(images)
loss = loss_fn(output1, labels)
loss.backward()
print(images.requires_grad)
print(labels.requires_grad)
print(images.is_leaf) #1
print(labels[0].is_leaf)
The output is
True
False
True
True
And this is what I expect.
But if I change the line marked #1 to print(images[0].is_leaf), the corresponding output becomes False. Why?
Slicing is also an operation that creates a new node, so a[0] creates a new node. You can print its grad_fn, which will look like <SelectBackward at 0xffffffff>.
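A small sketch of this, using a random tensor in place of the real images: the slice is a non-leaf with a grad_fn, and a backward pass through it sends the gradient back to the original leaf.

```python
import torch

images = torch.rand(4, 1, 28, 28, requires_grad=True)   # stand-in for a real batch
first = images[0]          # indexing is an op recorded by autograd

print(first.is_leaf)       # False
print(first.grad_fn)       # a Select backward node; the exact name varies by version

first.sum().backward()     # the gradient flows through the slice back to images
```

After the backward pass, images.grad is 1 for the first image and 0 for the others, because only images[0] took part in the computation.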
But why
print(labels[0].is_leaf)
print(labels[0].grad_fn)
get
True
None
The results for labels and images seem different.
Do the different dimensions of images and labels make them behave differently?
Hi,
Because labels[0] creates a brand-new Tensor that happens to be a leaf. That said, this newly created Tensor has not been used in any computation yet (and never will be, since you did not save it; it was destroyed right after the print). The second print creates another such leaf Tensor, which is brand new as well, so its .grad field is None, as for any newly created Tensor.
As a complement: by convention, all Tensors that have requires_grad set to False are leaf Tensors.
https://pytorch.org/docs/stable/autograd.html#torch.Tensor.is_leaf
Thanks for your explanation.
But why do images and labels behave differently? Why does labels[0] create a new Tensor instead of doing a selection?
They both create a new one.
As @Weifeng mentioned just above, the difference comes from the fact that one requires gradients (so after doing an op on it, the result is not a leaf anymore) and the other does not (so even after doing an op on it, the result is still a leaf).
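The same effect can be reproduced on a single tensor by toggling requires_grad, which suggests the dimensions play no role. This is a minimal sketch with an arbitrary random tensor `t`:

```python
import torch

t = torch.rand(4, 3)                     # requires_grad=False, like labels

# Indexing a tensor that does not require grad: the result is a leaf with no grad_fn.
leaf_before = t[0].is_leaf
grad_fn_before = t[0].grad_fn

t.requires_grad_(True)                   # now behaves like images

# Indexing the same tensor once it requires grad: the result is a non-leaf.
leaf_after = t[0].is_leaf
grad_fn_after = t[0].grad_fn

# Each indexing call returns a new Python object in both cases.
distinct_objects = t[0] is not t[0]
```

Here leaf_before is True with grad_fn_before equal to None, while leaf_after is False and grad_fn_after is a backward node.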
I really don’t get this retain_graph parameter… This tutorial doesn’t use it. Also, loss.backward() is called inside the mini-batch loop, so its “normal” behavior, as previously pointed out, is (or should be) to retain the graph!
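One way to see why a typical mini-batch loop works without retain_graph: every iteration runs a new forward pass, which builds a new graph, and backward() frees only that graph. The sketch below uses a hypothetical linear model and random batches, not the tutorial's code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(5, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
# Three random mini-batches standing in for a DataLoader.
batches = [(torch.randn(8, 5), torch.randn(8, 1)) for _ in range(3)]

losses = []
for inputs, targets in batches:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)   # the forward pass builds a fresh graph
    loss.backward()                          # backward frees the buffers of this graph only
    optimizer.step()
    losses.append(loss.item())
```

retain_graph=True is only needed when backward() is called more than once on the same graph, that is, without a new forward pass in between.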