I've never heard of any application of the gradient w.r.t. the label. But its meaning is clear: the direction in label space that is most dissimilar to the example, as learned by the current model. Any reference on this would be appreciated.
Slicing is also an operation that creates a new node, so a[0] creates a new node. You can print its grad_fn; it will look like <SelectBackward at 0xffffffff>.
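A minimal sketch of this (the exact repr of grad_fn varies by PyTorch version, e.g. `SelectBackward0` in recent releases):

```python
import torch

# A tensor that requires grad; indexing it is an autograd operation.
a = torch.randn(3, requires_grad=True)
b = a[0]          # slicing/indexing creates a new node in the graph
print(b.grad_fn)  # e.g. <SelectBackward0 object at 0x...>
print(b.is_leaf)  # False: b was produced by an operation
print(a.is_leaf)  # True: a was created directly by the user
```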
Because labels[0] creates a brand new Tensor that happens to be a leaf. That is, this newly created Tensor has not been used in any computation yet (and never will be, since you did not save it; it gets destroyed just after the print). The second print creates another such new leaf Tensor; it is brand new as well, so its .grad field is None, as for any newly created Tensor.
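This can be checked directly. Assuming `labels` is an ordinary tensor with `requires_grad=False`, each indexing returns a fresh Tensor object whose `.grad` is None:

```python
import torch

labels = torch.tensor([1.0, 2.0])  # requires_grad=False by default
x = labels[0]
y = labels[0]
print(x is y)     # False: each indexing returns a fresh Tensor object
print(x.grad)     # None: this Tensor was never part of a backward pass
print(x.is_leaf)  # True: requires_grad is False, so it counts as a leaf
```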
They both create a new one.
As @Weifeng mentioned just above, the difference comes from the fact that one requires gradients (so after doing an op on it, the result is not a leaf anymore) and one does not (so even after doing an op on it, the result is still a leaf).
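A short illustration of that difference, using two fresh tensors:

```python
import torch

t1 = torch.ones(3, requires_grad=True)
t2 = torch.ones(3)  # requires_grad=False

# Op on a grad-requiring tensor: result is part of the graph, not a leaf.
print((t1 * 2).is_leaf)  # False
# Op on a tensor that does not require grad: result is still a leaf.
print((t2 * 2).is_leaf)  # True
```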
I really don't get this retain_graph parameter... this tutorial doesn't use it; also, loss.backward() is used inside the mini-batch loop, so its "normal" behavior, as previously pointed out, is (or should be) to "retain_graph"!!
the tutorial I linked to does not use retain_graph=True in loss.backward()... loss.backward() is called repeatedly inside the for loop, so the graph is "retained" without needing retain_graph=True... it appears that there are situations when this parameter is needed, and I'm not quite sure what those situations are.
It is not specified because every iteration of the loop recreates the graph and backwards through it only once. This is what should happen when you use PyTorch (unless you are doing more complex things that require backwarding through a graph twice).
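A minimal sketch of both situations, with a toy scalar loss standing in for a real model:

```python
import torch

w = torch.randn(1, requires_grad=True)

# Typical training loop: each iteration builds a fresh graph,
# backwards through it once, and the graph is freed afterwards.
for _ in range(3):
    loss = (w * 2).sum()
    loss.backward()  # fine: this iteration's graph is brand new

# Backwarding twice through the SAME graph needs retain_graph=True:
loss = (w * 2).sum()
loss.backward(retain_graph=True)  # keep the buffers for a second pass
loss.backward()                   # would raise an error without retain_graph above
```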
my loop (the mini-batch loop) reads previously computed representations... this is pretty much standard in metric-learning approaches. The representations are computed before entering the mini-batch loop, which only operates on representation distances... in fact, even the distances are computed outside the mini-batch loop. The representations have requires_grad=True, so I just assumed that would be sufficient... I understand what you're saying, though; I would really appreciate a link to more info about "recreating the graph" in every iteration of the for loop... I did not realize that; I think I missed some required reading or tutorial on that.
The thing is that as soon as your model parameters change, you need to recompute the forward pass.
This is a big difference from static-graph frameworks like TensorFlow, where you define the graph once and then ask for gradients.
In PyTorch, you call forward every time you have a new input, and then backward on your newly computed loss to get gradients.
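A sketch of that pattern, with a hypothetical linear model and random data standing in for a real dataset:

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(5):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)  # forward: builds a fresh graph
    loss.backward()                                   # backward: consumes and frees it
    opt.step()  # parameters changed, so next iteration must run forward again
```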
By representations, did you mean the embeddings? It seems that every mini-batch takes part of these embeddings and computes a batch loss? It's OK to create and free the graph in every mini-batch in this case, because the input differs between iterations.
Some code or pseudo-code is helpful to make the situation more clear.
I have read the example code from the docs and have a confusion about the is_leaf and requires_grad attributes.
If a user creates a tensor, then whatever its requires_grad is, is_leaf = True as long as no other operation has been applied to it (correct me if I'm wrong), but what confuses me is:
So it's just a convention.
It's not documented why. As I understand it, the computation graph is meaningful only for nodes with requires_grad and their neighbors, which are involved in computing the gradients. So, taking into account only the involved nodes, all tensors with requires_grad=False can be considered leaf nodes.
Node a* with requires_grad=True
It's just for understanding; it may not correspond exactly to the term "graph" in PyTorch.
The labels' requires_grad is False, so they are leaf nodes. The images' requires_grad is True, but their grad_fn is None, so they are not the result of some operation, and thus they are leaf nodes.
1. But why is labels[0].grad_fn None while that of images[0] is a selection?
As you mentioned
Because labels[0] creates a brand new Tensor that happens to be a leaf.
labels[0] is created by us, but why is images[0] the result of a selection operation?
labels[0] is also the result of a selection operation. It differs from images[0] in that labels[0].requires_grad == False, inherited from labels, so it's a "leaf".
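This can be verified with hypothetical stand-ins for the images and labels (assuming images require grad and labels do not, as in the discussion above):

```python
import torch

images = torch.randn(2, 3, requires_grad=True)
labels = torch.tensor([0, 1])  # requires_grad=False

print(images[0].grad_fn)  # e.g. <SelectBackward0 ...>: the selection is recorded
print(images[0].is_leaf)  # False: result of an op on a grad-requiring tensor
print(labels[0].grad_fn)  # None: no grad required anywhere, nothing recorded
print(labels[0].is_leaf)  # True, by the requires_grad=False convention
```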
Maybe you can get a more thorough introduction from this article.