Exclude a variable from being changed by a computation

Hi, this is a beginner question and probably really quickly answered, but I could not find the answer online; I probably don't know the proper vocabulary to search for what I am looking for.

I am trying to save the gradients into the variable “grads”, as in the code below.

grads = [0, 0]
with torch.no_grad():
    for idx, p in enumerate(model.parameters()):
        # save current gradients
        grads[idx] = p.grad
        p.sub_(lr * p.grad)
        p.grad.zero_()

However, p.grad.zero_() sets not only p.grad but also grads to zero.

I understand that it has to do with the way these two variables are connected through a graph. But how do I stop p.grad.zero_() from also setting grads to zero?

I tried using detach, but that did not work.

grads = [0, 0]
with torch.no_grad():
    for idx, p in enumerate(model.parameters()):
        # save current gradients
        grads[idx] = p.grad.detach()
        p.sub_(lr * p.grad)
        p.grad.zero_()

Both variables point to the same memory address; you have to make a deep copy of p.grad so that your saved copy is not modified. You can do that with the clone() tensor method.
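
For example, a minimal sketch of that loop with clone() (assuming model and lr are defined as in your snippet, and that the model has exactly two parameter tensors to match grads = [0, 0]):

grads = [0, 0]
with torch.no_grad():
    for idx, p in enumerate(model.parameters()):
        # clone() copies the gradient values into new memory,
        # so zeroing p.grad afterwards does not touch grads[idx]
        grads[idx] = p.grad.clone()
        p.sub_(lr * p.grad)
        p.grad.zero_()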


Hi, thanks for the quick reply. It worked, yay!

However, I don't understand why.

In this post, the difference between clone and detach is explained as follows:

You should use detach() when attempting to remove a tensor from a computation graph, and clone as a way to copy the tensor while still keeping the copy as a part of the computation graph it came from.

Maybe I am confusing the computational graph and the memory address. In my case, do I want to remove the grads object from my graph or not?

Hi,
As far as I understand, you want to save gradients.
That's not really about computational graphs or PyTorch; it's a Python thing.

I told you to use clone on the gradients because gradients are tensors, so you can use tensor methods. The reason is that clone does a kind of deep copy of the tensor (you can google Python's deep copy vs. shallow copy).

In short, when you append a tensor to a list, that element of the list keeps pointing to the original tensor; if the original tensor is modified in place, the one in the list changes too.
(Example here copied from https://stackoverflow.com/questions/17873384/how-to-deep-copy-a-list)

>>> a = [[1, 2, 3], [4, 5, 6]]
>>> b = list(a)
>>> a
[[1, 2, 3], [4, 5, 6]]
>>> b
[[1, 2, 3], [4, 5, 6]]
>>> a[0][1] = 10
>>> a
[[1, 10, 3], [4, 5, 6]]
>>> b   # b changes too -> Not a deepcopy.
[[1, 10, 3], [4, 5, 6]]
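
The same thing happens with tensors. Here is a small sketch with a made-up tensor g standing in for p.grad:

>>> import torch
>>> grads = [0, 0]
>>> g = torch.tensor([1., 2., 3.])
>>> grads[0] = g            # the list entry is the same tensor object
>>> grads[1] = g.clone()    # independent copy in new memory
>>> g.zero_()
tensor([0., 0., 0.])
>>> grads[0]                # follows the in-place change
tensor([0., 0., 0.])
>>> grads[1]                # the clone is unaffected
tensor([1., 2., 3.])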

About the computational graph: it's a really long story. In short, when you detach a tensor you break the computational graph, whereas when you clone a tensor you create a “new branch” in the graph.
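
To make that concrete, here is a tiny sketch with toy tensors (not your model) showing that a clone stays connected to the graph while a detached tensor does not:

>>> import torch
>>> x = torch.tensor([1., 2.], requires_grad=True)
>>> y = x * 3
>>> d = y.detach()          # cut off from the graph
>>> c = y.clone()           # new memory, but still connected to x
>>> c.sum().backward()
>>> x.grad                  # gradient flowed back through the clone
tensor([3., 3.])
>>> d.requires_grad         # the detached tensor carries no history
False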

Feel free to keep asking


Oh okay, thanks for the explanation. I only recently switched to Python and did not know about that behaviour, so I thought it was something unique to the computational graph of PyTorch.

I will read into it and might get back if more questions arise.