Why is the clone operation part of the computation graph? Is it even differentiable?

I saw that .clone() was part of the computation graph and thought it was really weird. Why is that?

I wanted to know:

  1. What it even means to take derivatives with respect to such an operation (I thought it shouldn’t be defined).
  2. Why it is part of the computation graph if one can’t take derivatives with respect to clone.

Look, it’s in the graph:

import torch
from torchviz import make_dot

x = torch.randn(1, requires_grad=True)

temp = x**2
y = temp.clone()
print(f'y.is_leaf = {y.is_leaf}')
print(f'y.requires_grad = {y.requires_grad}')
temp = temp + y

print(temp)

temp.backward()

print(f'x.grad = {x.grad}')
print(f'y.grad = {y.grad}')  # y is not a leaf, so y.grad is None here

# the rendered graph contains a CloneBackward node
make_dot(temp)

Hi,

The clone function is like an identity, f(x) = x, so you can take its derivative.
Were you expecting something else?
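For example, a quick sketch to see this for yourself: since clone acts as the identity, the gradient that flows back through it is just 1.

import torch

x = torch.randn(1, requires_grad=True)
y = x.clone()   # identity in value, but backed by new memory
y.backward()    # gradient of the identity is 1
print(x.grad)   # tensor([1.])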

I guess because we are creating an actual new tensor (i.e. we are using a clone operation), I expected the new tensor to be a leaf (or something “new”). So it seemed odd to me that it would be treated as the identity function, because those two operations don’t seem the same to me… but I guess clone is just the identity? What’s the point of clone then?

Clone is an identity with new memory, in the same way that new = t.view_as(t) would be an identity with the exact same memory.
The main use of clone is to be able to do in-place operations on the result without impacting the original Tensor.
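A small sketch of the memory difference (using a little 1-D tensor t just for illustration):

import torch

t = torch.zeros(3)

v = t.view_as(t)   # identity, same memory: writing into v changes t
c = t.clone()      # identity, new memory: writing into c leaves t alone

v[0] = 1.0
c[1] = 2.0
print(t)  # tensor([1., 0., 0.])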

Note that in PyTorch, all functions can be differentiated and will give you gradients. The only exceptions are .detach(), which is defined as setting all gradients to 0, and the ops inside a torch.no_grad() block, which are not tracked.
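A quick illustration of those two exceptions:

import torch

x = torch.randn(1, requires_grad=True)

with torch.no_grad():
    y = x * 2           # ops in here are not tracked
print(y.requires_grad)  # False

z = x.detach() * 2      # gradient flow is cut at detach()
print(z.requires_grad)  # False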

So is .detach() the correct way to create new “leaf nodes”?

By the way, thanks for the help! :)

Yes. If you want a new Tensor that has no gradient history, you should use detach().
Note that the result of detach() uses the same memory as the original Tensor, so if you plan on modifying it in place, you want to do .clone().detach().
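To make the memory sharing concrete (a small sketch):

import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)

shared = a.detach()         # no grad history, but same memory as a
copy = a.clone().detach()   # no grad history and its own memory

shared[0] = 10.0            # this write also changes a
copy[1] = 20.0              # this one leaves a alone
print(a)  # tensor([10.,  2.], requires_grad=True)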

I plan to create a completely separate computation graph (and I don’t want the calls to backward to interfere with each other). For that I am setting requires_grad = True immediately after .detach(). Is that the right thing to do?

e.g.

        # wt, x and y are tensors defined earlier in my code
        wt_new = wt.clone().detach()
        wt_new.requires_grad = True
        l = (wt_new * x - y)**2
        print(f'l = {l}')
        l.backward()

If you want a different graph, you can use .detach().
In your example, since you don’t modify wt_new in place, you don’t need the .clone().

However, the original wt is going to collect gradients with respect to the original graph. So if I only call .detach(), wouldn’t it collect the gradients for both graphs in the same tensor? That’s definitely not what I want: I want separate gradients (or at least that’s my rationale for calling .clone() first and then .detach()). What are your thoughts, master albanD?

In PyTorch, being a different Tensor and having different memory are two different things.
When you do b = a.detach(), a and b are two completely different Tensors that look at the same memory, just like b = a.view(-1) gives you two different Tensors that look at the same memory.

The in-place version (which modifies the Tensor itself) is a.detach_(), or b = a.detach_(). If you do this, then a and b actually point to the exact same Python object (check that id(a) == id(b)) and the same Tensor.
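A small check of that (a sketch; data_ptr() exposes the underlying memory address):

import torch

x = torch.randn(3, requires_grad=True)
a = x * 2             # some non-leaf Tensor in the graph

b = a.detach()        # new Tensor object, same underlying memory
print(b is a)                        # False
print(b.data_ptr() == a.data_ptr())  # True

c = a.detach_()       # in place: detaches a itself and returns it
print(c is a)                        # True, the exact same Python object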

That was really helpful. Thank you.

I guess, to answer my own question, it seems I do not need to call clone. The new tensor gets its own storage for its gradient automatically (without interference, so it doesn’t accumulate the gradients of both graphs in the same place):

import torch

a = torch.tensor([2.0], requires_grad=True)
b = a.detach()          # same memory as a, but no gradient history
b.requires_grad = True  # b now starts its own, separate graph

la = (5.0 - a)**2
la.backward()
print(f'a.grad = {a.grad}')

lb = (6.0 - b)**2
lb.backward()
print(f'b.grad = {b.grad}')

result:

a.grad = tensor([-6.])
b.grad = tensor([-8.])

Of course you guys would have thought of a good implementation of this! Not surprised! :)

So is there ever a case where one needs .clone()?

I can’t think of one, except making two variables that point to different memory and then doing in-place ops separately on each one, or something like that… which seems dangerous…

There are a few.
For example, if you want to save the current state of the weights of your net, you want to clone them, because the optimizer update works in place, so your saved copy would change along with your network if you didn’t clone.
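A sketch of that weight-saving pattern (the model, optimizer and variable names here are just for illustration):

import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Without the clone, the saved tensors would share memory with the
# parameters and be updated in place by opt.step().
snapshot = {name: p.clone().detach() for name, p in model.named_parameters()}

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

# The snapshot keeps the old values while the model has moved on.
print(torch.equal(snapshot['weight'], model.weight.detach()))  # False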

My earlier post checked that two tensors really are detached by comparing gradients. Is there an internal flag to check something like this, i.e. a nicer way to check that b is detached?

Detached from what?
If you just created b, you can check whether it requires gradients or not.
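For instance (a minimal sketch of that check):

import torch

a = torch.tensor([2.0], requires_grad=True)
b = a.detach()
print(b.requires_grad)  # False: right after detach(), b carries no history
print(b.is_leaf)        # True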

That’s not enough, because I am setting requires_grad to True myself afterwards. See the sample script below (in it I check that the tensors really are detached and form separate graphs by computing gradients that I can verify by hand, but that is harder to check in a complicated net):

import torch

a = torch.tensor([2.0], requires_grad=True)
b = a.detach()
b.requires_grad = True

la = (5.0 - a)**2
la.backward()
print(f'a.grad = {a.grad}')

lb = (6.0 - b)**2
lb.backward()
print(f'b.grad = {b.grad}')

result:

a.grad = tensor([-6.])
b.grad = tensor([-8.])