Why is Tensor.clone() called clone and not copy?

11158 · January 11, 2019, 1:12pm

In the much older NumPy library, the method that copies an ndarray is called copy. Why is the same method called clone in torch? Is there a specific reason?

albanD · January 11, 2019, 2:05pm

Hi,

I think this is mostly for historical reasons. In particular, copy (now copy_) was used a long time ago to copy into an existing tensor, while clone is used to create a new, identical clone of a given Tensor.
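A minimal sketch of that distinction, assuming a recent PyTorch: clone() returns a new tensor with its own storage, while copy_() (the trailing underscore marks an in-place op) writes into a tensor that already exists and returns that same tensor. The variable names are made up for illustration.

```python
import torch

src = torch.tensor([1.0, 2.0, 3.0])

# clone(): allocates a new tensor holding the same values
cloned = src.clone()

# copy_(): writes src's values into an existing tensor, in place
dst = torch.zeros(3)
returned = dst.copy_(src)

# The clone has its own storage, so mutating it leaves src unchanged
cloned[0] = 100.0
print(src)  # tensor([1., 2., 3.])

# copy_ returns the destination tensor itself (same underlying storage)
print(returned.data_ptr() == dst.data_ptr())  # True
print(dst)  # tensor([1., 2., 3.])
```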

Sybil · October 11, 2020, 3:49am

Hi, albanD.
Could you explain the difference between b = a.clone() and b.copy_(a)?

The docs said that

Unlike copy_(), clone() is recorded in the computation graph. Gradients propagating to the cloned tensor will propagate to the original tensor.

However, in the example below, the gradient was also backpropagated to the original tensor:

>>> x = torch.randn(2,2,requires_grad=True)
>>> x
tensor([[0.5113, 0.3028],
        [0.7036, 1.4417]], requires_grad=True)
>>> x.grad
>>> y = x*2+3
>>> y_copy = torch.zeros_like(y)
>>> y_copy.copy_(y)
tensor([[4.0227, 3.6057],
        [4.4072, 5.8835]], grad_fn=<CopyBackwards>)
>>> z = y_copy*3+3
>>> z
tensor([[15.0681, 13.8171],
        [16.2216, 20.6504]], grad_fn=<AddBackward0>)
>>> loss=torch.sum(z-15)
>>> loss.backward()
>>> x.grad
tensor([[6., 6.],
        [6., 6.]])

And the y_clone = y.clone() operation showed the same behavior. Could you explain the difference?
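For reference, here is a self-contained version of that comparison, assuming a recent PyTorch. It runs the same pipeline once with clone() and once with zeros_like(...).copy_(...). The helper grad_through is a hypothetical name introduced only for this sketch, and torch.manual_seed(0) just makes both runs start from the same x.

```python
import torch

def grad_through(op):
    """Hypothetical helper: run the pipeline above, building y_copy with `op`, and return x.grad."""
    torch.manual_seed(0)
    x = torch.randn(2, 2, requires_grad=True)
    y = x * 2 + 3
    y_copy = op(y)
    z = y_copy * 3 + 3
    loss = torch.sum(z - 15)
    loss.backward()
    return x.grad

via_clone = grad_through(lambda y: y.clone())
via_copy = grad_through(lambda y: torch.zeros_like(y).copy_(y))

# Both routes backpropagate to x; every entry is 2 * 3 = 6
print(via_clone)
print(via_copy)
```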

albanD · October 11, 2020, 4:59pm

Hi,

I think the doc here is most likely misleading. The master doc has been updated and is clearer: https://pytorch.org/docs/master/generated/torch.clone.html?highlight=clone#torch.clone

The difference is that if you use copy_, the original values of the tensor you copy into won’t get gradients, since they are overwritten. With clone, there is no original value, so this issue does not arise.

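import torch

x = torch.rand(10, requires_grad=True)
y = torch.rand(10, requires_grad=True)

# clone() first so that the in-place copy_ does not overwrite y itself
res = y.clone().copy_(x)
res.sum().backward()
assert (y.grad == 0).all()  # y's overwritten values get zero gradient
assert (x.grad == 1).all()  # the copied-in values receive the gradient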

Sybil · October 18, 2020, 8:30am

Hi, thank you for your reply, but, sorry, I still didn’t get it.
Please take a look at following examples:

>>> x = torch.randn(2,2,requires_grad=True)
>>> y = x.clone()
>>> res=y.sum()
>>> res.backward()
>>> y.grad
>>> x.grad
tensor([[1., 1.],
        [1., 1.]])

>>> x = torch.randn(2,2,requires_grad=True)
>>> y = torch.randn(2,2)
>>> y.copy_(x)
tensor([[ 0.4119, -0.7538],
        [-0.3020, -0.6225]], grad_fn=<CopyBackwards>)
>>> res = y.sum()
>>> res.backward()
>>> y.grad
>>> x.grad
tensor([[1., 1.],
        [1., 1.]])

As I see it, the two operations behave the same if used separately.

In your example, there were three tensors (x, y, and res) and you used .clone().copy_(x) together. To be honest, that confused me more. Could you explain why you used them together?

And what’s the difference between y.grad printing nothing and y.grad printing a tensor of zeros?

Thank you for your time!

albanD · October 19, 2020, 3:42pm

Hi,

You should ignore the note in the old doc as I think it is just confusing.
Both actually propagate gradients.

In my example, I use clone to avoid modifying the original Tensor, because the copy is done in place.
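To see why the clone matters there, consider a minimal sketch, assuming a recent PyTorch. Copying in place directly into a leaf tensor that requires grad is rejected by autograd with a RuntimeError, whereas cloning first gives copy_ a fresh, non-leaf destination and leaves y untouched. The names y_before and raised are illustrative only.

```python
import torch

x = torch.rand(10, requires_grad=True)
y = torch.rand(10, requires_grad=True)
y_before = y.detach().clone()  # snapshot of y's values for comparison

# Copying straight into y: y is a leaf that requires grad, so autograd
# refuses the in-place operation
try:
    y.copy_(x)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Cloning first: copy_ writes into the clone, and y keeps its values
res = y.clone().copy_(x)
print(raised)                    # True
print(torch.equal(y, y_before))  # True
print(torch.equal(res, x))       # True
```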

A gradient can be None for a few reasons: the Tensor does not require gradients, it is not a leaf Tensor, or it is independent of the output that you called backward on.
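The cases above can be reproduced side by side in a minimal sketch, assuming a recent PyTorch. It also covers the zero-gradient case from the question: a leaf that is part of the graph but whose values were overwritten by copy_ receives a gradient of zeros rather than None. All variable names are made up for illustration; reading .grad on a non-leaf also emits a warning.

```python
import torch

x = torch.randn(3, requires_grad=True)            # leaf, requires grad
no_grad = torch.randn(3)                           # leaf, does NOT require grad
unused = torch.randn(3, requires_grad=True)        # leaf, requires grad, never used below
overwritten = torch.randn(3, requires_grad=True)   # leaf whose values get overwritten

hidden = x * 2  # non-leaf: .grad is not populated unless retain_grad() is called
res = overwritten.clone().copy_(hidden) + no_grad
res.sum().backward()

print(x.grad)            # tensor([2., 2., 2.])
print(no_grad.grad)      # None: does not require gradients
print(hidden.grad)       # None (with a warning): not a leaf
print(unused.grad)       # None: independent of the output
print(overwritten.grad)  # tensor([0., 0., 0.]): in the graph, but overwritten
```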