# Why is Tensor.clone() called clone and not copy?

In the much older NumPy library, the method that copies an ndarray is called copy. Why is the same method in torch called clone? Is there a specific reason?

Hi,

I think this is mostly for historical reasons. In particular, `copy` (now `copy_`) was used a long time ago to copy values into an existing tensor, while `clone` is used to create an identical copy of a given Tensor.


Hi, albanD.
Could you explain the difference between `b = a.clone()` and `b.copy_(a)`?

The docs say:

> Unlike copy_(), clone() is recorded in the computation graph. Gradients propagating to the cloned tensor will propagate to the original tensor.

However, in the example below, the gradient was also backpropagated to the original tensor:

```python
>>> x = torch.randn(2, 2, requires_grad=True)
>>> x
tensor([[0.5113, 0.3028],
        ...
>>> y = x*2 + 3
>>> y_copy = torch.zeros_like(y)
>>> y_copy.copy_(y)
tensor([[4.0227, 3.6057],
        ...
>>> z = y_copy*3 + 3
>>> z
tensor([[15.0681, 13.8171],
        ...
>>> loss = torch.sum(z - 15)
>>> loss.backward()
>>> x.grad
tensor([[6., 6.],
        [6., 6.]])
```

And the `y_clone = y.clone()` operation showed the same behavior. Could you explain the difference?

Hi,

I think this is most likely a misleading doc. The master doc has been updated and is clearer: https://pytorch.org/docs/master/generated/torch.clone.html?highlight=clone#torch.clone

The difference is that with `copy_`, the destination tensor's original values won't get gradients: they are overwritten, so the loss never depends on them. With `clone`, there is no pre-existing destination, so this issue does not arise.

```python
x = torch.rand(10)  # x was not defined in the original snippet; assuming a plain source tensor
y = torch.rand(10, requires_grad=True)

res = y.clone().copy_(x)
res.sum().backward()
```
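To make the contrast concrete, here is a small sketch (my own, not from the docs) comparing the clone-then-`copy_` path with a plain `clone`:

```python
import torch

y = torch.rand(10, requires_grad=True)

# Path 1: clone, then overwrite with copy_. The overwritten values
# never reach the loss, so the gradient with respect to y is zero.
res = y.clone().copy_(torch.rand(10))
res.sum().backward()
print(y.grad)  # tensor of zeros

# Path 2: plain clone. Gradients flow back through the clone to y.
y.grad = None  # reset the accumulated gradient
res = y.clone()
res.sum().backward()
print(y.grad)  # tensor of ones
```

So `y` always ends up with a `.grad`, but only the plain clone lets the loss actually depend on `y`'s values.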

Hi, thank you for your reply, but sorry, I still don't get it.
Please take a look at the following examples:

```python
>>> x = torch.randn(2, 2, requires_grad=True)
>>> y = x.clone()
>>> res = y.sum()
>>> res.backward()
>>> x.grad
tensor([[1., 1.],
        [1., 1.]])
```
```python
>>> x = torch.randn(2, 2, requires_grad=True)
>>> y = torch.randn(2, 2)
>>> y.copy_(x)
tensor([[ 0.4119, -0.7538],
        ...
>>> res = y.sum()
>>> res.backward()
>>> x.grad
tensor([[1., 1.],
        [1., 1.]])
```

As I see it, the two operations behave the same if used separately.
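They do look the same from `x`'s point of view: both `clone` and `copy_` propagate gradients to the source. As I understand the updated docs, the difference concerns the destination's original values. A sketch (mine, not from this thread) where the destination itself requires grad:

```python
import torch

x = torch.randn(2, 2, requires_grad=True)  # source
y = torch.randn(2, 2, requires_grad=True)  # destination's "original values"

z = y.clone().copy_(x)  # z first depends on y, then is overwritten by x
z.sum().backward()

print(x.grad)  # ones: gradient flows through copy_ to the source x
print(y.grad)  # zeros: y's values were overwritten, so d(loss)/dy = 0
```

In the snippets above, `y` did not require grad, so this asymmetry was invisible.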

In your example, there were three tensors (x, y, and res) and you used `.clone().copy_(x)` together. To be honest, that made me more confused. Could you explain why you used them together?

And what's the difference between `y.grad` printing nothing at all and `y.grad` printing a tensor of zeros?
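If it helps, my understanding of that last question (a sketch, not an authoritative answer): `.grad` is `None` when a tensor never participated in the backward pass at all, while a tensor of zeros means it was in the graph but the loss did not depend on its values:

```python
import torch

a = torch.randn(2, 2, requires_grad=True)
b = torch.randn(2, 2, requires_grad=True)

a.sum().backward()  # b never enters the graph
print(a.grad)       # tensor of ones
print(b.grad)       # None: .grad was never populated

# A tensor of zeros means the tensor *was* part of the graph, but the
# loss does not depend on its values, e.g. after copy_ overwrote them:
c = torch.randn(2, 2, requires_grad=True)
out = c.clone().copy_(torch.zeros(2, 2))
out.sum().backward()
print(c.grad)       # tensor of zeros
```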