Hi, albanD.
Could you explain the difference between b = a.clone() and b.copy_(a)?
The docs say:
Unlike copy_(), clone() is recorded in the computation graph. Gradients propagating to the cloned tensor will propagate to the original tensor.
However, in the example below, the gradient was also backpropagated to the original tensor:
>>> x = torch.randn(2,2,requires_grad=True)
>>> x
tensor([[0.5113, 0.3028],
        [0.7036, 1.4417]], requires_grad=True)
>>> x.grad
>>> y = x*2+3
>>> y_copy = torch.zeros_like(y)
>>> y_copy.copy_(y)
tensor([[4.0227, 3.6057],
        [4.4072, 5.8835]], grad_fn=<CopyBackwards>)
>>> z = y_copy*3+3
>>> z
tensor([[15.0681, 13.8171],
        [16.2216, 20.6504]], grad_fn=<AddBackward0>)
>>> loss=torch.sum(z-15)
>>> loss.backward()
>>> x.grad
tensor([[6., 6.],
        [6., 6.]])
Replacing copy_() with y_clone = y.clone() gave exactly the same gradients. Could you explain the difference?
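
For reference, here is a minimal side-by-side sketch of the two paths as a script rather than a REPL session. The names grad_via_copy and grad_via_clone are mine, added only for the comparison, and I am assuming a PyTorch version where x.grad can be reset by assigning None. It only shows that both paths produce the same gradient on x; it does not explain why the docs word it the way they do.

```python
import torch

x = torch.randn(2, 2, requires_grad=True)

# Path 1: copy_() into a pre-allocated tensor, as in the session above.
y = x * 2 + 3
y_copy = torch.zeros_like(y)
y_copy.copy_(y)
loss = torch.sum((y_copy * 3 + 3) - 15)
loss.backward()
grad_via_copy = x.grad.clone()  # hypothetical name, kept for comparison

# Reset the accumulated gradient before the second run.
x.grad = None

# Path 2: clone() instead of copy_().
y = x * 2 + 3
y_clone = y.clone()
loss = torch.sum((y_clone * 3 + 3) - 15)
loss.backward()
grad_via_clone = x.grad.clone()  # hypothetical name, kept for comparison

# Both paths should give d(loss)/dx = 3 * 2 = 6 for every element.
print(grad_via_copy)
print(grad_via_clone)
print(torch.equal(grad_via_copy, grad_via_clone))
```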