How do you properly clone a tensor without affecting training?

I am trying to create a custom loss function. I have some tensor x and I need to make a duplicate so I can manipulate the values without affecting the original tensor or whatever computation goes on in the background. When I am done manipulating the copy, I apply log_softmax(x_copy), use gather(...) to select the one element in each row that is relevant for my loss, then compute the loss by returning the mean() of those "gathered" elements. Right now, my model is not performing as well as I would expect, so I wanted to make sure that I am duplicating x properly. Here are some methods I have tried:

x_copy = torch.tensor(x)
x_copy = torch.empty_like(x).copy_(x)
x_copy = x.clone().detach()  # This seems to be preferred?
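
For context, here is a rough sketch of how the copy is used (my_loss and target_idx are just placeholder names, and I'm assuming x has shape (batch, num_classes)):

import torch
import torch.nn.functional as F

def my_loss(x, target_idx):
    # target_idx is a placeholder: a (batch,) LongTensor with the relevant column per row
    x_copy = x.clone()                 # <- the duplication step I'm asking about
    # ... manipulate x_copy here ...
    log_p = F.log_softmax(x_copy, dim=1)
    picked = log_p.gather(1, target_idx.unsqueeze(1))  # one element per row
    return picked.mean()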

Someone told me that clone() will destroy the gradients of x (?), and recommended that I instead create a new tensor with the same dimensions as x and copy the values over, so that the two tensors stay entirely separate. Is this recommendation correct, and am I duplicating the tensor properly?

That is not true. Check this example:

import torch

a = torch.tensor(1., requires_grad=True)
b = 2 * a
b.backward()

print(a.grad)   # tensor(2.)

a_ = a.clone()  # cloning a does not touch its gradient

print(a.grad)   # still tensor(2.)

Also, I think you need the backward pass to flow through the tensor that you cloned, so you should not use .detach(): the tensor it returns is cut off from the graph and has requires_grad=False.
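
A quick way to see the difference (just a minimal sketch with a throwaway tensor):

import torch

x = torch.randn(3, requires_grad=True)

# clone(): the copy stays attached to the graph, so gradients flow back to x
y = x.clone()
y.sum().backward()
print(x.grad)           # tensor([1., 1., 1.])

# detach(): the copy is cut off from the graph
x.grad = None
z = x.detach()
print(z.requires_grad)  # False
# z.sum().backward()    # would raise an error: z does not require grad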


copy_ still does backprop from x_copy to x (as you can see from x_copy.grad_fn being CopyBackwards).
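
For example (a minimal check with a random tensor):

import torch

x = torch.randn(4, 5, requires_grad=True)
x_copy = torch.empty_like(x).copy_(x)

print(x_copy.grad_fn)    # <CopyBackwards object at 0x...>
x_copy.sum().backward()
print(x.grad)            # populated, so x_copy is still attached to x's graph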

detach() is the opposite, and probably does what you want: x stays the trainable branch of the graph, while x_copy is an auxiliary branch that is non-trainable w.r.t. x.

clone() in addition to detach() is only necessary if the detached branch does in-place operations on x_copy; otherwise any out-of-place operation, e.g. x.detach() * z, allocates new memory anyway and the initial cloning has no effect.
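
To illustrate the in-place caveat (a sketch with throwaway tensors):

import torch

x = torch.randn(3, requires_grad=True)
d = x.detach()        # shares storage with x
d.zero_()             # in-place: this also zeroes x's data
print(x)              # tensor([0., 0., 0.], requires_grad=True)

x2 = torch.randn(3, requires_grad=True)
d2 = x2.detach().clone()  # clone gives the detached copy its own memory
d2.zero_()                # x2 is untouched
print(x2)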