Does .to(device) detach the tensor?

I’m still not 100% sure about this, and would like to use the following snippet as an example:

import torch
a = torch.randn(4, 5)
a.requires_grad = True  # just to make it more complicated
b = a.to("cuda:0")

Is b detached from a, given that they inherently can’t share the same memory (one is on the CPU and the other on the GPU)? My impression is yes.
If that’s the case, is it safe to use the following code style:
result1 = net1(a.to("cuda:0"))
result2 = net2(a.to("cuda:1"))
so that backprop won’t get messed up, because the two network inputs are both detached from a?

Also, is there any API that I could use to check whether two tensors are *tached*?
Thank you,


Hi,

The Tensor.to() op does not detach, and gradients will flow back to a.

When you do

result1 = net1(a.to("cuda:0"))
result2 = net2(a.to("cuda:1"))

then the gradients from both branches will flow all the way back to a properly.
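
A minimal sketch of that two-branch case (it assumes two CUDA devices are available and stands in plain nn.Linear layers for the unspecified net1/net2):

import torch

a = torch.randn(4, 5, requires_grad=True)    # leaf tensor on the CPU
net1 = torch.nn.Linear(5, 3).to("cuda:0")    # stand-ins for the real networks
net2 = torch.nn.Linear(5, 3).to("cuda:1")

result1 = net1(a.to("cuda:0"))
result2 = net2(a.to("cuda:1"))

result1.sum().backward()    # gradients flow back through .to("cuda:0") to a
result2.sum().backward()    # ...and accumulate with those from the cuda:1 branch
print(a.grad)               # gradients from both branches, on the CPU leaf

The two graphs only share the leaf a, so the two backward() calls accumulate into a.grad without needing retain_graph.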

Also, any API that I could use to check whether two tensors are tached?

I’m afraid there is not.
You can check by side effect with torch.autograd.grad(out, inp): it will try to compute the gradient and raise an error if inp is not attached to out.
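
A small sketch of that trick (the tensors here are just illustrative; the dtype conversion keeps everything on the CPU):

import torch

a = torch.randn(3, requires_grad=True)
x = torch.randn(3, requires_grad=True)

b = a.to(torch.float64)                  # .to() keeps b attached to a
out = (x * 2).sum()                      # attached to x, not to a

print(torch.autograd.grad(b.sum(), a))   # works: a gradient is returned

try:
    torch.autograd.grad(out, a)          # a is not in out's graph
except RuntimeError as e:
    print("not attached:", e)            # "... appears to not have been used in the graph"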


Thanks a lot for this clarification!