Moving tensors between devices

Let’s say we have a simple model like this:

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.param1 = nn.Parameter(torch.rand(3, 3))

    def set_devices(self):
        # intended to put the conv layer and the extra parameter on different GPUs
        self.conv1 = self.conv1.to('cuda:0')
        self.param1 = self.param1.to('cuda:1')
        return self

Then, after creating the model, I call model.set_devices(). Will PyTorch still accumulate gradients in the moved parameters' .grad? If so, why? From what I understand, .to() creates a new tensor that is no longer a leaf tensor; is that correct? Is the behaviour different for nn.Parameter and nn.Module? And what about model = model.set_devices(), does that yield different results?
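For reference, here is the quick check behind my understanding of the leaf issue (a minimal sketch, assuming a CUDA device is available):

import torch
import torch.nn as nn

p = nn.Parameter(torch.rand(3, 3))
moved = p.to('cuda:0')   # the device copy goes through autograd
print(p.is_leaf)         # True: the original Parameter is a leaf
print(moved.is_leaf)     # False: the moved tensor hangs off the copy op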

Thanks in advance for your help

Yes, gradients will still accumulate in the moved parameters' .grad.
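For example, for the conv1 submodule (a minimal sketch, assuming one CUDA device is available):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 32, 3).to('cuda:0')   # nn.Module.to() moves the parameters in place
out = conv(torch.rand(1, 3, 8, 8, device='cuda:0'))
out.sum().backward()
print(conv.weight.is_leaf)            # True: still the same leaf Parameter after the move
print(conv.weight.grad is not None)   # True: gradients accumulate in the moved parameter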

Yes, nn.Module does some peculiar things to manage nn.Parameters and their movement; you can check the source code for the details.

Nope, model = model.set_devices() makes no difference, as the model is never cloned in your code.

PS: in general, self.param.data = self.param.data.to(device) does the trick.
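Applied to your model, that would look roughly like this (just a sketch, assuming two visible CUDA devices):

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.param1 = nn.Parameter(torch.rand(3, 3))

    def set_devices(self):
        self.conv1 = self.conv1.to('cuda:0')              # submodule: moved in place by nn.Module.to()
        self.param1.data = self.param1.data.to('cuda:1')  # swap the Parameter's data, keep the Parameter itself
        return self

model = Model().set_devices()
print(model.param1.is_leaf)   # True: still the same registered leaf Parameter, now on cuda:1

This way self.param1 stays the Parameter that is registered in the module, so any gradients from a backward pass accumulate in model.param1.grad on cuda:1.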

Hmm, but then why does model.to() create a copy? Where in the code does this happen? Looking at nn.Module, the _apply() method should just move the internal parameters and modules in place.

I see, it does do it in place. I guess it returns self just for convenience, so that calls can be chained. Calling model.to() without model = model.to() still works.
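A quick way to see that (a minimal check, assuming a CUDA device is available):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
returned = model.to('cuda:0')    # no reassignment needed
print(returned is model)         # True: .to() returns the very same module object
print(model.weight.device, model.weight.is_leaf)   # cuda:0 True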