Let’s say we have a simple model like this:

```
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.param1 = nn.Parameter(torch.rand(3, 3))

    def set_devices(self):
        self.conv1 = self.conv1.to('cuda:0')
        self.param1 = self.param1.to('cuda:1')
        return self
```

Then, after creating the model, I call `model.set_devices()`.

Is PyTorch going to accumulate gradients in the `.grad` attributes of the moved parameters? If so, why? From what I understand, `.to` creates a new tensor that is no longer a leaf tensor; is that correct? Is the behaviour different for `nn.Parameter` and `nn.Module`? And what about `model = model.set_devices()`? Does that yield different results?
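For reference, here is a quick CPU-only check behind my question. It substitutes a dtype change for the device move, which I assume goes through the same `Tensor.to` autograd path, so this is a sketch of what I think happens, not a definitive statement:

```
import torch
import torch.nn as nn

# Calling Tensor.to on an nn.Parameter returns a NEW tensor.
p = nn.Parameter(torch.rand(3, 3))
q = p.to(torch.float64)

print(p.is_leaf)                    # True  - the original Parameter is a leaf
print(q.is_leaf)                    # False - q records the .to op, has a grad_fn
print(isinstance(q, nn.Parameter))  # False - q is a plain Tensor, not a Parameter

# nn.Module.to, by contrast, converts the module's parameters in place,
# so they remain leaf Parameters afterwards.
m = nn.Conv2d(3, 32, 3)
m.to(torch.float64)
print(m.weight.is_leaf)                    # True
print(isinstance(m.weight, nn.Parameter))  # True
```

If this check is right, then assigning `self.param1 = self.param1.to('cuda:1')` would replace my registered `nn.Parameter` with a plain non-leaf tensor, while `self.conv1.to('cuda:0')` would not have that problem, but I'd like confirmation.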

Thanks in advance for your help