Let’s say we have a simple model like this:
class Model(nn.Module):
def __init__(self):
self.conv1 = nn.Conv2d(3,32, 3)
self.param1 = nn.Parameter(torch.rand(3,3))
def set_devices(self):
self.conv1 = self.conv1.to('cuda:0')
self.param1 = self.param1.to('cuda:1')
return self
Then after the creation of the model I call model.set_devices()
. Is PyTorch going to accumulate the gradients in the new moved parameters .grad
? If so, why? .to will create a new tensor that is no longer a leaf tensor from what I understand, is that correct? Is the behaviour different for nn.Parameter and nn.Module? What about model = model.set_devices()
? Does that yield different results?
Thanks in advance for your help