Let’s say we have a simple model like this:
```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.param1 = nn.Parameter(torch.rand(3, 3))

    def set_devices(self):
        self.conv1 = self.conv1.to('cuda:0')
        self.param1 = self.param1.to('cuda:1')
        return self
```
Then, after creating the model, I call `model.set_devices()`. Will PyTorch still accumulate gradients in the `.grad` attributes of the moved parameters? If so, why? From what I understand, `.to()` creates a new tensor that is no longer a leaf tensor — is that correct? Does the behaviour differ between `nn.Parameter` and `nn.Module`? And does `model = model.set_devices()` yield different results?
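To make the leaf-tensor part of the question concrete, here is a minimal CPU-only sketch of what I mean (I'm using a dtype change as a stand-in for the device move, on the assumption that both go through the same `.to()` path):

```python
import torch
import torch.nn as nn

# A dtype change stands in for the 'cuda:1' move; applied to a tensor that
# requires grad, .to() returns a new, autograd-tracked (non-leaf) tensor.
param = nn.Parameter(torch.rand(3, 3))
moved = param.to(torch.float64)

print(param.is_leaf)              # True: created directly by the user
print(moved.is_leaf)              # False: output of an autograd op on param
print(moved.grad_fn is not None)  # True: gradients flow back to param

# Module.to(), by contrast, converts its parameters in place,
# so they remain registered nn.Parameters and remain leaves.
conv = nn.Conv2d(3, 32, 3).to(torch.float64)
print(conv.weight.is_leaf)                    # True
print(isinstance(conv.weight, nn.Parameter))  # True
```

This is why I'm asking whether `self.param1 = self.param1.to('cuda:1')` ends up accumulating gradients where I expect, while `self.conv1.to('cuda:0')` presumably behaves differently.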
Thanks in advance for your help