Hello,
I am having trouble tracing why model.to(device) is not moving my whole model to the MPS device on an M1 MacBook Pro.
I am creating custom layer blocks to use within an encoder-decoder setting, as follows:
import torch.nn as nn

class Conv2dLayer(nn.Module):
    def __init__(self, nin, nout, kernel_size, stride_sz, pad_sz,
                 activation='relu'):
        super(Conv2dLayer, self).__init__()
        self.activation = activation
        self.conv2d = nn.Conv2d(nin, nout, kernel_size, stride_sz, pad_sz)
        if self.activation == 'relu':
            self.activ = nn.ReLU(inplace=True)
        elif self.activation == 'lrelu':
            self.activ = nn.LeakyReLU(0.2, inplace=True)
        self.bn = nn.BatchNorm2d(nout)

    def forward(self, input):
        x = self.conv2d(input)
        x = self.activ(x)
        x = self.bn(x)
        return x
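For reference, the layer seems self-contained enough to test in isolation. Here is the kind of minimal check I have in mind (a sketch; the channel sizes are arbitrary and it assumes MPS is available):

import torch

# Standalone check: a Conv2d/BatchNorm2d registered as a plain attribute
# should follow the parent module when .to() is called.
layer = Conv2dLayer(nin=3, nout=16, kernel_size=(4, 4),
                    stride_sz=(3, 3), pad_sz=(1, 1), activation='relu')
layer.to(torch.device('mps'))
for name, param in layer.named_parameters():
    print(f"{name} device: {param.device}")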
I have a similar layer built around ConvTranspose2d. These layers populate a conv encoder and a deconv decoder, which in turn make up the autoencoder class. The encoder looks as follows:
class ConvEncoder(nn.Module):
    def __init__(self, encoder_dict):
        super(ConvEncoder, self).__init__()
        self.in_channels = encoder_dict["n_channels"]
        .....
        self.c1 = Conv2dLayer(self.in_channels, self.layer_nodes[0], kernel_size=(4, 4),
                              stride_sz=(3, 3), pad_sz=(1, 1), activation=self.activation)
        .....

    def forward(self, x):
        out1 = self.c1(x)
        .....
        return out
And the final model looks something like this:
class CNN_AE(nn.Module):
    def __init__(self, encoder_dict, decoder_dict):
        super(CNN_AE, self).__init__()
        self.encoder = ConvEncoder(encoder_dict)
        self.decoder = ConvDecoder(decoder_dict)

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
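For completeness, this is roughly how I construct and move the model (a sketch; encoder_dict and decoder_dict come from my config and are elided above):

import torch

device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
model = CNN_AE(encoder_dict, decoder_dict)
model = model.to(device)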
I am getting the following error:
RuntimeError: Input type (MPSFloatType) and weight type (torch.FloatTensor) should be the same
which traces back to the forward method of my custom layer. When I print all the named parameters of the top-level model, I get the expected result: every parameter reports device mps:0. However, when I print the named parameters of the first layer within the encoder like so:
for name, param in self.c1.named_parameters():
    print(f"{name} device: {param.device}")
I get the following result:
conv2d.weight device: cpu
conv2d.bias device: cpu
bn.weight device: cpu
bn.bias device: cpu
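One thing I cannot rule out is when that loop runs: if it executes inside ConvEncoder.__init__, it runs before .to() is ever called, so cpu would be expected at that point. A check from outside the model, after moving it, would look like this (sketch):

model = model.to('mps')
for name, param in model.encoder.c1.named_parameters():
    print(f"{name} device: {param.device}")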
The situation resembles this question: Why model.to(device) wouldn't put tensors on a custom layer to the same device? However, there the OP wanted to move a tensor created inside the model to the device, which was solved by registering buffers. Here, I can't trace what is going on. At the same time, I am worried that the custom layers are not registered in the autograd graph and therefore may not be updated even if I train on the CPU. The only way around this I have found so far is to move everything to the device from within the forward function of the encoder or of the custom layer itself, which feels like a hacky workaround.
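For clarity, the workaround I mean looks roughly like this inside the encoder's forward (a sketch, not something I want to keep):

def forward(self, x):
    # Hacky workaround: nn.Module.to() is in-place for modules, so this
    # forces the layer onto the input's device at call time.
    self.c1.to(x.device)
    out1 = self.c1(x)
    .....
    return out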
Any thoughts?