Error when changing the output of a pretrained model

Hi, I am trying to change the input and output channels of Mobilenet_v3_small. I have created a class for this (ignore the encoder part for now). The change is based on the MobileNet source code.

import torch
import torch.nn as nn
import torchvision.models as models

input_net = 3        # RGB image
output_net = 2       # 2 classes
last_channel = 1024  # default in Mobilenet_v3_small
class mi_Net(nn.Module):
    def __init__(self):
        super(mi_Net, self).__init__()

        self.model = models.mobilenet_v3_small(pretrained=True)
        for param in self.model.parameters():
            param.requires_grad = False
        
        #Change input and output Mobilenet
        self.model.classifier[0] = nn.Linear(input_net, last_channel)
        self.model.classifier[3] = nn.Linear(last_channel, output_net)
        
        #Encoder
        self.encoder = nn.Sequential(
            nn.Linear(input_encoder, 128),
            nn.ReLU(),
            nn.Linear(128, output_encoder),
        )
        
        self.out = nn.Linear(4, 2)

    def forward(self, x, y):
        # Mobilenet
        x = self.model(x)
        # Encoder
        y = self.encoder(y)
        z = torch.cat(x, y)
        z = self.out(z)
        return z

But when I run it I get this error:
mat1 and mat2 shapes cannot be multiplied (4x576 and 3x1024)

I changed input_net to 576 (I am not sure about this), and now the error is:
mat1 and mat2 shapes cannot be multiplied (68x3 and 17x128)

I would like to solve the first error without changing input_net, but I don't know how.

Thank you!!!

This would be the right change, assuming you are not changing the shapes of the intermediate activations, since the original classifier[0] layer expects 576 input features:

import torchvision.models as models

model = models.mobilenet_v3_small(pretrained=False)
print(model)
...
#   (classifier): Sequential(
#     (0): Linear(in_features=576, out_features=1024, bias=True)
#     (1): Hardswish()
#     (2): Dropout(p=0.2, inplace=True)
#     (3): Linear(in_features=1024, out_features=1000, bias=True)
#   )
# )

However, I don't see why you would want to replace the trained nn.Linear layer with a new one, so you might consider replacing only the last one if you just want to change the number of output features.
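For illustration, here is a minimal sketch of that suggestion (the 2 output classes are taken from your output_net; everything else is the standard torchvision API):

import torch.nn as nn
import torchvision.models as models

model = models.mobilenet_v3_small(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# keep classifier[0] (576 -> 1024) as it is and replace only the final layer
model.classifier[3] = nn.Linear(model.classifier[3].in_features, 2)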

I guess this error is raised in self.encoder(y). I don't know how input_encoder is set (I guess it's set to 17) or where y is coming from, so you would need to make sure self.encoder accepts the y input.
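As a quick sanity check (a sketch; the y shape here is an assumption taken from the "68x3" in the error message), the in_features of the first encoder layer has to match the last dimension of y:

import torch
import torch.nn as nn

y = torch.randn(68, 3)            # assumed shape, based on the "68x3" in the error
encoder = nn.Sequential(
    nn.Linear(y.shape[-1], 128),  # in_features must equal y.shape[-1]
    nn.ReLU(),
    nn.Linear(128, 2),
)
print(encoder(y).shape)
# torch.Size([68, 2])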


input_encoder is set to 17 because the input is a tensor of 17 keypoints. I changed it to 1 because it is one tensor, but that returns an error because input_encoder must be 3. I don't understand it.

Once I changed it to 3 (so that it works), it returns this:

    z = torch.cat(z)
RuntimeError: Tensors must have same number of dimensions: got 2 and 3

Why does it return this if I have set both outputs to 2?

I would recommend checking the shapes and number of dimensions of all inputs to the model, as well as of each output activation, before trying to concatenate them.
The latest issue is caused by a dimension mismatch.
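For example, a minimal sketch of such a check (the shapes here are assumptions based on the tensors you post below):

import torch

x = torch.randn(1, 2)      # assumed output of self.model(x) with the new classifier
y = torch.randn(1, 17, 2)  # assumed output of self.encoder(y)
print(x.shape, x.dim())    # torch.Size([1, 2]) 2
print(y.shape, y.dim())    # torch.Size([1, 17, 2]) 3
# torch.cat requires all tensors to have the same number of dimensions, so this
# would raise "Tensors must have same number of dimensions: got 2 and 3":
# torch.cat((x, y), dim=1)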


I want to transform the “y” input (a tensor of 17 rows and 3 columns, dimension 3) into an output tensor of 1 row and 2 columns, dimension 2, because currently the output is a tensor of 17 rows and 2 columns, dimension 3. What layer should I add to the encoder? Thank you!

Sorry, I don't fully understand the shapes, as you are mentioning rows and columns for 3D tensors, so it seems the 3rd dimension is missing. Could you post them as shapes, e.g. [2, 3, 4], which would be a 3D tensor with the corresponding size in each dimension?

Of course. What I mean is that at the input I have this (printed before self.encoder(y)), a tensor of 17 rows and 3 columns, dimension 3:

tensor([[[ 0.0253, -0.2194,  0.9556],
         [ 0.0759, -0.2682,  0.9194],
         [-0.0253, -0.2682,  0.9463],
         [ 0.1770, -0.3170,  0.9249],
         [-0.1265, -0.3170,  0.9088],
         [ 0.3288,  0.0244,  0.9021],
         [-0.3288, -0.0244,  0.8784],
         [ 0.3794,  0.4633,  0.8549],
         [-0.4300,  0.4145,  0.8708],
         [ 0.1265,  0.5608,  0.8571],
         [-0.2276,  0.4633,  0.8436],
         [ 0.1770,  0.9997,  0.7719],
         [-0.2276,  0.9997,  0.7444],
         [ 0.1770,  1.7312,  0.8381],
         [-0.2276,  1.7312,  0.8332],
         [ 0.1265,  2.0725,  0.1256],
         [-0.1770,  2.0725,  0.0970]]])

At the output I have this (printed after self.encoder(y)), a tensor of 17 rows and 2 columns, dimension 3:

tensor([[[0.1626, 0.2519],
         [0.1674, 0.2395],
         [0.1650, 0.2347],
         [0.1734, 0.2375],
         [0.1595, 0.2026],
         [0.1249, 0.3276],
         [0.1085, 0.2589],
         [0.0321, 0.4811],
         [0.0249, 0.3825],
         [0.0079, 0.4788],
         [0.0122, 0.4129],
         [0.0000, 0.6507],
         [0.0000, 0.6006],
         [0.0000, 0.9070],
         [0.0000, 0.8434],
         [0.0000, 0.8709],
         [0.0000, 0.8039]]])

And what I want at the output is a tensor like this (a tensor of 1 row and 2 columns, dimension 2):

tensor([[-0.0739, -0.2400]])

If you want to reduce a tensor in the shape [1, 17, 2] to [1, 2] you could use any reduction operation, e.g. torch.mean or torch.sum. A linear layer would also work if you want a learnable reduction, but the right approach depends on your use case in the end:

import torch
import torch.nn as nn

x = torch.tensor([[[0.1626, 0.2519],
                   [0.1674, 0.2395],
                   [0.1650, 0.2347],
                   [0.1734, 0.2375],
                   [0.1595, 0.2026],
                   [0.1249, 0.3276],
                   [0.1085, 0.2589],
                   [0.0321, 0.4811],
                   [0.0249, 0.3825],
                   [0.0079, 0.4788],
                   [0.0122, 0.4129],
                   [0.0000, 0.6507],
                   [0.0000, 0.6006],
                   [0.0000, 0.9070],
                   [0.0000, 0.8434],
                   [0.0000, 0.8709],
                   [0.0000, 0.8039]]])


lin = nn.Linear(17, 1)
out = lin(x.permute(0, 2, 1))
print(out.shape)
# torch.Size([1, 2, 1])
out = out.squeeze(2)
print(out.shape)
# torch.Size([1, 2])
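
For completeness, a small sketch of the reduction approach mentioned above, using torch.mean over the keypoint dimension (reusing the same x tensor, shape [1, 17, 2]):

out = torch.mean(x, dim=1)  # average over the 17 keypoints
print(out.shape)
# torch.Size([1, 2])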