I’m building a multiple-input model with 2 types of inputs:
Images (torch.Size([1, 3, 224, 224])) and landmark features (torch.Size([1, 96])).
Here’s the model itself:
class MixedNetwork(nn.Module): def __init__(self): super(MixedNetwork, self).__init__() image_modules = list(models.resnet50().children())[:-1] self.image_features = nn.Sequential(*image_modules) self.landmark_features = nn.Sequential( nn.Linear(in_features=96, out_features=192,bias=False), nn.ReLU(inplace=True), nn.Dropout(p=0.25), nn.Linear(in_features=192,out_features=1000,bias=False), nn.ReLU(inplace=True), nn.Dropout(p=0.25)) self.combined_features = nn.Sequential( nn.Linear(1000, 512), nn.ReLU(), nn.Linear(512, 32), nn.ReLU(), nn.Linear(32,1)) def forward(self, image, landmarks): a = self.image_features(image) print(a.shape) b = self.landmark_features(landmarks) x = torch.cat((a.view(a.size(0), -1), b.view(b.size(0), -1)), dim=1) x = self.combined_features(x) x = F.sigmoid(x) return x
I’m getting confused when it comes to defining input-output features for Linear layers and combined layers. The last FC layer of resnet50 Linear(in_features=2048, out_features=1000). Does it mean that the last output of self.landmark_features layers also has to be 1000 and the first linear layer of self.combined_features should also be 1000?
Is it correct to assume that if the landmark input size is [1, 96] then the in_features for the first layer of self.landmark_features has to be 96?
With the current dimensions I’m getting the error message:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x3048 and 1000x512)
(why 3048 and not 2048?)