I have pre-trained VGG models on two different but related types of images, and then combined these two models:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB, nb_classes=2):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        # Remove the last linear layer of each classifier
        self.modelA.classifier[6] = nn.Identity()
        self.modelB.classifier[6] = nn.Identity()
        # Create a new classifier on the concatenated features
        self.classifier = nn.Linear(4096 + 4096, nb_classes)

    def forward(self, x1, x2):
        x1 = self.modelA(x1)
        x2 = self.modelB(x2)
        x = torch.cat((x1, x2), dim=1)
        x = self.classifier(F.relu(x))
        return x
# Train your separate models
# ...
# We use pretrained torchvision models here
from torchvision import models

modelA = models.vgg16(pretrained=True)
num_ftrs = modelA.classifier[6].in_features
modelA.classifier[6] = nn.Linear(num_ftrs, 2)

modelB = models.vgg16(pretrained=True)
num_ftrs = modelB.classifier[6].in_features
modelB.classifier[6] = nn.Linear(num_ftrs, 2)

modelB.load_state_dict(torch.load('checkpoint1.pt'))
modelA.load_state_dict(torch.load('checkpoint2.pt'))

model = MyEnsemble(modelA, modelB)
Now I want to test the combined model on test images. Can anyone please help me with how to feed a pair of input images, i.e. how can I test the combined model using two different but related types of images?
Thanks, @ptrblck, for your reply. Maybe my question is very silly, but I am still quite confused. If I have one type of image, I can test the model for classification using the following code.
The accuracy computation wouldn't need to be changed, since your model would still output a single prediction for each input pair. However, you would want to change the data loading pipeline to load both images. For this you could write a custom Dataset and return both images as well as the label in __getitem__. The DataLoader loop will then yield a batch of both image tensors and the target tensor.
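A minimal sketch of such a paired Dataset could look like this. The class name PairedDataset and the dummy tensors standing in for loaded, transformed images are made up for illustration; you would plug in your own image loading:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PairedDataset(Dataset):
    """Returns one image from each domain plus the shared label."""
    def __init__(self, imagesA, imagesB, labels, transform=None):
        assert len(imagesA) == len(imagesB) == len(labels)
        self.imagesA = imagesA
        self.imagesB = imagesB
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        xA, xB = self.imagesA[idx], self.imagesB[idx]
        if self.transform is not None:
            xA, xB = self.transform(xA), self.transform(xB)
        return xA, xB, self.labels[idx]

# Dummy data standing in for real, already-transformed images
imagesA = torch.randn(8, 3, 224, 224)
imagesB = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

loader = DataLoader(PairedDataset(imagesA, imagesB, labels), batch_size=4)
for xA, xB, y in loader:
    # output = model(xA, xB)  # the ensemble's forward takes both inputs
    print(xA.shape, xB.shape, y.shape)
```

Inside the test loop you would then call model(xA, xB) and compute the accuracy against y exactly as in the single-input case.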
Thanks for your kind suggestion. I have written a custom Dataset and called the above test code, but I am getting an error. Can you please tell me whether I am going in the right direction or whether there is a problem?
Thanks. Now I am running into two problems.

1. If I use labels = labels.to(device), I get:
AttributeError: 'numpy.int64' object has no attribute 'to'

2. If I do not use CUDA, the above problem is solved, but I get this error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 64 3 3 3, but got 3-dimensional input of size [1, 224, 224] instead
The second error is raised if your input tensors do not have the expected 3 channels, but are apparently missing the channel dimension (and might thus originally be grayscale images).
If that's the case, add the channel dimension via unsqueeze and change the first conv layer to accept a single input channel by replacing it in the pretrained models.
It depends where you are using these methods.
You should check the shape of the image tensors inside the __getitem__ method and if both have only two dimensions, then use unsqueeze(0). Alternatively, you could also check the shape of the batch returned by the DataLoader and call unsqueeze(1), if needed, but I would prefer to use the former approach (inside __getitem__).
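Both checks could be sketched like this; the shapes are illustrative stand-ins for your actual tensors:

```python
import torch

# Inside __getitem__: per-sample check
img = torch.randn(224, 224)        # a [H, W] tensor missing the channel dim
if img.dim() == 2:
    img = img.unsqueeze(0)         # -> [1, 224, 224]

# Alternative: check the batch returned by the DataLoader
batch = torch.randn(4, 224, 224)   # [N, H, W] without channel dim
if batch.dim() == 3:
    batch = batch.unsqueeze(1)     # -> [4, 1, 224, 224]

print(img.shape, batch.shape)
```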
Depending on the models you are using, the replacement of the conv layer might work. However, you need to check whether the first conv layer is indeed accessible via model.features[0] or model.conv1 etc. To check it, have a look at the source code of the model or use print(model) to see the layer names.