I am using a pretrained VGG16 model to get features of an image. The image dimension is 256x340; however, VGG16 takes images of dimension 224x224, so I made 5 images out of the original image, each of dimension 224x224 (top-left, top-right, bottom-left, bottom-right, center). Now I am passing these 5 images to the model as a batch of dimension [5, 3, 224, 224] and getting an output of [5, 4096] (I modified the last layer of VGG16 to get 4096 features). Now I want to take the average of these 5 outputs so that I get a final set of features for the original image. Any suggestion on how to do this?
Would taking the mean along axis 0 be a solution here?
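The five-crop step described above can be sketched with plain tensor slicing (sizes assumed from the post; a random tensor stands in for the real image):

```python
import torch

# Stand-in for the 3x256x340 image from the post
img = torch.randn(3, 256, 340)
s = 224  # VGG16's expected input size
H, W = img.shape[1], img.shape[2]
cy, cx = (H - s) // 2, (W - s) // 2

# Five crops: four corners plus the center
crops = torch.stack([
    img[:, :s, :s],             # top-left
    img[:, :s, W - s:],         # top-right
    img[:, H - s:, :s],         # bottom-left
    img[:, H - s:, W - s:],     # bottom-right
    img[:, cy:cy + s, cx:cx + s],  # center
])
# crops.shape -> torch.Size([5, 3, 224, 224]), ready to feed to VGG16
```

Note that torchvision also ships this as `transforms.FiveCrop`, which returns the same four corners plus center.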
Hi,
Yes, it is.
It will give you one set of features for the set of images.
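A minimal sketch of that averaging step, assuming the [5, 4096] feature tensor from the question (random values stand in for the real VGG16 output):

```python
import torch

# Stand-in for the [5, 4096] output of the modified VGG16
feats = torch.randn(5, 4096)  # [num_crops, feature_dim]

# Average over the crop axis (dim 0) to get one feature vector
avg_feats = feats.mean(dim=0)  # shape: [4096]
```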
Hello @albanD. I have a similar problem. My dataset class returns the image, label, and ID. How would I average over the ID? Each ID can have 1 to N pictures. Code so far:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SuperEncoder(nn.Module):
    def __init__(self):
        super(SuperEncoder, self).__init__()
        self.roofEncoder = nn.Sequential(
            nn.Conv2d(3, 6, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(6, 12, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.dwellingEncoder = nn.Sequential(
            nn.Conv2d(1, 6, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(6, 12, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        # Both encoders output 12 channels; 54x54 assumes a 216x216 input
        self.fc1 = nn.Linear(54 * 54 * 12, 1000)
        self.fc2 = nn.Linear(54 * 54 * 12, 1000)

    def forward(self, x1, x2):
        x1 = self.roofEncoder(x1)
        x1 = x1.view(x1.size(0), -1)
        x1 = F.relu(self.fc1(x1))
        x2 = self.dwellingEncoder(x2)
        x2 = x2.view(x2.size(0), -1)
        x2 = F.relu(self.fc2(x2))
        return x1, x2
For more context see: How do I average photo feature outputs for later concatenation?
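One hedged sketch of the per-ID averaging asked about above: group the per-image feature vectors by their ID with `index_add_` and divide by the per-ID counts (the tensors here are hypothetical stand-ins; each ID may own a variable number of images):

```python
import torch

# Stand-ins: a batch of 6 per-image feature vectors and the ID of each image
feats = torch.randn(6, 4096)
ids = torch.tensor([0, 0, 1, 2, 2, 2])  # IDs 0 and 2 have multiple images

num_ids = int(ids.max()) + 1

# Sum the feature vectors belonging to each ID
sums = torch.zeros(num_ids, feats.size(1)).index_add_(0, ids, feats)

# Count how many images each ID has, then divide to get the mean
counts = torch.bincount(ids, minlength=num_ids).unsqueeze(1).float()
avg_per_id = sums / counts  # shape: [num_ids, 4096]
```

This keeps everything on-GPU and handles the 1-to-N case without Python loops; the averaged vectors can then be concatenated or fed onward as one feature set per ID.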