Concatenate features from multiple pretrained models for classification

Hi,

I have images taken from two separate cameras (camera1 and camera2). I am planning to use two separate pretrained ResNet18s, one for the images from each source. I then plan to concatenate the penultimate-layer features of both networks before performing the classification.

As of now I have written this piece of code for the same 🙂

import torch
import torch.nn as nn
from torch.autograd import Variable
from torchvision import models

# Two separate pretrained ResNet18s, one per camera.
# Note: each branch needs its own instance; building both branches
# from the same model's children would make them share weights.
pretrained_model1 = models.resnet18(pretrained=True)
pretrained_model2 = models.resnet18(pretrained=True)

# Drop the final fc layer, keeping everything up to and including
# the average pool, so each branch outputs 512 features.
my_model1 = nn.Sequential(*list(pretrained_model1.children())[:-1])
my_model2 = nn.Sequential(*list(pretrained_model2.children())[:-1])

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.feature1 = my_model1
        self.feature2 = my_model2
        self.fc = nn.Linear(1024, 117)  # 512 + 512 concatenated features

    def forward(self, x, y):
        x1 = self.feature1(x)          # (N, 512, 1, 1)
        x2 = self.feature2(y)          # (N, 512, 1, 1)
        x3 = torch.cat((x1, x2), 1)    # (N, 1024, 1, 1)
        x3 = x3.view(x3.size(0), -1)   # flatten to (N, 1024)
        x3 = self.fc(x3)
        return x3

net = Net()

p = Variable(torch.rand(4, 3, 240, 240))
q = Variable(torch.rand(4, 3, 240, 240))
o = net(p, q)  # expected shape: (4, 117)

As of now I don't get any errors. It would be great if someone could let me know if I am doing this right.

This code looks good to me. Are you planning to use the cameras as two "eyes" in order to have perspective?

Yes, alexis. Thank you for the pointer.


This code looks good, but it seems we have to train two models if we don't freeze the pretrained layers. That approach requires more memory and more time. If I had that much memory, why not train a deeper network instead, like ResNet-101?
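For reference, a minimal sketch of the freezing option, assuming the Net class from the original post: disabling gradients on both feature branches leaves only the new fc layer to train.

net = Net()

# Freeze both pretrained branches so only the new fc layer is updated.
for param in net.feature1.parameters():
    param.requires_grad = False
for param in net.feature2.parameters():
    param.requires_grad = False

# Hand the optimizer only the parameters that still require gradients.
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, net.parameters()),
    lr=0.01, momentum=0.9)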


Hi will,

This code was written for the case where I have images acquired from two different cameras. The images from camera 1 go to network 1 and the images from camera 2 go to network 2. At some point I merge the two networks and make a decision based on both images. That was the whole point of the approach I took.

Now, to answer your question:

If you're short on GPU memory and want to train deep networks, you have two options (both sketched below):

  1. Reduce the batch size. Use a batch size of 2 or 3.
  2. Switch to the CPU.
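A minimal sketch of both options, assuming the Net class defined in the original post; the random tensors and 117-class labels are hypothetical stand-ins for a real paired-camera dataset:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for a real paired-camera dataset:
# (camera 1 image, camera 2 image, label) triples.
dataset = TensorDataset(
    torch.rand(100, 3, 240, 240),    # camera 1 images
    torch.rand(100, 3, 240, 240),    # camera 2 images
    torch.randint(0, 117, (100,)))   # class labels

# Option 1: reduce the batch size.
loader = DataLoader(dataset, batch_size=2, shuffle=True)

# Option 2: fall back to the CPU when no GPU is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = Net().to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

for x, y, target in loader:
    x, y, target = x.to(device), y.to(device), target.to(device)
    optimizer.zero_grad()
    out = net(x, y)                  # forward pass through both branches
    loss = criterion(out, target)
    loss.backward()
    optimizer.step()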