Combining two pretrained models for training

I am trying to use resnet18 and densenet121 as pretrained models. I have added one FC layer at the end of each network to map both outputs to 512 dimensions, then concatenated the two outputs and passed the result through two final FC layers.

This can be seen here:

import torch
import torch.nn as nn
from torchvision import models

class classifier(nn.Module):
  def __init__(self, num_classes):
    super(classifier, self).__init__()

    # ResNet-18 backbone with an extra FC layer (512 -> 512)
    self.resnet = models.resnet18(pretrained=True)
    self.rfc1 = nn.Linear(512, 512)

    # DenseNet-121 backbone with an extra FC layer (1024 -> 512)
    self.densenet = models.densenet121(pretrained=True)
    self.dfc1 = nn.Linear(1024, 512)

    # Final classifier on the concatenated 512 + 512 features
    self.final_fc1 = nn.Linear(1024, 512)
    self.final_fc2 = nn.Linear(512, num_classes)
    self.dropout = nn.Dropout(0.2)

  def forward(self, x):
    y = x.detach().clone()  # copy of the input for the DenseNet branch

    # ResNet-18 branch, run layer by layer up to the global average pool
    x = self.resnet.conv1(x)
    x = self.resnet.bn1(x)
    x = self.resnet.relu(x)
    x = self.resnet.maxpool(x)
    x = self.resnet.layer1(x)
    x = self.resnet.layer2(x)
    x = self.resnet.layer3(x)
    x = self.resnet.layer4(x)
    x = self.resnet.avgpool(x)
    x = x.view(x.size(0), -1)
    x = nn.functional.relu(self.rfc1(x))

    # DenseNet-121 branch
    y = self.densenet.features(y)
    y = y.view(y.size(0), -1)
    y = nn.functional.relu(self.dfc1(y))

    # Concatenate the two branches and classify
    x = torch.cat((x, y), 0)
    x = nn.functional.relu(self.final_fc1(x))
    x = self.dropout(x)
    x = self.final_fc2(x)

    return x
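
Calling the model on a dummy batch triggers the failure in the DenseNet branch (a minimal sketch; num_classes=10 and the 224x224 input size are hypothetical, and the exact sizes in the error message depend on the input resolution):

import torch

model = classifier(num_classes=10)   # hypothetical class count
inp = torch.randn(4, 3, 224, 224)    # dummy ImageNet-sized batch
out = model(inp)                     # raises the size mismatch at self.dfc1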

Error:

size mismatch, m1: [1048576 x 1], m2: [1024 x 512] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283

But as far as I can see, the DenseNet outputs 1024 features to its last layer.

Questions:

  1. Is the implementation correct?
  2. Can this implementation work as an ensemble?

Your current implementation is missing the ReLU as well as the adaptive pooling layer from the DenseNet implementation as used here, so the output of self.densenet.features is a 4D tensor of shape [batch_size, 1024, 7, 7] (for a 224x224 input) rather than the flat [batch_size, 1024] your self.dfc1 expects.
Also, you might want to concatenate the activation tensors in dim1 (the feature dimension); dim0 would stack the two branches along the batch dimension.
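
A quick shape check confirms this (a minimal sketch; pretrained weights don't matter for checking shapes):

import torch
from torchvision import models

densenet = models.densenet121(pretrained=False)
feats = densenet.features(torch.randn(2, 3, 224, 224))
print(feats.shape)  # torch.Size([2, 1024, 7, 7]) -> flattening gives 1024 * 7 * 7 = 50176 features, not 1024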

These changes should work:

...
    y = self.densenet.features(y)
    y = F.relu(y)                         # F is torch.nn.functional
    y = F.adaptive_avg_pool2d(y, (1, 1))  # -> [batch_size, 1024, 1, 1]
    y = y.view(y.size(0), -1)             # -> [batch_size, 1024]
    # y = F.relu(self.dfc1(y))  # remove this line if you are using the first relu

    x = torch.cat((x, y), 1)  # concatenate along dim1, the feature dimension
...
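
For reference, the full forward with both fixes could look like this (a sketch; self.dfc1 is kept here so the concatenated width still matches final_fc1's 1024 input features; if you comment it out as suggested above, final_fc1 would need to become nn.Linear(512 + 1024, 512)):

import torch
import torch.nn.functional as F

def forward(self, x):
    y = x.detach().clone()

    # ResNet-18 branch, unchanged, ending in [batch_size, 512]
    x = self.resnet.conv1(x)
    x = self.resnet.bn1(x)
    x = self.resnet.relu(x)
    x = self.resnet.maxpool(x)
    x = self.resnet.layer1(x)
    x = self.resnet.layer2(x)
    x = self.resnet.layer3(x)
    x = self.resnet.layer4(x)
    x = self.resnet.avgpool(x)
    x = x.view(x.size(0), -1)
    x = F.relu(self.rfc1(x))

    # DenseNet-121 branch with the relu and adaptive pooling added
    y = self.densenet.features(y)
    y = F.relu(y)
    y = F.adaptive_avg_pool2d(y, (1, 1))
    y = y.view(y.size(0), -1)   # [batch_size, 1024]
    y = F.relu(self.dfc1(y))    # [batch_size, 512]

    # Concatenate along the feature dimension, keeping the batch dimension intact
    x = torch.cat((x, y), 1)    # [batch_size, 1024]
    x = F.relu(self.final_fc1(x))
    x = self.dropout(x)
    x = self.final_fc2(x)

    return x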

Thanks, this worked!