Add multiple FC layers in parallel

Have you modified the class definition I wrote above? Because otherwise the code should work…

To make the post readable for everyone:

clf_outputs does not need to be a tensor. You are getting a dict as output (which is intended this way!). Any function that only accepts tensors (I don’t know exactly what you are trying to do) has to be applied to EVERY item in this dict, like this:

# loop over all classifier heads and apply the tensor-only function to each output
for k, v in clf_outputs.items():
    pytorch_fn(v)  # pytorch_fn stands for whatever tensor function you want to apply
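For example (just an illustration; torch.argmax stands in here for whatever tensor-only function you actually need):

import torch

# apply a tensor-only function to every classifier head's output
predictions = {k: torch.argmax(v, dim=1) for k, v in clf_outputs.items()}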

So is it possible to do transfer learning with the conv layers as a feature extractor this way?

model = ResNet50(5)
model.train(False)    # put the whole model in eval mode
model.fc.train(True)  # switch only the classifier head back to training mode

Yes, this should be possible. Depending on your task you should be able to reuse up to 99% of the code.

You have to unpack the tensors inside the tuple and then call .data on the unpacked tensors.
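A minimal sketch of what I mean, assuming the forward pass currently returns a 2-tuple (model and batch are just placeholders here):

output = model(batch)    # forward pass returns a tuple of two tensors
clf_out, feats = output  # unpack the tuple
feats = feats.data       # then call .data on the unpacked tensor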


The problem is

qf.append(features) #qf is an empty list

I’ve tried unpacking the tuple as well, but I still have to append the result, and I can’t append tensors with different dimensions.

That’s true but why do you even want to do this? I assume that one of the values is your classifier output and the other one is your feature vector?

As per my model above:

class ResNet50(nn.Module):
    def __init__(self, num_classes, loss={'htri'}, **kwargs):
        super(ResNet50, self).__init__()
        self.loss = loss
        resnet50 = torchvision.models.resnet50(pretrained=True)
        self.base = nn.Sequential(*list(resnet50.children())[:-2])
        num_fcs = 2
        for i in range(num_fcs):
            setattr(self, "fc%d" % i, nn.Linear(2048, num_classes))
        # self.fc0.train(False)
        self.fc1.train(False)  # the FC heads live on self, not on the torchvision resnet50

I think features[1] is the classifier output and would be used for calculating the loss, so where would features[0] be used?

In your model above, features[0] is the classifier output, while features[1] are the features extracted from the resnet (usually you don’t need them for calculating the loss).

So features[0] won’t be used anywhere, right? Not in finetuning either?

No, features[0] will be used everywhere and features[1] won’t be used anywhere


Thanks for the clarity

I think this could be due to the fact that you train different FC layers (with different loss functions) on the same feature extractor. Have you tried freezing the resnet and only training the FC layers?
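To illustrate what freezing could look like, here is a minimal sketch (it assumes the ResNet50 wrapper from above, so model.base is the shared feature extractor):

# freeze the shared resnet feature extractor so only the FC heads get updated
for param in model.base.parameters():
    param.requires_grad = False

# only hand the still-trainable parameters to the optimizer
optim = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))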

Also, in the code you posted you don’t need the second FC layer, as only the first FC layer’s output is returned and used for loss calculation.

To be honest I think your whole network implementation and training idea is very strange. Do you simply want to finetune the resnet?

Yes, but in different ways. It’s a different learning method altogether

With the network definition as

import torch.nn as nn
import torch.nn.functional as F
import torchvision

class ResNet50(nn.Module):
    def __init__(self, num_classes, num_fcs=3, loss={'xent'}, **kwargs):
        super(ResNet50, self).__init__()
        self.loss = loss
        resnet50 = torchvision.models.resnet50(pretrained=True)
        # keep everything except the final pooling and FC layer as the feature extractor
        self.base = nn.Sequential(*list(resnet50.children())[:-2])
        self.num_fcs = num_fcs
        # create num_fcs parallel classifier heads: fc0, fc1, ...
        for i in range(num_fcs):
            setattr(self, "fc%d" % i, nn.Linear(2048, num_classes))

    def forward(self, x):
        x = self.base(x)
        x = F.avg_pool2d(x, x.size()[2:])  # global average pooling
        f = x.view(x.size(0), -1)          # flatten to (batch_size, 2048)

        # run the shared features through every FC head and return all outputs in a dict
        clf_outputs = {}
        for i in range(self.num_fcs):
            clf_outputs["fc%d" % i] = getattr(self, "fc%d" % i)(f)

        return clf_outputs

the training code should look like this:

import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

dataloader = ...  # YOUR CODE TO CREATE A DATALOADER
model = ResNet50(num_classes=10, num_fcs=2).to(device)
optim = torch.optim.Adam(model.parameters())

num_epochs = 100      # set your custom number here
switch_fc_epoch = 50  # set your custom number here


# I used the following loss functions as examples. You have to replace them with your own functions
loss_fc0 = torch.nn.MSELoss()
loss_fc1 = torch.nn.L1Loss()

for epoch in range(num_epochs):

    if epoch < switch_fc_epoch:
        model.fc0.train(True)
        model.fc1.train(False)
        output_fc = "fc0"
        loss_fn = loss_fc0
    else:
        model.fc0.train(False)
        model.fc1.train(True)
        output_fc = "fc1"
        loss_fn = loss_fc1

        # you may want to freeze the resnet structure at this point. If you want to do so, uncomment the following line
        # model.base.train(False)

    for batch, target in dataloader:
        batch, target = batch.to(device), target.to(device)

        clf_outputs = model(batch)

        optim.zero_grad()
        loss_value = loss_fn(clf_outputs[output_fc], target)  # index the dict with the selected head
        loss_value.backward()
        optim.step()

Thanks a lot for this, but I’ll be finetuning on a different dataset.

Then you simply have to create separate dataloaders in the if statement.
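A minimal sketch of that idea, where dataloader_a and dataloader_b are placeholders for whatever loaders you build for the two datasets:

if epoch < switch_fc_epoch:
    model.fc0.train(True)
    model.fc1.train(False)
    output_fc = "fc0"
    loss_fn = loss_fc0
    dataloader = dataloader_a  # loader for the first dataset
else:
    model.fc0.train(False)
    model.fc1.train(True)
    output_fc = "fc1"
    loss_fn = loss_fc1
    dataloader = dataloader_b  # loader for the second (finetuning) dataset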

From skimming your code it looks okay. Now you simply need to integrate my model definition and the way I select the used FC layer in each epoch.