Fine-tuning part of a pre-trained model that feeds a new classifier

catt_ale · November 13, 2020, 4:37pm

Hello everyone!

I have a pre-trained model that I built myself. I now take some layers of this base model and use them to feed a classifier, and I want to train the classifier while fine-tuning those layers of the pre-trained network. My question is: is the approach in the code below correct? I am unsure about the part marked with the comment

#???

Can someone explain if, where, and why I am going wrong?
Thank you so much!

class Mybasemodel(nn.Module):
    def __init__(self, modelA, modelB):
        self.model = nn.Sequential(
            .
            .
            .
        )
    def forward(self, x1, x2):
        x = self.model(x)

        return x


class Classifier(nn.Module):
    def __init__(self, modelA, modelB):
        self.model = nn.Sequential(
            .
            .
            .
        )
    def forward(self, x1, x2):
        x = self.model(x)

        return x

basemodel = Mybasemodel()
basemodel.load_state_dict(torch.load('...'))
basemodel.train()

calssifier = Classifier()
calssifier.train()

loss_basemodel_fn = torch.nn.L1Loss()
loss_classifier_fn = torch.nn.L1Loss()

opt_basemodel = torch.optim.(basemodel.parameters())
opt_classifier = torch.optim.(calssifier.parameters())

first_layer = 0
N_layer = 3

i = 0
for param in model.basemodel():
    i += 1
    if i > N_layer:
        param.requires_grad = False


for i, data in enumerate(dataset):

    input = data['input']
    label_basemodel = data['label_basemodel']
    label_classifier = data['label_classifier']

    output_base_model = basemodel.model(input)

    feature_map = basemodel.model[first_layer:N_layer](input)
    pred = calssifier(feature_map)

    opt_basemodel.zero_grad()
    opt_classifier.zero_grad()

    loss_classifier = loss_classifier_fn(pred, label_basemodel)
    #???
    loss_basemodel = loss_basemodel_fn(output_base_model, label_classifier) + loss_classifier

    loss_basemodel.backward()
    loss_classifier.backward()

    opt_basemodel.step()
    opt_classifier.step()

ptrblck · November 15, 2020, 9:55am

It depends on your use case and how you would like to fine-tune the basemodel.
Currently both losses would create gradients in the parameters of basemodel, since these parameters were used to compute both losses. If you only want to create the gradients of loss_basemodel w.r.t. the parameters of basemodel, you would have to .detach() output_base_model before feeding it to classifier.
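As a rough illustration of what .detach() changes, here is a minimal sketch. The tiny `base` and `classifier` networks, the tensor shapes, and the random data are hypothetical stand-ins, not the models from the question. Note that in the posted snippet the tensor actually passed to the classifier is feature_map, so that is the tensor detached here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the base model and the classifier.
base = nn.Sequential(
    nn.Linear(4, 8), nn.ReLU(),
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, 4),
)
classifier = nn.Linear(8, 1)
loss_fn = nn.L1Loss()

x = torch.randn(2, 4)
target = torch.randn(2, 1)


def clear_grads():
    # Reset gradients to None so that "received a gradient" is easy to check.
    for p in list(base.parameters()) + list(classifier.parameters()):
        p.grad = None


# Without detach: the classifier loss backpropagates into the shared base layers.
clear_grads()
feature_map = base[0:2](x)
loss_fn(classifier(feature_map), target).backward()
grad_without_detach = base[0].weight.grad is not None

# With detach: the graph is cut, so only the classifier receives gradients.
clear_grads()
feature_map = base[0:2](x).detach()
loss_fn(classifier(feature_map), target).backward()
grad_with_detach = base[0].weight.grad is not None
classifier_has_grad = classifier.weight.grad is not None
```

Under these assumptions, the shared layers get a gradient from the classifier loss only in the first case; whether that is desirable depends on whether the classifier error is meant to fine-tune the base model.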

Note that there are minor issues (such as using for param in model.basemodel() instead of for param in basemodel.parameters()), but I assume these are copy-paste errors.
Also, if you iterate over the parameters, i will not correspond to the layer index, since some modules have more than one parameter (often weight and bias).
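To make the parameter-versus-layer point concrete, here is a small sketch that freezes by module index instead of by parameter count. The layer stack is a made-up example, and `N_layer` is reused from the question only for illustration.

```python
import torch.nn as nn

# Hypothetical stand-in for basemodel.model.
base = nn.Sequential(
    nn.Conv2d(3, 8, 3),   # weight + bias
    nn.BatchNorm2d(8),    # weight + bias
    nn.ReLU(),            # no parameters
    nn.Conv2d(8, 8, 3),   # weight + bias
    nn.ReLU(),            # no parameters
)
N_layer = 3

n_modules = len(base)                     # number of layers
n_params = len(list(base.parameters()))   # number of parameter tensors

# Counting parameter tensors (as in the question's loop) keeps only the
# first N_layer tensors trainable, which cuts through the BatchNorm layer.
kept_by_param_count = [
    name
    for i, (name, _) in enumerate(base.named_parameters(), start=1)
    if i <= N_layer
]

# Iterating over modules keeps the first N_layer layers trainable instead.
for idx, layer in enumerate(base):
    for param in layer.parameters():
        param.requires_grad = idx < N_layer

kept_by_layer = [name for name, p in base.named_parameters() if p.requires_grad]
```

Here the module count and the parameter count differ, so a counter over parameters() and a counter over layers select different sets of weights.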

catt_ale · November 16, 2020, 9:11am

Hello, thank you for your answer.

I don't understand why I would detach output_base_model before feeding it to the classifier. This is new to me.

To give you some additional information: my base model is a CycleGAN, and I am using some layers of the generator to reuse the learned features and feed a classifier. Training the two networks separately is not a problem, I guess. With the simplified example above I wanted to show how I would optimize the base-model parameters w.r.t. the classification error. Ideally, I would like to be able to do both of the following:

Optimize the base model w.r.t. its original loss function combined with the loss of the classifier.
Optimize the base model w.r.t. the loss of the classifier.
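The two options listed above could be sketched roughly as follows. This is only an illustration under several assumptions: the small linear networks, the L1 losses (taken from the simplified snippet, not from a real CycleGAN objective), the SGD optimizer, and the random data are all hypothetical. The losses are summed and backward() is called once, since calling backward() on two losses that share a graph would need retain_graph=True.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: the first N_layer modules of `base` are shared.
base = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))
classifier = nn.Linear(8, 1)
N_layer = 2

loss_base_fn = nn.L1Loss()
loss_cls_fn = nn.L1Loss()

params = list(base.parameters()) + list(classifier.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

x = torch.randn(2, 4)
label_base = torch.randn(2, 4)
label_cls = torch.randn(2, 1)


def step(combine_with_base_loss):
    # Reset gradients to None so we can see which parameters get updated.
    for p in params:
        p.grad = None

    features = base[:N_layer](x)                  # shared layers
    loss = loss_cls_fn(classifier(features), label_cls)
    if combine_with_base_loss:
        output = base[N_layer:](features)         # rest of the base model
        loss = loss + loss_base_fn(output, label_base)

    loss.backward()                               # single backward on the total loss
    got_grad = {
        "shared": base[0].weight.grad is not None,
        "tail": base[2].weight.grad is not None,
    }
    optimizer.step()                              # parameters without a gradient are skipped
    return got_grad


option_1 = step(combine_with_base_loss=True)      # original loss + classifier loss
option_2 = step(combine_with_base_loss=False)     # classifier loss only
```

In this sketch the shared layers are updated in both cases, while the remaining base layers only receive gradients when the original loss is included.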