Can I split the network and tune different layers with different learning rates?

I am trying to update the feature extractor and the classifier with different learning rates, so I split the network into two parts. In the optimizer I used parameter groups to give each part its own learning rate.

import torch
import torch.nn as nn
from collections import OrderedDict
from torchvision import models
from tqdm import trange

# pre-trained backbone with an extra classifier stacked on the 1000-class output
resnet50 = models.resnet50(pretrained=True)
classifier = nn.Sequential(OrderedDict([("classifier", nn.Linear(1000, 31))]))
resnet50 = nn.Sequential(resnet50, classifier)

# split the combined model into a feature extractor and a classifier part
featExtractor = nn.Sequential(*list(resnet50.children())[:-1]).cuda()
classifierModel = nn.Sequential(*list(resnet50.children())[-1:]).cuda()

# per-parameter groups: lower lr for the feature extractor, default lr for the classifier
clf_optim = torch.optim.Adam([{'params': featExtractor.parameters(), 'lr': 1e-4},
                              {'params': classifierModel.parameters()}], lr=5e-4)

for epoch in trange(epochs, leave=False):

    for _ in trange(iterations, leave=False):
        # draw a batch from the source-domain loader
        source_x, source_y = next(iter(amazonData))
        source_x, source_y = source_x.to(device), source_y.to(device)

        # k_clf update steps per batch
        for _ in range(k_clf):
            features = featExtractor(source_x)
            out = classifierModel(features)
            clf_loss = clf_criterion(out, source_y)
            clf_optim.zero_grad()
            clf_loss.backward()
            clf_optim.step()

    print("total_loss: ", clf_loss.item())

PS: I know I could have just used resnet50 instead of featExtractor and classifier instead of classifierModel, but this is just the shortest version of what I am doing, and I am basically looking to validate the idea.
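
For reference, a minimal sketch of that shorter variant (assuming the combined resnet50 = nn.Sequential(resnet50, classifier) from above; indexing the Sequential is just my choice here, not taken from the training code): the parameter groups can be built directly from the combined model instead of from two separate modules.

# sketch: same learning-rate split without creating featExtractor/classifierModel
backbone_params = resnet50[0].parameters()   # pre-trained ResNet part
head_params = resnet50[1].parameters()       # custom classifier part

clf_optim = torch.optim.Adam([{'params': backbone_params, 'lr': 1e-4},
                              {'params': head_params}], lr=5e-4)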

@ptrblck what do you think?

The code looks basically alright and the per-parameter option should work, too.
However, I’m not sure if it’s the best idea to just add another custom classifier on top of the pre-trained model. Usually you would remove the last linear layer and replace it with a new one.
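
A minimal sketch of that, assuming the standard torchvision resnet50 and the 31 target classes from your code:

import torch.nn as nn
from torchvision import models

resnet50 = models.resnet50(pretrained=True)
# swap the pre-trained 1000-class fc layer for a fresh 31-class head (in_features is 2048)
resnet50.fc = nn.Linear(resnet50.fc.in_features, 31)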

PS: I’m not a big fan of tagging certain people as this might demotivate others to write an answer. :wink:

I am sorry, I will definitely avoid tagging.

Yes, I understand that usually we remove the pre-trained fc layer, but I don’t want to go directly from 2048 -> 31.

So, do you mean I should replace the last pre-trained fc layer and add 2 custom fc layers, like 2048-1000-31?
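
For concreteness, something like this sketch (the ReLU in between is just my guess at the non-linearity):

import torch.nn as nn
from torchvision import models

resnet50 = models.resnet50(pretrained=True)
# replace the pre-trained fc layer with two new layers: 2048 -> 1000 -> 31
resnet50.fc = nn.Sequential(
    nn.Linear(resnet50.fc.in_features, 1000),
    nn.ReLU(),
    nn.Linear(1000, 31),
)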

Yeah, that’s a good point and I’m really not sure which would work best.
Just based on my gut feeling, I would assume both of the described alternatives might work better than going from the pre-trained 1000-class output to your custom layer.

Could you post your results in case you try some different approaches, as this is quite interesting? :slight_smile:

Sure, I will definitely post the results once I am done with my training. I am working on it.
