Different fine-tuning results between Torch and PyTorch

I’m fine-tuning on the same custom dataset with both Torch and PyTorch, based on ResNet-34.

  • With Torch, I use fb.resnet.torch with learning rate 0.001. After 50 epochs:

    • top1: 89.267, top5: 97.933

  • With PyTorch, I use the code below for fine-tuning. After 90 epochs:

    • learning rate = 0.01, top1: 78.500, top5: 94.083
    • learning rate = 0.001, top1: 74.792, top5: 92.583

You can see that the PyTorch fine-tuning results are still not as good as Torch’s. Fine-tuning ResNet-18 gives similar results.

Any suggestions or guidance?

PyTorch code used for fine-tuning:

import torch.nn as nn

class FineTuneModel(nn.Module):
    def __init__(self, original_model, arch, num_classes):
        super(FineTuneModel, self).__init__()
        # Everything except the last linear layer
        self.features = nn.Sequential(*list(original_model.children())[:-1])
        self.classifier = nn.Sequential(
            nn.Linear(512, num_classes)
        )

        # Freeze the pretrained feature weights
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, x):
        f = self.features(x)
        f = f.view(f.size(0), -1)  # flatten the pooled features
        y = self.classifier(f)
        return y
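
For reference, a minimal sketch of how this model could be constructed and trained (torchvision’s pretrained resnet34 is assumed; num_classes=100 is just a placeholder for the real dataset):

import torch.optim as optim
import torchvision.models as models

original_model = models.resnet34(pretrained=True)
model = FineTuneModel(original_model, 'resnet34', num_classes=100)

# The feature weights are frozen, so pass only the trainable classifier
# parameters to the optimizer.
optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),
                      lr=0.001, momentum=0.9, weight_decay=1e-4)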

Your fine-tune model looks OK to me. I don’t have enough context to see what could be different. Can you post your fine-tuning code?

@colesbury I use this code for fine-tuning.

One possible reason is that Torch adopts more image transforms for data augmentation than PyTorch; a sketch contrasting the two pipelines is below.
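
A hedged sketch of the difference, using torchvision transforms (fb.resnet.torch additionally applies PCA “lighting” noise, which torchvision does not ship, so it is omitted here; the jitter strengths are illustrative):

import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Minimal pipeline, as in the PyTorch ImageNet example
train_minimal = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

# Richer pipeline, closer to fb.resnet.torch's augmentation
train_rich = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])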


I think the difference comes from the fact that you freeze all the weights in the PyTorch model (except for the last classifier), while in Lua Torch you are fine-tuning the whole network; see the sketch below.
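
For concreteness, a minimal sketch of the whole-network (“soft”) alternative, reusing the FineTuneModel instance from the earlier sketch (the per-group learning rates are illustrative assumptions, not tuned values):

import torch.optim as optim

# Un-freeze the backbone so every layer is trained
for p in model.features.parameters():
    p.requires_grad = True

# Give the pretrained features a smaller learning rate than the new head
optimizer = optim.SGD([
    {'params': model.features.parameters(), 'lr': 1e-4},
    {'params': model.classifier.parameters(), 'lr': 1e-3},
], lr=1e-3, momentum=0.9, weight_decay=1e-4)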


@shicai As far as I remember, some of the transforms were not ported because they didn’t have a noticeable effect on accuracy.

@shicai @fmassa thanks for the suggestions!
I think these two reasons together could explain the difference:

  • different data augmentations or image transforms
  • “hard” fine-tuning (features frozen) vs. “soft” fine-tuning (whole network trained)

From my experiments on this dataset, I think “soft” fine-tuning may work better than “hard” fine-tuning.

@apaszke From my experiments on this dataset, maybe these transforms do have a noticeable effect after all :smile: