Different fine-tuning results between Torch and PyTorch

I’m fine-tuning on the same custom dataset with both Torch and PyTorch, based on ResNet-34.

  • With Torch, I use fb.resnet.torch with learning rate 0.001. After 50 epochs:

    • top1: 89.267, top5: 97.933

  • With PyTorch, I use the code below for fine-tuning. After 90 epochs:

    • learning rate = 0.01, top1: 78.500, top5: 94.083
    • learning rate = 0.001, top1: 74.792, top5: 92.583

You can see that the PyTorch fine-tuning results are still not as good as Torch’s. Fine-tuning ResNet-18 gives similar results.

Any suggestions or guidance?

PyTorch code used for fine-tuning:

import torch.nn as nn

class FineTuneModel(nn.Module):
    def __init__(self, original_model, arch, num_classes):
        super(FineTuneModel, self).__init__()
        # Everything except the last linear layer
        self.features = nn.Sequential(*list(original_model.children())[:-1])
        self.classifier = nn.Sequential(
            nn.Linear(512, num_classes)
        )

        # Freeze the pretrained feature weights
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, x):
        f = self.features(x)
        f = f.view(f.size(0), -1)  # flatten the pooled features
        y = self.classifier(f)
        return y
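
For reference, a minimal sketch of how this model could be constructed and trained (torchvision’s pretrained resnet34 is assumed; num_classes=100 is just a placeholder for the real dataset):

import torch.optim as optim
import torchvision.models as models

original_model = models.resnet34(pretrained=True)
model = FineTuneModel(original_model, 'resnet34', num_classes=100)

# The feature weights are frozen, so pass only the trainable classifier
# parameters to the optimizer.
optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),
                      lr=0.001, momentum=0.9, weight_decay=1e-4)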

Your fine-tune model looks OK to me. I don’t have enough context to see what could be different. Can you post your fine-tuning code?

@colesbury I use this code for fine-tuning.

One possible reason is that Torch adopts more image transforms for data augmentation than PyTorch; a sketch contrasting the two pipelines is below.
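
A hedged sketch of the difference, using torchvision transforms (fb.resnet.torch additionally applies PCA “lighting” noise, which torchvision does not ship, so it is omitted here; the jitter strengths are illustrative):

import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Minimal pipeline, as in the PyTorch ImageNet example
train_minimal = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

# Richer pipeline, closer to fb.resnet.torch's augmentation
train_rich = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])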


I think the difference comes from the fact that you freeze all the weights in the PyTorch model (except for the last classifier), while in Lua Torch you are fine-tuning the whole network; see the sketch below.
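
For concreteness, a minimal sketch of the whole-network (“soft”) alternative, reusing the FineTuneModel instance from the earlier sketch (the per-group learning rates are illustrative assumptions, not tuned values):

import torch.optim as optim

# Un-freeze the backbone so every layer is trained
for p in model.features.parameters():
    p.requires_grad = True

# Give the pretrained features a smaller learning rate than the new head
optimizer = optim.SGD([
    {'params': model.features.parameters(), 'lr': 1e-4},
    {'params': model.classifier.parameters(), 'lr': 1e-3},
], lr=1e-3, momentum=0.9, weight_decay=1e-4)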


@shicai As far as I remember, some of the transforms were not ported because they didn’t have a noticeable effect on accuracy.

@shicai @fmassa thanks for the suggestions!
I think these two reasons together could explain the difference:

  • different data augmentations or image transforms
  • “hard” fine-tuning (features frozen) vs. “soft” fine-tuning (whole network trained)

From my experiments on this dataset, I think “soft” fine-tuning may work better than “hard” fine-tuning.

@apaszke From my experiments on this dataset, maybe these transforms do have a noticeable effect after all :smile: