A while ago I was fine-tuning the pretrained resnet for an image retrieval task and noticed that it performed worse than the pretrained vgg.
Recently I looked at another dataset paper, where they reported baselines using off-the-shelf network features. Their result is that resnet is better than vgg, which is better than alexnet (makes sense). I tried reproducing these baselines and succeeded with the torchvision pretrained alexnet & vgg, but not with resnet101. It's supposed to be better than vgg, but it's actually only on par with alexnet.
So now I figure the problem lies with the torchvision pretrained resnet? Maybe it's not as good as Caffe's or tensorflow's model? Has somebody else released their own resnet models pretrained on Imagenet that I can test?
Once you have loaded the pretrained resnet-101, make sure you set it to eval() mode. Otherwise the BatchNorm effects might be weird (your inference performance will depend on batch size).
model = torchvision.models.resnet101(pretrained=True)
model.eval()
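To illustrate why eval() matters here, below is a minimal pure-Python sketch of batch normalization on a single scalar feature. It is a simplification, not PyTorch's implementation: the function names and the running statistics (mean 1.0, variance 4.0) are made up for the example, and the learned scale and shift are left out. In training mode the same sample is normalized differently depending on which other samples share its batch, while in eval mode the fixed running statistics make the output independent of the batch.

```python
import math


def batchnorm_train(batch, eps=1e-5):
    """Training mode: normalize with the mean/variance of the current batch."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)  # biased variance
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


def batchnorm_eval(batch, running_mean, running_var, eps=1e-5):
    """Eval mode: normalize with fixed running statistics."""
    return [(x - running_mean) / math.sqrt(running_var + eps) for x in batch]


sample = 2.0
batch_a = [sample, 0.0, 4.0]    # the sample next to small values
batch_b = [sample, 10.0, 30.0]  # the same sample next to large values

# Training mode: the output for `sample` changes with its batch companions.
train_a = batchnorm_train(batch_a)[0]
train_b = batchnorm_train(batch_b)[0]

# Eval mode: the output for `sample` is the same in both batches.
# The running statistics below are hypothetical values for illustration.
eval_a = batchnorm_eval(batch_a, running_mean=1.0, running_var=4.0)[0]
eval_b = batchnorm_eval(batch_b, running_mean=1.0, running_var=4.0)[0]

print("train mode:", train_a, train_b)
print("eval mode: ", eval_a, eval_b)
```

With features extracted one image at a time, training-mode statistics would come from that single image, which is one way the extracted features can end up far from what the network saw during training.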
Hey Soumith, yeah, I did use eval mode when doing feature extraction.
I just tried the caffe ported version from https://github.com/Cadene/pretrained-models.pytorch
and got a much better result (close to the baseline reported in the paper). So something might be wrong with the torchvision pretrained resnet.
@lugiavn I evaluated the torchvision pretrained models in eval mode using the pytorch/examples/imagenet code on the ILSVRC2012 val dataset (50,000 images), and got error rates similar to the current benchmark.
This is how I ran the eval code:
python main.py -a resnet101 --pretrained -e data
PyTorch version: 0.4.0
Python version: 3.6.4
CUDA/cuDNN version: cuda/9.0, cudnn/v7.0-cuda.9.0
GPU: Quadro GP100 X2
Let me know if you cannot reproduce these results, or have any questions.
Hey, thanks for testing that out.
Actually I have no doubt that the torchvision models match the reported ImageNet performance.
What I meant is that maybe the resnet didn't transfer well to the other task (this one: https://arxiv.org/abs/1803.11285)
Hello, sir. I ran into a similar problem when using the pytorch pre-trained resnet-101 to extract features on oxford5k: the mAP is very low, even worse than vgg16. Did you solve it? How?
I also tried the caffe resnet-101 model from the link you provided, but it didn't improve my mAP. Sad.
My code is below:
import numpy as np
import pretrainedmodels
import torch
import torch.nn as nn
from scipy.misc import imread  # only available in SciPy < 1.2 (needs Pillow)
from sklearn.preprocessing import normalize
from torchvision import transforms

model_name = 'cafferesnet101'
model = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
# Drop the last two children (pooling and classifier) to keep the final conv feature map.
newmodel = list(model.children())[:-2]
newmodel = nn.Sequential(*newmodel)
newmodel = newmodel.cuda()
for p in newmodel.parameters():
    p.requires_grad = False

# torchvision-style preprocessing: RGB in [0, 1], ImageNet mean/std
normalize_p = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
transform = transforms.Compose([transforms.ToTensor(), normalize_p])
ave_Result = []  # value lost in the original post; an empty list is assumed

Qimg = imread(image_path, mode='RGB')  # image_path: placeholder for the image address
Qimg = transform(Qimg).cuda()
Qimg = torch.unsqueeze(Qimg, 0)  # add batch dimension: (1, 3, H, W)
outputQ = newmodel(Qimg)
outputQ = outputQ.cpu().numpy()[0]  # drop batch dimension: (2048, h, w)
P5A = np.transpose(outputQ, (1, 2, 0))  # feature map, (h, w, 2048)
ave_DPA = np.mean(P5A, axis=(0, 1))  # average pooling over spatial positions
ave_DPA = np.reshape(ave_DPA, (1, 2048))
ave_DPA = normalize(ave_DPA, axis=1)  # L2 normalization
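For readers unfamiliar with the mAP figure mentioned in this post, here is a rough pure-Python sketch of how mean average precision is commonly computed for retrieval. The function names are made up for the example, and it assumes each ranking covers the whole database so that every relevant image appears somewhere in it. The official Oxford5k evaluation also handles "junk" images and differs in other details, so this is only meant to show the idea, not to reproduce benchmark numbers.

```python
def average_precision(ranked_relevance):
    """AP for one query.

    ranked_relevance[i] is True if the i-th retrieved item (best match first)
    is relevant to the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_rankings):
    """Mean of the per-query average precisions."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)


# Two toy queries: relevant items at ranks 1 and 3, and at rank 2.
query_1 = [True, False, True]
query_2 = [False, True]
print(average_precision(query_1))
print(mean_average_precision([query_1, query_2]))
```

With L2-normalized descriptors like ave_DPA above, the ranking for each query would typically come from sorting database images by dot product (cosine similarity) with the query descriptor.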
Use their code to load the images; the preprocessing for that model is different from the one for the torchvision models.
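To make the difference concrete, here is a small pure-Python sketch that applies both preprocessing conventions to a single pixel. The torchvision mean/std values are the ones used in the code above. The Caffe-style settings (BGR channel order, 0-255 range, per-channel mean subtraction with the values shown) are what I recall for Caffe-ported ResNets and should be treated as an assumption; check the settings that ship with the model you load. The function names are made up for the example.

```python
def torchvision_style(rgb_pixel):
    """RGB in [0, 255] -> scale to [0, 1], subtract mean, divide by std."""
    mean = (0.485, 0.456, 0.406)
    std = (0.229, 0.224, 0.225)
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb_pixel, mean, std))


def caffe_style(rgb_pixel):
    """RGB in [0, 255] -> reorder to BGR, keep the 0-255 range, subtract mean."""
    # Assumed BGR channel means for Caffe-ported ResNets (not from this thread).
    bgr_mean = (102.9801, 115.9465, 122.7717)
    bgr_pixel = rgb_pixel[::-1]
    return tuple(v - m for v, m in zip(bgr_pixel, bgr_mean))


pixel = (255, 128, 0)  # an orange pixel, given as RGB
tv_input = torchvision_style(pixel)
caffe_input = caffe_style(pixel)
print("torchvision-style:", tv_input)    # values roughly within [-3, 3]
print("caffe-style:      ", caffe_input)  # values on the order of +-100
```

Feeding torchvision-normalized tensors into a Caffe-ported network means the inputs are about two orders of magnitude too small and have their channels swapped, which could explain poor features even though the weights themselves are fine.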
@lugiavn I'm trying to reproduce the same paper and found that TensorFlow Slim's ResNet-v1 model also produces worse results than VGG-16.
Is there some difference in the pretrained alexnet model between PyTorch and Caffe?
The input image size and some parameters of the convolution layers seem to be different.
@lugiavn Any progress on this problem?
@Hzzone Yes, pytorch uses a different version of AlexNet from Caffe. In my experiments, the pytorch version shows a performance drop of more than 10% when fine-tuned on other tasks.
I also observed the same thing in my experiments. The pytorch pretrained models (including vgg and resnet) perform much worse than the original caffe versions when they are used for fine-tuning.
I just want to add a note since there are others with similar experience.
Here it's also found that the caffe version is better for fine-tuning on an object detection task:
You can get the caffe pretrained version there, or another version is at: https://github.com/Cadene/pretrained-models.pytorch