A while ago I was fine-tuning the pretrained ResNet for an image retrieval task and noticed that it performed worse than the pretrained VGG.
Recently I looked at another dataset paper that reports off-the-shelf network features as baselines. There, ResNet is better than VGG, which is better than AlexNet (makes sense). I tried reproducing these baselines and succeeded with the torchvision pretrained AlexNet and VGG, but not ResNet-101. It’s supposed to be better than VGG, but it’s actually only on par with AlexNet.
So now I suspect the problem lies with the torchvision pretrained ResNet. Maybe it’s not as good as the Caffe or TensorFlow models? Has anybody else released ResNet models pretrained on ImageNet that I can test?
Once you have loaded the pretrained resnet-101, make sure you set it to eval() mode. Otherwise the BatchNorm layers normalize with per-batch statistics, so your inference results will depend on the batch size and on what else is in the batch.
model = torchvision.models.resnet101(pretrained=True)
model.eval()
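For anyone wondering why eval() matters here, below is a minimal NumPy sketch of the normalization step of a BatchNorm layer. It is an illustration only, not the torchvision code: the `batchnorm` helper, the toy data, and the running statistics (mean 0, variance 1) are all made up for the example, and the learnable scale and shift are left out.

```python
import numpy as np

def batchnorm(x, running_mean, running_var, training, eps=1e-5):
    """Normalize x of shape (batch, features) the way BatchNorm does, without scale/shift."""
    if training:
        # train mode: statistics come from the current batch
        mean = x.mean(axis=0)
        var = x.var(axis=0)
    else:
        # eval mode: fixed statistics accumulated during training
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
sample = rng.normal(size=(1, 4))  # the "image" we care about

# Put the same sample into two different batches.
batch_a = np.vstack([sample, rng.normal(size=(3, 4))])
batch_b = np.vstack([sample, rng.normal(loc=5.0, size=(7, 4))])

# Hypothetical running statistics, chosen only for this demo.
running_mean = np.zeros(4)
running_var = np.ones(4)

train_a = batchnorm(batch_a, running_mean, running_var, training=True)[0]
train_b = batchnorm(batch_b, running_mean, running_var, training=True)[0]
eval_a = batchnorm(batch_a, running_mean, running_var, training=False)[0]
eval_b = batchnorm(batch_b, running_mean, running_var, training=False)[0]

print("train mode, same sample, outputs match:", np.allclose(train_a, train_b))
print("eval mode, same sample, outputs match:", np.allclose(eval_a, eval_b))
```

In train mode the features of the same sample change with its batch neighbours; in eval mode they do not, which is what you want for feature extraction.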
Hey Soumith, yeah I did use eval mode when doing feature extraction.
I just tried the caffe ported version from https://github.com/Cadene/pretrained-models.pytorch
and got a much better result (close to the baseline reported in the paper). So something might be wrong with the torchvision pretrained resnet.
@lugiavn I evaluated the torchvision pretrained models in eval mode using the pytorch/examples/imagenet code on the ILSVRC2012 val dataset (50,000 images), and got error rates (top-1 and top-5, in %) similar to the current benchmark.
| model | benchmark_top1 | local_top1 | benchmark_top5 | local_top5 |
|---|---|---|---|---|
| AlexNet | 43.45 | 43.48 | 20.91 | 20.93 |
| ResNet-50 | 23.85 | 23.87 | 7.13 | 7.14 |
| ResNet-101 | 22.63 | 22.63 | 6.44 | 6.45 |
I ran the eval code like this: python main.py -a resnet101 --pretrained -e data
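In case the top-1 / top-5 numbers are unclear, here is a small sketch of how such an error rate can be computed from classifier scores. The `topk_error` helper and the toy scores are hypothetical and written for this post; this is not the code from pytorch/examples/imagenet.

```python
import numpy as np

def topk_error(scores, labels, k):
    """Percentage of samples whose true label is NOT among the k highest scores."""
    topk = np.argsort(-scores, axis=1)[:, :k]          # indices of the k best classes
    hit = (topk == labels[:, None]).any(axis=1)        # True where the label is in the top k
    return 100.0 * (1.0 - hit.mean())

# Toy scores for 4 samples and 3 classes (made-up values).
scores = np.array([
    [0.1, 0.7, 0.2],  # best guess: class 1
    [0.5, 0.3, 0.2],  # best guess: class 0
    [0.2, 0.3, 0.5],  # best guess: class 2
    [0.6, 0.3, 0.1],  # best guess: class 0
])
labels = np.array([1, 1, 2, 2])

print("top-1 error (%):", topk_error(scores, labels, 1))
print("top-2 error (%):", topk_error(scores, labels, 2))
```

Lower is better, so in the comparison above ResNet-101 is the strongest of the three on ImageNet classification itself.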
Hey, thanks for testing that out
Actually I have no doubt that they matched the reported performance when trying to reproduce resnet.
What I meant is that maybe it didn’t transfer well to the other task (this one https://arxiv.org/abs/1803.11285)
Hello. I ran into a similar problem when using the PyTorch pretrained resnet-101 to extract features on oxford5k: the mAP is very low, even worse than vgg16. Did you solve it? If so, how?
I also tried the caffe resnet-101 model from the link you provided, but it didn’t improve my mAP. Sad.
My code is below:
import pretrainedmodels
import torch
import torch.nn as nn
from scipy.misc import imread
import numpy as np
import os
from torchvision import transforms
from sklearn.preprocessing import normalize
torch.cuda.set_device(0)
model_name = 'cafferesnet101'
model = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
# drop the last two children (pooling and classifier) to keep the final conv feature map
newmodel = list(model.children())[:-2]
newmodel = nn.Sequential(*newmodel)
newmodel = newmodel.cuda()
for p in newmodel.parameters():
    p.requires_grad = False
newmodel.eval()
normalize_p = transforms.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225])
transform = transforms.Compose([transforms.ToTensor(),normalize_p])
ave_Result = []
image_path = 'Image address'  # placeholder: path to the query image
Qimg = imread(image_path, mode='RGB')
Qimg = transform(Qimg).cuda()
Qimg = torch.unsqueeze(Qimg, 0)
outputQ = newmodel(Qimg)  # already on the GPU, no extra .cuda() needed
outputQ = outputQ.cpu().numpy()[0]
P5A = np.transpose(outputQ,(1,2,0)) # feature map, shape (H, W, 2048)
ave_DPA = np.mean(P5A,axis=(0,1)) # average pooling over both spatial axes
ave_DPA = np.reshape(ave_DPA,(1,2048))
ave_DPA = normalize(ave_DPA,axis=1) # L2 normalization
ave_Result.append(ave_DPA)
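One thing worth checking with Caffe-ported weights is the input preprocessing. The sketch below contrasts the torchvision convention (RGB, values in [0, 1], per-channel mean/std) with a Caffe-style one (BGR, values in [0, 255], mean subtraction only). The Caffe mean values are an assumption recalled from the pretrainedmodels settings for cafferesnet101; please verify them against pretrainedmodels.pretrained_settings before relying on them. The helper names and the dummy image are made up for the example.

```python
import numpy as np

# torchvision convention: RGB, scaled to [0, 1], then mean/std normalized
TORCHVISION_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
TORCHVISION_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# ASSUMPTION: Caffe-style BGR mean on the [0, 255] scale (std of 1),
# recalled from the pretrainedmodels settings; double-check before use.
CAFFE_MEAN_BGR = np.array([102.9801, 115.9465, 122.7717], dtype=np.float32)

def preprocess_torchvision(img_rgb_uint8):
    """(H, W, 3) uint8 RGB image -> (3, H, W) float array, torchvision style."""
    x = img_rgb_uint8.astype(np.float32) / 255.0
    x = (x - TORCHVISION_MEAN) / TORCHVISION_STD
    return np.transpose(x, (2, 0, 1))

def preprocess_caffe(img_rgb_uint8):
    """(H, W, 3) uint8 RGB image -> (3, H, W) float array, Caffe style."""
    x = img_rgb_uint8[:, :, ::-1].astype(np.float32)  # RGB -> BGR, keep the 0..255 range
    x = x - CAFFE_MEAN_BGR
    return np.transpose(x, (2, 0, 1))

img = np.full((4, 4, 3), 128, dtype=np.uint8)  # dummy mid-gray image
tv = preprocess_torchvision(img)
cf = preprocess_caffe(img)

print("torchvision-style value range:", tv.min(), tv.max())
print("caffe-style value range:", cf.min(), cf.max())
```

The two conventions differ in scale by roughly two orders of magnitude and in channel order, so feeding one model the other's preprocessing could plausibly hurt the extracted features.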
@Hzzone Yes, PyTorch uses a different version of AlexNet from Caffe. In my experiments, the PyTorch version shows a performance drop of more than 10% when fine-tuning on other tasks.
I also observed the same situation in my experiments. The pytorch pretrained models (including vgg, resnet) perform much worse than the original caffe version, when they are used for finetuning.
I just want to add a note since there are others with similar experiences.
It has also been reported that the Caffe version is better for fine-tuning on an object detection task: