Torchvision pretrained resnet was not trained well?

lugiavn · April 24, 2018, 5:31am

Once upon a time I was fine-tuning the pretrained resnet for an image retrieval task and noticed that I got worse performance than using the pretrained vgg.

Recently I looked at another dataset paper, where they report baselines using features from off-the-shelf networks. The result is that resnet is better than vgg, which is better than alexnet (makes sense). I tried reproducing these baselines and succeeded with the torchvision pretrained alexnet & vgg, but not with resnet101. It’s supposed to be better than vgg, but it’s actually only on par with alexnet.

So now I figure the problem lies with the torchvision pretrained resnet? Maybe it’s not as good as Caffe’s or tensorflow’s model? Has somebody else released their own resnet models pretrained on Imagenet that I can test?

smth · April 24, 2018, 5:58am

Once you have loaded the pretrained resnet-101, make sure you set it to eval() mode. Otherwise BatchNorm effects might be weird (your inference performance will depend on batch size).

model = torchvision.models.resnet101(pretrained=True)
model.eval()
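
To illustrate why eval mode matters, here is a minimal pure-Python sketch of batch normalization for a single feature. It is not PyTorch's implementation: the function names and the running statistics are made up for illustration, and the learnable scale and shift are left out. In training mode the output for a sample depends on what else is in the batch; in eval mode it depends only on the stored running statistics.

```python
import math

EPS = 1e-5

def bn_train(batch):
    """Normalize with the statistics of the current batch (training-mode behavior)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + EPS) for x in batch]

def bn_eval(batch, running_mean, running_var):
    """Normalize with fixed running statistics (eval-mode behavior)."""
    return [(x - running_mean) / math.sqrt(running_var + EPS) for x in batch]

sample = 2.0
# Same sample placed in two different batches
train_a = bn_train([sample, 0.0])[0]
train_b = bn_train([sample, 10.0])[0]
# Hypothetical running statistics, as if accumulated during training
eval_a = bn_eval([sample, 0.0], running_mean=1.0, running_var=4.0)[0]
eval_b = bn_eval([sample, 10.0], running_mean=1.0, running_var=4.0)[0]

print(train_a, train_b)  # differ: the output depends on the rest of the batch
print(eval_a, eval_b)    # identical: the output depends only on the sample
```

This is why features extracted without calling eval() can change with the batch size or batch composition.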

lugiavn · April 24, 2018, 3:50pm

Hey Soumith, yeah I did use eval mode when doing feature extraction.
I just tried the caffe ported version from https://github.com/Cadene/pretrained-models.pytorch
and got a much better result (close to the baseline reported in the paper). So something might be wrong with the torchvision pretrained resnet.

Wei_Yang · May 31, 2018, 5:43pm

@lugiavn I evaluated the torchvision pretrained models in eval mode using the pytorch/examples/imagenet code on the ILSVRC2012 val dataset (50,000 images), and got error rates similar to the current benchmark.

model	benchmark_top1	local_top1	benchmark_top5	local_top5
AlexNet	43.45	43.48	20.91	20.93
ResNet-50	23.85	23.87	7.13	7.14
ResNet-101	22.63	22.63	6.44	6.45

I ran the eval code as follows: python main.py -a resnet101 --pretrained -e data
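
For reference, the top-1 and top-5 error in the table is the percentage of validation images whose true label is not among the 1 or 5 highest-scoring classes. A minimal pure-Python sketch with made-up scores follows; the helper name `topk_error` is hypothetical and not part of pytorch/examples.

```python
def topk_error(scores, labels, k):
    """Percentage of samples whose true label is not among the k highest scores."""
    wrong = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in topk:
            wrong += 1
    return 100.0 * wrong / len(labels)

# Made-up scores for 4 samples over 6 classes
scores = [
    [0.1, 0.7, 0.05, 0.05, 0.05, 0.05],  # highest score: class 1
    [0.3, 0.2, 0.4, 0.05, 0.03, 0.02],   # highest score: class 2
    [0.05, 0.1, 0.15, 0.2, 0.2, 0.3],    # highest score: class 5
    [0.25, 0.2, 0.2, 0.15, 0.15, 0.05],  # highest score: class 0
]
labels = [1, 0, 0, 5]

top1 = topk_error(scores, labels, k=1)
top5 = topk_error(scores, labels, k=5)
print(top1, top5)
```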

System Info:
PyTorch version: 0.4.0
Python version: 3.6.4
CUDA/cuDNN version: cuda/9.0, cudnn/v7.0-cuda.9.0
GPU: Quadro GP100 X2

Let me know if you cannot reproduce these results or have any questions.

lugiavn · June 1, 2018, 5:13am

Hey, thanks for testing that out
Actually I have no doubt that they matched the reported performance when trying to reproduce resnet.
What I meant is that maybe it didn’t transfer well to the other task (this one https://arxiv.org/abs/1803.11285)

Kangdi_Shi · June 22, 2018, 2:48am

Hello, sir. I ran into a similar problem: I’m using the pytorch pre-trained resnet-101 to extract features on oxford5k, but the mAP is very low, even worse than vgg16. Did you solve it? If so, how?

I also tried the caffe resnet-101 model from the link you provided, but it didn’t improve my mAP. Sad.

My code is below:

import pretrainedmodels

import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from sklearn.preprocessing import normalize
from torchvision import transforms

torch.cuda.set_device(0)
model_name = 'cafferesnet101'
model = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
# Drop the last two child modules so the output is the final conv feature map
newmodel = list(model.children())[:-2]
newmodel = nn.Sequential(*newmodel)
newmodel = newmodel.cuda()
for p in newmodel.parameters():
    p.requires_grad = False
newmodel.eval()

# torchvision-style ImageNet normalization
normalize_p = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
transform = transforms.Compose([transforms.ToTensor(), normalize_p])

ave_Result = []

image_path = 'Image address'  # placeholder: replace with the actual image path
Qimg = Image.open(image_path).convert('RGB')
Qimg = transform(Qimg).cuda()
Qimg = torch.unsqueeze(Qimg, 0)  # add the batch dimension
outputQ = newmodel(Qimg)
outputQ = outputQ.cpu().numpy()[0]  # (C, H, W)
P5A = np.transpose(outputQ, (1, 2, 0))  # feature map, (H, W, C)

ave_DPA = np.mean(P5A, axis=(0, 1))  # average pooling over both spatial dimensions
ave_DPA = np.reshape(ave_DPA, (1, 2048))
ave_DPA = normalize(ave_DPA, axis=1)  # L2 normalization
ave_Result.append(ave_DPA)

lugiavn · June 22, 2018, 6:55am

Use their code to load images; the preprocessing is different from that of the torchvision models.
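
A rough sketch of what "different preprocessing" can mean here. The torchvision models expect RGB input scaled to [0, 1] and normalized with the mean/std used in the code above. Caffe-ported ResNets typically expect BGR input kept in [0, 255] with only a per-channel mean subtracted. The Caffe-style mean values below are an assumption for illustration, and both function names are hypothetical; the actual settings should be read from the loaded model or from the repository's own loading code rather than taken from these numbers.

```python
def torchvision_style(rgb):
    """RGB pixel in 0..255 -> RGB in [0, 1], then per-channel mean/std normalization."""
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    return [(v / 255.0 - m) / s for v, m, s in zip(rgb, mean, std)]

def caffe_style(rgb):
    """RGB pixel in 0..255 -> BGR, kept in 0..255, per-channel mean subtracted."""
    # Assumed BGR means commonly used with Caffe ResNets; check the model's own config
    mean_bgr = [102.9801, 115.9465, 122.7717]
    bgr = rgb[::-1]
    return [v - m for v, m in zip(bgr, mean_bgr)]

pixel = [255, 128, 0]  # an orange pixel, in RGB order
tv = torchvision_style(pixel)
cf = caffe_style(pixel)
print(tv)  # values on the order of a few units, RGB order
print(cf)  # values on the order of a hundred, BGR order
```

Feeding input prepared one way into a network trained with the other gives the first layer values with the wrong scale and channel order, which can degrade the features without raising any error.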

derekhh · July 7, 2018, 9:08am

@lugiavn I’m trying to reproduce the same paper and found TensorFlow Slim’s ResNet-v1 model also produces a worse result compared to VGG-16.

Hzzone · August 30, 2018, 10:51am

Is there some difference in the pretrained alexnet model between PyTorch and Caffe?

The input image size and some parameters of the convolution layers seem to differ.

kli-nlpr · September 1, 2018, 8:45am

@lugiavn Any progress on this problem?

Xiang_Xu · September 2, 2018, 6:40am

@Hzzone Yes, pytorch uses a different version of AlexNet from Caffe. In my experiments, I find the pytorch version shows a performance drop of more than 10% when fine-tuning to other tasks.

Rongcheng_Lin · October 13, 2018, 2:10am

I also observed the same situation in my experiments. The pytorch pretrained models (including vgg, resnet) perform much worse than the original caffe version, when they are used for finetuning.

lugiavn · February 11, 2019, 6:48am

I just want to add a note since there are others with a similar experience.
Here it was also found that the caffe version is better for finetuning on an object detection task:

You can get the caffe pretrained version there, or another version is at: https://github.com/Cadene/pretrained-models.pytorch