Torchvision pretrained resnet was not trained well?


(Nam Vo) #1

Once upon a time I was fine-tuning the pretrained resnet for an image retrieval task and noticed that I got worse performance than using the pretrained vgg.

Recently I looked at another dataset paper that reports features from off-the-shelf networks as baselines. There, resnet is better than vgg, which is better than alexnet (makes sense). I tried reproducing these baselines and succeeded with the torchvision pretrained alexnet and vgg, but not with resnet101. It’s supposed to be better than vgg, but it’s actually only on par with alexnet.

So now I figure the problem lies with the torchvision pretrained resnet? Maybe it’s not as good as Caffe’s or TensorFlow’s model? Has somebody else released their own resnet models pretrained on ImageNet that I can test?


#2

Once you have loaded the pretrained resnet-101, make sure you set it to eval() mode. Otherwise the BatchNorm layers normalize with per-batch statistics, and your inference performance will depend on batch size.

model = torchvision.models.resnet101(pretrained=True)
model.eval()
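
To make the batch-size dependence concrete, here is a minimal sketch using a standalone `nn.BatchNorm2d` layer rather than the full resnet (so nothing is downloaded). The layer size and the random input are arbitrary choices for illustration, not anything taken from the thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 4, 4) * 5 + 2  # 8 fake "images" with non-trivial mean/variance

# Training mode: each output is normalized with the statistics of its batch.
bn.train()
with torch.no_grad():
    out_batch8 = bn(x)[:1]      # sample 0, normalized together with 7 other samples
    out_batch2 = bn(x[:2])[:1]  # the same sample 0, normalized with only 1 other sample
train_differs = not torch.allclose(out_batch8, out_batch2)

# Eval mode: the stored running statistics are used, so the batch no longer matters.
bn.eval()
with torch.no_grad():
    eval_batch8 = bn(x)[:1]
    eval_batch2 = bn(x[:2])[:1]
eval_matches = torch.allclose(eval_batch8, eval_batch2)

print("train mode output depends on batch:", train_differs)
print("eval mode output independent of batch:", eval_matches)
```

In eval mode the feature for an image is the same whether it is extracted alone or inside a larger batch, which is what you want for retrieval features.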

(Nam Vo) #3

Hey Soumith, yeah I did use eval mode when doing feature extraction.
I just tried the Caffe ported version from https://github.com/Cadene/pretrained-models.pytorch
and got a much better result (close to the baseline reported in the paper). So something might be wrong with the torchvision pretrained resnet :confused:


(Wei Yang) #4

@lugiavn I evaluated the torchvision pretrained models in eval mode using the pytorch/examples/imagenet code on the ILSVRC2012 val dataset (50,000 images), and got error rates similar to the current benchmark.

Error rates (%):

model        benchmark_top1   local_top1   benchmark_top5   local_top5
AlexNet      43.45            43.48        20.91            20.93
ResNet-50    23.85            23.87        7.13             7.14
ResNet-101   22.63            22.63        6.44             6.45

I ran the eval code with: python main.py -a resnet101 --pretrained -e data

System Info:
PyTorch version: 0.4.0
Python version: 3.6.4
CUDA/cuDNN version: cuda/9.0, cudnn/v7.0-cuda.9.0
GPU: Quadro GP100 X2

Let me know if you cannot reproduce these results, or have any question.
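
For readers unfamiliar with the metric, the top-1 and top-5 numbers above are top-k error rates. The sketch below shows how such a number can be computed from a score matrix. The function name `topk_error` and the tiny score matrix are made up for illustration; this is not the code from pytorch/examples.

```python
import numpy as np

def topk_error(scores, labels, k):
    """Percentage of samples whose true label is NOT among the k highest scores."""
    topk = np.argsort(-scores, axis=1)[:, :k]        # indices of the k best classes
    hit = (topk == labels[:, None]).any(axis=1)      # True where the label is in the top k
    return 100.0 * (1.0 - hit.mean())

# Toy example: 3 samples, 3 classes (hypothetical scores)
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 0])

top1 = topk_error(scores, labels, 1)
top2 = topk_error(scores, labels, 2)
print(f"top-1 error: {top1:.2f}%, top-2 error: {top2:.2f}%")
```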


(Nam Vo) #5

Hey, thanks for testing that out
Actually I have no doubt that they matched the reported performance when trying to reproduce resnet.
What I meant is that maybe it didn’t transfer well to the other task (this one https://arxiv.org/abs/1803.11285)


(Kangdi Shi) #6

Hello, sir. I ran into a similar problem when using the PyTorch pretrained resnet-101 to extract features on oxford5k: the mAP is very low, even worse than vgg16. Did you solve it? How?

I also tried the Caffe resnet-101 model from the link you provided, but it didn’t improve my mAP. Sad.

My code is below:

import numpy as np
import pretrainedmodels
import torch
import torch.nn as nn
from PIL import Image
from sklearn.preprocessing import normalize
from torchvision import transforms

torch.cuda.set_device(0)
model_name = 'cafferesnet101'
model = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
# Drop the last two children (average pool and classifier) to keep the final conv feature map
newmodel = nn.Sequential(*list(model.children())[:-2])
newmodel = newmodel.cuda()
for p in newmodel.parameters():
    p.requires_grad = False
newmodel.eval()

# torchvision-style preprocessing: RGB, [0, 1] range, ImageNet mean/std
normalize_p = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
transform = transforms.Compose([transforms.ToTensor(), normalize_p])

ave_Result = []

image_path = 'Image address'  # placeholder: path to the query image
Qimg = Image.open(image_path).convert('RGB')  # scipy.misc.imread was removed from recent SciPy
Qimg = transform(Qimg).cuda()
Qimg = torch.unsqueeze(Qimg, 0)  # add the batch dimension
outputQ = newmodel(Qimg)  # output is already on the GPU
outputQ = outputQ.cpu().numpy()[0]
P5A = np.transpose(outputQ, (1, 2, 0))  # feature map, shape (H, W, 2048)

ave_DPA = np.mean(P5A, axis=(0, 1))  # average pooling over both spatial axes
ave_DPA = np.reshape(ave_DPA, (1, 2048))
ave_DPA = normalize(ave_DPA, axis=1)  # L2 normalization
ave_Result.append(ave_DPA)
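
As a sanity check on the pooling and normalization step alone, here is a small NumPy sketch. A random array stands in for the network's final feature map (the 2048x7x7 shape is an assumption for illustration), and `global_descriptor` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a (channels, height, width) feature map from the last conv block
feature_map = rng.standard_normal((2048, 7, 7)).astype(np.float32)

def global_descriptor(fmap):
    """Average over both spatial axes, then L2-normalize the channel vector."""
    pooled = fmap.mean(axis=(1, 2))          # shape (channels,)
    norm = np.linalg.norm(pooled)
    return pooled / max(norm, 1e-12)         # guard against division by zero

desc = global_descriptor(feature_map)
desc_norm = float(np.linalg.norm(desc))
print(desc.shape, desc_norm)
```

Averaging over only one spatial axis would leave a (width, channels) array, so it is worth checking the descriptor shape before ranking images by dot product.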


(Nam Vo) #7

Use their code to load images; the preprocessing that model expects is different from the torchvision models’ preprocessing.
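
A rough NumPy sketch of how the two conventions differ on the same pixel. The Caffe-style settings here (BGR channel order, 0-255 range, per-channel mean subtraction with the values shown, no std scaling) are what I recall from that repository's settings for cafferesnet101, so treat them as an assumption and check `model.input_space`, `model.input_range`, `model.mean` and `model.std` on the loaded model. Both function names are made up for this example.

```python
import numpy as np

def preprocess_torchvision(img_rgb_uint8):
    """RGB, scaled to [0, 1], then ImageNet mean/std normalization."""
    x = img_rgb_uint8.astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return ((x - mean) / std).transpose(2, 0, 1)   # HWC -> CHW

def preprocess_caffe_style(img_rgb_uint8, mean_bgr=(102.9801, 115.9465, 122.7717)):
    """Assumed Caffe convention: BGR order, 0-255 range, mean subtraction only."""
    x = img_rgb_uint8[:, :, ::-1].astype(np.float32)  # RGB -> BGR
    x = x - np.array(mean_bgr, dtype=np.float32)
    return x.transpose(2, 0, 1)                       # HWC -> CHW

# A 2x2 image where every pixel is (R=255, G=128, B=0)
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[...] = (255, 128, 0)

tv = preprocess_torchvision(img)
cf = preprocess_caffe_style(img)
print("torchvision first channel (R):", tv[0, 0, 0])
print("caffe-style first channel (B):", cf[0, 0, 0])
```

The two inputs differ in channel order and by roughly two orders of magnitude in scale, so feeding one model the other's preprocessing can degrade features badly without raising any error.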


(Derek Hao Hu) #8

@lugiavn I’m trying to reproduce the same paper and found TensorFlow Slim’s ResNet-v1 model also produces a worse result compared to VGG-16. :frowning:


(Hzzone) #9

Is there some difference in the pretrained alexnet model between PyTorch and Caffe?

The image input size and some parameters of the convolution layers seem to be different.


(Kai Li) #10

@lugiavn Any progress on this problem?


(Xiang Xu) #11

@Hzzone Yes, PyTorch uses a different version of AlexNet from Caffe. In my experiments, I find the PyTorch version has a performance drop of more than 10% when fine-tuned on other tasks.


(Rongcheng Lin) #12

I also observed the same situation in my experiments. The PyTorch pretrained models (including vgg and resnet) perform much worse than the original Caffe versions when they are used for fine-tuning.


(Nam Vo) #13

I just want to add a note since there are others with similar experiences.
Here it was also found that the Caffe version is better for fine-tuning on an object detection task:


You can get the caffe pretrained version there, or another version is at: https://github.com/Cadene/pretrained-models.pytorch