VGG16 torch weights trained with scale jittering

Torchvision provides pretrained weights for VGG16 with a top-1 error rate of 28.4. This seems to correspond to training at a fixed scale of 256. Since the authors of VGG released caffe weights for the model trained with scale jittering, I was wondering if there are also torch weights available for VGG16 trained with scale jittering?

Would you like to apply the scale jittering during evaluation as described in Section 4.2?
If I’m not mistaken, this script was used to pretrain the models; it uses RandomResizedCrop during training and is thus already applying a variety of scales and aspect ratios.

I actually would like the scale jittering to be applied during training as described in Section 3.1, i.e. each training image should be isotropically rescaled with the shortest edge S randomly sampled from the range [256, 512], and only then cropped to the fixed size of 224.
Indeed, RandomResizedCrop brings a variety of scales and aspect ratios into the training data. However, it results in a different data distribution than randomly rescaling the whole image, as suggested in the paper.
To give some context: I want to reproduce the generalization experiments on VOC2007 described in Appendix B. Using the torchvision pretrained weights for VGG16, I get 72.06 mean AP, which is around 17 points below the reported score. I therefore suspect the difference might be caused by the VGG weights not being trained with scale jittering.

You could apply transforms.Resize with a manually sampled size in the range [256, 512] first and transforms.CenterCrop afterwards.
Since you would sample a new size in each iteration, I would suggest using the functional API via torchvision.transforms.functional.resize(img, size, interpolation=2).

That makes sense, so you could follow the previous suggestion to reproduce the paper's training-time jittering manually.