Regarding the interpolation methods for the various augmentations present in the torchvision library

I was going through the interpolation options in the documentation and noticed that the default is usually InterpolationMode.BILINEAR or InterpolationMode.NEAREST. Isn’t BICUBIC generally considered better, and shouldn’t it therefore be the default?

Is there any reason to prefer the other two interpolation methods for ImageNet? I work with medical images and have found BICUBIC to work better, and there is some research supporting this (Effect of the Pixel Interpolation Method for Downsampling Medical Images on Deep Learning Accuracy).

The bicubic interpolation algorithm can produce relatively sharp images, but it requires more computation. It is now the most commonly used algorithm in many image processing programs such as Photoshop, After Effects, Avid, and Final Cut Pro. More info can be viewed here
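
To make the "more computation" point concrete, here is a minimal 1D sketch in plain Python. It is not torchvision's implementation: the helper names are made up for illustration, the cubic kernel is the Keys kernel with an assumed a = -0.5 (libraries differ on this constant), and edges are handled by clamping. Per axis, nearest reads 1 sample, linear reads 2, and cubic reads 4, so in 2D that becomes 1, 4, and 16 pixels per output pixel.

```python
import math

# Samples read per axis by each method; the 2D cost is this value squared.
TAPS_PER_AXIS = {"nearest": 1, "bilinear": 2, "bicubic": 4}


def cubic_weight(t, a=-0.5):
    """Keys cubic convolution kernel. a = -0.5 is an assumption; libraries differ."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0


def _get(samples, i):
    # Clamp the index so reads past either end reuse the edge sample.
    return samples[min(max(i, 0), len(samples) - 1)]


def nearest(samples, x):
    # One tap: the sample closest to x.
    return _get(samples, math.floor(x + 0.5))


def linear(samples, x):
    # Two taps: weighted average of the neighbours on each side of x.
    i = math.floor(x)
    t = x - i
    return (1 - t) * _get(samples, i) + t * _get(samples, i + 1)


def cubic(samples, x):
    # Four taps: two neighbours on each side, weighted by the cubic kernel.
    i = math.floor(x)
    return sum(_get(samples, i + k) * cubic_weight(x - (i + k)) for k in range(-1, 3))


ramp = [0, 1, 2, 3, 4, 5]
step = [0, 0, 0, 1, 1, 1]
print(nearest(step, 2.5), linear(step, 2.5), cubic(step, 2.5))
print(cubic(ramp, 2.5))   # a smooth ramp is reproduced exactly
print(cubic(step, 3.25))  # near an edge the cubic kernel can overshoot the input range
```

The overshoot near the step edge comes from the negative lobes of the cubic kernel, which is also why bicubic results tend to look sharper than bilinear ones.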

Is there any reason the torchvision community chose NEAREST and BILINEAR as defaults over BICUBIC? One reason I can think of is trading image quality for speed.
Anyway, to change the interpolation method we currently just have to set the interpolation argument when using transforms.RandomRotation or functional.affine, so it’s not much of a hassle.

I would guess the defaults were picked because they worked fine, and bicubic interpolation might not have shown any benefit when training CNNs on e.g. ImageNet. Your observation for medical use cases is interesting, and MONAI uses the 'area' mode as the default for its monai.transforms.Resize transformation.
Did you see any advantage in using bicubic interpolation for any of the current CNNs (e.g. in the TIMM repo)?
A quick check shows that 'bilinear' might also be the default for ImageNet evaluation as seen here, but @rwightman would have more insight into whether bicubic interpolation outperforms the defaults in any use case.

I haven’t checked that yet; I need to run several comparative analyses to see whether it actually helps on ImageNet. My work is currently focused on medical images, so I wanted to point that out. I feel it should translate to ImageNet too, unless people have already tried it and found it not very useful.