I need to switch to albumentations for more flexibility (using some custom image transforms). However, doing a simple test of the following transforms when switching from Torchvision yields lower performance:
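    from torchvision import transforms as transforms
    import albumentations as A
    from albumentations.pytorch.transforms import ToTensorV2

    # Torchvision version (the steps before Normalize are missing from the post as captured)
    transforms_ = transforms.Compose([
        # ...
        transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
    ])

    # Albumentations version (the steps around Normalize are missing from the post as captured)
    transforms_ = A.Compose([
        # ...
        A.Normalize(
            mean=[0.5, 0.5, 0.5],
            std=[0.5, 0.5, 0.5],
        ),
        # ...
    ])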
I thought I was very thorough, even being very particular to replace the integer-argument Resize transform with albumentations “SmallestMaxSize” transform which I am pretty sure is the equivalent.
What am I missing here (i.e. what is different)?
Did you check the performance using a few runs with different seeds to check the mean and stddev of the final accuracy? This could show if the difference is real or if you just had a “bad” seed.
Thanks for the suggestion @ptrblck! Unfortunately (or perhaps, fortunately), I am keeping the same seed (0) for both eval runs of my model. But do the two transforms look functionally the same? I am not too familiar with albumentations, so I was looking at their source code for the first time.
I’m not deeply familiar with the albumentations library but would assume both transformations yield the same outputs. The second resize transformation doesn’t seem to be necessary, as the previously applied center crop already returns the desired shape. Since you are not using any random transformations (I just realized it now), it would be a good idea to test both transforms on a defined input and check the difference in their outputs.
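A minimal sketch of that check, assuming NumPy is available. The helper names `make_fixed_image` and `report_difference` are hypothetical, and the image size is an arbitrary choice rather than something taken from the thread. The idea is to feed one fixed image through both pipelines and look at how far apart the outputs are:

```python
import numpy as np


def make_fixed_image(height=300, width=400, seed=0):
    """Build a reproducible HxWx3 uint8 test image (the size is an arbitrary assumption)."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)


def report_difference(out_a, out_b):
    """Return the max and mean absolute difference between two pipeline outputs."""
    a = np.asarray(out_a, dtype=np.float64)
    b = np.asarray(out_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    diff = np.abs(a - b)
    return {"max_abs": float(diff.max()), "mean_abs": float(diff.mean())}
```

With the two pipelines from the question, the torchvision output would probably come from calling the composed transform on `PIL.Image.fromarray(image)`, and the Albumentations output from `alb_transform(image=image)["image"]`; converting both tensors with `.numpy()` before passing them to `report_difference` keeps the comparison independent of either library. A shape mismatch is reported as an error rather than silently broadcast, since it would already point to a difference in the resize or crop steps.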
Thank you for the suggestion! After some digging, I found that Albumentations resizes with cv2.INTER_LINEAR by default, while Torchvision defaults to InterpolationMode.BILINEAR. Both are nominally bilinear, but they come from different backends, and in my tests the difference affects performance non-trivially. Hopefully this helps anyone else dealing with this issue, as I know Albumentations is quite popular.
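One plausible reason two "bilinear" resizes can disagree is antialiasing: when downscaling, OpenCV's INTER_LINEAR blends only the two nearest input pixels, whereas Pillow (which torchvision uses for PIL inputs) widens its triangle filter with the downscale factor. The 1D sketch below is a simplified model of that distinction, not either library's actual code, and both function names are hypothetical:

```python
def resize_linear_2tap(signal, out_len):
    """Linear resize that always blends the 2 nearest inputs (no antialiasing)."""
    in_len = len(signal)
    scale = in_len / out_len
    out = []
    for i in range(out_len):
        # Map the output pixel centre back into input coordinates.
        x = (i + 0.5) * scale - 0.5
        x = min(max(x, 0.0), in_len - 1.0)
        left = int(x)
        right = min(left + 1, in_len - 1)
        frac = x - left
        out.append(signal[left] * (1.0 - frac) + signal[right] * frac)
    return out


def resize_linear_antialiased(signal, out_len):
    """Linear resize whose triangle filter widens with the downscale factor."""
    in_len = len(signal)
    scale = in_len / out_len
    support = max(scale, 1.0)  # filter half-width, in input pixels
    out = []
    for i in range(out_len):
        center = (i + 0.5) * scale
        lo = max(int(center - support), 0)
        hi = min(int(center + support) + 1, in_len)
        indices = range(lo, hi)
        # Triangle weights, measured from each input pixel's centre.
        weights = [max(0.0, 1.0 - abs((j + 0.5 - center) / support)) for j in indices]
        total = sum(weights)
        out.append(sum(w * signal[j] for w, j in zip(weights, indices)) / total)
    return out


# A single bright pixel in an otherwise dark row, downscaled 16 -> 4.
spike = [0.0] * 16
spike[4] = 255.0
print(resize_linear_2tap(spike, 4))
print(resize_linear_antialiased(spike, 4))
```

In this model the 2-tap version never samples the bright pixel and returns all zeros, while the antialiased version spreads part of its energy into the neighbouring outputs. Real images are less extreme, but the same effect shifts pixel statistics after a large downscale.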
If anyone has any further suggestions on how to incorporate the exact same interpolation as torchvision in albumentations, I would appreciate that. For the time being, I will use torchvision.
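One option worth trying is to do the resize with Pillow itself and plug that function into the Albumentations pipeline. The sketch below assumes NumPy and Pillow are installed. `resize_smaller_edge_pil` is a hypothetical helper that mimics torchvision's integer-argument Resize (the smaller edge is scaled to `size` and the aspect ratio is kept), and it has not been verified to match torchvision bit for bit:

```python
import numpy as np
from PIL import Image


def resize_smaller_edge_pil(image, size):
    """Resize an HxWxC uint8 array so its smaller edge equals `size`, using Pillow's bilinear filter."""
    height, width = image.shape[:2]
    if height <= width:
        new_h, new_w = size, int(size * width / height)
    else:
        new_h, new_w = int(size * height / width), size
    # Pillow expects the target size as (width, height).
    resized = Image.fromarray(image).resize((new_w, new_h), Image.BILINEAR)
    return np.asarray(resized)
```

Albumentations provides `A.Lambda`, which accepts an `image` callable, so something like `A.Lambda(image=lambda image, **kwargs: resize_smaller_edge_pil(image, size))` in place of `A.SmallestMaxSize` should route the resize through Pillow. Comparing the result against the torchvision pipeline on a fixed image would confirm whether the outputs now agree.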