I have a dataset of (N, H, W, C) images with values in [0, 255]. Some are saved as np.uint8, some as torch.uint8.
I need to preprocess them for a ResNet, and I need to do it efficiently.
What I currently do is

import torch
import torch.nn as nn
import torchvision.transforms as T

transforms = nn.Sequential(
    T.Resize(256, interpolation=T.InterpolationMode.BICUBIC, antialias=None),
    T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),  # also divides by 255 if input is uint8
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
)
img4resnet = transforms(img.permute(0, 3, 1, 2).contiguous())  # (N, H, W, C) -> (N, C, H, W)
This is slow, especially if I need to pass many images at once.
There is ToTensor, which is faster than permute + contiguous and accepts an np.ndarray, but it works with only one image at a time (I get an error saying I passed a 4D input where 3D was expected). If I have many images, I am not sure whether looping would be any faster…
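For the batched numpy case, the alternative I've been considering is converting the whole batch at once with torch.from_numpy (which shares memory with the array), so only the contiguous() copy should actually cost anything. Is something like this the right approach, or is the per-image ToTensor loop still faster?

```python
import numpy as np
import torch

# (N, H, W, C) uint8 batch, shape is a placeholder for my real data
batch = np.random.randint(0, 256, (16, 64, 64, 3), dtype=np.uint8)

# from_numpy is zero-copy, permute returns a view, contiguous does one copy
t = torch.from_numpy(batch).permute(0, 3, 1, 2).contiguous()
print(t.shape, t.dtype)
```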
I have found other issues asking something similar, but they are all years old.
Finally, I see that the latest torchvision can build the preprocessing transform directly from the weights. Is it faster? Can I just replace my current transform with it?