Hi, I have run into unexpected behavior with the inference transforms of a pretrained model. The result differs depending on whether the `transform` function is applied by the dataset at load time or applied afterwards to an already-loaded image.
When the images are loaded with `transform` applied, everything works as intended and the model accuracy is reasonably high. However, when I load the images without `transform`, convert them to tensors with `to_tensor`, and only then apply `transform`, the resulting images differ from those loaded with `transform`.
I have tried changing the `antialias` attribute of `transform`, but the two results still did not match.
Here is the code:
# torchvision
import torchvision.transforms as transforms
from torchvision import models
from torchvision.datasets import ImageNet
# torch
import torch
import torch.nn.functional as F
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval().to('cuda')
transform = weights.transforms()
# Setting `antialias` to either `True` or `False` didn't fix things.
# transform.antialias = ...
# Dataset with transform.
dataset = ImageNet('./data/', split='val', transform=transform)
img_1, _ = dataset[111]
# Dataset without transform.
dataset = ImageNet('./data/', split='val')
img_2, _ = dataset[111]
# 1. Convert PIL image to tensor.
# 2. Transform it.
img_2 = transforms.functional.to_tensor(img_2)
img_2 = transform(img_2)
# Expected to be the same, but they are not.
print(img_1 == img_2)
Is there anything I’m missing here? I would appreciate any insights or suggestions to resolve this issue.
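One guess I have not been able to confirm: when `transform` receives a PIL image, the resize may run while the pixels are still 8-bit integers, whereas after `to_tensor` the resize runs on floats in [0, 1]. If so, the rounding happens at a different point and exact equality would be too strict a check. Below is a toy sketch of that idea in plain Python. It does not call torchvision at all; the helper names (`downscale_uint8`, `downscale_float`, `to_unit_float`) are made up for illustration, and averaging pixel pairs is only a stand-in for a real resize filter.

```python
# Toy illustration (pure Python, no torch/torchvision).
# ASSUMPTION: averaging neighbouring pixel pairs stands in for a real resize,
# and the helper names below are hypothetical, not torchvision APIs.

def downscale_uint8(row):
    """Average neighbouring pairs and round back to 8-bit integers."""
    return [(a + b + 1) // 2 for a, b in zip(row[0::2], row[1::2])]

def downscale_float(row):
    """Average neighbouring pairs, keeping full float precision."""
    return [(a + b) / 2 for a, b in zip(row[0::2], row[1::2])]

def to_unit_float(row):
    """Scale 8-bit values to the [0, 1] range."""
    return [v / 255 for v in row]

pixels = [10, 13, 200, 201, 0, 255, 7, 7]

# Path A: "resize" while still 8-bit, then convert to float.
path_a = to_unit_float(downscale_uint8(pixels))
# Path B: convert to float first, then "resize".
path_b = downscale_float(to_unit_float(pixels))

exactly_equal = path_a == path_b
max_abs_diff = max(abs(a - b) for a, b in zip(path_a, path_b))

print("exactly equal:", exactly_equal)
print("max abs diff:", max_abs_diff)
```

In this toy case the two paths disagree by at most half an 8-bit step (0.5 / 255). If the same thing is happening in my real code, comparing `img_1` and `img_2` with a tolerance, or looking at the maximum absolute difference, might be more informative than `==`.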