I ran a simple classification inference on a sample image with ResNet-50, using both the `IMAGENET1K_V1` and `IMAGENET1K_V2` versions of the model weights. Even after applying the appropriate transforms for each version, the top-1 confidence score was roughly 70% higher (in relative terms) with the V1 weights than with V2. This is odd, since the V2 weights were expected to give a better confidence score! I have attached a Colab notebook as a reference.
On the same image (again with the appropriate transforms), the top-1 confidence score is 99.771% with the `IMAGENET1K_V1` weights but only 58.404% with `IMAGENET1K_V2`. Something seems quite off!
- The custom transforms for the `IMAGENET1K_V1` weights were:

  ```python
  T.Compose([
      T.Resize(256),
      T.CenterCrop(224),
      T.ToTensor(),
      T.Normalize(mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
  ])
  ```
- The custom transforms for the `IMAGENET1K_V2` weights were:

  ```python
  T.Compose([
      T.Resize(232),
      T.CenterCrop(224),
      T.ToTensor(),
      T.Normalize(mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
  ])
  ```
I also tried the bundled `ResNet50_Weights.IMAGENET1K_V1.transforms()` and `ResNet50_Weights.DEFAULT.transforms()` presets, with literally no difference in the results!
The Torchvision documentation for ResNet-50 with the `IMAGENET1K_V2` weights states:

> The inference transforms are available at `ResNet50_Weights.IMAGENET1K_V2.transforms` and perform the following preprocessing operations: Accepts `PIL.Image`, batched `(B, C, H, W)` and single `(C, H, W)` image `torch.Tensor` objects. The images are resized to `resize_size=[232]` using `interpolation=InterpolationMode.BILINEAR`, followed by a central crop of `crop_size=[224]`. Finally the values are first rescaled to `[0.0, 1.0]` and then normalized using `mean=[0.485, 0.456, 0.406]` and `std=[0.229, 0.224, 0.225]`.
This is precisely what I have performed above. Could someone point out what the issue is here?
PS: I have also opened a GitHub issue about this.