Significant Differences in the Confidence scores across IMAGENET1K_V1 and IMAGENET1K_V2 ResNet50 during Inference

Kunal_7 · May 9, 2023, 7:39pm

I ran a simple classification inference on a sample image with ResNet-50, using both the IMAGENET1K_V1 and IMAGENET1K_V2 weights. Even after applying the appropriate transforms, the top confidence score with V1 was roughly 70% higher (in relative terms) than with V2.

This is odd, since I expected the V2 weights to give a better confidence score!

I have attached a Colab Notebook as a reference.

On the same image, the highest confidence score is 99.771% with the IMAGENET1K_V1 weights and 58.404% with the IMAGENET1K_V2 weights, even with the appropriate transforms. Something seems quite off!

The custom transforms for IMAGENET1K_V1 weights were:

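import torchvision.transforms as T

T.Compose([
    T.Resize(256),       # resize the shorter side to 256
    T.CenterCrop(224),
    T.ToTensor(),        # PIL image -> float tensor in [0.0, 1.0]
    T.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])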

The custom transforms for IMAGENET1K_V2 weights were:

T.Compose([
    T.Resize(232),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

I also tried the bundled presets, ResNet50_Weights.IMAGENET1K_V1.transforms() and ResNet50_Weights.DEFAULT.transforms(), instead of the custom transforms, and the results were literally the same!

The Torchvision documentation for ResNet50 with IMAGENET1K_V2 weights states:

The inference transforms are available at ResNet50_Weights.IMAGENET1K_V2.transforms and perform the following preprocessing operations: Accepts PIL.Image, batched (B, C, H, W) and single (C, H, W) image torch.Tensor objects. The images are resized to resize_size=[232] using interpolation=InterpolationMode.BILINEAR, followed by a central crop of crop_size=[224]. Finally the values are first rescaled to [0.0, 1.0] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].

This is precisely what I have performed above.

Could someone possibly point out what the issue is here?

PS: I have also raised an issue for the same.