ResNet50s with the same parameters behave differently for the same input when torch & torchvision versions differ

I am trying to reproduce the experimental results of a paper.
The authors provide the following requirements.txt for the Python environment:

click
matplotlib 
numpy 
opencv-python==4.5.1.48
pandas
pathy==0.4.0
PyYAML==5.4.1
scikit-learn
scipy
seaborn 
torch==1.7.1
torchvision==0.8.2
tqdm 
urllib3==1.26.3

But this does not work on an A100 GPU, presumably because of the torch version.
So I updated the torch & torchvision versions as follows:

pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Now the environment is prepared, and I am using a ResNet50 loaded via torch.hub (https://download.pytorch.org/models/resnet50-19c8e357.pth).

Then I take an image from ImageNet and feed it to the model.
However, the outputs of the ResNet50 differ between the two torch/torchvision versions, even though the pre-processing of the input is the same.
This means that the prediction of a model with identical parameters depends on the torch & torchvision versions, which I think is undesirable.
So what is the cause of this?
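
Roughly, this is what I am doing in both environments (a minimal sketch; the image path is a placeholder):

import torch
import torchvision
from PIL import Image
from torchvision import transforms

# load the pretrained ResNet50 weights from the checkpoint linked above
state_dict = torch.hub.load_state_dict_from_url(
    'https://download.pytorch.org/models/resnet50-19c8e357.pth')
net = torchvision.models.resnet50()
net.load_state_dict(state_dict)
net.cuda().eval()

# standard ImageNet pre-processing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

image = Image.open('some_imagenet_image.JPEG').convert('RGB')  # placeholder path
x = preprocess(image).unsqueeze(0).cuda()

with torch.no_grad():
    output = net(x)
print(output[0])  # these per-class values differ between the two setups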

Generally, each library update (PyTorch or any CUDA library) can change the numerical behavior, which is expected as long as the numerical errors stay in the expected range for the used dtype.
There is no guarantee of bitwise-identical results between different versions or hardware.
However, since you are using an A100, check if disabling TF32 reduces the numerical errors as described here.
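
E.g. you could disable TF32 globally before running the model via:

import torch

# disable TF32 for matmuls and cuDNN convolutions to rule it out
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False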

Thank you for the reply.
I agree that we cannot expect bitwise-identical results if the versions differ.
But I am not sure whether this can be considered a numerical error, because the output values change to a large extent.

In torch==1.7.1,

tensor(12.5373, device='cuda:0') # for class A
tensor(12.5295, device='cuda:0') # for class B

and in torch==1.9.0+cu111,

tensor(12.0281, device='cuda:0') # for class A
tensor(12.5205, device='cuda:0') # for class B

These are values taken directly from net(input).

Isn’t this too big a difference to be explained by numerical errors?

The difference could indicate an issue, could be expected if TF32 is used in one setup (you didn’t respond to my question about it), or could be expected depending on the model architecture.
I would thus recommend using e.g. forward hooks to check the intermediate outputs and narrow down whether the error increases unexpectedly in one layer.
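
A rough sketch of such a comparison (assuming net is the loaded ResNet50 and x is the pre-processed input tensor; run it in both environments and compare the printed values):

import torch

stats = {}

def make_hook(name):
    def hook(module, inp, out):
        # record summary statistics of the intermediate activation
        stats[name] = (out.float().mean().item(), out.float().abs().max().item())
    return hook

# register a forward hook on every leaf module
for name, module in net.named_modules():
    if len(list(module.children())) == 0:
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    net(x)

for name, (mean, amax) in stats.items():
    print(f'{name}: mean={mean:.6f}, abs().max()={amax:.6f}')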

I forgot to answer your question: TF32 wasn’t used in my setup. But I have found out where things go wrong, and it seems TF32 isn’t related to my case.

It turned out that the problem occurred in torchvision.transforms.
The following is my transformation, which is the usual ImageNet pre-processing:

from torchvision import transforms

my_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    # transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    # I comment out the normalization because it is irrelevant here
])

When my_transform() is applied, the outputs differ between the two versions, although many of the elements overlap.
I printed the elements of the transformed image like this:

print(transformed_image[0, 0, :5, :5])

What I got was

# torchvision==0.8.2
tensor([[0.2588, 0.3294, 0.2824, 0.3843, 0.4588],
        [0.3569, 0.4000, 0.4235, 0.3725, 0.2980],
        [0.4039, 0.3725, 0.3451, 0.2941, 0.1725],
        [0.1451, 0.2549, 0.1922, 0.2902, 0.3451],
        [0.1020, 0.1020, 0.2039, 0.4549, 0.4667]])

and

# torchvision==0.10.0+cu111
tensor([[0.2353, 0.2588, 0.3294, 0.2824, 0.3843], # in 0.8.2, the tensor starts at 0.2588, which is the 2nd element here
        [0.3686, 0.3569, 0.4000, 0.4235, 0.3725],
        [0.4157, 0.4039, 0.3725, 0.3451, 0.2941],
        [0.1451, 0.1451, 0.2549, 0.1922, 0.2902],
        [0.1020, 0.1020, 0.1020, 0.2039, 0.4549]])
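
A quick way to confirm the apparent one-column shift would be to save the two transformed tensors and compare shifted views (t_082 and t_0100 are placeholder names for the tensors from each environment):

# should be ~0 if the 0.10.0 output is just the 0.8.2 crop shifted by one column
print((t_082[..., :, :-1] - t_0100[..., :, 1:]).abs().max())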

However, when I comment out either transforms.Resize(256) or transforms.CenterCrop(224), my_transform(image) stays the same across the two versions.

I don’t understand why the outputs differ only when both transforms.Resize(256) and transforms.CenterCrop(224) are used.

One possible reason is that after resizing the image, its width becomes an odd number, so CenterCrop() behaves differently.

Looking at the implementation of transforms.CenterCrop(), it relies on functional.center_crop().

So I need to check whether something there changed with the version update.
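
To narrow it down, one option would be to dump the result of each stage in both environments and diff them; the resized size also shows whether a dimension becomes odd (the image path is a placeholder):

import torch
from PIL import Image
from torchvision import transforms

image = Image.open('some_imagenet_image.JPEG').convert('RGB')  # placeholder path

resized = transforms.Resize(256)(image)
cropped = transforms.CenterCrop(224)(resized)
print('resized size (w, h):', resized.size)  # is one of the dimensions odd?

# save the per-stage tensors; load and compare them across the two environments
torch.save({'resized': transforms.ToTensor()(resized),
            'cropped': transforms.ToTensor()(cropped)}, 'stages.pt')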

Thanks for the update, these are indeed interesting findings!
Are you using tensors as inputs to the transformations or PIL.Images?
In the latter case, could you check if the PIL version also changes between both setups?
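
E.g. in each environment:

python -c "import PIL; print(PIL.__version__)"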


I was using PIL.Image, and the differences were indeed all caused by the different PIL versions.
Thank you!

Thanks for checking! Could you post the different PIL versions (in case you remember them), as it would be interesting to see why they changed the behavior?

My environment was Python 3.9 + Pillow 9.3.0.
Then I changed it to Python 3.8 + Pillow 7.1.2 because Python 3.8 doesn’t support 9.3.0.

If I remember correctly, the behavior was consistent across Pillow 7.1.2 - 8.3.2.
