ToTensor transform does not work well on PIL F mode images

Deeply · November 27, 2018, 10:06am

I was trying to use the torchvision.transforms.Normalize() transform on a 3-channel PIL image in F mode (32-bit float). The image's min and max are -54.3 and 24.6, respectively.

To use the Normalize() transform with mean=0.5 and std=0.25 for every channel, the image values should lie between 0 and 1.

It seems that the ToTensor() transform scales a PIL RGB image to values between 0 and 1, but it does not do so for a PIL image in F mode. Can this be solved without writing a special transform for F mode PIL images?

krishnavishalv · November 27, 2018, 11:22am

Can you post the code that reproduces your error?

Deeply · November 27, 2018, 12:46pm

Posting the code is a bit complicated, as it is spread across different files and classes. But here is how to easily replicate the diagnostics.

1- The transform:

from torchvision import transforms

mu = (0.5,) * 3
std = (0.25,) * 3
image_transform = transforms.Compose([
    # transforms.Resize(input_size),  # we'll skip this for now
    transforms.ToTensor(),
    # the image has a single channel, so copy it to three channels
    transforms.Lambda(lambda x: x.repeat(3, 1, 1)),
    transforms.Normalize(mu, std),
])

Note: the image has only one channel, so I used x.repeat(3, 1, 1) in the transform, but that should not pose a problem. In fact, one can use ToTensor() alone.

2- Reading the image, before applying the transform:


ipdb>data.getextrema()
(-20.730741500854492, 10.131645202636719)

3- After using the above transform:

ipdb>data.min()
tensor(-84.9230)

ipdb>data.max()
tensor(38.5266)

Using the transform after removing Normalize:

ipdb> data.min()
tensor(-20.7307)

ipdb> data.max()
tensor(10.1316)
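As a quick sanity check on the numbers above, here is a minimal sketch in plain Python. It assumes Normalize() computes (value - mean) / std per element, and the `normalize` helper is a hypothetical stand-in written only for this illustration, not torchvision code. Applied to the raw extrema, it reproduces the min and max seen after the full transform, which suggests ToTensor() passed the F mode values through unscaled.

```python
def normalize(value, mean=0.5, std=0.25):
    # Per-element formula assumed for transforms.Normalize: (value - mean) / std
    return (value - mean) / std

# Extrema reported by data.getextrema() on the F mode image
raw_min = -20.730741500854492
raw_max = 10.131645202636719

lo = normalize(raw_min)
hi = normalize(raw_max)
print(round(lo, 4), round(hi, 4))  # compare with tensor(-84.9230) and tensor(38.5266)
```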

4- How to replicate the diagnostics? The best way is for someone to try one of the F mode PIL images I am using; 'test.tiff' is available at the link shown below. To read it:

from PIL import Image

x = Image.open('test.tiff')

x.getextrema()
(-20.730741500854492, 10.131645202636719)

The upload here does not accept '.tiff' files, so please find the image at this link.

5- I replicated the same diagnostics with a CIFAR-100 image (the mode is RGB). Here are the numbers.

Before applying the transform:


ipdb>data.getextrema()
((166, 225), (32, 236), (11, 254))

After applying the transform (repeat has been removed here):

ipdb> data.min()
tensor(-2.)

ipdb> data.max()
tensor(1.9843)

Using only ToTensor() transform, no Normalize():

ipdb> data.min()
tensor(0.)

ipdb> data.max()
tensor(0.9961)
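The RGB numbers can be checked the same way. This is a small sketch in plain Python, assuming ToTensor() divides 8-bit pixel values by 255 and Normalize() computes (value - mean) / std; both helper functions are hypothetical stand-ins for this illustration only.

```python
def to_unit_range(byte_value):
    # Assumed ToTensor() behaviour for 8-bit images: divide by 255
    return byte_value / 255

def normalize(value, mean=0.5, std=0.25):
    # Per-element formula assumed for transforms.Normalize
    return (value - mean) / std

top = to_unit_range(254)            # brightest pixel value in the RGB image
top_normalized = normalize(top)
bottom_normalized = normalize(to_unit_range(0))

print(round(top, 4))                # 0.9961
print(round(top_normalized, 4))     # 1.9843
print(bottom_normalized)            # -2.0
```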

NB. I think one can replicate the above diagnostics using only ToTensor().

Conclusion: ToTensor() scales an RGB image to [0, 1], but it leaves a three-channel PIL F mode (32-bit float) image as is. This is why, as intended, one can use Normalize() with a mean around 0.5 and std of 0.25 on RGB images, but this won't work for F mode PIL images.
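One possible workaround, sketched here under assumptions rather than as an established fix, is to min-max rescale the tensor to [0, 1] between ToTensor() and Normalize(). The helper name `rescale_to_unit_range` and the `eps` guard are hypothetical, and the small tensor below merely stands in for what ToTensor() returns on an F mode image.

```python
import torch

def rescale_to_unit_range(t: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Min-max scale a float tensor to [0, 1] (hypothetical helper, not part of torchvision)."""
    t_min, t_max = t.min(), t.max()
    # Guard against division by zero for constant images
    span = torch.clamp(t_max - t_min, min=eps)
    return (t - t_min) / span

# Stand-in for ToTensor() output on an F mode image: values keep their original range
x = torch.tensor([[[-20.7307, 0.0], [3.5, 10.1316]]])

y = rescale_to_unit_range(x)
normalized = (y - 0.5) / 0.25  # same arithmetic as Normalize with mean=0.5, std=0.25

print(y.min().item(), y.max().item())
print(normalized.min().item(), normalized.max().item())
```

In the Compose pipeline this could go in as transforms.Lambda(rescale_to_unit_range) right after ToTensor(). Note that scaling each image by its own extrema discards absolute intensity differences between images. Since Normalize() only subtracts the mean and divides by the std, another option might be to skip the rescaling and pass the dataset's own mean and std to Normalize().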