Posting the code is a bit complicated, as it is spread across different files and classes. But here is how to easily replicate the diagnostics.
1- The transform:
from torchvision import transforms

mu = (0.5,) * 3
std = (0.25,) * 3
image_transform = transforms.Compose([
    # transforms.Resize(input_size),  # we'll skip this for now
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1)),  # 1 channel -> 3 channels
    transforms.Normalize(mu, std),
])
Note: the image has only one channel, so I used x.repeat(3, 1, 1) in the transform, but that should not pose a problem. In fact, one can use only ToTensor().
2- Reading the image, before applying the transform:
ipdb>data.getextrema()
(-20.730741500854492, 10.131645202636719)
3- After using the above transform:
ipdb>data.min()
tensor(-84.9230)
ipdb>data.max()
tensor(38.5266)
Using the transform after removing Normalize:
ipdb> data.min()
tensor(-20.7307)
ipdb> data.max()
tensor(10.1316)
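The step 3 values are consistent with Normalize being applied directly to the raw, unscaled float values, since Normalize computes (x - mean) / std per channel. A quick arithmetic check in plain Python (a sketch only; normalize_value is a made-up helper for illustration, not a torchvision function):

```python
# Normalize computes (x - mean) / std for each channel.
# `normalize_value` is a hypothetical helper, not part of torchvision.
def normalize_value(x, mean=0.5, std=0.25):
    return (x - mean) / std

# Raw extrema reported by getextrema() in step 2.
raw_min, raw_max = -20.730741500854492, 10.131645202636719

norm_min = normalize_value(raw_min)
norm_max = normalize_value(raw_max)

print(round(norm_min, 4), round(norm_max, 4))  # -84.923 38.5266
```

These match the tensor(-84.9230) and tensor(38.5266) shown above, which suggests ToTensor() passed the values through without rescaling.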
4- How to replicate the diagnostics? The best way is for someone to try one of the F-mode PIL images I am using; 'test.tiff' is available at the link shown below. To read it:
x = Image.open('test.tiff')
x.getextrema()
(-20.730741500854492, 10.131645202636719)
The upload here does not accept '.tiff' files, so please find the image at this link.
5- I replicated the same diagnostics with a CIFAR-100 image (the mode is RGB). Here are the numbers.
Before applying the transform:
ipdb>data.getextrema()
((166, 225), (32, 236), (11, 254))
After applying the transform (repeat has been removed here):
ipdb> data.min()
tensor(-2.)
ipdb> data.max()
tensor(1.9843)
Using only the ToTensor() transform, no Normalize():
ipdb> data.min()
tensor(0.)
ipdb> data.max()
tensor(0.9961)
NB. I think one can replicate the above diagnostics using only ToTensor().
Conclusion: ToTensor() scales an RGB image to be within [0, 1], but it leaves a single-channel PIL F-mode (32-bit float) image as is. This is why, as intended, one can use Normalize() with a mean around 0.5 and a std around 0.25 for RGB images, but these values won't work for F-mode PIL images.
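One possible workaround, sketched below, is to rescale the float tensor to [0, 1] before Normalize(). This is an assumption about what might suit the data, not something torchvision does for you; rescale_to_unit_range is a hypothetical helper, and per-image min-max scaling discards absolute intensity differences between images, so computing a dataset-wide mean and std for Normalize() may be the better choice.

```python
import torch

def rescale_to_unit_range(x: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Hypothetical helper: linearly map a float tensor onto [0, 1]."""
    lo, hi = x.min(), x.max()
    # clamp_min guards against division by zero for constant images
    return (x - lo) / (hi - lo).clamp_min(eps)

# Toy tensor shaped like a ToTensor() output for an F-mode image: (1, H, W)
x = torch.tensor([[[-20.7307, 0.0], [5.0, 10.1316]]])

y = rescale_to_unit_range(x)   # now within [0, 1]
z = (y - 0.5) / 0.25           # same arithmetic as Normalize(mean=0.5, std=0.25)

print(y.min().item(), y.max().item())
print(z.min().item(), z.max().item())
```

In a pipeline, the helper could be placed right after ToTensor() as transforms.Lambda(rescale_to_unit_range), before the channel repeat and Normalize().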