ToTensor and Normalize transforms

After converting a PIL or NumPy image using ToTensor, I usually see Normalize called. The arguments are usually tuples of 0.5 or the values calculated from ImageNet.

  • First question: If the input range is [0,1] then the output would be [-1,1], which I’ve read is good for training a network. So that must mean that ToTensor changes the range of the input from [0, 255] to [0,1]. Is that correct?

  • Second question is, why is [-1, 1] any better than [0, 1] when it comes to pixel value ranges?

  • Third question is when would I want to tweak the mean and standard deviation? For example, I am working on the style transfer exercise of Udacity’s intro to PyTorch class. I noticed that when they load a new image for style transfer, they use the mean and standard from ImageNet. Why was this choice made?

1 Like


About ToTensor you are right, it standardizes input. Please see this post.

About your questions:

  1. If you use mean=[0.5,...], std=[0.5,...] then a standardized input will be in [-1, 1] but this exactly true for ImanetNet mean and std. You can verify this by calculating (x - mean(x)) / std.

  2. This question is little bit tricky, but if I want to summarize my answer, it has almost similar effect when you do not standardize input. As you might know, when images are in range of [0-255] or in case of other data, when a feature has a bigger range than the other features, neural network tend to change based on that range so that particular feature will dominate other features. About [0, 1], still data is biased toward positive side of numbers. If you plot [-1, 1] data, all are gathered around mean like a circle which helps to have a better learning. Actually, it has more to do with statistics which you can find many good lectures about it.

  3. In summary, when your dataset is far from ImageNet, you need to define your own custom mean and std, for instance in medical image processing.
    The idea is that a subset of a set tends to have similar statistics to the original set and that is why sampling works. Now, if your dataset is similar to ImageNet, then it has similar mean and std to ImageNet because it is like you sampled ImageNet to get your own. In your example, probably, your style transfer network is working with images with similar structure to ImageNet.