Normalization in the MNIST example

So how should I know which mean and std to use to normalize my images? They are different for MNIST, CIFAR10, and ImageNet…

Is there any rule that I need to stick with?


Just calculate them on the whole dataset like @dlmacedo did.

The code is not widely applicable: if the training images are not all the same size and stored in an image format, you cannot use it to calculate the per-channel mean and std.
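
For images of different sizes, one possible workaround is to accumulate per-channel sums image by image instead of stacking everything into one array. The following is only a sketch under assumptions not stated in this thread: each image is already a NumPy array of shape (H, W, C), and `channel_stats` is a hypothetical helper name.

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and std over images of shape (H, W, C); H and W may differ."""
    n_pixels = 0
    total = None      # per-channel sum of values
    total_sq = None   # per-channel sum of squared values
    for img in images:
        arr = np.asarray(img, dtype=np.float64)
        pixels = arr.reshape(-1, arr.shape[-1])  # flatten to (H*W, C)
        if total is None:
            total = np.zeros(pixels.shape[1])
            total_sq = np.zeros(pixels.shape[1])
        total += pixels.sum(axis=0)
        total_sq += (pixels ** 2).sum(axis=0)
        n_pixels += pixels.shape[0]
    mean = total / n_pixels
    # Var[x] = E[x^2] - E[x]^2 (population variance, matching np.std's default)
    std = np.sqrt(np.maximum(total_sq / n_pixels - mean ** 2, 0.0))
    return mean, std

# Toy data: two RGB images of different sizes with values in 0..255
rng = np.random.default_rng(0)
images = [
    rng.integers(0, 256, size=(32, 32, 3)),
    rng.integers(0, 256, size=(48, 20, 3)),
]
mean, std = channel_stats(images)
print(mean / 255, std / 255)  # scaled to the 0..1 range
```

Every pixel carries equal weight here, so larger images contribute more to the statistics than smaller ones.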

I tried, using transforms.Lambda(), to normalize the data per pixel over the whole dataset.
For some reason it made the results worse, though I’d think it would be a better strategy.

I wonder about something. Let’s say the first layer is a Linear layer (fully connected).
What’s the point of removing the mean from the data when there is a bias term that is optimized? Wouldn’t training find the best offset anyway?

By normalizing the input, the SGD algorithm will work better. If the features are not on approximately the same scale, it will take longer to find the minimum.

@jdhao, I wasn’t talking about the scaling, I was talking about the bias term.
Moreover, in the case of images all pixels are within the same range, so arguments about normalizing features with different units don’t apply here.

To put my question differently: after this “centering”, is the bias of the first layer’s filters around 0?

Training is more stable and faster when parameters are small. In fact, none of these first-order optimization methods guarantees finding a minimum for an arbitrary network (they can’t even find it for some simple ones). Therefore, although scaling and offsetting the input is equivalent to scaling the weights and offsetting the bias of the first linear layer, normalization often gives better results in practice.
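
The equivalence mentioned above can be checked numerically. This is a minimal sketch, assuming a plain fully connected layer; the weights `W` and bias `b` are random stand-ins, not values from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(5, 4))   # 5 samples, 4 input features
W = rng.normal(size=(3, 4))              # hypothetical first-layer weights
b = rng.normal(size=3)                   # hypothetical first-layer bias
mu, sigma = x.mean(axis=0), x.std(axis=0)

# Layer applied to the normalized input
out_normalized = ((x - mu) / sigma) @ W.T + b

# Same layer with the normalization folded into weights and bias
W_folded = W / sigma                     # divide each input column by its std
b_folded = b - W_folded @ mu             # absorb the mean into the bias
out_folded = x @ W_folded.T + b_folded

print(np.allclose(out_normalized, out_folded))
```

The two outputs agree up to floating-point error, so the layer can represent the same function either way. What changes is the starting point and scale of the parameters the optimizer has to find.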

Moreover, you shouldn’t normalize using every pixel’s mean and std. Since conv is an operation on channels, you should just use each channel’s mean and std.


Do we need tensors to be in the range of [-1,1] or is [0,1] okay? I have my own dataset of RGB images with a range of [0,1]. I manually normalized the dataset but the tensors are still in the range of [0,1]. What is the benefit of transforming the range to [-1,1]?

@lkins, @smth
Why did you say [-1,1]? In the documentation, I only see [0,1]:

class torchvision.transforms.ToTensor

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
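
To see where each range comes from, here is a small NumPy sketch that mimics the two steps rather than calling torchvision itself. It assumes the documented ToTensor behaviour quoted above and that Normalize computes (x - mean) / std per channel; the tiny image is made up for illustration.

```python
import numpy as np

# Hypothetical 2x2 RGB image with uint8 values in [0, 255], layout H x W x C
img = np.array([[[0, 128, 255], [255, 0, 64]],
                [[32, 32, 32], [200, 100, 50]]], dtype=np.uint8)

# ToTensor-like step: H x W x C in [0, 255] -> C x H x W in [0.0, 1.0]
chw = img.transpose(2, 0, 1).astype(np.float32) / 255.0

# Normalize-like step with mean 0.5 and std 0.5 for every channel
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32).reshape(3, 1, 1)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32).reshape(3, 1, 1)
normalized = (chw - mean) / std

print(chw.min(), chw.max())                # range after the ToTensor-like step
print(normalized.min(), normalized.max())  # range after the Normalize-like step
```

With mean 0.5 and std 0.5, the value 0 maps to -1 and the value 1 maps to 1, which is where the [-1,1] range comes from. With dataset statistics instead, the output range is generally not exactly [-1,1].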

So if I do the normalization on each channel by myself, to convert [a,b] to [0,1], I don’t need transforms.ToTensor anymore, right?

But what if each channel of my data has a different range, such as x: -10 to 10, y: 1 to 100, z: 20 to 25 (they actually have some hidden correlation with each other)? How should I normalize them? It doesn’t seem to make sense to normalize them to the same range.
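
One common option in this situation is to standardize each channel with its own mean and std rather than forcing a shared range. The sketch below uses synthetic data drawn from the ranges in the question; the data and variable names are illustrative assumptions only, and whether this suits a particular dataset is a judgment call.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical samples with three channels in the ranges from the question
data = np.stack([
    rng.uniform(-10, 10, size=n),   # x
    rng.uniform(1, 100, size=n),    # y
    rng.uniform(20, 25, size=n),    # z
], axis=1)                           # shape (n, 3)

mean = data.mean(axis=0)            # one mean per channel
std = data.std(axis=0)              # one std per channel
standardized = (data - mean) / std  # broadcast over the sample axis

print(standardized.mean(axis=0).round(6))
print(standardized.std(axis=0).round(6))
```

Each channel only gets shifted and rescaled by a positive constant, so the correlation between channels is unchanged.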

So can the ImageNet parameters
mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
also be used to normalize the CIFAR10 dataset?

Can this normalization also be applied to a one-channel (grayscale) image?

I do not think so. For a grayscale image, maybe you can just use 0.5 for its mean and 0.5 for its std.

Is it necessary to normalize the data? I’m just curious about two cases:

  1. If you don’t normalize the data
  2. If you don’t know the mean and std and just use 0.5 for all values.

Can you please add these explanations to the tutorials, perhaps as a footnote? In their current form, it is intimidating to see constants popping up without a proper explanation. Great work BTW.

@smth Why should they be in [-1, 1] range? How does that help the network?

I get why the input has to be normalized, but if the values are between 0 and 1 isn’t that already considered normalized? Why -1 and 1?

I guess that depends on the activation function(s) used. If you are using Sigmoid, then you are better off with [0, 1] normalization; if you are using Tan-Sigmoid (tanh), then [-1, 1] normalization will do. On many occasions the normalization mainly affects the time your network needs to converge, as the synaptic weights will adapt to the situation over time.


To anybody looking for a more universal solution for custom datasets, this is what worked for me:

import numpy as np

# Note: data must be a numpy.ndarray with pixel values in the range 0..255
# example of data shape: (50000, 32, 32, 3). Channel is the last dimension
data = ...  # placeholder: assign your dataset array here
# find mean and std for each channel, then scale them to the range 0..1
mean = np.round(data.mean(axis=(0, 1, 2)) / 255, 4)
std = np.round(data.std(axis=(0, 1, 2)) / 255, 4)
print(f"mean: {mean}\nstd: {std}")

Thanks for the explanation