Why do image datasets need normalizing with means and stds specified like in transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])?

Where did those numbers come from and why isn’t this done already in the datasets?



These stats are calculated from the ImageNet dataset, which was used to pretrain classification models in torchvision. You don’t need to use these stats and could also try to use the stats calculated from your custom dataset.
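For a custom dataset, the per-channel stats can be computed directly from the image tensors. A minimal sketch in plain PyTorch; the random tensor is a stand-in for real data (with a real dataset you would accumulate these statistics over a DataLoader):

```python
import torch

# Stand-in for a dataset: 100 RGB images of size 32x32 with values in [0, 1]
# (what ToTensor() would give you). For a real dataset you would accumulate
# sums over a DataLoader instead of holding everything in one tensor.
images = torch.rand(100, 3, 32, 32)

# Per-channel mean and std over all images and pixels -- exactly the values
# you would pass to transforms.Normalize(mean=..., std=...).
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))
print(mean.tolist(), std.tolist())
```

For uniform random data these come out near 0.5 and 0.29; for real images you get dataset-specific values like the ImageNet stats above.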


Thanks. But aren’t we only concerned with getting the values between 0 and 1?
To get the values between 0 and 1 we just need something like a min-max normalization right?
Why do we care what the mean and standard deviation are?

Even the dataset in torchvision.datasets.CIFAR10 apparently needs a similar step with
data_means = [0.4914, 0.4822, 0.4465]
data_stds = [0.2023, 0.1994, 0.2010]
torchvision.transforms.Normalize(mean = data_means, std = data_stds)


I’ll only reply to the first question: no, we are not concerned with getting values between 0 and 1.

Raw 0-255 pixel values can blow up the gradients, and the model won’t converge because it oscillates between large values.

We could imagine concatenating the red channel of all images into a single array [0, 1, 45, 250, ...].

By normalising we make sure that the new array has mean=0 and std=1, which is more or less to say that we centre it. If you plotted it now, the values in the red channel would be distributed around 0 (the average value is 0), and most values wouldn’t be too large.
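This is easy to check on the toy red-channel array above, standardising with the sample’s own mean and std:

```python
import torch

# The toy red-channel values from above, in the raw 0-255 range.
red = torch.tensor([0., 1., 45., 250.])

# Standardize: subtract the mean, divide by the std.
standardized = (red - red.mean()) / red.std()

# The result is centred: mean ~0, std ~1, no huge values remain.
print(standardized.mean().item(), standardized.std().item())
```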

I can’t quite justify why centring would matter rather than just .div(255), but it seems to be common practice.

I think what @ptrblck suggested is a good idea and common practice, i.e. to normalise with the stats from your dataset; I sometimes use .div(255) and haven’t had real issues.

Anyways, this is my interpretation, I am also learning.

You may benefit from his post.

Normalize will create outputs with zero mean and unit variance, as @Mah_Neh also explained. This normalization is also sometimes called standardization, z-scoring, or whitening.
In ML literature you can often read that whitening a signal is beneficial for the convergence with some theoretical analysis. E.g. I’m quite sure Bishop explains it also in his fantastic “Pattern Recognition and Machine Learning” book.
This approach is thus also commonly used for deep learning training. Normalizing the values to the range [0, 1] could also work, and you could test whether your use case sees a difference between [0, 1] values and standardized values.
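To make the two options concrete, here is what Normalize computes per channel versus plain [0, 1] scaling. A sketch using the ImageNet stats quoted earlier; the random image is a stand-in for a ToTensor() output:

```python
import torch

# ImageNet stats from the thread, reshaped to broadcast over (3, H, W).
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

img = torch.rand(3, 32, 32)        # stand-in for a ToTensor() image in [0, 1]

scaled = img                       # option 1: keep the [0, 1] values
standardized = (img - mean) / std  # option 2: what transforms.Normalize computes
```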


I’ve noticed different numbers for the CIFAR10 means and stds.

For example, in the PyTorch docs they use 0.5 everywhere!?


transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

I guess nobody calculated these stats for CIFAR10, or the calculated stats didn’t yield any significant improvement over the default 0.5.



We normalize training data the way we think is best for training. However, when a model
is used in production for inference, the real test data is not normalized!? Why
isn’t that a problem? Didn’t the training phase create a model that now expects a certain
input distribution?


We have to normalise before inference, as a pre-processing step.
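In other words, the production pipeline must reuse the training-time stats. A minimal sketch, using the CIFAR10 stats from earlier in the thread (`preprocess` is a hypothetical helper name, not a PyTorch API):

```python
import torch

# The same stats that were used during training (CIFAR10 values from above).
mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
std = torch.tensor([0.2023, 0.1994, 0.2010]).view(3, 1, 1)

def preprocess(raw_uint8):
    """Apply the training-time preprocessing to a raw (3, H, W) uint8 image."""
    x = raw_uint8.float() / 255.0  # same scaling ToTensor() applies
    return (x - mean) / std        # same Normalize as in training

x = preprocess(torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8))
```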


Thanks. I really appreciate it.