# [Transfer Learning] What is the purpose of normalization?

Hi Everybody,

I have been discussing dataset normalization in transfer learning intensively with my colleagues.
In particular, we understand why z-score normalization is performed over the dataset when a network is trained from scratch.
However, we are not sure about the utility of the normalization in transfer learning.

Let’s suppose that we want to apply transfer learning to a network that was previously trained on a scalar (1-D) dataset. This dataset, named dataset0, has mean u0 and standard deviation std0. The network was trained from scratch on this dataset after z-score normalization with u0 and std0. That is to say: u0 was subtracted from each data point, and the result was divided by std0.

When we want to apply transfer learning to the pretrained network, the common data transformation routine that precedes training on the new dataset, named dataset1, is the following:

1. Various data augmentation techniques applied to the dataset
2. Conversion to a tensor, which also rescales the data values to the range [0, 1]
3. Normalization with the mean u0 and the standard deviation std0 of dataset0, the dataset used to train the pretrained model

The question is: “Why are u0 and std0 used in step 3?”
We might think that the mean and standard deviation of dataset1 should be used instead. If u0 and std0 are used on dataset1, the resulting normalized dataset will not necessarily have zero mean or unit standard deviation.
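A small numeric check of this point, using only the Python standard library. The two datasets are toy values invented for illustration, and `zscore` is a hypothetical helper, not a library function.

```python
from statistics import fmean, pstdev

def zscore(values, mean, std):
    """Subtract mean from every value, then divide by std."""
    return [(v - mean) / std for v in values]

# Toy scalar datasets (made-up values).
dataset0 = [0.0, 0.2, 0.4, 0.6, 0.8]
dataset1 = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

u0, std0 = fmean(dataset0), pstdev(dataset0)
u1, std1 = fmean(dataset1), pstdev(dataset1)

with_stats0 = zscore(dataset1, u0, std0)  # dataset1 normalized with dataset0 statistics
with_stats1 = zscore(dataset1, u1, std1)  # dataset1 normalized with its own statistics

# With dataset0 statistics: mean is not 0 and std is not 1.
print(fmean(with_stats0), pstdev(with_stats0))
# With its own statistics: mean is ~0 and std is ~1.
print(fmean(with_stats1), pstdev(with_stats1))
```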

Our hypothesis about this approach is that using u0 and std0 keeps the two datasets statistically consistent. In particular, a “1” in dataset0 is mapped to “(1-u0)/std0” by the normalization in the training-from-scratch phase. A “1” in dataset1 is then mapped to the same value “(1-u0)/std0” by the normalization in the transfer learning phase, provided that u0 and std0 are used to normalize dataset1. If u1 and std1 were used as normalization parameters instead, two equal values in dataset1 and dataset0 would be mapped to two different values after the normalization.
This explanation is what we think lies behind the torchvision.transforms.Normalize(u0,std0) in transfer learning.
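To make the hypothesis concrete, here is a minimal sketch in plain Python. The statistics are invented numbers and `zscore` is a hypothetical helper; the only point is that the same raw value reaches the network as the same input only when the same parameters are used.

```python
def zscore(x, mean, std):
    """Map a raw value to its normalized value."""
    return (x - mean) / std

# Hypothetical statistics (illustrative values only).
u0, std0 = 0.4, 0.25  # dataset0, used when training from scratch
u1, std1 = 0.7, 0.10  # dataset1, used for transfer learning

x = 1.0  # the same raw value appearing in both datasets

seen_in_pretraining = zscore(x, u0, std0)  # what the network saw for "1" (about 2.4)
finetune_with_stats0 = zscore(x, u0, std0)  # identical input for "1" in dataset1
finetune_with_stats1 = zscore(x, u1, std1)  # a different input for the same "1" (about 3.0)

print(seen_in_pretraining, finetune_with_stats0, finetune_with_stats1)
```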