Image normalisation after applying augmentation techniques

Yanhong_Zhao · January 31, 2020, 1:32pm

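```python
from torchvision import transforms

# train_mean, train_std: per-channel statistics of the training set (still to be computed)
data_transform = transforms.Compose([
    transforms.Resize((229, 229)),
    transforms.RandomResizedCrop((229, 229)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=train_mean, std=train_std),
])
```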

In order to use data transforms like this on my own training set, I first need to calculate the per-channel mean and standard deviation of the images in my training set, which are [87.92133030884222, 89.66749718221185, 81.1012737560522] and [53.17980884, 51.26849681, 51.69558279] respectively.
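Judging by their magnitude, the figures above appear to be on the raw 0–255 pixel scale (an assumption). transforms.ToTensor() converts a uint8 PIL image to floats in [0, 1] before Normalize runs, so the statistics passed to Normalize would need to be on that same scale. A minimal sketch of the rescaling; the names mean_255, std_255, mean_01 and std_01 are hypothetical:

```python
# Per-channel statistics quoted above, assumed to be on the 0-255 pixel scale
mean_255 = [87.92133030884222, 89.66749718221185, 81.1012737560522]
std_255 = [53.17980884, 51.26849681, 51.69558279]

# ToTensor() divides uint8 pixels by 255, so rescale the statistics the same way
mean_01 = [m / 255.0 for m in mean_255]
std_01 = [s / 255.0 for s in std_255]

# Normalizing one pixel value on either scale gives the same result,
# as long as the pixel and the statistics share that scale
pixel = 120.0
on_255 = (pixel - mean_255[0]) / std_255[0]
on_01 = (pixel / 255.0 - mean_01[0]) / std_01[0]

print(mean_01, std_01)
print(on_255, on_01)
```

With these numbers that gives roughly mean=[0.345, 0.352, 0.318] and std=[0.209, 0.201, 0.203] for use after ToTensor().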

Now I am wondering: random resized crop and random horizontal flip essentially generate more training examples in the DataLoader, so surely the mean and standard deviation of the images will change a little?

What is the workaround for this? Or do people tend to scale images upfront to [0, 1] for all three channels and then apply a generic mean and standard deviation such as [0.5, 0.5, 0.5] and [0.5, 0.5, 0.5]?

Thank you!

KFrank · January 31, 2020, 3:09pm

Hello Yanhong!

It really doesn’t matter. There is nothing magic about normalizing your
data to get exactly some mean and standard deviation – you just want
the general scale of your data to be sensible.

Unless there is something perniciously weird about your data, the
augmentation you are using will have very little effect on the statistics
of your data set. You’ll be fine calculating the mean and standard
deviation of your original training set and using those values to
normalize your augmented set.
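One way to see why the effect is small: a horizontal flip only rearranges pixels within each row, so it leaves the per-channel mean and standard deviation exactly unchanged, while a crop keeps a subset of the pixels and can shift them only slightly. A sketch on synthetic random images (not the poster's data); the fixed central crop is a hypothetical stand-in for one region RandomResizedCrop might choose:

```python
import torch

torch.manual_seed(0)
# Hypothetical stand-in for a training set: 64 RGB images, channel-first (N, C, H, W)
images = torch.rand(64, 3, 32, 32)

def channel_stats(t):
    # Mean and std over every dimension except the channel dimension (dim 1)
    return t.mean(dim=(0, 2, 3)), t.std(dim=(0, 2, 3))

orig_mean, orig_std = channel_stats(images)

# Horizontal flip: reverses the width dimension, same pixels per channel
flip_mean, flip_std = channel_stats(torch.flip(images, dims=[3]))

# Central 24x24 crop: a subset of the pixels, so the statistics may move a little
crop_mean, crop_std = channel_stats(images[:, :, 4:28, 4:28])

print(orig_mean, flip_mean, crop_mean)
print(orig_std, flip_std, crop_std)
```

Real photographs can have brighter or darker borders than centres, so crops may shift the statistics more than on random data, but typically still only modestly.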

Good luck!

K. Frank

Yanhong_Zhao · January 31, 2020, 4:19pm

Do you know a quick way of calculating the mean and standard deviation of the pixel values for each of the three channels of the training set?

KFrank · January 31, 2020, 4:56pm

Hi Yanhong!

If your training set is packaged as a giant tensor, train (of shape
(nSample, width, height, nChannel)), then you can use
torch.mean() and torch.std() with the dim argument, e.g.
torch.mean(train, dim=(0, 1, 2)). This will return a tensor
of shape (nChannel,) containing the means for each of the channels;
torch.std(train, dim=(0, 1, 2)) gives the standard deviations the same way.
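A sketch of the approach above, with two additions that are assumptions about typical setups rather than part of the original answer: raw pixels stored as uint8 must be converted to float first (torch.mean rejects integer tensors), and torchvision tensors are usually channel-first, where the reduction dims become (0, 2, 3). For data sets too large to hold in one tensor, the hypothetical helper streaming_channel_stats accumulates sums batch by batch from a DataLoader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical training set, channel-last as in the post: (nSample, width, height, nChannel)
train = torch.randint(0, 256, (100, 32, 32, 3), dtype=torch.uint8).float()

# Reduce over every dimension except the channel one -> shape (nChannel,)
mean = torch.mean(train, dim=(0, 1, 2))
std = torch.std(train, dim=(0, 1, 2))

# Channel-first layout (N, C, H, W), as ToTensor() produces: channel is dim 1
train_chw = train.permute(0, 3, 1, 2)
mean_chw = train_chw.mean(dim=(0, 2, 3))

def streaming_channel_stats(loader):
    """Per-channel mean and population std, accumulated batch by batch.
    Assumes the loader yields (images, labels) with images shaped (B, C, H, W)."""
    total = 0
    s = sq = None
    for images, _ in loader:
        images = images.double()  # float64 limits rounding error in the running sums
        if s is None:
            s = torch.zeros(images.shape[1], dtype=torch.float64)
            sq = torch.zeros_like(s)
        s += images.sum(dim=(0, 2, 3))
        sq += (images ** 2).sum(dim=(0, 2, 3))
        total += images.numel() // images.shape[1]  # pixels per channel in this batch
    m = s / total
    return m, (sq / total - m ** 2).sqrt()

labels = torch.zeros(100)
loader = DataLoader(TensorDataset(train_chw, labels), batch_size=16)
stream_mean, stream_std = streaming_channel_stats(loader)
print(mean, stream_mean)
```

torch.std applies Bessel's correction (N - 1) by default while the streaming version divides by N; with many pixels per channel the difference is negligible.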

Best.

K. Frank