To use data transforms like this on my own training set, I first need to calculate the per-channel mean and standard deviation of my training images, which come out to [87.92133030884222, 89.66749718221185, 81.1012737560522] and [53.17980884, 51.26849681, 51.69558279] on the 0–255 pixel scale.

Now I am wondering: after applying a random resized crop and a random horizontal flip, which essentially generate new training examples in the data loader, surely the mean and standard deviation of the images will change a little bit?

What is the workaround for this? Or do people tend to normalize images upfront to [0, 1] for all three channels and then apply a generic mean and standard deviation like [0.5, 0.5, 0.5] and [0.5, 0.5, 0.5]?
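For concreteness, the "vanilla" normalization mentioned above can be sketched in plain PyTorch (the batch here is random fake data, just to show the arithmetic; in torchvision this is what ToTensor followed by Normalize(0.5, 0.5) does):

```python
import torch

# Fake image batch with pixel values in [0, 255], shape (N, C, H, W).
imgs = torch.randint(0, 256, (4, 3, 32, 32)).float()

# Scale to [0, 1], then apply the generic per-channel mean/std of 0.5,
# which maps every value into [-1, 1].
x = imgs / 255.0
x = (x - 0.5) / 0.5

print(x.min().item(), x.max().item())  # both within [-1, 1]
```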

It really doesn’t matter. There is nothing magic about normalizing your
data to get exactly some mean and standard deviation – you just want
to get the general scale of your data to be sensible.

Unless there is something perniciously weird about your data, the
augmentation you are using will have very little effect on the statistics
of your data set. You’ll be fine calculating the mean and standard
deviation of your original training set and using those values to
normalize your augmented set.
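You can convince yourself of this numerically. A horizontal flip only reorders pixels, so it leaves per-channel statistics exactly unchanged; here is a small sketch (with random stand-in data) demonstrating that:

```python
import torch

# Random stand-in for an image batch, shape (N, C, H, W).
imgs = torch.rand(10, 3, 16, 16)

# Horizontal flip: reverse the width dimension.
flipped = torch.flip(imgs, dims=[-1])

# Flipping permutes pixels within each image, so the per-channel
# mean (and std) of the data set is exactly the same.
print(torch.allclose(imgs.mean(dim=(0, 2, 3)),
                     flipped.mean(dim=(0, 2, 3))))  # True
```

Random crops do resample the pixel population, but unless your images have strong, consistent edge-versus-center differences, the effect on the overall statistics is tiny.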

If your training set is packaged as a single tensor, train, of shape (nSample, width, height, nChannel), then you can use torch.mean() and torch.std() with the dim argument: torch.mean(train, dim=(0, 1, 2)). This will return a tensor of shape (nChannel,) containing the mean for each channel.
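Putting that together as a runnable sketch (the tensor here is random stand-in data with the shape described above):

```python
import torch

# Stand-in for a training set packaged as one tensor of shape
# (nSample, width, height, nChannel), pixel values in [0, 255].
train = torch.rand(100, 32, 32, 3) * 255.0

# Reduce over the sample, width, and height dimensions, leaving one
# value per channel.
mean = torch.mean(train, dim=(0, 1, 2))  # shape (3,)
std = torch.std(train, dim=(0, 1, 2))    # shape (3,)

print(mean.shape, std.shape)  # torch.Size([3]) torch.Size([3])
```

These per-channel values are what you would then pass to your normalization step.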