# Is my calculation of normalisation correct?

Hi,
I am applying zero-mean, unit-standard-deviation normalisation to my train and dev sets, but it gives me worse metrics than training without normalisation.
Am I calculating the mean and std dev correctly?

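```python
import numpy as np
import torch
from tqdm import tqdm

train_mean = []
train_std = []

# the dataset returns (image, label) pairs, so unpack each batch
for i, (images, _) in enumerate(tqdm(train_loader)):
    numpy_image = images.numpy()
    # per-channel statistics over batch, height and width of an (N, C, H, W) batch
    batch_mean = np.mean(numpy_image, axis=(0, 2, 3))
    batch_std = np.std(numpy_image, axis=(0, 2, 3))
    train_mean.append(batch_mean)
    train_std.append(batch_std)

# average of the per-batch means and of the per-batch stds
train_mean = torch.tensor(np.mean(train_mean, axis=0))
train_std = torch.tensor(np.mean(train_std, axis=0))
```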

I apply the normalisation with the values obtained above in my dataset class:

```python
import torchvision.transforms.functional as TF

image = TF.normalize(image, [0.7259, 0.7320, 0.6922], [0.2350, 0.1569, 0.1024])  # noqa
```
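As a rough illustration of what the normalisation step does: `TF.normalize` computes `(x - mean) / std` separately for each channel. The sketch below reproduces only that arithmetic in NumPy on a synthetic (C, H, W) array with values in [0, 1]; `normalize_channels` is a hypothetical helper, not torchvision's implementation. Normalising with the data's own channel statistics should give mean 0 and std 1 per channel, which is what the computed mean/std values are meant to achieve.

```python
import numpy as np


def normalize_channels(image, mean, std):
    """Per-channel (x - mean) / std for a (C, H, W) float array (hypothetical helper)."""
    mean = np.asarray(mean, dtype=np.float64).reshape(-1, 1, 1)  # broadcast over H and W
    std = np.asarray(std, dtype=np.float64).reshape(-1, 1, 1)
    return (image - mean) / std


# Synthetic 3-channel image in [0, 1], standing in for the output of TF.to_tensor
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(3, 32, 32))

# Use the image's own per-channel statistics
ch_mean = image.mean(axis=(1, 2))
ch_std = image.std(axis=(1, 2))
out = normalize_channels(image, ch_mean, ch_std)

print(out.mean(axis=(1, 2)))  # close to 0 for every channel
print(out.std(axis=(1, 2)))   # close to 1 for every channel
```

If the mean/std passed in do not match the data (for example because they were computed on differently preprocessed images), the output is shifted and scaled away from 0 and 1 accordingly.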

Strictly speaking, you are not supposed to aggregate the standard deviation that way.
Is there a reason why the RGB channels have vastly different stds? This seems surprising, but of course, only you know your data.

Best regards

Thomas
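To see why averaging per-batch standard deviations does not give the dataset's standard deviation: for equal-size batches, the total variance equals the average within-batch variance plus the variance of the batch means, and averaging stds ignores the second term. A small NumPy sketch on made-up data (two one-channel "batches" with very different brightness, chosen only to exaggerate the effect):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic "batches" of one channel with very different brightness
batch_a = rng.normal(loc=0.2, scale=0.05, size=10_000)
batch_b = rng.normal(loc=0.8, scale=0.05, size=10_000)

# Averaging per-batch stds only captures the spread inside each batch
avg_of_stds = np.mean([batch_a.std(), batch_b.std()])

# The std over all pixels also includes the spread between the batch means
pooled_std = np.concatenate([batch_a, batch_b]).std()

# Equal-size batches: total variance = mean(within-batch var) + var(batch means)
within = np.mean([batch_a.var(), batch_b.var()])
between = np.var([batch_a.mean(), batch_b.mean()])
decomposed_std = np.sqrt(within + between)

print(f"average of batch stds: {avg_of_stds:.3f}")     # about 0.05
print(f"std over all pixels:   {pooled_std:.3f}")      # about 0.30
print(f"within + between:      {decomposed_std:.3f}")  # same as the line above
```

With real, shuffled image batches the batch means differ much less than here, so the error is smaller, but averaging stds never overestimates and in general underestimates the true std.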


The images are pretty similar in appearance. Is the difference in standard deviation that large?
Am I calculating the std dev correctly?

My bad, I was calculating the mean and std incorrectly.
These are the new values from the updated code. Are they correct, or are there still problems?

```
Mean: tensor([-1.7882, -3.5365, -5.7360])
Std Dev: tensor([1.0839, 1.0510, 1.1245])
```

```python
import torch
from tqdm import tqdm

var = 0.0
mean = 0.0
nimages = 0
for i, (images, _) in enumerate(tqdm(train_loader)):
    batch_samples = images.size(0)
    # flatten height and width: (N, C, H, W) -> (N, C, H*W)
    images = images.view(batch_samples, images.size(1), -1)
    nimages += images.size(0)
    # accumulate the per-image channel means and variances
    mean += images.mean(2).sum(0)
    var += images.var(2).sum(0)
mean /= nimages
var /= nimages
std = torch.sqrt(var)
```

Can you say something about the images and your pipeline to load them?
The usual approach is PIL + ToTensor, which gives you images with values in 0…1, and your statistics say you don't have that.
I mentioned it before, but strictly speaking you cannot aggregate the variance like that. Instead, accumulate the second moment, e.g. `M2 = (images ** 2).mean(dim=(0, 2, 3))` or so, and subtract `mean**2` at the end (see the Wikipedia article "Algorithms for calculating variance" for more sophisticated methods).

Best regards

Thomas
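A minimal sketch of the second-moment approach, written in NumPy so it does not depend on a particular loader. It accumulates per-channel sums of `x` and `x**2` plus a pixel count rather than per-batch means, so a smaller last batch is weighted correctly (averaging per-batch means is only exact when all batches have the same size). `channel_mean_std` is a hypothetical helper, and the `train_loader` usage in the docstring assumes batches of `(images, labels)` as in the code above. Note that `E[x^2] - E[x]^2` can lose precision in float32 on large datasets; the sketch accumulates in float64, and the Wikipedia article mentioned above describes more robust alternatives such as Welford's algorithm.

```python
import numpy as np


def channel_mean_std(batches):
    """Per-channel mean and std over every pixel of every batch.

    `batches` is any non-empty iterable of (N, C, H, W) arrays, for example
    `(images.numpy() for images, _ in train_loader)` (assumed loader layout).
    """
    total = 0.0     # running per-channel sum of x
    total_sq = 0.0  # running per-channel sum of x**2
    count = 0       # number of pixels seen per channel
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)  # accumulate in float64
        total = total + batch.sum(axis=(0, 2, 3))
        total_sq = total_sq + (batch ** 2).sum(axis=(0, 2, 3))
        count += batch.shape[0] * batch.shape[2] * batch.shape[3]
    mean = total / count
    m2 = total_sq / count    # E[x^2] per channel
    var = m2 - mean ** 2     # Var[x] = E[x^2] - E[x]^2
    return mean, np.sqrt(np.maximum(var, 0.0))  # clip tiny negative rounding errors


# Usage on synthetic data split into uneven batches (the last one has 4 images)
rng = np.random.default_rng(0)
data = rng.uniform(size=(64, 3, 16, 16))
batches = [data[i:i + 10] for i in range(0, 64, 10)]
mean, std = channel_mean_std(batches)
print(np.allclose(mean, data.mean(axis=(0, 2, 3))))  # True
print(np.allclose(std, data.std(axis=(0, 2, 3))))    # True
```

The result matches the statistics computed on the whole dataset at once, which is exactly what averaging per-image or per-batch variances does not guarantee.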


These are medical images.
Below is my dataset class:

```python
import glob
import os

import numpy as np
import torch
import torchvision.transforms.functional as TF
from PIL import Image


class MyDataset(torch.utils.data.Dataset):  # hypothetical class name; the post omits it
    def __init__(self, list_IDs, labels, root_dir, train_transform=False, valid_transform=False, test_transform=True):  # noqa
        """To instantiate labels and the data distribution."""
        self.labels = labels
        self.list_IDs = list_IDs
        self.dir = root_dir
        self.race_folder = glob.glob(os.path.join(root_dir, '*.JPG'))
        self.train_transform = train_transform
        self.valid_transform = valid_transform
        self.test_transform = test_transform

    # __len__, valid_transforms and test_transforms are not shown in the original post

    def train_transforms(self, image):
        """To train transformation."""
        image = TF.to_tensor(image)  # PIL image -> float tensor (C, H, W) in [0, 1]
        image = TF.normalize(image, [-1.7882, -3.5365, -5.7360], [1.0839, 1.0510, 1.1245])
        return image  # without this return the method returns None

    def __getitem__(self, index):
        """Generate one sample of data."""
        ID = self.list_IDs[index]
        image = Image.open(os.path.join(self.dir, ID))
        y = self.labels[ID]
        if self.train_transform:
            image = self.train_transforms(image)
        elif self.valid_transform:
            image = self.valid_transforms(image)
        else:
            image = self.test_transforms(image)
        img = np.array(image)
        return img, y
```

Is `Image` from PIL? What kind of transforms are you using?


Yes, I imported `Image` from `PIL`.
The transforms are just `to_tensor` followed by `normalize`; you can see them in the `train_transforms` function above.

But then something is fishy with the averages you get: `to_tensor` should give you tensors with values in 0…1, so you cannot possibly get negative means or std/var > 1.
Make extra sure that the statistics are computed on the same images (up to randomness) that your training sees.

Best regards

Thomas
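One way to spot such values by eye: for pixel values in [0, 1], every channel mean must lie in [0, 1], and the standard deviation can be at most 0.5, because a variable bounded in [a, b] has variance at most (b - a)^2 / 4 (Popoviciu's inequality). A minimal sketch of that check; `check_unit_range_stats` is a hypothetical helper, and the inputs are the two sets of values posted in this thread.

```python
import numpy as np


def check_unit_range_stats(mean, std):
    """Return a list of problems for per-channel stats of images scaled to [0, 1]."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    problems = []
    if np.any((mean < 0.0) | (mean > 1.0)):
        problems.append("mean outside [0, 1]")
    # data bounded in [0, 1] has variance <= 1/4, hence std <= 0.5
    if np.any((std < 0.0) | (std > 0.5)):
        problems.append("std outside [0, 0.5]")
    return problems


# Values from the second calculation: both checks fail
print(check_unit_range_stats([-1.7882, -3.5365, -5.7360], [1.0839, 1.0510, 1.1245]))
# Values from the final calculation: no problems reported
print(check_unit_range_stats([0.3057, 0.1771, 0.1048], [0.2547, 0.1649, 0.1152]))
```

Passing this check does not prove the statistics are right; it only rules out values that cannot come from `to_tensor` output.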


Yesss, you were right. I was calculating the normalisation values twice.
Below are my new mean and std dev values. Do you think these are correct? Also, how were you able to tell that the values were not correct just by looking at them?

```
Mean: tensor([0.3057, 0.1771, 0.1048])
Std Dev: tensor([0.2547, 0.1649, 0.1152])
```

Thanks.
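For what it's worth, the earlier negative values are consistent with the statistics having been computed on images that had already been normalised with the first set of values. Normalising maps each channel as x -> (x - old_mean) / old_std, so the mean becomes (new_mean - old_mean) / old_std and the std becomes new_std / old_std. A quick NumPy check using only the numbers posted in this thread:

```python
import numpy as np

# First set of values, used in TF.normalize
old_mean = np.array([0.7259, 0.7320, 0.6922])
old_std = np.array([0.2350, 0.1569, 0.1024])

# Final values computed on the plain to_tensor output
new_mean = np.array([0.3057, 0.1771, 0.1048])
new_std = np.array([0.2547, 0.1649, 0.1152])

# Statistics of images that were already normalised with old_mean / old_std
implied_mean = (new_mean - old_mean) / old_std
implied_std = new_std / old_std

print(np.round(implied_mean, 3))  # compare with tensor([-1.7882, -3.5365, -5.7360])
print(np.round(implied_std, 3))   # compare with tensor([1.0839, 1.0510, 1.1245])
```

Both agree with the negative values from the second calculation to within rounding of the posted numbers.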

I think the new values are plausible. Red is quite a bit stronger than green and blue, which would be unusual for "natural images", but of course medical images look different. Do the images look reddish when you open them in an image viewer?

Best regards

Thomas


Yes, those are fundus images, which are red in colour. I'll also fix the std dev calculation as you suggested above; I still don't quite understand it, but I'll figure it out.
Thanks.