Abnormal mean and standard deviation values for image dataset

I tried calculating the per-channel mean and standard deviation of an image dataset, but I am getting very high values. If I am not mistaken, the values should be between 0 and 1.

def mean_std(loader):
    mean = 0.0
    std = 0.0
    total_images_count = 0
    for images, _ in loader:
        image_count_in_a_batch = images.size(0)
        # flatten every image to (channels, height * width)
        images = images.view(image_count_in_a_batch, images.size(1), -1)
        # accumulate the per-image, per-channel mean and std over the batch
        mean += images.float().mean(2).sum(0)
        std += images.float().std(2).sum(0)
        total_images_count += image_count_in_a_batch
    # average over all images in the dataset
    mean /= total_images_count
    std /= total_images_count
    return mean, std

(tensor([112.1058, 126.2224,  79.7943]), tensor([49.5487, 50.0356, 43.0908]))

I can’t seem to find the mistake in this. Any help would be appreciated!

If your inputs are normalized to the [0, 1] range, then the mean and std would be expected to fall in that range, too.
However, based on your values, I would guess that your inputs are raw pixel values in [0, 255].
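
For example, a minimal sketch (assuming a torchvision ImageFolder dataset and the mean_std function above; the path is a placeholder): transforms.ToTensor() scales raw uint8 pixels from [0, 255] down to [0, 1], so the computed statistics land in that range as well.

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# ToTensor() converts uint8 images in [0, 255] to float tensors in [0, 1]
dataset = datasets.ImageFolder("path/to/train", transform=transforms.ToTensor())
loader = DataLoader(dataset, batch_size=64, num_workers=2)

mean, std = mean_std(loader)
print(mean, std)  # per-channel values, now between 0 and 1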

Since I am using the Albumentations library, I had to specify max_pixel_value in the Normalize transform. It's working fine now. Thank you!
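
For reference, a sketch of what that can look like (not the poster's exact code; the mean/std values are just the numbers above divided by 255): A.Normalize subtracts mean * max_pixel_value and divides by std * max_pixel_value, so the statistics are given in the [0, 1] scale while max_pixel_value describes the raw input range.

import albumentations as A

transform = A.Compose([
    A.Normalize(
        mean=(0.440, 0.495, 0.313),   # roughly the values above / 255
        std=(0.194, 0.196, 0.169),
        max_pixel_value=255.0,        # raw uint8 input range
    ),
])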

I have a question here.
When computing the mean and standard deviation of my dataset, should I use all the data, or only the training split?

The ideal protocol is to use only the training data to calculate the mean and std.

And then use this to normalize all the splits?

Yes, you are right. Use the mean and std from the training set to normalize the validation, test, and dev sets, if any.
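
A sketch of that workflow, assuming torchvision transforms (the stats and paths are placeholders): compute the statistics on the training set once, then reuse the same Normalize for every split.

from torchvision import datasets, transforms

# placeholder values: whatever mean_std() returned on the training set
train_mean, train_std = (0.440, 0.495, 0.313), (0.194, 0.196, 0.169)
normalize = transforms.Normalize(mean=train_mean, std=train_std)

train_tf = transforms.Compose([transforms.ToTensor(), normalize])
val_tf = transforms.Compose([transforms.ToTensor(), normalize])  # same stats as training

train_set = datasets.ImageFolder("path/to/train", transform=train_tf)
val_set = datasets.ImageFolder("path/to/val", transform=val_tf)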

Thank you for your explanation.