At the moment I’ve been using a method that I found here: Computing the mean and std of dataset
import torch

# First pass: per-channel mean over every pixel in the dataset
mean = 0.0
for images, _ in loader:
    # Flatten each image to [B, C, H*W]
    images = images.view(images.size(0), images.size(1), -1)
    mean += images.mean(2).sum(0)
mean = mean / len(loader.dataset)

# Second pass: per-channel std over every pixel, using the mean above
std = 0.0
for images, _ in loader:
    images = images.view(images.size(0), images.size(1), -1)
    std += ((images - mean.unsqueeze(1)) ** 2).sum([0, 2])
std = torch.sqrt(std / (len(loader.dataset) * 224 * 224))  # all images are 224x224

print('\n' + ','.join(str(round(mean[c].item(), 4)) for c in range(3)) +
      ' ' + ','.join(str(round(std[c].item(), 4)) for c in range(3)))
I also found another, slightly different method here that's quicker than the one above, but it produces a different set of numbers for the dataset std: https://stackoverflow.com/questions/60101240/finding-mean-and-standard-deviation-across-image-channels-pytorch
nimages = 0
mean = 0.0
std = 0.0
for batch, _ in loader:
    # Rearrange batch to be the shape of [B, C, W * H]
    batch = batch.view(batch.size(0), batch.size(1), -1)
    # Update total number of images
    nimages += batch.size(0)
    # Compute mean and std here
    mean += batch.mean(2).sum(0)
    std += batch.std(2).sum(0)

# Final step
mean /= nimages
std /= nimages

print(mean)
print(std)
Which method is more accurate? Is there a better way to perform these calculations?
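For what it's worth, here is a single-pass sketch I put together that accumulates per-channel sums of x and x² and then uses Var[x] = E[x²] − E[x]². The function name `online_mean_std` is just my own; it assumes the loader yields `[B, C, H, W]` batches, and it computes the pooled per-pixel std (like the two-pass method above) rather than an average of per-image stds:

```python
import torch

def online_mean_std(loader):
    """Single-pass per-channel mean/std via running sums of x and x**2.

    Assumes every batch has shape [B, C, H, W]. The std returned is the
    population std over all pixels, equivalent to the two-pass result.
    """
    channel_sum = 0.0
    channel_sq_sum = 0.0
    npixels = 0
    for images, _ in loader:
        # Flatten each image to [B, C, H*W]
        images = images.view(images.size(0), images.size(1), -1)
        channel_sum += images.sum(dim=[0, 2])
        channel_sq_sum += (images ** 2).sum(dim=[0, 2])
        npixels += images.size(0) * images.size(2)
    mean = channel_sum / npixels
    # Var[x] = E[x^2] - E[x]^2
    std = torch.sqrt(channel_sq_sum / npixels - mean ** 2)
    return mean, std
```

This avoids the second pass over the data, though subtracting two large nearly-equal quantities can lose a little floating-point precision compared with the two-pass version.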