At the moment I’ve been using a method that I found here: Computing the mean and std of dataset
import torch

# First pass: per-channel mean over every pixel in the dataset
mean = 0.0
for images, _ in loader:
    # Flatten each image to [B, C, H*W]
    images = images.view(images.size(0), images.size(1), -1)
    mean += images.mean(2).sum(0)
mean = mean / len(loader.dataset)

# Second pass: per-channel std over every pixel, using the mean above
std = 0.0
for images, _ in loader:
    images = images.view(images.size(0), images.size(1), -1)
    std += ((images - mean.unsqueeze(1)) ** 2).sum([0, 2])
std = torch.sqrt(std / (len(loader.dataset) * 224 * 224))  # all images are 224x224

print('\n' + ','.join(str(round(mean[c].item(), 4)) for c in range(3)) +
      ' ' + ','.join(str(round(std[c].item(), 4)) for c in range(3)))
I also found another, slightly different method here that's quicker than the one above, but it produces a different set of numbers for the dataset std: https://stackoverflow.com/questions/60101240/finding-mean-and-standard-deviation-across-image-channels-pytorch
nimages = 0
mean = 0.0
std = 0.0
for batch, _ in loader:
    # Rearrange batch to be the shape of [B, C, W * H]
    batch = batch.view(batch.size(0), batch.size(1), -1)
    # Update total number of images
    nimages += batch.size(0)
    # Compute mean and std here
    mean += batch.mean(2).sum(0)
    std += batch.std(2).sum(0)

# Final step
mean /= nimages
std /= nimages

print(mean)
print(std)
Which method is more accurate? Is there a better way to perform these calculations?
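For what it's worth, here is a single-pass sketch I put together that accumulates per-channel sums of x and x² and then uses Var[x] = E[x²] − E[x]². The function name `online_mean_std` is just my own; it assumes the loader yields `[B, C, H, W]` batches, and it computes the pooled per-pixel std (like the two-pass method above) rather than an average of per-image stds:

```python
import torch

def online_mean_std(loader):
    """Single-pass per-channel mean/std via running sums of x and x**2.

    Assumes every batch has shape [B, C, H, W]. The std returned is the
    population std over all pixels, equivalent to the two-pass result.
    """
    channel_sum = 0.0
    channel_sq_sum = 0.0
    npixels = 0
    for images, _ in loader:
        # Flatten each image to [B, C, H*W]
        images = images.view(images.size(0), images.size(1), -1)
        channel_sum += images.sum(dim=[0, 2])
        channel_sq_sum += (images ** 2).sum(dim=[0, 2])
        npixels += images.size(0) * images.size(2)
    mean = channel_sum / npixels
    # Var[x] = E[x^2] - E[x]^2
    std = torch.sqrt(channel_sq_sum / npixels - mean ** 2)
    return mean, std
```

This avoids the second pass over the data, though subtracting two large nearly-equal quantities can lose a little floating-point precision compared with the two-pass version.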