ImageNet Standard Normalization/Standardization

Can someone point out which reliable source the “mean = [0.485, 0.456, 0.406]” and “std = [0.229, 0.224, 0.225]” ImageNet normalization comes from?

From my research, whenever someone asks this, every article I have read points only to non-academic or unreliable sources such as blogs, forum discussions, or simply “convention”.

The reason I ask is that the TensorFlow/Keras application implementations (for example, EfficientNet) use [0.229, 0.224, 0.225] as the variance when standardizing images, whereas the official PyTorch documentation uses the exact same values as the standard deviation.

This matters because standardized = (data - mean) / std = (data - mean) / sqrt(variance), so the two interpretations yield different standardized data: the PyTorch version produces values in roughly (-2.1, 2.2), while the TensorFlow version produces values in roughly (-1.0, 1.1).
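To make the difference concrete, here is a minimal NumPy sketch (assuming input pixels are already scaled to [0, 1]) that applies both readings of [0.229, 0.224, 0.225] and prints the resulting per-channel extremes:

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
values = np.array([0.229, 0.224, 0.225])  # std (PyTorch reading) or variance (TF reading)

lo, hi = 0.0, 1.0  # pixel extremes after scaling to [0, 1]

# PyTorch convention: divide by the values as standard deviations
print("as std:     ", (lo - mean) / values, (hi - mean) / values)

# Variance reading: divide by sqrt(values), i.e. treat them as variances
print("as variance:", (lo - mean) / np.sqrt(values), (hi - mean) / np.sqrt(values))
```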

From what I have learned, standardizing data to approximately unit range seems to make more sense, but using [0.229, 0.224, 0.225] as the standard deviation appears to be the more common approach. Can anyone explain which is correct, and whether there is any paper or academic article on it? Thanks.

You could check the std and variance of the resulting tensor and make sure it has unit variance (=1), which is the purpose of this normalization/standardization step.
I’m not familiar with the TF implementation and don’t know what is used internally.
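For example, a quick sanity check on the PyTorch side, using torchvision’s Normalize (this is just a sketch with a random tensor standing in for a real image scaled to [0, 1]):

```python
import torch
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Stand-in for a real ImageNet image already scaled to [0, 1]
image = torch.rand(3, 224, 224)
out = normalize(image)

# Per-channel statistics of the result; averaged over real ImageNet images
# these should come out close to mean 0 and std 1
print(out.mean(dim=(1, 2)))
print(out.std(dim=(1, 2)))
```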
