Standardize data on the fly?

I have a large dataset of 3D images that does not fit into memory. The channels of this dataset have very different scales and locations. E.g. there is one channel with values in [0,1] and another one with values in [900,1200]. I would like to standardize each channel to have mean 0 and variance 1.

I would prefer not to save an extra copy of the dataset, but to do the normalization on the fly. Something like batch normalization, but where the mean/std over all previously seen examples is used instead of that of the current batch. How can I do this?

We usually normalize general RGB images using the mean and std of the ImageNet data.
Similarly, precompute the mean and std of each channel across the whole dataset, just once. You don't need to save the normalized data, just the mean and std.
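A minimal sketch of that one-time pass, assuming each sample is a NumPy array of shape (C, D, H, W) and that samples can be loaded one at a time. The helper name channel_stats is hypothetical. It merges per-sample statistics, so the whole dataset never has to sit in memory, and it avoids the precision loss a plain sum of squares can suffer on a channel with values around 1000.

```python
import numpy as np


def channel_stats(samples):
    """Per-channel mean and std from one streaming pass.

    `samples` is any iterable of arrays shaped (C, D, H, W).
    Returns two arrays of shape (C,): mean and population std.
    """
    count = 0    # number of values seen so far, per channel
    mean = None  # running per-channel mean
    m2 = None    # running per-channel sum of squared deviations

    for x in samples:
        x = np.asarray(x, dtype=np.float64)
        flat = x.reshape(x.shape[0], -1)  # (C, D*H*W)
        n_b = flat.shape[1]
        mean_b = flat.mean(axis=1)
        m2_b = ((flat - mean_b[:, None]) ** 2).sum(axis=1)

        if count == 0:
            count, mean, m2 = n_b, mean_b, m2_b
            continue

        # Merge the statistics of this sample into the running ones.
        delta = mean_b - mean
        total = count + n_b
        mean = mean + delta * n_b / total
        m2 = m2 + m2_b + delta ** 2 * count * n_b / total
        count = total

    if count == 0:
        raise ValueError("no samples were provided")

    return mean, np.sqrt(m2 / count)
```

The two returned arrays are small enough to store with np.save and reload for every training run.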
Then, in your dataset’s __getitem__, subtract the precomputed mean from the current sample and divide by the precomputed std.
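Here is a rough sketch of such a __getitem__, written with NumPy only so it runs on its own. The class name StandardizedDataset, the eps guard against a zero std, and the (C, D, H, W) sample layout are assumptions for illustration. With PyTorch you would typically subclass torch.utils.data.Dataset and convert the result with torch.from_numpy.

```python
import numpy as np


class StandardizedDataset:
    """Wraps an indexable source of (C, D, H, W) arrays and
    standardizes each channel on the fly."""

    def __init__(self, source, mean, std, eps=1e-8):
        self.source = source
        # Reshape to (C, 1, 1, 1) so the statistics broadcast
        # over the three spatial dimensions.
        self.mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1, 1)
        self.std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1, 1)
        self.eps = eps  # guards against division by zero

    def __len__(self):
        return len(self.source)

    def __getitem__(self, idx):
        x = np.asarray(self.source[idx], dtype=np.float32)
        return (x - self.mean) / (self.std + self.eps)
```

Only the raw data is read from disk; the normalized copy exists just for the sample currently being returned.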

I hope this helps
