Calculate the mean and standard deviation for a multimodal dataset

Hello Everyone,

I am a bit confused about how to calculate the mean and std of a dataset. Could you please answer my questions?

  1. What is the correct way to calculate the mean and std of a dataset?
    a) Calculate the mean and std for the whole dataset before splitting it into train, eval and test sets?
    b) Split the dataset into train, eval and test sets first, and then calculate the mean and std individually for each split?
  2. What if we are using datasets of different modalities (e.g. RGB, infrared, depth maps)? How should the mean and std be calculated?
    a) An individual mean and std for each modality?
    b) Mix the image modalities and then calculate the mean and std for all the modalities together?

Thank you.

  1. You would usually calculate the stats from the training dataset only and reuse them to normalize the validation and test sets, since the validation and test sets are “unknown” and using their stats could be considered a data leak.

  2. It depends on the overall use case, i.e. are you able to use a different transformation for each modality during testing (is the modality a separate input you’ll get)? If so, you could calculate and apply separate stats for each input type. If you only get an image and don’t know its modality, I would use the mixed stats. However, the mixed stats might not standardize each modality properly (zero mean and unit variance) if the input modalities have different distributions.
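To illustrate point 1, here is a minimal NumPy sketch of "split first, then compute the stats from the training split only". The random dataset, the (N, H, W, C) layout, the 70/15/15 split and the `normalize` helper are all assumptions made up for this example, not something from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 100 RGB images of 8x8 pixels with values in [0, 1],
# stored as (N, H, W, C). Replace this with your real data.
images = rng.random((100, 8, 8, 3))

# Split first (assumed 70/15/15), then compute statistics from train only.
indices = rng.permutation(len(images))
train_idx, val_idx, test_idx = indices[:70], indices[70:85], indices[85:]
train, val, test = images[train_idx], images[val_idx], images[test_idx]

# Per-channel mean and std over all training pixels (axes N, H, W).
mean = train.mean(axis=(0, 1, 2))
std = train.std(axis=(0, 1, 2))


def normalize(batch, mean, std):
    """Standardize a batch with precomputed per-channel statistics."""
    return (batch - mean) / std


# The same training statistics are reused for every split.
train_n = normalize(train, mean, std)
val_n = normalize(val, mean, std)
test_n = normalize(test, mean, std)
```

The validation and test splits will not come out with exactly zero mean and unit variance, and that is expected: they are treated the same way unseen data would be at inference time.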
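For point 2, the difference between per-modality and mixed stats can be checked with a small NumPy sketch. The two modalities, their intensity distributions and the single-channel simplification are invented for illustration; real RGB, infrared or depth data will have different numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-channel training images for two modalities with
# deliberately different intensity distributions (made-up numbers).
train = {
    "rgb_gray": rng.normal(loc=0.45, scale=0.25, size=(200, 8, 8)),
    "infrared": rng.normal(loc=0.10, scale=0.05, size=(200, 8, 8)),
}

# Option a): one (mean, std) pair per modality.
per_modality = {name: (x.mean(), x.std()) for name, x in train.items()}

# Option b): a single (mean, std) pair over all modalities pooled together.
pooled = np.concatenate([x.ravel() for x in train.values()])
mixed = (pooled.mean(), pooled.std())

# Compare how well each option standardizes each modality.
for name, x in train.items():
    m, s = per_modality[name]
    own = (x - m) / s
    shared = (x - mixed[0]) / mixed[1]
    print(
        f"{name}: per-modality mean={own.mean():.2f} std={own.std():.2f} | "
        f"mixed mean={shared.mean():.2f} std={shared.std():.2f}"
    )
```

With per-modality stats each input type ends up close to zero mean and unit variance. With the mixed stats only the pooled data does, while each individual modality stays shifted and scaled away from that target.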