I see plenty of guides on how to calculate std and mean for an image dataset, but how would you do so for videos in the standard format (B x L x C x W x H)?
Do you mean to ask what works best practically, or do you wanna know how to calculate mean, std for an n-dim tensor in PyTorch?
I guess it’s the former because the latter is too easy. I’ve never worked with a video dataset so I don’t know what works best practically. But usually with any kind of dataset, you should start with the easiest normalization and then make it more nuanced with experimentation.
So I’d have tried normalisations in the following order:
- Simply divide each pixel by 255 to get values between 0 and 1. This is what you do with images as well. It’s a standard practice so I guess you’re doing it already.
- Find the mean, std of the whole dataset. So, basically mean, std across all pixels of all frames of all videos i.e. across all B, L, C, W, H dims. Standardise all pixels using this mean and std.
- Calculate mean, std for each video separately. So mean, std across the L, C, W, H dims, keeping B.
- Mean across each frame separately i.e. across the C, W, H dims, keeping B and L.
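In case it helps, here is a rough sketch of the four options on a single batch. The random `videos` tensor and its shape are made up for illustration, the `eps` term is my own addition to avoid dividing by zero, and the reduction dims for the per-video and per-frame cases are my reading of the list above, so adjust as needed.

```python
import torch

# Hypothetical batch of uint8 videos shaped (B, L, C, W, H), as in the question.
torch.manual_seed(0)
videos = torch.randint(0, 256, (2, 4, 3, 8, 8), dtype=torch.uint8)

# Option 1: scale pixel values to [0, 1].
x = videos.float() / 255.0

eps = 1e-8  # assumption: small constant guarding against a zero std

# Option 2: one mean/std over every dim (B, L, C, W, H).
global_norm = (x - x.mean()) / (x.std() + eps)

# Option 3: per-video statistics, reducing over L, C, W, H and keeping B.
video_dims = (1, 2, 3, 4)
video_mean = x.mean(dim=video_dims, keepdim=True)  # shape (B, 1, 1, 1, 1)
video_std = x.std(dim=video_dims, keepdim=True)
per_video = (x - video_mean) / (video_std + eps)

# Option 4: per-frame mean, reducing over C, W, H and keeping B and L.
frame_dims = (2, 3, 4)
frame_mean = x.mean(dim=frame_dims, keepdim=True)  # shape (B, L, 1, 1, 1)
per_frame = x - frame_mean
```

`keepdim=True` is what lets the statistics broadcast back against the original tensor. Note that here option 2 only sees one batch; for real dataset-wide statistics you would compute the mean and std once over the whole training set and reuse them.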
I’d expect approach 3 (per-video statistics) to work best, but that’s just a guess.
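For the whole-dataset option, a video dataset usually won't fit in memory, so one way to get the statistics is to accumulate sums batch by batch. This is only a sketch: `dataset_mean_std` is a hypothetical helper, it assumes each item of the iterable is a plain tensor shaped (B, L, C, W, H) that is already scaled to [0, 1], and it returns the population std rather than the sample std.

```python
import torch

def dataset_mean_std(batches):
    """Accumulate one scalar mean/std over an iterable of (B, L, C, W, H) batches."""
    count = 0
    total = torch.zeros((), dtype=torch.float64)
    total_sq = torch.zeros((), dtype=torch.float64)
    for batch in batches:
        b = batch.double()  # float64 keeps the running sums accurate
        count += b.numel()
        total += b.sum()
        total_sq += (b * b).sum()
    mean = total / count
    # Population variance E[x^2] - E[x]^2, clamped in case rounding makes it negative.
    var = (total_sq / count - mean ** 2).clamp(min=0.0)
    return mean.item(), var.sqrt().item()

# Usage with a few made-up batches standing in for a real data loader.
torch.manual_seed(0)
batches = [torch.rand(2, 4, 3, 8, 8) for _ in range(3)]
mean, std = dataset_mean_std(batches)
```

If your loader yields `(video, label)` tuples, pass only the video tensors to the helper.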