I am using the Mini-Kinetics-200 dataset, so all of the data come from the same distribution. I apply the same rescaling to every frame of every video, resizing each one to 224 x 224. The DenseNet-121 model from torchvision contains batch normalization layers. I use a batch size of 1 to simulate online training.
Thanks.