BatchNorm2d behaviour with video inputs (5D)

When I use a 2D CNN for feature extraction from video in a CNN-RNN architecture, the 5D input tensor is reshaped into a 4D tensor of shape (batch_size * number_of_frames, C, H, W) so that it can pass through the CNN. The effective batch size therefore becomes batch_size * number_of_frames.
Does BatchNorm2d affect performance in this setup, or should I pass one frame at a time through the CNN?
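
To make the setup concrete, here is a minimal sketch of the reshape I mean, next to the per-frame alternative. The tensor sizes and the toy `cnn` are placeholders made up for illustration, not my real model; the 5D layout is assumed to be (batch_size, number_of_frames, C, H, W).

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
batch_size, num_frames, C, H, W = 2, 4, 3, 32, 32

# Toy 2D CNN containing a BatchNorm2d layer (stand-in for the real feature extractor).
cnn = nn.Sequential(
    nn.Conv2d(C, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

video = torch.randn(batch_size, num_frames, C, H, W)

# Option 1: fold time into the batch dimension, (B, T, C, H, W) -> (B*T, C, H, W).
frames = video.reshape(batch_size * num_frames, C, H, W)

# Eval mode: BatchNorm2d normalizes with its stored running statistics,
# so how the frames are grouped into batches should not change the output.
cnn.eval()
with torch.no_grad():
    feats_all = cnn(frames).reshape(batch_size, num_frames, -1)

    # Option 2: pass one time step at a time, each call sees a (B, C, H, W) batch.
    feats_per_frame = torch.stack(
        [cnn(video[:, t]) for t in range(num_frames)], dim=1
    )

same_in_eval = torch.allclose(feats_all, feats_per_frame, atol=1e-5)
print(feats_all.shape, feats_per_frame.shape, same_in_eval)
```

In training mode (`cnn.train()`), BatchNorm2d instead normalizes each channel with statistics computed over whatever batch it receives, so option 1 pools statistics over all batch_size * number_of_frames frames while option 2 uses only batch_size frames per call. That difference is what the question is about.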