I’m training a video classification model with 8 classes. Each video contains 64 frames, and each frame is 600x600 pixels.
Since every video is quite large, I can only use a batch size of 16 on 8 V100 GPUs (each GPU gets 2 randomly chosen videos). As a result, the BatchNormalization layers compute their statistics over 2 videos rather than over the entire batch of 16, which gives me poor results.

Does anyone have an idea how to solve this?

Refer to the thread "Calling loss.backward() reduce memory usage?"

Freeing the graph of the last iteration will help you save GPU memory; this let me double my batch size.
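Roughly what I mean, as a minimal sketch assuming a standard PyTorch training loop. The function `train_epoch` and its arguments are hypothetical names for illustration, not code from the linked thread. The idea is to avoid keeping any reference to the previous iteration's `loss` or `outputs` tensors, so their graph can be released before the next forward pass allocates a new one.

```python
import torch
import torch.nn as nn


def train_epoch(model, batches, optimizer, criterion):
    """Run one pass over `batches` without holding on to the previous graph.

    All names here are illustrative; adapt them to your own loop.
    """
    running_loss = 0.0
    for inputs, targets in batches:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        # With the default retain_graph=False, backward() releases the
        # buffers saved for the backward pass once gradients are computed.
        loss.backward()
        optimizer.step()
        # .item() returns a plain Python float, so the running total does
        # not keep the loss tensor (and its graph) alive.
        running_loss += loss.item()
        # Drop the remaining references before the next forward pass.
        del outputs, loss
    return running_loss / len(batches)
```

How much memory this saves depends on the model and on what else in your loop still references tensors from the previous step, so treat it as something to try rather than a guaranteed doubling. Note that it only frees memory for a larger per-GPU batch; it does not by itself change how the BatchNormalization statistics are computed.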