BatchNorm causes "CUDA out of memory"

Hi. My code works well with a small batch size on a single GPU. However, when training with a larger batch size on multiple GPUs, an error is raised in the BN layers, e.g., “CUDA out of memory. Tried to allocate 418.00 MiB (GPU 0; 11.91 GiB total capacity; 11.13 GiB already allocated; 194.56 MiB free; 11.18 GiB reserved in total by PyTorch)”.

Besides, for several reasons, neither a smaller batch size nor other normalization methods (e.g., LayerNorm) are suitable for my use case.

Is there any advice on how to fix this error? Thank you in advance.



Can you give more details about how you are training on multiple GPUs?

For training on multiple GPUs, one option is nn.DataParallel, where each batch of input data is split across the GPUs; after the forward pass, the outputs are gathered and the gradients are reduced on a single GPU (GPU 0 by default), which is often why that GPU runs out of memory first. You can refer to this tutorial link.
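A minimal sketch of that setup (the model and input shapes here are just placeholders, not your actual code): wrap the model in nn.DataParallel so each forward pass splits the batch along dimension 0 across the available GPUs. Note that each replica's BatchNorm then normalizes only over its own sub-batch.

```python
import torch
import torch.nn as nn

# Placeholder model containing a BN layer; substitute your own module here.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),  # each GPU replica normalizes its own sub-batch
    nn.ReLU(),
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# DataParallel replicates the model on every visible GPU; outputs are
# gathered back on the default device (GPU 0), which also holds the
# replicas' gradients after backward. On a CPU-only machine it simply
# runs the wrapped module unchanged.
model = nn.DataParallel(model).to(device)

x = torch.randn(8, 3, 32, 32, device=device)  # batch dim 0 is split across GPUs
out = model(x)
print(out.shape)  # torch.Size([8, 16, 32, 32])
```

Because the gather step concentrates activations on GPU 0, that device needs more memory headroom than the others, which matches the OOM on GPU 0 in your error message.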
