CUDA out of memory when training ResNet-50 in Colab

Hi all, when I trained ResNet-50 with cross-entropy loss in Google Colab, I got this error:

RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 14.75 GiB total capacity; 13.25 GiB already allocated; 126.81 MiB free; 13.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How can I train ResNet-50 on the GPU with PyTorch without running out of memory?
Any insight is much appreciated.

You might need to reduce the training batch size, use mixed-precision training, apply gradient checkpointing, etc. to lower the memory requirements of your training.

Hi @ptrblck, my batch size is 128 and I already use checkpointing, but I don't know about mixed-precision training. Could you please explain more?

Take a look at the automatic mixed precision (AMP) docs and examples to learn more about how to use torch.amp.
