CUDA out of memory exception

Hello,

I’m trying to get into PyTorch and image segmentation using the U-Net architecture. As input, I’m using 400 x 288 binarized pictures. First I tried to train the model on a GeForce GTX 1060 6GB, which runs just fine with a batch size of 1. When I increase the batch size (e.g. to 8), I get the following error:

CUDA out of memory. Tried to allocate 114.00 MiB (GPU 0; 6.00 GiB total capacity; 
4.71 GiB already allocated; 0 bytes free; 4.76 GiB reserved in total by PyTorch) 
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried several approaches (calling gc.collect() and torch.cuda.empty_cache(), see the snippet at the end of this post) to fix the error, but ultimately couldn’t solve the problem and gave up. I recently got access to an RTX 3080 10GB, which I want to use to train my model, but I get the same error:

CUDA out of memory. Tried to allocate 114.00 MiB (GPU 0; 10.00 GiB total capacity; 
7.54 GiB already allocated; 0 bytes free; 7.86 GiB reserved in total by PyTorch) 
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Normally I would have expected that having more GPU memory would solve the problem, but apparently that was a misjudgment. I’m running out of ideas on how to fix this issue. Maybe someone on this forum has an idea on how to get this model running, or on how to estimate its memory requirements?
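
For reference, the cleanup I was calling between runs looks roughly like this (simplified; the rest of the training loop is omitted):

```python
import gc
import torch

# What I tried after each failed run / between iterations:
gc.collect()                # drop Python references to dead tensors
torch.cuda.empty_cache()    # release unused cached memory held by PyTorch's allocator
```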

Best Regards

As you mention, you are running out of GPU memory in both cases. Perhaps experimenting with the PyTorch profiler at lower batch sizes, so that you don’t run out of memory, is a good place to start. That way you can get an idea of how much memory the model plus a given number of batches (1, 2, etc.) takes on the GPU, and perhaps even identify memory bottlenecks in your script.
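
As a rough starting point, a minimal sketch like the one below runs a few batches under torch.profiler with memory recording enabled and prints where the memory goes; model, loader, criterion, and optimizer are placeholders for your own U-Net, DataLoader, loss, and optimizer:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholders: substitute your own U-Net, DataLoader, loss, and optimizer.
model = model.cuda()
torch.cuda.reset_peak_memory_stats()

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    profile_memory=True,    # record tensor allocations and frees
    record_shapes=True,     # attribute memory to operator input shapes
) as prof:
    for step, (images, masks) in enumerate(loader):  # start with batch_size=1, then 2, ...
        images, masks = images.cuda(), masks.cuda()
        output = model(images)
        loss = criterion(output, masks)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if step == 2:  # a few iterations are enough for a memory profile
            break

# Operators sorted by how much CUDA memory they allocate themselves
print(prof.key_averages().table(sort_by="self_cuda_memory_usage", row_limit=10))

# Peak memory allocated by tensors during those iterations
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MiB")
```

Comparing the peak for batch sizes 1 and 2 gives you a rough per-sample cost that you can extrapolate to see whether a batch size of 8 can fit in 6 GB or 10 GB at all.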