CUDA out of memory during training

Hello,

I am pretty new to machine learning and I am facing an issue I cannot solve by myself.

I took this code implementing a U-Net model and modified it a little to fit my dataset: https://www.kaggle.com/hsankesara/unet-image-segmentation

I have 240 image/mask pairs, both 256x256 px in size.
My batch size is reduced to 8, but I get the error “CUDA out of memory. Tried to allocate 408.00 MiB (GPU 0; 5.00 GiB total capacity; 3.00 GiB already allocated; 236.37 MiB free; 3.46 GiB reserved in total by PyTorch)”.

The error happens at epoch 49/1000, in the middle of the forward function of the Unet class.

Following suggestions from topics about the same issue, I have tried to del some variables and call torch.cuda.empty_cache() after each iteration, and I have also tried to use torch.no_grad() to compute the validation loss.
Those did not help, but it is possible that I did not put them in the right place.

From the linked code, could you at least advise me on where I should empty the cache, delete variables, etc.?
Or do you have other hypotheses for solving this memory issue?

Thank you for the help!

Paule.

You don’t need to call torch.cuda.empty_cache(), as it will only slow down your code and will not avoid potential out of memory issues.
If PyTorch runs into an OOM, it will automatically clear the cache and retry the allocation for you.

That being said, you shouldn’t accumulate the batch_loss into total_loss directly, since batch_loss is still attached to the computation graph, which will be stored as well.
Use total_loss += batch_loss.detach() instead and rerun the code.
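
Something along these lines, as a minimal sketch (model, criterion, optimizer, and train_loader are placeholders for the objects in your script):

```python
total_loss = 0.0
for data, target in train_loader:
    optimizer.zero_grad()
    output = model(data)
    batch_loss = criterion(output, target)
    batch_loss.backward()
    optimizer.step()
    # detach() so the running sum no longer holds a reference to the graph
    total_loss += batch_loss.detach()
```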

Also, you are right in wrapping the validation loop into a with torch.no_grad() block, as this will avoid storing the intermediate tensors, which would be needed for the backward pass.
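
For completeness, the validation part could look roughly like this (same placeholder names, with a val_loader assumed):

```python
model.eval()
val_loss = 0.0
with torch.no_grad():  # no intermediate activations are stored for backward
    for data, target in val_loader:
        output = model(data)
        val_loss += criterion(output, target).item()
val_loss /= len(val_loader)
model.train()
```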

If that doesn’t help, you might need to further decrease the batch size or use torch.utils.checkpoint to trade compute for memory.
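
As a rough illustration of the checkpointing idea (encoder and decoder are placeholders for expensive sub-blocks of your model, not the actual modules from the Kaggle kernel):

```python
import torch
from torch.utils.checkpoint import checkpoint

class CheckpointedUNet(torch.nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        x = self.encoder(x)
        # activations inside `decoder` are recomputed during backward instead of
        # being stored; its input already requires grad here, which checkpoint expects
        return checkpoint(self.decoder, x)
```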


Hey,
Thank you for the advice!
I have tried reducing the batch size to 4, using checkpoints, detaching the batch loss, and I have also tried torch.cuda.amp.GradScaler().
But nothing changed; the error still happened at the same place :( which I found quite weird.
I will try to investigate it further, maybe by reducing the image size (but to me that is more about putting a plaster on the problem than actually solving it).
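
In case it helps to see it, this is roughly how I plugged in the GradScaler (simplified sketch; model, optimizer, criterion, and train_loader stand in for my actual objects):

```python
scaler = torch.cuda.amp.GradScaler()
total_loss = 0.0
for data, target in train_loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # run the forward pass in mixed precision
        output = model(data)
        batch_loss = criterion(output, target)
    scaler.scale(batch_loss).backward()
    scaler.step(optimizer)
    scaler.update()
    total_loss += batch_loss.detach()
```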


Update:
The problem actually came from the validation calculation. I compute the validation loss every 50 epochs, and my validation dataset of 72 images is too large to process in a single pass. That is why I get the “CUDA out of memory” error while finishing epoch 49 and starting epoch 50.
I will change the code accordingly and hopefully it will work!
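
Concretely, the plan is to push the validation set through the model in small batches instead of all 72 images at once, something like this sketch (val_dataset and criterion are placeholders for my objects):

```python
from torch.utils.data import DataLoader

val_loader = DataLoader(val_dataset, batch_size=4, shuffle=False)

model.eval()
val_loss = 0.0
with torch.no_grad():
    for data, target in val_loader:
        output = model(data.cuda())
        val_loss += criterion(output, target.cuda()).item()
val_loss /= len(val_loader)
model.train()
```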
