RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.94 GiB total capacity; 3.36 GiB already allocated; 13.06 MiB free; 78.58 MiB cached)

My PyTorch version

print(torch.__version__)
1.1.0

Do I need to downgrade the PyTorch version? If so, how?
Please guide me… Thanks

I’ve run into this error when I try to load a batch onto the GPU that is too large to fit. The solution in this case is to select a smaller batch size.
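
For example, a minimal sketch (the dataset here is a toy stand-in for your own data; only the batch_size argument matters):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset as a stand-in for your own data.
inputs = torch.randn(1000, 3, 64, 64)
targets = torch.randint(0, 10, (1000,))
dataset = TensorDataset(inputs, targets)

# If a batch of 64 runs out of GPU memory, halve the batch size until it fits.
train_loader = DataLoader(dataset, batch_size=16, shuffle=True)
```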

I’ve also had this issue after stopping code mid-execution; I think the GPU memory is not properly cleared in that case. The solution for me here has been to restart the kernel.
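
If restarting is inconvenient, it can sometimes be enough to drop the references that keep tensors alive and empty the cache. A sketch, assuming the leftover objects are a model and a batch from the interrupted run (created artificially here so the snippet runs on its own):

```python
import gc
import torch

# Stand-ins for objects an interrupted run might still hold on the GPU.
model = torch.nn.Linear(1024, 1024).cuda()
batch = torch.randn(256, 1024, device="cuda")

# Drop the Python references, collect garbage, then release cached blocks.
del model, batch
gc.collect()
torch.cuda.empty_cache()

print(torch.cuda.memory_allocated())  # should now be (close to) zero
```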

I’d start with checking these first. If neither of these solutions works for you, you’ll have to get some more detailed help.

Lastly, the following command displays the GPU resources and their utilization, refreshed every second, and can be useful for diagnosing issues:

watch -n 1 nvidia-smi

@DerekGloudemans Thanks for the support, let me check. Thanks… :blush:

This error usually occurs when you are running the model without clearing the previous gradients with model.zero_grad(); the gradients take up memory and pile up until they eat your GPU if not cleared.

Since the usual practice is to call model.zero_grad() during training, this should not normally happen while training, unless even the very first iteration of data is too costly in terms of GPU memory. In that case, try torch.cuda.empty_cache() to clear the cache, or detach tensors you no longer need with tensor.detach() so their computation graphs are not kept alive on your GPU.
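
For illustration, a minimal training-loop sketch (the model, data, and loss here are toy placeholders) showing where the gradient clearing and the detaching fit in:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

running_loss = 0.0
for step in range(100):
    x = torch.randn(32, 10, device=device)
    y = torch.randn(32, 1, device=device)

    model.zero_grad()                  # clear gradients from the previous step
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    # Store only the number: keeping `loss` itself (or any tensor attached to
    # the graph) alive across iterations keeps the whole graph in GPU memory.
    running_loss += loss.detach().item()
```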

On the testing or validation side, we don’t want to create and store gradients for any forward pass, so run your testing or validation code under the torch.no_grad() context manager.
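
For example, a minimal validation sketch (the model and data are placeholders):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 1).to(device)
val_inputs = torch.randn(64, 10, device=device)

model.eval()
with torch.no_grad():   # no graph is built, so the forward pass uses far less memory
    predictions = model(val_inputs)
print(predictions.shape)
```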

Hope this helps,
Thanks.:innocent: