CUDA out of memory. Tried to allocate 538.00 MiB (GPU 0; 11.00 GiB total capacity; 230.80 MiB already allocated; 8.53 GiB free; 242.00 MiB reserved in total by PyTorch)

Fewen · May 11, 2020, 8:47am

Hello everyone!
I tried to run the CamVid project from the FastAI course (https://course.fast.ai/videos/?lesson=3, https://nbviewer.jupyter.org/github/fastai/course-v3/blob/master/nbs/dl1/lesson3-camvid.ipynb) on my PC.
(specs: Windows 10 Education 64 bit, processor Intel® Core™ i7-7700K CPU @ 4.20GHz, 4200 MHz,
GPU: Nvidia GeForce GTX 1080 Ti 11GB )

It’s my first time running any kind of CNN/U-Net, and the error message doesn’t really make sense to me: it tries to allocate 538 MiB and fails, although 8.53 GiB is free.
Is it because PyTorch has only reserved 242 MiB?
I checked the other related forum topics, but all of them have “Tried to allocate X GiB and <X GiB is free”.
And reducing the batch size, as suggested in those topics, doesn’t solve the problem for me.

Has anyone had the same problem?

Best regards!

ptrblck · May 12, 2020, 6:08am

What was your initial batch size and what is the current one?
Also, did you make sure that your GPU is empty and no other processes use device memory?
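
A rough sketch of one way to check for other processes from Python, assuming `nvidia-smi` is on the PATH. The helper names `parse_compute_apps` and `list_gpu_processes` are made up for this example. On Windows the driver may report per-process memory as `[N/A]`, so the parser maps any non-numeric value to `None`.

```python
import shutil
import subprocess

# Ask nvidia-smi for the compute processes currently using the GPU.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, name, used_memory' lines into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        try:
            # Split pid from the left and memory from the right, so a
            # process name that contains commas stays in one piece.
            pid, rest = line.split(",", 1)
            name, mem = rest.rsplit(",", 1)
            pid = int(pid.strip())
        except ValueError:
            continue  # skip lines that do not match the expected shape
        mem = mem.strip()
        rows.append({
            "pid": pid,
            "name": name.strip(),
            # Memory may be reported as "[N/A]" (assumption: typical on
            # Windows); store None in that case.
            "used_mib": int(mem) if mem.isdigit() else None,
        })
    return rows


def list_gpu_processes():
    """Return the parsed process list, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in list_gpu_processes():
        print(proc)
```

If the list contains anything besides your own Python process, that process is holding device memory that PyTorch cannot use.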

Fewen · May 12, 2020, 11:20am

My initial batch size was 8, but I tried it with 4 and 2 and had the same issue with both.
Yes, before running the code I clear the GPU memory with torch.cuda.empty_cache().
Is it maybe because I’m running it on Windows?

ptrblck · May 12, 2020, 10:38pm

torch.cuda.empty_cache() will only clear the PyTorch memory cache on the device.
I meant you should check via nvidia-smi, if other processes are using the GPU.
Also, if a batch size of 1 doesn’t fit on the GPU, you might need to use torch.utils.checkpoint to trade compute for memory.
Although it would be surprising if FastAI lecture code needed a bigger GPU.
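
A minimal sketch of what using torch.utils.checkpoint could look like. The toy model `CheckpointedNet` is hypothetical and is not the U-Net from the FastAI notebook; it only shows the pattern. The `use_reentrant=False` argument assumes a reasonably recent PyTorch release. The example runs on the CPU so it works without a GPU.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedNet(nn.Module):
    """Toy conv net (hypothetical) with one checkpointed block."""

    def __init__(self, channels=16):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU()
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()
        )
        self.head = nn.Conv2d(channels, 2, 1)

    def forward(self, x):
        x = self.block1(x)
        # Activations inside block2 are not kept after the forward pass;
        # they are recomputed during backward, trading compute for memory.
        x = checkpoint(self.block2, x, use_reentrant=False)
        return self.head(x)


model = CheckpointedNet()
x = torch.randn(1, 3, 32, 32)  # batch size of 1
out = model(x)
out.mean().backward()  # block2 is re-run here to rebuild its activations
```

In a real model you would wrap the blocks with the largest intermediate activations. The backward pass gets slower because those blocks are executed twice.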

Fewen · May 18, 2020, 5:37pm

Thanks for the help!
I guess there was a problem with the combination of my Anaconda version, the FastAI library, and Windows.
I solved it by buying a new SSD and installing Ubuntu 20.04 on it, and it worked on the first try with a batch size of 4.