GPU full and code not running

Andre_Amaral_IST · May 4, 2022, 7:11pm

Hey!

I am trying to free the GPU memory, since I am not running any code and it is full.
I also closed Visual Studio Code and re-opened it, but nothing changed.
I also called torch.cuda.empty_cache(), but the memory is not released.

Can anyone help me?
Regards!
André

Andrei_Cristea · May 4, 2022, 7:22pm

Could you attach the output of nvidia-smi? Someone here will likely be able to help you, but will need that as a starting point.

Andrei_Cristea · May 4, 2022, 8:22pm

I’m really not versed in reading nvidia-smi output but it looks to me like you have a bunch of python processes running that are using up GPU memory (bottom section of your image). Could you kill those and check if that helps release your memory? e.g. run $ kill 111715 in your shell, where 111715 is the PID of the first process that shows up in nvidia-smi.
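If reading the nvidia-smi table by eye is awkward, a rough sketch like the one below lists the compute processes and the memory each one holds. It assumes nvidia-smi is on the PATH and supports the --query-compute-apps option with the pid, process_name and used_memory fields. The helper names are made up for illustration, and every sample value except PID 111715 is hypothetical.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' CSV rows into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # not a data row
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        try:
            used_mib = int(mem.strip())
        except ValueError:
            used_mib = None  # e.g. the driver did not report a number
        try:
            rows.append({"pid": int(pid.strip()), "name": name.strip(), "used_mib": used_mib})
        except ValueError:
            continue  # pid was not numeric, skip the row
    return rows


def list_gpu_processes():
    """Return the compute processes reported by nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in list_gpu_processes():
        print(proc["pid"], proc["name"], proc["used_mib"], "MiB")
```

Any PID printed here that belongs to a stale Python process of yours can then be stopped with kill, as described above.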

Andre_Amaral_IST · May 4, 2022, 8:27pm

Thank you! It really helped, and now my CUDA memory is empty, as I wanted! Thank you very much for your help.

Regards
André Amaral

Andre_Amaral_IST · May 5, 2022, 4:36pm

Hey,
Sorry to re-open this topic, but now I have no process running and my CUDA memory is full.

I did not change the code I was running, and now it complains about the CPU too.

Andrei_Cristea · May 5, 2022, 5:24pm

Hi Andre -

From this section of nvidia-smi, the GPUs seem to be far from full:

From your error message…

…it seems like you’re trying to load a dataset of about 11.7 GB into memory, and each of your GPUs has only 6 GB available (and maybe your CPU also doesn’t have enough memory for this). Perhaps you could somehow split up the dataset and send half of it to one GPU and the other half to another GPU (I’ve never done that before so I’m not sure how that works).

However, just wanted to say that it is atypical to load the entire dataset into memory. What’s more typical is to use dynamic / lazy loading, using Datasets and DataLoaders like here.
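To make the lazy-loading idea concrete, here is a minimal sketch. It assumes each sample is stored as its own .npy file, which may not match how your data is laid out. LazyNpyDataset, make_demo_files and batch_shapes are hypothetical names, and the tiny demo files only exist so the example runs on its own.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class LazyNpyDataset(Dataset):
    """Hypothetical dataset: one sample per .npy file, read only when indexed."""

    def __init__(self, paths):
        self.paths = list(paths)  # only the file paths are kept in memory

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Only this single sample is read from disk here.
        return torch.from_numpy(np.load(self.paths[idx]))


def make_demo_files(directory, n_samples=8, n_features=4):
    """Write small dummy samples to disk so the sketch is self-contained."""
    paths = []
    for i in range(n_samples):
        path = os.path.join(directory, f"sample_{i}.npy")
        np.save(path, np.full(n_features, i, dtype=np.float32))
        paths.append(path)
    return paths


def batch_shapes(paths, batch_size=3):
    """Iterate once over the loader and collect the shape of each batch."""
    loader = DataLoader(LazyNpyDataset(paths), batch_size=batch_size, shuffle=False)
    return [tuple(batch.shape) for batch in loader]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(batch_shapes(make_demo_files(tmp)))
```

With this pattern only one batch at a time needs to fit in memory, and each batch can be moved to the GPU inside the training loop.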

Andre_Amaral_IST · May 5, 2022, 5:30pm

Thanks, I will look into it, but it is strange, since for about 10 days I was able to load that dataset on the CPU and the code did not break in that section.

Thanks again for your help
Regards
André

Andre_Amaral_IST · May 5, 2022, 5:32pm

The code that raises the error is the following:

I am using the DataLoader there, but from what I could debug, the code crashes at np.array(X).

Andrei_Cristea · May 5, 2022, 6:07pm

Your CPU will have more or less memory available depending on what else is running on your system, independent of the GPU and what nvidia-smi might tell you. So it’s possible that it would work at some moments but not others, since the amount of memory consumed by other processes is not going to be consistent over time.
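One rough way to check this before building a large array is to compare its estimated size with the memory the OS currently reports as available. The sketch below assumes Linux (it reads MemAvailable from /proc/meminfo) and 8-byte elements by default. The function names are hypothetical, and the result is only a snapshot that can change a moment later.

```python
import math


def array_nbytes(shape, itemsize=8):
    """Bytes needed for a dense array of this shape (8 bytes per float64)."""
    return math.prod(shape) * itemsize


def parse_mem_available_kib(meminfo_text):
    """Return the MemAvailable value in KiB from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


def fits_in_ram(shape, itemsize=8, meminfo_path="/proc/meminfo"):
    """True/False if the array would fit right now, None if it cannot be determined."""
    try:
        with open(meminfo_path) as f:
            available_kib = parse_mem_available_kib(f.read())
    except OSError:
        return None  # not Linux, or the file is not readable
    if available_kib is None:
        return None
    return array_nbytes(shape, itemsize) <= available_kib * 1024


if __name__ == "__main__":
    # Hypothetical shape, chosen only for illustration.
    print(fits_in_ram((100_000, 1_000)))
```

Running this check on different days would show whether the headroom really varies with whatever else is running on the machine.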

Andre_Amaral_IST · May 5, 2022, 6:11pm

That makes sense!
Thanks very much for your time!
Regards
André

ptrblck · May 6, 2022, 11:50am

Also double post from here with the same explanation as given by @Andrei_Cristea.