CUDA suddenly always runs out of memory even with small models

I’ve been building some models, some larger, some smaller, but all of them could run on my laptop’s GPU (GeForce GTX 1650 Ti). I always ran them with a batch size of 32, but once I tried to increase it to 64, which basically froze my laptop, and I had to force it to shut down. Ever since then, I get a CUDA out of memory error with even the smallest model and any batch size.

I have done a fair amount of googling and searched this forum quite a bit, and I assume some zombie process is occupying a lot of memory. However, I cannot even locate it: nvidia-smi doesn’t show any processes. By the way, I am on Windows, so I cannot use the “killall” or “nvidia-smi --gpu-reset” commands.

Via PowerShell, I have also inspected the active processes with “Get-Process” but couldn’t find anything. That is, when Spyder is closed (which I am using, although running my code from the command prompt leads to similar issues), there aren’t any Python-related processes left, as far as I can tell. torch.cuda.memory_allocated() also indicates that 0 bytes are allocated on startup, before running a model.
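For reference, this is roughly how I checked the allocator’s view of GPU memory before running anything (a minimal sketch, assuming a recent PyTorch build with CUDA support):

```python
import torch

# Assumes CUDA is available and queries the default device.
print(torch.cuda.is_available())       # True on my machine
print(torch.cuda.memory_allocated())   # bytes currently held by tensors -> 0
print(torch.cuda.memory_reserved())    # bytes reserved by the caching allocator -> 0
```

Both counters report 0 right after starting the interpreter, which is what made me suspect something outside my process.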

After a restart, no zombie process from the previous run would still be alive, which is also confirmed by nvidia-smi showing 0MiB of usage.

Are you able to use the GPU at all with any other CUDA application?


Oh boy, I guess I googled a bit too much and convinced myself that it must be a zombie process; all the posts I read about it seemed so similar to what I was experiencing. It turns out I had just created very large latent vectors by accident. Sorry about that.
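For anyone hitting the same symptoms, the mistake looked roughly like this (shapes below are illustrative, not my actual model):

```python
import torch

batch_size, latent_dim = 32, 128
img_size = 256

# Intended: one latent vector per sample -> shape (32, 128), a few KB on the GPU.
z = torch.randn(batch_size, latent_dim, device="cuda")

# What I effectively did: kept a latent vector per *pixel* instead of per sample,
# i.e. shape (32, 256*256, 128) -> roughly 1 GB of float32 before gradients and
# activations, which quickly exceeds the 4 GB of a GTX 1650 Ti.
z_bad = torch.randn(batch_size, img_size * img_size, latent_dim, device="cuda")
```

So the OOM had nothing to do with the earlier freeze; my code simply requested far more memory than I thought it did.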

@ptrblck Thank you for the quick response and, in general, for all your replies on this forum. You have already helped me countless times.
