Why did pytorch run through the code before, when I run it again this time, I reminded: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 6.00 GiB total capacity; 353.72 MiB already allocated; 3.86 GiB free; 396.00 MiB reserved in total by PyTorch)

hg6a6 · April 15, 2021, 10:27am

ptrblck · April 16, 2021, 5:49am

Could you check, if the previous script crashed and if zombie processes might be using the device memory? If so, kill them, and rerun the script.

learner12345 · April 24, 2022, 9:45am

Hi hope you are well.
I’m also getting a similar kind of error.
First I finetuned Yolov5 on custom data.
After obtaining best I tried to refine model on those best weights but with different image size.
but now i can’t run model even with small batch size.
And I can’t run any other mode like detection transformer on machine
I tried to emtpy cache as suggested but all in vain
I tried to run code on different machine with same GPU RTX 3080 and and Vram 10GB

Any idea how can i fix this error.

ptrblck · April 24, 2022, 8:14pm

I guess other (dead) processes might use the device memory, so check them via nvidia-smi and/or ps aux and kill them if needed.

learner12345 · April 26, 2022, 8:20am

i can’t see any heavy projects ruuning
Screenshot from 2022-04-26 13-17-23

Why 7.18 GiB reserved in total by PyTorch??

ptrblck · April 26, 2022, 6:25pm

The memory usage depends on the actual model architecture as well as the input shapes besides some other requirements for e.g. workspaces for specific kernels etc.
E.g. you would have to store all model parameters, buffers, input tensors, forward activations, gradients, optimizer states (if used). What are your expectations and how did you estimate the memory requirement?