How does "reserved in total by PyTorch" work?

It appears that this is fixed by "Don't split oversize cached blocks" (pytorch/pytorch Pull Request #44742 on GitHub). May I know which PyTorch version this fix is included in?
@ptrblck

It should be available in PyTorch >=1.10.0.
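
A quick way to confirm which build you are running (minimal sketch using the standard torch version attributes):

```python
import torch

# The allocator change should be included in 1.10.0 and later.
print(torch.__version__)    # e.g. '1.10.0+cu113'
print(torch.version.cuda)   # CUDA toolkit the binary was built against
```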


Still the same issue after trying different versions:

CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 2.60 GiB already allocated; 0 bytes free; 2.64 GiB reserved in total by PyTorch)

The remaining ~1.4 GB of the GPU is not being used at all, just sitting idle.

Your current workload has 2.6 GB allocated and approx. 40 MB in the cache, which might be fragmented and thus cannot be used for the desired 20 MB tensor. The rest is used by the CUDA context as well as other applications.
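
To see these numbers for your own run, a minimal sketch using the standard torch.cuda allocator statistics:

```python
import torch

device = torch.device("cuda:0")

# Memory occupied by live tensors.
allocated = torch.cuda.memory_allocated(device)
# Memory held by the caching allocator (live tensors + cached blocks);
# this is the "reserved in total by PyTorch" number in the OOM message.
reserved = torch.cuda.memory_reserved(device)

print(f"allocated: {allocated / 1024**2:.1f} MiB")
print(f"reserved:  {reserved / 1024**2:.1f} MiB")
print(f"cached (reserved - allocated): {(reserved - allocated) / 1024**2:.1f} MiB")

# Detailed per-pool breakdown, useful for spotting fragmentation.
print(torch.cuda.memory_summary(device))
```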

Do you have any suggestions on how to solve this? For example, how can I devote all of the GPU memory just to PyTorch?

Check if other applications are using the GPU, e.g. via nvidia-smi, and close them if possible.
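
If running nvidia-smi is inconvenient, a similar device-wide view is available from Python; a minimal sketch (torch.cuda.mem_get_info is only present in newer releases):

```python
import torch

# Device-wide numbers reported by the CUDA driver: these include the CUDA
# context and every other process using the GPU, not just PyTorch.
free_bytes, total_bytes = torch.cuda.mem_get_info(0)

reserved = torch.cuda.memory_reserved(0)          # what PyTorch itself holds
other = total_bytes - free_bytes - reserved       # context + other apps (approx.)

print(f"total:             {total_bytes / 1024**3:.2f} GiB")
print(f"free:              {free_bytes / 1024**3:.2f} GiB")
print(f"reserved by torch: {reserved / 1024**3:.2f} GiB")
print(f"used elsewhere:    {other / 1024**3:.2f} GiB")
```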

RuntimeError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 4.00 GiB total capacity; 3.49 GiB already allocated; 0 bytes free; 3.53 GiB reserved in total by PyTorch)

I have what appears to be a fairly extreme case of this:

CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 15.74 GiB total capacity; 1.44 GiB already allocated; 25.56 MiB free; 1.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have ~16 GB of capacity, but for some reason only 1.47 GB is reserved by PyTorch while only ~25 MiB is free. Furthermore, I am running it on a fresh Paperspace cluster that is doing literally nothing other than downloading a Huggingface model for inference. It would be great to understand what might be happening here!
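
For anyone trying the suggestion from the error message itself: max_split_size_mb is passed through the PYTORCH_CUDA_ALLOC_CONF environment variable, which is read before the first CUDA allocation. A minimal sketch (the 128 MB value is just an illustrative choice, not a recommendation):

```python
import os

# Must be set before the first CUDA allocation; it is often easier to
# export it in the shell that launches the script instead.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

# With this setting, the caching allocator will not split cached blocks
# larger than 128 MB, which can reduce fragmentation of the reserved pool.
x = torch.randn(1024, 1024, device="cuda")
```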