I know that the latest version of PyTorch supports mixed use of VRAM and RAM (when VRAM is insufficient), but in some cases, I want to manually swap VRAM to RAM. Does PyTorch or CUDA provide such APIs?
Could you describe your use case in more detail?
If you want to just move tensors from the GPU to the host, you can use x = x.to("cpu").
PyTorch provides APIs for offloading and custom memory allocators via torch.cuda.MemPool in case you are looking for more advanced use cases.
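As a minimal sketch of the simple case mentioned above, the snippet below moves a tensor to host memory and back. The tensor shape is arbitrary, and the fallback to CPU when no GPU is present is only there so the snippet runs anywhere. Calling torch.cuda.empty_cache() afterwards is an extra step not mentioned above: it asks PyTorch's caching allocator to return unused blocks to the driver so that other processes can use them.

```python
import torch

# Use the GPU if one is available; fall back to CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(1024, 1024, device=device)

# Copy the data to host RAM. Rebinding x drops the last reference to the
# GPU tensor, so its memory goes back to PyTorch's caching allocator.
x = x.to("cpu")

if device.type == "cuda":
    # Hand the cached, now-unused blocks back to the driver.
    torch.cuda.empty_cache()

# Later: copy the data back onto the original device.
x = x.to(device)
```

Note that the CUDA context of the process keeps some VRAM for itself even after all tensors have been moved off the GPU.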
Sometimes I try to run different models in several Jupyter notebooks. They are not popular architectures such as CNNs or ANNs; they are my own experiments, and they are sensitive to latency. PyTorch/CUDA automatically swaps VRAM to RAM, which I cannot control, and I don’t know when the swap will occur, so the latency is unpredictable.
In detail, my computer has 32GB RAM and 8GB VRAM, and I do the following:
- Run model 1 in notebook 1, which uses about 7GB VRAM.
- Run model 2 in notebook 2. VRAM is insufficient, so swapping occurs while model 2 is running, which causes a delay that is unacceptable for me. (I don’t want to kill the notebook 1 Jupyter kernel to release VRAM, because I want to keep the environment for later use.)
What I want:
- Run model 1 in notebook 1, which uses about 7GB VRAM.
- Manually swap notebook 1’s VRAM into RAM.
- Run model 2 in notebook 2. No swapping, so no delay, which is very good.
- Manually swap notebook 2’s VRAM into RAM.
- Swap notebook 1’s RAM back into VRAM and run model 1. No swapping, no delay, very good.
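The workflow above could be approximated by hand inside each notebook, roughly as sketched below. The helper names offload_to_ram and restore_to_vram are hypothetical, not PyTorch APIs, and the nn.Linear model is only a stand-in for the real one. This is a sketch under the assumption that the model's parameters and buffers account for most of the notebook's VRAM.

```python
import gc

import torch
import torch.nn as nn


def offload_to_ram(model: nn.Module) -> nn.Module:
    """Hypothetical helper: move a model's parameters and buffers to host RAM."""
    model.to("cpu")  # nn.Module.to() moves the module in place
    gc.collect()     # drop unreachable Python objects that may still hold GPU tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return unused cached blocks to the driver
    return model


def restore_to_vram(model: nn.Module, device: str = "cuda") -> nn.Module:
    """Hypothetical helper: move the model back onto the GPU before running it."""
    return model.to(device)


# Stand-in for "model 1"; the CPU fallback only keeps the sketch runnable without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model_1 = nn.Linear(4, 2).to(device)

offload_to_ram(model_1)            # in notebook 1, before running model 2 elsewhere
restore_to_vram(model_1, device)   # in notebook 1, when model 1 is needed again
```

Any other GPU tensors still referenced in the notebook (inputs, outputs, optimizer state) keep their VRAM until they are moved or deleted as well, and the transfer itself takes time, so it is best done outside the latency-sensitive part.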
No, it doesn’t. PyTorch will raise an OOM when you are trying to allocate too much memory on the GPU.
I’ve heard that some Windows display drivers automatically swap GPU memory to the host, but this seems to be a Windows “feature”.
Yes, I’m on Windows 11. I’m sure it swaps into RAM when VRAM is insufficient and does not raise an OOM.
In that case you are most likely seeing the Windows driver “feature” I mentioned and PyTorch cannot stop it.
If you are on Linux, you will get an OOM unless you are explicitly using UVM, CPU-offloading, etc.
EDIT: to be precise: I don’t know if and how this behavior can be disabled, but maybe a Windows expert might know.
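Since the driver's swapping cannot be controlled from PyTorch, one possible workaround is to check how much VRAM is free before starting a model. The sketch below uses torch.cuda.mem_get_info, which returns free and total device memory in bytes; the helper name free_vram_gib is hypothetical. How accurately this number reflects the Windows driver's fallback behavior is an assumption that would need to be verified on the machine in question.

```python
import torch


def free_vram_gib(device: int = 0):
    """Hypothetical helper: free device memory in GiB, or None without a GPU."""
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes / 2**30


free = free_vram_gib()
if free is None:
    print("No CUDA device available.")
else:
    print(f"Free VRAM: {free:.2f} GiB")
```

If the reported free memory is smaller than what the next model needs, that is a signal to offload the other notebook's model first.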
The first method is adjusting the VRAM allocation in your computer’s UEFI or BIOS. Enter your BIOS and look for an option in the menu named Advanced Features.