How to force/manually swap VRAM into RAM?

I know that the latest version of PyTorch supports mixed use of VRAM and RAM (when VRAM is insufficient), but in some cases, I want to manually swap VRAM to RAM. Does PyTorch or CUDA provide such APIs?

Could you describe your use case in more detail?
If you just want to move tensors from the GPU to the host, you can use x = x.to("cpu"). PyTorch also provides APIs for offloading and custom memory allocators via torch.cuda.MemPool in case you are looking for more advanced use cases.
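
For the simple case that looks like this (a minimal sketch; the tensor and module are just placeholders):

```python
import torch

# Placeholder tensor and module living on the GPU.
x = torch.randn(1024, 1024, device="cuda")
model = torch.nn.Linear(1024, 1024).to("cuda")

# Move them to host (CPU) memory ...
x = x.to("cpu")
model = model.to("cpu")

# ... and back to the GPU when you need them again.
x = x.to("cuda")
model = model.to("cuda")
```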

Sometimes I try to run different models in multiple Jupyter notebooks. They are not popular CNNs, ANNs, etc., but some of my own experiments, which are latency-sensitive. PyTorch/CUDA automatically swaps VRAM to RAM, which I cannot control, and I don’t know when the swap will occur, leading to unpredictable latency.

In detail: my computer has 32 GB of RAM and 8 GB of VRAM, and I do the following:

  1. Run model 1 in notebook 1, which uses about 7 GB of VRAM.
  2. Run model 2 in notebook 2. VRAM is insufficient, so swapping occurs while model 2 runs, causing delays that are unacceptable for me. (I don’t want to kill the notebook 1 Jupyter kernel to release VRAM, because I want to keep its environment for later use.)

What I want:

  1. Run model 1 in notebook 1, which uses about 7 GB of VRAM.
  2. Manually swap notebook 1’s VRAM into RAM (see the sketch after this list).
  3. Run model 2 in notebook 2. No swapping, so no delay; this is very good.
  4. Manually swap notebook 2’s VRAM into RAM.
  5. Swap notebook 1’s RAM back into VRAM and run model 1. No swapping, no delay; very good.
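
Roughly, what I imagine doing in each notebook is something like this sketch: move the model’s parameters to host memory and then have PyTorch’s caching allocator return the freed blocks to the driver, so the other notebook’s process can actually allocate them (the model and helper names here are made up):

```python
import torch

# Stand-in for whatever occupies ~7 GB of VRAM in notebook 1.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
).to("cuda")

def offload_to_ram(module: torch.nn.Module) -> None:
    # Move all parameters and buffers to host RAM, then release the cached
    # VRAM blocks back to the driver so another process can allocate them.
    module.to("cpu")
    torch.cuda.empty_cache()

def reload_to_vram(module: torch.nn.Module) -> None:
    # Move the parameters and buffers back to the GPU before the next run.
    module.to("cuda")

# In notebook 1, before switching to notebook 2:
offload_to_ram(model)

# Later, when I want to run model 1 again:
reload_to_vram(model)
```

My understanding is that torch.cuda.empty_cache() only returns memory that no live tensor still references, so any activations, optimizer states, or other GPU tensors kept alive in the notebook would still hold VRAM, and each notebook process has to release its own memory.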

No, it doesn’t. PyTorch will raise an OOM error when you try to allocate too much memory on the GPU.
I’ve heard that some Windows display drivers automatically swap GPU memory to the host, but this seems to be a Windows “feature”.
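
For reference, on a setup without that driver spill-over an over-sized allocation fails with a catchable error (a toy sketch; the tensor shape is arbitrary):

```python
import torch

try:
    # Roughly 128 GB of float32, far beyond an 8 GB card.
    huge = torch.empty(1024, 1024, 1024, 32, device="cuda")
except torch.cuda.OutOfMemoryError as err:
    print("CUDA OOM:", err)
```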

Yes, I’m on Windows 11. I’m sure it swaps into RAM when VRAM is insufficient and does not raise an OOM error.

In that case you are most likely seeing the Windows driver “feature” I mentioned, and PyTorch cannot stop it.
If you are on Linux, you will get an OOM error unless you are explicitly using UVM, CPU offloading, etc.

EDIT: to be precise: I don’t know if and how this behavior can be disabled, but maybe a Windows expert might know.

The first method is adjusting the VRAM allocation in your computer’s UEFI or BIOS. Enter your BIOS and look for an option in the menu named Advanced Features.