What is the initial ~1.3GB allocated VRAM when first using .cuda()?

Hi, I was playing with PyTorch, trying to find out how much space various things occupy in VRAM using this script, and found that when I first send anything to the GPU I get a fixed memory usage (as seen through nvidia-smi) of around 1284 MB.
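(For context, a minimal probe along these lines reproduces the effect; this is a guess at the kind of script meant, not the original. The tensor size and device index are arbitrary.)

```python
import torch

if torch.cuda.is_available():
    # 64M float32 elements = 256 MiB of actual tensor data
    x = torch.zeros(64, 1024, 1024, device="cuda")
    print(torch.cuda.memory_allocated() // 2**20, "MiB allocated by tensors")
    # nvidia-smi, meanwhile, reports roughly 1284 MiB on top of this:
    # that gap is the overhead being asked about here.
```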

I was wondering what it is due to. I would’ve guessed the CUDA runtime sits in normal RAM, but it seems to also occupy space in VRAM? I don’t understand how that would work.


Great question. There are several things going on here. First, a bare CUDA context on the GPU uses about 300 MiB of VRAM; it gets initialized the first time you call .cuda(). Then libtorch_cuda.so (which contains the GPU kernels for all the tensor operators, etc.) is loaded into VRAM, taking another 500 MiB or so. Finally, if you use any cuDNN or cuBLAS functions, loading those libraries will take additional memory. On top of that, the caching allocator may reserve GPU memory ahead of time, before tensors actually require it, to save time when more allocations are needed.
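You can see the distinction directly: PyTorch only tracks tensor memory (allocated) and the caching allocator's pool (reserved), while the context and kernel images show up only in nvidia-smi. A small sketch, assuming a CUDA-capable build (the helper name is mine):

```python
import torch

def cuda_memory_report(device=0):
    """Return PyTorch's own view of GPU memory on `device`, in MiB.

    nvidia-smi will report MORE than `reserved`, because the CUDA context
    and the loaded kernel images (libtorch_cuda.so, cuDNN, cuBLAS) are not
    tracked by PyTorch's caching allocator at all.
    """
    if not torch.cuda.is_available():
        return {"allocated_mib": 0.0, "reserved_mib": 0.0}
    mib = 1024 ** 2
    return {
        # bytes currently occupied by live tensors
        "allocated_mib": torch.cuda.memory_allocated(device) / mib,
        # bytes the caching allocator has claimed from the driver
        # (>= allocated; freed tensors stay cached here for reuse)
        "reserved_mib": torch.cuda.memory_reserved(device) / mib,
    }

if __name__ == "__main__":
    if torch.cuda.is_available():
        # First GPU use: context init + kernel library load happen here
        x = torch.empty(1024, 1024, device="cuda")
        print(cuda_memory_report())
        # The gap between nvidia-smi's number and reserved_mib is the
        # context + library overhead described above.
```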

Hi thanks a lot for the answer! That makes perfect sense.

The way I originally thought about VRAM was that it only stored the tensors the GPU would operate on, and that the instructions were stored in RAM and fed to the GPU live at each step. From your answer I can see that I was totally wrong, and you actually do store libraries in VRAM. If you don’t mind me asking, is there any resource I could read to get a decent overall idea of how this flow works?