Suddenly I noticed that the virtual memory usage is huge during my training. The size grows as soon as the first tensor is moved to the GPU. Also, I noticed that using more GPUs is now much slower than training on a single GPU, and I am almost certain that, before this, training on 2 GPUs was almost 2 times faster than using one GPU.
I am using Ubuntu, PyTorch 1.0.1 and CUDA 10.
Here is the htop output while training on one GPU:
PyTorch's caching allocator claims GPU memory and holds on to it even when it is not actively in use, so if you run nvidia-smi it can show that no memory is free.
Refer to the memory management docs for more details.
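As a minimal sketch of what the caching allocator does (assuming a CUDA-capable machine; note that memory_cached() was later renamed memory_reserved() in newer PyTorch releases), you can compare what is actually allocated with what is merely cached:

import torch

cuda = torch.device('cuda')
x = torch.empty((1000, 1000), device=cuda)

# Bytes occupied by live tensors vs. bytes held by the caching allocator.
print(torch.cuda.memory_allocated(cuda))
print(torch.cuda.memory_cached(cuda))

del x
# The tensor is freed, but the cached memory stays reserved,
# so nvidia-smi keeps reporting it as used.
print(torch.cuda.memory_allocated(cuda))
print(torch.cuda.memory_cached(cuda))

# Return cached blocks to the driver (does not free memory held by live tensors).
torch.cuda.empty_cache()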
Excuse me, I am not able to understand the answer. nvidia-smi behaves normally: the GPUs I selected show the same memory usage as before.
The simplest code that produces three lines in htop, each showing more than 20 GB of virtual memory, is here:
import torch
import time

cuda = torch.device('cuda')
# The first tensor created on the GPU initializes the CUDA context,
# which is when the large virtual memory mapping shows up in htop.
A = torch.empty((100, 100), device=cuda).normal_(0.0, 1.0)
# Keep the process alive long enough to inspect it in htop.
time.sleep(5)