I’m trying to understand what happens to both RAM and GPU memory when a tensor is sent to the GPU.
In the following code sample, I create two tensors: a large tensor arr = torch.ones((10000, 10000)) and a small tensor c = torch.ones(1). The tensor c is sent to the GPU inside the target function step, which is called by multiprocessing.Pool. In doing so, each child process uses 487 MB on the GPU, and RAM usage grows to 5 GB. Note that the large tensor arr is created just once before calling Pool and is not passed as an argument to the target function. RAM usage does not explode when everything stays on the CPU.
I have two questions about this example:

1. I am sending only the tiny tensor torch.ones(1) to the GPU, and yet it consumes 487 MB of GPU memory. Does CUDA allocate a minimum amount of memory on the GPU even if the underlying tensor is very small? GPU memory is not a problem for me; this is just for me to understand how the allocation is done.

2. The real problem lies in the RAM usage. Even though I am sending only a small tensor to the GPU, it appears as if everything in memory (including the large tensor arr) is copied for every child process (possibly to pinned memory). So when a tensor is sent to the GPU, which objects are copied to pinned memory? I must be missing something here, as it does not make sense to stage everything for transfer to the GPU when I am only sending one particular object.
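One way to probe the first question (a sketch, assuming a PyTorch build with CUDA support) is to compare the tensor's own payload with what PyTorch has actually allocated on the device; torch.cuda.memory_allocated() counts only bytes held by tensor storage, so the large remainder reported by nvidia-smi would be per-process CUDA context/runtime overhead rather than the tensor itself:

```python
import torch

c = torch.ones(1)
# the tensor's own payload: one float32 element, i.e. 4 bytes
print(c.element_size() * c.nelement())

if torch.cuda.is_available():
    c = c.to('cuda:0')
    # bytes occupied by tensor storage on the device (rounded up to the
    # caching allocator's minimum block) -- far below what nvidia-smi shows
    print(torch.cuda.memory_allocated())
    # the gap between this number and nvidia-smi's per-process figure
    # would be the CUDA context, not tensor data
```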
```python
from multiprocessing import get_context
import time

import torch

dim = 10000
sleep_time = 2
npe = 4  # number of parallel executions

# cuda
if torch.cuda.is_available():
    dev = 'cuda:0'
else:
    dev = "cpu"
device = torch.device(dev)


def step(i):
    c = torch.ones(1)
    # comment the line below to see no memory increase
    c = c.to(device)
    time.sleep(sleep_time)


if __name__ == '__main__':
    arr = torch.ones((dim, dim))

    # create list of inputs to be executed in parallel
    inp = list(range(npe))

    # sleep added before and after launching multiprocessing
    # to monitor the memory consumption
    print('before pool')  # to check memory with top or htop
    time.sleep(sleep_time)

    context = get_context('spawn')
    with context.Pool(npe) as pool:
        print('after pool')  # to check memory with top or htop
        time.sleep(sleep_time)

        pool.map(step, inp)
        time.sleep(sleep_time)
```
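For scale, a back-of-the-envelope estimate (assuming the default float32 dtype at 4 bytes per element, with dim and npe taken from the script above): a single copy of arr is about 381 MiB, so if the parent and every worker each held a full copy, that alone would approach 2 GB, which is in the right ballpark for part of the observed 5 GB, though it does not by itself explain all of it:

```python
# back-of-the-envelope RAM estimate; float32 = 4 bytes/element is assumed
dim = 10000
npe = 4  # number of worker processes, as in the script above

arr_bytes = dim * dim * 4          # 400_000_000 bytes for one copy of arr
per_copy_mib = arr_bytes / 2**20   # ~381 MiB

# if the parent and every spawned worker each held a full copy of arr:
total_mib = per_copy_mib * (npe + 1)
print(f"one copy of arr: {per_copy_mib:.0f} MiB")   # ~381 MiB
print(f"parent + {npe} workers: {total_mib:.0f} MiB")  # ~1907 MiB
```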