Why move the model and tensors to the GPU?


I have a basic conceptual question.
Why, when working with CUDA, do I need to move both my model and the X and y tensors to the CUDA device?

In what scenario, when working with CUDA, would I move, for example, my model to the CUDA device but keep working with tensors on the CPU?

Is there a simple way to make all torch objects work with CUDA?
Something like:

torch.cuda = True

so that all relevant objects are sent to the CUDA device automatically?

Thanks in advance.

Mainly because transferring data between the CPU and the GPU is slow compared with computing on data that is already on the GPU. So basically everything involved in the computation has to be on the same device.


When you create a tensor or a model (which creates the tensors that represent your parameters), you allocate memory in your RAM (i.e. your CPU memory). If you want your model to run on the GPU, you have to allocate memory in GPU RAM and copy the data there. Note that the GPU can only access GPU memory.

PyTorch by default stores everything on the CPU (CPU tensors can share memory with NumPy arrays via torch.from_numpy, although they are not wrappers over NumPy objects) and you can call .cuda() or .to(device) to move a tensor to the GPU. Example:

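import torch
import torch.nn as nn

a = torch.zeros((10, 10))  # allocated in CPU memory
a = a.cuda()  # returns a copy in GPU memory (requires a CUDA device)

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)  # placeholder layer so the model has parameters

model = MyModel().cuda()  # moves all the parameters of the model to the GPU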

To store data in CUDA memory you typically allocate CPU RAM and then use a function called cudaMemcpy to copy between CPU and GPU. That is why you need to tell PyTorch to move things to the GPU (so it calls cudaMemcpy under the hood; well, not exactly, see the last paragraph).
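
To make the point about separate memory spaces concrete, here is a minimal sketch of what happens when the model is on the GPU but the input stays on the CPU. The layer sizes and variable names are made up for illustration, and the GPU part only runs if a CUDA device is available.

```python
import torch
import torch.nn as nn

x = torch.zeros(2, 3)      # created in CPU memory by default
layer = nn.Linear(3, 1)    # its parameters also start on the CPU
param_device = next(layer.parameters()).device
print(x.device, param_device)

mismatch_raised = None     # stays None when no GPU is available
if torch.cuda.is_available():
    gpu_layer = nn.Linear(3, 1).cuda()
    try:
        gpu_layer(x)       # parameters on the GPU, input on the CPU
        mismatch_raised = False
    except RuntimeError as err:
        mismatch_raised = True
        print("device mismatch:", err)
    # Works once the input lives on the same device as the parameters.
    print(gpu_layer(x.cuda()).device)
```

So keeping the model on the GPU while feeding it CPU tensors is not something PyTorch resolves silently; the input has to be moved first.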

There is no standard way to tell PyTorch to move everything to the GPU (as far as I know). Just take into account that a GPU can only access the GPU memory space (which is divided into global memory, shared memory and registers). I think there are special cases with pinned memory but, in general, you can think of it that way (the GPU only accesses GPU memory). So to run on the GPU, the data must be on the GPU.
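
A common way to reduce the manual work is to pick the device once and pass it everywhere. The following is only a sketch of that idiom; the linear model, the tensor shapes and the loss are placeholders, not anything from the question.

```python
import torch
import torch.nn as nn

# Choose the device once; fall back to the CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 1).to(device)   # move the parameters
X = torch.randn(4, 10).to(device)     # move the inputs
y = torch.randn(4, 1).to(device)      # move the targets

# Everything involved in the computation now lives on the same device.
loss = nn.functional.mse_loss(model(X), y)
print(loss.device)
```

Newer PyTorch releases (2.0 and later, as far as I know) also offer torch.set_default_device, which changes where newly created tensors are allocated. It does not move tensors that already exist, so the explicit .to(device) calls are still the safest thing to rely on.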

[As additional information regarding memory and performance] In other software like Theano, the advice was to store as much data on the GPU as you could, in order to avoid memory transfers over the PCI bus of your system (if you store your dataset on the GPU, you avoid sending a batch each time you want to update your model). PyTorch mitigates part of this cost with efficient memory management based on memory pools: a caching allocator that reuses blocks of GPU memory instead of requesting them from the driver each time. There are different implementations of this idea; Theano used the cnmem module. Memory managers are widely used in practice. The C malloc function does something similar to avoid calling the OS each time you request a chunk of memory.
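
If you want to see the memory pool at work, the sketch below compares the bytes currently used by tensors with the bytes the allocator is holding. The helper name allocator_snapshot is made up for this example, and the exact numbers depend on your GPU and PyTorch version, so none are shown.

```python
import torch

def allocator_snapshot():
    """Return (allocated_bytes, reserved_bytes), or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()

if torch.cuda.is_available():
    x = torch.zeros(1024, 1024, device="cuda")
    print("after allocation:", allocator_snapshot())
    del x
    # The allocated count drops, but the allocator usually keeps the
    # block reserved so that the next allocation can reuse it.
    print("after del:", allocator_snapshot())
else:
    print("no CUDA device available:", allocator_snapshot())
```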

Hope it helps.


@jmaronas it does help, but my question is more from a conceptual point of view.
If I compare it to Keras (or even TensorFlow), all you need to do to work with a GPU is install the GPU version of TensorFlow (as a backend) and it will pick up all the available CUDA devices automatically, whereas in PyTorch you need to move those objects manually each time. Maybe it is because of the dynamic nature of PyTorch.

Ah okay, I thought you were asking about something else. I think it is done more by convention. Having dynamic or static graphs has nothing to do with it. You can always distribute the different nodes of the graph across CPU and GPU if you want, as long as you take care to use the memory locations correctly.

Personally, I prefer to have as much control as I can over what the software does for me. So I prefer the way PyTorch manages GPU and CPU computation.


Hi, I have a similar question here: if I have already created the tensor on CUDA, do I still need to copy it explicitly?

That is, should I create the tensor and then copy it?

torch.tensor(batchD, dtype=torch.float).cuda(device=self.device, non_blocking=True)

Or create the tensor directly on CUDA?

torch.tensor(batchD.state, device=self.device, dtype=torch.float)
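
As a rough sketch of the two options side by side (the nested list batch is a hypothetical stand-in for batchD, and device falls back to the CPU when no GPU is present): both should give the same tensor on the same device. Passing device= simply saves the separate copy call. Since the source data is in host memory, a host-to-device transfer happens either way.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical stand-in for batchD

# Option 1: build the tensor on the CPU, then copy it to the target device.
copied = torch.tensor(batch, dtype=torch.float).to(device)

# Option 2: ask for the target device at creation; no explicit copy call.
direct = torch.tensor(batch, dtype=torch.float, device=device)

same_device = copied.device == direct.device
same_values = torch.equal(copied, direct)
print(same_device, same_values)
```

Once a tensor is already on the target device, there is no need to copy it again. Also note that non_blocking=True is only expected to make a difference when the source tensor is in pinned (page-locked) host memory.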