About CUDA data allocation problems

  1. How can I check which specific CUDA device a model is on, not just is_cuda? Is there a way to do this?
  2. If the input is on CUDA but the model has not been explicitly moved to CUDA, what happens when I execute model(input)? If the model and the input are not on the same device, will it fail?
  3. About the details of DataParallel: what is the difference between the data on the host CUDA device and the data on the other devices, how are the model and the input allocated, and what are the inner mechanics of DataParallel? It seems the host device uses more memory.

Overall, I feel a little confused about the details of data allocation… can someone help me? Thanks!!

Hi,

  1. You can use tensor.device. It returns the current device of the tensor, including the index of the GPU if the device is CUDA.
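
For example, a minimal sketch (assumes a machine with at least one GPU; nn.Linear is just a stand-in model):

import torch
import torch.nn as nn

t = torch.rand(3, 3).cuda()
print(t.device)        # e.g. cuda:0
print(t.device.index)  # e.g. 0

# A model has no single .device attribute; a common idiom is to inspect
# one of its parameters (assumes all parameters live on the same device):
model = nn.Linear(3, 3).cuda()
print(next(model.parameters()).device)  # cuda:0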

  2. If the model and the input are not on the same device, it will crash. You cannot do operations between tensors on the CPU and tensors on CUDA.

Example:

In [12]: torch.rand(10, 10).cuda() + torch.rand(10, 10)                                                                                                                                                            
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-4280bdc898ed> in <module>
----> 1 torch.rand(10, 10).cuda() + torch.rand(10, 10)
RuntimeError: expected device cuda:0 and dtype Float but got device cpu and dtype Float
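
To avoid this, move the model and the input to the same device before calling the model. A minimal sketch (assumes at least one GPU, falling back to CPU otherwise; nn.Linear stands in for a real model):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 10).to(device)    # move the model to the target device
inputs = torch.rand(2, 10).to(device)   # move the input to the same device
outputs = model(inputs)                 # no device mismatch, so this works
print(outputs.device)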
  3. In DataParallel, the batch is split among the GPUs. At the end, the gradients are sent back to the host device and summed, and the optimizer step happens there.
    So every device hosts a full replica of the model, plus a portion of the batch. Because the outputs and gradients are gathered on the host device, it typically uses more memory than the other devices.
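
A minimal usage sketch (assumes a machine with at least 2 GPUs; again nn.Linear stands in for a real model):

import torch
import torch.nn as nn

model = nn.Linear(128, 10).cuda()     # master copy lives on the host device, cuda:0
model = nn.DataParallel(model)        # replicas are created on each GPU per forward pass

inputs = torch.rand(64, 128).cuda()   # full batch on the host device
outputs = model(inputs)               # batch is scattered across GPUs, outputs gathered on cuda:0
print(outputs.shape, outputs.device)  # torch.Size([64, 10]) cuda:0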

Thanks, I understand it now! :smile: