How to specify GPU usage?


(Bin) #1

I am training different models on different GPUs.

I have 4 GPUs indexed as 0,1,2,3

I tried this:

model = torch.nn.DataParallel(model, device_ids=[0,1]).cuda()

But the process actually uses GPUs 2 and 3 instead.

and if I use:

model = torch.nn.DataParallel(model, device_ids=[1]).cuda()

I will get the error:

RuntimeError: Assertion `THCTensor_(checkGPU)(state, 4, r_, t, m1, m2)’ failed. at /data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.8_1486039719409/work/torch/lib/THC/generic/THCTensorMathBlas.cu:230

How can I specify which GPUs to use by index?


(Seungyoung Park) #2

I am using Ubuntu 16.04. My GPU indexing is the same as yours.

If you want to execute xxx.py using only GPUs 0,1 in Ubuntu 16.04, use the following command

CUDA_VISIBLE_DEVICES=2,3 python xxx.py

with nn.DataParallel in xxx.py.

In addition, I don’t think that DataParallel accepts only one GPU.


(Bin) #3

Thanks a lot, it works :slight_smile:
I hope PyTorch can integrate an argument like this to specify GPU usage.


(Adam Paszke) #4

What’s your PyTorch version? It should accept a single GPU. How is it even possible that it uses the last two GPUs if you specify device_ids=[0,1]?

If you run your script with CUDA_VISIBLE_DEVICES=2,3 it will always execute on the last two GPUs, not on the first ones. I can’t see how that helps in this case. CUDA_VISIBLE_DEVICES=0,1 would make more sense.
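
To make the renumbering concrete, here is a minimal sketch in plain Python (no PyTorch needed). The helper name `physical_gpu_ids` is hypothetical, and it assumes CUDA_VISIBLE_DEVICES holds integer indices rather than GPU UUIDs. It shows that inside a process started with CUDA_VISIBLE_DEVICES=2,3, device_ids=[0,1] refers to GPUs 2 and 3 in the CUDA runtime's numbering.

```python
def physical_gpu_ids(device_ids, cuda_visible_devices=None):
    """Translate in-process device indices to CUDA runtime GPU IDs.

    cuda_visible_devices is the raw value of the CUDA_VISIBLE_DEVICES
    environment variable (for example "2,3"), or None when it is unset.
    Assumption: the variable lists integer indices, not UUIDs.
    """
    if cuda_visible_devices is None:
        # Variable unset: in-process indices are already the runtime IDs.
        return list(device_ids)
    visible = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    # An index outside the visible list raises IndexError, much as
    # requesting a non-existent device would fail in the real process.
    return [visible[i] for i in device_ids]


print(physical_gpu_ids([0, 1], "2,3"))
print(physical_gpu_ids([0, 1]))
```

With the variable unset, the indices pass through unchanged, which is why CUDA_VISIBLE_DEVICES=0,1 together with device_ids=[0,1] selects the first two GPUs.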


(Seungyoung Park) #5

I am using pytorch 0.1.9 and Ubuntu 16.04.

When I use CUDA_VISIBLE_DEVICES=2,3 (0,1), ‘nvidia-smi’ tells me that gpus 0,1 (2,3) are used.

I do not know the reason, but the gpu id used in nvidia-smi and the gpu id used in pytorch are reversed.

You can check it if you use Ubuntu 16.04.


(Shicai) #6

I think it is more likely a cuda/nvidia problem.
I have run into this problem before when using Caffe with Tesla K10/K80 GPUs.


(Alban D) #7

@Seungyoung_Park from my experience, it’s usually nvidia-smi that is reversed with everything else.
For example, on my machine, the numbering from pytorch agrees with the numbering of the deviceQuery nvidia sample (and any cuda program for that matter) while nvidia-smi is the only one giving a different numbering.
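
A likely explanation, as far as I know: by default the CUDA runtime orders devices fastest-first, while nvidia-smi lists them by PCI bus ID. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID before CUDA is initialized should make the two numberings agree. The sketch below only builds the environment for a child process; the helper name `cuda_env` is hypothetical.

```python
import os


def cuda_env(gpu_ids, base_env=None):
    """Return a copy of the environment with GPU selection variables set.

    CUDA_DEVICE_ORDER=PCI_BUS_ID asks the CUDA runtime to number devices
    by PCI bus ID, which is the order nvidia-smi reports.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


env = cuda_env([0, 1])
print(env["CUDA_DEVICE_ORDER"], env["CUDA_VISIBLE_DEVICES"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the training script. Both variables have to be set before the process first touches CUDA.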


(Bin) #8

Thanks a lot for your answers.
My PyTorch version is 0.1.8.
There may be a GPU device numbering problem, but it does not affect our usage.
My problem was about how to allocate GPU usage; now everything is fine :slight_smile:


(Jason Ramapuram) #9

I’m curious about this as well. Can you currently use fractional GPU usage as in tensorflow? The tf equivalent is something like this:

    with tf.device(FLAGS.device):
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=FLAGS.device_percentage)
        sess_cfg = tf.ConfigProto(allow_soft_placement=FLAGS.allow_soft_placement,
                                  gpu_options=gpu_options)
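
For comparison, recent PyTorch releases (not the 0.1.x versions discussed in this thread) provide `torch.cuda.set_per_process_memory_fraction`, which caps how much of a device's memory the caching allocator may use. The sketch below is a hedged illustration: the wrapper name `limit_gpu_memory` is hypothetical, and it imports torch lazily so the argument check works even where PyTorch is not installed.

```python
def limit_gpu_memory(fraction, device_index=0):
    """Cap this process's GPU memory use to a fraction of the device total.

    Returns True if the cap was applied, False if no CUDA device is
    available. Assumes a PyTorch version that provides
    torch.cuda.set_per_process_memory_fraction.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1], got %r" % (fraction,))
    import torch  # lazy import: only needed once the argument is valid

    if not torch.cuda.is_available():
        return False
    torch.cuda.set_per_process_memory_fraction(fraction, device_index)
    return True
```

Unlike the TensorFlow option, this is a cap rather than a reservation: allocations beyond the fraction raise an out-of-memory error instead of being spilled elsewhere.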

(MirandaAgent) #10

How does one use GPUs if one has a custom NN class (that inherits from torch.nn.Module)?

For example, I know that using the easy example from (http://pytorch.org/tutorials/beginner/pytorch_with_examples.html) one can just change the type of the tensors being created:

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

However, when using things like torch.nn.Linear and also Variable, how does one make sure to use GPUs?

Also, do I really have to track how GPUs are assigned? I am fine with torch just doing its stuff automagically.

In particular I would love to see how:

http://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_module.html#sphx-glr-beginner-examples-nn-two-layer-net-module-py

is made into a GPU version of it.

Related SO question: https://stackoverflow.com/questions/45553613/how-does-one-make-sure-that-everything-is-running-on-gpu-automatically-in-pytorc
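
One common pattern is to decide on a single device once and then move both the model and every input to it, so nothing else in the script needs to track GPUs. The sketch below shows only the decision step in plain Python; `pick_device` is a hypothetical helper, and the fallback to GPU 0 for an out-of-range index is an assumption, not PyTorch behavior.

```python
def pick_device(cuda_available, gpu_count, preferred_index=0):
    """Return a device string such as "cuda:1", or "cpu" as a fallback."""
    if not cuda_available or gpu_count <= 0:
        return "cpu"
    if 0 <= preferred_index < gpu_count:
        return "cuda:%d" % preferred_index
    # Assumption for this sketch: fall back to the first visible GPU.
    return "cuda:0"


print(pick_device(True, 4, preferred_index=1))
print(pick_device(False, 0))
```

With PyTorch installed, the arguments would come from `torch.cuda.is_available()` and `torch.cuda.device_count()`. The resulting string can be passed to `model.to(...)` for the custom `torch.nn.Module` and to `.to(...)` on each input tensor; moving the module also moves the parameters of layers such as `torch.nn.Linear` inside it.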


(Saurabh Jha) #11

Dear All,

I have installed the NVIDIA CUDA 9.0 toolkit with cuDNN on my Ubuntu machine.
I have also installed PyTorch. When I try to check GPU usage by running the code below:

import torch
print(torch.rand(2,3).cuda())

I am getting the below error:


RuntimeError Traceback (most recent call last)
in ()
----> 1 print(torch.rand(2,3).cuda())

~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py in cuda(self, device, async)
67 else:
68 new_type = getattr(torch.cuda, self.__class__.__name__)
---> 69 return new_type(self.size()).copy_(self, async)
70
71

~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/cuda/__init__.py in _lazy_new(cls, *args, **kwargs)
385 # We need this method only for lazy init, so we can remove it
386 del _CudaBase.__new__
--> 387 return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
388
389

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu:58

I think pytorch is not communicating with the Nvidia GPU, please advise.

Regards
Saurabh Jha


#12

This error can occur if you installed CUDA etc. without restarting your machine afterwards.
Have you rebooted after the driver installation?


(Saurabh Jha) #13

Yes, you are correct. It was fine after I restarted the machine.


(MirandaAgent) #14

For a Unix command-line solution, you can also do:

export CUDA_VISIBLE_DEVICES=$i

though of course that only works if the scripts are independent and stuff like that…otherwise the other solutions here are probably better…


(Janna Shen) #15

CUDA_VISIBLE_DEVICES=$i python main.py
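
The same per-GPU launch can be sketched from Python for independent scripts. The function names below are hypothetical, and the child command in the example just prints the variable it received; it stands in for a real training script.

```python
import os
import subprocess
import sys


def launch_on_gpu(gpu_id, argv):
    """Start argv as a child process that can only see one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return subprocess.Popen(argv, env=env, stdout=subprocess.PIPE, text=True)


def run_on_each_gpu(gpu_ids, argv):
    """Launch one child per GPU, then collect each child's stdout."""
    procs = [launch_on_gpu(i, argv) for i in gpu_ids]
    # communicate() waits for the child to exit and returns (stdout, stderr).
    return [p.communicate()[0] for p in procs]


# Stand-in child: reports which GPU index it was given.
child = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
print(run_on_each_gpu([0, 1], child))
```

To run the real script, replace `child` with something like `[sys.executable, "main.py"]`. Each child sees its assigned GPU as device 0.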