I am training different models on different GPUs.
I have 4 GPUs, indexed 0, 1, 2, 3.
I tried this:
model = torch.nn.DataParallel(model, device_ids=[0,1]).cuda()
But the actual process uses GPUs 2 and 3 instead.
And if I use:
model = torch.nn.DataParallel(model, device_ids=).cuda()
I will get the error:
RuntimeError: Assertion `THCTensor_(checkGPU)(state, 4, r_, t, m1, m2)' failed. at /data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.8_1486039719409/work/torch/lib/THC/generic/THCTensorMathBlas.cu:230
How do I specify which GPUs to use by index?
I am using Ubuntu 16.04. My GPU indexing is the same as yours.
If you want to execute xxx.py using only GPUs 0,1 in Ubuntu 16.04, use the following command:
CUDA_VISIBLE_DEVICES=2,3 python xxx.py
with nn.DataParallel in xxx.py.
In addition, I don’t think that DataParallel accepts only one GPU.
Thanks a lot, it works.
I hope PyTorch can add an argument like this for specifying GPU usage.
What’s your PyTorch version? It should accept a single GPU. How is it even possible that it uses the last two GPUs if you specify
If you run your script with CUDA_VISIBLE_DEVICES=2,3 it will always execute on the last two GPUs, not on the first ones. I can’t see how that helps in this case.
CUDA_VISIBLE_DEVICES=0,1 would make more sense.
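To make the renumbering in this exchange concrete, here is a minimal sketch, assuming CUDA_VISIBLE_DEVICES holds plain integer indices (it can also hold device UUIDs, which this ignores). The devices listed in the variable are exposed to the process as 0, 1, ... in the order given, so device_ids=[0,1] inside a script started with CUDA_VISIBLE_DEVICES=2,3 refers to devices 2 and 3 in CUDA's own enumeration. The helper name visible_device_map is hypothetical and is not part of PyTorch.

```python
def visible_device_map(env_value):
    """Map in-process (logical) CUDA indices to the physical indices
    listed in a CUDA_VISIBLE_DEVICES-style string such as "2,3".

    Assumption: entries are integer indices, not device UUIDs.
    """
    physical = [int(tok) for tok in env_value.split(",") if tok.strip()]
    # Logical index i inside the process is the i-th entry of the list.
    return dict(enumerate(physical))


mapping = visible_device_map("2,3")
print(mapping)  # {0: 2, 1: 3}
```

Whether those physical indices match what nvidia-smi prints is a separate question, since the two tools may enumerate devices in a different order.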
I am using pytorch 0.1.9 and Ubuntu 16.04.
When I use CUDA_VISIBLE_DEVICES=2,3 (0,1), ‘nvidia-smi’ tells me that gpus 0,1 (2,3) are used.
I do not know the reason, but the gpu id used in nvidia-smi and the gpu id used in pytorch are reversed.
You can check it if you use Ubuntu 16.04.
I think it is more likely a cuda/nvidia problem.
I have met this problem before when using Caffe with Tesla K10/K80 GPUs.
@Seungyoung_Park from my experience, it’s usually nvidia-smi that is reversed with everything else.
For example, on my machine, the numbering from pytorch agrees with the numbering of the deviceQuery nvidia sample (and any cuda program for that matter) while nvidia-smi is the only one giving a different numbering.
Thanks a lot for your answers.
My PyTorch version is 0.1.8.
There may be a problem with GPU device numbering, but it does not affect our usage.
My problem was about how to allocate GPUs, and now everything is fine.
I’m curious about this as well. Can you currently use fractional GPU usage as in tensorflow? The tf equivalent is something like this:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=FLAGS.device_percentage)
sess_cfg = tf.ConfigProto(allow_soft_placement=FLAGS.allow_soft_placement,
How does one use GPUs if one has a custom NN class (that inherits from
For example, I know that using the easy example from (http://pytorch.org/tutorials/beginner/pytorch_with_examples.html) one can just change the type of the tensors being created:
dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU
However, when using things like torch.nn.Linear and also Variable, how does one make sure to use GPUs?
Also, do I really have to track how GPUs are assigned? I am fine with torch just doing its stuff automagically.
In particular I would love to see how:
is made into a GPU version of it.
Related SO question: https://stackoverflow.com/questions/45553613/how-does-one-make-sure-that-everything-is-running-on-gpu-automatically-in-pytorc
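As one possible answer to the question above, here is a minimal sketch assuming a recent PyTorch (0.4 or later), where torch.device and Module.to() are available. On the older versions discussed earlier in this thread, calling model.cuda() on the module and .cuda() on the input tensors plays the same role. The class TwoLayerNet and its layer sizes are hypothetical, chosen only for illustration. Moving the module moves all of its registered parameters, so the nn.Linear layers inside it need no separate handling; only the inputs must be created on (or moved to) the same device.

```python
import torch
import torch.nn as nn


class TwoLayerNet(nn.Module):  # hypothetical example model
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


# Fall back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# .to(device) moves every parameter of the module, including the Linear layers.
model = TwoLayerNet(1000, 100, 10).to(device)

# Inputs must live on the same device as the model.
x = torch.randn(64, 1000, device=device)
y = model(x)
print(y.shape, y.device)
```

PyTorch does not move tensors between devices automatically, so the model and its inputs have to be placed explicitly, but a single device variable like the one above keeps that to one line each.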
I have installed the NVIDIA CUDA 9.0 toolkit with cuDNN on my Ubuntu machine.
I have installed PyTorch, and when I try to check GPU usage by running print(torch.rand(2,3).cuda()),
I get the error below:
RuntimeError Traceback (most recent call last)
----> 1 print(torch.rand(2,3).cuda())
~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py in cuda(self, device, async)
68 new_type = getattr(torch.cuda, self.__class__.__name__)
---> 69 return new_type(self.size()).copy_(self, async)
~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/cuda/__init__.py in _lazy_new(cls, *args, **kwargs)
385 # We need this method only for lazy init, so we can remove it
386 del _CudaBase.__new__
--> 387 return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu:58
I think pytorch is not communicating with the Nvidia GPU, please advise.
This error might occur after you installed CUDA etc. without restarting your machine.
Have you rebooted after the driver installation?
Yes, you are correct. It was fine after I restarted the machine.
For a Unix command-line solution, you can also do:
though of course that only works if the scripts are independent and stuff like that…otherwise the other solutions here are probably better…
CUDA_VISIBLE_DEVICES=$i python main.py
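The same idea can be sketched from Python, assuming the runs really are independent: start one child process per GPU index and give each child its own CUDA_VISIBLE_DEVICES. The function name launch_per_gpu is hypothetical, and the child command here is a stand-in that only prints the variable it received; in practice it would be something like [sys.executable, "main.py"].

```python
import os
import subprocess
import sys


def launch_per_gpu(gpu_ids, argv):
    """Start one child process per GPU index, each seeing only that GPU.

    Returns the stripped stdout of every child, in launch order.
    """
    procs = []
    for gpu in gpu_ids:
        # Copy the parent environment and mask the devices for this child only.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(
            subprocess.Popen(argv, env=env, stdout=subprocess.PIPE, text=True)
        )
    # All children are already running; now wait for each and collect output.
    return [p.communicate()[0].strip() for p in procs]


# Stand-in for `python main.py`: report which device mask the child received.
child = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
outputs = launch_per_gpu([0, 1, 2, 3], child)
print(outputs)  # ['0', '1', '2', '3']
```

Because the mask is set in each child's environment before that interpreter starts, every child sees exactly one GPU, which it addresses as device 0.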
Hi, do you have the answer?
Does anyone know?
When I put the code below in the Python file (main.py),
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
it doesn’t work the same way as
CUDA_VISIBLE_DEVICES=0 python main.py does.
The former doesn’t restrict the GPUs, but the latter works well.
It seems strange to me.
I wouldn’t recommend the first approach, since you would have to make sure these lines of code are imported before any other library, which might take the GPU. If some script imports PyTorch and these lines are executed afterwards, they won’t have any effect anymore.
The second approach makes sure to mask the devices before running the Python script.
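If the variables do have to be set from inside the script, the ordering concern above can be sketched as follows. This is a hedged illustration, not an official recipe: the helper name mask_gpus is hypothetical, and the check on sys.modules is only a heuristic, since what actually matters is whether CUDA has been initialized in the process. CUDA_DEVICE_ORDER=PCI_BUS_ID asks CUDA to enumerate devices by PCI bus ID, which is intended to match the order nvidia-smi reports.

```python
import os
import sys
import warnings


def mask_gpus(visible, order="PCI_BUS_ID"):
    """Restrict the GPUs this process can see.

    Must run before anything initializes CUDA; afterwards the
    variables are no longer consulted.
    """
    if "torch" in sys.modules:
        # Heuristic only: torch being imported suggests CUDA may already
        # have been initialized, in which case the mask comes too late.
        warnings.warn("torch is already imported; the device mask may be ignored")
    os.environ["CUDA_DEVICE_ORDER"] = order
    os.environ["CUDA_VISIBLE_DEVICES"] = visible


# Call this at the very top of main.py, before importing torch
# or any library that might import it.
mask_gpus("0")
# import torch  # safe to import from here on
```

Setting the variable on the command line avoids the ordering question entirely, which is why it remains the more robust option.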
Totally understand thanks!!