How to specify GPU usage?

I am using pytorch 0.1.9 and Ubuntu 16.04.

When I set CUDA_VISIBLE_DEVICES=2,3 (or 0,1), nvidia-smi tells me that GPUs 0,1 (or 2,3, respectively) are used.

I do not know the reason, but the GPU IDs reported by nvidia-smi and the GPU IDs used by PyTorch are reversed.

You can check it if you use Ubuntu 16.04.


I think it is more likely a CUDA/NVIDIA issue.
I have run into this problem before when using Caffe with Tesla K10/K80 GPUs.

@Seungyoung_Park from my experience, it’s usually nvidia-smi whose numbering disagrees with everything else.
For example, on my machine the numbering from PyTorch agrees with the numbering of the deviceQuery NVIDIA sample (and any CUDA program, for that matter), while nvidia-smi is the only one giving a different numbering.
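
The likely cause: nvidia-smi enumerates devices by PCI bus ID, while the CUDA runtime defaults to a fastest-first ordering. Setting CUDA_DEVICE_ORDER makes the two agree; a minimal sketch:

    import os

    # nvidia-smi orders GPUs by PCI bus ID; the CUDA runtime defaults to
    # "fastest first". Forcing PCI bus order makes PyTorch's device IDs
    # match the ones shown by nvidia-smi.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"  # now means nvidia-smi's GPUs 2 and 3

    import torch  # must be imported after the variables are set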


Thanks a lot for your answer.
My pytorch version is 0.1.8.
There may be a numbering problem with the GPU devices, but it does not affect our usage.
My question was about how to allocate GPU usage; now everything is fine :slight_smile:

I’m curious about this as well. Can you currently use fractional GPU usage as in tensorflow? The tf equivalent is something like this:

    with tf.device(FLAGS.device):
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=FLAGS.device_percentage)
        sess_cfg = tf.ConfigProto(allow_soft_placement=FLAGS.allow_soft_placement,
                                  gpu_options=gpu_options)
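There was no direct equivalent at the time, but newer PyTorch releases (1.8+) expose torch.cuda.set_per_process_memory_fraction. A minimal sketch, assuming such a version:

    import torch

    if torch.cuda.is_available():
        # Cap this process's allocations at ~50% of GPU 0's total memory.
        # Unlike TensorFlow, PyTorch does not reserve the memory up front;
        # the fraction is a hard limit, and exceeding it raises an OOM error.
        torch.cuda.set_per_process_memory_fraction(0.5, device=0)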

How does one use GPUs if one has a custom NN class (that inherits from torch.nn.Module)?

For example, I know that in the simple example from http://pytorch.org/tutorials/beginner/pytorch_with_examples.html one can just change the type of the tensors being created:

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

However, when using things like torch.nn.Linear and Variable, how does one make sure the GPU is used?

Also, do I really have to track how GPUs are assigned? I am fine with torch just doing its stuff automagically.

In particular, I would love to see how:

http://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_module.html#sphx-glr-beginner-examples-nn-two-layer-net-module-py

is made into a GPU version.

Related SO question: https://stackoverflow.com/questions/45553613/how-does-one-make-sure-that-everything-is-running-on-gpu-automatically-in-pytorc
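
For anyone finding this later, here is a minimal sketch of the linked two-layer-net module moved to the GPU, using the device API of PyTorch >= 0.4 (placement is not automatic, so both the module and its inputs must be moved):

    import torch

    class TwoLayerNet(torch.nn.Module):
        def __init__(self, D_in, H, D_out):
            super(TwoLayerNet, self).__init__()
            self.linear1 = torch.nn.Linear(D_in, H)
            self.linear2 = torch.nn.Linear(H, D_out)

        def forward(self, x):
            h_relu = self.linear1(x).clamp(min=0)
            return self.linear2(h_relu)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = TwoLayerNet(D_in=1000, H=100, D_out=10).to(device)  # moves all parameters
    x = torch.randn(64, 1000, device=device)  # inputs must live on the same device
    y_pred = model(x)

On recent PyTorch versions, Variable is no longer needed; plain tensors carry gradients directly.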

Dear All,

I have installed the NVIDIA CUDA 9.0 toolkit with cuDNN on my Ubuntu machine.
I have installed PyTorch, and I am trying to check GPU usage by running the code below.

Code:

import torch
print(torch.rand(2,3).cuda())

I am getting the below error:


    RuntimeError                              Traceback (most recent call last)
    <ipython-input> in <module>()
    ----> 1 print(torch.rand(2,3).cuda())

    ~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py in _cuda(self, device, async)
         67     else:
         68         new_type = getattr(torch.cuda, self.__class__.__name__)
    ---> 69         return new_type(self.size()).copy_(self, async)
         70
         71

    ~/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/cuda/__init__.py in _lazy_new(cls, *args, **kwargs)
        385     # We need this method only for lazy init, so we can remove it
        386     del _CudaBase.__new__
    --> 387     return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
        388
        389

    RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu:58

I think PyTorch is not communicating with the NVIDIA GPU. Please advise.

Regards
Saurabh Jha

This error might occur after you installed CUDA etc. without restarting your machine.
Have you rebooted after the driver installation?

Yes, you are correct. It was fine after I restarted the machine.

for a Unix command solution you can also do:

    export CUDA_VISIBLE_DEVICES=$i

or, for a single run:

    CUDA_VISIBLE_DEVICES=$i python main.py

though of course that only works if the scripts are independent and stuff like that… otherwise the other solutions here are probably better…
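
For instance, a hypothetical Python launcher along those lines (main.py and the GPU count are assumptions), starting one independent copy of the script per GPU:

    import os
    import subprocess

    NUM_GPUS = 4  # assumption: adjust to your machine

    # Start one independent copy of main.py per GPU, each masked to a single device.
    procs = []
    for i in range(NUM_GPUS):
        env = os.environ.copy()
        env["CUDA_VISIBLE_DEVICES"] = str(i)
        procs.append(subprocess.Popen(["python", "main.py"], env=env))

    for p in procs:
        p.wait()  # wait for all runs to finish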

hi, do you have the answer?

Is there anyone who knows about this…

When I add the code below to my Python file (in main.py),

import os
os.environment["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environment["CUDA_VISIBLE_DEVICES"] = "0"

it doesn’t work the same as CUDA_VISIBLE_DEVICES=0 python main.py does.

The former doesn’t actually restrict the GPUs, but the latter works well.

It seems strange to me.

Thanks ahead.

I wouldn’t recommend the first approach, since you would have to make sure these lines of code are imported before any other library, which might take the GPU. If some script imports PyTorch and these lines are executed afterwards, they won’t have any effect anymore.

The second approach makes sure to mask the devices before running the Python script.
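
A minimal sketch of the ordering that matters here:

    import os

    # This must run before the first `import torch` (or before any library
    # that initializes CUDA); once the process has enumerated the GPUs,
    # changing CUDA_VISIBLE_DEVICES has no effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    import torch

    print(torch.cuda.device_count())  # 1: only the masked device is visible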


Totally understand, thanks!!

  1. try CUDA_VISIBLE_DEVICES=0,1,2,3 python xxx.py to specify the GPUs
  2. add os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3" in your Python code

Related, but how do I check which GPU is being used if nvidia-smi is not working for me?

This sounds a bit concerning, as it could indicate that your driver installation is broken; in that case I would expect PyTorch to also fail to detect the GPUs.
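
For completeness, you can also query what PyTorch itself sees, independently of nvidia-smi:

    import torch

    print(torch.cuda.is_available())   # False here would point to a driver problem
    print(torch.cuda.device_count())   # number of visible GPUs
    if torch.cuda.is_available():
        print(torch.cuda.current_device())    # index of the currently selected GPU
        print(torch.cuda.get_device_name(0))  # human-readable name of GPU 0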

Is there a PyTorch-recommended way to install GPU drivers on Ubuntu Linux?

This is my current way:

apt update;
apt install build-essential;

sudo add-apt-repository ppa:graphics-drivers
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
sudo apt-get install nvidia-driver-460
sudo reboot now

to uninstall things I’ve done:

# ubuntu-drivers list
# sudo apt-get --purge remove nvidia-driver-460
# sudo apt-get --purge remove nvidia-driver*
# modinfo nvidia
# sudo apt-get install nvidia-driver-450

when trying to start from scratch on my ubuntu vm

I don’t think there is a “PyTorch recommended way” to install the drivers/CUDA, and I would stick to an approach that works for you. Personally, I use the .run files to install the drivers and/or the CUDA toolkit, but I’m also reinstalling quite often (e.g. to test new bringup versions etc.), so I don’t really care about stability.