First tutorial error, installation problem RHEL 6.9

davidsaroff · December 14, 2017, 12:42am

I’m an Astrophysics graduate student interested in torch, for analysis of astronomical data.

I’m an absolute beginner with torch. I’ve been writing python using numpy and matplotlib for my data analysis for about 2 years.

The install command suggested at

http://pytorch.org/

seemed to work:

conda install pytorch torchvision -c pytorch

This is what it printed:

The following NEW packages will be INSTALLED:

cudatoolkit: 8.0-3                                                
pytorch:     0.3.0-py27_cuda8.0.61_cudnn7.0.3hf383a3f_4 pytorch   
torchvision: 0.2.0-py27hfb27419_1                       pytorch

The following packages will be UPDATED:

conda:       4.3.29-py27_0                              conda-forge --> 4.3.30-py27h6ae6dc7_0
conda-env:   2.6.0-0                                    conda-forge --> 2.6.0-h36134e3_1

Proceed ([y]/n)? y

cudatoolkit-8. 100% |##############################################################| Time: 0:02:16 2.48 MB/s
pytorch-0.3.0- 100% |##############################################################| Time: 0:03:08 2.32 MB/s
torchvision-0. 100% |##############################################################| Time: 0:00:00 2.44 MB/s
conda-4.3.30-p 100% |##############################################################| Time: 0:00:00 2.40 MB/s

I’m starting with the tutorial at
http://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html

I’ve downloaded the Jupyter notebook. Every cell runs except the last one.
There is a GTX 1070 in the machine.

Here is the error:

# let us run this cell only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    x + y

RuntimeError                              Traceback (most recent call last)
in ()
      3     x = x.cuda()
      4     y = y.cuda()
----> 5     x + y

/home/david/anaconda2/lib/python2.7/site-packages/torch/tensor.pyc in __add__(self, other)
    291     # TODO: add tests for operators
    292     def __add__(self, other):
--> 293         return self.add(other)
    294     __radd__ = __add__
    295

RuntimeError: cuda runtime error (8) : invalid device function at /opt/conda/conda-bld/pytorch_1512378360668/work/torch/lib/THC/generated/…/generic/THCTensorMathPointwise.cu:301

To verify that the card is seen:
torch.cuda.is_available() returns True

What do I need to do to correct the installation?

smth · December 14, 2017, 1:29am

can you give me the output of:

import torch

for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i))
    print(torch.cuda.get_device_capability(i))

davidsaroff · December 14, 2017, 3:15am

Thanks for looking at this.

It is working now in the “Python[conda root]” environment in the Jupyter notebook, but not in the additional 2.7 and 3.6 environments I’ve made. I’m sure it has to do with missing packages.

This is what is working now, and sufficient for present purposes

for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i))
    print(torch.cuda.get_device_capability(i))

GeForce GTX 1070
(6, 1)
GeForce GTX 560 Ti
(2, 1)

smth · December 14, 2017, 4:26am

The 560 Ti is the main issue. It is an old graphics card (compute capability 2.1 in your output), and we don’t support it.

What you can do is hide it from python with the CUDA_VISIBLE_DEVICES environment variable.

# only make the 0-th device visible to python (effectively hiding device 1)
CUDA_VISIBLE_DEVICES=0 python
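
To decide which device indices to keep visible, the name and capability pairs printed earlier in the thread can be sorted into usable and too-old cards. The following is only a rough sketch: the helper name `split_by_capability` is hypothetical, and the minimum capability of (3, 0) is an assumption rather than a confirmed cutoff for these binaries, so adjust it to whatever your build actually supports.

```python
# ASSUMPTION: (3, 0) is a placeholder threshold, not a confirmed value.
MIN_CAPABILITY = (3, 0)


def split_by_capability(devices, minimum=MIN_CAPABILITY):
    """Split (name, capability) pairs into usable and too-old lists.

    Each returned entry is (device_index, name), where device_index is the
    position in the input list, matching the enumeration order.
    """
    usable, too_old = [], []
    for index, (name, capability) in enumerate(devices):
        if tuple(capability) >= minimum:
            usable.append((index, name))
        else:
            too_old.append((index, name))
    return usable, too_old


# Values copied from the output posted in this thread. On a live system the
# list could be built with torch.cuda.get_device_name(i) and
# torch.cuda.get_device_capability(i) for i in range(torch.cuda.device_count()).
devices = [
    ("GeForce GTX 1070", (6, 1)),
    ("GeForce GTX 560 Ti", (2, 1)),
]

usable, too_old = split_by_capability(devices)
print("usable:", usable)
print("too old:", too_old)
```

The indices in the usable list are the ones to put in CUDA_VISIBLE_DEVICES.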

davidsaroff · December 14, 2017, 1:43pm

Is this added inside the ipython notebook program, or set somewhere else, since it is an environment variable?

smth · December 15, 2017, 4:25pm

It goes in the terminal, on the same command line that starts ipython:

CUDA_VISIBLE_DEVICES=0 ipython notebook
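
If restarting the notebook server from the terminal is inconvenient, the same variable can probably be set from the first cell of the notebook instead, provided that happens before anything initializes CUDA. This is a minimal sketch under that assumption; the helper name `restrict_visible_gpus` is made up for illustration, and the `import torch` is wrapped in a try block only so the snippet also runs where torch is not installed.

```python
import os


def restrict_visible_gpus(indices):
    """Expose only the given device indices to CUDA in this process."""
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Run this before the first CUDA call; the safest place is before
# `import torch`. The variable is read when CUDA is initialized, so
# changing it afterwards has no effect until the kernel is restarted.
restrict_visible_gpus([0])

try:
    import torch
except ImportError:
    torch = None  # torch not installed; the environment variable is still set

if torch is not None and torch.cuda.is_available():
    # With only device 0 visible, this should report a single device.
    print(torch.cuda.device_count())
```

Which physical card ends up as index 0 depends on how CUDA orders the devices, so it is worth printing the device name again after the restriction to confirm the right card is still visible.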

11170 · March 30, 2018, 11:40pm

Hi, I ran into the same problem, so I ran the code and got the following output:

GeForce 940MX
(5, 0).

Could you please give me some suggestions? Thanks!