Error: torch.cuda.is_available() is True but .cuda() is failed

bbo · January 21, 2022, 9:47pm

I used docker to build an environment to reproduce the experiment.
The environment is as follows.

torch==1.2.0
torchvision==0.4.0

The contents of the docker file are as follows.

FROM nvidia/cuda:9.2-cudnn7-devel-ubuntu18.04

RUN apt-get update 
RUN apt-get install -y python3 python3-pip

# install PyTorch == 1.2.0
RUN pip3 install torch==1.2.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html

# install Pillow to install torchvision
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install --upgrade Pillow

# install torchvision == 0.4.0
RUN pip3 install torchvision==0.4.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html

RUN apt-get install -y vim

WORKDIR /workspace

ENV LIBRARY_PATH /usr/local/cuda/lib64/stubs

I was able to build the environment, but it froze when I ran the code. I tried to figure out what was causing it, and realized that .cuda() might be doing something wrong.
So, I checked torch.cuda.is_available() and found that it is True, but .cuda() is not available. I am beginnner for pytorch, so I don’t know the cause of this.

Here is the actual bug trace I did.
This is running in a docker container.

$ cat /usr/local/cuda/version.txt 
CUDA Version 9.2.148

$ python3   
Python 3.6.9 (default, Dec  8 2021, 21:08:43) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.__version__
'1.2.0+cu92'
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name()
'A100-PCIE-40GB'
>>> torch.cuda.current_device()
0
>>> torch.version.cuda
'9.2.148'

>>> import torchvision
>>> torchvision.__version__
'0.4.0+cu92'

>>> T = torch.tensor([[1,2],[3,4]])
>>> T = T.cuda()

Freeze when “”“T = T.cuda()”"" is executed.

Caipi · January 21, 2022, 10:15pm

Maybe try different cuda versions? Cuda version 9.x might be available but not suitable for your GPU? And also why are your torch distributions so low? The newest stable torch is 1.10 i guess.

bbo · January 21, 2022, 10:25pm

Thank you for your reply.
I’m trying it now with version 10.0, but I’m not sure if it will work yet.
The reason for the torch version 1.2.0 is that I want to match the environment of the paper I want to reproduce.