Error: torch.cuda.is_available() is True, but .cuda() freezes

I used Docker to build an environment to reproduce an experiment.

The contents of the Dockerfile are as follows.

FROM nvidia/cuda:9.2-cudnn7-devel-ubuntu18.04

RUN apt-get update 
RUN apt-get install -y python3 python3-pip

# install PyTorch == 1.2.0
RUN pip3 install torch==1.2.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html

# install Pillow to install torchvision
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install --upgrade Pillow

# install torchvision == 0.4.0
RUN pip3 install torchvision==0.4.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html

RUN apt-get install -y vim

WORKDIR /workspace

ENV LIBRARY_PATH /usr/local/cuda/lib64/stubs

I was able to build the image, but the program froze when I ran my code. While trying to figure out what was causing it, I realized that .cuda() might be the problem.
So I checked torch.cuda.is_available() and found that it returns True, but .cuda() never completes. I am a beginner with PyTorch, so I don't know the cause of this.
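One way to narrow this down without freezing the shell is to run the minimal repro in a subprocess under a timeout; a sketch (the 60-second limit is an arbitrary choice):

```python
import subprocess
import sys

# Run the minimal repro in a child process so a hung .cuda() call
# cannot freeze the interactive shell; 60 s is an arbitrary limit.
cmd = [sys.executable, "-c",
       "import torch; torch.tensor([1.0]).cuda(); print('cuda ok')"]
try:
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, timeout=60)
    print(result.stdout.decode() or result.stderr.decode())
except subprocess.TimeoutExpired:
    print("the first CUDA call did not finish within 60 s")
```

If this times out, the hang is in the first CUDA call itself rather than somewhere in the training code.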

Here is the actual debugging session, run inside the docker container.

$ cat /usr/local/cuda/version.txt 
CUDA Version 9.2.148

$ python3   
Python 3.6.9 (default, Dec  8 2021, 21:08:43) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
>>> torch.__version__
>>> torch.cuda.device_count()
>>> torch.cuda.get_device_name()
>>> torch.cuda.current_device()
>>> torch.version.cuda

>>> import torchvision
>>> torchvision.__version__

>>> T = torch.tensor([[1,2],[3,4]])
>>> T = T.cuda()

It freezes when "T = T.cuda()" is executed.

Maybe try a different CUDA version? CUDA 9.x may install fine but not be suitable for your GPU. Also, why are your torch versions so old? The newest stable torch is 1.10, I think.
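For context: a CUDA 9.2 build only ships native kernels up to sm_70 (Volta). On a newer GPU such as a Turing card (sm_75, e.g. RTX 2080 or T4, which need CUDA 10.0), the driver has to JIT-compile PTX on the first CUDA call, which can take many minutes and look exactly like a freeze. A quick check, sketched below; the version table is a rough summary, not an exhaustive list:

```python
# Minimum CUDA toolkit release with native support for some common
# NVIDIA compute capabilities (a rough summary, not exhaustive).
MIN_CUDA_FOR_SM = {
    (6, 0): "8.0",   # Pascal (Tesla P100)
    (6, 1): "8.0",   # Pascal (GeForce GTX 10xx)
    (7, 0): "9.0",   # Volta (Tesla V100)
    (7, 5): "10.0",  # Turing (RTX 20xx, Tesla T4)
}

def min_cuda_version(capability):
    """First CUDA release with native kernels for this capability."""
    return MIN_CUDA_FOR_SM.get(tuple(capability), "unknown")

try:
    import torch
except ImportError:  # torch not installed; the table is still usable by hand
    torch = None

if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)
    print("GPU is sm_%d%d, needs CUDA >= %s; this build uses CUDA %s"
          % (cap[0], cap[1], min_cuda_version(cap), torch.version.cuda))
```

If the GPU's capability needs a newer CUDA than torch.version.cuda reports, that would explain the apparent freeze.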

Thank you for your reply.
I'm trying it now with CUDA 10.0, but I'm not sure yet whether it will work.
The reason for torch version 1.2.0 is that I want to match the environment of the paper I'm trying to reproduce.