I’m trying to build PyTorch from source following the official documentation. I’m on a university cluster and therefore use conda to keep control over my environment. I installed magma-cuda101 and cudatoolkit=10.1. The whole install-command within a so far empty environment is
But if I run python setup.py install, the following happens:
-- Found CUDA: /usr/local/cuda (found version "8.0")
-- Caffe2: CUDA detected: 8.0
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
CMake Error at cmake/public/cuda.cmake:42 (message):
PyTorch requires CUDA 9.0 and above.
So it does not find the proper CUDA version. nvidia-smi tells me CUDA 10.1 is available. However, I cannot find a corresponding folder in /usr/. Right now I’m wondering how to find the origin of the CUDA 10.1 reported by nvidia-smi and how to build PyTorch against it.
Best Regards
Scipio
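One way to see which toolkits actually exist on the machine is to look for directories that contain bin/nvcc and ask each nvcc for its release. The following is only a sketch: the search patterns /usr/local/cuda* and /opt/cuda* are assumptions about common install locations (a cluster may keep toolkits elsewhere, for example behind a module system), and the helper names are made up for illustration.

```python
import glob
import os
import re
import subprocess


def parse_nvcc_release(text):
    """Return the 'X.Y' release string from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def find_toolkits(patterns=("/usr/local/cuda*", "/opt/cuda*")):
    """Map every directory that contains an executable bin/nvcc to its release.

    The default patterns are guesses at common locations, not a complete list.
    """
    found = {}
    for pattern in patterns:
        for root in sorted(glob.glob(pattern)):
            nvcc = os.path.join(root, "bin", "nvcc")
            if not os.access(nvcc, os.X_OK):
                continue
            try:
                result = subprocess.run(
                    [nvcc, "--version"],
                    capture_output=True, text=True, timeout=30,
                )
                output = result.stdout
            except (OSError, subprocess.SubprocessError):
                output = ""
            # realpath collapses symlinks such as /usr/local/cuda -> cuda-8.0
            found[os.path.realpath(root)] = parse_nvcc_release(output)
    return found


if __name__ == "__main__":
    for root, release in find_toolkits().items():
        print(root, release)
```

If nothing newer than 8.0 shows up, a 10.1 toolkit is probably not installed in those locations at all.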
The CMAKE_PREFIX_PATH is set properly. When I try to run nvcc from the installation described above, no nvcc is found. If I add nvcc_linux-64 to the environment, the last output of the installation is:
Cannot determine CUDA_HOME: cuda-gdb not in PATH
Subsequently, which nvcc yields the one in /conda_env/bin/ but running it returns
/conda_env/bin/nvcc: line 2: /bin/nvcc: No such file or directory
This is no surprise as the nvcc in the env itself is nothing but a shell script pointing to the system-wide nvcc. I could set CUDA_HOME manually if I knew where to look for the proper version.
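To check quickly whether the nvcc on PATH is a real compiler or just a wrapper, one can look at the first bytes of the file: a compiled Linux binary starts with the ELF magic number, a shell wrapper with "#!". A minimal sketch, with hypothetical helper names:

```python
import shutil


def classify_executable(path):
    """Return 'elf', 'script', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head == b"\x7fELF":
        return "elf"      # compiled binary, which is what a real nvcc is on Linux
    if head[:2] == b"#!":
        return "script"   # wrapper that forwards to some other program
    return "unknown"


def describe_nvcc():
    """Report which nvcc comes first on PATH and what kind of file it is."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return "no nvcc on PATH"
    return f"{nvcc}: {classify_executable(nvcc)}"


print(describe_nvcc())
```

A result of "script" for the nvcc inside the conda environment would match the behaviour described above.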
It looks like the CUDA in your env is not properly installed. It should contain a full install that is independent of the system-wide one, especially if your system-wide CUDA is not the same version as the one in conda.
So reinstalling brought no change. A short search suggests that the CUDA samples are neither shipped with the cuda toolkit nor available as a package in conda. I’m wondering: is the CUDA version reported by nvidia-smi just the highest one supported by the driver itself, or does it reside somewhere on the system? Or should CUDA_HOME somehow point to my environment?
According to https://stackoverflow.com/questions/53422407/different-cuda-versions-shown-by-nvcc-and-nvidia-smi nvidia-smi reports the highest CUDA version that can be used with the installed driver. So the output of nvidia-smi actually has little to do with my problem. However, I’m going to go on working on this tomorrow. I hope I can provide an answer for anyone stumbling upon this thread within the next few days.
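The distinction can be made explicit in a few lines: the "CUDA Version" field in the nvidia-smi header is an upper bound imposed by the driver, while the toolkit release comes from whichever nvcc is used for the build. The sketch below only parses text and compares version numbers. The function names are hypothetical, and the simple "toolkit must not be newer than the driver's version" rule is an assumption that ignores NVIDIA's compatibility packages and minor-version compatibility.

```python
import re


def parse_driver_cuda(smi_text):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_can_run(driver_cuda, toolkit_release):
    """True if a toolkit release such as '10.1' is not newer than the driver's limit."""
    major, minor = (int(part) for part in toolkit_release.split("."))
    return (major, minor) <= driver_cuda


# Example header line, shaped like the one nvidia-smi prints (values illustrative).
header = "| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |"
limit = parse_driver_cuda(header)
print(limit, driver_can_run(limit, "8.0"), driver_can_run(limit, "10.1"))
```

So a driver that reports 10.1 is compatible with both the system-wide 8.0 toolkit and a 10.1 toolkit; it says nothing about which of them is installed.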
So now I’m trying to figure out how to get this file. It looks like there’s still something wrong/missing with my cuda-installation. Is magma-cuda101 the relevant package or what am I looking for?
I found /usr/lib/x86_64-linux-gnu/libcuda.so and created a symlink in /path_to_conda/miniconda3/envs/pytorch_build/lib64/stubs/ but, as was to be expected, this just led to another message:
grep: /path_to_conda/miniconda3/envs/pytorch_build/version.txt: No such file or directory
And if I try to build pytorch I’m back at the original error, although which nvcc now yields the one within the conda environment.
You should not need to do that… There is definitely something not right here.
Have you tried setting CUDA_HOME to the CUDA version in conda, and adjusting PATH to make sure that the nvcc found is the one from conda? (The real one that was installed, not a symlink to the system one, which is CUDA 8.0.)
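In practice that suggestion amounts to starting the build with a modified environment. A minimal sketch, assuming the toolkit root is known; build_env is a hypothetical helper, and the path in the usage comment is a placeholder rather than a real location:

```python
import os


def build_env(cuda_home, base_env=None):
    """Return a copy of the environment with CUDA_HOME set and its bin/ first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_HOME"] = cuda_home
    bin_dir = os.path.join(cuda_home, "bin")
    old_path = env.get("PATH", "")
    # Prepending ensures this toolkit's nvcc wins over any system-wide one.
    env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    return env


# Possible usage from the PyTorch source directory (placeholder path):
#   import subprocess, sys
#   subprocess.run([sys.executable, "setup.py", "install"],
#                  env=build_env("/path/to/cuda-10.1"), check=True)
```

A previous configure run may have cached the old toolkit location, so clearing the build directory before retrying seems prudent.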
Hello again,
Sorry for taking so long, but construction workers damaged the cluster's power supply and I couldn’t access the system for the past nine days. Now I am trying again and still encounter the problem from the first post. I am wondering what the proper value for CUDA_HOME would be. I tried /miniconda3/envs/pytorch_build/pkgs/cuda-toolkit/include/thrust/system/cuda/ and /miniconda3/envs/pytorch_build/bin/ but neither did the trick.
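By convention CUDA_HOME names the toolkit root, i.e. the directory that has bin/nvcc and include/ directly beneath it, not a bin/ directory or a path deep inside include/. A candidate can be checked with a small sketch like the following. The helper names are hypothetical, and testing for include/cuda_runtime.h is just one plausible marker of a complete toolkit.

```python
import os


def looks_like_cuda_root(path):
    """True if path has both bin/nvcc and include/cuda_runtime.h beneath it."""
    nvcc = os.path.join(path, "bin", "nvcc")
    header = os.path.join(path, "include", "cuda_runtime.h")
    return os.path.isfile(nvcc) and os.path.isfile(header)


def root_from_nvcc(nvcc_path):
    """Derive the would-be toolkit root: two levels above the resolved nvcc."""
    return os.path.dirname(os.path.dirname(os.path.realpath(nvcc_path)))


# Checking the two values tried above (paths copied from the post):
for candidate in (
    "/miniconda3/envs/pytorch_build/pkgs/cuda-toolkit/include/thrust/system/cuda/",
    "/miniconda3/envs/pytorch_build/bin/",
):
    print(candidate, looks_like_cuda_root(candidate))
```

Note that root_from_nvcc only yields a usable value when nvcc is the real binary; for a wrapper script it returns the environment root, which is why the result should still be passed through looks_like_cuda_root.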