I’d like to submit a script to Slurm with sbatch, but it always fails with the error `AssertionError: Torch not compiled with CUDA enabled`.
And `torch.cuda.is_available()` always returns `False`.
The Slurm cluster supports CUDA 11.8, and I also installed the related PyTorch version, 11.3.1.
Other details that may be relevant: Python 3.7.16, pytorch-cuda 11.8, cudatoolkit 11.8.0.
I’d be extremely grateful if anyone could help me solve this problem!
You are using a CPU-only PyTorch binary, so the error is expected. Your locally installed CUDA toolkit won’t be used; you need to install a PyTorch binary built with CUDA support.
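A quick way to tell which flavor of binary conda installed is the build string (visible via `conda list pytorch`): CUDA builds encode the compute platform (e.g. `cuda11.8`), while CPU-only builds show `cpu` instead. A minimal sketch, with a hypothetical helper name:

```python
def is_cuda_build(build_string: str) -> bool:
    """Return True if a conda PyTorch build string indicates a CUDA build.

    Conda encodes the compute platform in the build string, e.g.
    "py3.12_cuda11.8_cudnn9.1.0_0" (CUDA) vs "py3.12_cpu_0" (CPU-only).
    """
    return "cuda" in build_string

print(is_cuda_build("py3.12_cuda11.8_cudnn9.1.0_0"))  # True
print(is_cuda_build("py3.12_cpu_0"))                  # False
```

If the build string shows `cpu`, reinstalling with the CUDA-enabled install command is the fix.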
Thanks for your reply.
But I installed PyTorch this way: `conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia`.
Doesn’t this version of PyTorch support CUDA?
Yes, this command is correct and installs the right binary for me:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
...
The following packages will be downloaded:
package | build
---------------------------|-----------------
blas-1.0 | mkl 6 KB
cuda-cudart-11.8.89 | 0 197 KB nvidia
cuda-cupti-11.8.87 | 0 25.3 MB nvidia
cuda-libraries-11.8.0 | 0 1 KB nvidia
cuda-nvrtc-11.8.89 | 0 19.1 MB nvidia
cuda-nvtx-11.8.86 | 0 57 KB nvidia
cuda-runtime-11.8.0 | 0 1 KB nvidia
cuda-version-12.6 | 3 16 KB nvidia
...
pytorch-2.5.1 |py3.12_cuda11.8_cudnn9.1.0_0 1.55 GB pytorch
pytorch-cuda-11.8 | h7e8668a_6 7 KB pytorch
pytorch-mutex-1.0 | cuda 3 KB pytorch
pyyaml-6.0.2 | py312h5eee18b_0 217 KB
requests-2.32.3 | py312h06a4308_1 123 KB
sympy-1.13.2 | py312h06a4308_0 15.0 MB
tbb-2021.8.0 | hdb19cb5_0 1.6 MB
torchaudio-2.5.1 | py312_cu118 6.4 MB pytorch
torchtriton-3.1.0 | py312 233.6 MB pytorch
torchvision-0.20.1 | py312_cu118 8.5 MB pytorch
...
python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.is_available()); print(torch.randn(1).cuda())"
2.5.1
11.8
True
tensor([-1.7348], device='cuda:0')
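One more thing worth checking when submitting through sbatch: `torch.cuda.is_available()` also returns `False` if the job never requested a GPU, even with a correct CUDA build installed. A minimal sbatch sketch that requests one (job name and time limit are placeholders, adjust for your cluster):

```shell
#!/bin/bash
#SBATCH --job-name=cuda-check
#SBATCH --gres=gpu:1           # without a GPU request, no device is visible to the job
#SBATCH --time=00:05:00

# Verify what the job actually sees at runtime
nvidia-smi
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```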
The problem has been solved.
Thank you for your help!