Can I compile a cuda compatible version of pytorch on a machine with no GPUs available?
I have access to a computer with a high CPU count but no GPUs, and I want to leverage it to compile PyTorch faster. I am compiling inside an NVIDIA Docker container, which I then use on my machine that does have GPUs. nvidia-smi reports the GPUs properly, and nvcc seems to be working.
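For reference, this kind of GPU-less build typically needs the target GPU architectures spelled out up front, since the build machine has no device to query. A minimal sketch of the build environment, assuming a source checkout of the pytorch repo; the "8.6" architecture value is an example and must be matched to the GPUs on the target machine:

```shell
# Sketch: building a CUDA-enabled PyTorch wheel on a machine with no GPU.
# TORCH_CUDA_ARCH_LIST must be set explicitly, because without a GPU the
# build cannot auto-detect compute capabilities. "8.6" is an example value;
# substitute the compute capability of the GPUs on the target machine.
export USE_CUDA=1
export TORCH_CUDA_ARCH_LIST="8.6"
export MAX_JOBS=$(nproc)      # exploit the high CPU count for parallel compilation

# Run from the root of a pytorch source checkout:
python setup.py bdist_wheel
```

The resulting wheel can then be installed inside the container image that runs on the GPU machine.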
But when I run torch.cuda.is_available() in python I get:
/opt/conda/envs/computer_vision/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /opt/pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
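One way to narrow this kind of error down is to separate "the install is a CPU-only build" from "the build has CUDA but the runtime can't initialize it". A small diagnostic sketch (it degrades gracefully if torch isn't importable at all):

```python
import importlib.util

def cuda_build_report():
    """Return a dict describing the installed torch's CUDA support, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch
    return {
        "torch_installed": True,
        # None here means a CPU-only build was installed (e.g. a conda downgrade).
        "built_with_cuda": torch.version.cuda,
        # False here despite a CUDA build points at a driver/runtime problem instead.
        "runtime_sees_gpu": torch.cuda.is_available(),
    }

print(cuda_build_report())
```

If built_with_cuda is None, the problem is the package; if it is set but runtime_sees_gpu is False, the problem is the driver or container setup.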
It should be possible to use nvcc without a GPU.
The error message might be raised if your NVIDIA driver or local CUDA toolkit isn't properly installed or found. Did you launch the Docker container via nvidia-docker or with the --gpus=all option?
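For completeness, the two launch styles being asked about look like this; the image tag is the one mentioned in this thread, and nvidia-smi stands in for whatever command you actually run:

```shell
# Modern Docker (19.03+) with the NVIDIA Container Toolkit installed:
docker run --gpus all -it nvidia/cuda:11.2.1-cudnn8-devel nvidia-smi

# Older style, via the nvidia-docker wrapper:
nvidia-docker run -it nvidia/cuda:11.2.1-cudnn8-devel nvidia-smi
```

If nvidia-smi works inside the container launched this way but torch.cuda.is_available() still fails, the container setup is likely fine and the PyTorch build itself is the next suspect.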
I am using driver 460.32.03 with the NVIDIA container cuda:11.2.1-cudnn8-devel, and I installed magma-cuda112. Could it be that magma-cuda112 and cuda:11.2.1-cudnn8-devel don't mix?
How did you install magma-cuda112? If you installed it as a conda package, did the install logs show that PyTorch would be downgraded to a CPU-only version?
If not, I don't think magma-cuda112 should affect PyTorch's ability to find a GPU.
Sorry, I missed the reply. It seems to vary whether conda wants to downgrade or not; I never dug deep enough to work out why. But I did get it working without the downgrade.