I am running RL training sessions using PyTorch, and for the past month I have been able to run all of them on my GPU without any issues. For some unknown reason my computer rebooted today, and ever since then I have not been able to run my training code on the GPU. I keep getting the error “RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable”. I found a similar post on this forum that suggested rebooting the computer, but that hasn’t worked either. I even purged and reinstalled all my NVIDIA drivers, and they are recognized from the terminal. I have attached a few software-related specifics below. Any advice on how to resolve this issue would be greatly appreciated!
$ nvcc --version
Cuda compilation tools, release 9.1, V9.1.85
$ conda list
cudatoolkit 10.1.243
pytorch 1.8.1
torch 1.7.1 pypi_0
$ nvidia-smi
NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2
>>> torch.cuda.is_available()
False
>>> torch.cuda.device_count()
0
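Since conda list shows both a conda pytorch 1.8.1 and a pip torch 1.7.1, it may also help to check which build Python actually imports and which CUDA version it was built against. Below is a minimal sketch for collecting that in one place. The helper name collect_cuda_report is hypothetical, and it only uses standard attributes (torch.__version__, torch.__file__, torch.version.cuda) plus the CUDA_VISIBLE_DEVICES environment variable. It is a diagnostic aid, not a fix.

```python
import importlib.util
import os


def collect_cuda_report():
    """Gather basic facts about the torch build that Python imports.

    Hypothetical helper for illustration; it only reads values and
    does not change any configuration.
    """
    report = {
        # If this is set to an empty or invalid value, torch sees no GPUs.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }

    # Avoid an ImportError if torch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        report["torch_installed"] = False
        return report

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # The file path shows whether the conda or the pip install is in use.
    report["torch_file"] = torch.__file__
    # CUDA version the build was compiled against (None for CPU-only builds).
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in collect_cuda_report().items():
        print(f"{key}: {value}")
```

If torch_version does not match the package I expect, or built_with_cuda is None, that would point to the environment rather than the driver.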