I am running RL training sessions using PyTorch, and for the past month I have been able to run all of them on my GPU without any issues. For some unknown reason my computer rebooted today, and ever since then I have not been able to run my training code on the GPU. I keep getting the error “RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable”. I found a similar post on this forum that suggested rebooting the computer, but that hasn’t worked either. I also purged and reinstalled all my NVIDIA drivers, and the driver is recognized from the terminal. I have attached a few software-related specifics below. Any advice on how to resolve this issue would be greatly appreciated!
$ nvcc --version
Cuda compilation tools, release 9.1, V9.1.85
$ conda list
torch 1.7.1 pypi_0
$ nvidia-smi
NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2
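
In case it helps with diagnosis, here is a minimal sketch of a script that reports what PyTorch itself sees, as opposed to what `nvcc` and `nvidia-smi` report. The helper name `cuda_report` is made up for illustration and is not part of PyTorch; the calls it uses (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`, `torch.zeros(..., device="cuda")`) are standard. Note that `torch.version.cuda` is the CUDA runtime the installed wheel was built against, which can differ from the system toolkit version printed by `nvcc`.

```python
import importlib.util


def cuda_report():
    """Collect what PyTorch sees about CUDA. Hypothetical helper, not a PyTorch API.

    Every step is guarded so the function returns a dict even on a broken setup.
    """
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    try:
        import torch
    except Exception as exc:  # a broken install can still fail at import time
        report["import_error"] = f"{type(exc).__name__}: {exc}"
        return report

    report["torch_version"] = torch.__version__
    # CUDA runtime version the wheel was built with; None for CPU-only builds.
    report["built_with_cuda"] = torch.version.cuda

    try:
        report["cuda_available"] = torch.cuda.is_available()
        report["device_count"] = torch.cuda.device_count()
    except Exception as exc:
        report["query_error"] = f"{type(exc).__name__}: {exc}"

    # Creating a tensor on the GPU forces CUDA context creation, which is
    # the point where a "devices are busy or unavailable" error would surface.
    try:
        torch.zeros(1, device="cuda")
        report["allocation"] = "ok"
    except Exception as exc:
        report["allocation"] = f"{type(exc).__name__}: {exc}"

    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is true and `device_count` is at least 1 but `allocation` still fails, the device is visible to PyTorch and the failure happens when the CUDA context is created, which narrows down where to look.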