Steps to reproduce:

import torch

# Create a test tensor
x = torch.tensor([[1.0, 2.0]])
# Ensure the tensor is on the CPU
x = x.cpu()
# The lines above run fine, so there is no issue with syntax or shapes
# Move the tensor to the CUDA device
x = x.cuda()
# The line above raises:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling

GPU: RTX 3090
Anyone have any idea what’s going on? Most discussions of this error attribute it to shape mismatches or CUDA out-of-memory errors, but neither applies to this example. Happy to provide more detail about my environment if needed.
Which CUDA runtime is your PyTorch installation using? If it’s 10.2, please update to binaries built with the CUDA 11 runtime, as that is needed for your Ampere GPU.
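To make the incompatibility concrete: an RTX 3090 is an Ampere GPU with compute capability sm_86, which the CUDA 10.x toolchain cannot target, so cuBLAS never initializes. Here is a small illustrative sketch of that version logic; the capability table is a hand-written subset for illustration, not a PyTorch or CUDA API (in a real session you would get the capability from `torch.cuda.get_device_capability()` and the runtime from `torch.version.cuda`):

```python
# Minimal sketch: map a GPU's compute capability to the minimum CUDA
# runtime that can target it. Illustrative subset, hand-written here.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30xx, incl. the 3090)
}

def runtime_supports(capability, runtime):
    """Return True if the given CUDA runtime can target the capability."""
    minimum = MIN_CUDA_FOR_CAPABILITY.get(capability)
    return minimum is not None and runtime >= minimum

# An RTX 3090 (sm_86) under the CUDA 10.2 runtime:
print(runtime_supports((8, 6), (10, 2)))  # False -> kernels can't launch
print(runtime_supports((8, 6), (11, 6)))  # True
```

Tuple comparison does the right thing here: (10, 2) sorts below (11, 1), so the 10.2 runtime is rejected for sm_86.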
It turns out that I had installed torch built with the 10.2 runtime. I uninstalled it and installed the nightly binaries for CUDA 11.6 using this command:
pip install torch --pre --extra-index-url https://download.pytorch.org/whl/nightly/cu116
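After reinstalling, it’s worth confirming that the new binaries actually carry the CUDA 11 runtime: `torch.version.cuda` reports it as a string such as "11.6". A minimal sketch of that check, with the string hard-coded so it runs standalone (in practice you would substitute `import torch; reported = torch.version.cuda`):

```python
# torch.version.cuda reports the runtime the wheels were built with,
# e.g. "11.6" for the cu116 nightlies. Hard-coded here for illustration.
reported = "11.6"

major, minor = (int(part) for part in reported.split("."))
assert major >= 11, f"CUDA {reported} is too old for Ampere GPUs"
print(f"CUDA runtime {major}.{minor} is new enough for an RTX 3090")
```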
But now the code above is giving me a different error:
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
CUDA_VISIBLE_DEVICES is set to
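The hint in that error message is that CUDA_VISIBLE_DEVICES is read once, when the process first initializes CUDA; changing it afterwards has no effect. A minimal stdlib sketch of inspecting the variable before any CUDA call (the `visible_devices` helper is a hypothetical illustration, not a PyTorch API):

```python
import os

# CUDA device discovery reads CUDA_VISIBLE_DEVICES at initialization,
# so the variable must be set before the first CUDA call in the process.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

def visible_devices():
    """Parse CUDA_VISIBLE_DEVICES into a list of device indices."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(tok) for tok in raw.split(",") if tok.strip().isdigit()]

print(visible_devices())
```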
Somehow I fixed the issue by playing around with a bunch of CUDA settings and restarting my machine. I’ll close this issue now.