`torch.mm()` raises RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasSetStream(handle, stream)`

Steps to reproduce:

```python
import torch

# Create a small test tensor
x = torch.tensor([[1.0, 2.0]])

# Ensure the tensor is on the CPU
x = x.cpu()

# This works fine, so there is no issue with syntax or shapes
torch.mm(x, x.T)

# Move the tensor to the CUDA device (no errors here)
x = x.cuda()

# This raises the error below
torch.mm(x, x.T)
```

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasSetStream(handle, stream)

Some specs:

  • ubuntu 20.04
  • torch==1.10.1
  • RTX 3090

Does anyone have any idea what's going on? Most discussions of this error attribute it to shape mismatches or CUDA memory errors, but neither applies to this example. Happy to provide more detail about my environment if necessary.

Which CUDA runtime is your PyTorch installation using? If it's 10.2, please install binaries built with the CUDA 11 runtime, as these are needed for your Ampere GPU.
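One way to answer that question is sketched below, assuming PyTorch may or may not be importable in the current environment. It reads `torch.version.cuda` (the CUDA runtime the binaries were built with; `None` on CPU-only builds) and compares it against CUDA 11.1, the first toolkit release with native support for compute capability 8.6 (sm_86), which is what an RTX 3090 reports. The helpers `parse_cuda_version` and `runtime_supports_sm86` are hypothetical names for illustration, not PyTorch APIs.

```python
import importlib.util


def parse_cuda_version(version):
    """Turn a CUDA version string such as '10.2' or '11.6' into (major, minor).

    Returns None for CPU-only builds, where torch.version.cuda is None.
    """
    if version is None:
        return None
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def runtime_supports_sm86(version):
    """Hypothetical check: True if the runtime can target sm_86 natively (CUDA >= 11.1)."""
    parsed = parse_cuda_version(version)
    return parsed is not None and parsed >= (11, 1)


def report():
    """Print the CUDA runtime PyTorch was built with and what the GPU reports."""
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed in this environment")
        return
    import torch

    runtime = torch.version.cuda
    print("torch", torch.__version__, "built with CUDA runtime", runtime)
    print("runtime can target sm_86 (RTX 30xx):", runtime_supports_sm86(runtime))
    if torch.cuda.is_available():
        # An RTX 3090 reports compute capability (8, 6)
        print("device capability:", torch.cuda.get_device_capability(0))
        print("architectures compiled into this build:", torch.cuda.get_arch_list())


if __name__ == "__main__":
    report()
```

A build reporting `10.2` here would explain the cuBLAS initialization failure on an Ampere card, since that runtime predates sm_86.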

It turns out that I had built torch with the 10.2 runtime.
I uninstalled it and installed the nightly binaries built for CUDA 11.6 using this command:

```
pip install torch --pre --extra-index-url https://download.pytorch.org/whl/nightly/cu116
```
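After reinstalling, the wheel's version string can confirm which CUDA runtime it targets. This is a minimal sketch assuming the wheel follows the `+cuXYZ` / `+cpu` local-version suffix convention used by the download.pytorch.org indexes; wheels from the default PyPI index may carry no suffix at all, in which case the helper returns `None` and `torch.version.cuda` is the better source. `cuda_tag_from_version` is a hypothetical helper, not part of PyTorch.

```python
import re

# Local version suffixes such as "+cu116" (CUDA 11.6) or "+cu102" (CUDA 10.2)
_CUDA_TAG = re.compile(r"\+cu(\d{2,})$")


def cuda_tag_from_version(torch_version):
    """Return (major, minor) from a '+cuXYZ' suffix, or None if there is none.

    The last digit of the tag is the minor version: 'cu116' -> (11, 6),
    'cu102' -> (10, 2). CPU-only wheels ('+cpu') and untagged versions give None.
    """
    match = _CUDA_TAG.search(torch_version)
    if match is None:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        tag = cuda_tag_from_version(torch.__version__)
        # Also show the runtime PyTorch reports, useful when the wheel is untagged
        print("wheel tag:", tag, "| torch.version.cuda:", torch.version.cuda)
```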

But now the reproduction code from my first post gives a different error:

RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
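The message names one possible cause: changing `CUDA_VISIBLE_DEVICES` after CUDA has already been initialized in the process. A minimal sketch of a conservative guard, assuming device selection is done from Python, sets the variable only if `torch` has not been imported yet. `set_visible_devices` is a hypothetical helper; exporting the variable in the shell before launching Python achieves the same thing. It only covers the ordering issue named in the message, not other environment problems.

```python
import os
import sys


def set_visible_devices(devices):
    """Restrict which GPUs CUDA can see, e.g. devices="0" or "0,1".

    Conservative guard: refuse once torch has been imported, so the variable
    is always set before PyTorch has had a chance to initialize CUDA.
    """
    if "torch" in sys.modules:
        raise RuntimeError(
            "torch is already imported; set CUDA_VISIBLE_DEVICES before importing it "
            "(or export it in the shell before starting Python)"
        )
    os.environ["CUDA_VISIBLE_DEVICES"] = devices


if __name__ == "__main__":
    set_visible_devices("0")  # must run before `import torch`
    print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

PyTorch initializes CUDA lazily, so setting the variable after `import torch` but before the first CUDA call often works as well; refusing after the import is simply the safer rule.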


I somehow fixed the issue by experimenting with my CUDA setup and restarting my machine. I'll close this issue now.