CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`

I'm hitting this too.

I am getting the same error as @shamoons on the same example. The suggested fix of setting CUDA_LAUNCH_BLOCKING=1 and CUDA_VISIBLE_DEVICES=0 made no difference. Running on CPU works fine.
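For reference, a minimal sketch of applying that workaround from inside a script rather than on the command line. The one thing to watch is ordering: both variables need to be in the environment before CUDA is initialized, so this sets them before importing torch. The `try`/`except` around the import is only there so the sketch also runs on a machine without PyTorch; it is not part of the workaround.

```python
import os

# Both variables must be set before CUDA is initialized in this process;
# changing them after the first CUDA call has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # make kernel launches synchronous
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU

try:
    import torch  # imported only after the variables are in place
except ImportError:
    torch = None  # lets the sketch run without PyTorch installed

if torch is not None and torch.cuda.is_available():
    # With CUDA_VISIBLE_DEVICES=0, a single device should be reported.
    print("visible GPUs:", torch.cuda.device_count())
```

Note that CUDA_LAUNCH_BLOCKING=1 is mainly a debugging aid: it makes the traceback point at the call that actually failed, but it is not expected to make the error go away by itself.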

This operation succeeds:

>>> import torch
>>> a = torch.tensor([1]).cuda()
>>> b = torch.tensor([1]).cuda()
>>> c = a + b
>>> print(c)
tensor([2], device='cuda:0')

The following raises the error from the original post:

>>> l = torch.nn.Linear(1, 1).cuda()
>>> a = torch.tensor([1.]).cuda()
>>> l(a)
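The difference between the two snippets is likely that an element-wise add runs without cuBLAS, while `torch.nn.Linear` goes through a matrix multiply, which is where the cuBLAS handle gets created. A rough sketch to confirm that split on a given machine is below. `run_checks` and the check names are made up for illustration, and the script assumes nothing beyond a working `import torch`.

```python
def run_checks():
    """Run a few small GPU ops and report which ones fail."""
    try:
        import torch
    except ImportError:
        return {"torch": "not installed"}

    results = {"torch": torch.__version__}
    if not torch.cuda.is_available():
        results["cuda"] = "unavailable"
        return results

    checks = {
        # Element-wise op: should not need cuBLAS.
        "elementwise_add": lambda: torch.ones(1, device="cuda") + torch.ones(1, device="cuda"),
        # Plain matrix multiply: dispatched to cuBLAS.
        "matmul": lambda: torch.ones(2, 2, device="cuda") @ torch.ones(2, 2, device="cuda"),
        # Same call as in the failing snippet above.
        "linear": lambda: torch.nn.Linear(1, 1).cuda()(torch.tensor([1.0]).cuda()),
    }
    for name, check in checks.items():
        try:
            check()
            torch.cuda.synchronize()  # surface asynchronous errors here
            results[name] = "ok"
        except RuntimeError as exc:
            # A CUDA error can leave the context in a bad state, so failures
            # reported after the first one may just be follow-on errors.
            results[name] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    for name, status in run_checks().items():
        print(f"{name}: {status}")
```

If `elementwise_add` passes while `matmul` and `linear` both fail, the problem is in cuBLAS initialization rather than anything specific to `Linear`.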

My specs:
PyTorch version: 1.8.0
CUDA version: 11.0
Driver version: 450.102.04
NVIDIA-SMI: 450.102.04
GPU: NVIDIA GeForce RTX 2080 SUPER
OS: Ubuntu 20.04.2
Kernel: 5.8.0-44
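
For others posting their setup here, PyTorch ships an environment collector that gathers most of the fields above in one go (also runnable as `python -m torch.utils.collect_env`). A small sketch of calling it from Python; `collect_specs` is just an illustrative wrapper name, and it returns `None` when PyTorch is not installed.

```python
def collect_specs():
    """Return PyTorch's environment report as a string, or None without PyTorch."""
    try:
        from torch.utils.collect_env import get_pretty_env_info
    except ImportError:
        return None
    return get_pretty_env_info()


report = collect_specs()
print(report if report is not None else "PyTorch is not installed")
```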