PyTorch - CUDA GPU issue

Hello everyone,
I have been using torch + CUDA for almost a year now and just upgraded my GPU from a 1050 Ti to a 3060 Ti.
I am having difficulty transferring tensors and models to the GPU with torch.device(0) or similar methods.

I have noticed that when I type

>>> import torch
>>> torch.cuda.is_available()
True

>>> torch.cuda.get_device_name(0)
u'NVIDIA GeForce RTX 3060 Ti'

There is a u prefix here (does this mean incompatible?). Also, the device count returns this:

>>> torch.cuda.device_count()
1L

Could you describe your issue in more detail, please? Do you see a runtime error, or what exactly is failing?

The problem is that it hangs for far too long, and then I have to terminate it with Ctrl+Z.

>>> device = torch.device('cuda')
>>> device
device(type='cuda')
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> if device.type == 'cuda':
...     print(torch.cuda.get_device_name(0))
...     print('Memory Usage:')
...     print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
...     print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')
... 
NVIDIA GeForce RTX 3060 Ti
Memory Usage:
('Allocated:', 0.0, 'GB')
('Cached:   ', 0.0, 'GB')
>>> z = torch.tensor([5])
>>> k = z.to(device)

Python gets stuck at the last step, while transferring the tensor to the CUDA device.

The nvidia-smi output looks like this:

Mon Mar  7 11:55:56 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   46C    P8    17W / 200W |    649MiB /  8192MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1015      G   /usr/lib/xorg/Xorg                182MiB |
|    0   N/A  N/A      1236      G   /usr/bin/gnome-shell               98MiB |
|    0   N/A  N/A      1649      G   /usr/lib/firefox/firefox          146MiB |
|    0   N/A  N/A      2008      C   python                            213MiB |
|    0   N/A  N/A      2956      G   /usr/lib/firefox/firefox            2MiB |
+-----------------------------------------------------------------------------+

Also, my setup is:
torch version: 1.4.0
CUDA version: 11.6
NVIDIA driver: 510
OS: Ubuntu 18.04

After 10 minutes or so, the tensor is finally loaded onto the device:

>>> k
tensor([5], device='cuda:0')
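For anyone hitting the same stall, a rough way to put a number on it is to time the first CUDA op against a later one (a sketch; the timed helper is my own, not from the thread):

```python
import time

def timed(fn):
    """Time a single call to fn; return (elapsed_seconds, result)."""
    t0 = time.perf_counter()
    out = fn()
    return time.perf_counter() - t0, out

if __name__ == "__main__":
    try:
        import torch
        if torch.cuda.is_available():
            z = torch.tensor([5])
            first, k = timed(lambda: z.to("cuda"))   # first CUDA op pays the init/compile cost
            second, _ = timed(lambda: z.to("cuda"))  # later transfers should be near-instant
            print("first transfer: %.1fs, second: %.4fs" % (first, second))
    except ImportError:
        pass  # torch not installed; the timed() helper still works on its own
```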

This PyTorch release wouldn’t be compatible with your Ampere device, so update to the latest release and select the CUDA 11.3 or 11.5 runtime. Most likely you are JIT compiling the kernels for your architecture, which takes some time.
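One way to verify the mismatch is to compare the card's compute capability against the architectures your PyTorch binary ships kernels for (a sketch; torch.cuda.get_arch_list() only exists in newer releases, and arch_supported is a hypothetical helper):

```python
def arch_supported(capability, arch_list):
    """True if a (major, minor) compute capability has a matching
    'sm_XY' entry in a list of architecture strings."""
    return "sm_%d%d" % capability in arch_list

if __name__ == "__main__":
    try:
        import torch
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)  # (8, 6) for a 3060 Ti
            archs = torch.cuda.get_arch_list()         # e.g. ['sm_37', ..., 'sm_86']
            print(cap, archs, arch_supported(cap, archs))
    except (ImportError, AttributeError):
        pass  # torch missing, or too old to expose get_arch_list()
```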

Hello,
Thanks for the feedback. Installing the second option, the CUDA 11.3 build, with conda did the trick, and I am seeing better performance now. I have also upgraded the device driver to 510. Just a note: the u at the start of the device name and the 1L from the device count are back to normal (it now prints just 1).
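For anyone else wondering about those: the u'...' prefix is how Python 2 prints unicode strings, and 1L is a Python 2 long literal, so the old environment was apparently running Python 2 while the new conda one runs Python 3. A trivial check (my own helper, not from the thread):

```python
import sys

def is_python2():
    """True under a Python 2 interpreter, where repr() shows
    u'...' for unicode strings and an L suffix on longs."""
    return sys.version_info[0] == 2

print("Python %d.%d" % sys.version_info[:2])
print("Python 2 interpreter:", is_python2())
```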

The new nvidia-smi output looks like this

Mon Mar  7 23:37:41 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   47C    P8    16W / 200W |    683MiB /  8192MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       986      G   /usr/lib/xorg/Xorg                 18MiB |
|    0   N/A  N/A      1084      G   /usr/bin/gnome-shell               70MiB |
|    0   N/A  N/A      1361      G   /usr/lib/xorg/Xorg                208MiB |
|    0   N/A  N/A      1507      G   /usr/bin/gnome-shell               71MiB |
|    0   N/A  N/A      2290      G   /usr/lib/firefox/firefox          301MiB |
|    0   N/A  N/A      2456      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A      2594      G   /usr/lib/firefox/firefox            2MiB |
|    0   N/A  N/A      2697      G   /usr/lib/firefox/firefox            2MiB |
+-----------------------------------------------------------------------------+

Tomorrow I will have a chance to run this build with my previously trained network; I will report back here soon.