Pytorch multiple GPU count error: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount()

The following code produces an error when using NVIDIA Docker on WSL.

import torch
torch.cuda.is_available() # False
torch.cuda.device_count() # Error

Setting the environment variable after the import does not help either:

import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"
torch.cuda.is_available() # False
torch.cuda.device_count() # Error

However, it can be resolved by using the code below.

import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   
torch.cuda.is_available() # True
torch.cuda.device_count() # 1

Here is the nvidia-smi output. Currently GPU-0 is running a task.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.03    Driver Version: 522.06       CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:01:00.0  On |                  Off |
| 46%   77C    P2   209W / 300W |  32485MiB / 49140MiB |     11%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000    On   | 00000000:21:00.0 Off |                  Off |
| 30%   32C    P8    13W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000    On   | 00000000:4B:00.0 Off |                  Off |
| 30%   28C    P8    10W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000    On   | 00000000:4C:00.0 Off |                  Off |
| 30%   34C    P8     9W / 300W |     36MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        34      C   /python3.7                      N/A      |
|    1   N/A  N/A        34      C   /python3.7                      N/A      |
|    2   N/A  N/A        34      C   /python3.7                      N/A      |
|    3   N/A  N/A        34      C   /python3.7                      N/A      |
+-----------------------------------------------------------------------------+

As a result, I have been able to use only one GPU at a time. Any help would be appreciated. Thanks.

Changing the CUDA_VISIBLE_DEVICES environment variable after importing PyTorch seems suspicious; see, e.g., the thread "os.environ["CUDA_VISIBLE_DEVICES"] not functioning" on the PyTorch Forums.

Could you check if setting it before running your script helps?
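For example, the variable can be set on the command line when launching the script, so it is already in the environment before the Python interpreter (and torch) starts. `train.py` below is a placeholder for your own script:

```shell
# Set CUDA_VISIBLE_DEVICES for this one command only:
#   CUDA_VISIBLE_DEVICES=0,1,2 python train.py
# Quick sanity check that the child process actually sees it:
CUDA_VISIBLE_DEVICES=0,1,2 python3 -c "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"
```

The check prints `0,1,2`, confirming the variable reaches the child process without being exported into your whole shell session.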


Thanks. I also checked torch.version.cuda, which was 10.2 (while the driver supports CUDA 11.8). I then reinstalled PyTorch and it worked.
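A quick way to check which CUDA version the installed PyTorch build was compiled against (a minimal sketch; this only requires that torch is installed, not that a GPU is present):

```python
import torch

# CUDA version this PyTorch build was compiled against; it is a
# string like "11.8", or None for a CPU-only build. The thread's
# mismatch (build 10.2 vs. driver 11.8) was fixed by reinstalling.
print(torch.version.cuda)
```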

The import of torch should come after os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2" is set.
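That ordering looks like the sketch below. On the OP's machine this should make all three listed GPUs visible; the exact counts depend on the hardware, so no outputs are asserted here:

```python
import os

# Set the variable *before* the first import of torch, because the
# CUDA runtime reads CUDA_VISIBLE_DEVICES when it is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"

import torch  # imported only after the variable is set

print(torch.cuda.is_available())
print(torch.cuda.device_count())
```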