CUDA error, param.data.cuda()/module.to('cuda') error

Please help me resolve the error below. My PyTorch version is '2.3.0+cu118' and the CUDA version on the DGX server is 11.5 (from nvcc --version).
I am getting the following error:

Traceback (most recent call last):
  File "/home/aryan/miniconda3/envs/facekd_new/lib/python3.12/site-packages/torch/cuda/__init__.py", line 306, in _lazy_init
    queued_call()
  File "/home/aryan/miniconda3/envs/facekd_new/lib/python3.12/site-packages/torch/cuda/__init__.py", line 174, in _check_capability
    capability = get_device_capability(d)
                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aryan/miniconda3/envs/facekd_new/lib/python3.12/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
    prop = get_device_properties(device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aryan/miniconda3/envs/facekd_new/lib/python3.12/site-packages/torch/cuda/__init__.py", line 448, in get_device_properties
    return _get_device_properties(device)  # type: ignore[name-defined]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=, num_gpus=

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aryan/FSCIL/FaceKD/train_test.py", line 397, in <module>
    main(config)
  File "/home/aryan/FSCIL/FaceKD/train_test.py", line 66, in main
    base = BasePatchKD(config, loaders)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aryan/FSCIL/FaceKD/pkd/core/base_patch_kd.py", line 54, in __init__
    self._init_model()
  File "/home/aryan/FSCIL/FaceKD/pkd/core/base_patch_kd.py", line 74, in _init_model
    param.data = param.data.cuda()
                 ^^^^^^^^^^^^^^^^^
  File "/home/aryan/miniconda3/envs/facekd_new/lib/python3.12/site-packages/torch/cuda/__init__.py", line 312, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=, num_gpus=

CUDA call was originally invoked at:

  File "/home/aryan/FSCIL/FaceKD/train_test.py", line 5, in <module>
    from pkd.utils import set_random_seed, time_now
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1310, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/aryan/FSCIL/FaceKD/pkd/__init__.py", line 3, in <module>
    from pkd import core, data_loader, models, evaluation, utils, visualization, losses, operation
  File "<frozen importlib._bootstrap>", line 1415, in _handle_fromlist
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/aryan/FSCIL/FaceKD/pkd/core/__init__.py", line 3, in <module>
    from .lr_schedulers import WarmupMultiStepLR
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/aryan/FSCIL/FaceKD/pkd/core/lr_schedulers.py", line 1, in <module>
    import torch
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/aryan/miniconda3/envs/facekd_new/lib/python3.12/site-packages/torch/__init__.py", line 1478, in <module>
    _C._initExtension(manager_path())
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/aryan/miniconda3/envs/facekd_new/lib/python3.12/site-packages/torch/cuda/__init__.py", line 238, in <module>
    _lazy_call(_check_capability)
  File "/home/aryan/miniconda3/envs/facekd_new/lib/python3.12/site-packages/torch/cuda/__init__.py", line 235, in _lazy_call
    _queued_calls.append((callable, traceback.format_stack()))

import torch and torch.cuda.is_available() are working fine
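
Concretely, checks along these lines succeed (the device_count() line is an extra sanity check, not something reported above):

import torch

print(torch.__version__)          # '2.3.0+cu118'
print(torch.cuda.is_available())  # True
print(torch.cuda.device_count())  # number of GPUs visible to PyTorch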

Based on the error message it seems your setup has issues using your GPU, and I would assume even calling torch.randn(1).cuda() will fail. If so, make sure your setup is able to use the GPU by running, e.g., any CUDA sample.

@ptrblck thanks for the reply. torch.randn(1).cuda() didn't fail; it gave the output tensor([0.0166], device='cuda:0'). Could you guide me further?

Is the code still failing after the smoke test of creating a tensor on the device? If so, do you still see the device in nvidia-smi?

I am able to create a random tensor on CUDA, but how do I check whether it's visible on that device in nvidia-smi?
Below is the smoke-test script:

import torch

def smoke_test():
    if torch.cuda.is_available():
        device = torch.device('cuda')
        try:
            # Attempt to create a tensor on the GPU
            tensor = torch.tensor([1, 2, 3], device=device)
            print(f"Tensor on device {device}: {tensor}")
            return True
        except Exception as e:
            print(f"Failed to create tensor on device {device}: {e}")
            return False
    else:
        print("CUDA is not available.")
        return False

success = smoke_test()
print(f"Smoke test successful: {success}")

It gave the following output:

Tensor on device cuda: tensor([1, 2, 3], device='cuda:0')
Smoke test successful: True
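
To actually see that allocation from nvidia-smi, the allocating process has to stay alive while nvidia-smi runs in another terminal; a short-lived script will already have exited and released its memory by the time you look. A minimal sketch (the tensor size and sleep duration are arbitrary choices, not from the thread):

import os
import time
import torch

device = torch.device('cuda')
# allocate ~400 MiB so the usage stands out in nvidia-smi
x = torch.empty(100 * 1024 * 1024, dtype=torch.float32, device=device)
print(f"PID {os.getpid()}: allocated "
      f"{torch.cuda.memory_allocated(device) / 1024**2:.0f} MiB on {device}")
# keep the process alive; run `nvidia-smi` in another terminal and look
# for this PID in the Processes table
time.sleep(60)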

Running nvidia-smi gives the following output:

Thu Jun 20 17:07:57 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:07:00.0 Off |                    0 |
| N/A   29C    P0              68W / 400W |  66332MiB / 81920MiB |     33%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  | 00000000:0F:00.0 Off |                    0 |
| N/A   31C    P0              85W / 400W |  36568MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          On  | 00000000:47:00.0 Off |                    0 |
| N/A   26C    P0              66W / 400W |  16564MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM4-80GB          On  | 00000000:4E:00.0 Off |                    0 |
| N/A   26C    P0              67W / 400W |  16001MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-80GB          On  | 00000000:87:00.0 Off |                    0 |
| N/A   59C    P0             334W / 400W |  56080MiB / 81920MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM4-80GB          On  | 00000000:90:00.0 Off |                    0 |
| N/A   34C    P0              71W / 400W |  16558MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM4-80GB          On  | 00000000:B7:00.0 Off |                    0 |
| N/A   52C    P0             249W / 400W |  52900MiB / 81920MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM4-80GB          On  | 00000000:BD:00.0 Off |                    0 |
| N/A   33C    P0              67W / 400W |  36420MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   2647657      C   python                                    64482MiB |
|    0   N/A  N/A   3531806      C   ...iniconda3/envs/python311/bin/python      554MiB |
|    0   N/A  N/A   3531807      C   ...iniconda3/envs/python311/bin/python      554MiB |
|    0   N/A  N/A   3531808      C   ...iniconda3/envs/python311/bin/python      554MiB |
|    1   N/A  N/A    550956      C   ...n/miniconda3/envs/alice/bin/python3    20564MiB |
|    1   N/A  N/A   2647657      C   python                                    15830MiB |
|    2   N/A  N/A    225618      C   ...n/miniconda3/envs/alice/bin/python3      554MiB |
|    2   N/A  N/A   2647657      C   python                                    15830MiB |
|    3   N/A  N/A   2647657      C   python                                    15830MiB |
|    4   N/A  N/A    225618      C   ...n/miniconda3/envs/alice/bin/python3    40074MiB |
|    4   N/A  N/A   2647657      C   python                                    15830MiB |
|    5   N/A  N/A    190920      C   ...n/miniconda3/envs/alice/bin/python3      554MiB |
|    5   N/A  N/A   2647657      C   python                                    15830MiB |
|    6   N/A  N/A    190920      C   ...n/miniconda3/envs/alice/bin/python3    20610MiB |
|    6   N/A  N/A   1857886      C   python                                    16278MiB |
|    6   N/A  N/A   2647657      C   python                                    15830MiB |
|    7   N/A  N/A    359339      C   ...n/miniconda3/envs/alice/bin/python3    20558MiB |
|    7   N/A  N/A   2647657      C   python                                    15686MiB |
+---------------------------------------------------------------------------------------+

None of the above processes is mine (processes from other users are also running; I am using NVIDIA DGX Server Version 6.1.0 (GNU/Linux 5.15.0-1029-nvidia x86_64)).

@ptrblck do you have any idea about this?

Did you run the previously mentioned use case of allocating a single tensor, making sure it’s created on the device, and continuing with your whole script? If so, did it work or are you seeing the same error?
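
For concreteness, that would look something like the following at the top of train_test.py (placing the check before the project imports is an assumption of this sketch, not something specified above):

# top of train_test.py: allocate a tensor on the GPU first, then continue
import torch

t = torch.randn(1).cuda()
print(t)   # expect something like tensor([...], device='cuda:0')

# ... the original imports and the rest of the training script follow,
# e.g. from pkd.utils import set_random_seed, time_now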

@ptrblck I am not able to allocate the single tensor at the beginning of my script; I see the same error while allocating it. However, I am able to do it in a separate script, as described in the previous reply, and torch.cuda.is_available() is still True in my original script.

Just to make sure I understand the current runs: you are able to allocate a tensor on the GPU using a standalone script, but allocating this tensor at the beginning of your actual training script fails? If so, are you using different Python environments? I cannot explain why the same code would fail otherwise.
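
One quick way to rule out an environment mismatch is to print the interpreter and the torch build from both the standalone smoke test and the training script and compare; a small diagnostic sketch (not something posted in the thread):

import os
import sys
import torch

# run this from both the standalone script and train_test.py and compare
print("python interpreter  :", sys.executable)
print("torch version/path  :", torch.__version__, torch.__file__)
print("torch built for CUDA:", torch.version.cuda)
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))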

@ptrblck I was using the same Python env, but the error was resolved by changing the location of the import torch statement. Thanks for the help!
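
The post does not say where import torch was moved to or which other import was interfering, so the following is only a generic way to narrow that down: print the GPU-related environment around each import and watch for a change (pkd is the project package from the traceback; everything else is illustrative):

import os

def report(tag):
    # watch for an import that silently changes the visible devices
    print(tag, "CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))

report("before torch")
import torch
report("after torch ")
from pkd.utils import set_random_seed, time_now   # import from the original traceback
report("after pkd   ")

If the value changes between two reports, the import in between is a likely culprit; a change in the set of visible devices after torch has queued its lazy CUDA calls would also be consistent with the device >= 0 && device < num_gpus assert in the original traceback.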

Which other libraries did you need to load before/after PyTorch to fix this issue? It seems one of the imports might be interfering with the communication with your GPU.