Hi,
I'm getting the following error in my Docker container.
torch.cuda.is_available() reports that CUDA is available, but torch.cuda.current_device() fails.
I can run nvidia-smi inside the container, but even if I pass the GPU ID via CUDA_VISIBLE_DEVICES, the application still cannot see the device.
Could you give me any guidance on how to solve this problem?
Thanks!
root@XXXX:/project# python
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 242, in _lazy_init
queued_call()
File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 125, in _check_capability
capability = get_device_capability(d)
File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 357, in get_device_capability
prop = get_device_properties(device)
File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 375, in get_device_properties
return _get_device_properties(device) # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at ".../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 552, in current_device
_lazy_init()
File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 246, in _lazy_init
raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at ".../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.
CUDA call was originally invoked at:
['  File "<stdin>", line 1, in <module>\n', '  File "<frozen importlib._bootstrap>", line 991, in _find_and_load\n', '  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked\n', '  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked\n', '  File "<frozen importlib._bootstrap_external>", line 843, in exec_module\n', '  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed\n', '  File "/opt/conda/lib/python3.8/site-packages/torch/__init__.py", line 798, in <module>\n    _C._initExtension(manager_path())\n', '  File "<frozen importlib._bootstrap>", line 991, in _find_and_load\n', '  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked\n', '  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked\n', '  File "<frozen importlib._bootstrap_external>", line 843, in exec_module\n', '  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed\n', '  File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 179, in <module>\n    _lazy_call(_check_capability)\n', '  File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 177, in _lazy_call\n    _queued_calls.append((callable, traceback.format_stack()))\n']
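
In case it helps with diagnosing, here is a minimal script I can run in the same container to compare what the environment exposes with what PyTorch reports. This is only a sketch of the check (nothing in it is output from my setup):

import os
import torch

# Compare the container environment with what PyTorch itself reports.
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("torch.version.cuda:", torch.version.cuda)
print("torch.cuda.is_available():", torch.cuda.is_available())
print("torch.cuda.device_count():", torch.cuda.device_count())

# Only query per-device info if PyTorch actually sees at least one GPU,
# to avoid triggering the same lazy-init assertion as above.
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))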