PyTorch for CUDA 12

For me the issue was pretty simple.
I'm on Windows 10 under my regular user profile, and when I want to install a package, I do it with a separate admin account with elevated privileges.

So... the issue. I had installed Torch as it should be for CUDA under the admin account, writing directly to "Program Files". But when I started jupyter-notebook, I was doing it under my user account.

Following @mcgeochd's advice, I tried to remove Torch, and I happened to do it from a user-level CMD, only to find out that the user account had torch-1.9.0 installed alongside the proper torch-2.1.0.dev20230702+cu121 installed with elevated privileges.

Uninstalling torch under the user account, without elevated privileges, did the final trick.
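
In case it helps anyone else, a quick sanity check for this kind of shadowed install is to ask the interpreter which torch it actually picks up (run it under the same account you use for jupyter-notebook):

import torch
print(torch.__version__)   # which install wins, e.g. 1.9.0 vs. 2.1.0.dev20230702+cu121
print(torch.__file__)      # the path reveals user vs. system site-packages
print(torch.version.cuda)  # CUDA version the wheel was built against (None for CPU-only builds)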

Worked wonderfully - I registered an account just to like your post.

Hello - I am just a hobbyist, but I was wondering if anyone could help me, as I can't seem to get CUDA to recognize the GPUs.

I have CUDA 12 installed, and I should be using the latest version of PyTorch as well.

import torch

device = torch.device("cuda:0")
x = torch.tensor([1.0]).to(device)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 260, in _lazy_init
    queued_call()
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 145, in _check_capability
    capability = get_device_capability(d)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 381, in get_device_capability
    prop = get_device_properties(device)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 399, in get_device_properties
    return _get_device_properties(device)  # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 264, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.

CUDA call was originally invoked at:

  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 1146, in <module>
    _C._initExtension(manager_path())
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 197, in <module>
    _lazy_call(_check_capability)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 195, in _lazy_call
    _queued_calls.append((callable, traceback.format_stack()))

print(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
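
(The NameError at the end is just fallout from the first error: x was never assigned because .to(device) raised during CUDA initialization.)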

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-7-80C     On  | 00000000:06:00.0 Off |                   On |
| N/A   N/A    P0    N/A /  N/A |      0MiB / 81920MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
|   1  GRID A100D-7-80C     On  | 00000000:07:00.0 Off |                   On |
| N/A   N/A    P0    N/A /  N/A |      0MiB / 81920MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    0   0   0  |      0MiB / 76011MiB | 98      0 |  7   0    5    1    1 |
|                  |      0MiB /  4096MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  1    0   0   0  |      0MiB / 76011MiB | 98      0 |  7   0    5    1    1 |
|                  |      0MiB /  4096MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

It seems MIG is enabled, so either create MIG compute instances (if you want to use MIG) or disable it.
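
A rough sketch of both options (the profile ID and UUID below are placeholders; list the real ones with nvidia-smi -L and nvidia-smi mig -lgip):

# On the host, either disable MIG entirely:
#   nvidia-smi -i 0 -mig 0
# or create a GPU instance plus its compute instance, e.g.:
#   nvidia-smi mig -cgi <profile-id> -C
# Then expose one MIG slice to the process before torch is imported:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-<uuid>"  # placeholder UUID

import torch
print(torch.cuda.is_available())  # should now be True
print(torch.cuda.device_count())  # one device per visible MIG slice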

I don't know how to fix this; I am using automatic1111.

venv "D:\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Launching Web UI with arguments:
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Loading weights [6ce0161689] from D:\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Creating model from config: D:\stable-diffusion-webui\configs\v1-inference.yaml
Startup time: 5.4s (prepare environment: 1.3s, import torch: 1.7s, import gradio: 0.5s, setup paths: 0.4s, initialize shared: 0.2s, other imports: 0.3s, load scripts: 0.5s, create ui: 0.3s, gradio launch: 0.3s).
Applying attention optimization: Doggettx... done.
Model loaded in 7.0s (load weights from disk: 0.5s, create model: 0.3s, apply weights to model: 2.0s, apply half(): 1.0s, calculate empty prompt: 3.1s).
10%|████████▎ | 2/20 [00:00<00:08, 2.03it/s]
Exception in thread MemMon:

Traceback (most recent call last):
  File "C:\Users\ryan\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
*** Error completing request
*** Arguments: ('task(93urvt8596sudh2)', 'help', '', , 20, 'DPM++ 2M Karras', 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', , <gradio.routes.Request object at 0x00000168940C7C40>, 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', , 0, '', , 0, '', , True, False, False, False, 0, False) {}
    self.run()
Traceback (most recent call last):
  File "D:\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "D:\stable-diffusion-webui\modules\call_queue.py", line 36, in f
    res = func(*args, **kwargs)
  File "D:\stable-diffusion-webui\modules\txt2img.py", line 55, in txt2img
    processed = processing.process_images(p)
  File "D:\stable-diffusion-webui\modules\processing.py", line 732, in process_images
    res = process_images_inner(p)
  File "D:\stable-diffusion-webui\modules\processing.py", line 867, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "D:\stable-diffusion-webui\modules\processing.py", line 1140, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "D:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "D:\stable-diffusion-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
    return func()
  File "D:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 235, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 594, in sample_dpmpp_2m
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\modules\sd_samplers_cfg_denoiser.py", line 169, in forward
    x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "D:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "D:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\modules\sd_unet.py", line 91, in UNetModel_forward
    return ldm.modules.diffusionmodules.openaimodel.copy_of_UNetModel_forward_for_webui(self, x, timesteps, context, *args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 802, in forward
    h = module(h, emb, context)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 327, in forward
    x = self.norm(x)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 459, in network_GroupNorm_forward
    return originals.GroupNorm_forward(self, input)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\normalization.py", line 273, in forward
    return F.group_norm(
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\functional.py", line 2530, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.


  File "D:\stable-diffusion-webui\modules\memmon.py", line 53, in run
    free, total = self.cuda_mem_get_info()
  File "D:\stable-diffusion-webui\modules\memmon.py", line 34, in cuda_mem_get_info
    return torch.cuda.mem_get_info(index)
Traceback (most recent call last):
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 618, in mem_get_info
  File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
    return torch.cuda.cudart().cudaMemGetInfo(device)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
  File "D:\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(

  File "D:\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "D:\stable-diffusion-webui\modules\call_queue.py", line 77, in f
    devices.torch_gc()
  File "D:\stable-diffusion-webui\modules\devices.py", line 51, in torch_gc
    torch.cuda.empty_cache()
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 133, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

*** Error completing request
*** Arguments: ('task(yzh9m3b7twmyjef)', 'help', '', , 20, 'DPM++ 2M Karras', 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', , <gradio.routes.Request object at 0x000001689376A5F0>, 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', , 0, '', , 0, '', , True, False, False, False, 0, False) {}
Traceback (most recent call last):
  File "D:\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "D:\stable-diffusion-webui\modules\call_queue.py", line 32, in f
    shared.state.begin(job=id_task)
  File "D:\stable-diffusion-webui\modules\shared_state.py", line 119, in begin
    devices.torch_gc()
  File "D:\stable-diffusion-webui\modules\devices.py", line 51, in torch_gc
    torch.cuda.empty_cache()
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 133, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.


Traceback (most recent call last):
  File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "D:\stable-diffusion-webui\modules\call_queue.py", line 77, in f
    devices.torch_gc()
  File "D:\stable-diffusion-webui\modules\devices.py", line 51, in torch_gc
    torch.cuda.empty_cache()
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 133, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.


Hey bro,
I signed up just to say thanks.
So thanks, that seems to have done the trick.

Very nice of you to share this, you may have saved me many hours of time.

(Y) (Y) (Y)

Hello, I tried to follow the suggestions from this topic, but I still couldn't make it work.
I have an NVIDIA Tesla P6. This is my nvidia-smi:
[screenshot of nvidia-smi output]

I have a container based on nvcr.io/nvidia/pytorch:23.02-py3.

When I run it, torch.cuda.is_available() returns True.

But when I run the following code:

model_path = "./resource/model/distilbart-cnn-12-6"
model = BartForConditionalGeneration.from_pretrained(model_path).to("cuda")

it throws:

File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1128, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: operation not supported

I tried to install torch==2.0.1+cu118, but it just changed the error to:

  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 1878, in to
    return super().to(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable

I have the same problem…
I have:
Python 3.8.16
CUDA Version: 12.0
Driver Version: 525.105.17

Here are all the modules installed in the environment:

absl-py==2.0.0
aiohttp==3.8.5
aiosignal==1.3.1
alabaster==0.7.13
annotated-types==0.5.0
antlr4-python3-runtime==4.9.3
anyio==3.7.1
appdirs==1.4.4
arrow==1.2.3
asttokens==2.4.0
async-timeout==4.0.3
attrdict==2.0.1
attrs==23.1.0
audioread==3.0.0
Babel==2.12.1
backcall==0.2.0
backoff==2.2.1
beautifulsoup4==4.12.2
black==19.10b0
blessed==1.20.0
boto3==1.28.51
botocore==1.31.51
braceexpand==0.1.7
cachetools==5.3.1
certifi==2023.7.22
cffi==1.15.1
charset-normalizer==3.2.0
click==8.1.7
cmake==3.27.5
colorama==0.4.6
comm==0.1.4
contourpy==1.1.1
croniter==1.4.1
cycler==0.11.0
dateutils==0.6.12
decorator==5.1.1
deepdiff==6.5.0
Distance==0.1.3
docker-pycreds==0.4.0
docopt==0.6.2
docutils==0.20.1
editdistance==0.6.2
exceptiongroup==1.1.3
executing==1.2.0
fastapi==0.103.1
fasttext==0.9.2
filelock==3.12.4
fonttools==4.42.1
frozendict==2.3.8
frozenlist==1.4.0
fsspec==2023.9.1
g2p-en==2.1.0
gdown==4.7.1
gitdb==4.0.10
GitPython==3.1.36
google-auth==2.23.0
google-auth-oauthlib==1.0.0
grpcio==1.58.0
h11==0.14.0
h5py==3.9.0
huggingface-hub==0.17.2
hydra-core==1.3.2
idna==3.4
imagesize==1.4.1
importlib-metadata==6.8.0
importlib-resources==6.0.1
inflect==7.0.0
iniconfig==2.0.0
inquirer==3.1.3
ipadic==1.0.0
ipython==8.12.2
ipywidgets==8.1.1
isort==4.3.21
itsdangerous==2.1.2
jedi==0.19.0
jieba==0.42.1
Jinja2==3.1.2
jiwer==3.0.3
jmespath==1.0.1
joblib==1.3.2
jupyterlab-widgets==3.0.9
kaldi-io==0.9.8
kaldi-python-io==1.2.2
kaldiio==2.18.0
kiwisolver==1.4.5
latexcodec==2.0.1
lazy_loader==0.3
Levenshtein==0.21.1
librosa==0.10.1
lightning==2.0.9
lightning-cloud==0.5.38
lightning-utilities==0.9.0
lit==16.0.6
llvmlite==0.40.1
loguru==0.7.2
lxml==4.9.3
Markdown==3.4.4
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.1
matplotlib==3.7.3
matplotlib-inline==0.1.6
mdurl==0.1.2
mecab-python3==1.0.5
mpmath==1.3.0
msgpack==1.0.5
multidict==6.0.4
nemo-toolkit==1.18.1
networkx==3.1
nltk==3.8.1
numba==0.57.1
numpy==1.23.5
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauthlib==3.2.2
omegaconf==2.3.0
onnx==1.14.1
OpenCC==1.1.6
ordered-set==4.1.0
packaging==23.1
pandas==2.0.3
pangu==4.0.6.1
parameterized==0.9.0
parso==0.8.3
pathspec==0.11.2
pathtools==0.1.2
pesq==0.0.4
pexpect==4.8.0
pickleshare==0.7.5
Pillow==10.0.1
pip-api==0.0.30
pipreqs==0.4.13
plac==1.4.0
platformdirs==3.10.0
pluggy==1.3.0
pooch==1.7.0
portalocker==2.8.2
prompt-toolkit==3.0.39
protobuf==4.24.3
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyannote.core==5.0.0
pyannote.database==5.0.1
pyannote.metrics==3.2.1
pyasn1==0.5.0
pyasn1-modules==0.3.0
pybind11==2.11.1
pybtex==0.24.0
pybtex-docutils==1.0.3
pycparser==2.21
pydantic==2.1.1
pydantic_core==2.4.0
pydub==0.25.1
Pygments==2.16.1
PyJWT==2.8.0
pyparsing==3.1.1
pypinyin==0.49.0
PySocks==1.7.1
pystoi==0.3.3
pytest==7.4.2
pytest-runner==6.0.0
python-dateutil==2.8.2
python-editor==1.0.4
python-multipart==0.0.6
pytorch-lightning==1.8.6
pytz==2023.3.post1
PyYAML==5.4.1
rapidfuzz==3.3.0
readchar==4.0.5
regex==2023.8.8
requests==2.31.0
requests-oauthlib==1.3.1
rich==13.5.3
rsa==4.9
ruamel.yaml==0.17.32
ruamel.yaml.clib==0.2.7
s3transfer==0.6.2
sacrebleu==2.3.1
sacremoses==0.0.53
safetensors==0.3.3
scikit-learn==1.3.0
scipy==1.10.1
sentencepiece==0.1.99
sentry-sdk==1.31.0
setproctitle==1.3.2
shellingham==1.5.3
six==1.16.0
smmap==5.0.1
sniffio==1.3.0
snowballstemmer==2.2.0
sortedcontainers==2.4.0
soundfile==0.12.1
soupsieve==2.5
sox==1.4.1
soxr==0.3.6
Sphinx==7.1.2
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-bibtex==2.6.1
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
stack-data==0.6.2
starlette==0.27.0
starsessions==1.3.0
sympy==1.12
tabulate==0.9.0
tensorboard==2.14.0
tensorboard-data-server==0.7.1
tensorboardX==2.6.2.2
termcolor==2.3.0
text-unidecode==1.3
texterrors==0.4.4
threadpoolctl==3.2.0
tokenizers==0.13.3
toml==0.10.2
tomli==2.0.1
torch==2.0.1+cu118
torch-stft==0.1.4
torchmetrics==0.11.4
torchvision==0.15.2+rocm5.4.2
tqdm==4.66.1
traitlets==5.10.0
transformers==4.33.2
triton==2.0.0
typed-ast==1.5.5
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
Unidecode==1.3.6
urllib3==1.26.16
uvicorn==0.23.2
wandb==0.15.10
wcwidth==0.2.6
webdataset==0.1.62
websocket-client==1.6.3
websockets==11.0.3
Werkzeug==2.3.7
wget==3.2
widgetsnbextension==4.0.9
wordninja==2.0.0
wrapt==1.15.0
yarg==0.1.9
yarl==1.9.2
youtokentome==1.0.6
zipp==3.17.0

Are you able to execute any CUDA application? Your nvidia-smi output indicates you are using a vGPU via GRID drivers.
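
For example, something as small as this forces actual context creation, which torch.cuda.is_available() alone does not:

import torch
# is_available() only checks that a driver and device are visible;
# allocating a tensor creates a CUDA context and surfaces errors like
# "operation not supported" immediately.
x = torch.zeros(1, device="cuda")
print(x)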

Yes, it is a vGPU on k8s.

I tried to use the onnxruntime lib, but I struggle with it as well - it doesn't support CUDA 12.0 out of the box. Do you have any app in mind that I can use to test my CUDA?

I also found out that it might be a licensing problem; a licensing server will need to be configured in my environment before I can use it.

I had this problem until now. In my case it was a token problem: I refreshed the token in the /etc/nvidia/ClientConfigToken/ directory. Maybe you have the same issue.
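
(On a vGPU setup you can also verify the license state with nvidia-smi -q, which reports a License Status field under the vGPU software licensing section.)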


Which PyTorch version should I install for these specs?

Hello,

I would like to ask if PyTorch is compatible with CUDA 12.2.

Yes, PyTorch is compatible with CUDA 12.2, and you can build from source if you need this CUDA version.
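
For reference, cu121 nightly wheels already existed at the time (e.g. the torch-2.1.0.dev+cu121 build mentioned earlier in this thread), so building from source is not the only route. Either way, you can confirm what a given install was built against:

# installed e.g. via: pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
import torch
print(torch.version.cuda)         # CUDA toolkit the binary was compiled against
print(torch.cuda.is_available())  # whether the driver/runtime is actually usable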


There are CUDA-compatible PyTorch packages on conda-forge now. That might help folks looking for CUDA 12.0 compatible packages.

conda install -c conda-forge pytorch=2.1.0 cuda-version=12.0