Hi everyone, here is my code:
from transformers import AutoModelForCausalLM, AutoTokenizer, QuantoConfig
import torch
device = "cuda:0"
model_id = "bigscience/bloom-560m"
quantization_config = QuantoConfig(weights="int8")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,
    device_map=device,
    quantization_config=quantization_config,  # actually apply the int8 quanto config
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
When I run it, I get the following error:
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
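For reference, the error message's suggestion can be applied from inside Python, as long as the variable is set before torch touches CUDA (a minimal sketch):

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the CUDA context is created

import torch
x = torch.randn(1, 2).to("cuda:0")  # with blocking launches, the trace should point at the real call site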
However, when I check whether CUDA is available, I get:
print('-------------------------------')
print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.cuda.current_device())
print(torch.cuda.device(0))
print(torch.cuda.get_device_name(0))
print('Memory Usage:')
print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
print('Cached: ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')
True
1
0
<torch.cuda.device object at 0x7f8bf6d4a9b0>
GRID T4-16Q
Memory Usage:
Allocated: 0.0 GB
Cached: 0.0 GB
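For completeness, a few more things worth printing (a minimal sketch; the values in the comments are what I would expect from these cu12 wheels, not measured output):

import torch
print(torch.version.cuda)                   # CUDA version the wheel was built against (12.1 for these wheels)
print(torch.backends.cudnn.version())       # bundled cuDNN, e.g. 8902 for 8.9.2
print(torch.cuda.get_device_capability(0))  # (7, 5) for a T4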
I run this code on Colab without any issues, and I also ran it on another machine with a different GPU, where it runs as expected.
The configuration of the machine where I need to run it, and where it fails, is:
(test310) admin@appdev-llm-lnx1:~/llm/ModelsService$ nvidia-smi
Tue Jun 18 16:02:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  GRID T4-16Q                    On  |   00000000:02:01.0 Off |                    0 |
| N/A   N/A    P8             N/A /  N/A  |       1MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
and the installed libraries are:
accelerate 0.31.0
aiohttp 3.9.5
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
certifi 2024.6.2
charset-normalizer 3.3.2
datasets 2.20.0
dill 0.3.8
filelock 3.15.1
frozenlist 1.4.1
fsspec 2024.5.0
huggingface-hub 0.23.4
idna 3.7
Jinja2 3.1.4
MarkupSafe 2.1.5
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.3
ninja 1.11.1.1
numpy 2.0.0
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.40
nvidia-nvtx-cu12 12.1.105
packaging 24.1
pandas 2.2.2
pip 24.0
psutil 5.9.8
pyarrow 16.1.0
pyarrow-hotfix 0.6
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
quanto 0.2.0
regex 2024.5.15
requests 2.32.3
safetensors 0.4.3
setuptools 65.5.0
six 1.16.0
sympy 1.12.1
tokenizers 0.19.1
torch 2.3.1
tqdm 4.66.4
transformers 4.42.0.dev0
triton 2.3.1
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.2
xxhash 3.4.1
yarl 1.9.4
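In case the resolved versions matter, a quick way to confirm what the interpreter actually imports (a minimal sketch):

from importlib.metadata import version
import torch
print(torch.__version__, torch.version.cuda)       # 2.3.1, built against CUDA 12.1
print(version("transformers"), version("quanto"))  # 4.42.0.dev0 / 0.2.0 per the list above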
I don't know if this affects anything, but the machine is a VMware virtual machine running under a vGPU. I also tried running a simple NN, just to check whether the problem is with the transformers library, but I got the same error as soon as I tried to move anything to the GPU:
import torch
import torch.nn as nn
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
t1 = torch.randn(1,2)
t2 = torch.randn(1,2).to(dev)
print(t1) # tensor([[-0.2678, 1.9252]])
print(t2) # tensor([[ 0.5117, -3.6247]], device='cuda:0')
t1.to(dev)
print(t1) # tensor([[-0.2678, 1.9252]])
print(t1.is_cuda) # False
t1 = t1.to(dev)
print(t1) # tensor([[-0.2678, 1.9252]], device='cuda:0')
print(t1.is_cuda) # True
class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(1, 2)

    def forward(self, x):
        x = self.l1(x)
        return x

model = M()    # not on cuda
model.to(dev)  # moves all parameters to cuda (in place for modules)
print(next(model.parameters()).is_cuda)  # True
Traceback (most recent call last):
  File "/home/admin/llm/ModelsService/test.py", line 14, in <module>
    t2 = torch.randn(1,2).to(dev)
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
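So the failure reduces to the very first transfer to the device; stripped of everything else, this one line is enough to trigger it:

import torch
t = torch.randn(1, 2).to("cuda")  # raises RuntimeError: CUDA error: operation not supported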
By the way, here is the info about my CUDA toolkit:
(test310) admin@appdev-llm-lnx1:~/llm/ModelsService$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
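Note the mismatch: nvcc is from the CUDA 10.1 toolkit while the driver reports 12.4. As far as I know the pip cu12 wheels bundle their own CUDA runtime, so the old system toolkit should not be used at runtime, but here is a hypothetical way to ask the driver directly, bypassing torch entirely:

import ctypes

# Query the driver API for the CUDA version it supports (no context creation needed).
libcuda = ctypes.CDLL("libcuda.so.1")
ver = ctypes.c_int()
libcuda.cuDriverGetVersion(ctypes.byref(ver))
print(ver.value)  # e.g. 12040 for CUDA 12.4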
Regards