I am trying out LLMs for the first time, so I tried to run a GPTQ language model from the Hugging Face website. I wrote the following code:
from transformers import pipeline
# Create a pipeline to generate text using the model
pipe = pipeline("text-generation", model="TheBloke/Open_Gpt4_8x7B_v0.2-GPTQ")
# Function to generate text based on user input
def generate_text(prompt, max_length=100):
    return pipe(prompt, max_length=max_length, num_return_sequences=1)[0]['generated_text']
# Main loop for user input
if __name__ == "__main__":
    print("Enter query text to generate or 'exit' to exit.")
    while True:
        user_input = input("Query: ")
        if user_input.lower() == 'exit':
            break
        result = generate_text(user_input)
        print(f"Result:\n{result}")
When I run it, the output says that the CUDA extension is not installed, and the model is loaded as quantized, which I don't want:
/home/tim/PycharmProjects/pythonProject/.venv/lib/python3.10/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:411: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, qweight, scales, qzeros, g_idx, bits, maxq):
/home/tim/PycharmProjects/pythonProject/.venv/lib/python3.10/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:419: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
/home/tim/PycharmProjects/pythonProject/.venv/lib/python3.10/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:461: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@custom_fwd(cast_inputs=torch.float16)
CUDA extension not installed.
CUDA extension not installed.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
I checked for CUDA both through PyTorch and in the PyCharm terminal. PyTorch reports that CUDA is available and shows version 12.1. In the terminal, the NVIDIA commands for checking the driver and the CUDA toolkit also show that CUDA is present on the PC. Here is the full output:
(.venv) tim:~/PycharmProjects/pythonProject$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
(.venv) tim:~/PycharmProjects/pythonProject$ nvidia-smi
Mon Aug 26 17:55:16 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   54C    P0              N/A /  60W |      8MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1449      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
(.venv) tim:~/PycharmProjects/pythonProject$ python -c "import torch; print(torch.cuda.is_available())"
True
(.venv) tim:~/PycharmProjects/pythonProject$ python -c "import torch; print(torch.version.cuda)"
12.1
So PyCharm and PyTorch both see CUDA, which suggests that some other component is the one that does not.
It could be a problem with the auto-gptq version, but reinstalling it didn't help. Maybe a different version is needed, but I couldn't find out which one.
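One way to narrow this down might be to check whether auto-gptq's compiled CUDA kernels are importable inside the same venv, independently of PyTorch. The following is only a minimal sketch: the module names `autogptq_cuda_64` and `autogptq_cuda_256` are an assumption about how the package names its compiled extensions and are not confirmed by the logs above, and `find_modules` and `report` are hypothetical helper names.

```python
import importlib.util

# Assumed names of auto-gptq's compiled CUDA kernels (not confirmed by the logs).
ASSUMED_EXTENSION_MODULES = ("autogptq_cuda_64", "autogptq_cuda_256")


def find_modules(names):
    """Map each top-level module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report(names=ASSUMED_EXTENSION_MODULES):
    """Print one line per module so that missing compiled kernels stand out."""
    for name, present in find_modules(names).items():
        print(f"{name}: {'found' if present else 'MISSING'}")


if __name__ == "__main__":
    report()
```

If both names show up as missing while `torch.cuda.is_available()` is still `True`, that would point at the auto-gptq installation rather than at the driver or PyTorch.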
Also, do I need to pay attention to the FutureWarning messages?
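The FutureWarning lines are deprecation notices attributed to a file inside the auto_gptq package, judging by the paths in the log. If they turn out to be harmless, they could probably be hidden with Python's standard `warnings` module. This is a sketch under that assumption; `hide_auto_gptq_future_warnings` is a hypothetical helper name, and it does nothing about the "CUDA extension not installed." message, which is not a Python warning of this category.

```python
import warnings


def hide_auto_gptq_future_warnings():
    """Ignore FutureWarning attributed to modules whose name starts with auto_gptq."""
    # The module argument is a regular expression matched against the start of
    # the name of the module the warning is attributed to.
    warnings.filterwarnings(
        "ignore",
        category=FutureWarning,
        module=r"auto_gptq(\.|$)",
    )


# Install the filter before importing transformers or auto_gptq, since the
# warnings in the log appear to be emitted while those modules are imported.
hide_auto_gptq_future_warnings()
```

Warnings of the same category coming from other packages are left untouched, so new deprecations elsewhere would still be visible.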
Maybe someone has run into this before. I would be very grateful for any help. Thanks in advance.
System: Ubuntu 22.04, PyTorch for CUDA 12.1, cuDNN 8.9.3, Python 3.10, auto-gptq 7.1.0
I also installed the PyTorch nightly build because it fixes some bugs, but in my case it didn't help. ChatGPT did not help with this problem either.