Do I need to rename keys in state_dict? (warning: beginner)

Hi!

For the first time, I'm trying to use an LLM from JupyterLab rather than from a UI. I've already run into a lot of technical issues, but I have no idea how to fix this one.

I have already installed all the necessary libraries, including GPTQ-for-LLaMa (the CUDA branch).

Here is my code to load this model: TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ · Hugging Face

import sys
import os
import guidance
import transformers

# make the local GPTQ-for-LLaMa checkout importable
sys.path.append(os.path.realpath("./libs/gptq") + "/")
import llama_inference

# hand llama_inference the already-imported transformers module
llama_inference.transformers = transformers

tokenizer = transformers.LlamaTokenizer.from_pretrained("TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ")

# 4 = wbits, 128 = groupsize; the trailing 0 is presumably the device index
model = llama_inference.load_quant(
    "TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ",
    "Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors",
    4, 128, 0,
)
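As a side note, here is a quick sanity check (a sketch, not output from my original session) to confirm the quantized model actually sits on the GPU; ROCm builds of PyTorch still report the device as "cuda":

import torch

# Sanity-check sketch: verify the ROCm build is active and the weights are on the GPU.
print(torch.__version__)                  # PyTorch version
print(torch.version.hip)                  # HIP version string on ROCm builds, None on CUDA builds
print(torch.cuda.is_available())          # should be True
print(next(model.parameters()).device)    # expect device(type='cuda', index=0)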

But after that, during generation with Guidance, I get:

llm = guidance.llms.transformers.Vicuna(model=model, tokenizer=tokenizer)

# we can pre-define valid option sets
valid_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]

# define the prompt
character_maker = guidance("""The following is a character profile for an RPG game in JSON format.
```json
{
    "id": "{{id}}",
    "description": "{{description}}",
    "name": "{{gen 'name'}}",
    "age": {{gen 'age' pattern='[0-9]+' stop=','}},
    "armor": "{{#select 'armor'}}leather{{or}}chainmail{{or}}plate{{/select}}",
    "weapon": "{{select 'weapon' options=valid_weapons}}",
    "class": "{{gen 'class'}}",
    "mantra": "{{gen 'mantra' temperature=0.7}}",
    "strength": {{gen 'strength' pattern='[0-9]+' stop=','}},
    "items": [{{#geneach 'items' num_iterations=5 join=', '}}"{{gen 'this' temperature=0.7}}"{{/geneach}}]
}```""")

# generate a character
character_maker(
    id="e1f491f7-7ab8-4dac-8c20-c92b5e7d883d",
    description="A quick and nimble fighter.",
    valid_weapons=valid_weapons, llm=llm
)


#===================================


Exception in thread Thread-6 (generate):
Traceback (most recent call last):

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/transformers/generation/utils.py", line 1522, in generate
    return self.greedy_search(

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/transformers/generation/utils.py", line 2339, in greedy_search
    outputs = self(

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 691, in forward
    outputs = self.model(

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 579, in forward
    layer_outputs = decoder_layer(

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 293, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 195, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)

  File "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)

  File "/home/x00/Bureau/libs/gptq/quant.py", line 279, in forward
    quant_cuda.vecquant4matmul(x.float(), self.qweight, out, self.scales.float(), self.qzeros, self.g_idx)

RuntimeError: t == DeviceType::CUDA INTERNAL ASSERT FAILED at "/home/x00/anaconda3/envs/GPT/lib/python3.10/site-packages/torch/include/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h":60, please report a bug to PyTorch.

My PC:

  • Linux Mint (Cinnamon)

  • AMD Radeon 6800 XT

I'm really lost now; I don't even know where to start. What should I do to fix this? I've tried to include everything relevant, but don't hesitate to ask if you need more details.

It seems you are running into an internal assert in the ROCm stack. I would recommend creating an issue on GitHub with a minimal, executable code snippet (if possible) so that the code owners can take a look at it.
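For reference, a minimal snippet could look roughly like the sketch below. Everything in it is an assumption rather than code from this thread: it reuses the quant_cuda.vecquant4matmul call from the bottom of your traceback, the quant_cuda import refers to the compiled GPTQ-for-LLaMa extension, and the tensor shapes are guesses consistent with 4-bit packing (eight 4-bit values per int32) at groupsize 128, not values read from the real model.

import torch
import quant_cuda  # the compiled GPTQ-for-LLaMa extension (built against ROCm/HIP here)

device = torch.device("cuda")  # ROCm devices also show up as "cuda" in torch

# Dummy tensors standing in for one quantized projection layer.
# Shapes are assumptions for 4-bit packing, groupsize 128; with
# infeatures == groupsize there is a single quantization group.
infeatures, outfeatures, groupsize = 128, 128, 128
x = torch.randn(1, infeatures, device=device)
out = torch.zeros(1, outfeatures, device=device)
qweight = torch.zeros(infeatures // 8, outfeatures, dtype=torch.int32, device=device)
scales = torch.ones(infeatures // groupsize, outfeatures, device=device)
qzeros = torch.zeros(infeatures // groupsize, outfeatures // 8, dtype=torch.int32, device=device)
g_idx = torch.zeros(infeatures, dtype=torch.int32, device=device)  # all features in group 0

# The same call that fails in quant.py line 279 of the traceback.
quant_cuda.vecquant4matmul(x.float(), qweight, out, scales.float(), qzeros, g_idx)
print(out)

If this call alone already trips the same HIPGuard assert, that would isolate the problem to the extension's kernel launch rather than to guidance or transformers, which should make the issue much easier for the maintainers to act on.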