Hi there,
I am trying to use the Gemma 3 12B instruction-tuned model (google/gemma-3-12b-it) to generate QA pairs. The pipeline is defined as follows:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

model_id = "google/gemma-3-12b-it"

# Load the weights in 4-bit via bitsandbytes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,
    device_map="cuda",
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Fall back to the EOS token when the tokenizer defines no pad token
if tokenizer.pad_token is None:
    eos_token_id = model.config.eos_token_id
    eos_token = tokenizer.decode(eos_token_id)
    tokenizer.pad_token = eos_token  # this is a string, which is expected

text_gen_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    torch_dtype=torch.float32,
    top_p=0.95,
    top_k=70,
    temperature=1.25,
    do_sample=True,
    repetition_penalty=1.3,
)

# Wrap the pipeline for LangChain; note that this rebinds `model` to the chat wrapper
llm = HuggingFacePipeline(pipeline=text_gen_pipeline)
model = ChatHuggingFace(llm=llm)
When I call this model's invoke function, at some point it raises the following error:
  File "/home/nokia-proj/miniconda3/envs/vrag/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py", line 54, in sdpa_attention_forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: CUDA error: misaligned address
Any ideas why this error occurs and how to resolve it?
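For anyone trying to reproduce this, here is a minimal sketch of two diagnostic steps that could help narrow it down. Both are assumptions on my side and not verified on this setup: setting CUDA_LAUNCH_BLOCKING=1 makes CUDA kernels run synchronously so the traceback points at the call that actually failed, and passing attn_implementation="eager" to from_pretrained bypasses the SDPA code path shown in the traceback. The helper names enable_sync_cuda_errors and attention_fallback_kwargs are hypothetical.

```python
import os


def enable_sync_cuda_errors(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given mapping (default: os.environ).

    This has to happen before the first CUDA call in the process,
    otherwise the setting is not picked up.
    """
    if env is None:
        env = os.environ
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def attention_fallback_kwargs():
    """Extra keyword arguments for AutoModelForCausalLM.from_pretrained.

    Assumption: switching from SDPA to the eager attention implementation
    avoids the failing kernel. Whether it does is untested here.
    """
    return {"attn_implementation": "eager"}
```

The idea would be to call enable_sync_cuda_errors() at the very top of the script and then load the model with from_pretrained(model_id, ..., **attention_fallback_kwargs()) to see whether the error moves or disappears.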
Thank you!
PS: Could the prompt format be an issue? My prompt currently looks like this (by 'like this' I mean that the prompt text has no indentation):
"""
You will be given a context and a question.
Your task is to provide a 'total rating' scoring how well one can answer the given question unambiguously with the given context.
Give your answer on a scale of 1 to 5, where 1 means that the question is not answerable at all given the context, and 5 means that the question is clearly and unambiguously answerable with the context.
Provide your answer as follows:
Answer:::
Evaluation: (your rationale for the rating, as a text)
Total rating: (your rating, as a number between 1 and 5)
You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.
Now here are the question and context.
Question: {question}\n
Context: {context}\n
Answer::: """
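To make the prompt handling concrete, here is a rough sketch of how the placeholders get filled in and how the rating could be read back from the model output. It uses only the last lines of the prompt above as a stand-in for the full template; the names TEMPLATE_TAIL, build_prompt and parse_rating are illustrative, not part of my actual pipeline.

```python
import re

# Stand-in for the end of the prompt above; the real template is longer.
# In a normal (non-raw) triple-quoted string, "\n" is an actual newline.
TEMPLATE_TAIL = """Now here are the question and context.
Question: {question}\n
Context: {context}\n
Answer::: """


def build_prompt(question, context, template=TEMPLATE_TAIL):
    """Fill the {question} and {context} placeholders of the template."""
    return template.format(question=question, context=context)


def parse_rating(model_output):
    """Return the integer after 'Total rating:' if it is 1-5, else None."""
    match = re.search(r"Total rating:\s*([1-5])\b", model_output)
    return int(match.group(1)) if match else None
```

For example, parse_rating("Evaluation: the context states it directly.\nTotal rating: 4") gives 4, and an output without a 'Total rating:' line gives None, which makes malformed generations easy to filter out.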