Gemma 3 throws RuntimeError: CUDA misaligned address

Hi there,

I am trying to use the Gemma 3 12B instruction-tuned model (gemma-3-12b-it) to generate QA pairs. The pipeline is defined as follows:

model_id = "google/gemma-3-12b-it" # google/gemma-3-12b-it

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype = torch.float32,
        device_map="cuda",
        quantization_config=bnb_config
        )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    if tokenizer.pad_token is None:
        eos_token_id = model.config.eos_token_id
        eos_token = tokenizer.decode(eos_token_id)
        tokenizer.pad_token = eos_token  # this is a string, which is expected

    text_gen_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=512,
        torch_dtype=torch.float32, 
        top_p = 0.95,
        top_k = 70,
        temperature = 1.25,
        do_sample=True,
        repetition_penalty=1.3,
    )

    llm = HuggingFacePipeline(pipeline=text_gen_pipeline)

    model = ChatHuggingFace(llm=llm)

When I call this model via its invoke method, at some point it throws the following error:

  File "/home/nokia-proj/miniconda3/envs/vrag/lib/python3.10/site-packages/transformers/integrations/sdpa_attent
ion.py", line 54, in sdpa_attention_forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: CUDA error: misaligned address
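
For reference, the failing call is roughly the following (a minimal sketch; the loop over my QA pairs is omitted and the variable names are only illustrative):

    from langchain_core.messages import HumanMessage

    # `prompt` holds the rating prompt from the PS below, with {question} and
    # {context} already filled in for one QA pair (see the second sketch there)
    response = model.invoke([HumanMessage(content=prompt)])
    print(response.content)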

Any ideas why this error occurs and how to resolve it?

Thank you!

PS: Could the prompt format be an issue? My prompt currently looks like this (by ‘like this’ I mean the prompt has no indentation):

"""

You will be given a context and a question.

Your task is to provide a 'total rating' scoring how well one can answer the given question unambiguously with the given context.

Give your answer on a scale of 1 to 5, where 1 means that the question is not answerable at all given the context, and 5 means that the question is clearly and unambiguously answerable with the context.

Provide your answer as follows:

Answer:::

Evaluation: (your rationale for the rating, as a text)

Total rating: (your rating, as a number between 1 and 5)

You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.

Now here are the question and context.

Question: {question}\n

Context: {context}\n

Answer::: """

Could you add the missing parts of your code so that it is executable?
Also, which PyTorch version are you using, and are you able to reproduce this error with the latest nightly or stable release?