Trying to explain Zephyr generative LLM


I'm trying to use Captum's new features to explain Zephyr, in particular LLMAttribution and TextTokenInput, in order to apply FeatureAblation, ShapleyValues and Lime.
I keep getting the following error message: "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"

Here is the related snippet of code (following Captum’s tutorial):

from transformers import AutoModelForCausalLM, AutoTokenizer
from captum.attr import FeatureAblation, LLMAttribution, TextTokenInput

model_name = 'HuggingFaceH4/zephyr-7b-beta'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cuda",  # dispatch the model efficiently on the available resources
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt_template = "Dave lives in Palm Coast, FL and is a lawyer. His personal interests include"
target = 'playing piano'

explainer = FeatureAblation(model)
llm_attr = LLMAttribution(explainer, tokenizer)

inp = TextTokenInput(
    prompt_template,
    tokenizer,
    skip_tokens=[1],  # skip the special token for the start of the text
)

attr_res = llm_attr.attribute(inp, target=target)

It works when device_map='cpu', but it fails when device_map='auto'.

Hi @milanbhan, could you post the full error trace? The error simply means that one tensor was not on CUDA. The full error log may help us identify which tensor.
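In the meantime, one rough way to narrow this down is to group tensors by the device they live on. The sketch below is only an illustration: `group_by_device` is a hypothetical helper, not part of Captum or PyTorch, and it assumes nothing beyond the `.device` attribute that `torch.Tensor` exposes.

```python
from collections import defaultdict


def group_by_device(named_tensors):
    """Group names by the device of each tensor-like object.

    `named_tensors` is any iterable of (name, obj) pairs where obj has a
    `.device` attribute, for example model.named_parameters() or the items
    of the dict-like object a tokenizer returns with return_tensors="pt".
    """
    groups = defaultdict(list)
    for name, obj in named_tensors:
        # str() turns a torch.device into a plain key such as "cuda:0" or "cpu"
        groups[str(obj.device)].append(name)
    return dict(groups)
```

For example, `group_by_device(model.named_parameters())` and `group_by_device(tokenizer(prompt_template, return_tensors="pt").items())` should show at a glance whether the model weights and the tokenized inputs ended up on different devices.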

I should add that I have the same problem with other models such as Phi2 or Orca.
Here is the error message:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Curious whether you have tried Llama and, if so, whether you ran into this issue there? I suspect the error is a bug in Captum (captum/attr/_core/) where attention_mask is not moved to the same device as the model. Hugging Face may handle this differently in different models, as we did not see this issue with Llama.

To fix the issue for now, you may try moving the attention_mask to the model's device within captum/attr/_core/
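The idea behind this workaround can be sketched as follows. This is not the actual Captum code or patch; `to_model_device` is a hypothetical helper, and it assumes the inputs should go to the device of the model's first parameter, which may not hold for every sharded `device_map` layout.

```python
def to_model_device(model, tensor):
    """Return `tensor` placed on the device of the model's first parameter.

    Assumption: the first parameter (usually the embedding layer) lives on
    the device where the inputs are expected.
    """
    device = next(model.parameters()).device
    # Tensor.to returns the same tensor if it is already on that device
    return tensor.to(device)
```

Applied to the case above, this would amount to something like `attention_mask = to_model_device(model, attention_mask)` right before the mask is passed to the model.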

Thank you, it works now!