Automatically cast input to Huggingface model’s device map

souryadey · March 11, 2024, 9:02pm

This is a question on the Huggingface transformers library.

Is there a way to automatically infer the device of the model when using auto device map, and cast the input tensor to that?

Here’s what I have now:

import transformers
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

MODEL_ID = "<model_id>"  # placeholder: substitute a real model ID
tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
model = transformers.AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map = 'auto')

prompt = "Hello how are you"
prompt_obj = tokenizer(prompt, return_tensors = 'pt').to(DEVICE)
# proceed with model.generate

Instead of hardcoding DEVICE, I’d like to infer it from the model’s device map. Something like:

# inferred_device = <some code that maybe involves model.hf_device_map>
prompt_obj = tokenizer(prompt, return_tensors = 'pt').to(inferred_device)

Is there a way to do this?

nicomanzonelli · May 26, 2024, 3:37am

Hi, the model loaded using Huggingface will have an attribute named hf_device_map which maps the names of certain layers to the device that the layer is physically on. You can use this to map the input to the first layer’s device.

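# hf_device_map preserves insertion order, so the first key is the first mapped module
first_layer_name = list(model.hf_device_map.keys())[0]
# the value is the device for that module, e.g. 0 (GPU index), "cpu", or "disk"
inferred_device = model.hf_device_map[first_layer_name]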
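As a rough sketch of the same idea, the lookup can be wrapped in a small helper. The name infer_input_device is hypothetical (it is not part of transformers), and skipping "disk" entries is my own assumption, since a tensor cannot be moved to "disk". The demo dictionary is made up to mimic the shape of hf_device_map.

```python
def infer_input_device(device_map):
    """Pick a device for model inputs from an hf_device_map-style dict.

    Hypothetical helper, not a transformers API. It returns the device of
    the first entry that is not offloaded to disk, and falls back to "cpu"
    if there is no such entry (an assumption, not library behavior).
    """
    # dicts preserve insertion order, so this walks modules first to last
    for device in device_map.values():
        if device != "disk":
            return device
    return "cpu"


# Made-up example map; a real one comes from model.hf_device_map
example_map = {"model.embed_tokens": 0, "model.layers.0": 0, "lm_head": 1}
inferred_device = infer_input_device(example_map)
```

With a loaded model this would be used as tokenizer(prompt, return_tensors = 'pt').to(infer_input_device(model.hf_device_map)). Transformers models also expose a model.device property, which may be enough in simple cases, though I have not checked how it behaves with every offloading setup.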

Keep in mind that you could run into memory issues if there’s no memory left on the device that holds the first layer.

The pipeline functions abstract this away, so you don’t need to do it manually.