Hello, I’m really struggling with setting up model parallelism for a large T5 model. I have 2 GPUs and I want to run inference-only evaluation of flan-t5-large, since the model does not fit on a single GPU.
I load the model like this (imports included for completeness):

import torch
from transformers import T5ForConditionalGeneration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-large", torch_dtype=torch.float16, device_map="auto"
)
model.parallelize()
model.to(device)
Then I move the inputs to the GPU:

input_ids = batch[0].to(device)
attention_mask = batch[1].to(device)
labels = batch[3].to(device)
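To double-check the batch before the forward call, I use a small helper of my own (the name assert_same_device is mine, not from any library); it only relies on the .device attribute that torch tensors expose:

```python
def assert_same_device(**tensors):
    """Raise if the given tensors do not all report the same .device.

    Works on anything exposing a .device attribute (torch tensors here);
    returns the shared device string on success.
    """
    devices = {name: str(t.device) for name, t in tensors.items()}
    if len(set(devices.values())) > 1:
        raise RuntimeError(f"Tensors on mixed devices: {devices}")
    return next(iter(devices.values()))

# Right before the forward pass:
# assert_same_device(input_ids=input_ids,
#                    attention_mask=attention_mask,
#                    labels=labels)
```

This passes for my batches, which is why I believe the inputs themselves are fine.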
But during the forward pass,

outputs = self.model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)

I get:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
I have checked that all input tensors are on device='cuda:0'. As for the model:
model.device
device(type='cuda', index=0)
model.device_map
{0: [0, 1, 2, 3, 4, 5, 6, 7, 8, ...], 1: [12, 13, 14, 15, 16, 17, 18, 19, 20, ...]}
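To see where the hidden states would have to cross between GPUs, I also wrote a tiny plain-Python helper (device_boundaries is my own hypothetical name) that inverts a parallelize()-style device map and lists adjacent block indices assigned to different devices:

```python
def device_boundaries(device_map):
    """Given a {device: [block indices]} map, return
    (prev_block, next_block, prev_device, next_device) tuples for each
    adjacent pair of blocks that sit on different devices."""
    block_to_device = {
        block: dev for dev, blocks in device_map.items() for block in blocks
    }
    order = sorted(block_to_device)
    return [
        (a, b, block_to_device[a], block_to_device[b])
        for a, b in zip(order, order[1:])
        if block_to_device[a] != block_to_device[b]
    ]

# e.g. device_boundaries({0: [0, 1, 2], 1: [3, 4, 5]}) -> [(2, 3, 0, 1)]
```

Running it on model.device_map confirms there is exactly one crossing point between the blocks on cuda:0 and those on cuda:1, which is where the error seems to occur.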
The error is raised specifically at line 260 of modeling_t5.py, where I see that self.weight is on 'cuda:0' while hidden_states is on 'cuda:1'.
I would really appreciate some help with this! Thanks in advance.