[model.parallelize()] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Hello, I’m really struggling with setting up model parallelism for a large T5 model. I have 2 GPUs and I want to run evaluation (inference only) with a flan-t5-large model, as it does not fit on one GPU.

I load the model:

import torch
from transformers import T5ForConditionalGeneration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = T5ForConditionalGeneration.from_pretrained(
    'google/flan-t5-large', torch_dtype=torch.float16, device_map="auto"
)
model.parallelize()
model.to(device)

I put the inputs on the GPU:

input_ids=batch[0].to(device)
attention_mask=batch[1].to(device)
labels=batch[3].to(device)

But during the forward pass

outputs = self.model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)

I get RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

I checked that all input tensors are on device='cuda:0', and as for the model:

model.device
device(type='cuda', index=0)

model.device_map
{0: [0, 1, 2, 3, 4, 5, 6, 7, 8, ...], 1: [12, 13, 14, 15, 16, 17, 18, 19, 20, ...]}
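
For a fuller check than model.device (which, as far as I know, only reflects where the model's first parameters sit), here is a minimal sketch that counts parameters per device:

from collections import Counter

# Count how many parameter tensors live on each device; after parallelize()
# we should see a mix of cuda:0 and cuda:1.
print(Counter(str(p.device) for p in model.parameters()))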

The error is raised specifically at line 260 of modeling_t5.py, where I see that self.weight is on 'cuda:0' while hidden_states is on 'cuda:1'.
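
In case it is useful to others, one way to surface this kind of mismatch is a forward pre-hook that prints any module whose weight and incoming tensor disagree on device; a rough sketch:

import torch

# Flag every submodule whose weight sits on a different device than the
# hidden states it receives.
def report_device_mismatch(module, args):
    weight = getattr(module, "weight", None)
    if weight is not None and args and torch.is_tensor(args[0]):
        if args[0].device != weight.device:
            print(f"{type(module).__name__}: weight on {weight.device}, input on {args[0].device}")

for submodule in model.modules():
    submodule.register_forward_pre_hook(report_device_mismatch)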

I would really appreciate some help with this! Thanks in advance.

Calling model.parallelize() and then model.to(device) immediately afterwards looks suspicious, as you would be trampling over the .to() calls that parallelize makes internally, so I would consider removing the model.to(device) call from your script.

I removed it and still got the error. But after trying everything, I found that it works if I also remove the device_map="auto" argument when loading the model!

So the above should be:

model = T5ForConditionalGeneration.from_pretrained(
    'google/flan-t5-large', torch_dtype=torch.float16
)
model.parallelize()
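
For what it's worth, my understanding is that device_map="auto" (accelerate's dispatch hooks) and the legacy model.parallelize() are two separate model-parallelism mechanisms, so combining them leaves modules on conflicting devices. Since parallelize() is deprecated in recent transformers releases, the opposite route should also work: keep device_map="auto" and drop parallelize(). A minimal sketch, assuming the accelerate package is installed:

import torch
from transformers import T5ForConditionalGeneration

# Let accelerate split the model across the visible GPUs instead of
# calling the deprecated parallelize().
model = T5ForConditionalGeneration.from_pretrained(
    'google/flan-t5-large', torch_dtype=torch.float16, device_map="auto"
)
# Inputs can stay on cuda:0; the dispatch hooks move activations between
# devices during the forward pass.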

Thanks @eqy!


Thank you @Katerina_Margatina! I ran into the same problem and your solution mysteriously works like a charm!
