LSTM example with multiple GPUs error: module 'torch' has no attribute 'long'

There is an example of an LSTM for PyTorch.

The code below works fine on the CPU or on 1 GPU. However, when I use more than 1 GPU, it gives this error:

AttributeError: module 'torch' has no attribute 'long'

The code that caused the error:

def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)

Why doesn't it work for multiple GPUs? In this example the batch size is 1, so I don't think batching is the issue.

Did you use DataParallel, or how are you using more than 1 GPU?
If so, did you keep batch_size=1 for multiple GPUs?
DataParallel tries to split the batch between all GPUs, so batch_size=1 could be problematic.
However, the error message seems to point to another issue. Could you verify my assumptions?
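To see why batch_size=1 is problematic, here is a pure-Python sketch of how the scatter step sizes the per-GPU sub-batches. It mimics the sizing behavior of torch.chunk along dim 0; the function name chunk_sizes is made up for illustration, not the actual PyTorch implementation:

```python
import math

def chunk_sizes(batch_size, num_gpus):
    """Sketch of how torch.chunk sizes the per-GPU sub-batches
    when DataParallel scatters a batch along dim 0.
    (Illustrative only, not the real PyTorch code.)"""
    step = math.ceil(batch_size / num_gpus)  # elements per chunk
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(step, remaining))
        remaining -= step
    return sizes

print(chunk_sizes(8, 4))  # [2, 2, 2, 2] - every GPU gets work
print(chunk_sizes(1, 4))  # [1] - only one GPU receives data, the rest idle
```

With a batch size of 1 there is only one chunk to hand out, so the remaining replicas have nothing to do.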

I do have all batch sizes equal to 1, same as in that example.

Now it gives me this error:

TypeError: Broadcast function not implemented for CPU tensors

Even though I make sure all inputs into the model are CUDA tensors in GPU mode, it still gives me the above error. It works in CPU mode and in 1-GPU mode, but not in 4-GPU mode.

It’s because the batch size cannot be split between all GPUs.
For 4 GPUs you would need a batch size of at least 4.
Have a look at the DataParallel example.


I understand what you are saying and it makes sense :slight_smile:

However, I got rid of my first error by updating PyTorch, and now my error is (I'm still using a batch size of 1):

TypeError: Broadcast function not implemented for CPU tensors

I understand this error means that in multi-GPU mode I have to make sure all inputs are CUDA tensors, as in this example and this example. Neither example mentions anything about batch size, so I'm wondering if the error is caused by some inputs that I haven't converted to CUDA. See the code below:

if not all(input.is_cuda for input in inputs):
    raise TypeError('Broadcast function not implemented for CPU tensors')
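Note that this guard fails as soon as any single input is on the CPU, so one forgotten tensor (for instance a hidden state created without .cuda()) is enough to trigger it. A tiny stand-in to show the all() semantics (FakeTensor and broadcast_guard are made-up names for illustration):

```python
from collections import namedtuple

# Hypothetical stand-in for a tensor; only the is_cuda flag matters here.
FakeTensor = namedtuple('FakeTensor', 'is_cuda')

def broadcast_guard(inputs):
    # Same check as the snippet above: one CPU tensor is enough to fail.
    if not all(inp.is_cuda for inp in inputs):
        raise TypeError('Broadcast function not implemented for CPU tensors')
    return True

print(broadcast_guard([FakeTensor(True), FakeTensor(True)]))  # True
try:
    # e.g. a hidden state that was never moved to the GPU
    broadcast_guard([FakeTensor(True), FakeTensor(False)])
except TypeError as e:
    print(e)  # Broadcast function not implemented for CPU tensors
```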

I wish those 2 examples included more details :confused:

Ah ok, sorry for the misunderstanding.
Could you post a small code snippet reproducing this error?
It seems your input is still on the CPU as you mentioned.

It is OK :smiley:

I am trying to implement this tutorial.

When there is more than 1 GPU, I run this:
model = nn.DataParallel(model, device_ids=range(torch.cuda.device_count()))

I also change the hidden state line from
model.hidden = model.init_hidden()
to
model.module.hidden = model.module.init_hidden()
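The extra .module hop is needed because nn.DataParallel stores the wrapped model in its module attribute and does not forward custom methods like init_hidden. A pure-Python sketch of that wrapping (ToyModel and DataParallelLike are made-up names, just to illustrate the attribute access):

```python
class ToyModel:
    # Stand-in for the tutorial's LSTM tagger (made-up minimal class).
    def init_hidden(self):
        return ('h0', 'c0')

class DataParallelLike:
    # Sketch of how nn.DataParallel wraps a model: the original model
    # lives in self.module; the wrapper itself has no init_hidden.
    def __init__(self, module):
        self.module = module

model = DataParallelLike(ToyModel())
# model.init_hidden() would raise AttributeError on the wrapper.
hidden = model.module.init_hidden()
print(hidden)  # ('h0', 'c0')
```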

Wherever there is a tensor, I call .cuda() on it. I think this is fine, since it runs with 1 GPU. However, when I use 4 GPUs in the cluster environment, the forward function raises this error:

torch/nn/parallel/", line 11, in forward
raise TypeError('Broadcast function not implemented for CPU tensors')
TypeError: Broadcast function not implemented for CPU tensors

Thank you for helping me out :smiley:

I also ran into this problem. I think before you use nn.DataParallel(model, device_ids=[0, 1, 2, 3]), you should first move the model to the GPU with model.cuda(). So the complete snippet is:

model = model.cuda()
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])