LSTM example with multiple GPU error: module 'torch' has no attribute 'long'

There is an LSTM example for PyTorch.

The code below works fine when using the CPU or 1 GPU. However, when I use more than 1 GPU, it gives this error:

AttributeError: module 'torch' has no attribute 'long'

The code that caused the error:

import torch

def prepare_sequence(seq, to_ix):
    # Map tokens to indices and build a LongTensor (torch.tensor / torch.long need PyTorch >= 0.4)
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)
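
For reference, a hedged usage sketch (word_to_ix and the example sentence are made-up stand-ins for the tutorial's data). Note that torch.tensor and the torch.long dtype only exist in PyTorch 0.4 and later, which is relevant to the AttributeError above:

word_to_ix = {"the": 0, "dog": 1, "ate": 2, "apple": 3}    # made-up vocabulary for illustration
seq = ["the", "dog", "ate", "the", "apple"]                # made-up sentence for illustration
idxs = prepare_sequence(seq, word_to_ix)                   # tensor([0, 1, 2, 0, 3])
idxs = idxs.cuda() if torch.cuda.is_available() else idxs  # move to the GPU when available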

Why doesn't it work for multiple GPUs? In this example the batch size is 1, so I don't think the batch size is the issue.

Did you use DataParallel, or how are you using more than 1 GPU?
If so, did you keep batch_size=1 for multiple GPUs?
DataParallel tries to split the batch between all GPUs, so batch_size=1 could be problematic.
However, the error message seems to point to another issue. Could you verify my assumptions?

I do have all batch sizes equal to 1, the same as in that example.

Now it gives me this error:

TypeError: Broadcast function not implemented for CPU tensors

Even though I make sure all inputs into the model are CUDA tensors in GPU mode, it still gives me the above error. It works in CPU mode and in 1-GPU mode, but not in 4-GPU mode.

It’s because the batch size cannot be split between all GPUs.
For 4 GPUs you would need a batch size of at least 4.
Have a look at the DataParallel example.
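
To illustrate the splitting (a minimal sketch; EchoShape is a made-up module and at least one CUDA device is assumed), nn.DataParallel scatters the input along dimension 0, so each replica only sees its slice of the batch:

import torch
import torch.nn as nn

class EchoShape(nn.Module):
    def forward(self, x):
        # Each GPU replica reports the slice of the batch it received.
        print('replica on', x.device, 'got a batch of', x.size(0))
        return x

model = nn.DataParallel(EchoShape().cuda(), device_ids=range(torch.cuda.device_count()))
model(torch.randn(4, 8).cuda())  # with 4 GPUs, each replica prints a batch of 1

With a batch of 1 there is only one slice to hand out, so only a single replica does any work.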


I understand what you are saying and it makes sense :slight_smile:

However, I got rid of my first error by updating PyTorch, and now my error is the following (I'm still using a batch size of 1):

TypeError: Broadcast function not implemented for CPU tensors

I understand that this error means in multi-GPU mode I have to make sure all inputs are CUDA tensors, as in this example and this example? Neither example mentions anything about batch size, so I'm wondering if the error is caused by some inputs that I haven't converted to CUDA tensors. See the code below:

if not all(input.is_cuda for input in inputs):
    raise TypeError('Broadcast function not implemented for CPU tensors')

I wish those 2 examples included more details :confused:

Ah ok, sorry for the misunderstanding.
Could you post a small code snippet reproducing this error?
It seems your input is still on the CPU as you mentioned.

It is OK :smiley:

I am trying to implement this tutorial.

When there is more than 1 GPU, I run this:
model = nn.DataParallel(model, device_ids=range(torch.cuda.device_count()))

I also change the hidden layer line
from:
model.hidden = model.init_hidden()
to:
model.module.hidden = model.module.init_hidden()

Whenever there is a tensor, I change it to .cuda(). I don't think that part is the issue, since it runs when there is 1 GPU. However, when I use 4 GPUs in the cluster environment, the forward function raises this error:

torch/nn/parallel/_functions.py", line 11, in forward
    raise TypeError('Broadcast function not implemented for CPU tensors')
TypeError: Broadcast function not implemented for CPU tensors

Thank you for helping me out :smiley:
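
As a side note on the model.module change described above: nn.DataParallel wraps the original model, so custom attributes and methods such as init_hidden are only reachable through .module. A minimal sketch (HypotheticalTagger and its init_hidden are made-up stand-ins for the tutorial's LSTMTagger; device placement is left out here and picked up in the reply below):

import torch
import torch.nn as nn

class HypotheticalTagger(nn.Module):
    def __init__(self, hidden_dim=6):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(4, hidden_dim)

    def init_hidden(self):
        # (h_0, c_0) for a single-layer LSTM with batch size 1, as in the tutorial
        return (torch.zeros(1, 1, self.hidden_dim),
                torch.zeros(1, 1, self.hidden_dim))

    def forward(self, seq):
        # seq shape: (seq_len, 1, 4); keeps the stored hidden state, mirroring the tutorial
        out, self.hidden = self.lstm(seq, self.hidden)
        return out

model = nn.DataParallel(HypotheticalTagger(), device_ids=range(torch.cuda.device_count()))

# The DataParallel wrapper has no init_hidden of its own; the original module is at model.module.
model.module.hidden = model.module.init_hidden()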

I also met this problem. I think that before you use nn.DataParallel(model, device_ids=[0, 1, 2, 3]), you should initialize the model by calling model.cuda(), so the complete snippet is:

model.cuda()
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])
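
To expand on why the order matters: the Broadcast step in the traceback above is essentially DataParallel copying the wrapped model's parameters out to each GPU, so those parameters have to be CUDA tensors already, and the inputs have to be CUDA tensors as well. A minimal working sketch of that ordering (a throwaway nn.Linear stands in for the actual LSTM model; at least one GPU and a batch of 4 are assumed):

import torch
import torch.nn as nn

model = nn.Linear(8, 2)
model.cuda()                                  # 1) move the parameters to the GPU first
model = nn.DataParallel(model, device_ids=range(torch.cuda.device_count()))

x = torch.randn(4, 8).cuda()                  # 2) the inputs must be CUDA tensors as well
print(model(x).shape)                         # torch.Size([4, 2])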