nn.DataParallel: TypeError: expected sequence object with len >= 0 or a single integer

In my forward function:

def __call__(self, train=True):
    if train:
        predicted = self.forward(...)
        loss = ...
        return loss  # returning a single value works fine
        # loss.size() = the number of my GPUs
    else:
        predicted = self.forward(...)
        return predicted  # expected sequence object with len >= 0 or a single integer
        # In the validation step I want to return all the predicted labels for another purpose.
        # predicted has shape [16, 1] on each device and I have 4 GPUs.

My code worked before I added model = nn.DataParallel(model).

Hi,

This is hard to say without more context.
Can you share the stack trace, as well as where this __call__ function is defined?

The code is here. It is not very well organized, and it has other problems that prevent me from using nn.DataParallel, but this is the only one I cannot solve.

The first issue is that you should never redefine the __call__ method on a Module; just define forward. Redefining __call__ will prevent it from working nicely with other parts of PyTorch.

More generally, the error most likely comes from the creation of the DataParallel, where the device argument does not have the right type.
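
For example, here is a minimal sketch of the pattern I mean (the Predictor below is just a stand-in with made-up layers and shapes, not your actual model):

import torch
import torch.nn as nn

class Predictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(8, 1)

    # Only define forward; nn.Module.__call__ dispatches to it and keeps
    # hooks and DataParallel working correctly.
    def forward(self, x):
        return self.net(x)

device = torch.device("cuda:0")
model = Predictor().to(device)                            # parameters on device_ids[0]
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])   # device_ids as a list of GPU indices

x = torch.randn(16, 8, device=device)
out = model(x)   # the batch is split along dim 0 across the 4 GPUs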

I’m trying to remove the __call__(). But I don’t understand which part of the device is wrong. To be honest, I don’t know which device input.to(device) and model.to(device) should use when using nn.DataParallel; I just use device_ids[0].
Do you mean the bug is here? Thank you very much!

As mentioned in the DataParallel doc: " The parallelized module must have its parameters and buffers on device_ids[0] before running this DataParallel module."

I can’t find any reference to DataParallel in the repo, so I’m not sure where you do that. But I was talking about the place where you wrap your module in DataParallel.

Sorry, it is just the line before the one I referenced:

model = Predictor(encoder, decoder, device)
# model.load_state_dict(torch.load("output/model/lr=0.001,dropout=0.1,lr_decay=0.5"))
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])  # I added this line
model.to(device)

Another question (sorry, I’m new to PyTorch).


According to an image in a blog post about nn.DataParallel, the first step of the backward pass (computing the loss gradient on GPU-1) results in imbalanced GPU usage.
Does that mean that in DataParallel loss.backward() only happens on GPU-1 and not on the other GPUs, while optimizer.step() and optimizer.zero_grad() are parallel (steps 2, 3, 4 of the backward pass)?
Thank you very much.

What DataParallel does is closer to version 3 of that image: it splits the input across the GPUs and runs on each of them independently, then accumulates the results.
Note that the backward will run on the same device as the forward, whatever the device of the Tensor on which you call .backward().
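
Roughly, DataParallel’s forward does something like this (a sketch along the lines of the PyTorch multi-GPU tutorial, not the exact implementation):

import torch.nn as nn

def data_parallel_sketch(module, input, device_ids, output_device=None):
    # Rough sketch of what DataParallel.forward does internally.
    if output_device is None:
        output_device = device_ids[0]
    replicas = nn.parallel.replicate(module, device_ids)    # copy the module to every GPU
    inputs = nn.parallel.scatter(input, device_ids)         # split the batch along dim 0
    replicas = replicas[:len(inputs)]
    outputs = nn.parallel.parallel_apply(replicas, inputs)  # run forward on each GPU in parallel
    return nn.parallel.gather(outputs, output_device)       # collect the outputs on one device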

Then where does the imbalanced GPU usage come from?
You mean loss.backward() is also parallel, right?
I’m a little confused.

It depends on whether the loss is computed inside the DataParallel or not.
If it is, then there won’t be any imbalance.
If it is outside and just computed on one GPU, then that GPU will do a bit more work indeed.

It depends on whether the loss is computed inside the DataParallel or not.

By “inside the DataParallel”, do you mean inside the forward function? But most of the time the forward function won’t contain the loss computation, right?
I’m also confused about whether the imbalance comes from loss.backward() or from loss = criterion(true, pred).
Thank you for your patience!

The DataParallel takes a Module as input, so it can contain anything you want :slight_smile:
And yes, what is executed in parallel is what is in the forward function of your Module.

The imbalance won’t come from loss.backward() because it runs on the same devices as the forward. So if the forward is balanced, the backward will be as well.
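
If you want the loss included, a minimal sketch of that pattern (the ModelWithLoss wrapper here is just an illustration, not something from your repo):

import torch.nn as nn

class ModelWithLoss(nn.Module):
    # Hypothetical wrapper: the loss is computed inside forward, so each
    # DataParallel replica computes it on its own chunk of the batch.
    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, x, target):
        pred = self.model(x)
        loss = self.criterion(pred, target)
        # Return a 1-element tensor so the gather step stacks one loss per GPU.
        return loss.unsqueeze(0), pred

# Usage sketch:
# wrapped = nn.DataParallel(ModelWithLoss(model, criterion).to(device))
# losses, preds = wrapped(x, y)   # losses has one entry per GPU
# losses.mean().backward()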

Another weird problem with nn.DataParallel.
In my main.py I move the model to the device:

encoder = Encoder(protein_dim, hid_dim, n_layers, kernel_size, dropout)
decoder = Decoder(atom_dim, hid_dim, n_layers, n_heads, pf_dim, DecoderLayer, SelfAttention,
                  PositionwiseFeedforward, dropout)
model = Predictor(encoder, decoder)
# model.load_state_dict(torch.load("output/model/lr=0.001,dropout=0.1,lr_decay=0.5"))
model = nn.DataParallel(model)
model.to(device)

trainer = Trainer(model, lr, weight_decay, scaler)
tester = Tester(model)
loss_train = trainer.train(train_dl, device=device)  # This line throws the error

But I got the following error:

assert all(map(lambda i: i.is_cuda, inputs))
AssertionError

I have tested all the model.parameters() and inputs in train():

def train(self, dataloader, device):
    self.model.train()

    if self.scaler is None:
        for i, data_pack in enumerate(dataloader):
            data_pack = to_cuda(data_pack, device=device)

            assert (all(map(lambda i: i.is_cuda, self.model.parameters())))
            assert (all(map(lambda i: i.is_cuda, data_pack)))
            loss, _, _ = self.model(data_pack)  # This line throws the error

            self.optimizer.zero_grad()
            loss.sum().backward()
            self.optimizer.step()

Both asserts pass (the results are all True). But I still get this error on the line loss, _, _ = self.model(data_pack).
What happened?
This is my forward function:

def forward(self, data):
    compound, adj, protein, correct_interaction, atom_num, protein_num = data
    # compound = [batch,atom_num, atom_dim]
    # adj = [batch,atom_num, atom_num]
    # protein = [batch,protein len, 100]

    compound_max_len = compound.shape[1]
    protein_max_len = protein.shape[1]
    compound_mask, protein_mask = self.make_masks(atom_num, protein_num, compound_max_len, protein_max_len)
    compound = self.gcn(compound, adj)
    # compound = torch.unsqueeze(compound, dim=0)
    # compound = [batch size=1 ,atom_num, atom_dim]

    # protein = torch.unsqueeze(protein, dim=0)
    # protein =[ batch size=1,protein len, protein_dim]
    enc_src = self.encoder(protein)
    # enc_src = [batch size, protein len, hid dim]

    predicted_interaction = self.decoder(compound, enc_src, compound_mask, protein_mask)
    # out = [batch size, 2]
    # out = torch.squeeze(out, dim=0)
    loss = self.Loss(predicted_interaction, correct_interaction.view(-1, 1))
    return torch.unsqueeze(loss, 0), predicted_interaction.cpu().detach().view(-1, 1), correct_interaction.cpu().detach().view(-1, 1)

Thank you very much !!!

From the DataParallel doc, you should send your model to the device before wrapping it in DataParallel!

You mean

# Is this order wrong?
model = nn.DataParallel(model)
model.to(device)

# And is this order right?
model.to(device)
model = nn.DataParallel(model)

But in the doc

BTW, where is the complete documentation for nn.DataParallel?

Here: https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html#torch.nn.DataParallel
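
For reference, the order implied by the note in that doc, as a minimal sketch based on your snippet above (assuming device is cuda:0, i.e. device_ids[0]):

device = torch.device("cuda:0")             # must be device_ids[0]

model = Predictor(encoder, decoder)
model.to(device)                             # parameters/buffers on device_ids[0] first
model = nn.DataParallel(model, device_ids=[0, 1, 2, 3])

# calling model(batch) then scatters the batch across the GPUs along dim 0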

I have tested both orders. It doesn’t help.
Can you please look at the code?

Is the error just with your own assert that checks if things are on the GPU?

No. I just added those asserts to verify, and all the inputs and parameters are on CUDA. So I’m confused.