Batch size does NOT match after DataParallel forward

(James Jing Tang) #1

Hi, I trained a model on multiple GPUs using DataParallel, but on the last batch it raised ValueError: Expected input batch_size to match target batch_size. Does anyone know why?

model = torch.nn.DataParallel(model, device_ids=[0, 1, 2]).cuda()  # parameters must live on device_ids[0]
criterion = torch.nn.CrossEntropyLoss()
inputs = Batch.inputs.cuda()
targets = Batch.targets.cuda()
assert inputs.size(0) == targets.size(0)  # passes: inputs and targets agree here
logits = model(inputs)
loss = criterion(logits, targets)  # raises ValueError on the last batch


Could you print the shapes of the tensors yielding this error?
To get rid of it for now, you could specify drop_last=True in your DataLoader.

My guess is that the last batch has a single sample that gets squeezed somewhere, so that a size mismatch occurs.
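To illustrate the drop_last workaround: a minimal sketch with a made-up dataset (the sizes and shapes below are assumptions chosen to mirror the thread, not the poster's actual data) showing that drop_last=True discards the trailing incomplete batch, so every batch the model sees has exactly batch_size samples.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 100 samples with the same per-sample shape as in the thread
inputs = torch.randn(100, 128, 512)
targets = torch.randint(0, 311, (100,))
dataset = TensorDataset(inputs, targets)

# drop_last=True: 100 // 12 = 8 full batches; the trailing 4 samples are dropped
loader = DataLoader(dataset, batch_size=12, shuffle=True, drop_last=True)
sizes = [x.size(0) for x, y in loader]
```

With drop_last=False (the default), the same loader would yield a ninth batch of 4 samples, which is the kind of short final batch that can trigger shape surprises.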

(James Jing Tang) #3

The shape of the targets tensor in the last yielded batch is (12,), and the shape of the inputs tensor is (12, 128, 512). But the shape of the logits tensor is (8, 311).
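For reference, DataParallel scatters the input along dim 0 across the listed devices and gathers the per-replica outputs back along dim 0, so a batch of 12 over 3 GPUs should come back as 12 rows of logits. A quick CPU-only sketch of that split arithmetic (no GPUs needed, just chunk semantics):

```python
import torch

# DataParallel splits the batch along dim 0; with 3 devices a batch of 12
# becomes three chunks of 4, and the gathered output should again be 12 rows.
batch = torch.randn(12, 128, 512)
chunks = batch.chunk(3, dim=0)  # what each of the 3 replicas would receive
chunk_sizes = [c.size(0) for c in chunks]  # [4, 4, 4]
```

So logits of shape (8, 311) mean 4 rows went missing inside the model rather than in the loader. Without seeing the forward method it is hard to say where, but one common culprit (a guess on my part, not confirmed by this thread) is a reshape that assumes a fixed global batch size instead of using the per-replica size, e.g. deriving shapes from a stored attribute rather than from x.size(0).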