Hi, I trained a model on multiple GPUs using DataParallel, but on the last batch it raised ValueError: Expected input batch_size to match target batch_size. Does anyone know why?
import torch

# replicate the model across GPUs 0-2; inputs are scattered along dim 0
model = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
criterion = torch.nn.CrossEntropyLoss()

inputs = Batch.inputs.cuda()
targets = Batch.targets.cuda()
assert inputs.size(0) == targets.size(0)  # batch sizes agree at this point

logits = model(inputs)  # per-GPU outputs are gathered back onto device 0
loss = criterion(logits, targets)  # the ValueError is raised here on the last batch
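For context, DataParallel splits the input batch along dimension 0 into roughly equal chunks, one per device (following torch.Tensor.chunk semantics: chunk size is the ceiling of batch_size / num_devices). A minimal pure-Python sketch of that split, so you can see how a short final batch lands on the GPUs:

```python
from math import ceil

def scatter_sizes(batch_size, num_devices):
    """Mimic how DataParallel chunks a batch across devices:
    chunks of ceil(batch_size / num_devices), last chunk possibly smaller."""
    chunk = ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes

print(scatter_sizes(12, 3))  # [4, 4, 4]  -- a full batch splits evenly
print(scatter_sizes(10, 3))  # [4, 4, 2]  -- a short last batch splits unevenly
print(scatter_sizes(2, 3))   # [1, 1]     -- fewer chunks than devices
```

An uneven or partial split is normal and, by itself, should still gather back to the original batch size, which is why printing the shapes is the first step.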
ptrblck (December 6, 2018, 12:09pm):
Could you print the shapes of the tensors yielding this error? To get rid of it for now, you could specify drop_last=True in your DataLoader. I guess the last batch might contain a single sample that gets squeezed somewhere, so that a size mismatch occurs.
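The drop_last flag lives on torch.utils.data.DataLoader (e.g. DataLoader(dataset, batch_size=32, drop_last=True)). A minimal pure-Python sketch of what it does, assuming a hypothetical make_batches helper for illustration:

```python
def make_batches(dataset_len, batch_size, drop_last=False):
    """Illustrate DataLoader batching: with drop_last=True the final
    incomplete batch is discarded, so every batch has a uniform size."""
    sizes = []
    for start in range(0, dataset_len, batch_size):
        size = min(batch_size, dataset_len - start)
        if size < batch_size and drop_last:
            break  # skip the short trailing batch
        sizes.append(size)
    return sizes

print(make_batches(100, 32))                  # [32, 32, 32, 4]
print(make_batches(100, 32, drop_last=True))  # [32, 32, 32]
```

This only hides the symptom by guaranteeing uniform batch sizes; the underlying shape bug is still worth finding.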
The shape of the targets tensor in the last yielded batch is (12,), and the shape of the inputs tensor is (12, 128, 512), but the shape of the logits tensor is (8, 311).
Thanks!