DataParallel only outputs half of the total batch size on 2 GPUs

Hi all,

When I run the following conv-LSTM model with an input of shape (20, 1, 4, 128, 128), where the batch is in dimension 0:

# shape (batch, time, channels, height, width)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(model, device_ids=[0, 1])
model = model.to(device)
a = torch.randn(20, 1, 4, 128, 128).to(device)
output, _ = model(initial_states, a)

The output shape is (10, 2, 4, 128, 128) rather than (20, 2, 4, 128, 128). Any reason why this is the case?



I found the reason for the error: my forward method has multiple arguments (i.e. model.forward(self, x, h, c) rather than just model.forward(self, x)). However, I am using a conv-LSTM architecture, so the other arguments h and c are required. What is the solution in this case?
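One thing worth noting: nn.DataParallel scatters every tensor argument of forward along dim 0 (its default scatter dimension), not just the first one. So one workaround is to make sure h and c are also batch-first tensors of the same batch size as x; then each replica receives matching slices of all three arguments. Below is a minimal, CPU-runnable sketch using a hypothetical ConvLSTMWrapper (not the poster's actual model) to illustrate the idea:

```python
import torch
import torch.nn as nn

class ConvLSTMWrapper(nn.Module):
    # Hypothetical stand-in for the poster's conv-LSTM: every forward
    # argument is batch-first, so nn.DataParallel can scatter x, h and c
    # consistently along dim 0.
    def __init__(self, channels=4):
        super().__init__()
        # single conv producing the four LSTM gates from [x, h]
        self.gates = nn.Conv2d(2 * channels, 4 * channels, 3, padding=1)

    def forward(self, x, h, c):
        # x, h, c all shaped (batch, channels, height, width)
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

model = ConvLSTMWrapper()
if torch.cuda.device_count() > 1:
    # each replica gets a (10, 4, 128, 128) slice of every argument
    model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

x = torch.randn(20, 4, 128, 128)
h = torch.zeros(20, 4, 128, 128)
c = torch.zeros(20, 4, 128, 128)
out, _ = model(x, h, c)
print(out.shape)  # torch.Size([20, 4, 128, 128]) -- full batch again
```

The key point is only the batch-first layout of the extra arguments; if the states are stored e.g. as (num_layers, batch, ...), DataParallel will split them along the wrong dimension and the gathered output will not line up with the input batch.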

cc @ptrblck


Note that nn.DataParallel is in maintenance mode, so you should use DistributedDataParallel instead, which should not suffer from these issues since each process uses its own input.
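For reference, here is a minimal DDP sketch that runs on CPU with the gloo backend (a hypothetical TinyConvLSTMCell stands in for the actual model; the address/port values are arbitrary). Each of the two processes feeds its own 10-sample slice of the 20-sample batch, so no argument scattering is needed and multi-argument forward signatures work unchanged:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class TinyConvLSTMCell(nn.Module):
    # Hypothetical stand-in for the poster's conv-LSTM cell.
    def __init__(self, channels=4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, h, c):
        # multi-argument forward is fine under DDP: each process
        # calls it with its own local tensors
        h = torch.tanh(self.conv(x) + h)
        return h, (h, c)

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    model = DDP(TinyConvLSTMCell())
    # each process owns its slice: 10 samples here, 20 total across 2 processes
    x = torch.randn(10, 4, 128, 128)
    h = torch.zeros(10, 4, 128, 128)
    c = torch.zeros(10, 4, 128, 128)
    out, _ = model(x, h, c)
    assert out.shape == (10, 4, 128, 128)
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)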