How to fetch the full batch of output when using 2 GPUs

Hi, I am a newbie to PyTorch. Could anyone help me with this problem? Thanks a lot~
I just found that the batch size of the output is 1/2 of the input when running my model on two GPUs. So, how do I get the whole batch of output?
Part of my code is as follows:

import time
import torch
import torch.nn as nn

device_ids = [0, 1]
# model is defined earlier; wrap it so the batch is split across both GPUs
model = nn.DataParallel(model, device_ids=device_ids)
model = model.cuda()

print('start')
start = time.time()
for i in range(1):
    inputs = torch.zeros(100, 64, 256).cuda()
    outputs = model(inputs)
    print(outputs.shape)
end = time.time()
print('time cost:', end - start)

where the second dim, ‘64’, represents the batch size.
Here are the results:

torch.Size([260, 50, 10000])
time cost: 3.447650671005249

Obviously, the printed batch dimension is 1/2 of the input.

I am sure that both GPUs are being used in this situation.

I found the tutorial here: https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html

So, I guess there are no particularly serious errors in my setup, right? But I don’t know how to retrieve the complete output…
P.S.: The inputs in my case are mel-spectrograms of speech clips with different durations, and I don’t know how to use a DataLoader with data of different dimensions when global padding is not feasible.
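
For reference, one common way to batch variable-length clips without global padding is a custom collate_fn that pads only up to the longest clip in each batch, e.g. with torch.nn.utils.rnn.pad_sequence. A minimal sketch (the (n_frames, n_mels) layout and the names collate_mels/dataset are assumptions, not from the original post):

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Hypothetical: each item is a mel-spectrogram of shape (n_frames, n_mels),
# where n_frames differs between clips.
def collate_mels(batch):
    lengths = torch.tensor([mel.shape[0] for mel in batch])
    # Pad only up to the longest clip in this batch (batch-level padding,
    # not one global length).
    padded = pad_sequence(batch, batch_first=True)  # (batch, max_frames, n_mels)
    return padded, lengths

# loader = DataLoader(dataset, batch_size=32, collate_fn=collate_mels)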

nn.DataParallel will chunk the input data in dim0, so you should make sure to permute your data and adapt the model if necessary.
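
For example, if the batch dimension is currently dim1, a minimal sketch of the permute-before/permute-after pattern could look like this (ToyModel is just a hypothetical stand-in for the real model, assumed to operate on the last, feature dimension):

import torch
import torch.nn as nn

class ToyModel(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(256, 256)

    def forward(self, x):
        return self.linear(x)

model = nn.DataParallel(ToyModel(), device_ids=[0, 1]).cuda()

inputs = torch.zeros(100, 64, 256).cuda()  # (seq_len, batch, features)
inputs = inputs.permute(1, 0, 2)           # -> (batch, seq_len, features), batch now in dim0
outputs = model(inputs)                    # DataParallel splits and re-gathers along dim0
outputs = outputs.permute(1, 0, 2)         # back to (seq_len, batch, features) if needed
print(outputs.shape)                       # batch dimension (64) is intact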

@ptrblck Thank you very much for your reply.

Is dim0 the batch dimension? If so, the inputs in my code no longer need to be permuted, right? Because the ‘100’ in dim0 would indicate the batch size.
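
A quick way to check which dimension gets split is to print the shape each replica receives inside forward. A minimal sketch (ShapeProbe is a hypothetical module used only for this check):

import torch
import torch.nn as nn

class ShapeProbe(nn.Module):
    def forward(self, x):
        # Each replica only sees its chunk of the scattered input.
        print('chunk on', x.device, 'has shape', x.shape)
        return x

probe = nn.DataParallel(ShapeProbe(), device_ids=[0, 1]).cuda()
_ = probe(torch.zeros(100, 64, 256).cuda())
# With two GPUs this should print torch.Size([50, 64, 256]) twice,
# i.e. dim0 (100) is the dimension that gets split across devices.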

I thought dim1 represents the batch size: