Batch size in DataParallel

Vincent_Zhang · December 3, 2018, 9:10am

The DataParallel docs say:

The batch size should be larger than the number of GPUs used.

When using DataParallel together with a DataLoader, should we always set drop_last=True, in case a batch ends up no larger than the number of GPUs? In some cases a batch size smaller than the number of GPUs works fine, but in others it doesn't. Here is an example:

import torch
import torch.nn as nn
from torch.nn import DataParallel


class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.conv = nn.Conv2d(in_channels=1024, out_channels=19, kernel_size=1)

    def forward(self, *input, **kwargs):
        # Only the first positional argument is used; extra kwargs are ignored.
        return self.conv(input[0])


if __name__ == "__main__":
    my_model = SimpleModel().cuda()
    my_model = DataParallel(my_model, device_ids=[0, 1])  # use 2 GPUs
    input_val = torch.ones(1, 1024, 16, 16).cuda()  # batch size is 1, smaller than 2
    output = my_model(input_val)  # this works fine
    print(output.shape)
    output = my_model(input_val, arbitrary_arg='test')  # this causes an error in forward() on the 2nd GPU

My questions: where in the DataParallel implementation could I find a clue about this behavior?
And how should we avoid it? Or should we just set drop_last=True?
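As far as I can tell, the clue is in how DataParallel splits its arguments: positional inputs and keyword arguments are scattered separately (the helper is `scatter_kwargs` in `torch/nn/parallel/scatter_gather.py`), tensors are split with `torch.chunk` semantics, and non-tensor objects such as the string `'test'` are copied once per device. When the two lists differ in length, the shorter one is padded with empty tuples or dicts. Below is a pure-Python model of that logic, not PyTorch's actual code: lists stand in for tensors, and `chunk`, `scatter_obj` and `replica_forward` are hypothetical names chosen for illustration. It shows why a batch of 1 works without kwargs but fails with one.

```python
import math
from typing import Any, List, Tuple


def chunk(batch: list, n_chunks: int) -> List[list]:
    # Mimics torch.chunk: each piece has ceil(len / n_chunks) items, so a
    # small batch can yield FEWER than n_chunks pieces (1 item -> 1 piece).
    if not batch:
        return []
    size = math.ceil(len(batch) / n_chunks)
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def scatter_obj(obj: Any, n_devices: int) -> List[Any]:
    # Simplified model: lists stand in for tensors and are chunked; tuples and
    # dicts are scattered element-wise; any other object (e.g. the string
    # 'test') is copied once per device.
    if isinstance(obj, list):
        return chunk(obj, n_devices)
    if isinstance(obj, tuple):
        return [tuple(parts) for parts in zip(*(scatter_obj(x, n_devices) for x in obj))]
    if isinstance(obj, dict):
        return [dict(parts) for parts in zip(*(scatter_obj(kv, n_devices) for kv in obj.items()))]
    return [obj for _ in range(n_devices)]


def scatter_kwargs(inputs: tuple, kwargs: dict, n_devices: int) -> Tuple[tuple, tuple]:
    # Scatter positional args and kwargs separately, then pad the shorter
    # list with empty tuples / dicts so both have one entry per replica.
    scattered_inputs = scatter_obj(inputs, n_devices) if inputs else []
    scattered_kwargs = scatter_obj(kwargs, n_devices) if kwargs else []
    if len(scattered_inputs) < len(scattered_kwargs):
        scattered_inputs.extend(() for _ in range(len(scattered_kwargs) - len(scattered_inputs)))
    elif len(scattered_kwargs) < len(scattered_inputs):
        scattered_kwargs.extend({} for _ in range(len(scattered_inputs) - len(scattered_kwargs)))
    return tuple(scattered_inputs), tuple(scattered_kwargs)


def replica_forward(*input, **kwargs):
    # Mirrors SimpleModel.forward, which reads input[0].
    return input[0]


if __name__ == "__main__":
    batch = ["sample0"]  # batch size 1, like torch.ones(1, 1024, 16, 16)
    for kwargs in ({}, {"arbitrary_arg": "test"}):
        scattered_inputs, scattered_kwargs = scatter_kwargs((batch,), kwargs, n_devices=2)
        for device, (inp, kw) in enumerate(zip(scattered_inputs, scattered_kwargs)):
            try:
                replica_forward(*inp, **kw)
                print(f"kwargs={kwargs}: replica {device} ok")
            except IndexError:
                print(f"kwargs={kwargs}: replica {device} got no input -> IndexError")
```

In this model, without kwargs there is only one input chunk, so only one replica runs. With `arbitrary_arg='test'`, the kwargs dict is copied to both devices, the inputs are padded with an empty tuple, and the second replica's `input[0]` raises `IndexError`, which matches the error in the example above.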

Deepali · December 4, 2018, 5:36am

drop_last=True gives every batch the same size, so I suppose there is no harm in using it.
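To check whether drop_last=True actually matters for a given setup, here is a small sketch of the batch sizes a sequential batching loader yields. It assumes DataLoader's default batching (full batches of `batch_size`, then one partial batch unless `drop_last=True`); `batch_sizes` and `too_small_batches` are hypothetical helpers, not PyTorch APIs. In PyTorch itself the flag is passed as `DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)`.

```python
from typing import List


def batch_sizes(dataset_len: int, batch_size: int, drop_last: bool) -> List[int]:
    # Assumed to match DataLoader's default batching: full batches first,
    # then one partial batch with the leftover samples unless drop_last=True.
    full, rest = divmod(dataset_len, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_last:
        sizes.append(rest)
    return sizes


def too_small_batches(dataset_len: int, batch_size: int, n_gpus: int,
                      drop_last: bool) -> List[int]:
    # Indices of batches that would have fewer samples than GPUs.
    sizes = batch_sizes(dataset_len, batch_size, drop_last)
    return [i for i, size in enumerate(sizes) if size < n_gpus]


if __name__ == "__main__":
    # 101 samples, batch_size=4, 2 GPUs: the last batch has a single sample.
    print(too_small_batches(101, 4, n_gpus=2, drop_last=False))  # [25]
    print(too_small_batches(101, 4, n_gpus=2, drop_last=True))   # []
    # Samples skipped per epoch when drop_last=True:
    print(101 - sum(batch_sizes(101, 4, drop_last=True)))        # 1
```

The trade-off is that drop_last=True skips up to `batch_size - 1` samples per epoch; with `shuffle=True` a different subset is skipped each epoch. It also only helps when `batch_size` itself is at least the number of GPUs.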