The docs of DataParallel say that
The batch size should be larger than the number of GPUs used.
When used together with a DataLoader, should we always set drop_last=True in case the batch size ends up smaller than the number of GPUs? In some cases a batch size smaller than the GPU count works fine, but in others it doesn't. Below is an example.
import torch
import torch.nn as nn
from torch.nn import DataParallel

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.conv = nn.Conv2d(in_channels=1024, out_channels=19, kernel_size=1)

    def forward(self, *input, **kwargs):
        return self.conv(input[0])

my_model = SimpleModel().cuda()
my_model = DataParallel(my_model, device_ids=[0, 1])  # use 2 GPUs

input_val = torch.ones(1, 1024, 16, 16).cuda()  # batch size is 1, smaller than 2
output = my_model(input_val, arbitrary_arg='test')  # raises an error in forward() on the 2nd GPU
output = my_model(input_val)  # this works fine
print(output.shape)
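My current reading of the source (torch/nn/parallel/scatter_gather.py) is that a tensor batch of size 1 can only be split into one chunk, while non-tensor kwargs get replicated once per device; the two lists are then padded to equal length with empty placeholders, so the second replica is called with no positional input and `input[0]` fails. The helper below is a simplified pure-Python sketch of that padding logic, not the real implementation (list slicing stands in for the actual tensor chunking across GPUs):

```python
# Simplified sketch of the padding step in DataParallel's scatter logic
# (modeled loosely on torch.nn.parallel.scatter_gather.scatter_kwargs).

def fake_scatter(batch, kwargs, num_devices):
    # A tensor batch is split into at most num_devices chunks, but a batch
    # of size 1 can only produce a single chunk.
    inputs = [batch[i:i + 1] for i in range(min(len(batch), num_devices))]
    # Non-tensor kwargs are simply replicated once per device.
    scattered_kwargs = [dict(kwargs) for _ in range(num_devices)] if kwargs else []
    # The shorter list is padded with empty placeholders to match the longer one.
    if len(inputs) < len(scattered_kwargs):
        inputs.extend(() for _ in range(len(scattered_kwargs) - len(inputs)))
    elif len(scattered_kwargs) < len(inputs):
        scattered_kwargs.extend({} for _ in range(len(inputs) - len(scattered_kwargs)))
    return inputs, scattered_kwargs

# Batch of 1 sample, 2 "devices", with an extra kwarg: the second replica
# receives an empty tuple as its positional input, so input[0] raises.
inputs, kwargs = fake_scatter(["sample0"], {"arbitrary_arg": "test"}, 2)
print(inputs)   # [['sample0'], ()]
print(kwargs)   # [{'arbitrary_arg': 'test'}, {'arbitrary_arg': 'test'}]

# Without kwargs, only one chunk exists, only one replica runs, and it works.
inputs2, kwargs2 = fake_scatter(["sample0"], {}, 2)
print(inputs2)  # [['sample0']]
```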
My questions: where in the implementation of DataParallel could I find a clue about this behavior? And how should we avoid it, or is simply setting drop_last=True the right fix?
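For what it's worth, setting drop_last=True on the DataLoader does guarantee that every batch has the full batch_size (as long as the dataset holds at least batch_size samples), at the cost of discarding the leftover samples. A small CPU-only sketch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 5 samples with batch_size=2: the final batch would hold only 1 sample,
# which is fewer than 2 GPUs.
dataset = TensorDataset(torch.ones(5, 1024, 16, 16))

# drop_last=True discards the incomplete final batch, so every batch that
# reaches DataParallel has exactly batch_size (here 2) samples.
loader = DataLoader(dataset, batch_size=2, drop_last=True)
sizes = [batch[0].shape[0] for batch in loader]
print(sizes)  # [2, 2] -- the leftover 5th sample is dropped
```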