The DataParallel documentation mentions:

> The batch size should be larger than the number of GPUs used.
During training, one can pass `drop_last=True` to the `DataLoader` to ensure that no batch is smaller than the number of GPUs.
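For reference, a minimal sketch of that training-time setup (the dataset, feature size, and batch size here are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 100 samples, 10 features, binary labels.
train_dataset = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))

# drop_last=True discards the final incomplete batch (here 100 % 8 = 4
# samples), so every batch that reaches the model has exactly 8 samples.
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, drop_last=True)
```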
At evaluation time, however, you cannot simply drop the last batch: every sample should be evaluated, even if skipping a few might not change the metrics much. Depending on the size of the dataset, the last batch may end up smaller than the number of GPUs (a minimal sketch reproducing this is given after the list below).
How should one handle this situation elegantly?
- I want to use DataParallel at evaluation time.
- I want to evaluate all batches, including the last batch.
- The last batch can have size smaller than number of GPUs.
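Here is a minimal sketch that reproduces the situation; the model, dataset sizes, and GPU count are hypothetical:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical model; assume 4 visible GPUs.
model = nn.DataParallel(nn.Linear(10, 2)).cuda()
model.eval()

# 98 samples with batch_size=8 leave a final batch of 98 % 8 = 2 samples,
# which is fewer than the 4 GPUs that DataParallel scatters across.
eval_dataset = TensorDataset(torch.randn(98, 10))
eval_loader = DataLoader(eval_dataset, batch_size=8, shuffle=False)  # drop_last defaults to False

with torch.no_grad():
    for (inputs,) in eval_loader:
        outputs = model(inputs.cuda())  # final iteration: 2 samples, 4 GPUs
```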
What should I do?