How to handle the last batch during evaluation when using DataParallel

Hi,

In the DataParallel documentation it is mentioned that:

The batch size should be larger than the number of GPUs used.

When using DataLoader during training, one can specify drop_last=True to make sure that no batch has a size smaller than the number of GPUs.

But when evaluating the model, you cannot simply drop the last batch. You should evaluate that batch as well (although it might not make a lot of difference). Depending on the size of the Dataset, the last batch may have a size smaller than the number of GPUs.
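
To make the arithmetic concrete, here is a small pure-Python sketch of the batch sizes you get when a dataset is split into fixed-size batches. The helper `batch_sizes` and the GPU count are hypothetical and only mirror what DataLoader's default batching does; they are not a PyTorch API.

```python
from typing import List


def batch_sizes(dataset_len: int, batch_size: int, drop_last: bool) -> List[int]:
    """Batch sizes produced when `dataset_len` samples are grouped into
    batches of `batch_size` (mirrors DataLoader's default batching)."""
    full, rem = divmod(dataset_len, batch_size)
    sizes = [batch_size] * full
    # The leftover samples form one smaller final batch unless drop_last is set.
    if rem and not drop_last:
        sizes.append(rem)
    return sizes


num_gpus = 4  # hypothetical GPU count
sizes = batch_sizes(1002, batch_size=100, drop_last=False)
print(sizes[-1], sizes[-1] < num_gpus)  # 2 True
```

So 1002 samples with batch_size=100 give ten batches of 100 plus a last batch of 2, which is smaller than 4 GPUs. With drop_last=True that batch of 2 is simply skipped, which is fine for training but means 2 samples are never evaluated.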

How should one handle this situation in an elegant way?

In summary:

  • I want to use DataParallel at evaluation time.
  • I want to evaluate all batches, including the last batch.
  • The last batch can have a size smaller than the number of GPUs.

What should I do?

You can feed the last batch through a non-parallelized version of your model, or you can pad the final batch with zeros and ignore the outputs for the padded entries.
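
Both options can be sketched roughly like this. The helper names `pad_batch`, `predict_padded` and `predict_fallback` are hypothetical; the only PyTorch pieces used are `nn.DataParallel` and its `.module` attribute (the wrapped, non-parallel model). The padding trick assumes the model is in eval mode and that each sample's output depends only on that sample (true for BatchNorm in eval mode, not in train mode).

```python
import torch
import torch.nn as nn


def pad_batch(batch: torch.Tensor, num_gpus: int):
    """Zero-pad dim 0 so the batch has at least `num_gpus` rows.

    Returns the (possibly padded) batch and the number of real rows.
    """
    n_real = batch.size(0)
    n_pad = max(0, num_gpus - n_real)
    if n_pad > 0:
        # Zero rows with the same dtype, device and trailing shape as the batch.
        padding = batch.new_zeros((n_pad, *batch.shape[1:]))
        batch = torch.cat([batch, padding], dim=0)
    return batch, n_real


@torch.no_grad()
def predict_padded(model: nn.Module, batch: torch.Tensor, num_gpus: int) -> torch.Tensor:
    """Pad with zeros, run the model, then keep only the outputs of real rows."""
    padded, n_real = pad_batch(batch, num_gpus)
    return model(padded)[:n_real]


@torch.no_grad()
def predict_fallback(model: nn.Module, batch: torch.Tensor, num_gpus: int) -> torch.Tensor:
    """Send a too-small batch through the unwrapped (non-parallel) module.

    On GPU the batch must already be on the device holding model.module
    (device_ids[0], usually cuda:0).
    """
    if isinstance(model, nn.DataParallel) and batch.size(0) < num_gpus:
        return model.module(batch)  # single-device forward, no scatter/gather
    return model(batch)


# Hypothetical toy usage on CPU; on a multi-GPU machine you would move the
# module to cuda:0 before wrapping it and use torch.cuda.device_count().
torch.manual_seed(0)
model = nn.DataParallel(nn.Linear(4, 2)).eval()
last_batch = torch.randn(3, 4)
out_padded = predict_padded(model, last_batch, num_gpus=8)
out_fallback = predict_fallback(model, last_batch, num_gpus=8)
```

Both calls should give the same 3 x 2 result; the fallback avoids wasted compute on padding rows, while padding keeps every batch on the parallel path.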

What I found is that, although the docs say the batch size should be larger than the number of GPUs, at least during evaluation there is no error if the batch size is smaller than the number of GPUs.

So I guess the answer is to do nothing: do not drop the last batch when evaluating, and it won't cause an error.
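
A minimal sketch of that "do nothing" approach: an evaluation loop with drop_last=False that collects predictions for every sample, including the short last batch. The toy dataset, the `evaluate` helper and the sizes are hypothetical. As far as I can tell from the DataParallel source, a batch smaller than the number of GPUs is scattered into fewer chunks and the module is only replicated onto that many devices, which is why no error is raised; on a CPU-only machine DataParallel is skipped here and the plain model runs instead.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model: nn.Module, loader: DataLoader) -> torch.Tensor:
    """Run the model over every batch (including a short last one)."""
    model.eval()
    device = next(model.parameters()).device
    outputs = []
    for (x,) in loader:
        outputs.append(model(x.to(device)).cpu())
    return torch.cat(outputs, dim=0)


# Hypothetical toy setup: 10 samples with batch_size=4 -> batches of 4, 4, 2.
torch.manual_seed(0)
dataset = TensorDataset(torch.randn(10, 4))
loader = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=False)

model = nn.Linear(4, 2)
if torch.cuda.is_available():
    # DataParallel expects the module on device_ids[0] (cuda:0 by default).
    model = nn.DataParallel(model.cuda())

preds = evaluate(model, loader)
```

Every sample gets a prediction, so metrics computed from `preds` cover the whole dataset.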