If I set torch.backends.cudnn.benchmark=True, should I also set drop_last in DataLoader?


It’s commonly said that if one wants to set torch.backends.cudnn.benchmark=True to speed up PyTorch computation, one should ensure that the input batch sizes stay constant.

This raises a question. As we know, the size of a batch produced by the DataLoader does not always equal the batch_size parameter we pass to it, because the dataset size is not always divisible by batch_size. That’s why DataLoader has a drop_last parameter, which determines whether to drop the last, incomplete batch.
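To make this concrete, here is a small pure-Python sketch that mirrors how DataLoader splits a dataset into batches (batch_sizes is a hypothetical helper, not part of torch):

```python
def batch_sizes(dataset_size, batch_size, drop_last=False):
    """Mimic how torch.utils.data.DataLoader splits a dataset into batches."""
    n_full, remainder = divmod(dataset_size, batch_size)
    sizes = [batch_size] * n_full
    if remainder and not drop_last:
        sizes.append(remainder)  # the smaller, final batch
    return sizes

print(batch_sizes(10, 3))                  # [3, 3, 3, 1]
print(batch_sizes(10, 3, drop_last=True))  # [3, 3, 3]
```

With 10 samples and batch_size=3, the default drop_last=False yields a final batch of size 1, while drop_last=True discards it.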

The default value of drop_last is False, meaning that the last batch may be smaller than the rest, which violates the precondition for torch.backends.cudnn.benchmark=True.

So, should I always set drop_last=True when I want to set torch.backends.cudnn.benchmark=True?


Using constant input sizes is not a hard requirement, just a recommendation.
For each new input shape, cuDNN will internally profile the available kernels and select the fastest one, which is then added to a cache. In subsequent iterations, this cache is checked for the currently used shapes and the previously selected kernels are picked.
The profiling itself adds overhead, so the first iteration for each new shape will be slower.
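Conceptually, the caching behaves like the toy sketch below (a plain dict keyed by input shape; this is an illustration of the idea, not the real cuDNN implementation):

```python
kernel_cache = {}  # shape -> chosen kernel, mimicking cuDNN's internal cache

def run(shape):
    """Toy model of cudnn.benchmark: profile once per new shape, then reuse."""
    if shape not in kernel_cache:
        # First time we see this shape: profile candidates (the one-off overhead).
        kernel_cache[shape] = f"fastest_kernel_for_{shape}"
        return "profiled"
    return "cached"

print(run((32, 3, 224, 224)))  # "profiled" -- first iteration with this shape is slow
print(run((32, 3, 224, 224)))  # "cached"   -- later iterations reuse the selected kernel
print(run((4, 3, 224, 224)))   # "profiled" -- a smaller last batch triggers one extra profiling run
```

The key point: each distinct shape pays the profiling cost exactly once, after which lookups are cheap.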

Depending on your use case and how many different input shapes you expect, cudnn.benchmark=True could still work well.
In your particular case, a smaller last batch would just trigger one additional profiling run, so I wouldn’t bother dropping it.