Last batch and evaluation require more GPU memory


I was wondering if I am missing something regarding the following two issues:

a) When I hit the last batch, it is usually smaller (there are not enough remaining images to fill an entire batch_size). Although training runs, the classifier requires more GPU memory at that point, so when the batch size is large enough, it crashes on the final batch.
b) When evaluating, the model also runs out of memory (only barely: reducing the batch size just a little lets me both train and test without problems).

Am I doing something wrong? Is this something that should not even be happening? Any help is appreciated!

is there any chance you can give a repro?
It should not crash: it should flush the cache of the caching allocator and then re-allocate.

DataLoader has an option to drop the last batch (if it is smaller than batch_size), drop_last=True; you can use that option to unblock yourself:
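A minimal sketch of that option (the dataset and batch size here are illustrative, not from the thread): with drop_last=True the incomplete final batch is discarded, so every batch the model sees has the same size.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 samples with batch_size=4: the final batch would only have 2 samples.
dataset = TensorDataset(torch.randn(10, 3))
loader = DataLoader(dataset, batch_size=4, drop_last=True)

# drop_last=True discards the incomplete batch of 2,
# leaving only full-sized batches.
sizes = [batch[0].shape[0] for batch in loader]
print(sizes)  # → [4, 4]
```

Note that this skips up to batch_size - 1 samples per epoch, which is usually acceptable for training but may matter for evaluation, where you typically want to see every sample.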