Larger batches not faster

I’m using Google’s EfficientNet (the small version) on an RTX 3090. I’ve found that overall epoch time doesn’t vary much whether I train with big batches or small ones. Similarly, at inference time for classification I’m not seeing much difference in images/second between big and small batches.

Am I missing something? I thought the point of a batch was that all of the images get processed in parallel on the GPU, so the time to process one batch should be roughly constant whether it contains 1 image or 32.
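One way to check that assumption is to time the forward pass alone, isolated from data loading. This is a minimal sketch using a toy CNN as a stand-in for EfficientNet (an assumption; your real model is heavier, but the timing pattern is the same). Note the `torch.cuda.synchronize()` calls: CUDA kernels launch asynchronously, so timing without synchronizing measures only the launch overhead, not the actual GPU work.

```python
import time

import torch
import torch.nn as nn

# Toy CNN standing in for EfficientNet (assumption for a self-contained example).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

def images_per_second(batch_size, iters=10, size=64):
    x = torch.randn(batch_size, 3, size, size, device=device)
    with torch.no_grad():
        model(x)  # warm-up pass (kernel selection, memory allocation)
        if device == "cuda":
            torch.cuda.synchronize()  # drain queued kernels before starting the clock
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()  # kernels run asynchronously; sync before stopping
    return batch_size * iters / (time.perf_counter() - start)

for bs in (1, 8, 32):
    print(f"batch {bs:3d}: {images_per_second(bs):,.0f} img/s")
```

If images/second climbs with batch size here but stays flat in your training loop, the GPU is not the limiting factor in the loop.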

Your expectation is correct up to the point where the GPU is saturated, but your training pipeline likely has a bottleneck that is unrelated to the GPU, e.g. the data loading. If the DataLoader cannot feed the GPU fast enough, the GPU sits idle waiting for the next batch, so a larger batch size doesn’t change throughput. You could profile your use case with Nsight Systems or the native PyTorch profiler to check which part of your code is the current bottleneck.
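A minimal sketch of the native PyTorch profiler (again with a toy model standing in for yours as an assumption). In a real run you would wrap one or a few training iterations, DataLoader included; if the table is dominated by CPU-side ops while CUDA time is small, the bottleneck is on the CPU.

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile

# Toy model and input as stand-ins for your training step (assumption).
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
x = torch.randn(8, 3, 64, 64)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)
    model, x = model.cuda(), x.cuda()

with profile(activities=activities, record_shapes=True) as prof:
    with torch.no_grad():
        model(x)

# Summarize per-operator time; the slowest entries reveal the bottleneck.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

Nsight Systems (`nsys profile python train.py`) gives a complementary timeline view where gaps between GPU kernels make an idle GPU immediately visible.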