Not necessarily. Although a larger batch size might yield smoother loss curves, the final model performance can end up worse than with the noisier gradients of smaller batches.
However, you could try to simulate larger batches by accumulating gradients over several smaller batches as described here and compare the results.
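A minimal sketch of gradient accumulation could look like this (the model, loss, optimizer, loader, and `accum_steps` value are placeholders for illustration):

```python
import torch
import torch.nn as nn

# Placeholder model, criterion, optimizer, and data for illustration.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

accum_steps = 4  # effective batch size = 8 * 4 = 32

optimizer.zero_grad()
for i, (data, target) in enumerate(loader):
    output = model(data)
    # Scale the loss so the accumulated gradient matches the gradient
    # of one large batch.
    loss = criterion(output, target) / accum_steps
    loss.backward()  # gradients accumulate in param.grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that this simulates the larger batch for the gradient computation, but layers with batch-dependent statistics (e.g. batchnorm) will still see the small batches.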
Alternatively, you could use torch.utils.checkpoint to trade compute for memory.
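For a sequential model, checkpointing could look like this sketch (the model shape and number of segments are made up for the example):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder deep sequential model for illustration.
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)])
x = torch.randn(4, 1024, requires_grad=True)

# Split the model into 2 segments; intermediate activations inside
# each segment are not stored and are recomputed during backward.
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
out.sum().backward()
```

The forward activations inside each segment are recomputed during the backward pass, so you pay extra compute in exchange for the lower memory usage, which would then allow a larger batch size.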