Not necessarily. Although a larger batch size might yield smoother loss curves, the final model performance can end up worse than with the noisier gradients of smaller batches.
However, you could try to simulate larger batches by accumulating gradients over several smaller batches as described here and compare the results.
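A minimal sketch of gradient accumulation could look like this (the model, loss, optimizer, loader, and `accum_steps` value are placeholders for illustration):

```python
import torch
import torch.nn as nn

# Placeholder model, criterion, optimizer, and data for illustration.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

accum_steps = 4  # effective batch size = 8 * 4 = 32

optimizer.zero_grad()
for i, (data, target) in enumerate(loader):
    output = model(data)
    # Scale the loss so the accumulated gradient matches the gradient
    # of one large batch.
    loss = criterion(output, target) / accum_steps
    loss.backward()  # gradients accumulate in param.grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that this simulates the larger batch for the gradient computation, but layers with batch-dependent statistics (e.g. batchnorm) will still see the small batches.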
Alternatively, you could use torch.utils.checkpoint to trade compute for memory.
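For a sequential model, checkpointing could look like this sketch (the model shape and number of segments are made up for the example):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder deep sequential model for illustration.
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)])
x = torch.randn(4, 1024, requires_grad=True)

# Split the model into 2 segments; intermediate activations inside
# each segment are not stored and are recomputed during backward.
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
out.sum().backward()
```

The forward activations inside each segment are recomputed during the backward pass, so you pay extra compute in exchange for the lower memory usage, which would then allow a larger batch size.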