I want to ask about the effect of batch size on the model accuracy.
I have two semantic segmentation models and I want to compare their results, but the batch sizes are different:
- the batch size of the first one is 8
- the batch size of the second one is 4 (I used 4 because I ran out of memory during training)
Can this be a fair comparison or not?
There are many aspects to this, and if you present results, you might also offer the first model trained with batch size 4. To my mind, it is important to mention that the reason for comparing 8 vs. 4 is memory (which, of course, makes it fair because you compare models trained under the same resource constraints, at least if it is indeed infeasible or nontrivial to use batch size 8).
- One line of thought to follow is that minibatch gradients are a stochastic estimate of the true gradient (over the entire dataset, or even beyond). With this thinking, smaller batch sizes give noisier gradient estimates. But is that good or bad?
- More noise can be viewed as a form of regularization,
- but the conventional wisdom is that the noise hurts, in particular towards the end of training (when the gradient signal tends to be weaker).
- The statistics used by training-mode batchnorm are noisier, and can be more extreme, for smaller batches.
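The noise argument in the points above can be made concrete with a small sketch (my own illustration, not from the thread): a minibatch mean is an unbiased estimate of the full-data mean, but its standard deviation scales as 1/sqrt(batch_size), so both the gradient estimate and the per-batch statistics that batchnorm sees are noisier at batch size 4 than at batch size 8.

```python
import numpy as np

# Illustrative sketch: estimate how much the per-minibatch mean fluctuates
# around the full-data mean for batch sizes 4 and 8. The same 1/sqrt(n)
# scaling applies to minibatch gradient estimates and to the batch
# statistics used by training-mode batchnorm.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=100_000)  # toy "dataset"

def minibatch_mean_std(batch_size, n_batches=10_000):
    """Std. dev. of the minibatch mean across many random minibatches."""
    idx = rng.integers(0, data.size, size=(n_batches, batch_size))
    return data[idx].mean(axis=1).std()

std4 = minibatch_mean_std(4)  # noisier estimate
std8 = minibatch_mean_std(8)
print(f"batch 4: {std4:.3f}, batch 8: {std8:.3f}")
# Theory: std of the minibatch mean is sigma / sqrt(batch_size),
# i.e. 0.5 for batch 4 vs. about 0.354 for batch 8 here.
```

Whether this extra noise helps (regularization) or hurts (especially late in training) is exactly the trade-off discussed above.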
Thank you, Thomas,
I appreciate that.