Virtual mini-batch - can this work?

Can this trivial approach really work? Increasing Mini-batch Size without Increasing Memory | by David Morton | Medium

I ask because I am trying to train a CNN (VGG19) with a batch size of 1, and I am in desperate need of a substitute for mini-batch normalization.

(I have already implemented the WS+GN approach described in https://arxiv.org/pdf/1903.10520.pdf, which works quite well with a batch size of 2, but I need precisely 1 to be able to make the network conditional; with a batch size of even just 2, the batch would have to be split into parts, which is obviously not acceptable. A rough sketch of what I mean by WS+GN is below.)
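
For reference, this is roughly the WS+GN idea I mean (not my exact implementation; the eps value and group count here are just placeholders). Since GroupNorm does not depend on the batch dimension at all, it runs with a batch size of 1:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d with Weight Standardization: each filter's weights are
    standardized over their fan-in (in_channels x kH x kW) before the conv."""
    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5  # eps is an assumption
        return F.conv2d(x, (w - mean) / std, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

# GroupNorm(32, 64) is a placeholder group count, not a recommendation
block = nn.Sequential(WSConv2d(3, 64, 3, padding=1), nn.GroupNorm(32, 64), nn.ReLU())
out = block(torch.randn(1, 3, 32, 32))  # works with batch size 1
```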

Gradient accumulation can be used to simulate larger batch sizes. However, if you are using batchnorm (or other norm layers with running stats), you would have to be careful about the internal running stats, since each forward pass updates them with the (noisy) statistics of the small micro-batch (you might need to play around with the momentum).
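
Something along these lines is what I mean by gradient accumulation (a minimal sketch with random data standing in for your dataset; `accum_steps` and the optimizer setup are just placeholders, not the code from the linked post):

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.vgg19(num_classes=10)   # placeholder class count
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

accum_steps = 8  # effective batch size, built from micro-batches of 1
optimizer.zero_grad()
for step in range(accum_steps):
    data = torch.randn(1, 3, 224, 224)              # single-sample micro-batch
    target = torch.randint(0, 10, (1,))
    # scale the loss so the accumulated gradient matches a real batch of accum_steps
    loss = criterion(model(data), target) / accum_steps
    loss.backward()                                 # gradients accumulate in .grad
optimizer.step()                                    # one update per virtual batch
optimizer.zero_grad()
```

Note that only the gradients are accumulated; every forward pass still normalizes with (and updates the running stats from) the single-sample batch, which is why the norm layers remain the tricky part.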
Also, if you are using a single sample, the batchnorm layers might not be able to calculate the batch statistics and could yield an error.
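
For example, nn.BatchNorm1d refuses to compute statistics from a single sample in training mode, while nn.BatchNorm2d can still average over the spatial dimensions (but those stats will be noisy). A small sketch to illustrate:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(10)
bn.train()
try:
    bn(torch.randn(1, 10))  # single sample, no spatial dims to average over
except ValueError as e:
    print(e)  # "Expected more than 1 value per channel when training, ..."

# BatchNorm2d can still compute stats over H*W for one sample,
# but they are noisy and will pollute the running estimates.
out = nn.BatchNorm2d(3)(torch.randn(1, 3, 8, 8))
```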