Can this trivial approach really work?: Increasing Mini-batch Size without Increasing Memory | by David Morton | Medium

I ask because I am trying to train a CNN (VGG19) with a batch size of 1, and I am in desperate need of a substitute for mini-batch normalization.
(I already implemented the WS+GN approach described in https://arxiv.org/pdf/1903.10520.pdf, which works quite well with a batch size of 2, but I need a batch size of exactly 1 to be able to make the network conditional; with a batch size of even just 2, the batch would have to be split into parts, which is obviously not acceptable.)
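For context, the "trivial approach" in the article title is gradient accumulation: run several forward/backward passes on micro-batches (even of size 1) and average the gradients before taking one optimizer step. A minimal numpy sketch (the linear model and data here are hypothetical, just to illustrate the idea) showing that accumulated per-sample gradients match the full mini-batch gradient for a mean-squared-error loss:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))   # 4 samples, 3 features
y = rng.standard_normal(4)
w = rng.standard_normal(3)

def grad(xb, yb, w):
    # gradient of 0.5 * mean((xb @ w - yb)^2) with respect to w
    return xb.T @ (xb @ w - yb) / len(yb)

# full mini-batch gradient (batch size 4)
g_full = grad(X, y, w)

# gradient accumulation: micro-batches of size 1, then average
g_acc = np.zeros_like(w)
for i in range(len(y)):
    g_acc += grad(X[i:i+1], y[i:i+1], w)
g_acc /= len(y)

print(np.allclose(g_full, g_acc))  # the two gradients coincide
```

Note that this equivalence only holds when the loss decomposes per sample; batch normalization couples samples within a batch and breaks it, which is exactly why a batch-size-independent substitute such as WS+GN matters in this situation.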