Batch norm training when batch size=1

googlebot · November 21, 2020, 7:35am

It is applied to vectors in feature space. I’ve read that layer norm doesn’t work well with convolutions, though, where the stats would be per image region. I think the real problem is channel importance equalization: layer norm “ties” all dimensions together, and I guess that is bad for early vision filters.

PS: if that’s not clear, I meant group norm applied to a channels-last permuted tensor.
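
A minimal sketch of one reading of the above, not a definitive implementation: permute an NCHW tensor to channels last and normalize each position's channel vector, so the stats are per image region rather than per batch. Using `nn.LayerNorm` over the channel dimension is an assumption here (the PS says group norm), and the shapes and the helper name `channel_layer_norm` are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical shapes: batch size 1, 8 channels, 4x4 feature map (NCHW).
x = torch.randn(1, 8, 4, 4)

# LayerNorm(8) normalizes over the last dimension, which must have size 8.
layer_norm = nn.LayerNorm(8)

def channel_layer_norm(t):
    # NCHW -> NHWC so channels come last, normalize, then restore NCHW.
    t = t.permute(0, 2, 3, 1)
    t = layer_norm(t)
    return t.permute(0, 3, 1, 2)

y = channel_layer_norm(x)

# Stats are computed separately for every (n, h, w) position, over channels.
per_position_mean = y.mean(dim=1)
per_position_var = y.var(dim=1, unbiased=False)
```

Since no statistic is taken across the batch dimension, this runs the same way in training with batch size 1. It also shows the "tying": every position's channel vector is forced to the same mean and variance, so relative channel magnitudes at a position can only be restored through the learned per-channel affine parameters.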