It is applied to vectors in feature space, though I’ve read that layer norm doesn’t work well with convolutions, stats would be per image region. I think the problem is rather with channel importance equalization, as layer norm “ties” all dimensions; I guess that is bad for early vision filters.
PS: if that’s not clear, I meant group norm applied to a channels last permuted tensor