Does group norm maintain a running average of mean and variance?
Looking at the code here: https://pytorch.org/docs/stable/_modules/torch/nn/modules/normalization.html
Neither group norm nor layer norm seems to maintain running averages. The description of them suggests they might: https://pytorch.org/docs/stable/nn.html?highlight=group%20norm#torch.nn.GroupNorm
“this layer uses statistics computed from input data in both training and evaluation modes”
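One way to check this without reading the source is to list the buffers each module registers, since running statistics are stored as buffers. This is a minimal sketch, assuming a reasonably recent PyTorch release; the layer sizes and input shape are arbitrary choices for illustration.

```python
import torch
from torch import nn

# Arbitrary sizes, chosen only for illustration.
bn = nn.BatchNorm2d(8)
gn = nn.GroupNorm(num_groups=4, num_channels=8)
ln = nn.LayerNorm(8)

# Running statistics, if any, are registered as buffers on the module.
bn_buffers = sorted(name for name, _ in bn.named_buffers())
gn_buffers = sorted(name for name, _ in gn.named_buffers())
ln_buffers = sorted(name for name, _ in ln.named_buffers())

print("BatchNorm2d buffers:", bn_buffers)  # includes running_mean and running_var
print("GroupNorm buffers:  ", gn_buffers)  # empty
print("LayerNorm buffers:  ", ln_buffers)  # empty

# With no running statistics, GroupNorm gives the same output in train and eval mode.
x = torch.randn(2, 8, 5, 5)
gn.train()
y_train = gn(x)
gn.eval()
y_eval = gn(x)
print("same output in both modes:", torch.allclose(y_train, y_eval))
```

If the group norm and layer norm lists come back empty, the layers have nowhere to keep a running mean or variance, which matches the quoted sentence.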
Whether they are supposed to, I don’t know. I don’t see running averages in the TensorFlow version of group norm either: https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/contrib/layers/python/layers/normalization.py (group_norm)
Or layer norm for that matter:
Both compute the mean and std separately for each sample in the batch, i.e. the mean’s shape is (N, 1) in layer norm, so tracking a running average doesn’t make sense. Who is to say something similar will be at that exact position in your validation batch?
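To make the shape argument concrete, here is a small sketch that recomputes layer norm by hand for a 2-D input. It assumes `nn.LayerNorm` with the affine parameters turned off; the batch size of 4 and feature size of 10 are made up for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 10)  # N = 4 samples, 10 features each (arbitrary sizes)

# One mean and one variance per sample, computed over the feature dimension.
mean = x.mean(dim=1, keepdim=True)                # shape (4, 1)
var = x.var(dim=1, unbiased=False, keepdim=True)  # biased variance, shape (4, 1)

ln = nn.LayerNorm(10, elementwise_affine=False)
manual = (x - mean) / torch.sqrt(var + ln.eps)

print(mean.shape)
matches_module = torch.allclose(ln(x), manual, atol=1e-5)
print("manual result matches nn.LayerNorm:", matches_module)

# Each sample is normalized independently of the rest of the batch:
# passing sample 0 alone gives the same result as passing the full batch.
alone = ln(x[:1])
independent = torch.allclose(alone, ln(x)[:1], atol=1e-5)
print("sample 0 unaffected by other samples:", independent)
```

Since the statistics belong to each individual sample, there is nothing batch-level to average across iterations.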
I also found the doc confusing. How can I freeze running statistics temporarily to use other data? Thank you!
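For layers that do keep running statistics, such as batch norm, one common approach is to switch the module to eval mode while the other data passes through, then switch back. The sketch below assumes that is the situation being asked about; the `BatchNorm1d` layer and the random inputs are placeholders, and group norm or layer norm would not need any of this.

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)  # placeholder layer that does track running statistics

# Normal training step: running_mean and running_var are updated.
bn.train()
bn(torch.randn(16, 3))
before = bn.running_mean.clone()

# Freeze: in eval mode the layer uses the stored statistics and does not update them.
bn.eval()
with torch.no_grad():
    bn(torch.randn(16, 3) + 5.0)  # "other data" with a very different mean
frozen = torch.equal(before, bn.running_mean)
print("running_mean unchanged while frozen:", frozen)

# Unfreeze: back in train mode the statistics are updated again.
bn.train()
bn(torch.randn(16, 3) + 5.0)
updated_again = not torch.equal(before, bn.running_mean)
print("running_mean updated after unfreezing:", updated_again)
```

Calling `eval()` on a single module leaves the rest of the model in whatever mode it was in, so only that layer's statistics are frozen.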