Loss doesn't decrease if model is switched to use GroupNorm instead of BatchNorm

I have a network that trains fine while using BatchNorm. If I keep everything else the same and just change the normalization from BatchNorm to GroupNorm, training fails: the loss stagnates at a high value and doesn't go down at all. If I switch back to BatchNorm, the loss steadily decreases again. Any idea what might cause such a big difference between BatchNorm and GroupNorm behavior? A minimal sketch of the kind of change I am making is shown below.
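
The snippet below is just an illustrative conv block, not my actual model (which is larger), but the normalization swap is done the same way: the only difference between the two variants is the norm layer.

```python
import torch
import torch.nn as nn

# Hypothetical conv block for illustration; the real model is bigger,
# but the BatchNorm -> GroupNorm swap is done exactly like this.
def conv_block(in_ch, out_ch, use_gn=False, num_groups=32):
    if use_gn:
        # GroupNorm: normalizes over groups of channels, independent of batch size
        norm = nn.GroupNorm(num_groups=num_groups, num_channels=out_ch)
    else:
        # BatchNorm: normalizes over the batch dimension and keeps running stats
        norm = nn.BatchNorm2d(out_ch)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        norm,
        nn.ReLU(inplace=True),
    )

# Same block, only the normalization layer differs
bn_block = conv_block(3, 64, use_gn=False)
gn_block = conv_block(3, 64, use_gn=True)

x = torch.randn(8, 3, 32, 32)
print(bn_block(x).shape, gn_block(x).shape)  # both: torch.Size([8, 64, 32, 32])
```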

I am using PyTorch 1.7.1.

Searching the forum, I found the following thread where someone else also reports the loss not going down with GroupNorm. But in that case the issue could potentially be related to the way the model is being changed. In my case, I am just changing the code to use GN and starting training afresh.