How can I apply Group Normalization after a fully connected layer? Say the output of the fully connected layer has 1024 features, and the group normalization layer uses 16 groups.
self.gn1 = nn.GroupNorm(16, hidden_size)  # in __init__, with hidden_size = 1024
h1 = F.relu(self.gn1(self.fc1(x)))        # in forward: fc1 -> GroupNorm -> ReLU
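For context, here is a minimal runnable sketch of how those two lines might sit inside a module. It assumes `fc1` is an `nn.Linear` layer; the class name `Net` and the input size of 256 are hypothetical. `nn.GroupNorm` accepts input of shape (N, C, *), so the 2-D (N, C) output of `nn.Linear` can be passed to it directly, and `hidden_size` must be divisible by the number of groups.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    """Hypothetical module: one fully connected layer, then GroupNorm, then ReLU."""

    def __init__(self, in_features: int = 256, hidden_size: int = 1024, num_groups: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden_size)
        # GroupNorm(num_groups, num_channels); hidden_size must be divisible by num_groups
        self.gn1 = nn.GroupNorm(num_groups, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # fc1 output has shape (N, hidden_size); GroupNorm treats dim 1 as the channels
        h1 = F.relu(self.gn1(self.fc1(x)))
        return h1


net = Net()
out = net(torch.randn(8, 256))
print(out.shape)  # torch.Size([8, 1024])
```

Because the statistics are computed per sample, the same module also works with a batch size of 1, unlike `nn.BatchNorm1d` in training mode.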
Am I right? How should we understand group normalization when it is applied to the output of a fully connected layer?
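One way to see what happens on a fully connected output is to reproduce `nn.GroupNorm` by hand. The sketch below assumes a random (N, 1024) tensor as a stand-in for the layer output: each sample's 1024 features are split into 16 groups of 64 consecutive features, each group is normalized with its own mean and biased variance, and then a learnable per-feature (not per-group) scale and shift is applied.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

N, C, G = 4, 1024, 16          # batch size, features from the FC layer, number of groups
x = torch.randn(N, C)          # stand-in for the fully connected layer's output

gn = nn.GroupNorm(G, C)        # affine weight starts at 1, bias at 0
y_builtin = gn(x)

# Manual version: view each sample's features as G groups of C // G consecutive features.
xg = x.view(N, G, C // G)
mean = xg.mean(dim=-1, keepdim=True)
var = ((xg - mean) ** 2).mean(dim=-1, keepdim=True)   # biased variance, per sample and group
y_manual = ((xg - mean) / torch.sqrt(var + gn.eps)).view(N, C)
# Per-feature affine transform: gamma and beta each have C entries, not G.
y_manual = y_manual * gn.weight + gn.bias

print(torch.allclose(y_builtin, y_manual, atol=1e-5))  # True
```

No statistics are shared across the batch. With 1 group, `nn.GroupNorm` on an (N, C) input normalizes all 1024 features of a sample together, which is what `nn.LayerNorm(1024)` does.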