I am plotting the norm of each class in a ResNet-18 classifier. The last layer has a weight matrix of shape [100, 512], where 100 is the number of classes and 512 is the size of the feature extractor's output, so the norm of each class is computed from its 512 corresponding weights. I want to make sure that this norm correctly reflects the class logit. The problem is negative weights. A weight of, for example, -3.5 makes the norm larger, but it cannot make the logit larger: the features entering the classification layer come out of a ReLU and are therefore non-negative, so a negative weight can only lower the logit. I would therefore like to clip the weights so that the norm of a class better represents its logit. I tried clipping by mapping all negative values to 0 and keeping all the positive ones, but this ruined the training. Does anyone have experience with this and can give me some guidance? I am unsure how to choose the clipping range.
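To make the norm/logit mismatch concrete, here is a minimal NumPy sketch. It assumes random stand-ins for the real data: `W` plays the role of the final `[100, 512]` weight matrix and `h` plays one post-ReLU feature vector. The bias is ignored and all helper names are hypothetical. It compares the plain row norm with the norm of only the positive weights, which is the norm you would get after mapping negatives to 0. Here that norm is computed on a copy, so training is not touched. Because `h >= 0`, the positive-part norm times `||h||` bounds every logit from above. The full row norm carries no such tie to the logit.

```python
import numpy as np

# Hypothetical stand-ins (assumption): random weights and features, not a trained model.
rng = np.random.default_rng(0)
num_classes, feat_dim = 100, 512
W = rng.normal(size=(num_classes, feat_dim))       # mimics the final layer's weight matrix
h = np.maximum(rng.normal(size=feat_dim), 0.0)     # mimics a ReLU output, so every entry is >= 0


def class_norms(W):
    """Plain L2 norm of each class's weight row (what is currently plotted)."""
    return np.linalg.norm(W, axis=1)


def positive_part_norms(W):
    """L2 norm of each row after mapping negative weights to 0, computed on a copy."""
    return np.linalg.norm(np.maximum(W, 0.0), axis=1)


logits = W @ h  # bias omitted for simplicity

# Since h >= 0, negative weights can only lower a logit:
#   w . h <= max(w, 0) . h <= ||max(w, 0)|| * ||h||   (Cauchy-Schwarz)
upper_bound = positive_part_norms(W) * np.linalg.norm(h)

print("plain norms (first 3):        ", class_norms(W)[:3])
print("positive-part norms (first 3):", positive_part_norms(W)[:3])
print("logit <= bound for all classes:", bool(np.all(logits <= upper_bound + 1e-9)))
```

Computing the positive-part norm only for the plot is one way to get a norm that respects the ReLU without constraining the weights during training. Whether that is an acceptable substitute for clipping depends on what the plot is meant to show.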