Is there a technique for deciding which norm to use? I see different kinds of norms being used in different modules.
For example, multi-head attention uses LayerNorm.
In pytorch/examples, BatchNorm is used after conv2d.
GANs use spectral_norm.
Then there is also weight norm, which as far as I can tell uses the Frobenius norm internally.
I also found the nuclear norm, which is the sum of the singular values of a matrix.
And in some places, the maximum singular value of the matrix is taken as the norm (the spectral norm).
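To make the last three concrete, here is a minimal sketch of how I understand these matrix norms. It uses NumPy, which is my own choice for illustration, and the matrix `A` is a made-up example rather than a real weight matrix.

```python
import numpy as np

# Made-up example matrix; its singular values are 3 and 2.
A = np.array([[3.0, 0.0],
              [0.0, -2.0]])

# Singular values, returned in descending order.
singular_values = np.linalg.svd(A, compute_uv=False)

frobenius = np.linalg.norm(A, "fro")  # sqrt of the sum of squared entries
nuclear = np.linalg.norm(A, "nuc")    # sum of the singular values
spectral = np.linalg.norm(A, 2)       # largest singular value

print("singular values:", singular_values)
print("Frobenius:", frobenius)
print("nuclear:  ", nuclear)
print("spectral: ", spectral)
```

All three can be written in terms of the singular values: the Frobenius norm is the square root of the sum of their squares, the nuclear norm is their sum, and the spectral norm is their maximum.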
Is the eventual goal to bring all weights into a smaller range, so that we get a convex curve between weight and loss?
If so, how does one see this plot? For example, if I do dog vs. cat classification, how would I plot this convex curve?
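The closest thing I can come up with is a one-dimensional slice: fix a point in weight space, pick a direction, and evaluate the loss along that line. Below is a minimal sketch of that idea, under heavy assumptions: the data is random toy data standing in for dog/cat features, the model is plain logistic regression (whose loss is convex, unlike a deep network's in general), and the names `w0`, `direction`, and `alphas` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data standing in for "dog vs. cat" features (hypothetical).
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = (X @ true_w + 0.5 * rng.normal(size=200) > 0).astype(float)

def loss(w):
    """Mean binary cross-entropy of a logistic-regression model."""
    z = X @ w
    # log(1 + exp(z)) - y * z, computed stably with logaddexp.
    return np.mean(np.logaddexp(0.0, z) - y * z)

w0 = rng.normal(size=5)         # some point in weight space
direction = rng.normal(size=5)  # random direction to slice along
direction /= np.linalg.norm(direction)

# Loss evaluated at w0 + alpha * direction for a range of alpha.
alphas = np.linspace(-3.0, 3.0, 61)
losses = np.array([loss(w0 + a * direction) for a in alphas])

# To view the curve (assuming matplotlib is installed):
#   import matplotlib.pyplot as plt
#   plt.plot(alphas, losses); plt.xlabel("alpha"); plt.ylabel("loss"); plt.show()
```

This only shows one slice through a high-dimensional surface, so a convex-looking slice does not by itself say the whole loss is convex.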
And does making all weight matrices symmetric and positive definite contribute to convexity?
Is a higher matrix norm bad for convexity, and should the norm be less than one (and greater than 0)?