Hi,
I’m getting very different results switching from the old spectral_norm to the new parametrization.
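For context, here is a minimal sketch of how I'm swapping between the two variants — using a stand-in `Conv2d` instead of the full UNet (the layer sizes and input are placeholders):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two identical conv layers sharing the same raw weights
conv_old = nn.Conv2d(3, 1, kernel_size=3, padding=1)
conv_new = nn.Conv2d(3, 1, kernel_size=3, padding=1)
conv_new.load_state_dict(conv_old.state_dict())

# Old (deprecated) API vs. new parametrization API
conv_old = nn.utils.spectral_norm(conv_old)
conv_new = nn.utils.parametrizations.spectral_norm(conv_new)

x = torch.randn(1, 3, 8, 8)
with torch.no_grad():
    y_old = conv_old(x)
    y_new = conv_new(x)

# Both variants run one power-iteration step per forward by default, and the
# u/v vectors are initialized randomly, so some early difference is expected.
print((y_old - y_new).abs().max().item())
```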
Here are the output features of the UNet using the old `torch.nn.utils.spectral_norm`:
tensor([[[[-0.0099, -0.0102, -0.0013, ..., -0.0101, -0.0100, -0.0076],
[-0.0214, -0.0407, -0.0349, ..., -0.0185, -0.0293, -0.0054],
[-0.0211, -0.0214, -0.0356, ..., -0.0229, -0.0362, -0.0165],
...,
[-0.0133, -0.0229, -0.0189, ..., -0.0240, -0.0370, -0.0003],
[-0.0028, -0.0027, -0.0099, ..., -0.0031, -0.0209, -0.0099],
[-0.0068, -0.0027, -0.0033, ..., -0.0076, 0.0003, 0.0011]]],
And here are the output features using the new `torch.nn.utils.parametrizations.spectral_norm`:
tensor([[[[-6.4015e-03, -5.9812e-03, -5.5848e-04, ..., -6.6797e-03,
-5.4990e-03, -4.3101e-03],
[-1.1050e-02, -2.3454e-02, -2.0890e-02, ..., -1.0111e-02,
-1.7187e-02, -2.5395e-03],
[-1.0570e-02, -9.7215e-03, -1.9345e-02, ..., -1.3104e-02,
-2.1078e-02, -8.5044e-03],
...,
[-7.6973e-03, -1.3273e-02, -9.9956e-03, ..., -1.3413e-02,
-2.1045e-02, 1.4347e-03],
[-2.9391e-04, 5.1208e-05, -4.4562e-03, ..., -7.5174e-04,
-1.0988e-02, -5.6328e-03],
[-3.7524e-03, -8.9492e-04, -8.4569e-04, ..., -3.2808e-03,
1.0416e-03, 1.3095e-03]]],
This is a bit-wise deterministic run, by the way.
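By "bit-wise deterministic" I mean the run was seeded and restricted to deterministic kernels, along these lines (a sketch; the actual setup may differ):

```python
import torch

torch.manual_seed(0)                      # fix RNG for weights and inputs
torch.use_deterministic_algorithms(True)  # error out on nondeterministic ops
torch.backends.cudnn.benchmark = False    # disable conv algorithm autotuning
# On CUDA, setting CUBLAS_WORKSPACE_CONFIG=":4096:8" may also be required.
```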
Here's a naive comparison suggesting the old function produces less extreme values. Min/max of the output tensor over the last 10 of 100 iterations:
Old:
min | max |
---|---|
-4.7022 | -0.1812 |
-4.4896 | -0.1821 |
-3.4386 | 23.3624 |
-4.4896 | -0.1821 |
-4.5798 | -0.1844 |
-1.1253 | 19.4105 |
-4.5797 | -0.1844 |
-4.6552 | -0.1855 |
0.2628 | 22.2968 |
-4.6551 | -0.1855 |
Parametrization:
min | max |
---|---|
-4.8454 | -0.1950 |
-4.5237 | -0.1968 |
-2.6446 | 32.7689 |
-4.5234 | -0.1968 |
-4.7137 | -0.2026 |
0.1529 | 27.5732 |
-4.7134 | -0.2026 |
-4.8490 | -0.2062 |
0.4218 | 31.8953 |
-4.8486 | -0.2062 |
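The tables above were collected roughly like this (a sketch; `model` here is a stand-in for the spectral-normalized UNet, and the input is random rather than real data):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the spectral-normalized UNet
model = nn.utils.parametrizations.spectral_norm(
    nn.Conv2d(1, 1, kernel_size=3, padding=1)
)

stats = []
for _ in range(100):
    x = torch.randn(1, 1, 8, 8)
    with torch.no_grad():
        y = model(x)
    stats.append((y.min().item(), y.max().item()))

# Print the last 10 of 100 iterations in the table format above
for mn, mx in stats[-10:]:
    print(f"{mn:.4f} | {mx:.4f} |")
```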
So it looks like the max values from the new function are roughly 40% higher than from the old one (e.g. 32.77 vs. 23.36).
Is there a reason for that, or am I using it wrong? Thanks in advance.