Why is my CNN not scale invariant (in weight space)?

I have a CNN that looks as follows:

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (11): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): ReLU(inplace=True)
    (13): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (14): ReLU(inplace=True)
    (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (16): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): ReLU(inplace=True)
    (18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): ReLU(inplace=True)
    (20): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (21): AvgPool2d(kernel_size=1, stride=1, padding=0)
  )
  (classifier): Linear(in_features=512, out_features=10, bias=True)
)

My working assumption is that when I multiply every weight and every bias by two, the resulting logits should have the same distribution as before, just scaled up by the factor (2 ** #layers). Specifically, every individual operation (Conv2d, ReLU, MaxPool2d, AvgPool2d, Linear) should change nothing about the resulting activation distribution except its scale (if anything).

Conv2d: Since it’s just a weighted sum plus the bias, doubling the weights and the bias should make the resulting activations twice as large
ReLU: Scaling by a positive factor doesn’t change any signs, so the doubled activations just pass through
MaxPool2d: The maximum is taken at the same position, so the pooled output is just scaled up as well
AvgPool2d: The average of scaled-up activations is also just scaled up
Linear: Same as Conv2d
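
Each of these steps can be checked in isolation with random tensors; here is a quick sketch (not my actual model) for the first three operations:

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)

# Conv2d: doubling the weights and the bias doubles the output.
out = F.conv2d(x, w, b, padding=1)
out2 = F.conv2d(x, 2 * w, 2 * b, padding=1)
print(torch.allclose(out2, 2 * out))  # True

# ReLU: a positive scale factor doesn't change any signs.
print(torch.allclose(F.relu(2 * out), 2 * F.relu(out)))  # True

# MaxPool2d: the maximum of doubled activations is the doubled maximum.
print(torch.allclose(F.max_pool2d(2 * out, 2), 2 * F.max_pool2d(out, 2)))  # True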

However, when I multiply every weight and bias in my network by two, the distribution of the logits changes. For example, my CNN originally gave me these logits for one input:

[ -5.5469,  -1.3721,  -1.7734,   2.9941,   1.6348,   1.5049,  13.4219, -3.9648,  -3.7793,  -3.0293]

After multiplying each weight and bias by two, the same input gives the following logits:

[ -628.0000,   409.0000,  -346.0000,   594.5000,  -304.0000,   144.8750,   1746.0000,  -708.0000,  -622.0000,  -276.0000]

Not only is the scale off, but even the signs of some of the logits changed. Am I experiencing a numerical/overflow issue in PyTorch, or do I have some error in my thought process?

Yes, if you multiply all the kernels by 2 you get the compound result.

5 * 2^8 = 1280

which is at least in the right range, without considering the bias.

I think the trend holds quite well, apart from some variations that could just be random.
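
As a quick check of the trend, one can divide the posted logits elementwise (just a sketch using the numbers above):

import torch

before = torch.tensor([-5.5469, -1.3721, -1.7734, 2.9941, 1.6348,
                       1.5049, 13.4219, -3.9648, -3.7793, -3.0293])
after = torch.tensor([-628.0000, 409.0000, -346.0000, 594.5000, -304.0000,
                      144.8750, 1746.0000, -708.0000, -622.0000, -276.0000])

# If only the scale changed, these ratios would all be the same positive constant.
print(after / before)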

Hi Korbinian!

Your logic is correct.

You are correct that this result is not consistent with multiplying your weights
by two. This result is also quite odd in that eight of the ten values are integers
and the remaining two are integer multiples of 1/2 and 1/8.

Please double-check your code that multiplies your weights by two. It’s probably
wrong and might also be breaking your network somehow.

If you can’t fix the problem, please post a fully self-contained, runnable script
that reproduces your issue, together with the output you get when you run it.
Please try to illustrate your issue with a simplified example, perhaps with a
network that only has two or three layers.
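
For example, something along these lines (a sketch with a made-up two-layer network, not your actual code) would be enough to show whether the scaling behaves as expected:

import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny stand-in for your VGG: one conv layer plus a linear head.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

x = torch.randn(1, 3, 32, 32)
logits_before = net(x)

# Multiply every weight and bias by two, in place.
with torch.no_grad():
    for p in net.parameters():
        p.mul_(2.0)

logits_after = net(x)

print(logits_before)
print(logits_after)
# Ignoring the biases, the compound factor would be exactly 2**2 = 4 here;
# the doubled biases make these ratios only approximately equal.
print(logits_after / logits_before)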

Good luck!

K. Frank
