Similarity of output values for random input on untrained ResNet

Hello,

I am using the ResNet architecture as a feature extractor for an OCR task. While debugging, I noticed that the output values of the ResNet are very similar for any input image from my dataset. So I decided to look at the output values when an untrained ResNet is given random input.

import torch
import torchvision

# untrained ResNet-18, i.e. randomly initialised weights
resnet = torchvision.models.resnet18(pretrained=False)
resnet.eval()

for _ in range(10):
    output = resnet(torch.rand([1, 3, 224, 224]))  # a fresh random "image" each iteration
    print(output[0][450:550])  # holds for any range

Which yields:

tensor([-1.1587, -1.1450,  1.5259,  0.6780, -1.1898,  1.3276,  1.4065,  0.8970,
        -1.2454, -1.0678, -1.4614,  0.9561,  1.1011, -0.6139,  0.8475, -0.9382,
         0.7540, -0.1661,  0.4512, -1.1243,  0.4361,  0.3670,  1.2436,  0.3186,
         1.5027, -0.4249, -0.5414,  2.1813,  0.8397, -0.9550, -0.6832,  1.1566,
        -2.1656, -1.3857,  0.3118, -0.2025,  2.1524,  0.4039,  0.0289, -1.1378,
         0.9507,  2.2936,  0.1370, -0.9087, -0.5366, -0.1129,  0.6830, -0.2568,
         0.6538,  0.6729, -0.5142,  0.0746, -0.5973, -0.0693,  0.3187,  0.6297,
         0.8349,  0.9990,  0.2052,  1.3444,  0.9707,  0.6360,  1.7593, -1.1037,
         2.7053, -0.3235, -1.3606,  0.1587, -2.3617,  0.9869, -0.4764, -0.7188,
        -0.3749,  0.4475, -1.5170,  1.0435,  0.8179,  1.0773,  0.1973,  0.8055,
         2.1609,  0.6247, -1.4064, -0.4369,  0.3356, -0.3441,  0.6190, -0.4036,
        -0.7672, -0.4250, -1.9830, -0.7269, -0.4921,  1.2250,  0.1022,  0.3778,
        -0.4552, -1.3471, -1.8389,  0.6905], grad_fn=<SliceBackward>)
tensor([-1.1920, -1.1297,  1.5316,  0.7081, -1.1729,  1.3323,  1.4214,  0.9075,
        -1.2718, -1.0987, -1.4925,  0.9335,  1.0910, -0.5857,  0.8446, -0.9113,
         0.7385, -0.1656,  0.4398, -1.1169,  0.4131,  0.3641,  1.2779,  0.3293,
         1.5286, -0.4296, -0.5640,  2.1615,  0.8044, -0.9519, -0.6629,  1.1419,
        -2.1183, -1.4250,  0.3177, -0.2175,  2.1561,  0.3962,  0.0535, -1.1837,
         0.9439,  2.2942,  0.0923, -0.8926, -0.5461, -0.1261,  0.6669, -0.2641,
         0.6605,  0.6535, -0.4878,  0.0700, -0.5891, -0.0443,  0.3056,  0.6454,
         0.8514,  1.0222,  0.1479,  1.3382,  0.9450,  0.5977,  1.7593, -1.1386,
         2.6689, -0.3277, -1.3361,  0.1698, -2.3721,  0.9978, -0.4780, -0.7501,
        -0.3701,  0.4395, -1.5692,  1.0734,  0.8103,  1.0783,  0.2129,  0.8095,
         2.1406,  0.5958, -1.3652, -0.4346,  0.3335, -0.3429,  0.6060, -0.4097,
        -0.7660, -0.4304, -1.9753, -0.7312, -0.5193,  1.2653,  0.1605,  0.4289,
        -0.4405, -1.3589, -1.8204,  0.7157], grad_fn=<SliceBackward>)

for the first two random inputs. As you can see, the output values are very similar. As far as I can tell, this similarity holds for the other random inputs and across the entire output vector as well.

This is surprising to me because, even though the untrained ResNet is initialised with random weights, I did not expect different inputs to produce such similar output values.
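
For reference, here is a rough way to quantify what I mean by "very similar" (just a quick sketch, continuing from the snippet above, using the cosine similarity between two outputs):

# continuing from the snippet above: resnet is the untrained ResNet-18
out1 = resnet(torch.rand([1, 3, 224, 224]))
out2 = resnet(torch.rand([1, 3, 224, 224]))
# a cosine similarity close to 1.0 means the two 1000-dimensional output
# vectors point in almost the same direction
print(torch.nn.functional.cosine_similarity(out1, out2))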

I would greatly appreciate an explanation of why the output values for completely different inputs are so similar on an untrained network.

Hi Ivo!

  1. I can reproduce what you see.

  2. I don’t see anything wrong with this result (although I don’t have
    an expectation one way or the other as to whether it should work
    out this way).

  3. My intuition – which could be wrong – suggests that pumping a
    random input through a randomly-initialized ResNet causes the
    details of the input to get overpowered (and averaged away).
    ResNet has a lot of layers between its input and output, so it
    seems reasonable to me that the collective influence of all of
    those random-layer weights swamps the influence of the specific
    random input.

  4. The output, when interpreted as raw-score logits – even though
    similar for different inputs – doesn’t really correspond to any
    substantive prediction that favors any of the 1000 classes. If
    you use softmax() to convert the output logits into probabilities,
    you will find that no class is predicted with a probability of any
    significance.

Here is a script that illustrates some of this:

import torch
import torchvision

import numpy as np

print (torch.__version__)
print (torchvision.__version__)

_ = torch.manual_seed (2021)

resnetA = torchvision.models.resnet18 (pretrained = False)
_ = resnetA.eval()
resnetB = torchvision.models.resnet18 (pretrained = False)
_ = resnetB.eval()

input1 = torch.rand ([1, 3, 224, 224])
input2 = torch.rand ([1, 3, 224, 224])

outputA1 = resnetA (input1)
outputA2 = resnetA (input2)
outputB1 = resnetB (input1)
outputB2 = resnetB (input2)

print ('correlation (A1, A2) =\n', np.corrcoef (outputA1[0].detach().numpy(), outputA2[0].detach().numpy()))
print ('correlation (B1, B2) =\n', np.corrcoef (outputB1[0].detach().numpy(), outputB2[0].detach().numpy()))
print ('correlation (A1, B1) =\n', np.corrcoef (outputA1[0].detach().numpy(), outputB1[0].detach().numpy()))
print ('correlation (A1, B2) =\n', np.corrcoef (outputA1[0].detach().numpy(), outputB2[0].detach().numpy()))

print ('max "prediction" probability (A1) =', torch.softmax (outputA1, dim = 1).max())
print ('min "prediction" probability (A1) =', torch.softmax (outputA1, dim = 1).min())
print ('max "prediction" probability (B2) =', torch.softmax (outputB2, dim = 1).max())
print ('min "prediction" probability (B2) =', torch.softmax (outputB2, dim = 1).min())

And here is its output:

1.9.0
0.10.0
<path_to_pytorch>\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  ..\c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
correlation (A1, A2) =
 [[1.        0.9992763]
 [0.9992763 1.       ]]
correlation (B1, B2) =
 [[1.         0.99932129]
 [0.99932129 1.        ]]
correlation (A1, B1) =
 [[1.         0.00547452]
 [0.00547452 1.        ]]
correlation (A1, B2) =
 [[1.         0.00402595]
 [0.00402595 1.        ]]
max "prediction" probability (A1) = tensor(0.0036, grad_fn=<MaxBackward1>)
min "prediction" probability (A1) = tensor(0.0002, grad_fn=<MinBackward1>)
max "prediction" probability (B2) = tensor(0.0069, grad_fn=<MaxBackward1>)
min "prediction" probability (B2) = tensor(8.3634e-05, grad_fn=<MinBackward1>)

Best.

K. Frank

Hi KFrank, thanks for the reply.

Your script is very helpful as it quantifies the behaviour I described.