Zeroing out some weights doesn't seem to affect output tensor

I’m bumping into some strange behavior and was hoping someone could explain it to me or show me where I’m making a mistake.
I’m trying to figure out how certain types of changes to the weights of a pretrained network will affect the output tensor compared to an “untampered” network.
With that in mind I have two identical networks:

import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

base_model = models.vgg16(pretrained=True)
modified_model = models.vgg16(pretrained=True)

and then I modify a given layer. config['layer'] is equal to one of the values returned from layers below.

layers = list(models.vgg16().state_dict().keys())
layers = [layer for layer in layers if layer.split('.')[-1] != 'bias']
sh = modified_model.state_dict()[config['layer']].shape 

# Zero 798 entries of the chosen weight: i walks along dim 0 and wraps into the next index of dim 1
for i in range(798):
    dim = i // sh[0]
    modified_model.state_dict()[config['layer']][i - dim*sh[0]][dim] = 0.0
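
As a sanity check that writing through state_dict() really changes the model: the tensors it returns share storage with the parameters, so the in-place assignment above should take effect. Here is a minimal sketch on a small stand-in layer (the nn.Linear and its sizes are made up for illustration, not taken from VGG16):

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)  # hypothetical stand-in for one layer of the model

# state_dict() returns detached tensors that share storage with the parameters,
# so an in-place write through it is visible on the parameter itself.
layer.state_dict()["weight"][0][0] = 0.0
via_state_dict = layer.weight[0, 0].item()

# A more explicit alternative: edit the parameter directly under no_grad().
with torch.no_grad():
    layer.weight[1, 0] = 0.0
via_parameter = layer.weight[1, 0].item()

print(via_state_dict, via_parameter)
```

Both routes end up modifying the same underlying weight, so the choice between them is mostly a matter of readability.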

I then take an image, transform and batch it, and feed it through the two networks.

    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
        ),
    ])
    url = """
    

    """
    img = Image.fromarray(url_to_image(url))
    img_t = transform(img)
    
    batch_t = torch.unsqueeze(img_t, 0)  # add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)

    # eval() switches off dropout in the classifier for both models
    base_model.eval()
    modified_model.eval()

    res_orig = base_model(batch_t)
    res_new = modified_model(batch_t)

    res_orig = res_orig.detach()
    res_new = res_new.detach()

I then print the norms of the difference between the two outputs, along with the result of torch.equal.

# Note: the difference has shape (1, 1000), so np.linalg.norm computes matrix norms here
diff = res_orig - res_new
metrics = {
    "L1-Norm": np.linalg.norm(diff, ord=1),
    "L2-Norm": np.linalg.norm(diff, ord=2),
    "LInf-Norm": np.linalg.norm(diff, ord=np.inf),
}
print(metrics)
print(torch.equal(res_orig, res_new))
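
One caveat when reading these numbers: the outputs are 2D with shape (1, 1000), and np.linalg.norm treats 2D input as a matrix, so ord=1 is the maximum absolute column sum and ord=np.inf is the maximum absolute row sum. That would explain why the reported "L1" values below are smaller than the "LInf" values. If vector norms are what is wanted, flattening the difference first should do it. A minimal sketch, where the helper name vector_norms and the toy tensors are my own assumptions:

```python
import torch

def vector_norms(a, b):
    """Vector L1, L2 and LInf norms of the flattened difference a - b."""
    diff = (a - b).flatten()
    return {
        "L1": diff.abs().sum().item(),
        "L2": diff.pow(2).sum().sqrt().item(),
        "LInf": diff.abs().max().item(),
    }

# Toy tensors with a batch dimension, shaped like a (1, N) model output
a = torch.tensor([[1.0, 2.0, 3.0]])
b = torch.zeros(1, 3)
norms = vector_norms(a, b)
print(norms)
```

With vector norms the ordering L1 >= L2 >= LInf always holds, which makes for a quick plausibility check on the printed values.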

I discovered that if config['layer'] is 'classifier.6.weight' or 'classifier.0.weight', then the output of metrics and torch.equal at the end is

{'L1-Norm': 0.0, 'L2-Norm': 0.0, 'LInf-Norm': 0.0}
True

But if it is any other weight layer, then the outputs differ. For example, classifier.3.weight gives

{'L1-Norm': 2.3841858e-07, 'L2-Norm': 4.7497085e-07, 'LInf-Norm': 1.3113022e-06}
False

and features.26.weight is

{'L1-Norm': 1.9073486e-06, 'L2-Norm': 7.4874556e-06, 'LInf-Norm': 0.00014960021}
False

and so on for every other layer besides the 0th and 6th classifier layers.

Can someone please explain why this is happening?

Is it possible that this is an image-specific issue, i.e. that certain images simply don't activate certain nodes of the fully connected layers?

You might get “unlucky”, e.g. if the input activation were zeroed out through the last ReLU in the features submodule.
To verify it, you could e.g. also zero out the bias of the selected layer, which you would like to manipulate.
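
To make the "unlucky" case concrete: for classifier.0.weight and classifier.6.weight, sh[0] is 4096 and 1000, so i // sh[0] is 0 for every i < 798 and the loop only zeros entries in column 0 of the weight, i.e. the weights that multiply input feature 0. If that feature is exactly 0 after the preceding ReLU, the output cannot change. A minimal sketch with a small stand-in nn.Linear (the layer sizes and input values are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)  # hypothetical stand-in for a classifier layer

# Pretend the preceding ReLU zeroed the first two input features
x = torch.tensor([[0.0, 0.0, 1.5, 2.0]])
out_before = layer(x).detach()

# Zero only the weights that multiply the zero-valued inputs
with torch.no_grad():
    layer.weight[:, :2] = 0.0
out_zero_inputs = layer(x).detach()

# Now also zero weights that multiply a non-zero input
with torch.no_grad():
    layer.weight[:, 2] = 0.0
out_active_input = layer(x).detach()

print(torch.equal(out_before, out_zero_inputs))   # unchanged output expected
print(torch.equal(out_before, out_active_input))  # changed output expected
```

Zeroing weights that only ever see a zero input leaves the output bit-identical, while touching a weight on an active input changes it.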


@ptrblck Thanks for the reply. If the input activations were indeed zeroed out, then we would not expect this behavior to hold across a dataset of images, correct?

Why would this not be the case for image inputs?

Sorry, I meant to emphasize a dataset of images as opposed to a single image, not that there is something special about images in general.

That seems plausible. You could verify it by checking the input activation (e.g. via forward hooks) and check the weights as well as the bias of the following layer, which you’ve manipulated.
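
A rough sketch of the forward-hook check, using a small nn.Sequential as a stand-in for the real model (the toy model, the save_input hook name and the captured dict are assumptions for illustration; with VGG16 you would register the hook on the layer you modified, e.g. modified_model.classifier[6]):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-in: Linear -> ReLU -> Linear, in eval mode
model = nn.Sequential(nn.Linear(8, 6), nn.ReLU(), nn.Linear(6, 3)).eval()

captured = {}

def save_input(module, inputs, output):
    # inputs is a tuple of the positional arguments passed to forward()
    captured["x"] = inputs[0].detach()

# Hook the layer whose weights were manipulated (here: the last Linear)
handle = model[2].register_forward_hook(save_input)
with torch.no_grad():
    model(torch.randn(1, 8))
handle.remove()

x = captured["x"]
# Input features that are exactly zero for every sample in the batch
zero_features = (x == 0).all(dim=0).nonzero().flatten().tolist()
print("zero input features:", zero_features)
```

If the indices of the zeroed weight columns all appear in zero_features, the manipulated weights never contributed to the output for that input, which would explain an identical result.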
