What does one have to watch out for when changing a layer's parameters in place?

I have a backward function that is supposed to prune some parameters by multiplying the weight matrix with a binary matrix. Somehow the model never changes; it always remains the same. Why might that be? I have checked that the multiplied tensors have identical dimensions, that the model is set to training mode, that the keys exist in the model's state dictionary, and that the types are compatible. I've tried setting the weight tensor with setattr, setting it directly, and going through the state dictionary, but the model remains the same.

weight_value = self.prev_layer_weights[name]*difference_change
statesdict[prev_layer]= weight_value
self.model.load_state_dict(statesdict)

Best wishes and thanks a lot for any help

PS: I tried to keep the code short so it is easier to understand, but in case someone is curious, this is the whole function:

def backward(self):
    assert self.model.training, "Model is not in training mode"
    old_parameters = self.model.parameters()
    # self.compute_saliency_map(self.input,self.label).show()

    newlayers = {}
    if self.changed_activations=={}:
        self.default_loss.zero_grad()
        return self.default_loss

        

        """
        this is the correct code, i just want to test something
        print("the sum of changes is ")
        print(torch.sum(difference_change).item())
        weight_value = self.prev_layer_weights[item]*difference_change.squeeze(0).unsqueeze(1)*1000
        anti_overfitting_constant = weight_value.mean()
        newlayers[item]= (weight_value-anti_overfitting_constant)
    
        """
    statesdict = self.model.state_dict()
    prev_layer = None
    for name, layer in list(self.model.named_children())+[("output",None)]:
        if name not in self.activations.keys():
            layer.zero_grad()
            prev_layer = name + ".weight"
            print(prev_layer)
            continue
        

        # print(f"{name} has the shape {self.activations[name].shape} on the activations, {self.changed_activations[name].shape} on the changed activations as well as {self.prev_layer_weights[name].shape} for the weights")
        difference_change=abs((self.activations[name]-self.changed_activations[name]).squeeze(0).unsqueeze(1)*self.prev_layer_weights[name])
        percentile = (self.marked_pixels_count*3)/self.input.numel()
        limit = torch.quantile(difference_change, percentile).item()
        # limit = 0.01
        # self.distribution(difference_change)
        # print("The limit in this case was "+str(limit)) 
        difference_change[(difference_change>limit)]=0
        difference_change[(difference_change > 0)] = 1

        num_zeros = torch.sum(difference_change == 0).item()

        # Find the number of 1s
        num_ones = torch.sum(difference_change == 1).item()

        print(f"Number of 0s: {num_zeros}")
        print(f"Number of 1s: {num_ones}")
        # self.layer_factors[name]=difference_change# * self.marked_pixels_count/(self.width*self.height)  
        # self.layer_factors[name]= difference_change.squeeze(0).unsqueeze(1)
        weight_value = self.prev_layer_weights[name]*difference_change
        print(f"shape is {self.prev_layer_weights[name].shape} or {difference_change.shape}")
        statesdict[prev_layer]= weight_value

        old_stuff =getattr(self.model, prev_layer.rstrip(".weight"))

        with torch.no_grad():
            old_stuff.weight.copy_(weight_value)

        # old_weights = setattr(old_stuff, "weight",nn.Parameter(weight_value))
        # print(old_weights)

        

        # print(weight_value)


        # assert old_weights is not weight_value
        print(f"i am trying to change {prev_layer} by {num_zeros} zero entries")

        try:
            layer.zero_grad()
        except:
            pass
        prev_layer = name + ".weight"
    
    
    self.model.load_state_dict(statesdict)
    new_parameters = self.model.parameters()
    
    print(f"The difference between the two models is {self.calculate_parameter_change(old_params=old_parameters,new_params=new_parameters)}")

    # Check for missing and unexpected keys
    """
    if missing_keys:
        print("Missing keys in state_dict:", missing_keys)
    if unexpected_keys:
        print("Unexpected keys in state_dict:", unexpected_keys)
        """
    self.model.zero_grad()
    self.compute_saliency_map(self.input, self.label).show()
    self.measure_impact()
    # self.improve_image_attention()
    self.marked_pixels = None

I would also appreciate any answer like “please provide more info such as …” or “from what you wrote it should be working”
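A side note on the line `prev_layer.rstrip(".weight")` in the function above: `str.rstrip` treats its argument as a set of characters to strip, not as a suffix. It happens to give the expected result for a name like `conv1.weight`, but it can eat into layer names that end in any of those characters. A small sketch of the difference, where the layer name `fc_high` and the helper `strip_suffix` are made up for illustration:

```python
def strip_suffix(name, suffix=".weight"):
    """Remove an exact trailing suffix, if present (hypothetical helper)."""
    return name[: -len(suffix)] if name.endswith(suffix) else name

name = "fc_high.weight"  # hypothetical layer name

# rstrip removes any trailing '.', 'w', 'e', 'i', 'g', 'h', 't' characters
stripped_chars = name.rstrip(".weight")
# the helper removes only the literal ".weight" suffix
stripped_suffix = strip_suffix(name)

print(stripped_chars)   # fc_
print(stripped_suffix)  # fc_high
```

On Python 3.9 and later, `name.removesuffix(".weight")` does the same as the helper.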

Manipulating the parameters in place via copy_ should work, as seen in this simple example:

import torch
from torchvision import models

model = models.resnet50()

old_stuff = getattr(model.conv1, "weight")
with torch.no_grad():
    old_stuff.copy_(torch.ones_like(old_stuff))

print((model.conv1.weight == 1.).all())
# tensor(True)

I assume you are not seeing any changes at all in your parameters?

Appreciate you trying it. In the toy examples it seems to be working, but somehow for my entire model nothing is changing, yeah :frowning:

I'm still stuck on this… I've made sure that grad_fn is None, that I zeroed out the gradients, that the shape is compatible, that the dtype is compatible…

If anyone still has ideas or suggestions, please don't hesitate to let me know.

You could try this:

with torch.no_grad():
  self.prev_layer_weights[name].mul_(difference_change)  # in-place mul

Also, you are referencing the same underlying object, right? You don't have two copies, or re-initialize the network somewhere?
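To illustrate why the "same underlying object" question matters, here is a minimal sketch with a stand-in `nn.Linear` layer and a random binary mask (both are assumptions, not the model from the question). An in-place `mul_` on the parameter itself changes the model, while the same operation on a cloned copy of the weights leaves the model untouched:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)  # stand-in for the layer being pruned

# Hypothetical binary mask with the weight's shape (1 = keep, 0 = prune)
mask = (torch.rand_like(layer.weight) > 0.5).float()

before = layer.weight.detach().clone()   # independent copy for comparison
cached = layer.weight.detach().clone()   # a cached copy, NOT the parameter

with torch.no_grad():
    cached.mul_(mask)  # only changes the copy
copy_changed_model = not torch.equal(layer.weight.detach(), before)

with torch.no_grad():
    layer.weight.mul_(mask)  # changes the parameter's own storage

pruned_ok = bool((layer.weight.detach()[mask == 0] == 0).all())
kept_ok = torch.equal(layer.weight.detach()[mask == 1], before[mask == 1])
print(copy_changed_model, pruned_ok, kept_ok)
```

So if `self.prev_layer_weights[name]` holds a clone rather than the parameter itself, the in-place multiply would need to target the parameter (or be followed by a `copy_` into it).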

I like your snippet, especially considering that in the future I might want to track the changes for backpropagation, but I still get the same error.

Yes, I only have one model. There are two activation hooks; I don't know if that makes any difference?

What is the exact error? Could you share the entire stacktrace?

The weird part is I don't even get an error. But I have a function that checks the difference between the model parameters, a function that checks the specific weights, and a saliency map, and all of those show the model didn't change from my modifications (even though they are quite extreme).

Now that I say it, maybe my testing functions are wrong and the model does change? Is there any simple way to measure this?
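One simple way to measure this, sketched here on a small stand-in model: take a snapshot of detached, cloned parameters before the modification and list the parameters that differ afterwards. The helper names `snapshot` and `changed_params` are made up for this example.

```python
import torch
import torch.nn as nn

def snapshot(model):
    """Return a name -> tensor dict of detached, cloned parameters."""
    return {name: p.detach().clone() for name, p in model.named_parameters()}

def changed_params(before, model):
    """Return the names of parameters whose values differ from the snapshot."""
    return [name for name, p in model.named_parameters()
            if not torch.equal(before[name], p.detach())]

# Stand-in model; the real one would be self.model
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
before = snapshot(model)

with torch.no_grad():
    model[0].weight.zero_()  # some drastic in-place modification

changed = changed_params(before, model)
print(changed)
```

Only the first layer's weight should be reported, since nothing else was touched.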

I assume you’re using a forward hook to cache intermediate results? Is there a way to run your code without the hooks?

As far as I know, hooks have had issues in previous versions of PyTorch. What version of PyTorch are you using?

torch 1.13.0a0+gitunknown
torchaudio 0.13.1
torchvision 0.14.1a0
torchviz 0.0.2

I guess I could create a debug case without hooks, although all in all I think they are necessary.

Even without hooks it doesn't work :confused: Just tested.

Perhaps try using copy.deepcopy to rule out any shallow copying issues?

You could also try printing out the intermediate values you want to change and see if weight_value is actually different from old_weight? Perhaps the difference_change value is just ones?

I print out the number of zeros and ones, and I also assert that weight_value is different from the previous weight.

That's a good idea, I'll try it right away.

Sadly that didn't work either, with deep copying the weights…

Can you share a minimal reproducible example for your code that shows this copying issue?

Yeah so… kind of an embarrassing story.

I had four testing functions to check whether the model changed, but I forgot to clone the old parameters, so I was always comparing the same tensors with themselves :melting_face:
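For anyone landing here with the same symptom, a minimal sketch of the pitfall on a stand-in `nn.Linear` (not the original model or the original comparison function): `model.parameters()` yields the live parameter tensors, so "old" values kept without `clone()` change along with the model and the measured difference is always zero.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 3)  # stand-in model

# References to the live tensors: these follow every in-place change
old_refs = list(model.parameters())
# Independent copies: these keep the values from before the change
old_copies = [p.detach().clone() for p in model.parameters()]

with torch.no_grad():
    model.weight.zero_()  # modify the model in place

new_params = list(model.parameters())

def total_abs_diff(old, new):
    """Sum of absolute element-wise differences over all parameters."""
    return sum((o.detach() - n.detach()).abs().sum().item()
               for o, n in zip(old, new))

diff_refs = total_abs_diff(old_refs, new_params)      # compares tensors with themselves
diff_copies = total_abs_diff(old_copies, new_params)  # compares before vs. after
print(diff_refs, diff_copies)
```

The first number is zero even though the weights were wiped; only the cloned snapshot shows the change.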

Still, thanks a lot for your help, man.

Not a problem, don’t worry about it. It happens to the best of us :wink:
