What does one have to watch out for when changing a layer's parameters in place?

SchulzKilian · May 24, 2024, 3:14pm

I have a backward function that is supposed to prune some parameters by multiplying the weight matrix with a binary matrix. Somehow the model never changes; it always stays the same. Why might that be? I have checked that the multiplied tensors have identical dimensions, that the model is in training mode, that the keys exist in the model's state dictionary, and that the types are compatible. I've tried setting the weight tensor with setattr, assigning it directly, and going through the state dictionary, but the model remains the same.

weight_value = self.prev_layer_weights[name]*difference_change
statesdict[prev_layer]= weight_value
self.model.load_state_dict(statesdict)
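For comparison, here is a minimal sketch showing that the state-dict route can update the weights when it is checked against a snapshot taken by value. It assumes a toy `nn.Linear` and a random binary mask, both stand-ins rather than the original model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)                 # toy stand-in for the real model
before = model.weight.detach().clone()  # snapshot by value; state_dict() tensors share storage
param_id = id(model.weight)

mask = (torch.rand_like(model.weight) > 0.5).float()  # hypothetical binary pruning mask

state = model.state_dict()
state["weight"] = state["weight"] * mask  # out-of-place product: a new tensor
model.load_state_dict(state)              # copies the values into the existing Parameter

print(torch.equal(model.weight, before * mask))  # True
print(id(model.weight) == param_id)              # True: same Parameter object, new contents
```

Because `load_state_dict` copies values into the existing `Parameter`, the module keeps the same tensor object; only its contents change.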

Best wishes and thanks a lot for any help

PS: I kept the code above short to make it easier to follow, but in case someone is curious, this is the whole function:

def backward(self):
    assert self.model.training, "Model is not in training mode"
    old_parameters = self.model.parameters()
    # self.compute_saliency_map(self.input,self.label).show()

    newlayers = {}
    if self.changed_activations=={}:
        self.default_loss.zero_grad()
        return self.default_loss

        

        """
        this is the correct code, i just want to test something
        print("the sum of changes is ")
        print(torch.sum(difference_change).item())
        weight_value = self.prev_layer_weights[item]*difference_change.squeeze(0).unsqueeze(1)*1000
        anti_overfitting_constant = weight_value.mean()
        newlayers[item]= (weight_value-anti_overfitting_constant)
    
        """
    statesdict = self.model.state_dict()
    prev_layer = None
    for name, layer in list(self.model.named_children())+[("output",None)]:
        if name not in self.activations.keys():
            layer.zero_grad()
            prev_layer = name + ".weight"
            print(prev_layer)
            continue
        

        # print(f"{name} has the shape {self.activations[name].shape} on the activations, {self.changed_activations[name].shape} on the changed activations as well as {self.prev_layer_weights[name].shape} for the weights")
        difference_change=abs((self.activations[name]-self.changed_activations[name]).squeeze(0).unsqueeze(1)*self.prev_layer_weights[name])
        percentile = (self.marked_pixels_count*3)/self.input.numel()
        limit = torch.quantile(difference_change, percentile).item()
        # limit = 0.01
        # self.distribution(difference_change)
        # print("The limit in this case was "+str(limit)) 
        difference_change[(difference_change>limit)]=0
        difference_change[(difference_change > 0)] = 1

        num_zeros = torch.sum(difference_change == 0).item()

        # Find the number of 1s
        num_ones = torch.sum(difference_change == 1).item()

        print(f"Number of 0s: {num_zeros}")
        print(f"Number of 1s: {num_ones}")
        # self.layer_factors[name]=difference_change# * self.marked_pixels_count/(self.width*self.height)  
        # self.layer_factors[name]= difference_change.squeeze(0).unsqueeze(1)
        weight_value = self.prev_layer_weights[name]*difference_change
        print(f"shape is {self.prev_layer_weights[name].shape} or {difference_change.shape}")
        statesdict[prev_layer]= weight_value

        old_stuff = getattr(self.model, prev_layer[: -len(".weight")])  # drop the ".weight" suffix; rstrip() strips characters, not a suffix

        with torch.no_grad():
            old_stuff.weight.copy_(weight_value)

        # old_weights = setattr(old_stuff, "weight",nn.Parameter(weight_value))
        # print(old_weights)

        

        # print(weight_value)


        # assert old_weights is not weight_value
        print(f"i am trying to change {prev_layer} by {num_zeros} zero entries")

        try:
            layer.zero_grad()
        except:
            pass
        prev_layer = name + ".weight"
    
    
    self.model.load_state_dict(statesdict)
    new_parameters = self.model.parameters()
    
    print(f"The difference between the two models is {self.calculate_parameter_change(old_params=old_parameters,new_params=new_parameters)}")

    # Check for missing and unexpected keys
    """
    if missing_keys:
        print("Missing keys in state_dict:", missing_keys)
    if unexpected_keys:
        print("Unexpected keys in state_dict:", unexpected_keys)
        """
    self.model.zero_grad()
    self.compute_saliency_map(self.input, self.label).show()
    self.measure_impact()
    # self.improve_image_attention()
    self.marked_pixels = None
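One detail in code like this: recovering a module name from a key such as `"fc_out.weight"` with `str.rstrip(".weight")` is fragile, because `rstrip` treats its argument as a set of characters to remove, not as a suffix. A small sketch with a hypothetical helper `module_name_from_key`:

```python
def module_name_from_key(key: str, suffix: str = ".weight") -> str:
    """Drop a trailing suffix from a state_dict key; leave other keys untouched."""
    return key[: -len(suffix)] if key.endswith(suffix) else key

# str.rstrip removes any trailing characters found in ".weight", one by one:
print("fc.weight".rstrip(".weight"))          # fc      (works by luck)
print("fc_out.weight".rstrip(".weight"))      # fc_ou   (the trailing 't' is eaten too)
print(module_name_from_key("fc_out.weight"))  # fc_out
```

On Python 3.9+ `str.removesuffix` does the same job. For nested modules, `nn.Module.get_submodule` accepts a dotted path such as `"features.0"`, which plain `getattr` does not.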

SchulzKilian · May 25, 2024, 10:41am

I would also appreciate any answer like “please provide more info such as …” or “from what you wrote it should be working”

ptrblck · May 25, 2024, 3:20pm

Manipulating the parameters in place via copy_ should work, as seen in this simple example:

import torch
from torchvision import models

model = models.resnet50()

old_stuff = getattr(model.conv1, "weight")
with torch.no_grad():
    old_stuff.copy_(torch.ones_like(old_stuff))

print((model.conv1.weight == 1.).all())
# tensor(True)

I assume you are not seeing any changes at all in your parameters?
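A related point about the approaches listed in the question: `copy_` (or `mul_`) under `torch.no_grad()` edits the existing `Parameter`, whereas assigning a new `nn.Parameter` (for example via `setattr`) replaces the object, so an optimizer created earlier keeps pointing at the old tensor. A small sketch with a toy layer and SGD optimizer (both assumptions):

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
original = layer.weight

# (a) In-place update: the module keeps the same Parameter object.
with torch.no_grad():
    layer.weight.copy_(torch.zeros_like(layer.weight))
print(layer.weight is original)                          # True
print(opt.param_groups[0]["params"][0] is layer.weight)  # True

# (b) Attribute replacement: the module now holds a brand-new Parameter.
layer.weight = nn.Parameter(torch.ones_like(layer.weight))
print(layer.weight is original)                          # False
print(opt.param_groups[0]["params"][0] is layer.weight)  # False: optimizer still holds the old one
```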

SchulzKilian · May 25, 2024, 4:59pm

Appreciate you trying it; in the toy examples it does seem to work. But somehow, for my entire model, nothing changes.

SchulzKilian · May 27, 2024, 10:26am

I'm still stuck on this… I've made sure that grad_fn is None, that I zeroed out the gradients, that the shape is compatible, and that the dtype is compatible…

If anyone still has ideas or suggestions, please don't hesitate to let me know.

AlphaBetaGamma96 · May 27, 2024, 11:38am

You could try this,

with torch.no_grad():
  self.prev_layer_weights[name].mul_(difference_change)  # in-place mul

Also, you are referencing the same underlying object, right? You don’t have two copies? Or re-initializing the network somewhere?
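The "same underlying object" question matters because an in-place `mul_` only reaches the model if the cached tensor is the model's own `Parameter` (or a view of it). A sketch assuming a cache that holds either an alias or a `detach().clone()` copy; the layer and mask are illustrative:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)
mask = torch.tensor([[1., 0., 1.], [0., 1., 0.]])  # hypothetical binary pruning mask

cached_alias = layer.weight                  # the model's own Parameter
cached_copy = layer.weight.detach().clone()  # an independent snapshot
original = cached_copy.clone()               # kept only for the checks below

with torch.no_grad():
    cached_copy.mul_(mask)                   # edits the snapshot only
print(torch.equal(layer.weight, original))   # True: model untouched

with torch.no_grad():
    cached_alias.mul_(mask)                  # edits the model's parameter in place
print(torch.equal(layer.weight, original * mask))          # True
print(cached_alias.data_ptr() == layer.weight.data_ptr())  # True: same storage
```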

SchulzKilian · May 27, 2024, 12:02pm

I like your snippet, especially considering that in the future I might want to track the changes for backpropagation, but I still get the same error.

Yes, I only have one model. There are two activation hooks; I don't know if that makes any difference?

AlphaBetaGamma96 · May 27, 2024, 12:03pm

What is the exact error? Could you share the entire stacktrace?

SchulzKilian · May 27, 2024, 12:06pm

The weird part is that I don't even get an error. But I have a function that checks the difference between the model parameters, a function that checks the specific weights, and a saliency map, and all of them show that the model didn't change from my modifications (even though they are quite extreme).

SchulzKilian · May 27, 2024, 12:06pm

Now that I say it, maybe my testing functions are wrong and the model does change? Is there any simple way to measure this?
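One simple way to measure this is to snapshot every tensor by value with `.detach().clone()` before the edit and count differing entries afterwards. A sketch with hypothetical helpers `snapshot` and `changed_entries` on a toy model:

```python
import torch
import torch.nn as nn

def snapshot(model: nn.Module) -> dict:
    """Copy every state_dict tensor by value so later in-place edits cannot alter it."""
    return {k: v.detach().clone() for k, v in model.state_dict().items()}

def changed_entries(model: nn.Module, before: dict) -> dict:
    """Count, per state_dict key, how many elements differ from the snapshot."""
    return {k: int((v != before[k]).sum()) for k, v in model.state_dict().items()}

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))  # toy model
nn.init.constant_(model[0].weight, 0.5)  # known values so the count below is exact

before = snapshot(model)
with torch.no_grad():
    model[0].weight[:, :2] = 0.0  # "prune" the first two input columns of layer 0

print(changed_entries(model, before))
# {'0.weight': 8, '0.bias': 0, '2.weight': 0, '2.bias': 0}
```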

AlphaBetaGamma96 · May 27, 2024, 12:07pm

I assume you’re using a forward hook to cache intermediate results? Is there a way to run your code without the hooks?

As far as I know, hooks have had issues in previous versions of PyTorch. What version of PyTorch are you using?

SchulzKilian · May 27, 2024, 12:09pm

torch 1.13.0a0+gitunknown
torchaudio 0.13.1
torchvision 0.14.1a0
torchviz 0.0.2

I guess I could create a debug case without hooks, although overall I think they are necessary.

SchulzKilian · May 27, 2024, 12:35pm

Just tested: even without hooks it doesn't work.

AlphaBetaGamma96 · May 27, 2024, 1:28pm

Perhaps try using copy.deepcopy to rule out any shallow-copying issues?
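For reference, the standard-library function is `copy.deepcopy`. A minimal sketch (toy layer assumed) showing that a deep copy owns separate parameters, so it can serve as an untouched reference while the live model is edited in place:

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(3, 2)                  # toy stand-in for the real model
reference = copy.deepcopy(model)         # independent module: new Parameters, new storage
original = model.weight.detach().clone()

with torch.no_grad():
    model.weight.zero_()                 # edit the live model in place

print(reference.weight is model.weight)         # False
print(torch.equal(reference.weight, original))  # True: the deep copy was not affected
print(bool((model.weight == 0).all()))          # True
```

The deep copy only helps if later comparisons are made against `reference`, not against tensors taken from the live model.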

You could also try printing out the intermediate values you want to change and checking whether weight_value is actually different from old_weight. Perhaps difference_change is just all ones?
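To check whether the mask could be all ones, the thresholding step can be isolated and tested on tiny inputs. The sketch below mirrors the quantile logic of the posted code in a hypothetical `quantile_mask` function; the weights and scores are made up:

```python
import torch

def quantile_mask(scores: torch.Tensor, q: float) -> torch.Tensor:
    """Binary mask in the style of the posted code: scores above the q-quantile
    become 0, remaining positive scores become 1 (exact zeros stay 0)."""
    mask = scores.clone()
    limit = torch.quantile(mask, q).item()
    mask[mask > limit] = 0
    mask[mask > 0] = 1
    return mask

weight = torch.tensor([0.5, -1.0, 2.0, 3.0])  # toy weights
scores = torch.tensor([0.1, 0.4, 0.2, 0.8])   # toy change scores
mask = quantile_mask(scores, 0.5)             # 0.5-quantile is 0.3 here

print(mask)                                # tensor([1., 0., 1., 0.])
print(bool((mask == 1).all()))             # False: the mask actually prunes something
print(torch.equal(weight * mask, weight))  # False: the masked weights really differ
```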

SchulzKilian · May 27, 2024, 1:41pm

I print out the number of zeros and ones, and I also assert that weight_value is different from the previous weight.

That's a good idea, I'll try it right away.

SchulzKilian · May 27, 2024, 4:14pm

Sadly, deep copying the weights didn't work either…

AlphaBetaGamma96 · May 29, 2024, 1:07pm

Can you share a minimal reproducible example for your code that shows this copying issue?

SchulzKilian · May 29, 2024, 1:21pm

Yeah, so… kind of an embarrassing story.

I had four testing functions to check whether the model changed, but I forgot to clone the old parameters, so I was always comparing the parameters with themselves.
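This is an easy trap: `model.parameters()` yields the live `Parameter` objects, so a generator or list of them taken "before" an in-place edit changes along with the model. A sketch contrasting references with cloned snapshots; `total_abs_change` is a hypothetical stand-in for the thread's `calculate_parameter_change`:

```python
import torch
import torch.nn as nn

def total_abs_change(old_params, new_params) -> float:
    """Hypothetical stand-in for calculate_parameter_change: summed |new - old|."""
    return sum((new - old).abs().sum().item() for old, new in zip(old_params, new_params))

model = nn.Linear(3, 2)
with torch.no_grad():
    model.weight.fill_(1.0)  # known starting values so the numbers below are exact

live_refs = list(model.parameters())                          # references, not copies
snapshots = [p.detach().clone() for p in model.parameters()]  # copies by value

with torch.no_grad():
    model.weight.zero_()  # an "extreme" in-place edit

print(total_abs_change(live_refs, model.parameters()))  # 0.0: compares each tensor with itself
print(total_abs_change(snapshots, model.parameters()))  # 6.0: the 2x3 weight went from 1 to 0
```

Note also that `parameters()` returns a generator, which is exhausted after one pass, so materializing the snapshot as a list is safer if it is compared more than once.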

Still thanks a lot for your help man

AlphaBetaGamma96 · May 29, 2024, 1:23pm

Not a problem, don’t worry about it. It happens to the best of us.