Unable to implement backward hooks correctly

I am trying to implement the Lottery Ticket Hypothesis paper in PyTorch. To freeze individual weights, I have created a mask_dict with the layer name as the key and a boolean mask as the value. To freeze, I wanted to use backward hooks via register_hook and multiply the incoming gradients by the mask to set some values to zero. The problem is that whatever I try, I always get a RuntimeError saying the sizes of the tensors are not compatible.
Here's the code for creating the mask:

mask = {}   # layer name -> boolean mask
for name, t in learn.model.named_parameters():
    if 'weight' in name:
        # pr_value is the pruning ratio: the fraction of smallest-magnitude weights to zero out
        bottomk = torch.topk(torch.abs(t).view(-1), k=round(pr_value*t.nelement()), largest=False)
        clone = t.clone().detach()
        clone.view(-1)[bottomk.indices] = 0
        mask[name] = clone.bool()   # False where pruned, True elsewhere

I thought that maybe, since the incoming gradients arrive in reverse order, it results in a mismatch, but even switching the mask values doesn't help. Below is my attempt:

for name, module in learn.model.named_modules():
    if 'linear' in name:
        key = name+'.weight'       # since layer keys have names layer.weight
        module.weight.register_hook(lambda x: x*mask[key])

Hi,

Why do you convert your mask to bool() since you multiply it with a float Tensor later?

Where do you get the error? What is the full stack trace?
Can you print the sizes of some Tensors to check that they are what you expect?

Sorry, I forgot to update: I was finally able to make the backward hook work, but only manually. I am now trying to automate the process; for that I have a mask_dict whose key/value pairs are layer_name, bool_mask. I am trying to loop through the layer modules and register the hooks using the mask_dict keys, but it doesn't work. There's no particular reason for using bool values; I just chose them since a mask is boolean in nature.

def random_mask(model):
    mask_dict = {}
    for name, module in model.named_modules():
        if 'linear' in name:
            size = tuple(module.weight.shape)
            mask_dict[name] = (torch.randint(0, 2, size).bool().to(device='cuda'))
    return mask_dict        
               

def apply_mask(model, mask_dict):
    for name, module in model.named_modules():
        if 'linear' in name:
            module.weight.data *= mask_dict[name]
            #checking that the layer names and weight shapes match those of the masks
            print('module name is:', name, 'and weight size is:', module.weight.size()) 
            print('corresponding tensor is:', mask_dict[name].shape) #matching shapes for multiplication

            module.weight.register_hook(lambda x: x*mask_dict[name])  #<--loop update doesn't work
           
#### manual update
#     model.linear1.weight.register_hook(lambda x: x*mask_dict['linear1'])
#     model.linear2.weight.register_hook(lambda x: x*mask_dict['linear2'])
#     model.linear3.weight.register_hook(lambda x: x*mask_dict['linear3'])
   

mask = random_mask(model)
#print(mask)
apply_mask(model, mask)


module name is: linear1 and weight size is: torch.Size([300, 784])
corresponding tensor is: torch.Size([300, 784]) 

module name is: linear2 and weight size is: torch.Size([100, 300])
corresponding tensor is: torch.Size([100, 300])

module name is: linear3 and weight size is: torch.Size([10, 100])
corresponding tensor is: torch.Size([10, 100])

It works perfectly when I uncomment the manual update lines and remove the loop update, but not in its current state. The error I am getting is a tensor size mismatch when performing the element-wise multiplication.

Update: I modified the code a bit and printed the shape of the incoming gradient and the shape of the mask I am passing in the loop. It seems that, because of the way backward hooks are called, the last value of name, i.e. 'linear3', is always used, so the same mask gets passed for all the gradients, resulting in the size-mismatch error. Here's the output I got:

shape of grad torch.Size([10, 100])  mask-shape torch.Size([10, 100])

shape of grad torch.Size([100, 300])  mask-shape torch.Size([10, 100])

shape of grad torch.Size([300, 784])  mask-shape torch.Size([10, 100])

shape of grad torch.Size([10, 100])  mask-shape torch.Size([10, 100])

shape of grad torch.Size([100, 300])  mask-shape torch.Size([10, 100])

shape of grad torch.Size([300, 784])  mask-shape torch.Size([10, 100])

and this repeats on…
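
If it helps, here is a minimal, PyTorch-free sketch of what I suspect is happening (hypothetical names, just lambdas defined in a loop):

defined_in_loop = []
for name in ['linear1', 'linear2', 'linear3']:
    defined_in_loop.append(lambda: name)   # 'name' is looked up when the lambda is called, not when it is defined

print([f() for f in defined_in_loop])      # ['linear3', 'linear3', 'linear3'] -- all see the loop variable's final value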

This is quite weird.
Could you give a small (30-40 line) code sample that I can run that reproduces this, please?

This behavior is expected. The reason for it is the way lambda functions capture variables. The function is defined in the line you marked as not working, but it is not called there. Your variable name changes after the definition and before the call, and the value it holds at call time (its last assignment) is the one that gets used. To avoid this behavior, you can simply give the lambda a second argument with a default value, which is evaluated at definition time instead of at call time. It looks like this:

module.weight.register_hook(lambda x, name=name: x*mask_dict[name])

Then, your code should work just fine.
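
For completeness, here is a small self-contained sketch of the full pattern on a toy model (the layer names and sizes are made up to match the shapes printed above, not taken from the original code):

import torch
import torch.nn as nn

# toy model with hypothetical layer names, only to illustrate the fix
model = nn.Sequential()
model.add_module('linear1', nn.Linear(784, 300))
model.add_module('linear2', nn.Linear(300, 100))
model.add_module('linear3', nn.Linear(100, 10))

mask_dict = {name: torch.randint(0, 2, tuple(module.weight.shape)).bool()
             for name, module in model.named_modules() if 'linear' in name}

for name, module in model.named_modules():
    if 'linear' in name:
        # name=name binds the current value of name at definition time
        module.weight.register_hook(lambda grad, name=name: grad * mask_dict[name])

out = model(torch.randn(8, 784))
out.sum().backward()   # each hook now multiplies by the mask of the matching shape
print(model.linear1.weight.grad.shape)   # torch.Size([300, 784])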
