Hi, I’ve been trying to mask the gradients of some custom layer weights in my network, and I register the hooks as follows (before the start of each epoch / training step):
cur_hooks = {}
for n, p in net.named_children():
    if isinstance(p, Conv2d) or isinstance(p, Linear):
        gradient_mask = (p.some_func(p.novel_para).round().float() <= 1e-6).data
        gradient_mask = gradient_mask.float()
        cur_hooks[n] = p.weight.register_hook(lambda grad: grad.mul_(gradient_mask))
If I print the hooks via
for k, v in cur_hooks.items():
    print(k, v)
It prints:
conv1 <torch.utils.hooks.RemovableHandle object at 0x7f554c92a1c0>
conv1a <torch.utils.hooks.RemovableHandle object at 0x7f554c92afa0>
conv1b <torch.utils.hooks.RemovableHandle object at 0x7f554c92a370>
conv1c <torch.utils.hooks.RemovableHandle object at 0x7f554c92a9a0>
conv1d <torch.utils.hooks.RemovableHandle object at 0x7f554c92aee0>
conv2a <torch.utils.hooks.RemovableHandle object at 0x7f554e05bd90>
conv2b <torch.utils.hooks.RemovableHandle object at 0x7f554e05bb80>
shortcut_conv2 <torch.utils.hooks.RemovableHandle object at 0x7f554e05be20>
conv2c <torch.utils.hooks.RemovableHandle object at 0x7f554e05bc70>
conv2d <torch.utils.hooks.RemovableHandle object at 0x7f554e05bd30>
conv3a <torch.utils.hooks.RemovableHandle object at 0x7f554c92f250>
conv3b <torch.utils.hooks.RemovableHandle object at 0x7f554c92f100>
shortcut_conv3 <torch.utils.hooks.RemovableHandle object at 0x7f554c92f9d0>
conv3c <torch.utils.hooks.RemovableHandle object at 0x7f554c92f430>
conv3d <torch.utils.hooks.RemovableHandle object at 0x7f554c92f880>
conv4a <torch.utils.hooks.RemovableHandle object at 0x7f554c92f2e0>
conv4b <torch.utils.hooks.RemovableHandle object at 0x7f554df74d90>
shortcut_conv4 <torch.utils.hooks.RemovableHandle object at 0x7f554df74f70>
conv4c <torch.utils.hooks.RemovableHandle object at 0x7f554df74cd0>
conv4d <torch.utils.hooks.RemovableHandle object at 0x7f554df74430>
linear <torch.utils.hooks.RemovableHandle object at 0x7f554df74f10>
However, the very first backward() call fails with this error:
Traceback (most recent call last):
File "/mnt/train.py", line 144, in <module>
loss.backward()
File "/home/user/base/lib/python3.9/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/user/base/lib/python3.9/site-packages/torch/autograd/__init__.py", line 130, in backward
Variable._execution_engine.run_backward(
File "/mnt/train.py", line 107, in <lambda>
cur_hooks[n] = p.weight.register_hook(lambda grad: grad.mul_(gradient_mask))
RuntimeError: The size of tensor a (3) must match the size of tensor b (512) at non-singleton dimension 3
Apparently (if I’m not wrong) it is applying the mask from the Linear layer’s hook (the last one registered) to the first convolution layer (conv1): the sizes in the error, 3 vs 512, look like conv1’s 3x3 kernel colliding with the Linear layer’s mask. I can’t figure out what I did wrong. Any suggestions?
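My only guess so far is that every lambda in the loop might be closing over the same gradient_mask variable rather than its value at registration time, so all hooks end up seeing the mask from the last iteration. A minimal sketch of the behaviour I mean (plain Python, nothing PyTorch-specific):

funcs = []
for i in range(3):
    # each lambda captures the variable `i` itself, not its value at this point
    funcs.append(lambda: i)

print([f() for f in funcs])  # prints [2, 2, 2], not [0, 1, 2]

Is that what’s biting me here, or is the problem somewhere else?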