I am multiplying my weight matrix by a mask during the forward pass to sparsify my network. However, during backprop, I want the gradient updates to be zero for the masked weights (i.e. where mask[i] = 0).
Is there any way to do that using register_hook?
Here's the toy code I used, but I am getting an error.
The thing is that retain_grad() is also implemented as a hook, and hooks are executed in the order they are added. So if you register your masking hook after calling retain_grad(), the gradient seen by retain_grad() won't be masked.
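For the masking itself, register_hook on the weight tensor does what you want: the hook receives the gradient, and whatever it returns replaces that gradient. A minimal sketch, assuming a 0/1 mask with the same shape as the weight (the names and shapes here are placeholders, not your original code):

```python
import torch

torch.manual_seed(0)
weight = torch.randn(3, 3, requires_grad=True)
mask = torch.tensor([[1., 0., 1.],
                     [0., 1., 0.],
                     [1., 0., 1.]])

# The hook receives the gradient w.r.t. `weight`; the returned tensor
# replaces it, so grad * mask zeroes the updates for masked weights.
weight.register_hook(lambda grad: grad * mask)

# Sparsified forward pass, plus an unmasked penalty term so the raw
# gradient would be nonzero at the masked positions without the hook.
loss = (weight * mask).sum() + (weight ** 2).sum()
loss.backward()

print(weight.grad)  # zero wherever mask == 0
```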
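And to illustrate the ordering issue: register the masking hook before calling retain_grad(), so that the stored gradient is the masked one. A sketch of that, assuming hooks fire in registration order as described above, using a hypothetical non-leaf tensor h (retain_grad() only matters for non-leaf tensors):

```python
import torch

x = torch.ones(3, requires_grad=True)
mask = torch.tensor([1., 0., 1.])

h = x * 2                  # non-leaf: .grad is not kept unless retained

# Masking hook first, retain_grad() second: later hooks (including the
# one retain_grad() adds) then see the already-masked gradient.
h.register_hook(lambda grad: grad * mask)
h.retain_grad()

h.sum().backward()
print(h.grad)              # tensor([1., 0., 1.]) -- masked
# With the two calls swapped, h.grad would hold the unmasked gradient
# tensor([1., 1., 1.]).
```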