# Understanding `register_forward_pre_hook` and `register_backward_hook`

## Problem

I wrote the following code to check my understanding of these two functions:

```python
import torch
import torch.nn as nn

class LinearTransformation(nn.Module):
    def __init__(self, constant=False):
        super(LinearTransformation, self).__init__()
        self.linear = nn.Linear(2, 2)
        if constant:
            nn.init.constant_(self.linear.weight, 10)
        else:
            nn.init.eye_(self.linear.weight)
        nn.init.constant_(self.linear.bias, 0)

    def forward(self, x):
        return torch.sum(self.linear(x) ** 2)
```

Invoking this simple class in the following snippets returns:

```python
def hook(self, input):
    print(input)

model = LinearTransformation(constant=False)
x = torch.tensor([1., 2.])

model.linear.register_forward_pre_hook(hook)
model(x)
# (tensor([1., 2.]),)
```
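As an aside, a forward pre-hook that returns a value replaces the module's input before `forward` runs. A minimal sketch on a standalone layer (the layer and values here are illustrative, not from the post above):

```python
import torch
import torch.nn as nn

linear = nn.Linear(2, 2)
nn.init.eye_(linear.weight)
nn.init.constant_(linear.bias, 0.)

# If a forward pre-hook returns something, that return value is used
# as the module's input instead of the original one.
def double_input(module, inputs):
    return (inputs[0] * 2,)

linear.register_forward_pre_hook(double_input)
out = linear(torch.tensor([1., 2.]))
print(out)  # with identity weight and zero bias: tensor([2., 4.], ...)
```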
```python
def hook(self, grad_input, grad_output):
    print(grad_input)
    print(grad_output)

model = LinearTransformation(constant=True)
x = torch.tensor([1., 2.])

model.linear.register_backward_hook(hook)
model(x).backward()
# (tensor([60., 60.]), tensor([60., 60.]))
# (tensor([60., 60.]),)
```

My questions are:

• For `register_forward_pre_hook` (first snippet), why is 5, the final output, also printed when I only register the hook on `nn.Linear`?
• For `register_backward_hook` (second snippet), I am not sure what these `tensor([60., 60.])` values correspond to. I can see that `grad_output` is probably the gradient with respect to the output of `nn.Linear`, but what about the other two `tensor([60., 60.])`?

Hi,

In the first case, only the input is passed to the pre-hook, not the output.
Are you running this in an interpreter? That would explain why the result of `model(x)` is printed in both of your examples.

For the backward hook: as noted in the documentation, module backward hooks do not currently work as expected, so you should not use them.

Hi,

Can I use a hook to add a parameter-masking function to `Conv2d`? Specifically, I'd like to add a binary mask buffer to each `Conv2d` module; during each training step, I need to update the mask buffer and then use it to mask the weight.

Thanks!

I guess you could write the masking part in the model's `forward()` function, unless you would like finer control over the type of mask you use.
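A minimal sketch of that idea, using a hypothetical `MaskedConv2d` wrapper (the class name, shapes, and values are illustrative assumptions, not an existing API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Module):
    """Conv2d whose weight is multiplied by a binary mask at every forward pass."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size)
        # A buffer moves with .to(device) and is saved in state_dict,
        # but is not treated as a trainable parameter.
        self.register_buffer("mask", torch.ones_like(self.conv.weight))

    def forward(self, x):
        # Apply the mask to the weight on the fly; update self.mask
        # between training steps as needed.
        return F.conv2d(x, self.conv.weight * self.mask, self.conv.bias)

m = MaskedConv2d(1, 2, 3)
m.mask.zero_()  # mask out all weights -> output reduces to the bias
y = m(torch.randn(1, 1, 5, 5))
```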

If you are using the nightly build, we just landed a pruning utility in `nn.utils`. The upcoming tutorial for it can be found here.

Otherwise, you will have to emulate what is done in the pruning module yourself.

Thank you so much! I'll try it in the next few days. Does this utility support multi-GPU training and mixed-precision training?

Yes, for the use cases that have been tested, pruning works well with `DataParallel`, `DistributedDataParallel`, as well as `apex` and `torch.quantization`.
Please flag anything that doesn’t work as you try out the functionality.

So then what is the best way to check gradients for each layer? I used to apply a forward hook using `register_forward_hook` on each layer, and was thinking of doing the same for gradients by using `register_backward_hook` on each layer.

Hi,

The best way to do it is to call `register_hook()` on the tensors whose gradients you want, during the forward pass.
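A minimal sketch of that approach, reusing the numbers from the thread above (the sum-of-squares loss is an assumption made to match the printed gradients):

```python
import torch
import torch.nn as nn

linear = nn.Linear(2, 2)
nn.init.constant_(linear.weight, 10.)
nn.init.constant_(linear.bias, 0.)

grads = {}

def save_grad(g):
    # Called during backward with the gradient w.r.t. the hooked tensor.
    grads["h"] = g

x = torch.tensor([1., 2.])
h = linear(x)           # intermediate activation: tensor([30., 30.], ...)
h.register_hook(save_grad)

loss = torch.sum(h ** 2)
loss.backward()
print(grads["h"])       # d(loss)/dh = 2*h = tensor([60., 60.])
```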