import torch.nn as nn

class Identity(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x

hooked_layer = Identity()
# A forward hook that returns a non-None value replaces the module's output.
hookfn = lambda module, input, output: output * 2
# hookfn can in principle be a complex function,
# even a non-differentiable one such as quantization.
hooked_layer.register_forward_hook(hookfn)

Now consider an activation A, which passes through hooked_layer to become A_hooked. I understand that A_hooked is what is used in the computation of the loss.
I want to understand how back-propagation will happen.
1.) How will the gradient of A be computed? Will the gradient of A_hooked be copied to A, or will the gradient of A be equal to half of the gradient of A_hooked?
2.) In the above case, if the gradient of A_hooked is not copied to the gradient of A, what will happen if I use a non-differentiable hook, such as quantization?
3.) Lastly, in the layers following our original hooked_layer, will A or A_hooked be used in backpropagation?

I think the simplest way to understand what happens here is to know that autograd lives below torch.nn and is completely unaware of what torch.nn does.
So in this case, whichever Tensor you give to the rest of the net is the one that will get gradients (it does not matter whether it comes from a hook or not).
And since A_hooked depends on A, the gradients will flow back from A_hooked to A through whatever operations the hook applied. For the hook above (output * 2), the chain rule makes the gradient of A twice the gradient of A_hooked; it is neither copied nor halved.
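To make this concrete, here is a minimal sketch that checks the gradients numerically. It uses the built-in nn.Identity in place of the custom class, and the tensors A and B, the toy loss (a plain sum), and the rounding hook standing in for "quantization" are all assumptions made for illustration, not part of the original question.

```python
import torch
import torch.nn as nn

# Case 1: differentiable hook (output * 2), as in the question.
hooked_layer = nn.Identity()
hooked_layer.register_forward_hook(lambda module, inputs, output: output * 2)

A = torch.ones(3, requires_grad=True)
A_hooked = hooked_layer(A)   # the hook's return value replaces the module output
A_hooked.retain_grad()       # keep the gradient of this non-leaf tensor for inspection
A_hooked.sum().backward()

grad_A = A.grad                 # chain rule: d(loss)/dA = 2 * d(loss)/dA_hooked
grad_A_hooked = A_hooked.grad

# Case 2: a hook built from torch.round, used here as a stand-in for quantization.
# Rounding is piecewise constant, so autograd propagates a zero gradient through it.
round_layer = nn.Identity()
round_layer.register_forward_hook(lambda module, inputs, output: torch.round(output))

B = torch.tensor([0.2, 1.7], requires_grad=True)
round_layer(B).sum().backward()
grad_B = B.grad

print(grad_A, grad_A_hooked, grad_B)
```

In the second case nothing raises an error; the tensor upstream of the hook simply receives a zero gradient. If a useful gradient is needed through a quantization step, it has to be defined explicitly, for example with a custom torch.autograd.Function.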

Can somebody confirm that forward hooks support backpropagation? For example, I obtain activations (or weights) from particular layers using forward hooks. Then I compute the L1 norm of these activations (or weights) and add it to my main loss term, i.e. loss = loss + L1_loss. Now, when I call loss.backward(), the gradients will flow through the hooks and the activations (or weights) will be penalized accordingly, right?
Another option I came across was iterating over the model parameters (maybe with some if-else statements), which should work just fine, but it doesn't provide the flexibility I am looking for.
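One way to check this is a small experiment along the following lines. It is only a sketch under assumptions: the two-layer model, the `captured` dictionary, and the `save_activation` hook are hypothetical names chosen for illustration. The penalty is back-propagated on its own so that its contribution to the gradients can be seen in isolation; in practice it would be added to the main loss before calling backward().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

captured = {}

def save_activation(module, inputs, output):
    # Store the activation WITHOUT detaching it, so it stays in the autograd graph.
    # Returning None leaves the module's output unchanged.
    captured["act"] = output

handle = model[0].register_forward_hook(save_activation)

x = torch.randn(5, 4)
out = model(x)                          # the forward pass fills captured["act"]

l1_loss = captured["act"].abs().sum()   # L1 penalty on the first layer's activations
l1_loss.backward()                      # back-propagate the penalty alone

grad_first = model[0].weight.grad       # populated: the penalty depends on this layer
grad_last = model[2].weight.grad        # None: the penalty does not depend on this layer
handle.remove()
```

The key point is that the tensor saved by the hook must not be detached (or converted with .data or .numpy()); otherwise it is cut off from the graph and the penalty contributes no gradient.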