Gradient computation when using forward hooks

Suppose I have a custom nn.Module:

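import torch.nn as nn

class Identity(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x

hooked_layer = Identity()
hookfn = lambda module, input, output: output * 2
# A forward hook that returns a value replaces the module's output.
hooked_layer.register_forward_hook(hookfn)
### hookfn can in principle be a complex function,
### even a non-differentiable one such as quantization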

Now consider an activation A that passes through hooked_layer to become A_hooked. I understand that A_hooked is what is used when computing the loss.
I want to understand how backpropagation will happen.
1.) How will the gradient of A be computed? Will the gradient of A_hooked be copied to A, or will the gradient of A equal half of the gradient of A_hooked?
2.) If the gradient of A_hooked is not simply copied to A, what happens when I use a non-differentiable hook such as quantization?
3.) Lastly, in the layers following the original hooked_layer, will A or A_hooked be used in backpropagation?


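I think the simplest way to understand this is to know that autograd lives below torch.nn and is completely unaware of what torch.nn does, hooks included.
So whatever Tensor you give to the rest of the net is the one that will get gradients (it does not matter whether it comes from a hook or not). Here that is A_hooked, so the following layers use A_hooked in both the forward and the backward pass.
And since A_hooked depends on A, the gradients will flow back from A_hooked to A through whatever operations the hook performed. For output*2, the chain rule gives A a gradient that is twice the gradient of A_hooked; it is neither copied nor halved.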
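To make this concrete, here is a minimal sketch you can run to check both cases yourself. The helper name `run`, the input values, and the squared-sum loss are only assumptions for illustration, and `torch.round` stands in for a quantization hook. It compares the gradient that reaches A with the gradient of A_hooked.

```python
import torch
import torch.nn as nn


class Identity(nn.Module):
    def forward(self, x):
        return x


def run(hook):
    """Pass A through a hooked Identity layer; return (A.grad, A_hooked.grad)."""
    layer = Identity()
    layer.register_forward_hook(hook)

    A = torch.tensor([0.3, 1.2, 2.7], requires_grad=True)
    A_hooked = layer(A)        # the hook's return value replaces the output
    A_hooked.retain_grad()     # keep the gradient of this non-leaf tensor

    loss = (A_hooked ** 2).sum()   # illustrative loss, built from A_hooked only
    loss.backward()
    return A.grad, A_hooked.grad


# Differentiable hook: autograd records the multiplication done inside the hook.
grad_A, grad_A_hooked = run(lambda module, inp, out: out * 2)
print("scale hook:", grad_A, grad_A_hooked)
print("grad_A == 2 * grad_A_hooked:", torch.allclose(grad_A, 2 * grad_A_hooked))

# Quantization-like hook: rounding is piecewise constant, so autograd
# propagates a zero gradient back to A even though A_hooked has a gradient.
grad_A_q, grad_A_hooked_q = run(lambda module, inp, out: torch.round(out))
print("round hook:", grad_A_q, grad_A_hooked_q)
```

With the scaling hook the gradient of A is the gradient of A_hooked multiplied by 2. With the rounding hook nothing errors out, but the gradient that reaches A is all zeros, so anything upstream of the hook would not learn from this loss.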
