How do forward hooks work with respect to gradients?

Summary: I want to do transfer learning using a pre-existing network as a backbone, and I want to access intermediate layers instead of only the final layer. The pre-trained networks I have seen don't offer a simple way to load the weights and expose those intermediate layers, so I believe I will need to access them using hooks.

I created a forward hook to access the output of an intermediate layer. My next question is, how does backpropagation work with respect to this?

In particular, let's say I use the output of that intermediate layer as input to a final layer. How does backpropagation work with the hook in place?

  1. Can I freeze the weights of the backbone and only update the weights of the layers I add?

  2. Can I update both the weights of my added layers and the backbone? How does the backward pass work with respect to a forward hook? Does the hook get added to the computation graph automatically?


  1. Yes, you can freeze parameters by setting their `.requires_grad` attribute to `False`.
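A minimal sketch of this freezing approach (the `backbone` and `head` modules here are made-up stand-ins for your actual networks):

```python
import torch
import torch.nn as nn

# Illustrative backbone and a new trainable head.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16))
head = nn.Linear(16, 2)

# Freeze every backbone parameter so Autograd skips their gradients.
for param in backbone.parameters():
    param.requires_grad = False

# Only hand the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=1e-2)

x = torch.randn(4, 8)
out = head(backbone(x))
out.sum().backward()

# Frozen backbone parameters never receive gradients; the head does.
print(all(p.grad is None for p in backbone.parameters()))   # True
print(all(p.grad is not None for p in head.parameters()))   # True
```

Passing only `head.parameters()` to the optimizer is optional once the backbone is frozen, but it avoids wasting optimizer state on parameters that will never change.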
  2. In the simple use case, the forward hook just stores or returns the intermediate activation. Autograd does not record any additional operation for the hook itself, and the captured tensor stays attached to the computation graph, so you can pass it to any further layers and call `backward()` as usual.
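To illustrate, here is a hedged sketch of capturing an intermediate activation with `register_forward_hook` and feeding it into a new head (module names are illustrative, not from your code):

```python
import torch
import torch.nn as nn

# Illustrative backbone; we want the output of its ReLU layer.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
new_head = nn.Linear(16, 2)

activation = {}

def hook(module, inputs, output):
    # Just store the output tensor; no extra op is recorded,
    # so the tensor remains attached to the computation graph.
    activation["feat"] = output

handle = backbone[1].register_forward_hook(hook)

x = torch.randn(4, 8)
_ = backbone(x)                      # forward pass fills activation["feat"]
out = new_head(activation["feat"])   # use the intermediate features
out.sum().backward()                 # gradients flow back through the backbone

# Since the backbone was not frozen here, it receives gradients too.
print(backbone[0].weight.grad is not None)  # True
print(new_head.weight.grad is not None)     # True

handle.remove()  # detach the hook when you no longer need it
```

Combine this with the `requires_grad = False` approach from point 1 if you only want to train the added head.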