How to calculate the gradient of the previous layer when the gradient of the next layer is given?

That's true, you can use the chain rule, but remember you're applying the chain rule to tensors rather than just scalars, so it's not as simple as multiplying by a scalar; the gradient is propagated via a matrix product.

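To make that concrete, here's a minimal sketch (the layer sizes and the quadratic loss are arbitrary) showing that the gradient passed back to the previous layer is the output gradient matrix-multiplied with the weight, matching what autograd computes:

```python
import torch

# Toy Linear layer; for y = x @ W.T + b the gradient w.r.t. the input
# is the output gradient matrix-multiplied with W (a vector-Jacobian product).
x = torch.randn(4, 10, requires_grad=True)   # batch of 4 inputs
fc = torch.nn.Linear(10, 3)
y = fc(x)                                    # shape (4, 3)
loss = y.pow(2).sum()

# dL/dy, shape (4, 3)
grad_out = torch.autograd.grad(loss, y, retain_graph=True)[0]

# Chain rule in tensor form: dL/dx = dL/dy @ W, shape (4, 10)
grad_in_manual = grad_out @ fc.weight

# Autograd produces the same thing
grad_in_auto = torch.autograd.grad(loss, x)[0]
print(torch.allclose(grad_in_manual, grad_in_auto))   # True
```
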
Won’t this give you the gradient of the loss w.r.t the parameters of your network?

But you want to change the gradient of the loss w.r.t the params (of fc) and then determine how that changes the gradients of the loss w.r.t conv2d?

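(For context, here's what the standard behaviour looks like, with a toy model whose layer sizes are arbitrary: a single backward() call populates .grad for every parameter, conv2d and fc alike, already summed over the batch.)

```python
import torch
import torch.nn as nn

# Toy model; sizes are chosen only so the shapes line up.
model = nn.Sequential(
    nn.Conv2d(1, 2, 3),        # "conv2d"
    nn.Flatten(),
    nn.Linear(2 * 6 * 6, 5),   # "fc"
)
x = torch.randn(4, 1, 8, 8)    # batch of 4
model(x).sum().backward()

for name, p in model.named_parameters():
    # Every parameter (conv and fc) now holds dLoss/dParam,
    # with the batch dimension already summed out.
    print(name, tuple(p.grad.shape))
```
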
One thing you could look into is per-sample gradients via hooks, because you'll need to define a formula that takes grad_output and multiplies it by a manual expression to produce the new gradient. It won't be as simple as an element-wise multiplication, since there's also a batch dimension, which autograd sums over when it computes the parameter gradients.

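As a hedged sketch of the hook approach (the model below and the 0.5 scaling are placeholders for whatever manual expression you actually want): a full backward hook on the fc layer receives grad_output and can return a replacement for the gradient flowing back into the conv features, and that replacement is what conv2d's gradients are then computed from.

```python
import torch
import torch.nn as nn

# Placeholder model; layer sizes are arbitrary.
model = nn.Sequential(
    nn.Conv2d(1, 2, 3),        # "conv2d"
    nn.Flatten(),
    nn.Linear(2 * 6 * 6, 5),   # "fc"
)
fc = model[2]

def rewrite_grad_input(module, grad_input, grad_output):
    # grad_output[0]: dLoss/d(fc output), shape (batch, 5) (batch dim still present)
    # grad_input[0]:  dLoss/d(fc input),  shape (batch, 72)
    # Return a replacement for grad_input; everything upstream of fc
    # (Flatten, then the conv) backprops through this tensor instead.
    new_grad_in = 0.5 * (grad_output[0] @ module.weight)   # usual chain rule, rescaled
    return (new_grad_in,)

handle = fc.register_full_backward_hook(rewrite_grad_input)

x = torch.randn(4, 1, 8, 8)
model(x).sum().backward()
# The conv weights now carry gradients computed from the rewritten flow;
# fc's own parameter gradients still come from the original grad_output.
print(model[0].weight.grad.norm())
handle.remove()
```

Note that grad_output keeps its batch dimension, so the expression you plug in has to handle it; the per-parameter .grad you read off afterwards is the batch-summed result.
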
An example of this explained in far better detail can be found here; it walks through backprop well and gives a clear example of how you can change a gradient and then define the gradients of any upstream layers.