I’m trying to get the gradient with respect to an input without accumulating gradients for the model’s parameters.
More specifically, an input A goes into a model M and an output P comes out.
From this output P and the input A, I build another input C (for example, by elementwise multiplication or something similar), feed it into the same model M, and get a final output Q.
The loss function is then formulated from this output Q and the given label.
To make things clear, I will do
input A -> model M -> output P (process #1)
C (a result of doing something with the output P and the input A) -> model M -> output Q (process #2)
and then I will do
loss = criterion(output Q, label)
but the point is
- the gradient of the loss w.r.t. model M’s parameters in process #2
should not be accumulated into model.something.weight.grad and model.something.bias.grad
- but the gradient of the loss w.r.t. C in process #2 should be calculated and saved in C.grad.
Backpropagation flows through C and goes to process #1.
Here in process #1, on the other hand,
- the gradient of the loss w.r.t. model M’s parameters should be calculated and saved.
As far as I know, .detach() cuts the graph at that point, so nothing upstream of it (everything closer to the input, including the parameters used in process #1) receives a gradient — which means detaching C would also block the backpropagation into process #1.
Is there any way to do this?
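One possible approach (a sketch, assuming PyTorch ≥ 2.0 so that `torch.func.functional_call` is available): run process #2 with *detached copies* of the parameters, so the graph in that pass depends on C but not on the parameters, and call `retain_grad()` on C since it is a non-leaf tensor. The model, shapes, and loss below are made-up placeholders for illustration.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

model = nn.Linear(4, 4)          # stand-in for model M
A = torch.randn(2, 4)            # stand-in for input A
label = torch.randn(2, 4)        # stand-in for the given label

# process #1: normal forward, graph records dependence on the parameters
P = model(A)

# build C from P and A (here: elementwise multiplication, as an example)
C = P * A
C.retain_grad()                  # C is non-leaf; keep dloss/dC in C.grad

# process #2: forward with detached parameter copies, so this pass
# contributes nothing to model.weight.grad / model.bias.grad,
# while the gradient still flows into C
detached_params = {name: p.detach() for name, p in model.named_parameters()}
Q = functional_call(model, detached_params, (C,))

loss = nn.functional.mse_loss(Q, label)
loss.backward()

# C.grad now holds dloss/dC from process #2, and the parameter grads
# contain only the contribution that flowed through C back into process #1
print(C.grad is not None, model.weight.grad is not None)
```

Because C was built from P (which is attached to the parameters), the backward pass still propagates through C into process #1 and fills the parameter `.grad` fields from that path only. On older PyTorch versions, `torch.nn.utils.stateless.functional_call` plays the same role, though I’d double-check its exact signature.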