Calculating the gradient of the input without the gradients of the model parameters

Hello,
I’m trying to get the gradient of the input without calculating the gradients of the model parameters.
More specifically, an input A goes into a model M, which produces an output P.
From this output P and the input A, I build another input C (for example, by elementwise multiplication), which goes into the same model M to give the final output Q.
The loss is then computed from this output Q and the given label.

To make things clear, I will do
input A -> model M -> output P (process #1)
C (the result of combining output P and input A) -> model M -> output Q (process #2)
and then I will do

loss = criterion(output Q, label)
loss.backward()

but the point is

  1. the gradient of the loss wrt model M’s parameters in process #2
    should not be saved in model.something.weight.grad and model.something.bias.grad
  2. but the gradient of the loss wrt C in process #2 should be calculated and saved in C.grad.

Backpropagation then flows through C into process #1.
In process #1, on the other hand,

  1. the gradient of the loss wrt model M’s parameters should be calculated and saved.

As far as I know, .detach() stops gradients from being computed for everything upstream of it (all the parameters that are closer to the input).
Is there any way to do this?
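
A toy sketch of that .detach() behaviour, with made-up scalar values (w and x are hypothetical stand-ins for a parameter and an input, not part of the real model):

```python
import torch

# Hypothetical scalar "parameter" and "input", chosen only for illustration.
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)

h = w * x             # h = 6 and depends on w
out = h.detach() * w  # the path back through h is cut; only the direct factor w remains
out.backward()

# Without detach, out = x * w**2 and d(out)/dw = 2 * w * x = 12.
# With detach, h is treated as the constant 6, so d(out)/dw = 6.
w_grad = w.grad.item()
```

So detaching alone would also block the gradient from reaching process #1, which is not what I want.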

Hi,

The problem is that M is the same during process 1 and 2. You won’t be able to get 2 different behaviors here.

I would suggest doing it in two steps (adding the corresponding optimizer zero_grad and step, of course):

import torch

# Process 1 as usual:
P = M(A)
C = some_func(P, A)  # e.g. P * A

# Now break the graph and get a new leaf tensor with the same values as C
C_2 = C.detach().requires_grad_()

# Process 2
Q = M(C_2)
loss = criterion(Q, label)

# Compute grads for C_2 only (this does not touch the .grad fields in M)
C_2_grad = torch.autograd.grad(loss, C_2)[0]

# Now do a regular backward through process 1 to populate the .grad fields in M
C.backward(C_2_grad)
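
If it helps, here is a minimal self-contained sketch of the same steps. It assumes a tiny nn.Linear for M, random tensors for A and label, MSE for the criterion, and an elementwise product for some_func; all of these are placeholders, not your actual setup. The two boolean flags are only there to check when the .grad fields get filled.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real model, data and loss.
M = nn.Linear(4, 4)
criterion = nn.MSELoss()
A = torch.randn(8, 4)
label = torch.randn(8, 4)

# Process 1: build C from P and A (elementwise product, as in the question).
P = M(A)
C = P * A

# Cut the graph: C_2 has the same values as C but is a new leaf tensor.
C_2 = C.detach().requires_grad_()

# Process 2: same model, fed the detached leaf.
Q = M(C_2)
loss = criterion(Q, label)

# Gradient of the loss w.r.t. C_2 only; M's .grad fields are left untouched.
(C_2_grad,) = torch.autograd.grad(loss, C_2)
grads_untouched = all(p.grad is None for p in M.parameters())

# Push that gradient back through process 1; this populates M's .grad fields.
C.backward(C_2_grad)
grads_populated = all(p.grad is not None for p in M.parameters())
```

Both flags should come out True: after autograd.grad the parameters still have no .grad, and after C.backward they hold only the contribution from process 1.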

Thanks a lot. It works!!