How to attach a cost to a graph

I have a process that looks something like this:

module_one = ...  # some NN
module_two = ...  # some NN

X2 = module_one(X1)
Y = module_two(X2)

I want to manually get the gradient for X2 that results from the backward pass through module_two and pass it on as the cost for module_one

So something like

X2.requires_grad = True
Y = module_two(X2)
# ... compute a loss on Y and backprop through module_two here ...
cost = X2.grad.cpu().detach()

module_one.attach_cost(cost)
cost.backward()
module_one_optim.step()

I am unsure how to accomplish the attach_cost step.

I know that for this simple example I could just combine the forward passes of both modules into one. In practice, due to the structure of my code, this is rather hard, so I'd prefer to manually attach the cost and do the backprop based on that.

Is this possible?

Hi,

Could you define what you mean by “cost” here? Are they gradients?

If you want to backprop each module independently, you can do:

from torch import autograd

X2_out = module_one(X1)

X2_in = X2_out.detach().requires_grad_()  # cut the graph: X2_in is a leaf that requires grad
Y = module_two(X2_in)
loss = crit(Y, target)

grad_X2 = autograd.grad(loss, X2_in)          # backprop module 2, gradient w.r.t. X2_in
grad_X1 = autograd.grad(X2_out, X1, grad_X2)  # backprop module 1, seeded with that gradient (X1 must require grad)
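For the two-optimizer setup from the original question, here is a minimal end-to-end sketch of this idea (the layer sizes, MSE criterion and SGD optimizers are just assumptions to make it runnable). Since the optimizer steps need gradients in the parameters' .grad fields, this version seeds .backward() on each half instead of using autograd.grad, which only returns gradients for the tensors you explicitly ask for:

import torch
from torch import nn

# hypothetical stand-ins for the real modules, optimizers and data
module_one = nn.Linear(10, 20)
module_two = nn.Linear(20, 5)
module_one_optim = torch.optim.SGD(module_one.parameters(), lr=0.1)
module_two_optim = torch.optim.SGD(module_two.parameters(), lr=0.1)
crit = nn.MSELoss()

X1 = torch.randn(8, 10)
target = torch.randn(8, 5)

X2_out = module_one(X1)
X2_in = X2_out.detach().requires_grad_()  # cut the graph between the two modules
Y = module_two(X2_in)
loss = crit(Y, target)

loss.backward()              # backprop module_two; fills X2_in.grad and module_two's parameter grads
X2_out.backward(X2_in.grad)  # backprop module_one, seeded with the gradient at the cut
module_two_optim.step()
module_one_optim.step()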


Yes, I meant gradients: cost = X2.grad.cpu().detach()

I hope my understanding here is not fundamentally broken and those are the same thing?
Is there a difference between taking the .grad from the inputs of the first module (if I set those to require gradient) and doing it via autograd.grad?

But anyway, I'll try doing the backprop on the first module via autograd.grad and passing the .grad from the inputs of the second module (after backprop has been done on it the usual way).

Is there a difference between taking the .grad from the inputs of the first module (if I set those to require gradient) and doing it via autograd.grad?

The value will be the same, but it is trickier to do with .backward() because you need to make sure to reset the value properly, and saving a Tensor that requires gradient into .grad is not recommended.
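A small illustration of the "reset the value properly" point, with toy tensors (not the thread's modules): .grad accumulates across .backward() calls and has to be cleared manually, while autograd.grad simply returns a fresh value each time.

import torch
from torch import autograd

x = torch.ones(3, requires_grad=True)
loss = (x * 2).sum()

for step in range(2):
    g, = autograd.grad(loss, x, retain_graph=True)  # always tensor([2., 2., 2.])
    loss.backward(retain_graph=True)                # x.grad accumulates: 2s, then 4s
    print(g, x.grad)

x.grad = None  # with .backward() you have to reset manually (or call optimizer.zero_grad())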

Alright, well I guess both of those are analogous for my purposes.
But are the original inputs required to do autograd.grad? Is it not possible to just call it in a similar way to backward, where you only pass the loss and not the inputs that led to it?

The whole point of autograd.grad is that you specify what you want gradients for. If you want them for everything, use .backward(). Otherwise, how would you know which tensors the gradients returned by autograd.grad correspond to?
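In other words (toy tensors again), autograd.grad returns a tuple that follows the order of the inputs you list, while .backward() takes no list of inputs and just writes into the .grad of every leaf that requires grad:

import torch
from torch import autograd

a = torch.randn(4, requires_grad=True)
b = torch.randn(4, requires_grad=True)
loss = (a * b).sum()

grad_a, grad_b = autograd.grad(loss, (a, b), retain_graph=True)  # you choose the inputs, order matches

loss.backward()  # no inputs to choose; fills a.grad and b.grad (and any other leaves)
print(torch.allclose(grad_a, a.grad), torch.allclose(grad_b, b.grad))  # True True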