# How to attach a cost to a graph

I have a process that looks something like this:

```python
module_one = ...  # some NN
module_two = ...  # some NN

X2 = module_one(X1)
Y = module_two(X2)
```

I want to manually get the cost (gradient) for X2 resulting from the backward pass through module_two, and attach it as the cost for module_one.

So something like

```python
X2.requires_grad = True
Y = module_two(X2)

module_one.attach_cost(cost)
cost.backward()
module_one_optim.step()
```

I am unsure how to accomplish the `attach_cost` step.

I know that for this simple example I could just combine the forward pass for both modules into one… In practice, due to the structure of my code, this is rather hard, so I'd prefer to just manually attach the cost and do the backprop based on that.

Is this possible?

Hi,

Could you define what you mean by "cost" here? Are they gradients?

If you want to backprop each module independently, you can do:

```python
X2 = module_one(X1)
X2_in = X2.detach().requires_grad_()  # break the graph between the two modules

Y = module_two(X2_in)
loss = crit(Y, target)

loss.backward()          # backprop through module_two; fills X2_in.grad
X2.backward(X2_in.grad)  # backprop through module_one with that gradient
```

Yes, I meant gradients: `cost = X2.grad.cpu().detach()`

I hope my understanding here is not fundamentally broken and those are the same thing…?
Is there a difference between taking the `.grad` from the inputs of the first module (if I set those to require gradient) and doing it via `autograd.grad`?

But anyway, I'll try doing the backprop on the first module via `autograd.grad` and passing the `.grad` from the inputs of the second module (after backprop has been done on that the usual way).
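That plan can be sketched end to end. Everything concrete here (the `nn.Linear` stand-ins, the shapes, the MSE loss) is a hypothetical placeholder; the split-backprop pattern itself is the point:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two modules and the data
module_one = nn.Linear(4, 3)
module_two = nn.Linear(3, 2)
X1 = torch.randn(5, 4)
target = torch.randn(5, 2)

X2 = module_one(X1)
X2_in = X2.detach().requires_grad_()  # cut the graph between the modules

Y = module_two(X2_in)
loss = nn.functional.mse_loss(Y, target)
loss.backward()  # usual backprop through module_two; fills X2_in.grad

# Gradients of module_one's parameters, using X2_in.grad as the incoming
# gradient at X2 (the "cost" being attached):
grads = torch.autograd.grad(X2, tuple(module_one.parameters()),
                            grad_outputs=X2_in.grad)
for p, g in zip(module_one.parameters(), grads):
    p.grad = g  # hand the gradients to module_one's optimizer
```

By the chain rule, the parameter gradients obtained this way match what a single joint backward pass through both modules would produce.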

> Is there a difference between taking the `.grad` from the inputs of the first module (if I set those to require gradient) and doing it via `autograd.grad`?

The value will be the same, but it is trickier to do with `.backward()` because you need to make sure to reset the value properly, and saving a Tensor that requires gradient in `.grad` is not recommended.
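For what it's worth, the equality of the two routes is easy to check on a toy tensor (the quadratic below is just an arbitrary example):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()

# Route 1: autograd.grad returns the gradient without writing to x.grad
g1, = torch.autograd.grad(y, x, retain_graph=True)

# Route 2: .backward() accumulates the same value into x.grad
y.backward()
g2 = x.grad
```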

Alright, well I guess both of those are analogous for my purposes.
But… are the original inputs required to do `autograd.grad`? Is it not possible to just call it in a similar way to `backward`, where you just pass the loss and not the inputs that led to it?

The whole point of `autograd.grad` is that you specify what you want gradients for. If you want all of them, use `.backward()`. Otherwise, how would you know which tensors the gradients returned by `autograd.grad` correspond to?
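A tiny illustration of that difference (the names here are arbitrary): `autograd.grad` returns a tuple whose entries match the inputs you listed, while `.backward()` writes into the `.grad` field of every leaf tensor that requires grad:

```python
import torch

a = torch.randn(2, requires_grad=True)
b = torch.randn(2, requires_grad=True)
loss = (a * b).sum()

# autograd.grad: you name the tensors, and the returned tuple matches them
ga, gb = torch.autograd.grad(loss, (a, b), retain_graph=True)

# .backward(): gradients land in a.grad and b.grad instead of being returned
loss.backward()
```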