In PyTorch, suppose I have two intermediate nodes o_1 = f(a, ...) and o_2 = g(a, ...). Both o_1 and o_2 contribute to the final loss. I want to get the gradient flowing to a from o_1 only, and not from o_2, i.e. the part of the gradient of the loss that flows to a through the node o_1.
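To make the setup concrete, here is a toy version of it (the actual f and g in my graph are more complicated; the operations below are just placeholders):

```python
import torch

a = torch.randn(3, requires_grad=True)

o_1 = a * 2        # stands in for o_1 = f(a, ...)
o_2 = a ** 2       # stands in for o_2 = g(a, ...)

loss = (o_1 + o_2).sum()
loss.backward()

# a.grad is d(loss)/da accumulated over BOTH paths; what I want is only
# the o_1 part, i.e. d(loss)/d(o_1) * d(o_1)/d(a)  (here 2, without the 2*a).
```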
To reiterate what I want for more clarity:
When computing gradients, a single node A can have multiple paths to the final gradient target (the loss). The upstream gradients reach A and are accumulated by summation in A to form its final gradient value. I want to hook into this accumulation process and store each upstream gradient as it arrives, i.e. store the upstream gradient from each of the edges out of A separately, instead of only having their sum.
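For the toy graph above, the per-edge values I want stored can be computed by hand by chaining torch.autograd.grad calls, edge by edge, but doing this for every edge of a large graph is exactly what I am trying to avoid:

```python
import torch

a = torch.randn(3, requires_grad=True)
o_1 = a * 2          # f(a, ...)
o_2 = a ** 2         # g(a, ...)
loss = (o_1 + o_2).sum()

# upstream gradient arriving at a along the o_1 edge: (dL/do_1) * (do_1/da)
g1 = torch.autograd.grad(loss, o_1, retain_graph=True)[0]
via_o1 = torch.autograd.grad(o_1, a, grad_outputs=g1, retain_graph=True)[0]

# upstream gradient arriving at a along the o_2 edge: (dL/do_2) * (do_2/da)
g2 = torch.autograd.grad(loss, o_2, retain_graph=True)[0]
via_o2 = torch.autograd.grad(o_2, a, grad_outputs=g2, retain_graph=True)[0]

loss.backward()
print(torch.allclose(a.grad, via_o1 + via_o2))  # True: a.grad is their sum
```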
The only workaround I can think of is to detach the gradient of o_2, but I actually need the gradients of a lot of these edges in the computational graph, and detaching o_2 will make the gradients along the downstream edges incorrect.
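For example, if g also takes another tensor b, detaching o_2 silently drops every gradient that should flow back through that node (again a toy example):

```python
import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

o_1 = a * 2                 # f(a, ...)
o_2 = (a * b).detach()      # g(a, b, ...) with its gradient detached

loss = (o_1 + o_2).sum()
loss.backward()

print(a.grad)   # tensor([2., 2., 2.]): only the o_1 part, which is what I
                # wanted for a, but the o_2 contribution (b) is simply gone
print(b.grad)   # None: b receives no gradient at all, which is incorrect
```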