Given a vector x = [x_1, ..., x_n], I compute a function f such that
f(x) = [g(x_1, x_2), ..., g(x_{n-1}, x_n)].
I apply an elementwise loss L to f(x). This gives a loss vector L(f(x)), which I then sum so I can call backward() on it.
Now to my question. Take the first element of f(x), which is g(x_1, x_2). When I compute the gradient of this part of the vector w.r.t. x, g(x_1, x_2) contributes gradients to both x_1 and x_2.
However, for the part of the graph that constructs g(x_1, x_2), I would like x_2 to be treated as detached; in other words, d/dx_2 g(x_1, x_2) should be 0. The same goes for the next element of f(x): for g(x_2, x_3), its gradient w.r.t. x_3 should be 0. Currently, f(x) is constructed in one go through matrix operations. I realize I could construct f(x) iteratively and selectively detach the required variable beforehand, but that would require a lot of restructuring of my code base, which I would like to avoid. Is there a way to selectively detach variables from a dynamic graph so that no gradient is computed w.r.t. them?
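To make the setup concrete, here is a minimal sketch of what I currently do. Note that g(a, b) = a * b and the squared elementwise loss are just stand-ins for my real functions, and building f(x) via slicing is a stand-in for my actual matrix operations:

```python
import torch

# Stand-in for my real pairwise function g.
def g(a, b):
    return a * b

n = 5
x = torch.randn(n, requires_grad=True)

# f(x) = [g(x_1, x_2), ..., g(x_{n-1}, x_n)], built in one go.
fx = g(x[:-1], x[1:])

# Elementwise loss (squared values as a stand-in), summed for backward().
loss = fx.pow(2).sum()
loss.backward()

# x.grad now receives contributions through BOTH arguments of each g.
# What I want instead: the second argument of each g should act as if
# detached, i.e. d/dx_{i+1} g(x_i, x_{i+1}) = 0, without rebuilding
# f(x) in a loop.
```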
P.S. I realize that in my current example, x_n would never get updated. Let's disregard this.