Given a vector x = [x_1, ..., x_n], I compute a function f such that
f(x) = [g(x_1, x_2), ..., g(x_{n-1}, x_n)].
I apply an elementwise loss L to f(x). This gives a loss vector L(f(x)), which I then sum so I can call backward() on it.
Now to my question. Take the first element of f(x), which is g(x_1, x_2). When I compute the gradient of this part of the vector w.r.t. x, g(x_1, x_2) contributes gradients to both x_1 and x_2.
However, for the part of the graph that constructs g(x_1, x_2), I would like x_2 to be treated as detached; in other words, d/dx_2 g(x_1, x_2) should be 0. The same goes for the next element of f(x): for g(x_2, x_3), its gradient w.r.t. x_3 should be 0. Currently, f(x) is constructed in one go through matrix operations. I realize I could construct f(x) iteratively and selectively detach the required variable beforehand, but that would require a lot of restructuring of my code base, which I would like to avoid. Is there a way to selectively detach variables from a dynamic graph so that no gradient is computed w.r.t. them?
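To make the setup concrete, here is a minimal sketch of what I currently do. Note that g(a, b) = a * b and the squared elementwise loss are just stand-ins for my real functions, and building f(x) via slicing is a stand-in for my actual matrix operations:

```python
import torch

# Stand-in for my real pairwise function g.
def g(a, b):
    return a * b

n = 5
x = torch.randn(n, requires_grad=True)

# f(x) = [g(x_1, x_2), ..., g(x_{n-1}, x_n)], built in one go.
fx = g(x[:-1], x[1:])

# Elementwise loss (squared values as a stand-in), summed for backward().
loss = fx.pow(2).sum()
loss.backward()

# x.grad now receives contributions through BOTH arguments of each g.
# What I want instead: the second argument of each g should act as if
# detached, i.e. d/dx_{i+1} g(x_i, x_{i+1}) = 0, without rebuilding
# f(x) in a loop.
```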
P.S. I realize that in my current example, x_n would never get updated. Let's disregard this.