Operations on gradient before summing in autograd

Hello Everyone,

From the PyTorch documentation, I understand that torch.autograd.grad(Y, X, G) computes the following: the sum of the partial derivatives of ‘Y’ wrt ‘X’, each multiplied by G.

I would like to do the following instead:
sum of (partial derivative of ‘Y’ wrt ‘X’)^2 multiplied by G.

Is this possible in PyTorch? Do we have to override torch.autograd.grad, or is there another way?
A crude way would be to call torch.autograd.grad(Y[i,j], X, grad_outputs=torch.ones_like(Y[i,j])) for each index, then square each entry of the result and multiply by G. But that is too computationally expensive.


Actually there is no sum there; it’s a matrix product: G * J, where J is the Jacobian of the function that produced Y from X.
Unfortunately, this is a limitation of automatic differentiation itself, not of PyTorch, so you won’t be able to modify the Jacobian before it is multiplied by G.
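To make the G * J formulation concrete, here is a minimal sketch (with a made-up two-element example, not from the thread) showing that for 1D X and Y, torch.autograd.grad(Y, X, G) returns the vector-Jacobian product G @ J:

```python
import torch

# Tiny example: Y = [x0*x1, x0**2], so J = [[x1, x0], [2*x0, 0]].
X = torch.tensor([1.0, 2.0], requires_grad=True)
Y = torch.stack([X[0] * X[1], X[0] ** 2])
G = torch.tensor([1.0, 3.0])

# Autograd returns the vector-Jacobian product G @ J in one pass.
vjp, = torch.autograd.grad(Y, X, grad_outputs=G)

# Analytic Jacobian at X = [1, 2]:
J = torch.tensor([[2.0, 1.0], [2.0, 0.0]])
print(torch.allclose(vjp, G @ J))  # True
```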

Ok. In that case, can we get the Jacobian J by using G = I, the identity matrix, for a 2D output Y?

I’m afraid that’s not possible. The notation with a matrix multiply and a 2D Jacobian only holds for 1D inputs and 1D outputs.
AD only performs vector-Jacobian products; you need a special function if you want the full Jacobian.
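One way to build the full Jacobian out of vector-Jacobian products is to call the VJP once per output with a one-hot grad_outputs vector. A sketch for 1D Y and X (jacobian_rows is a hypothetical helper, not part of PyTorch):

```python
import torch

def jacobian_rows(Y, X):
    # One VJP per output: e_i @ J recovers row i of the Jacobian.
    rows = []
    for i in range(Y.numel()):
        e_i = torch.zeros_like(Y)
        e_i[i] = 1.0
        row, = torch.autograd.grad(Y, X, grad_outputs=e_i, retain_graph=True)
        rows.append(row)
    return torch.stack(rows)

X = torch.tensor([1.0, 2.0], requires_grad=True)
Y = torch.stack([X[0] * X[1], X[0] ** 2])
J = jacobian_rows(Y, X)
# Analytic Jacobian at X = [1, 2]: [[2, 1], [2, 0]]
```

This costs one backward pass per output, which is why it is only practical for small Y.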

If you are using the latest version of PyTorch, you can find such a function here.
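The post does not name the function, but recent PyTorch releases ship torch.autograd.functional.jacobian, which may be what the link refers to (an assumption on my part). A quick sketch:

```python
import torch

# Assuming the linked helper resembles torch.autograd.functional.jacobian
# (available in recent PyTorch releases): it takes a function and an input
# and returns the full Jacobian J[i, j] = d f(x)[i] / d x[j].
def f(x):
    return torch.stack([x[0] * x[1], x[0] ** 2])

x = torch.tensor([1.0, 2.0])
J = torch.autograd.functional.jacobian(f, x)
# At x = [1, 2] the analytic Jacobian is [[2, 1], [2, 0]].
```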

As a workaround, I compute the gradient for each index of Y. This is computationally intensive, but it does the job for now.

import numpy as np
import torch

def squared_grad(Y, X, G):
    # Accumulate sum over indices of (dY[index]/dX)**2 * G[index].
    result = torch.zeros_like(X)
    for index in np.ndindex(*Y.shape):
        # Y[index] is a scalar, so grad_outputs can be omitted.
        grad_x, = torch.autograd.grad(Y[index], X, retain_graph=True)
        result += grad_x ** 2 * G[index]
    return result
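For what it’s worth, here is a CPU-only sanity check of the same per-index loop on a tiny made-up example (the tensors and analytic values are my own, not from the thread):

```python
import torch

# Y = [x0*x1, x0**2] at X = [1, 2]; per-row gradients are [2, 1] and [2, 0].
X = torch.tensor([1.0, 2.0], requires_grad=True)
Y = torch.stack([X[0] * X[1], X[0] ** 2])
G = torch.tensor([1.0, 3.0])

result = torch.zeros_like(X)
for i in range(Y.numel()):
    g_i, = torch.autograd.grad(Y[i], X, retain_graph=True)
    result += g_i ** 2 * G[i]

# Expected: 1*[4, 1] + 3*[4, 0] = [16, 1]
print(result)
```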

Any comments?