Mixing PyTorch autograd with a custom VJP for optimization

Hi, I want to train my NN on the first derivative with respect to the input.
I am computing an intermediate representation using a C++ binding, which is then fed to the NN,
i.e.

E = NN(θ,ζ(r))

Here θ are the parameters of the NN, r is the input, and ζ is the transform function (descriptor).
The loss function uses derivatives with respect to r,

F = - ∂E/∂ζ(r) * ∂ζ/∂r

Here the function ζ is implemented in C++ for performance and several other reasons, and ∂ζ/∂r is computed using Enzyme AD C++ VJP bindings.

Now, if I am correct, optimizing the loss with respect to θ would need the derivatives,

∂²E/∂ζ∂θ * ∂ζ/∂r
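
(Expanding the chain rule, assuming the loss L depends on θ only through F and that ζ(r) itself does not depend on θ:

∂L/∂θ = ∂L/∂F * ∂F/∂θ = - ∂L/∂F * ∂²E/∂θ∂ζ * ∂ζ/∂r

so only the first derivative ∂ζ/∂r of the descriptor enters the θ-gradient.)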

Currently I am computing it as

zeta = cpp_zeta_function(r) # non autograd tracked function
E = model(zeta)
dE_dzeta = torch.autograd.grad(..., create_graph=True, retain_graph=True)[0]
F = cpp_zeta_function.gradient(r, dE_dzeta) # non autograd tracked function
forces = torch.from_numpy(F)
forces_summed = scatter_add(forces, image, dim=0)
losses = loss(forces_summed, forces_true)

I do not get any leaf-tensor warning, so it seems to be working. Is this approach correct? Can autograd figure out this computation flow correctly?

If not, is there a way to tell the optimizer to actually use the ∂²E/∂ζ∂θ * ∂ζ/∂r gradients (i.e. compute cpp_zeta_function.gradient(r, ∂²E/∂ζ∂θ) manually)?

You can implement your own backprop with a custom autograd Function and bind your C++ functions there:
https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html
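
A minimal sketch of that idea, assuming cpp_zeta_function and its .gradient VJP binding take and return numpy arrays as in your snippet (adjust the conversions to your actual signatures; everything not taken from your post is placeholder naming):

import torch

class Zeta(torch.autograd.Function):
    """Wraps the C++ descriptor so autograd can track r -> zeta."""

    @staticmethod
    def forward(ctx, r):
        ctx.save_for_backward(r)
        zeta = cpp_zeta_function(r.detach().cpu().numpy())  # opaque C++ call
        return torch.as_tensor(zeta, dtype=r.dtype, device=r.device)

    @staticmethod
    def backward(ctx, grad_zeta):
        (r,) = ctx.saved_tensors
        # Enzyme VJP: grad_zeta^T * ∂ζ/∂r, computed in C++
        grad_r = cpp_zeta_function.gradient(r.detach().cpu().numpy(),
                                            grad_zeta.detach().cpu().numpy())
        return torch.as_tensor(grad_r, dtype=r.dtype, device=r.device)

zeta = Zeta.apply(r)   # r created with requires_grad_(True)
E = model(zeta)        # assuming E is a scalar energy
dE_dr = torch.autograd.grad(E, r, create_graph=True)[0]  # routed through Zeta.backward

Note that the backward here detaches grad_zeta before calling into C++, so autograd cannot differentiate through the backward itself.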

But will it not run into the same issues? I have only implemented the first derivative, and a custom autograd Function will also fail if autograd cannot figure out that it does not need to differentiate through this derivative for the second derivatives.