Hi, I want to train my NN on the first derivative with respect to the input.
I am computing an intermediate representation using a C++ binding, which is then fed to the NN, i.e.
E = NN(θ,ζ(r))
Here θ are the parameters of the NN, r is the input, and ζ is the transform function (the descriptor).
The loss function uses derivatives with respect to r:
F = - ∂E/∂ζ(r) * ∂ζ/∂r
Here the function ζ is implemented in C++ (for performance, among other reasons), and ∂ζ/∂r is computed using Enzyme AD's C++ vjp bindings.
Now, if I am correct, the optimization needs the derivatives
∂^2E/∂ζ∂θ * ∂ζ/∂r
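(A minimal self-contained check of that claim, with a stand-in linear model in place of the real NN and random values in place of ζ(r): `create_graph=True` keeps ∂E/∂ζ in the graph, so a loss built from it backpropagates into θ.)

```python
import torch

model = torch.nn.Linear(3, 1)                  # stand-in for the NN
zeta = torch.randn(5, 3, requires_grad=True)   # stand-in for zeta(r)

E = model(zeta).sum()
# create_graph=True keeps dE/dzeta differentiable w.r.t. the parameters
dE_dzeta, = torch.autograd.grad(E, zeta, create_graph=True)

# An energy + "force"-style loss built from dE_dzeta backpropagates into theta,
# i.e. backward() reaches the mixed derivative d^2E/(dzeta dtheta).
loss = E + (dE_dzeta ** 2).sum()
loss.backward()
```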
Currently I am computing it as

zeta = cpp_zeta_function(r)        # C++ call, not tracked by autograd
zeta.requires_grad_(True)          # so autograd.grad can differentiate w.r.t. zeta
E = model(zeta)
dE_dzeta = torch.autograd.grad(E, zeta, create_graph=True, retain_graph=True)[0]
F = cpp_zeta_function.gradient(r, dE_dzeta)  # Enzyme vjp, not tracked by autograd
forces = torch.from_numpy(F)
forces_summed = scatter_add(forces, image, dim=0)
losses = loss(forces_summed, forces_true)
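(A quick way to check whether the graph actually survives the NumPy round-trip is to inspect grad_fn: a tensor created by torch.from_numpy is a graph leaf, so backward() through it can never reach r or θ. A minimal illustration, where detach() stands in for the untracked C++ call:)

```python
import torch

x = torch.randn(4, requires_grad=True)
y_np = (x * 2).detach().numpy()   # detach() mimics the untracked C++ round-trip
z = torch.from_numpy(y_np)

# z is a graph leaf: it has no grad_fn, so the chain back to x is gone
print(z.grad_fn, z.requires_grad)
```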
I do not get any leaf-tensor warning, so it seems to be working. Is this approach correct? Can autograd track this computation flow correctly?
If not, is there a way to tell the optimizer to actually use the ∂^2E/∂ζ∂θ * ∂ζ/∂r gradients (i.e., compute cpp_zeta_function.gradient(r, ∂^2E/∂ζ∂θ) manually)?
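One possible route (a sketch, not a tested solution): wrap the C++ descriptor and its Enzyme vjp in a torch.autograd.Function, so autograd itself routes ∂E/∂ζ through cpp_zeta_function.gradient. Below, cpp_zeta_function is replaced by a hypothetical element-wise sin descriptor in NumPy so the sketch runs; the real binding would slot into the same two calls.

```python
import numpy as np
import torch

class cpp_zeta_function:
    """Hypothetical NumPy stand-in for the C++ binding from the post."""
    @staticmethod
    def forward_np(r):
        return np.sin(r)                 # stand-in descriptor zeta(r)
    @staticmethod
    def gradient(r, v):
        return v * np.cos(r)             # stand-in vjp: v^T . dzeta/dr

class Zeta(torch.autograd.Function):
    @staticmethod
    def forward(ctx, r):
        ctx.save_for_backward(r)
        zeta_np = cpp_zeta_function.forward_np(r.detach().cpu().numpy())
        return torch.from_numpy(zeta_np)

    @staticmethod
    def backward(ctx, grad_out):
        # grad_out is dE/dzeta; route it through the Enzyme vjp
        r, = ctx.saved_tensors
        vjp_np = cpp_zeta_function.gradient(r.detach().cpu().numpy(),
                                            grad_out.detach().cpu().numpy())
        return torch.from_numpy(vjp_np)

r = torch.linspace(0.0, 1.0, 5, requires_grad=True)
E = Zeta.apply(r).sum()
dE_dr, = torch.autograd.grad(E, r)       # autograd now goes through the vjp
```

Caveat: for the force loss, the vjp application itself would also need to be differentiable with respect to ∂E/∂ζ (that is where the ∂^2E/∂ζ∂θ term enters). Since the map is linear in ∂E/∂ζ and independent of θ, one option is to wrap cpp_zeta_function.gradient in a second autograd.Function whose backward applies the transpose map (the jvp with ∂ζ/∂r).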