This is a problem I run into frequently.
Let's say I want to find a network G (with an output of size k and m parameters) that minimizes Loss = F(G(z)), where F(x) is a scalar function. I know that the gradient of the loss with respect to the parameters of G equals (Grad_x F)(Jacobian G), where Grad_x F is a vector of size k evaluated at x = G(z), and Jacobian G is the k×m Jacobian of G's output with respect to its parameters, so the product is a vector of size m. Now, if I know both F and G, I can just compute the loss and call backward().
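As a quick sanity check of the chain-rule identity above, here is a minimal sketch assuming PyTorch. A small `nn.Linear` stands in for G and F(x) = sum(x²) is an illustrative choice; neither comes from the original setup. It builds the k×m Jacobian one row at a time with `torch.autograd.grad` and compares (Grad_x F) @ J against what `loss.backward()` writes into `.grad`.

```python
import torch

torch.manual_seed(0)

d, k = 4, 3                      # latent size d, output size k (illustrative)
G = torch.nn.Linear(d, k)        # stand-in network G with m = d*k + k parameters
z = torch.randn(d)
params = list(G.parameters())

def F(x):
    # Illustrative scalar function of the network output
    return (x ** 2).sum()

# Reference: ordinary backprop through F(G(z))
G.zero_grad()
F(G(z)).backward()
grad_ref = torch.cat([p.grad.flatten() for p in params])   # size m

# Chain rule by hand: (Grad_x F) @ (Jacobian of G w.r.t. its parameters)
x = G(z)
grad_F = 2 * x.detach()          # Grad_x F for F(x) = sum(x^2), size k
rows = []
for i in range(k):
    # Row i of the Jacobian: gradient of x[i] w.r.t. every parameter
    g = torch.autograd.grad(x[i], params, retain_graph=True)
    rows.append(torch.cat([t.flatten() for t in g]))
J = torch.stack(rows)            # shape (k, m)
grad_manual = grad_F @ J         # shape (m,)

print(torch.allclose(grad_ref, grad_manual))
```

Materializing J like this is only for illustration; it costs k backward passes, which is exactly what a direct vector-Jacobian product avoids.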
But what if I already have Grad_x F and don't have access to F directly? Is there an easy way to minimize the loss? Basically, I want to know how to compute the vector-Jacobian product (VJP) directly and have it accumulated into .grad, so that I can then use any optimizer to minimize F(G(z)).
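One way to do this in PyTorch, sketched under the assumption that G is an ordinary `nn.Module`: call `backward` on the network output and pass the externally supplied Grad_x F as the `gradient` argument. Autograd then backpropagates that vector through G and adds the VJP to each parameter's `.grad`, where any optimizer can pick it up. `external_grad` is a hypothetical stand-in for whatever provides Grad_x F; here it secretly corresponds to F(x) = sum(x²) only so that progress can be checked.

```python
import torch

torch.manual_seed(0)

d, k = 4, 3                                # latent size d, output size k (illustrative)
G = torch.nn.Linear(d, k)                  # stand-in for the network G
opt = torch.optim.SGD(G.parameters(), lr=0.01)
z = torch.randn(d)

def external_grad(x):
    # Hypothetical black box that returns Grad_x F at x without exposing F.
    # For this demo it corresponds to F(x) = sum(x**2), i.e. Grad_x F = 2x.
    return 2 * x

with torch.no_grad():
    initial_F = (G(z) ** 2).sum().item()   # only used to check progress

for step in range(200):
    opt.zero_grad()                        # .grad accumulates, so clear it first
    x = G(z)                               # forward pass, shape (k,)
    grad_F = external_grad(x.detach())     # size-k vector, outside the graph
    # VJP: backpropagates grad_F through G, adding grad_F @ J_G to each p.grad
    x.backward(gradient=grad_F)
    opt.step()                             # any optimizer can consume p.grad

with torch.no_grad():
    final_F = (G(z) ** 2).sum().item()
print(initial_F, final_F)
```

`torch.autograd.backward(x, grad_tensors=grad_F)` is an equivalent spelling, and `torch.autograd.grad(x, params, grad_outputs=grad_F)` returns the same VJP without touching `.grad`. If G produces a batch, `grad_F` needs the same shape as `x`.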