Hello,

I am trying to figure out where to start with implementing this and was hoping for some pointers. Sorry if I’m missing something obvious.

Instead of computing a “Standard” L2 cost function of the form

|| N(s) - d ||_2^2

Where N is the network, s is the network input and d is the expected output, I want to apply a chain rule cost function,i.e

|| F(N(s)) - d ||_2^2

Where F is a function that cannot be directly defined, but I can compute the gradient for it using other means, i.e there is an (unfortunately expensive) way to compute F*(F(s) - d), so that my training gradient should be F*(F(N(s)) - d) * grad(N(s))

Thanks