Gradients in loss function and efficient calculation

Looks like this has also been discussed here: "Computing batch Jacobian efficiently" (#4 by albanD). Not sure if there are any updates since then.
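For reference, here is a minimal sketch of two ways to get per-sample Jacobians in PyTorch. It assumes each sample's output depends only on its own input, so there is no batch norm or other cross-sample mixing. The first approach differentiates the batch-summed output with `torch.autograd.functional.jacobian`, which only works under that independence assumption. The second uses `torch.func.vmap` with `torch.func.jacrev`, which requires PyTorch 2.0 or later and may be the kind of "update" worth checking. The small MLP `net` and the helper names are hypothetical and only for illustration. I'm not claiming this is exactly what the linked post proposes.

```python
import torch
from torch.func import jacrev, vmap

# Hypothetical model: a small MLP applied row-wise, so output[b] depends only on x[b].
torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(3, 8),
    torch.nn.Tanh(),
    torch.nn.Linear(8, 2),
)


def batch_jacobian_sum_trick(f, x):
    """Per-sample Jacobians d f(x)[b] / d x[b], shape (B, out, in).

    Because sample b's output does not depend on other samples, the Jacobian of
    f(x).sum(0) w.r.t. x contains exactly the per-sample blocks.
    """
    # Output of the lambda has shape (out,), input has shape (B, in),
    # so the returned Jacobian has shape (out, B, in).
    jac = torch.autograd.functional.jacobian(lambda inp: f(inp).sum(dim=0), x)
    return jac.permute(1, 0, 2)  # -> (B, out, in)


def batch_jacobian_vmap(f, x):
    """Same result via torch.func: jacrev on one sample, vmap over the batch."""
    # jacrev(f) maps a single sample (in,) to its Jacobian (out, in);
    # vmap adds the leading batch dimension.
    return vmap(jacrev(f))(x)


def batch_jacobian_loop(f, x):
    """Naive reference: one jacobian call per sample (slow, but obviously correct)."""
    return torch.stack(
        [torch.autograd.functional.jacobian(f, x[b]) for b in range(x.shape[0])]
    )


x = torch.randn(4, 3)
jac = batch_jacobian_sum_trick(net, x)
print(jac.shape)  # torch.Size([4, 2, 3])
```

The sum trick needs only one backward pass per output dimension instead of one per sample. It silently gives wrong answers if the model mixes samples, so the `vmap` version is the safer default when it is available.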