How to avoid recalculating a function when we need to backpropagate through it twice?

Hi,

> non-linear cases

The gradient is always linear 🙂 Even when the function itself is non-linear, backpropagation is linear in the upstream gradient, so you can sum the upstream gradients and push them through the function's backward in a single pass.
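A minimal sketch of that linearity, using made-up functions and manual derivatives (no autograd library): pushing the two upstream gradients `g_u` and `g_v` through `f`'s backward separately and summing gives the same result as pushing `g_u + g_v` through once.

```python
def f(x):
    # shared function we would like to backprop through only once
    return x ** 2

def f_backward(x, upstream):
    # df/dx = 2x, scaled by the upstream gradient (chain rule)
    return upstream * 2 * x

x = 1.5
g_u = 3.0    # upstream gradient arriving from the first loss (made-up value)
g_v = -5.0   # upstream gradient from the second loss, already flipped

# Two separate backward passes through f:
two_passes = f_backward(x, g_u) + f_backward(x, g_v)

# One backward pass with the summed upstream gradient:
one_pass = f_backward(x, g_u + g_v)

assert two_passes == one_pass
print(two_passes)
```

So recomputing `f`'s backward twice is unnecessary: combine the upstream gradients first, then backprop once.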
This won’t work if u/v/f have parameters, though, because the flip of the gradient for the second loss happens only just before f. Parameters in these functions would therefore see the gradient of the sum of the two losses.
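A hypothetical worked example of this caveat, again with manual derivatives and made-up scalar parameters `a` (in u) and `b` (in v): a flip applied to the gradient at f's output only changes gradients flowing into f and below; the parameters of u and v still receive the plain, unflipped gradients of their own losses.

```python
def f(x):
    return x ** 2

x = 1.5
a, b = 3.0, 5.0   # parameters of u(y) = a*y and v(y) = b*y (made-up values)
y = f(x)

# loss1 = u(y) = a*y, loss2 = v(y) = b*y
# Parameter gradients -- the flip at y cannot affect these:
grad_a = y   # d(loss1)/da
grad_b = y   # d(loss2)/db, NOT negated by a flip placed at y

# Input-side gradient, where the flip for loss2 IS applied at y:
grad_y = a - b            # g_u plus the flipped g_v
grad_x = grad_y * 2 * x   # pushed through f's backward once

print(grad_a, grad_b, grad_x)
```

Only `grad_x` reflects the flip; `grad_a` and `grad_b` are exactly what backpropagating `loss1 + loss2` would produce.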