How to avoid recalculating a function when we need to backpropagate through it twice?

albanD · August 24, 2020, 4:26pm

Hi,

Regarding "non-linear cases":

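The gradient is always linear: the backward pass of any function, even a non-linear one, is linear in the gradient flowing into it.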
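Here is a small check of this linearity. It is a sketch that assumes PyTorch, and the function and tensors are made up for illustration. Backpropagating `g1 + g2` through a non-linear function in one pass gives the same result as two separate passes with `g1` and `g2`. Double precision is used only to keep rounding differences negligible.

```python
import torch

torch.manual_seed(0)

# Hypothetical input and non-linear function (float64 to keep rounding negligible).
x = torch.randn(5, dtype=torch.float64, requires_grad=True)
y = torch.tanh(x) * torch.exp(x)

# Two arbitrary incoming gradients for y.
g1 = torch.randn(5, dtype=torch.float64)
g2 = torch.randn(5, dtype=torch.float64)

# One backward pass with the summed incoming gradient...
(grad_sum,) = torch.autograd.grad(y, x, grad_outputs=g1 + g2, retain_graph=True)

# ...matches two separate backward passes added together.
(grad1,) = torch.autograd.grad(y, x, grad_outputs=g1, retain_graph=True)
(grad2,) = torch.autograd.grad(y, x, grad_outputs=g2)

print(torch.allclose(grad_sum, grad1 + grad2))  # True
```

Because of this, gradients from several losses can be accumulated in a single backward pass without changing the result.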
This won't work if u, v or f have parameters, though, because the gradient for the second loss is only flipped before f. The parameters inside these functions would just see the gradient of the sum of the two losses.
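The following minimal sketch illustrates that consequence. It assumes PyTorch and a hypothetical `f(x) = tanh(w * x)` with a parameter `w`, and its output feeds two made-up losses. A single backward of `loss1 + loss2` only gives `w` the summed gradient. The sketch also shows one way to get the per-loss gradients separately without running the forward of `f` again, using `retain_graph=True`. This is only one option and not necessarily the approach discussed in the thread.

```python
import torch

torch.manual_seed(0)

# Hypothetical setup: f has a parameter w, and its output h feeds two losses.
x = torch.randn(4, dtype=torch.float64)
w = torch.randn(4, dtype=torch.float64, requires_grad=True)
h = torch.tanh(w * x)  # forward through f, computed only once
loss1 = h.sum()
loss2 = (h ** 2).sum()

# Per-loss gradients w.r.t. w, reusing the same graph (no second forward).
(g1,) = torch.autograd.grad(loss1, w, retain_graph=True)
(g2,) = torch.autograd.grad(loss2, w, retain_graph=True)

# A single backward of the combined loss only gives w the sum.
(g_sum,) = torch.autograd.grad(loss1 + loss2, w)
print(torch.allclose(g_sum, g1 + g2))  # True

# With the gradients kept apart, any per-loss sign can be applied, e.g. flipping loss2.
g_flipped = g1 - g2
```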