Hi,
non-linear cases
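The gradient is always linear: differentiation is a linear operation, so this also holds when the functions themselves are non-linear.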
This won’t work if u/v/f have parameters, though, because the flipping of the gradient for the second loss only happens before f. So parameters in these functions would just see the sum of the two losses.
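To make that concrete, here is a minimal sketch with the chain rule written out by hand. The setup is my assumption, since it depends on how u, v and f are actually wired together: f(z) = w * z with a single parameter w, u(y) = y**2, v(y) = 3 * y, loss1 = u(f(x)), loss2 = v(f(rev(x))), where rev is the identity in the forward pass and negates the gradient in the backward pass. All of these names and choices are made up for illustration.

```python
def grads_with_reversal(w, x):
    """Hand-written backward pass for loss1 = u(f(x)) and loss2 = v(f(rev(x))).

    Assumed toy functions (not from the thread):
      f(z) = w * z, u(y) = y**2, v(y) = 3 * y
      rev(z) = z in the forward pass, gradient multiplied by -1 in the backward pass
    """
    # Forward pass: rev is the identity, so both branches compute the same y.
    y = w * x

    # Gradients of each loss with respect to the output of f.
    dl1_dy = 2.0 * y   # d/dy of y**2
    dl2_dy = 3.0       # d/dy of 3 * y

    # Parameter w sits inside f, so its gradient is formed before the backward
    # signal reaches rev: it gets the plain sum of both losses, no flip.
    grad_w = (dl1_dy + dl2_dy) * x

    # Input x sits behind rev in the loss2 branch, so only that term is flipped.
    grad_x = dl1_dy * w + (-1.0) * dl2_dy * w

    return grad_w, grad_x


grad_w, grad_x = grads_with_reversal(w=2.0, x=1.5)
print("grad wrt w (sum of both losses):", grad_w)
print("grad wrt x (second loss flipped):", grad_x)
```

Under these assumptions, only whatever sits upstream of the flip (here x) gets loss1 minus loss2, while w is trained on loss1 plus loss2.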