Why, after adding a second loss term and calling backward(), is the generated gradient the same as before?

I don't think histc is differentiable without an approximation, since the hard bin assignment produces zero gradients almost everywhere, so you could follow this topic for potential workarounds.
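As a rough sketch (not an official API, just one common approximation), you could replace the hard bin counts with a "soft" histogram built from Gaussian kernels, so each sample contributes smoothly to nearby bins and gradients can flow back to the input; the bin range, count, and sigma below are arbitrary example values:

```python
import torch

def soft_histogram(x, bins=10, vmin=0.0, vmax=1.0, sigma=0.05):
    # Evenly spaced bin centers over [vmin, vmax]
    centers = vmin + (vmax - vmin) * (torch.arange(bins, dtype=x.dtype) + 0.5) / bins
    # Gaussian kernel: each sample contributes softly to nearby bins,
    # which makes the result differentiable w.r.t. x (unlike torch.histc)
    diff = x.unsqueeze(-1) - centers          # shape: (N, bins)
    weights = torch.exp(-0.5 * (diff / sigma) ** 2)
    return weights.sum(dim=0)                  # shape: (bins,)

x = torch.rand(1000, requires_grad=True)
hist = soft_histogram(x)
loss = hist.var()          # any differentiable loss on the histogram
loss.backward()            # gradients now flow back to x
print(x.grad is not None)  # True
```

If the second loss is built on torch.histc instead, its contribution to the gradient will be zero, which would explain why the gradient looks the same after adding it.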