Could someone please elaborate on this sentence in the description of nn.BCELoss()?

So, as part of the description of nn.BCELoss() in the docs, it is written:

“For one, if either $y_n = 0$ or $(1 - y_n) = 0$, then we would be multiplying 0 with infinity. Secondly, if we have an infinite loss value, then we would also have an infinite term in our gradient, since $\lim_{x\to 0} \frac{d}{dx} \log (x) = \infty$. This would make BCELoss’s backward method nonlinear with respect to $x_n$, and using it for things like linear regression would not be straight-forward.”

My question is about the last sentence: how does a loss term and its gradient going to infinity translate into a nonlinear backward method and its incompatibility with linear regression?

Hi Meme!

I’ve noticed that sentence too. I think it’s pretty much nonsense, where
either the author was just sloppy or didn’t understand what the words
he was using meant.
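
One way to see why "nonlinear" looks like the wrong word here is to write out the per-sample loss and its derivative. This is only a sketch: it drops the optional weight $w_n$ from the docs' formula and assumes $x_n$ is the predicted probability and $y_n$ the target.

$$
\ell_n = -\left[\, y_n \log x_n + (1 - y_n) \log (1 - x_n) \,\right],
\qquad
\frac{\partial \ell_n}{\partial x_n} = -\frac{y_n}{x_n} + \frac{1 - y_n}{1 - x_n}
$$

The derivative is already nonlinear in $x_n$ for every value in $(0, 1)$, so nonlinearity is not something that appears only when the loss blows up. What does change at $x_n = 0$ or $x_n = 1$ is that the terms stop being finite.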

The best I can come up with is that he wanted to say something like:

“When BCELoss becomes inf, its backward() method – and the
whole backpropagation – will become polluted with infs and nans,
causing training to fail. We therefore clamp BCELoss's internal log()
function at -100 to save you from such an ignoble fate.”
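
A small sketch of the two failure cases from the docs excerpt, next to what the built-in loss returns. Here `naive_bce` is a hypothetical hand-written helper (not part of PyTorch) that applies the formula without any clamping; the variable names are made up for illustration.

```python
import torch
import torch.nn as nn


def naive_bce(x, y):
    """Hand-written BCE with no clamping (hypothetical helper, for illustration only)."""
    return -(y * torch.log(x) + (1 - y) * torch.log(1 - x))


# Case 1: prediction 0, target 1. The unclamped loss is -log(0) = inf,
# and its gradient with respect to x is not finite either.
x1 = torch.tensor([0.0], requires_grad=True)
y1 = torch.tensor([1.0])
naive1 = naive_bce(x1, y1).sum()
naive1.backward()
naive_grad1 = x1.grad.clone()

# Case 2: prediction 0, target 0. The first term is 0 * log(0) = 0 * (-inf),
# which evaluates to nan ("multiplying 0 with infinity").
x2 = torch.tensor([0.0])
y2 = torch.tensor([0.0])
naive2 = naive_bce(x2, y2).sum()

# The built-in loss clamps the output of log() at -100, so both cases stay finite.
clamped1 = nn.BCELoss()(torch.tensor([0.0]), y1)
clamped2 = nn.BCELoss()(torch.tensor([0.0]), y2)

print("naive loss, case 1:  ", naive1.item())
print("naive grad, case 1:  ", naive_grad1.item())
print("naive loss, case 2:  ", naive2.item())
print("BCELoss, case 1:     ", clamped1.item())
print("BCELoss, case 2:     ", clamped2.item())
```

In the unclamped version, a single such sample is enough to put an inf or nan into the loss, and from there into every gradient that depends on it.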


K. Frank

Hi K. Frank,

Yes, exactly. Your translation of the author’s words makes perfect sense, and it’s also what I would have expected to be written there. Those words about the nonlinearity of backprop just made me feel there was a whole other dimension of backward loss propagation that I had no clue about. Relieved to know I’m not the only one.