“For one, if either $y_n = 0$ or $(1 - y_n) = 0$, then we would be multiplying 0 with infinity. Secondly, if we have an infinite loss value, then we would also have an infinite term in our gradient, since $\lim_{x\to 0} \frac{d}{dx} \log(x) = \infty$. This would make BCELoss’s backward method nonlinear with respect to $x_n$, and using it for things like linear regression would not be straightforward.”
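For concreteness, both failure modes the quote describes show up directly in plain floating-point arithmetic (a quick check of my own, not from the quoted docs):

```python
import math

# First failure mode: 0 * inf is nan under IEEE-754.
print(0.0 * float("inf"))   # nan

# Second failure mode: d/dx log(x) = 1/x, which blows up as x -> 0,
# so the gradient term diverges along with the loss.
x = 1e-300
print(1.0 / x)              # huge (~1e300), heading toward inf
```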
My question is on the last sentence: how does a loss term and its gradient going to infinity translate into a nonlinear backward method and its incompatibility with linear regression?
I’ve noticed that sentence too. I think it’s pretty much nonsense;
either the author was just sloppy or didn’t understand what the
words he was using meant.
The best I can come up with is that he wanted to say something like:
“When BCELoss becomes inf, its backward() method – and the
whole backpropagation – will become polluted with infs and nans,
causing training to fail. We therefore clamp BCELoss’s internal log()
function at -100 to save you from such an ignoble fate.”
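To make the clamping concrete, here is a minimal sketch in plain Python of what that behavior looks like. The function names (`bce_loss`, `clamped_log`) and the scalar formulation are my own for illustration; the only thing taken from the docs is the idea of clamping the log term below at -100:

```python
import math

def clamped_log(x, floor=-100.0):
    # log(x) -> -inf as x -> 0; clamp it below at `floor` (PyTorch
    # documents a floor of -100 for BCELoss) so it stays finite.
    raw = math.log(x) if x > 0 else float("-inf")
    return max(raw, floor)

def bce_loss(p, y):
    # Scalar binary cross-entropy. With the clamp, the 0 * (-inf) = nan
    # case becomes 0 * (-100) = 0, and the loss is capped at 100
    # instead of becoming inf.
    return -(y * clamped_log(p) + (1 - y) * clamped_log(1 - p))

print(bce_loss(0.5, 1.0))   # ~0.6931 (= log 2)
print(bce_loss(0.0, 1.0))   # 100.0 instead of inf
```

So backpropagation through a clamped loss sees a large-but-finite value rather than inf, and no nans are produced downstream.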
Yes, exactly. Your translation of the author’s words makes perfect sense, and it’s also what I would have expected to be written there. Those words about the nonlinearity of backprop made me feel there was a whole other dimension of backward loss propagation that I had no clue about. Relieved to know I’m not the only one.